PQN and DQN: Algorithms for expression microarrays

doi:10.1016/j.jtbi.2006.06.017

Journal of Theoretical Biology

Volume 243, Issue 2, 21 November 2006, Pages 273-278

https://doi.org/10.1016/j.jtbi.2006.06.017

Abstract

An ideal expression algorithm should distinguish truly different expression levels with few false positives and be robust to assay changes. We propose two algorithms. PQN is the non-central trimmed mean of perfect match intensities with quantile normalization. DQN is the non-central trimmed mean of differences between perfect match and mismatch intensities with quantile normalization. The quantiles for normalization can be either empirical or theoretical. When the array type and/or the assay changes within a study, normalization to common quantiles at the probe set level is essential. We compared DQN, PQN, RMA, GCRMA, DCHIP, PLIER and MAS5 on the Affymetrix Latin square data and on our own data from two sets of experiments that used the same bone marrow but different types of microarrays and different assays. We found that the computation of the AUC of ROC at affycomp.biostat.jhsph.edu can be improved.

Introduction

Bioinformatics plays an important role in biomedical research (Fodor et al., 1991; Chou, 2004). Oligonucleotide microarrays are widely used to detect differential RNA expression and genotypes or mutations in DNA (Lockhart et al., 1996; Hu et al., 2001; Kennedy et al., 2003; Di et al., 2005). The strengths of microarray technology are its high throughput and the small amount of sample material it requires. Its limitations include signal variations and a relatively narrow dynamic range in comparison with PCR-based sequencing technology. Therefore, there is constant interest in improving its sensitivity, specificity and reproducibility. Affymetrix Microarray Suite 5 (MAS5) provided signals for a single microarray (Hubbell et al., 2002). Li and Wong (2001a, 2001b) first proposed using multiple microarrays to obtain reliable estimates of expression signals. Irizarry et al. (2003a, 2003b) provided the robust multi-array analysis (RMA), based on median polish of perfect match probe data. Bolstad et al. (2003) suggested quantile normalization, which maps the intensity distributions of all microarrays in a study onto a common distribution for better comparative analysis. Wu et al. (2004) proposed using sequence information for background adjustment (GCRMA). Hubbell (2003) proposed the probe logarithmic intensity error estimation (PLIER) algorithm, based on minimization of a special error function, to reduce the bias at the low intensity end (Affymetrix, 2003).
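As an illustration of the quantile normalization idea mentioned above, the following is a minimal sketch, not the implementation used by any of the cited packages. It assumes every array has the same number of values and breaks ties by original position. The function name `quantile_normalize` and the optional `reference` argument (a sorted list of target quantiles, which could be empirical or theoretical) are hypothetical names introduced here.

```python
def quantile_normalize(arrays, reference=None):
    """Map every array onto one common set of quantiles.

    arrays:    list of equal-length lists of intensities, one per microarray.
    reference: optional sorted list of target quantiles (hypothetical
               parameter). If None, the mean of the sorted arrays is used,
               i.e. empirical quantiles.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")

    if reference is None:
        # Empirical target: average the k-th smallest value across arrays.
        sorted_arrays = [sorted(a) for a in arrays]
        reference = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]
    elif len(reference) != n:
        raise ValueError("reference must have one quantile per value")

    normalized = []
    for a in arrays:
        # Positions of the values in ascending order (ties keep input order).
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # The value with a given rank receives the reference quantile
            # of the same rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


example = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After normalization, every array contains exactly the same set of values, so differences between arrays reflect only the ranks of the individual probes or probe sets.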

MAS5 is based on the biweight estimation of the differences between perfect match intensities (PM) and mismatch intensities (MM). Model-based approaches that use PM intensities, such as RMA, show smaller variance than MAS5 and outperform it on some other measures. DCHIP and GCRMA can use either the PM intensities or the differences between PM and MM intensities.

With a simple descriptive statistic on PM intensities and a quantile normalization at the probe set level, PQN yields an area under the curve (AUC) of the receiver operating characteristic (ROC) similar to that of RMA but smaller than that of GCRMA for the HG-U95A Latin square data set (Table 1), and it gives the largest AUC of ROC in the mid and high concentration ranges for the HG-U133A_tag Latin square data set (Table 2). Moreover, for data that include different types of microarrays and different assays, most PM–MM based algorithms such as DQN show smaller variations than the corresponding PM based algorithms.

We also provide a new algorithm for computation of the AUC of ROC for the Latin square data.

Section snippets

Methods and results

PQN and DQN use a non-central trimmed mean, i.e. the mean of the values between the 40th percentile and the 90th percentile. This higher-end, non-central trimmed mean can eliminate the influence of low intensity probes, for example, probes selected improperly at the microarray design stage because of incorrect information in the public genomic databases. Excluding intensities above the 90th percentile helps reduce the influence of outliers due to cross hybridization or white image blemishes. We also tried
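The non-central trimmed mean described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `noncentral_trimmed_mean` is hypothetical, and the way the percentile cut points are turned into list indices (rounding down) is an assumption, since the excerpt does not state the convention.

```python
def noncentral_trimmed_mean(values, lower_pct=40.0, upper_pct=90.0):
    """Mean of the values lying between two percentiles of the sample.

    The 40th and 90th percentile defaults follow the text. Converting
    percentiles to indices by rounding down is an assumption.
    """
    if not 0.0 <= lower_pct < upper_pct <= 100.0:
        raise ValueError("need 0 <= lower_pct < upper_pct <= 100")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")

    lo = int(n * lower_pct / 100.0)  # index of the first value kept
    hi = int(n * upper_pct / 100.0)  # index one past the last value kept
    if hi <= lo:
        # Very small probe sets: keep at least one value.
        hi = lo + 1
    kept = ordered[lo:hi]
    return sum(kept) / len(kept)


# Ten probe intensities with one bright outlier: the lowest four values
# and the outlier are ignored, leaving 5, 6, 7, 8 and 9.
signal = noncentral_trimmed_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000])
```

Because the window is shifted towards the upper end, the low intensity probes drop out entirely, while the top tenth of the values is still excluded to guard against outliers.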

Algorithms

Let ${PM}_{ik}$ and ${MM}_{ik}$ be the raw intensities of the $i$th PM cell and MM cell for probe set (gene) $k$. The raw intensity is a real number in the interval $[1, I_{\max}]$. The intensity upper bound $I_{\max}$ depends on the scanner. For PQN signals, we define the background, $B$, as the trimmed mean of the lowest 2% of PM intensities. We also define the lower bound, $L$, as half of the standard deviation of the lowest 2% of PM intensities. They are very similar to MAS5 except that those for MAS5 are location
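The two quantities defined above, $B$ and $L$, could be computed along the following lines. This is only a sketch under stated assumptions: the excerpt does not give the trimming fraction of the trimmed mean, so the hypothetical parameter `trim_fraction` defaults to no trimming, and the use of the sample (rather than population) standard deviation is also an assumption. The function name `background_and_lower_bound` is introduced here for illustration.

```python
import statistics


def background_and_lower_bound(pm_intensities, low_fraction=0.02,
                               trim_fraction=0.0):
    """Return (B, L) computed from the lowest PM intensities.

    B: trimmed mean of the lowest `low_fraction` of PM intensities.
    L: half of the standard deviation of those same intensities.
    `trim_fraction` (fraction cut from each end before averaging) is a
    hypothetical parameter; the source excerpt does not specify it.
    """
    ordered = sorted(pm_intensities)
    if len(ordered) < 2:
        raise ValueError("need at least two PM intensities")

    # Lowest 2% of the intensities, but never fewer than two values so
    # that a standard deviation can be computed.
    k = max(2, int(len(ordered) * low_fraction))
    lowest = ordered[:k]

    # Symmetric trimming of the low-intensity subset (assumption).
    cut = int(len(lowest) * trim_fraction)
    core = lowest[cut:len(lowest) - cut] or lowest

    background = statistics.mean(core)
    lower_bound = 0.5 * statistics.stdev(lowest)  # sample SD (assumption)
    return background, lower_bound


# 200 synthetic PM intensities: the lowest 2% are the values 1, 2, 3, 4.
B, L = background_and_lower_bound(list(range(1, 201)))
```

How $B$ and $L$ enter the PQN signal is not covered by this sketch, because the corresponding text is cut off in the excerpt.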

Discussion

Tables 1 and 2 show that for the Latin square data sets, where the assay and microarray type do not change, algorithms based only on PM probes (DCHIP, RMA, GCRMA and PQN) usually have a larger AUC of ROC than algorithms based on PM–MM. However, Fig. 1 shows that for identical probes on different microarrays using different assays, the algorithms based on PM–MM can show smaller variations (DQNB vs. PQNB and GCRMAdQN vs. GCRMAQN). The transformed signal DQNBT is only used for comparison

Acknowledgments

We thank Anton Belooussov, Laurent Essioux, Grant Hillman, Walter Koch, Friedmann Krause, Aki Nakao, Sunhee K. Ro and Guido Steiner for helpful discussions. We thank the referees for their critiques and constructive suggestions.

References (19)

Affymetrix. Guide to Probe Logarithmic Intensity Error Estimation (PLIER). (2003)
B.M. Bolstad et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (2003)
K.-C. Chou. Structural bioinformatics and its impact to biomedical science. Current Medicinal Chemistry (2004)
L.M. Cope et al. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics (2004)
X. Di et al. Dynamic model based algorithms for screening and genotyping over 100 K SNPs on oligonucleotide microarrays. Bioinformatics (2005)
S.P. Fodor et al. Light-directed, spatially addressable parallel chemical synthesis. Science (1991)
D.M. Green et al. Signal Detection Theory and Psychophysics. (1966)
G.K. Hu et al. Predicting splice variant from DNA chip expression data. Genome Res. (2001)
E. Hubbell. Some M-estimates for expression analysis. Affymetrix GeneChip microarray low-level workshop.... (2003)

There are more references available in the full text version of this article.

