Figure 1.
Evidence for confounding in microarray analysis of DBA models. (A) Diagram of hypothetical relationships between exposure, outcome, and putative confounding variable in the O’Brien et al study. (B) Deconvolution of early (CD235a−) and late (CD235a+) erythroid maturation stages by CIBERSORT, a support vector regression approach, in O’Brien et al samples based upon normal erythroid maturation from GSE22552 (P = .000057 from likelihood ratio test). Error bars represent standard errors. (C) Although similarly sorted for CD235a−, DBA samples are composed of different mixtures of maturation stages than unaffected control samples. DBA due to RP or indeterminate (RP/I) samples are on average more mature (P = .017 from likelihood ratio test), whereas DBA due to GATA1 samples are less mature (P = .012 from likelihood ratio test). Error bars represent standard errors. (D) Kernel density plots for heme biosynthesis, ribosome biogenesis, and curated (cur.) GATA1 target genes. Heme biosynthesis and ribosome biogenesis are significantly associated with erythroid maturation stage (P < 10⁻¹⁰ for both by Kruskal-Wallis), whereas curated GATA1 target genes are only significant by more sensitive pairwise GSE analysis tests.13 Ribosome biogenesis was chosen as a more general measurement of “translational machinery” because small nucleolar RNAs were not measured in all microarrays. (E) Similar to panel D, except for the 3 groups investigated in O’Brien et al. Differences between genotypes (RP/I, GATA1, or unaffected) are much smaller than between stages (Kruskal-Wallis P > .5 for all comparisons, but significant by pairwise GSE analysis13). (F) Similar GSE analysis to that reported by O’Brien et al is shown. Synthetic stage-matched normal samples were created using the estimate from panel B of the percentage of each erythroid stage present. GSE analysis indicates that these synthetic normals have equivalent or stronger GSE results when compared directly to the original samples.
Black bars indicate genes that are ranked according to expression differences between DBA (RP/I) and unaffected controls, and normalized enrichment scores (NES) are reported. All data presented in this figure were RMA-normalized (see analysis in Ulirsch13 for SCAN-normalized data). CFU-E, colony-forming unit erythroid; exp., experimental; exprs., expression; Int-E, intermediate erythroblast; late-E, late erythroblast; pheno, phenotype; Pro-E, pro-erythroblast.
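The construction of synthetic stage-matched normals in panel F can be illustrated with a minimal sketch. This is not the authors' code: the function name `synthetic_stage_matched`, the array layout, and the choice to mix profiles in linear space before returning to the log2 scale are all assumptions made for illustration. The caption states only that the estimated stage percentages from panel B were used.

```python
import numpy as np


def synthetic_stage_matched(reference_log2, fractions):
    """Mix stage-specific reference profiles into one synthetic sample.

    reference_log2: (n_genes, n_stages) array of log2 expression values,
        one column per normal erythroid stage (hypothetical layout).
    fractions: (n_stages,) estimated stage proportions for one sample,
        for example as produced by a deconvolution step.
    Returns a (n_genes,) array of log2 expression values.
    """
    reference_log2 = np.asarray(reference_log2, dtype=float)
    fractions = np.asarray(fractions, dtype=float)
    if reference_log2.ndim != 2 or reference_log2.shape[1] != fractions.shape[0]:
        raise ValueError("need one fraction per reference stage column")
    if np.any(fractions < 0) or fractions.sum() <= 0:
        raise ValueError("fractions must be non-negative with a positive sum")

    # Rescale so the proportions sum to 1.
    weights = fractions / fractions.sum()

    # Assumption: mixtures are additive on the linear intensity scale,
    # so undo the log2 transform, take the weighted sum, and re-log.
    linear = np.exp2(reference_log2)
    return np.log2(linear @ weights)


# Toy example: 3 genes, 2 stages (early, late); values are made up.
toy_reference = np.array([[8.0, 10.0],
                          [6.0, 6.0],
                          [12.0, 4.0]])
toy_mixture = synthetic_stage_matched(toy_reference, [0.5, 0.5])
```

Under these assumptions, a gene with identical expression in every stage keeps that value in the mixture, and a sample estimated to be entirely one stage reproduces that stage's reference profile.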