While microarray analysis of global gene expression yields enormous amounts of data, there are concerns about the standardization and validity of findings. Consequently, we wanted to determine the variability in published gene expression studies of human bone marrow and to study the factors that account for these differences. We also wanted to determine whether certain genes were consistently and differentially enriched in human bone marrow stem cells. A total of 64 individual datasets (2001–2006) were collected from the Gene Expression Omnibus (GEO) database for our analysis. Most of the datasets had been used as controls in studies of hematological malignancies. Thirteen datasets were hybridized to the Affymetrix U95 chip, 38 were analyzed on the Affymetrix human U133A chip, and 13 on the U133 Plus 2.0 platform. RNA for these studies was derived from purified normal CD34+ cells in 48 cases and from unsorted normal bone marrow mononuclear cells in 16 cases. To merge data from different platforms, we converted individual probe Sequence_ids to RefSeq gene IDs and analyzed them with SAS (SAS Institute, Cary, NC) and the ArrayAssist software package (Stratagene). A total of 23686 unique gene IDs were obtained for analysis after the data were normalized, and a KNN algorithm was used to fill the gaps in the data. Our results reveal marked variability in gene expression patterns in this cohort. The datasets clustered together primarily on the basis of the laboratory that performed the assays (hierarchical clustering based on average Euclidean distances). Clustering was further defined by the type of chip/platform used for the analysis. Interestingly, the similarity between CD34+ sorted and unsorted whole bone marrow samples was greater than the interplatform similarity between cells of the same phenotype. Notwithstanding the variability in gene expression, there was a novel set of genes that were differentially enriched in all 64 samples.
These genes included transcription factors (Kruppel-like factor 6), translational proteins (eukaryotic translation initiation factor 4A, isoform 1; ribosomal proteins), and other proteins not previously implicated in hematopoiesis (guanine nucleotide binding protein (GNAS), calnexin, HLA-associated proteins, dUTP pyrophosphatase, etc.). Mouse homologues of several of these proteins were found to be overexpressed in a previous, well-respected study of mouse hematopoietic stem cells (Ramalho-Santos et al, Science 2002;298(5593)). To further validate these findings, we performed gene expression array analysis on primary bone marrow cells using a completely different platform (Nimblegen 37K arrays) and demonstrated enrichment of the majority of these genes. Thus, we provide a blueprint for conducting similar meta-analyses across various microarray platforms, and our findings disclose tremendous platform- and laboratory-dependent differences in microarray gene expression patterns. In spite of this variability, data mining of discrete datasets can be a useful tool for gene discovery. Finally, we are in the process of constructing a publicly searchable database of normal human bone marrow gene expression, which may serve as a source of controls for gene expression studies of hematopoietic malignancies by various investigators.
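
The two computational steps named above, KNN gap filling and hierarchical clustering on Euclidean distances, can be illustrated with a minimal Python sketch. It is not the SAS/ArrayAssist workflow used in the study. The toy matrix, the function names, the choice of k, and the reading of "average Euclidean distances" as average linkage are all assumptions made for illustration.

```python
import math

def knn_impute(matrix, k=2):
    """Fill None entries in a genes x samples matrix with the mean of the
    k nearest genes (rows) that have a value in the missing column.
    Distance is root-mean-square difference over columns observed in both rows.
    Entries with no usable donor are left as None."""
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        neighbours = []
        for m, other in enumerate(matrix):
            if m == i:
                continue
            shared = [(a, b) for a, b in zip(row, other)
                      if a is not None and b is not None]
            if not shared:
                continue
            dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
            neighbours.append((dist, m))
        neighbours.sort()
        for j in missing:
            donors = [matrix[m][j] for _, m in neighbours
                      if matrix[m][j] is not None][:k]
            if donors:
                filled[i][j] = sum(donors) / len(donors)
    return filled

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise Euclidean distance until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(euclidean(profiles[i], profiles[j])
                        for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy data (invented): rows are genes, columns are samples.
# Samples 0-1 mimic one laboratory, samples 2-3 another.
expression = [
    [1.0, 1.2, 5.0, 5.1],
    [2.0, None, 6.0, 6.2],
    [1.1, 1.3, 5.2, 5.0],
    [2.1, 2.2, 6.1, 6.0],
]
complete = knn_impute(expression, k=1)
samples = [list(col) for col in zip(*complete)]  # one profile per sample
groups = average_linkage(samples, n_clusters=2)
print(groups)
```

In this toy case the samples separate by their simulated laboratory of origin, which mirrors the kind of batch-driven clustering reported above. Real data would first need normalization and mapping of probes to shared gene IDs.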

Disclosure: No relevant conflicts of interest to declare.
