Abstract
While microarray analysis of global gene expression yields enormous amounts of data, there are concerns about standardization and validity of findings. Consequently, we wanted to determine the variability in gene expression studies of human bone marrow in the literature and study the factors that account for these differences. We also wanted to determine if certain genes were consistently and differentially enriched in human bone marrow stem cells. A total of 64 individual datasets (2001–2006) were collected from the Gene Expression Omnibus (GEO) database for our analysis. Most of the datasets had been used as controls in studies of hematological malignancies. Thirteen datasets were hybridized to the Affymetrix U95 chip, 38 to the Affymetrix human U133A chip, and 13 to the U133 Plus 2.0 platform. RNA for these studies was derived from purified normal CD34+ cells in 48 cases and from unsorted normal bone marrow mononuclear cells in 16 cases. To merge data from different platforms, we converted individual probe Sequence_ids to RefSeq gene IDs and analyzed them with SAS (SAS Institute, Cary, NC) and the ArrayAssist software package (Stratagene). A total of 23686 unique gene IDs were obtained for analysis after the data were normalized, and a KNN algorithm was used to fill the gaps in the data. Our results reveal that there is marked variability in gene expression patterns in this cohort. The datasets clustered together primarily on the basis of the laboratory that performed the assays (hierarchical clustering based on average Euclidean distances). Clustering was further defined by the type of chip/platform used for the analysis. Interestingly, the similarity between CD34+ sorted and unsorted whole BM samples was greater than the similarity between cells of the same phenotype examined on different platforms. Notwithstanding the variability in gene expression, there was a novel set of genes that were differentially enriched in all 64 samples.
These genes included transcription factors (Kruppel-like factor 6), translational proteins (eukaryotic translation initiation factor 4A, isoform 1; ribosomal proteins) and other proteins not previously implicated in hematopoiesis (guanine nucleotide binding protein (GNAS), calnexin, HLA-associated proteins, dUTP pyrophosphatase, etc.). Mouse homologues of several of these proteins were found to be overexpressed in a previous well-respected study of mouse hematopoietic stem cells (
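The two computational steps named above, KNN gap filling and hierarchical clustering on Euclidean distances, can be illustrated with a minimal sketch. It is not the SAS or ArrayAssist procedure used in the study. The toy expression values, the choice of k = 2, the reading of "average Euclidean distances" as average linkage, and all function names are assumptions made for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance over positions observed in both profiles."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    # Rescale so profiles sharing fewer observed positions stay comparable.
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) * len(a) / len(pairs))

def knn_impute(rows, k=2):
    """Fill each missing entry with the mean of that column taken from
    the k nearest gene rows that have a value there (k is an assumption)."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if donors:
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with
    the smallest mean pairwise Euclidean distance."""
    clusters = [[i] for i in range(len(profiles))]

    def dist(c1, c2):
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        candidates = (
            (p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        p, q = min(candidates, key=lambda pq: dist(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]

# Made-up matrix: rows are genes, columns are samples. Samples 0-1 stand in
# for one laboratory and samples 2-3 for another; None marks a gene with no
# value on that sample's platform.
expression = [
    [1.0, 1.1, 5.0, 5.2],
    [2.0, None, 6.0, 6.1],
    [1.5, 1.4, 5.5, 5.4],
    [0.9, 1.0, None, 4.9],
]

filled = knn_impute(expression, k=2)
samples = [list(col) for col in zip(*filled)]  # cluster samples, not genes
groups = average_linkage(samples, n_clusters=2)
```

In this toy matrix the samples separate by their hypothetical laboratory of origin, which mirrors the kind of batch-driven grouping the abstract reports. A real analysis would first normalize each dataset and would work with thousands of genes rather than four.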
Disclosure: No relevant conflicts of interest to declare.