Abstract
The malignant Reed-Sternberg cell of Hodgkin’s disease, first described a century ago, has resisted in-depth analysis because of its extreme rarity in lymphomatous tissue. To study its genome-wide gene expression directly, approximately 11,000,000 bases (27,518 cDNA sequences) of expressed gene sequence were determined from living single Reed-Sternberg cells, Hodgkin’s tissue, and cell lines. This approach increased the number of genes known to be expressed in Hodgkin’s disease by 20-fold, to 2,666 named genes. A comparison of their gene expression with approximately 40,000,000 bases (10⁵ sequences) of expressed gene sequence from germinal center B cells (GCB) and dendritic cells indicates that Reed-Sternberg cells from both nodular sclerosing and lymphocyte predominant Hodgkin’s disease are derived from an unusual B-cell lineage. The data set of expressed genes, reported here and on the World Wide Web, provides a basis for understanding the genes responsible for Hodgkin’s disease and for developing novel diagnostic markers and therapies. This study of the rare Reed-Sternberg cell, concealed in its heterogeneous cellular context, also provides a formidable test case for extending the analysis of differential gene expression to the single disease cell.
HODGKIN’S DISEASE stands apart from other cancers by the extraordinary and unexplained scarcity of its neoplastic (Reed-Sternberg) cell in involved tissues. Because Reed-Sternberg cells are outnumbered approximately 1,000 to 1 by surrounding nonneoplastic cells,1-3 direct extraction approaches fail to determine either the gene expression profile or the genetics of the Reed-Sternberg cell. Here, we applied a genome-wide strategy to examine regulated gene expression in single Reed-Sternberg cells and to infer their likely cell of origin.
Reed-Sternberg cells are clonal, aneuploid, and pathognomonic of Hodgkin’s disease.1-3 Reed-Sternberg cells, which are thought to be biologically active, presumably secrete peptides that elicit the surrounding inflammatory cell infiltrate and consequent systemic symptoms.1,3 Rearrangement and somatic hypermutation of their Ig heavy chain variable (VH) genes suggest that Reed-Sternberg cells are germinal center B lymphocytes (GCB) that carry nonproductive Ig genes but resist culling by apoptosis.4 Clustering of Hodgkin’s disease cases may be a clue to an infectious etiology, and Epstein-Barr virus (EBV) is present in the Reed-Sternberg cells of many, but not all, cases of Hodgkin’s disease.1-3 However, a pathogenetic relationship of EBV with Hodgkin’s disease has not been formally established.2 Little is known about the genetics of Hodgkin’s disease because of the difficulty in obtaining cells for molecular or cytogenetic analysis, and no pathogenetic associations with specific genes or cytogenetic abnormalities have yet been established. Hodgkin’s disease is sometimes familial, and the genetically identical twin siblings of affected monozygotic twins carry a 100-fold increased risk of Hodgkin’s disease,5 but no specific genetic locus has been identified.
Gene expression studies of Hodgkin’s disease have reported results for only approximately 100 gene products and have been largely limited to in situ microscopy, a technique restricted to those proteins and genes for which specific antibodies or nucleic acid probes are available.1-3 To analyze gene expression in Reed-Sternberg cells globally, we developed a single-cell strategy in which cDNA libraries were prepared from individual, viable Reed-Sternberg cells selected by micropipette from cell suspensions of primary tissue; these libraries are suitable for analysis by blot probing6 and specific polymerase chain reaction (PCR).7 Here, sequencing of cDNA libraries of single Reed-Sternberg cells is combined with sequence analysis of Hodgkin’s-derived cell lines and primary Hodgkin’s tissues, and the results are compared with putative normal cells of origin to determine the regulated gene expression profile of the Reed-Sternberg cell.
MATERIALS AND METHODS
Cell Sources
Hodgkin’s-derived cell lines.
Hodgkin’s disease tissue.
Two cDNA libraries (designated HD) were prepared, as described,8 from a single fresh-frozen, unfixed lymph node from a patient with nodular sclerosing Hodgkin’s disease.
Single Reed-Sternberg cells.
cDNA libraries previously prepared from four single Reed-Sternberg cells, obtained by viable cell micromanipulation from two primary Hodgkin’s specimens, were used as reported.6 These correspond to cells A1 and A14 from classical lymphocyte-predominant Hodgkin’s disease (cells 14 and 25, respectively) and cells L1 and L8 from nodular sclerosing Hodgkin’s disease (cells 34 and 41, respectively). Primary amplification of the library and the addition of restriction sites to the cDNA ends occurred during the first PCR, using 36-mers that created an XhoI site at the 5′ end and an EcoRI site at the 3′ end.6,7 The products were digested and ligated into λ phage using the Lambda Zap vector (Stratagene, La Jolla, CA). For subsequent sequencing, reamplification was performed with a primer consisting only of the 36-nt clamp sequence, without the dT24 sequence. By PCR, cDNA libraries from single cells contained actin mRNA signals at approximately equal levels but no genomic (intronic) DNA (not shown).
GCB.
The preparation of GCB cDNA libraries has been reported by the Cancer Genome Anatomy Project (CGAP) on the World Wide Web (www.ncbi.nlm.nih.gov/CGAP) as tissue sample CGAP123.1 and libraries CGAP-GCB0 and CGAP-GCB1. A nonneoplastic human tonsil was dispersed into a cell suspension and flow sorted into a GCB fraction on the basis of surface membrane phenotype (IgD-negative, CD20-dim). Normalized cDNA libraries were prepared from polyA-selected mRNA and sequenced. Sequences are deposited on the CGAP Web site.
Dendritic cells.
Human dendritic cells were generated from peripheral blood mononuclear cells by adherence plating and cultured for 7 days with 200 ng/mL recombinant granulocyte-macrophage colony-stimulating factor (rGM-CSF) and 200 U/mL recombinant interleukin-4 (rIL-4), as described.10 CD2+ and CD19+ cells were depleted from the culture by means of immunomagnetic beads coated with specific antibodies. This procedure yielded a dendritic cell preparation that was more than 97% pure (CD1a+, CD14−). Cells were validated as dendritic by the following phenotype: HLA-DR+, CD14−, CD1a+, CD1b+, CD83+, CD80+, CD86+, CD19−, CD3−, and CD16−. A cDNA library was then prepared.8
Sequence and Analysis
Sequencing was performed on automated sequencers (ABI 371; Applied Biosystems, Perkin-Elmer, Norwalk, CT). Using the BLAST algorithm, sequences were compared against an expressed gene database of approximately 2 × 10⁶ human sequences stored in a Sybase relational database; computational analyses were performed on central servers, including a Convex computer and Sun SPARC 200 computers.8 We have previously applied this database search approach to detect and clone tissue-specific genes; eg, the prostate-specific gene NKX3.1 was cloned and mapped to chromosome 8p21.8 Distinct genes were estimated as follows. Because expressed sequence tags (ESTs) matching different GenBank accessions may represent one gene, all matched GenBank sequences were compared with each other after masking repetitive elements (RepeatMasker program; http://ftp.genome.washington.edu/RM/RepeatMasker.html). Any two GenBank sequences with a BLASTN match11 that included an ungapped stretch of at least 48 of 50 identical nucleotides were grouped together as a cluster representing a distinct gene. GenBank numbers and occurrence frequencies can be found on the World Wide Web at www.hodgkins.georgetown.edu (Fig 1).
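The clustering rule used to estimate distinct genes can be sketched in a few lines. This is a minimal illustration, not the authors’ pipeline: it assumes the BLASTN step has already produced ungapped, repeat-masked aligned segment pairs (a hypothetical input format), and it assumes single-linkage grouping, so that if A matches B and B matches C, all three fall into one cluster. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (accession_1, accession_2, aligned_segment_1, aligned_segment_2); hypothetical format
AlignedPair = Tuple[str, str, str, str]


def has_near_identical_window(seg_a: str, seg_b: str,
                              window: int = 50, min_identical: int = 48) -> bool:
    """True if two ungapped, equal-length aligned segments contain `window`
    consecutive positions with at least `min_identical` identical bases.
    Repeat-masked positions ('N') never count as identical."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segments must be aligned and of equal length")
    if len(seg_a) < window:
        return False
    same = [int(x == y and x != "N") for x, y in zip(seg_a.upper(), seg_b.upper())]
    count = sum(same[:window])            # identities in the first window
    if count >= min_identical:
        return True
    for i in range(window, len(same)):    # slide the window one base at a time
        count += same[i] - same[i - window]
        if count >= min_identical:
            return True
    return False


def cluster_accessions(accessions: Iterable[str],
                       pairs: Iterable[AlignedPair]) -> List[List[str]]:
    """Single-linkage (union-find) grouping of accessions into putative genes."""
    parent: Dict[str, str] = {acc: acc for acc in accessions}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for acc1, acc2, seg1, seg2 in pairs:
        if has_near_identical_window(seg1, seg2):
            root1, root2 = find(acc1), find(acc2)
            if root1 != root2:
                parent[root2] = root1

    clusters: Dict[str, List[str]] = {}
    for acc in list(parent):
        clusters.setdefault(find(acc), []).append(acc)
    return sorted(sorted(members) for members in clusters.values())
```

Single linkage is the simplest reading of “grouped together as a cluster”; a stricter scheme (eg, requiring every pair in a cluster to match) would yield more, smaller clusters and therefore a higher distinct-gene estimate.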
Library Comparisons
ESTs sampled from different cell cDNA libraries were compared with those of Hodgkin’s disease cells and scored for relative abundance according to the formula R(g) = [f1(g)/M]/[f2(g)/N], where library 1 consists of M distinct genes, f1(g) is the count of ESTs encoding gene g in library 1, N is the count of distinct genes in library 2, and f2(g) is the count of ESTs representing gene g in library 2. The proportion of gene g transcripts in cell source 1 is estimated as f1(g)/M; for source 2, the estimate is f2(g)/N. When f1(g) and f2(g) are large, the 95% confidence interval of R(g) narrows and the data estimate the abundance ratio more precisely. If the 95% confidence interval for R(g) is (L, U), where L and U are the lower and upper endpoints, respectively, then L > 1 indicates with 95% confidence that R(g) > 1, ie, that the gene is overexpressed in source 1 relative to source 2; conversely, U < 1 indicates with 95% confidence that R(g) < 1, ie, that the gene is overexpressed in source 2 relative to source 1. To account for small counts, genes in the accompanying tables were ranked by confidence interval: by L when R(g) ≥ 1 and by U when R(g) < 1.
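The scoring and ranking rules above can be sketched as follows. The point estimate follows the stated formula exactly, with M and N used as the per-library denominators defined in the text. The confidence interval is an assumption: the text does not name its method, so this sketch uses the common log-ratio (Katz) normal approximation, which is undefined for zero counts; the function returns None there rather than guess at whatever adjustment produced finite values for zero counts in the tables. The ordering within R(g) < 1 (descending U, so the genes most confidently overexpressed in source 2 end up last) is likewise an assumption. All names are hypothetical.

```python
import math
from typing import Dict, List, NamedTuple, Optional


class RatioEstimate(NamedTuple):
    r: float      # point estimate of R(g)
    lower: float  # L, lower endpoint of the approximate 95% interval
    upper: float  # U, upper endpoint of the approximate 95% interval


def relative_abundance(f1: int, m: int, f2: int, n: int,
                       z: float = 1.96) -> Optional[RatioEstimate]:
    """R(g) = [f1(g)/M] / [f2(g)/N] with an approximate 95% interval.

    ASSUMPTION: Katz log-ratio normal approximation, not necessarily the
    method behind the published tables. Returns None for zero counts.
    """
    if f1 <= 0 or f2 <= 0:
        return None
    r = (f1 / m) / (f2 / n)
    se_log = math.sqrt(1 / f1 - 1 / m + 1 / f2 - 1 / n)  # SE of ln R(g)
    return RatioEstimate(r, r * math.exp(-z * se_log), r * math.exp(z * se_log))


def call_direction(est: RatioEstimate) -> str:
    """Interpret the interval (L, U) as described in the text."""
    if est.lower > 1:
        return "overexpressed in source 1"
    if est.upper < 1:
        return "overexpressed in source 2"
    return "not significant"


def rank_genes(estimates: Dict[str, RatioEstimate]) -> List[str]:
    """Genes with R(g) >= 1 by descending L, then R(g) < 1 by descending U."""
    up = [g for g, e in estimates.items() if e.r >= 1]
    down = [g for g, e in estimates.items() if e.r < 1]
    up.sort(key=lambda g: estimates[g].lower, reverse=True)
    down.sort(key=lambda g: estimates[g].upper, reverse=True)
    return up + down
```

Ranking by L rather than by R(g) itself is what keeps a gene seen 2 times versus 0 from outranking one seen 200 times versus 10: the small-count gene has a wide interval and therefore a low L.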
RESULTS AND DISCUSSION
Expressed Genes of Hodgkin’s Disease
ESTs were sequenced from a total of 27,518 cDNA clones from libraries prepared from Hodgkin’s disease sources. To determine the number of distinct genes in the Hodgkin’s libraries, sequences were compared with a nonredundant database of all human nucleotide sequences in GenBank. In all, 11,072 sequences had GenBank assignments, comprising 3,784 different GenBank accession numbers that encompassed 2,666 distinct, named genes.
Expressed sequences obtained from single cells (n = 4,618) showed a broad distribution of genes with no unexpected overrepresentation of any particular sequence (see table “Hodgkin’s single cells vs. Hodgkin’s cell lines” at www.hodgkins.georgetown.edu and Fig 1). Taken together with the gene expression seen by hybridization and PCR,6,7 the sequence data confirm that the single-cell expressed sequence libraries are broadly representative. To broaden the basis of gene expression among Hodgkin’s-derived cells, 11,109 sequences were determined from cDNA libraries of two cell lines. Few genes abundantly expressed by single cells were absent from the cell lines (see the same table, “Hodgkin’s single cells vs. Hodgkin’s cell lines”). Excluding unique sequences of Igs, histocompatibility antigens, and repetitive endogenous retroviral sequences, only four distinct genes occurred more than twice in single cells without being observed in cell lines. Thus, sequences from single Reed-Sternberg cells and the cell lines were grouped together for subsequent analysis.
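The single-cell versus cell-line check above amounts to a simple filter: count each gene’s occurrences in both sources, keep genes seen more than twice in single cells and never in cell lines, and drop the excluded classes. In this sketch, the gene labels and the prefixes used to recognize Igs, histocompatibility antigens, and endogenous retroviral repeats are hypothetical placeholders; real annotations would need a proper classification.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# HYPOTHETICAL label prefixes for the excluded classes (Igs, histocompatibility
# antigens, endogenous retroviral repeats); real annotations would differ.
EXCLUDED_PREFIXES: Tuple[str, ...] = ("IG", "HLA", "ERV")


def single_cell_only(single_cell_hits: Iterable[str],
                     cell_line_hits: Iterable[str],
                     min_count: int = 3,
                     excluded: Tuple[str, ...] = EXCLUDED_PREFIXES) -> List[str]:
    """Genes seen at least `min_count` times (ie, more than twice) in single
    cells but never in cell lines, excluding the listed classes."""
    sc = Counter(single_cell_hits)
    cl = Counter(cell_line_hits)   # missing genes count as 0
    return sorted(gene for gene, count in sc.items()
                  if count >= min_count and cl[gene] == 0
                  and not gene.upper().startswith(excluded))
```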
Examples of expressed sequences with relative overrepresentation in the Hodgkin’s libraries are shown in Table 1. Ig was a frequent message in the single Reed-Sternberg cells, accounting for as much as 19% of the messages sequenced in one case, consistent with a B-cell phenotype. Further support for a B-cell lineage in single cells and cell lines came from abundant Ig messages and the B-cell–associated genes BL34,12 B7.1-CD80,13 and CD20.14 The known Reed-Sternberg cell expression of a number of genes was confirmed, eg, tumor necrosis factor β (TNFβ), CD30, and nuclear factor κB (NFκB).1-3 Because only approximately 100 genes had previously been reported to be expressed in Hodgkin’s disease, more than 95% of the 2,666 named genes reported here were not previously known to be expressed in Hodgkin’s disease.
GenBank | Gene Name | Templates, HD | Templates, Other | HD:Other R(g) | 95% Confidence Interval |
---|---|---|---|---|---|
U10687 | MAGE-4a | 13 | 1 | 711 | 93, 5435 |
M77844 | Oculorhombin (aniridia PAX6) | 5 | 0 | 91 | 22, 381 |
AF044197 | B-lymphocyte chemoattractant | 3 | 0 | 383 | 20, 7411 |
L43400 | Chromosome 5 P1 clone 792C12 | 3 | 0 | 383 | 20, 7411 |
X86174 | SSX1 | 4 | 3 | 73 | 16, 326 |
U54777 | MSH6 | 8 | 13 | 34 | 14, 81 |
AF026692 | Frizzled-related protein frpHE | 9 | 15 | 33 | 14, 75 |
U41206 | MSH2 | 2 | 0 | 274 | 13, 5696 |
U83171 | Macrophage-derived chemokine (MDC) | 19 | 73 | 14 | 9, 24 |
U03187 | IL-12 receptor | 6 | 19 | 17 | 7, 43 |
U90582 | Chromosome 11p15.5 | 3 | 6 | 27 | 7, 109 |
A sample of genes not previously known to be expressed in Hodgkin’s disease, selected for their association with Hodgkin’s disease [relatively high R(g) values; see text] and for biological interest. Templates refer to the number of times a particular sequence was detected in Hodgkin’s disease (HD) and in all other human cell sequences (Other). The relative incidence of expression in Hodgkin’s disease as compared with other cells [R(g)] and its 95% confidence interval were calculated as described. Genes are ranked in descending order of the lower (L) limit of the 95% confidence interval of the R(g) value. A complete list of the 2,666 distinct genes is deposited on the World Wide Web (www.hodgkins.georgetown.edu, table entitled “Hodgkin’s cells/tissues vs. entire database”).
The origin of the Reed-Sternberg cell of Hodgkin’s disease was addressed by comparing the relatedness of Hodgkin’s cell gene expression to other cell types on the basis of R(g) value and 95% confidence interval. When compared with the entire dataset of 2 × 10⁶ sequences from more than 800 human cell cDNA libraries, several genes emerged that were overrepresented in Hodgkin’s libraries (Table 1; complete list at www.hodgkins.georgetown.edu, table entitled “Hodgkin’s cells/tissues vs. other cell types”). For example, the melanoma-associated tumor antigen, MAGE-4a,15 was encountered 20 times in the Hodgkin’s libraries but only 5 times in all other libraries. The relative frequency [R(g)] of MAGE-4a was 219, with a 95% confidence interval of 82 to 583. Genes disproportionately expressed in Hodgkin’s disease included some that were expected, such as CD30, bcl-6, and NFκB,1-3 but also others whose expression in Hodgkin’s disease was not known, eg, the oculorhombin (aniridia, Pax-6) gene, a paired box transcription factor found in the developing neuroretina that is not known to be expressed in the immune system or its neoplasms.16
Reed-Sternberg and Germinal Center B Cells
The comparison of Hodgkin’s cell gene expression with that of GCBs used approximately 49,000 sequences from cDNA libraries of cell-sorted GCBs (library codes NCI_CGAP_GCB0 and NCI_CGAP_GCB1 at www.ncbi.nlm.nih.gov/CGAP). The GCB libraries contained 5,139 different GenBank accession numbers that comprised an estimated 4,465 distinct genes (Table 2; complete list on the website, see table entitled “Hodgkin’s cells vs. germinal center B cells”). For the specific comparison between Hodgkin’s cells and GCB, the whole tissues (HD) were excluded, because they were largely composed of T cells and would exaggerate differences between the two library classes.
GenBank | Gene Name | Templates, HD | Templates, B | HD:B R(g) | 95% Confidence Interval |
---|---|---|---|---|---|
X03558 | Elongation factor 1-α | 252 | 3 | 794 | 50, 12740 |
Z28407 | Ribosomal protein L8 | 102 | 1 | 195 | 17, 882 |
X635526 | Elongation factor 1-γ | 60 | 0 | 232 | 14, 3745 |
M17885 | Ribosomal phosphoprotein P0 | 56 | 2 | 54 | 13, 220 |
L19739 | Metallopanstimulin | 30 | 0 | 117 | 7, 1909 |
U83171 | Macrophage-derived chemokine (MDC) | 16 | 0 | 63 | 4, 1052 |
L15320 | Nucleophosmin B23 (NPM) | 22 | 4 | 11 | 4, 31 |
U10687 | MAGE-4a | 11 | 0 | 44 | 3, 747 |
M64241 | Wilm’s tumor-related (QM) | 10 | 0 | 40 | 2, 686 |
X07417 | Retrotransposon SINE-R11 | 9 | 0 | 36 | 2, 625 |
Listed are examples of genes whose adjusted relative occurrence values [R(g)] in Hodgkin’s disease (HD) were high compared with germinal center B cells (B). A complete list can be found at www.hodgkins.georgetown.edu (“Hodgkin’s cells vs. germinal center B cells”). Germinal center B cell (GCB) cDNA libraries are reported at www.ncbi.nlm.nih.gov/CGAP.
A concern regarding the comparison of libraries was that the GCB libraries were normalized, whereas the Hodgkin’s libraries were not. Normalization is a smoothing procedure in which self-reassociation of sequences reduces the redundancy of very highly expressed genes, bringing their levels closer to those of intermediately expressed genes.17 It has little effect on intermediately or rarely expressed genes and does not completely remove any cDNA clone. For genes whose expression was high in Hodgkin’s cells but low in GCB (top of the table entitled “Hodgkin’s cells vs. germinal center B cells”), sequences that were rare in GCB after normalization must also have been rare before. The rarest sequences, each below 0.02% (defined as components II and III in Soares et al17), would be expected to have fewer than 10 counts each among the 49,000 GCB sequences. Of the top 500 sequences in the comparison “Hodgkin’s cells vs. germinal center B cells” (high in Hodgkin’s cells relative to GCB), 490 (98%) cDNAs had fewer than 10 counts each in GCB. Thus, the comparison is valid for the vast majority of genes with high relative expression in Hodgkin’s cells. For sequences at the bottom of the table “Hodgkin’s cells vs. germinal center B cells,” ie, those high in GCB relative to Hodgkin’s cells, the data are unlikely to be influenced by normalization, because normalization increases the average frequency of rare species by only about 50%.17 Thus, a Hodgkin’s cell:GCB ratio of 0.10 would be only modestly increased, to 0.15, which is still a significant difference between the two library classes. These assumptions are conservative, because the GCB libraries were subjected to only one round of normalization, compared with two in Soares et al.17 Therefore, the effect of normalization on the comparison of GCB and Hodgkin’s cell libraries is likely smaller than estimated above.
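The arithmetic behind the normalization argument can be checked directly. This sketch uses only the numbers stated in the paragraph above (49,000 GCB sequences, a 0.02% rarity threshold, 490 of the top 500 genes, and an approximately 50% inflation of rare species); it reproduces the text’s figures and does not model normalization itself.

```python
# Worked check of the normalization argument, using only numbers from the text.
GCB_SEQUENCES = 49_000        # approximate number of GCB sequences
RARE_FRACTION = 0.02 / 100    # rarity threshold of 0.02% of transcripts
TOP_N, TOP_N_RARE = 500, 490  # top Hodgkin's-high genes; those with < 10 GCB counts
INFLATION = 1.5               # normalization raises rare-species frequency by ~50%

expected_rare_count = RARE_FRACTION * GCB_SEQUENCES   # about 9.8, ie, fewer than 10
fraction_rare_in_top = TOP_N_RARE / TOP_N             # 0.98
ratio_before = 0.10                                   # the text's example Hodgkin's cell:GCB ratio
ratio_after = ratio_before * INFLATION                # the text's example raised by ~50%

print(f"expected count of a rare GCB species: {expected_rare_count:.1f}")
print(f"top-{TOP_N} genes rare in GCB: {fraction_rare_in_top:.0%}")
print(f"ratio {ratio_before:.2f} -> {ratio_after:.2f}")
```

Run as a script, this prints 9.8, 98%, and 0.10 -> 0.15, matching the figures quoted in the paragraph.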
Examples of genes overexpressed in Hodgkin’s disease as compared with GCB are displayed in Table 2 (taken from the website table “Hodgkin’s cells vs. germinal center B cells”). A marked difference was the frequent expression of macrophage-derived chemokine (MDC) in Hodgkin’s cells, whereas this sequence did not occur in GCB libraries. MDC was thought to have differentially regulated expression restricted to dendritic cells.18 Several B-cell lineage genes expressed in GCB were not found in Hodgkin’s disease cells (Table 3; excerpted from the last genes listed in the website table entitled “Hodgkin’s cells vs. germinal center B cells”). Genes expressed by GCB but not detected in the Hodgkin’s libraries included some whose reduced or lost expression might release cells from normal growth or apoptosis controls, such as Rb, BUB3, and PCD2 (Table 3).
GenBank | Gene Name | Templates, HD | Templates, B |
---|---|---|---|
U92436 | Mutated in multiple advanced cancers protein (MMAC1) | 0 | 29 |
X66087 | a-myb | 0 | 25 |
AF047472 | Spleen mitotic checkpoint BUB3 | 0 | 18 |
M16038 | Lyn tyrosine kinase | 0 | 17 |
S78085 | Programmed cell death-2/Rp8 (PCD2) | 0 | 16 |
U34360 | Lymphoid nuclear protein (LAF-4) | 0 | 15 |
X52056 | spi-1 proto-oncogene | 0 | 12 |
L04288 | Cyclophilin-related protein | 0 | 11 |
D14540 | Mixed lineage leukemia (MLL) | 0 | 10 |
J03779 | Common acute lymphoblastic leukemia antigen (CALLA) | 0 | 10 |
L78132 | Prostate cancer tumor antigen (pcta-1) | 0 | 10 |
M27866 | Retinoblastoma protein (Rb) | 0 | 10 |
S75217 | CD79A | 0 | 4 |
M89957 | CD79B | 0 | 3 |
D83597 | RP105 signaling molecule | 0 | 8 |
M28170 | CD19 | 0 | 8 |
S76617 | Blk | 0 | 9 |
U07349 | Germinal center kinase | 0 | 5 |
Examples of genes whose expression was not detected in Hodgkin’s cells and cell lines but that were expressed by GCB. The number of encounters is given as templates for Hodgkin’s disease (HD: single cells, cell lines) and GCB (B). Genes listed here are found in the table “Hodgkin’s cells vs. germinal center B cells,” which can be seen at the website given above (see legend for Table 2).
A Dendritic Cell Origin of the Reed-Sternberg Cell?
Dendritic cells have been proposed as the origin of Reed-Sternberg cells based on their similarities in immunological phenotype and function as antigen-presenting cells.1,3,19 Of more than 50,000 sequences determined from a dendritic cell cDNA library, 25,823 had known GenBank assignments, accounting for 5,516 distinct GenBank accessions and 4,399 distinct genes (deposited at the website table entitled “Hodgkin’s cells vs. dendritic cells”). The Hodgkin’s/Reed-Sternberg single-cell and cell line sequence dataset (omitting the whole Hodgkin’s lymph node because of its many cell types) differed notably from that of dendritic cells. In contrast to the many encounters of Ig genes in the Hodgkin’s cells, Ig messages were absent in dendritic cells. Likewise, dendritic cells, but not Reed-Sternberg cells or cell lines, expressed the macrophage-dendritic cell lineage genes Mac1 (CD11b)20 and CD68.21 Hodgkin’s cells and cell lines expressed MDC, a gene enriched in dendritic cells, but did not express another CC chemokine, monocyte chemoattractant protein-4 precursor (MCP-4).22 Two conclusions can be drawn: (1) the single Reed-Sternberg cells are not dendritic in origin and (2) the single cells obtained by micromanipulation were not mistakenly macrophages.
The Single Cancer Cell
The search for genomic mutations and modulated gene expression accounting for the malignant state has been a painstaking process involving genetic linkage analysis, loss-of-heterozygosity analysis, and screening with specific probes for expression of known genes at the mRNA and protein levels. Recent advances in the capacity to address cancer cell-associated gene expression, using representational difference analysis (RDA),23 serial analysis of gene expression (SAGE),24 and EST sequencing,25 have begun to improve the efficiency of screening for differential gene expression between cancer and normal cells. Still, each technique faces the same natural obstacle in the study of primary tissue: cellular populations in cancer are not homogeneous. As a consequence, RNA extracted from tumor tissue is derived not only from neoplastic cells but also from other cell types, such as stromal, endothelial, and inflammatory cells. Although the amount of mRNA in a single cell is insufficient for either conventional poly-A mRNA purification or SAGE analysis,24 it is adequate for high-throughput automated sequencing and database analysis, as shown here. The ability to explore genes at the level of a single, defined cell has broad potential for studying complex systems such as the nervous system, the developing organism, and pathological conditions. However, gene expression analysis of single cells does not sample mRNAs at random and may be biased in its amplification of some mRNAs.26 Conversely, failure to detect expected messages may reflect mRNA copy number, mRNA stability, lack of poly-A tails, or the specificity of the sequence itself. The statistical power of single-cell gene expression technology should be enhanced by efficient in situ cell collection techniques, such as laser capture microdissection, once their resolution consistently reaches the single cell.27 A frequently represented message provides insight into differential gene expression in vivo.
For example, the high frequency of Ig messages and restriction to a single light chain type suggests a (clonal) B cell and corroborates independent evidence of clonal Ig gene rearrangements in the genomic DNA of single Reed-Sternberg cells.4
With increasingly high-throughput technologies, such as capillary-based sequencing28 and gene expression microarrays,29 large numbers of sequences should be accessible from small numbers of carefully selected cells. Defined computer search logic for comparing gene expression with other cell types, and for placing short sequences into known fuller-length cDNAs, creates a platform for the targeted study of regulated genes in a pathological cell. The present investigation demonstrates a model in which a rare cell, obscured by heterogeneous tissue, can be unveiled by gene sequencing and statistical analysis to disclose disease-associated genes.
ACKNOWLEDGMENT
The cell lines, L428 and KMH2, were generously provided by Drs Volker Diehl and Hiroshi Kamesaki, respectively. The authors gratefully acknowledge the technical contribution of Maria Fergusson and the helpful advice of Drs Reinhard Ebner and Steven Ruben and are particularly indebted to the staff of the HGS sequencing facility.
Supported in part by the American Cancer Society, DHP112 (to J.C.), and the O. Benwood Hunter Endowment (to J.C.).
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. section 1734 solely to indicate this fact.
REFERENCES
Author notes
Address reprint requests to Jeffrey Cossman, MD, NW 103 Medical-Dental Bldg, Georgetown University Medical Center, 3900 Reservoir Rd, NW, Washington, DC 20007; e-mail: cossmanj@gunet.georgetown.edu; website: www.hodgkins.georgetown.edu.