Abstract
Pharmacogenomics has traditionally focused on the identification of inherited genetic differences that influence a patient’s response to a specific therapeutic agent. These differences can range from inherited variability in the genes that affect drug absorption, distribution, intracellular transport, metabolism, and elimination, to variability in the genes that encode either the target of the drug or components of the pathway affected by the drug. The main goal of pharmacogenomics is to improve our understanding of how these variations, either individually or collectively, influence the therapeutic response. The genetic differences inherent within cancer cells constitute the other major variable in a patient’s ultimate response to therapy. In this review, we provide an overview of high-throughput genomic methods that can be used to identify genetic lesions within cancer cells. These efforts will ultimately allow the identification of the full complement of genetic lesions that underlie the establishment and maintenance of the leukemic clone. The identification of these lesions should provide the bases for defining the molecular “Achilles heels” against which new targeted therapies can be developed.
Acute lymphoblastic leukemia (ALL) is the most common pediatric malignancy, affecting approximately 31 per million children less than 15 years old per year in the United States.1 The disease is characterized by an expansion of leukemic blasts that circulate within the blood and disseminate throughout the tissues of the body. Despite only modest variability in the appearance of leukemic cells from one patient to another, there exists significant variability in the underlying molecular pathology, with a number of distinct genetic subtypes of ALL identified. Marked differences in presentation, clinical behavior, and response to individual therapeutic agents exist among the distinct genetic ALL subtypes. Thus, in most contemporary treatment protocols the different genetic subtypes are treated using so-called risk-adapted therapy, that is, therapy in which the intensity of treatment is tailored to a patient’s relative risk of relapse.2 This approach has resulted in significant improvements in ALL cure rates over the last four decades, with present approaches providing survival rates of over 80%. Further improvements in therapy are needed not only to advance cure rates, but also to reduce the short- and long-term toxicity that results from the therapies used. One approach to achieve this will be to develop new agents that specifically act against the genetic lesions that are critical to the growth and/or survival of the leukemic clones (so-called targeted therapy). Critical to this approach will be the accurate identification of the full complement of genetic lesions within each subtype of ALL that are required for the development and maintenance of the leukemic cell.
Efforts to define the genetic lesions that underlie ALL have identified a number of different subtypes of ALL based on their lineage (T versus B cell), chromosome number, or the presence or absence of chromosomal translocations. The most common genetic subtypes include B-progenitor leukemias with t(9;22)[BCR-ABL1], t(1;19)[E2A-PBX1], t(12;21)[TEL-AML1], rearrangements of the MLL gene on chromosome 11, band q23, a hyperdiploid karyotype with > 50 chromosomes, or a hypodiploid karyotype with < 46 chromosomes; mature B-cell leukemia with rearrangements between c-MYC and the immunoglobulin heavy or light chain genes; and T-lineage leukemias (T-ALL). Collectively, these lesions account for approximately 75% of cases3 and their presence significantly influences the therapeutic approach used for treatment. The identification of these abnormalities has provided important insights into normal and leukemic hematopoiesis.2 In addition, prenatal tracking and twin studies have shown that most of these lesions are important in the initiation of the leukemia.4 However, an important observation that has emerged from attempts to model these leukemias in mice is that the individual genetic lesions alone are insufficient to generate a full leukemic phenotype. Thus, cooperating oncogenic lesions are required. Candidate gene approaches have identified the cooperating lesions in a subset of leukemias, such as deletion or epigenetic silencing of CDKN2A, and mutations of NOTCH1 in T-lineage ALL cases.5,6 However, the full complement of cooperating lesions and their distribution within the known genetic subtypes of pediatric ALL remain to be defined. The identification of these lesions should provide the basis for defining the molecular “Achilles heels” against which targeted therapy can be developed.
Genome-wide Approaches for Mutation Detection
A number of genome-wide approaches are available to identify genomic abnormalities in cancer, including spectral karyotyping,7 comparative genomic hybridization (CGH), array-CGH8–13 and single nucleotide polymorphism (SNP) arrays.14,15 Array-based methods are now widely used and involve the hybridization of sample DNA to arrays containing thousands of bacterial artificial chromosome (BAC)8,12 or oligonucleotide probes.9–11 The resolution of genomic coverage and the information obtained are critically dependent upon the platform used. Genomic coverage ranges from a mean intermarker distance of over 1 Mb for older BAC arrays, to less than 5 kb for Affymetrix 500K SNP arrays,16 and below 100 bp for ultra-high density oligonucleotide tiling arrays.13 By comparing the SNP genotypes of tumor and corresponding normal samples, SNP arrays are able to infer copy-neutral loss-of-heterozygosity (LOH, or uniparental disomy), in addition to directly inferring DNA copy number from probe hybridization intensity. Uniparental disomy may indicate reduplication of a mutated or aberrantly methylated tumor suppressor gene and is thus important in the genomic assessment of cancer.
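The tumor-versus-normal comparison described above can be illustrated with a minimal Python sketch. It flags a region as LOH when SNPs that are heterozygous in the normal sample have become homozygous in the tumor, and then uses the inferred copy number to separate copy-neutral LOH from LOH caused by deletion. The genotype encoding ("AA", "AB", "BB"), the function name `classify_region`, and every threshold are illustrative assumptions; this is not the method of any published software package.

```python
from typing import Sequence


def classify_region(
    normal: Sequence[str],
    tumor: Sequence[str],
    mean_copy_number: float,
    min_informative: int = 3,     # assumed minimum number of heterozygous SNPs
    lost_fraction: float = 0.9,   # assumed fraction that must turn homozygous
    cn_tolerance: float = 0.3,    # assumed window around two copies
) -> str:
    """Classify one genomic region from paired genotype calls and copy number."""
    if len(normal) != len(tumor):
        raise ValueError("normal and tumor genotype lists must be the same length")

    # Only SNPs heterozygous in the germline can reveal loss of heterozygosity.
    informative = [(n, t) for n, t in zip(normal, tumor) if n == "AB"]
    if len(informative) < min_informative:
        return "uninformative"

    lost = sum(1 for _, t in informative if t in ("AA", "BB"))
    if lost / len(informative) < lost_fraction:
        return "heterozygosity retained"

    # LOH is present; the copy number decides which kind.
    if abs(mean_copy_number - 2.0) <= cn_tolerance:
        return "copy-neutral LOH"
    if mean_copy_number < 2.0:
        return "LOH with copy number loss"
    return "LOH with copy number gain"


normal_calls = ["AB", "AA", "AB", "AB", "BB", "AB"]
tumor_calls = ["AA", "AA", "BB", "AA", "BB", "AA"]
print(classify_region(normal_calls, tumor_calls, mean_copy_number=2.05))
```

In this toy example all four germline-heterozygous SNPs are homozygous in the tumor while the copy number stays near two, which is the pattern expected for uniparental disomy.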
Sample processing for SNP array analysis is shown in Figure 1 (see Color Figures, page 508). In this process, high-quality genomic DNA is isolated from the leukemic cells. The genomic DNA is then digested with a restriction enzyme, and the restricted DNA is ligated to adapter primers and amplified using PCR. The PCR products are then purified, fragmented, labeled with fluorescent dyes, and hybridized to the arrays. Following stringent washing, the microarrays are scanned to quantitate the level of sample DNA that has hybridized to each individual oligonucleotide probe. Probe hybridization intensity and SNP genotype are then calculated. Inference of SNP genotype is relatively straightforward; however, accurate inference of DNA copy number is highly dependent on tumor characteristics (e.g., aneuploidy), technical aspects (e.g., batch effect, DNA extraction method, interlaboratory variability) and analytical factors (e.g., the choice of diploid reference samples). Several software packages are available with varying ability to handle large datasets, analyze newer arrays, combine array platforms for high resolution analysis, and simultaneously analyze LOH and copy number. These include the Affymetrix Copy Number Analysis Tool (CNAT),17 dChipSNP,18,19 CNAG (Copy Number Analyzer for Affymetrix GeneChip Mapping 100K arrays)20 and GIM (Genome Imbalance Map).21 The free tool dChipSNP has all of the above capabilities, has been used extensively in our work on acute leukemia, and has proven to be particularly robust and easy to use.
One additional critical issue is that of array normalization. Appropriate array normalization is vital for accurate copy number inference and intersample comparisons, but is problematic in aneuploid samples. Because a fixed mass of DNA is processed for each array, the amount of DNA derived from each chromosome will differ substantially between a grossly aneuploid sample and a diploid sample. When the entire array is used for normalization, irrespective of karyotype, the copy number results may therefore be inaccurate (e.g., a trisomic chromosome may be inferred as diploid). To avoid this problem, we have recently developed a karyotype-guided normalization algorithm that uses only those SNPs from regions known to be diploid by conventional cytogenetics. Data normalized by this approach can then be used in downstream analyses (e.g., dChipSNP) and yield greatly improved copy number estimates (unpublished data).
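The details of the karyotype-guided algorithm are unpublished, so the following Python sketch only illustrates the general idea under simplifying assumptions. It scales each tumor-to-reference intensity ratio by the median ratio of an anchor set of SNPs, and compares using every SNP as the anchor with using only SNPs on chromosomes reported as diploid. The function name `normalize`, the median-based scaling, and the toy intensities are all hypothetical.

```python
from statistics import median
from typing import Collection, List, Optional, Sequence


def normalize(
    tumor: Sequence[float],
    reference: Sequence[float],
    chromosomes: Sequence[str],
    diploid_chromosomes: Optional[Collection[str]] = None,
) -> List[float]:
    """Return per-SNP copy number estimates (2.0 corresponds to diploid).

    If diploid_chromosomes is None, every SNP anchors the scaling
    (whole-array normalization). Otherwise only SNPs on the listed
    chromosomes are used (a simplified karyotype-guided normalization).
    """
    if not (len(tumor) == len(reference) == len(chromosomes)):
        raise ValueError("inputs must have one entry per SNP")

    ratios = [t / r for t, r in zip(tumor, reference)]
    if diploid_chromosomes is None:
        anchor = ratios
    else:
        anchor = [x for x, c in zip(ratios, chromosomes) if c in diploid_chromosomes]
    if not anchor:
        raise ValueError("no SNPs available to anchor the normalization")

    scale = median(anchor)  # assumed robust estimate of the two-copy signal
    return [2.0 * x / scale for x in ratios]


# Toy hyperdiploid sample: chromosome 1 is diploid, chromosomes 4 and 21 are
# trisomic. Intensities are in arbitrary units; the trisomic SNPs carry 1.5
# times the signal of the diploid SNPs.
chroms = ["1", "1", "4", "4", "4", "21", "21", "21"]
tumor_signal = [0.5, 0.5, 0.75, 0.75, 0.75, 0.75, 0.75, 0.75]
reference_signal = [1.0] * 8

whole_array = normalize(tumor_signal, reference_signal, chroms)
guided = normalize(tumor_signal, reference_signal, chroms, diploid_chromosomes={"1"})
print([round(x, 2) for x in whole_array])
print([round(x, 2) for x in guided])
```

Because most SNPs in this toy sample lie on trisomic chromosomes, whole-array scaling assigns them two copies and pushes the truly diploid chromosome below two, whereas anchoring on chromosome 1 recovers two and three copies, respectively.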
Mutations in Pediatric ALL Detected by Genome-wide Analysis
The application of Affymetrix SNP arrays to pediatric ALL shows the great potential of this methodology. In a recently published study using 10K SNP arrays, several regions of copy number change and LOH were identified in a small cohort of pediatric ALL cases.22 Unfortunately, the relatively low resolution of the 10K SNP arrays precluded the clear identification of the target gene(s) affected by these copy number alterations.22 We have recently extended this type of analysis by using a combination of four Affymetrix SNP arrays (50K Hind, 50K Xba, 250K Sty and 250K Nsp) that together interrogate over 615,000 genomic loci at a mean intermarker distance of 4.8 kb. Two hundred fifty pediatric ALL samples were examined, including 200 B-progenitor ALLs and 50 T-lineage cases. Corresponding remission samples were also examined for the majority of these cases. The genome-wide copy number changes identified for a representative sample of 123 ALL cases are depicted in Figure 2 (see Color Figures, page 508). Each column represents the DNA copy number of a single leukemic sample compared to a reference pool of 60 remission samples, and each row shows the inferred copy number for a SNP across the sample set, with SNPs grouped by chromosomal location. This analysis accurately detected known DNA copy number changes including the multiple whole chromosomal gains characteristic of high hyperdiploid B-progenitor ALL, the duplication of chromosome 1q in ALL with t(1;19)(q23;p13) as a result of the unbalanced translocation, and deletions at 6q16.2-3,23 9p21.3 (harboring the CDKN2A and CDKN2B genes),24 and 12p13.2 (ETV6).25
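As a rough check on the quoted resolution, the mean intermarker distance is simply the genomic span covered divided by the number of markers. The short Python sketch below applies this to the 615,000 loci mentioned above; the covered genome length of about 2.95 Gb is an assumption made for illustration and is not stated in the text.

```python
def mean_intermarker_distance_kb(covered_bp: float, n_markers: int) -> float:
    """Mean spacing between markers in kilobases, ignoring uneven coverage."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return covered_bp / n_markers / 1000.0


ASSUMED_COVERED_BP = 2.95e9  # assumption: approximate span covered by the arrays
COMBINED_MARKERS = 615_000   # loci interrogated by the four arrays together

spacing = mean_intermarker_distance_kb(ASSUMED_COVERED_BP, COMBINED_MARKERS)
print(round(spacing, 1))  # 4.8
```

A mean spacing says nothing about the distribution of markers, so local resolution can be considerably better or worse than this figure in individual regions.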
In addition to the identification of these known lesions, this analysis identified a number of new lesions. The most striking finding was mono-allelic deletions of genes that play a critical role in regulating B cell development and differentiation in approximately 40% of B-progenitor ALLs (Mullighan and Downing, unpublished observations). To fully understand the significance of this finding a brief review of the molecular pathways regulating B cell development and differentiation is warranted.
Transcriptional Control of Normal B Cell Development
As illustrated in Figure 3, differentiation of lymphoid progenitors into mature B cells is a tightly regulated process coordinated by a hierarchical network of transcription factors and cytokines.26,27 This process is accompanied by the sequential rearrangement of immunoglobulin receptor genes. Productive rearrangement of immunoglobulin heavy chain enables expression of the pre-B cell receptor necessary for survival of B-cell precursors, and subsequent immunoglobulin light chain rearrangement permits expression of the mature B-cell receptor. At least seven transcription factors (PU.1, Ikaros, E2A, BCL11A, EBF, PAX5 and FOXP1) and two cytokine receptors (FLT3 and IL-7R) are involved. PU.1 and Ikaros are required for the development of early lymphoid precursors, and Ikaros-null mice display an absence of B-, T-, and natural killer-cell lineages.28,29 Signaling through the fms-like tyrosine kinase-3 (FLK2/FLT3) and interleukin-7 receptors is important for the generation of pro-B cells. The transcription factors TCF3 (E2A), EBF, and BCL11A are crucial for the generation of early (pro-B) B cell precursors, and mice lacking E2A, EBF or BCL11A show an arrest in B cell differentiation prior to the onset of immunoglobulin heavy chain gene rearrangement.30–33 E2A and EBF also regulate expression of downstream transcription factors such as PAX5. PAX5 is essential for B lineage commitment and differentiation, in part by activating the expression of B-cell specific genes including CD79A (MB-1), CD19 and BLNK, and repressing the expression of B-lineage inappropriate genes such as NOTCH1 and MCSFR.34 PAX5 together with FOXP1 promotes immunoglobulin heavy chain V→DJ recombination.35,36 Lack of PAX5 expression results in a block in B cell differentiation at an early pro-B cell stage, prior to V→DJ recombination.37
Mutations in Genes Regulating B-cell Development and Differentiation Are Frequent in Pediatric B-progenitor ALL
Our SNP analysis revealed copy number alteration of PAX5 in approximately 30% of B-progenitor ALL cases. PAX5 is located at 9p13, comprises 10 exons, and contains several key functional domains, including a highly conserved N-terminal DNA-binding paired domain, a homeodomain-like domain, and a C-terminal transactivation domain (Figure 4A; see Color Figures, page 508). PAX5 copy number alterations included mono-allelic loss in 53 cases, bi-allelic loss in 3 cases, and an internal amplification in 1 case (Figure 4B; see Color Figures, page 508). PAX5 deletions and the amplification were confirmed by fluorescence in situ hybridization (FISH) of leukemic blasts and/or genomic quantitative PCR, and were present in over 90% of blasts (Figure 4C–D [see Color Figures, page 508] and data not shown). Almost half of the mono-allelic deletions were confined to PAX5, with most deleting only a subset of PAX5 exons. The PAX5 lesions are predicted to result in either haploinsufficiency or the generation of hypomorphic alleles that produce proteins lacking the DNA-binding domain or the transcriptional activation domain. In addition, 4 cases were identified that contained focal 3′ deletions of PAX5 sequences secondary to cryptic translocations, including 2 cases with a PAX5-TEL translocation and 1 case each with a PAX5-FOXP1 and a PAX5-EVI3 translocation.
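The categories of PAX5 copy number alteration listed above can be related to inferred copy number values with a simple thresholding rule, sketched below in Python. The cutoffs and the function name `call_copy_state` are illustrative assumptions, not the criteria used in this study. The final lines add up the reported case counts and express them as a fraction of the 200 B-progenitor cases, on the assumption that all of the altered cases fall within that group.

```python
def call_copy_state(copy_number: float) -> str:
    """Map an inferred copy number to a coarse state (assumed cutoffs)."""
    if copy_number < 0.5:
        return "bi-allelic loss"
    if copy_number < 1.5:
        return "mono-allelic loss"
    if copy_number > 2.5:
        return "amplification"
    return "no alteration"


# Case counts as reported in the text above.
reported_cases = {"mono-allelic loss": 53, "bi-allelic loss": 3, "amplification": 1}
B_PROGENITOR_CASES = 200  # assumption: all altered cases are B-progenitor ALL

altered = sum(reported_cases.values())
fraction = altered / B_PROGENITOR_CASES
print(altered, f"{fraction:.1%}")
```

Under these assumptions the 57 altered cases correspond to about 28.5% of the B-progenitor cohort, consistent with the "approximately 30%" quoted above.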
To determine if PAX5 was also the target of point mutations, genomic sequencing of all PAX5 exons was performed in all cases. Importantly, mutations were identified in 14 B-ALLs. These mutations included missense, frame-shift and splice-site mutations that clustered in the DNA-binding paired domain and transactivation domains. Sequence analysis of remission marrow samples from each patient revealed that these PAX5 mutations were somatically acquired. Analysis of the locations of the DNA-binding domain mutations against the solved crystal structure of PAX5 suggested that these mutations would eliminate or alter DNA binding.
Since loss or reduced PAX5 function appeared to be a common feature of the mutations, we also assessed whether methylation-induced silencing of PAX5 occurs in B-progenitor ALL. Using mass spectrometry,38 we analyzed blasts from 96 ALLs for their methylation status at two CpG rich regions within the PAX5 exon 1A promoter, and a single CpG rich region within the PAX5 exon 1B promoter. Dense methylation of the two CpG islands in the promoter region for exon 1A was observed in the majority of the T-ALL samples. By contrast, no high-level methylation was seen in any of the CpG islands in the B-progenitor ALLs examined. These data suggest that epigenetic silencing of PAX5 is a hallmark of T-ALL, but is not a major mechanism of PAX5 inactivation in B-lineage ALL.
In addition to the copy number alterations of PAX5, a number of other genes involved in B cell development and differentiation were found to be altered. Specifically, deletions were identified in EBF (8 cases), Ikaros (17 cases), Aiolos (2 cases), LEF1 (3 cases), and BLNK (2 cases). Taken together, these data reveal mutations in genes regulating B cell development and differentiation in over 40% of pediatric B-progenitor ALL.
Exactly how these genetic alterations contribute to the establishment or maintenance of the leukemic clone remains to be determined. Our analysis of the PAX5 mutations suggests that they lead to a reduction in the level of PAX5, either as a result of haploinsufficiency or through the generation of hypomorphic forms of the transcription factor. This finding is important in that mice null for PAX5 show a complete arrest in B cell development at the pro-B cell stage, prior to completing immunoglobulin heavy chain rearrangement.37 Thus, the simplest interpretation of our data is that the identified mutations lead to a reduction in the functional level of PAX5 and/or other key regulators of B cell development and differentiation, and as a consequence the altered leukemic progenitor is unable to differentiate normally beyond the pro-B cell stage. Experiments examining the functional effects of the identified PAX5 mutations on normal B-cell development will directly test this hypothesis. In addition to contributing to the block in differentiation, the mutations may also play a more direct role in collaborating with other known genetic lesions in leukemogenesis. Supporting this interpretation is the striking association of specific B cell developmental gene mutations with particular ALL subtypes. Particularly noteworthy is the presence of focal mono-allelic PAX5 deletions in 30% of ALLs that contain a t(12;21) [TEL-AML1] translocation. Directly assessing the ability of PAX5 haploinsufficiency and selected PAX5 mutations to cooperate with TEL-AML1 in inducing leukemia will test this hypothesis.
Summary
The application of high-throughput genome-wide methods such as SNP arrays coupled with genomic resequencing will allow compilation of a comprehensive registry of genetic lesions in pediatric ALL. As demonstrated by our studies, the information generated from these studies will provide valuable insights into pathways that are critical for the development or maintenance of the transformed phenotype. Defining the gene products within these pathways that could serve as logical targets for the development of new therapeutic agents will aid in improving our ability to cure patients while simultaneously minimizing the toxicity they experience as a result of the treatment. To achieve these goals, advancements will need to be made in the resolution at which copy number changes can be detected, and in the efficiency with which point mutations can be identified. Efforts like the cancer genome sequencing project should accelerate methodological advances so that the goal of defining the full complement of genetic and epigenetic lesions within each of the different subtypes of human cancer can be achieved.
Department of Pathology, St. Jude Children’s Research Hospital, Memphis TN, 38105
This work was supported in part by National Cancer Institute grants P01 CA71907-09, CA-21765, and by the American Lebanese and Syrian Associated Charities (ALSAC) of St. Jude Children’s Research Hospital.