Abstract
Several studies indicate that the development of chronic lymphocytic leukemia (CLL) may be influenced by antigen recognition through the clonotypic B-cell receptors (BCRs). However, it is still unclear whether antigen involvement is restricted to the malignant transformation phase or whether the putative antigen(s) may continuously trigger the CLL clone and affect not only the progenitor cell but also the leukemic cells themselves. To address this issue, we conducted a large-scale subcloning study of rearranged immunoglobulin heavy variable (IGHV) genes of diverse mutational status from 71 CLL cases (total, 1496 subcloned sequences), belonging to both the common IgM/IgD variant and the rare IgG-positive variant. Although most cases showed no or low levels of intraclonal diversification (ID), we report intense ID in the IGHV genes of selected cases, especially a subgroup of 13 IgG-switched cases expressing stereotyped, mutated IGHV4-34 rearrangements (subset 4). We demonstrate that the ID evident in subset 4 cases cannot be attributed to IGHV4-34 usage, IGHV gene-mutated status, class-switch recombination, or BCR stereotypy in general; rather, it represents a unique phenomenon strongly correlated with the distinctive BCR of subset 4. In such cases, the observed ID patterns may imply a stereotyped response to an active, ongoing interaction with antigen(s).
Introduction
Over the last decade, immunophenotypical and gene expression profiling studies have provided conclusive evidence that chronic lymphocytic leukemia (CLL) malignant B cells are antigen-experienced.1-3 A strong sequence-based argument for the role of antigen in CLL was the finding that somatic hypermutations (SHMs) are present in the immunoglobulin heavy variable (IGHV) genes expressed by the leukemic B cells in more than 50% of CLL cases.4,5 The presence or absence of mutations was shown to have important clinical implications and is now widely used to segregate CLL cases into 2 clinical entities displaying significantly different outcomes.6,7
Further evidence for the role of antigen in leukemogenesis is the remarkably restricted IGHV gene repertoire of CLL, with just a handful of genes predominating (eg, IGHV1-69, IGHV4-34, IGHV3-7, IGHV3-23, and IGHV3-21).5,8,9 From a sequence perspective, however, the most striking argument for the active role of antigen in CLL is the existence of subsets of geographically distant and unrelated cases with quasi-identical, “stereotyped” B cell receptors (BCRs).10-18 Interestingly, cases expressing certain stereotyped BCRs may also share other biologic and clinical features. For instance, the IGHV3-21/IGLV3-21 subset (known as subset 2)17 has been associated with a poor prognosis irrespective of IGHV gene mutational status.16,17,19 In contrast, the IGHV4-34/IGKV2-30 subset (known as subset 4)17 appears to be associated with an indolent course of the disease.17 Furthermore, cases belonging to this subset are relatively young17 and uniformly express IgG-switched IGs.17,18
We recently demonstrated the existence of more than 100 different subsets with stereotyped heavy-chain complementarity-determining region 3 (HCDR3) sequences in a series of 1939 patients with CLL.18 A comprehensive analysis of SHMs in the same series revealed “CLL-biased” SHM patterns typified by recurrent hypermutations throughout the entire IGHV region in subsets of cases with stereotyped HCDR3s, in particular those expressing stereotyped IGHV3-21/IGLV3-21 IGs (subset 2) and IGHV4-34/IGKV2-30 IGs (subset 4).18
Although the study of IG genes in CLL has strongly implicated an antigen-driven model of CLL development, an as yet unanswered question concerns the B-cell developmental stage when leukemic transformation occurred.20,21 Furthermore, it is still unclear whether antigen involvement is restricted to the malignant transformation phase or whether the putative antigen(s) may continuously trigger the CLL clone and affect not only the progenitor cell but also the leukemic cells themselves.20,21
Evidence from several other types of B-cell lymphoproliferative disorders suggests that valuable insight into these issues may be gleaned from the study of intraclonal diversification (ID) within IG genes through ongoing mutational activity. Evidence of ID pointing to ongoing SHM has been observed in several types of B-cell malignancies,22-28 thereby indicating that the neoplastic cells interact continuously with their cognate antigen(s). On the contrary, multiple myeloma is conspicuous for a complete lack of ID within IG genes, supporting the notion that malignant transformation probably occurs at a post–germinal center (GC) stage of B-cell development.29,30
The available data on ID within IG genes in CLL are limited and conflicting. Several studies demonstrate some level of ID (ranging from limited to pronounced) leading to clonal evolution,31-36 with only one study reporting a complete lack of ID.37 We conducted the present study to address these discrepancies and systematically explore whether CLL cells continue to acquire hypermutations after leukemic transformation. To this end, we followed a very stringent methodology and conducted a comprehensive analysis of ID in IGHV genes of diverse mutational status from 71 patients with CLL, belonging to the common IgM/IgD variant and the rare IgG-positive variant. Although the majority of cases showed no or low levels of ID, an intense and functionally driven ID process was evident in selected cases, especially those belonging to subset 4, which express highly distinctive, stereotyped IGHV4-34/IGKV2-30 BCRs.17,18 In such cases, the observed ID patterns among sets of subcloned IG sequences attest to the very precise targeting of the SHM process and may be considered as evidence for a “stereotyped response” to an active, ongoing interaction with antigen(s).
Methods
Patient group
Seventy-one patients with CLL from collaborating institutions in Scandinavia and Greece were included in the study. The main criteria for case selection were IGHV gene usage and mutational status. Thus, our study is intentionally biased for cases using the IGHV3-21 and IGHV4-34 genes (25 of 71 and 28 of 71 cases, respectively). All cases were immunophenotyped as described previously10,15,17,18 and met the recently revised diagnostic criteria of the National Cancer Institute Working Group.38 Written informed consent was obtained according to the Declaration of Helsinki, and the study was approved by the local ethics review committee of each institution.
Polymerase chain reaction amplification of rearranged IGH genes
Polymerase chain reaction (PCR) amplification was performed on either genomic DNA (gDNA) or complementary DNA (cDNA), extracted mainly from blood and bone marrow (63 cases), but also from spleen and lymph nodes in 1 and 7 cases, respectively. Reverse transcriptase (RT)–PCR or gDNA-PCR amplification of IGHV-IGHD-IGHJ rearrangements was performed using consensus primers for the HFR1 and IGHJ genes or the appropriate sense IGHV leader primer and the antisense IGHC primer (supplemental Table 1, available on the Blood website; see the Supplemental Materials link at the top of the online article), as previously described.10,15,17,18 All amplification reactions were run using the high-fidelity Accuprime Pfx polymerase (Invitrogen). Purified PCR amplicons were subjected to direct sequencing, and data obtained were analyzed using the ImMunoGeneTics (IMGT) database and tools.39,40
Subcloning
PCR amplification products were gel-purified with the QIAGEN DNA purification columns (QIAGEN), ligated into the pCR2.1 vector (Invitrogen), and subsequently transformed into Escherichia coli/TOP10F′ competent bacteria (Invitrogen). A range of 14 to 35 colonies per case (median, 21 colonies) were chosen randomly and sequenced using the −20 universal primer or M13 primers.
Sequence data analysis
The sequences obtained from subcloning were analyzed with the IMGT V-QUEST40 and ClustalW/EMBL41 tools. To avoid misidentification of mutations when HFR1 consensus primers were used in the amplification reactions, nucleotide changes in the obtained sequences were evaluated from codon 24 in IMGT-HFR1 to the end of HCDR3. For the subgroup of cases amplified using leader primers and a constant region primer, the entire V region was evaluated (from IMGT HFR1 codon 1 down to the end of HFR4). The following information was extracted: (1) IGHV gene usage, percentage of identity to germline, and HCDR3 length; (2) SHM characteristics: each nucleotide mutation in every sequence was recorded, as was the change or preservation of the corresponding amino acid (AA), identified as replacement (R) or silent (S), respectively. AA changes were characterized as “conservative” or “nonconservative” following standardized biochemical criteria, as previously described.42 To account for the fact that a mutation is more likely to occur in an HFR than an HCDR simply because of its greater length, each mutation was “weighted,” or normalized, as recently reported by our group18; (3) hotspot targeting: mutated sequences were also analyzed for targeting to the tetranucleotide (4-NTP) motifs RGYW/WRCY (R = A/G, Y = C/T, and W = A/T) and DGYW/WRCH (D = A/G/T, H = T/C/A) as well as the dinucleotide (2-NTP) motifs WA and TW.43,44
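The hotspot motifs listed in item (3) are written in degenerate nucleotide codes, so a scan for them amounts to expanding each code into the bases it stands for. The following is a minimal sketch in Python, not the procedure used in the study: the function names are hypothetical, and treating a mutation as "within a hotspot" when its position falls anywhere inside a motif occurrence is an assumption made here for illustration (a stricter analysis might count only the targeted base of each motif).

```python
import re

# Degenerate nucleotide codes used by the motifs named in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]",
    "D": "[AGT]", "H": "[ACT]",
}

def motif_regex(motif):
    """Compile a degenerate motif; the lookahead lets overlapping hits be found."""
    return re.compile("(?=(" + "".join(IUPAC[ch] for ch in motif) + "))")

def motif_spans(seq, motif):
    """Return 0-based, half-open (start, end) spans of every motif occurrence."""
    return [(m.start(1), m.end(1)) for m in motif_regex(motif).finditer(seq.upper())]

def in_hotspot(seq, pos, motifs=("RGYW", "WRCY")):
    """True if 0-based position `pos` lies inside any occurrence of the motifs.

    Assumption: any position within a motif occurrence counts as targeted.
    """
    return any(start <= pos < end
               for motif in motifs
               for start, end in motif_spans(seq, motif))

# Toy sequence (not taken from the study): AGCT matches both RGYW and WRCY.
toy = "CCAGCTCC"
rgyw_hits = motif_spans(toy, "RGYW")
wa_hits = motif_spans("TATA", "WA")
```

The same helper covers DGYW/WRCH and the dinucleotide motifs WA and TW by passing a different `motifs` tuple.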
Definitions
Intraclonal heterogeneity in sets of subcloned sequences obtained from the same sample was assessed by examination of the sequence variation in the V domain. All “nonubiquitous” sequence changes from the germline were evaluated and further characterized as follows: (1) unconfirmed mutation (UCM), a mutation observed in only one subcloned sequence from the same specimen (“unique”); and (2) confirmed mutation (CM), a mutation observed more than once among subcloned sequences from the same specimen (“partially shared”). AA changes resulting from UCMs or CMs are designated by the abbreviations UAA or CAA, respectively.
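The CM/UCM definitions above reduce to counting how many subcloned sequences carry each change. The sketch below illustrates that bookkeeping under stated assumptions: mutations are represented as (position, new base) tuples relative to germline, changes present in every subcloned sequence are treated as "ubiquitous" and excluded, and the function name and toy data are hypothetical rather than taken from the study.

```python
from collections import Counter

def classify_mutations(subclones):
    """Split nonubiquitous mutations into confirmed (CM) and unconfirmed (UCM).

    `subclones` is a list of sets, one per subcloned sequence; each set holds
    that sequence's mutations as (position, new_base) tuples.
    """
    n = len(subclones)
    counts = Counter(mutation for clone in subclones for mutation in clone)
    # CM: seen more than once but not in every subclone ("partially shared").
    cm = {m for m, c in counts.items() if 1 < c < n}
    # UCM: seen in exactly one subclone ("unique"), provided it is not ubiquitous.
    ucm = {m for m, c in counts.items() if c == 1 and c < n}
    return cm, ucm

# Toy data: four subcloned sequences from one hypothetical specimen.
toy_subclones = [
    {(10, "A"), (55, "T")},
    {(10, "A"), (55, "T"), (80, "G")},
    {(10, "A"), (120, "C")},
    {(10, "A"), (80, "G")},
]
cm, ucm = classify_mutations(toy_subclones)
# (10, "A") is in all four sequences, so it is ubiquitous and ignored.
```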
To compare mutation counts between the different rearrangements included in the analysis, mutations were normalized to both the nucleotide length and the number of subcloned sequences for each rearrangement. Thus, the normalized mutation frequencies (NMFs) were calculated according to the following formula: NMF = Σ(CM + UCM)/(number of subcloned sequences × sequence length). For instance, if 9 mutations (confirmed and/or unconfirmed) were identified in a set of 20 subcloned sequences of a certain rearrangement over a region of 300 nucleotides, the NMF for this rearrangement would be 9/(20 × 300) = 1.5 × 10⁻³.
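The NMF calculation can be reproduced in a few lines. This is a minimal sketch with a hypothetical function name; the split of the 9 mutations in the worked example into 5 CMs and 4 UCMs is an assumption, since the text gives only the total.

```python
def normalized_mutation_frequency(n_cm, n_ucm, n_subclones, seq_length):
    """NMF = (CM + UCM) / (number of subcloned sequences x sequence length)."""
    if n_subclones <= 0 or seq_length <= 0:
        raise ValueError("n_subclones and seq_length must be positive")
    return (n_cm + n_ucm) / (n_subclones * seq_length)

# Worked example from the text: 9 mutations, 20 subcloned sequences, 300 nucleotides.
# The 5 + 4 split between CMs and UCMs is illustrative only.
example_nmf = normalized_mutation_frequency(5, 4, 20, 300)
```

Because both the number of subcloned sequences and the evaluated length appear in the denominator, cases sequenced to different depths or over different regions can be compared on the same scale.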
Evolutionary history of sets of subcloned sequences
The evolutionary history of the sets of subcloned sequences was inferred using the maximum parsimony method.45 The maximum parsimony trees were obtained using the close-neighbor-interchange algorithm with search level 2, in which the initial trees were obtained with the random addition of sequences (10 replicates).45 Branch lengths were calculated using the average pathway method45 and are expressed as the number of changes over the whole sequence; they are shown next to the branches, and all trees are drawn to scale. All positions containing gaps and missing data were eliminated from the dataset (Complete Deletion option). Phylogenetic analyses were conducted in MEGA4.45
Statistical analysis
Descriptive statistics for discrete parameters included counts and frequency distributions. For quantitative variables, statistical measures included means, medians, SD, and ranges. Significance of bivariate relationships between factors was assessed with χ² and Fisher exact tests. For all comparisons, a significance level of P = .05 was set, and all statistical analyses were performed with the Statistical Package SPSS, Version 12.0 (SPSS Inc).
Results
IGHV repertoire and mutational status
Productive IGHV-IGHD-IGHJ rearrangements from 71 CLL cases were included in this analysis. Detailed information on IGHV, IGHD, and IGHJ gene repertoires is provided in supplemental Table 1. Overall, 7 different IGHV genes were used: IGHV1-69, 6 of 71 cases; IGHV3-21, 25 of 71 cases; IGHV3-23, 5 of 71 cases; IGHV3-30, 1 of 71 cases; IGHV3-7, 2 of 71 cases; IGHV4-34, 28 of 71 cases; and IGHV4-39, 4 of 71 cases. Sequences were categorized according to IGHV gene mutational status following definitions recently reported by our group18: (1) “truly unmutated” (100% germline identity): 16 of 71 sequences (22.5%); (2) “minimally mutated” (99.0%-99.9% identity): 6 of 71 sequences (8.5%); (3) “borderline mutated” (98.0%-98.9% identity): 5 of 71 sequences (7%); and (4) “mutated” (< 98% identity): 44 of 71 sequences (62%).
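The four mutational-status categories are simple thresholds on percent germline identity, which can be expressed as follows. This is an illustrative sketch with a hypothetical function name; assigning values that fall between the stated ranges (for example 99.95%) to the lower category is an assumption, as the text lists ranges only to one decimal place.

```python
def mutational_status(identity_pct):
    """Map percent germline identity to the four categories used in the text."""
    if not 0 <= identity_pct <= 100:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct == 100:
        return "truly unmutated"
    if identity_pct >= 99.0:
        # Assumption: anything from 99.0 up to (but excluding) 100 lands here.
        return "minimally mutated"
    if identity_pct >= 98.0:
        return "borderline mutated"
    return "mutated"

# Category counts reported in the text; they should add up to the 71 cases.
reported_counts = {
    "truly unmutated": 16,
    "minimally mutated": 6,
    "borderline mutated": 5,
    "mutated": 44,
}
total_cases = sum(reported_counts.values())
```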
Forty-six of 71 cases expressed stereotyped BCRs and were assigned to 11 different subsets, as previously described (supplemental Table 1). More specifically, 16 cases expressed IGHV3-21/IGLV3-21 BCRs (subset 2),17,18 13 of 71 cases expressed IGHV4-34/IGKV2-30 BCRs (subset 4),17,18 4 of 71 cases expressed IGHV4-39/IGKV1(D)-39 BCRs (subset 8),17,18 whereas 5 of 71 cases expressed IGHV4-34/IGKV3-20 (subset 16).17,18 Notably, subset 4 and subset 16 cases were uniformly mutated, whereas subset 2 cases exhibited significant heterogeneity with regard to mutational load (supplemental Table 1); finally, all subset 8 cases had 100% germline identity. The heavy chain isotype was the same among members of a subset (for cases with available data). All subsets expressed IgMD, except for subsets 4 (IGHV4-34 gene), 8 (IGHV4-39 gene), and 16 (IGHV4-34 gene), which included IgG-expressing cases, as previously reported.13,17,18
Intraclonal diversification analysis at the nucleotide level
Following the definitions detailed in “Definitions,” cases were allocated to one of 3 categories: (1) no intraclonal diversity: 20 cases with identical sets of subcloned sequences; (2) unconfirmed intraclonal diversity (UID): 23 cases exhibiting only UCMs in certain positions of the V domain; and (3) confirmed intraclonal diversity (CID): 28 cases exhibiting at least one CM among subcloned sequences (supplemental Table 2). Based on direct sequencing analysis, 23 of 28 cases (82%) carrying CMs belonged to the “mutated” category (< 98% germline identity).
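At case level, the three categories follow directly from whether any CMs or UCMs were found. The sketch below is a hypothetical helper, not code from the study, and also recomputes the cohort proportions from the counts given above.

```python
def id_category(n_cm, n_ucm):
    """Assign a case to CID, UID, or no ID from its CM and UCM counts."""
    if n_cm > 0:
        return "CID"    # at least one confirmed mutation
    if n_ucm > 0:
        return "UID"    # only unconfirmed mutations
    return "no ID"      # all subcloned sequences identical

# Case counts per category as reported in the text.
category_counts = {"no ID": 20, "UID": 23, "CID": 28}
n_cases = sum(category_counts.values())
category_pct = {name: round(100 * count / n_cases, 1)
                for name, count in category_counts.items()}
```

Note that a case with many UCMs but no CM is still assigned to UID under these definitions.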
Interesting biases with regard to the IGHV gene repertoire were observed among the 3 categories defined by ID status. In particular, 20 of 28 (71.4%) cases exhibiting CID concerned IGHV4-34–expressing cases, whereas the corresponding percentages for IGHV4-34 usage by cases belonging to the UID category and non-ID category were 26% (6 of 23) and 10% (2 of 20), respectively (χ² test: P < .001). Notably, 12 of 16 truly unmutated rearrangements included in the present study were found to exhibit some degree of ID (CID, 4 of 16 cases [25%]; UID, 8 of 16 cases [50%]).
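The association between ID category and IGHV4-34 usage can be checked with a Pearson χ² test on the counts quoted above. The sketch below is an independent recomputation in plain Python, not the SPSS analysis itself; arranging the counts as a 3 × 2 table (ID category by IGHV4-34 versus other genes) is an assumption about how the test was set up. For 2 degrees of freedom the χ² survival function has the closed form exp(−x/2), so no statistics library is needed.

```python
import math

def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Rows: CID, UID, no ID. Columns: IGHV4-34 cases, cases using other IGHV genes.
observed = [[20, 8], [6, 17], [2, 18]]
stat, dof = chi_square_independence(observed)

# Valid only because dof == 2: the chi-square survival function is exp(-x / 2).
p_value = math.exp(-stat / 2)
```

Under this layout the P value falls well below .001, in line with the reported result.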
Among 28 cases exhibiting CMs (supplemental Table 2), 13 (46.4%) belonged to subset 4 with stereotyped IGHV4-34/IGKV2-30 BCRs (Table 1). Furthermore, a much more pronounced impact of ID was noted among subset 4 rearrangements versus all other rearrangements included in the analysis, as evidenced by significantly higher NMFs (t test: P < .002), regardless of IGHV gene usage and mutation status, BCR stereotypy, or heavy chain isotype (Figure 1; supplemental Table 2). Interestingly, the high number of CMs (and UCMs) observed in several subset 4 cases (P0103, P1422, P2451, P2920, P3551, and Swe56) could be attributed to the existence of distinct “clusters” of subcloned sequences with “cluster-specific” mutational profiles. All such clusters exhibited certain shared mutations and closely similar, if not identical, HCDR3s (including identical HCDR3 length), pointing to their emergence from a common ancestor (Figure 2; supplemental Figure 1).
Case no. | Identity, % | CM | UCM | NMF | C-AA | U-AA |
---|---|---|---|---|---|---|
Swe56 | 95.1 | 27 | 11 | 3.05 × 10⁻³ | 9 | 7 |
Swe181 | 94.3 | 4 | 1 | 0.94 × 10⁻³ | 1 | 0 |
Swe193 | 93.1 | 6 | 4 | 1.65 × 10⁻³ | 3 | 4 |
P0103 | 95.9 | 32 | 3 | 2.89 × 10⁻³ | 11 | 2 |
P0907 | 93.2 | 6 | 15 | 1.68 × 10⁻³ | 3 | 10 |
P1422 | 91.9 | 52 | 12 | 6.27 × 10⁻³ | 22 | 7 |
P1939 | 94.5 | 3 | 9 | 0.91 × 10⁻³ | 1 | 5 |
P2451 | 91.7 | 39 | 27 | 4.99 × 10⁻³ | 14 | 15 |
P2920 | 93.5 | 24 | 19 | 4.38 × 10⁻³ | 16 | 14 |
P3020 | 90 | 1 | 4 | 0.83 × 10⁻³ | 1 | 4 |
P3551 | 93.3 | 52 | 3 | 7.28 × 10⁻³ | 25 | 3 |
P3916 | 91.2 | 3 | 3 | 0.93 × 10⁻³ | 1 | 1 |
P6520 | 94.4 | 1 | 8 | 1.08 × 10⁻³ | 1 | 6 |
All cases shown had confirmed intraclonal diversification (CID).
CM indicates confirmed mutation; UCM, unconfirmed mutation; NMF, normalized mutation frequency; C-AA, confirmed amino acid change; and U-AA, unconfirmed amino acid change.
With the exception of 2 confirmed 3-bp deletions, all observed CMs and UCMs concerned single-base changes and resulted in both silent (S) and replacement (R) mutations. Overall, 212 silent CMs/UCMs were identified, compared with 288 replacement CMs/UCMs. Nucleotide substitution analysis revealed that transitions predominated over transversions both for CMs and UCMs (175 vs 123 and 158 vs 44, respectively), in keeping with a canonical SHM process.46 The analysis for targeting of SHM hotspots over HFR1-3 revealed that only 141 of 345 (41%) mutations occurred within a hotspot motif.
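The transition/transversion split used above depends only on whether the germline and mutated bases belong to the same chemical class (purine or pyrimidine). The following sketch uses a hypothetical function name and also checks that the counts quoted in the paragraph are internally consistent; the ratios it computes are derived here and are not reported in the text.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def substitution_type(ref, alt):
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= BASES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

# Counts quoted in the text: (transitions, transversions).
reported = {"CM": (175, 123), "UCM": (158, 44)}
total_substitutions = sum(ts + tv for ts, tv in reported.values())
ts_tv_ratio = {kind: ts / tv for kind, (ts, tv) in reported.items()}
```

The four counts sum to 500, matching the 212 silent plus 288 replacement CMs/UCMs given in the same paragraph.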
Comparison of subset 4 IGHV4-34 sequences to all other sequences revealed significant differences with regard to the distribution of CMs/UCMs. The most striking difference concerned the fact that IGHV4-34 rearrangements of subset 4 were highly targeted for UCMs/CMs within HCDR3. Furthermore, a general shift of mutations to the 3′ part of the V-D-J region, in particular HCDR3-HFR4, was noted among subset 4 rearrangements (supplemental Figure 2).
Intraclonal diversification of IGHV-D-J rearrangements at the AA level
A total of 273 novel AA changes and 2 single-AA deletions were identified in 47 of 71 evaluated cases (66.2%). Following definitions presented in “Definitions,” 134 of 273 AA changes (49%) were considered as unconfirmed (UAA), whereas the remainder (139 of 273; 51%) were detected in at least 2 subcloned sequences from the same patient and, therefore, were considered as confirmed (CAA; supplemental Table 2). In addition, 2 cases using the IGHV3-21 gene were observed to carry a confirmed serine deletion at IMGT/HCDR2 codon 59 (supplemental Table 3).
CAA changes were detected across the entire IGHV gene among 27 cases using the IGHV3-21, IGHV3-23, IGHV4-34, or IGHV4-39 genes (supplemental Tables 3-6); 13 of these cases belonged to subset 4 with stereotyped IGHV4-34/IGKV2-30 BCRs. Intriguingly, analysis of the distribution of the so-called UAA changes provided further evidence for the very precise targeting of mutations introduced as part of the ID process. Thus: (1) 26 UAA changes were identified at the same codon among cases using the same IGHV gene (“confirmed by another case”); and (2) some AA changes identified in single subcloned sequences from one case were shared by all subcloned sequences of another case using the same IGHV gene, especially when expressing homologous, “stereotyped” HCDR3s (Figure 3).
AA changes in subcloned IGHV-D-J sequences of subset 4 cases
The frequency of AA changes among subcloned sequences was significantly higher among subset 4 cases than all other cases analyzed (supplemental Tables 2-6; supplemental Figure 3). Of note, however, several changes identified in sets of subcloned sequences from subset 4 cases were rather conservative in terms of AA physicochemical properties (Figure 4). Novel changes were often “stereotyped,” in the sense that different cases could share recurrent AA changes among subcloned sequences: therefore, the “options” for certain positions were remarkably restricted (Figure 4). A comprehensive list of AA changes in sets of subcloned sequences from IGHV4-34 rearrangements of subset 4 cases is given in supplemental Table 4.
A remarkable impact of ID was evident in the HCDR3 sequences of subset 4 cases, with 11 of the 20 HCDR3 codons showing CAA changes. Importantly, for several cases, the AA diversity observed intraclonally was found to reflect the overall heterogeneity observed at subset level. In particular, AAs appearing in some subcloned sequences of one case represented the predominant residues at the same position in another case (Figure 4; supplemental Table 4). For instance, the only permitted options among subcloned sequences at position 105 (following the IMGT numbering for the V domain) were A or V, with “intra-subset” variation also reflected at intraclonal level. An unprecedentedly high level of ID was observed in the HFR4 of subset 4 rearrangements, with 3 HFR4 positions (codons 120, 123, and 125) identified as ID “hotspots” (supplemental Table 4).
Taking into account the high number of CMs and UCMs among subset 4 IGHV4-34 rearrangements, it is remarkable that certain critical positions of the VH domain were essentially left unaltered by the ID process. For instance, the W residue at codon 7 in the HFR1 of IGHV4-34, which plays a critical role in the recognition of the I/i NAL epitope,47 was preserved in 322 of 326 subcloned sequences from 13 evaluated subset 4 cases. A similar case of sequence preservation was also observed in HCDR1. As previously shown by our group,18 2 glycine (G) residues at codons 28 and 36 of the HCDR1 of subset 4 cases were very frequently targeted for “subset-biased” substitution by acidic residues (aspartate, D, or glutamate, E). This hallmark of SHM activity for subset 4 cases was not affected by ongoing mutational activity, as all observed mutations at codon 28 involved only G-D or G-E substitutions. Indeed, the only CAA change at this position concerned the introduction of D or E residues and, in one case, substitution of D for E (supplemental Table 4). Finally, the distinctive and “subset-biased” couplet of RR or KR dipeptides at the IGHD-IGHJ junction of subset 4 IGHV4-34 rearrangements17,18 was not targeted by SHM, and the only sign of mutational activity was an unconfirmed R-K replacement (Figure 4).
Discussion
Molecular analysis of the BCR in CLL has revealed biases in IG gene repertoire suggesting a potential role for antigenic stimulation in CLL ontogeny.8,9,21 This notion has been further supported by the discovery of subsets of CLL cases carrying stereotyped BCRs.10-18 As we have recently shown, a recurrent theme in subsets carrying mutated stereotyped BCRs is the presence of stereotyped, “CLL-” and “subset-biased” mutation patterns, alluding to a very precise, functionally driven response to a restricted set of antigenic stimuli.18,48 Therefore, it can be concluded that SHM studies help to better understand and delineate critical issues in the history and biology of clonal malignant CLL cells. That notwithstanding, several important issues remain unresolved, especially with regard to the location, duration, and nature of the interactions between the selecting antigens and the CLL precursors as well as the differentiation status of the precursor cells themselves.
As previously shown for other B-cell malignancies, the study of ID of the clonotypic IG genes may provide important information that helps to address the aforementioned issues. More specifically, in some entities, such as follicular lymphoma, sporadic Burkitt lymphoma, splenic marginal zone lymphoma, and mucosa-associated lymphoid tissue lymphoma, the analysis of SHM reveals ID, indicating that the neoplastic cells further diversify their IG genes through ongoing mutational activity,22-28 either through conventional SHM in GCs or unconventional SHM outside GCs. On the contrary, molecular analyses of IG genes in multiple myeloma showed intraclonal homogeneity suggesting that the myeloma clone derived from a post-GC memory B cell.29,30
The studies devoted to clarifying the possible occurrence and hence implications of ID in CLL have reached conflicting results.31-37 These discrepancies may be attributed, at least in part, to differences in the sensitivity of various methods used for detection of mutations,31-37 usage of low-fidelity Taq polymerase,31-37 number of analyzed subcloned sequences,34 sample size, or case selection. For instance, most studies analyzed relatively few patients31-33,35,37 or even a single case.36 Furthermore, an underrepresentation of certain CLL subsets with distinctive SHM patterns, namely, cases using the IGHV3-21 and IGHV4-34 genes (especially in subsets 2 or 4), was observed throughout all previous studies. Finally, the lack of a standard criterion for ID evaluation may also lead to ambiguity among previously reported data.
Taking into consideration the aforementioned discrepancies, we conducted the present large-scale subcloning study following a strict methodologic approach to readdress ID in CLL. Our study included 71 carefully selected CLL cases, thereby enabling us to explore the presence of ID not only at cohort level, but also in subgroups defined by BCR stereotypy, IGHV gene mutational status, and heavy chain isotype. In addition, a high-fidelity polymerase was used for all analyses, thus minimizing the incidence of errors. Finally, a large number of subcloned sequences per case (median, 21; range, 14-35) were evaluated.
At cohort level, 28% (20 of 71) of cases carried sets of identical subcloned sequences. An additional 32% (23 of 71) of cases were characterized by the presence of mutations in single subcloned sequences and, following the strict definitions adopted in this study, could not be formally assigned to the intraclonally diversified category. Importantly, however, almost 40% (28 of 71) of cases carried intraclonally diversified IGHV-IGHD-IGHJ genes with CMs among subcloned sequences. CMs were identified in cases assigned to all mutational categories, even in the “truly unmutated” category (ie, 100% identity to the germline). Altogether, our results strongly support the notion that the SHM mechanism remains operative in CLL and suggest that antigen stimulation may be a promoting factor not only in the development of CLL clones but also in their evolution. However, the fact that ID was generally insubstantial in leukemic clones with unmutated IGHV genes that usually follow more aggressive clinical courses may be considered as evidence that their expansion is antigen-independent or, more probably, that their unmutated status confers a clonal advantage by permitting very efficient stimulation by a variety of self-antigens that would be compromised or even abrogated on alteration of the original BCR structure.
The molecular characteristics of mutations occurring in the context of ID are compatible with the idea that the underlying causal mechanisms may generally obey the same rules as the canonical SHM.46,49 More specifically, transitions were more common than transversions, in keeping with previous reports.46,49 However, the observed distribution patterns of CMs/UCMs differed from those reported as typical for mutations introduced by canonical SHM, leading to R/S ratios not characteristic of selection by conventional, T-dependent antigens.50
The breakdown of the group of cases exhibiting ID revealed a striking bias for usage of the IGHV4-34 gene, especially in stereotyped IGHV4-34/IGKV2-30 BCRs of subset 4.18 It is also worth underscoring the fact that the majority of mutations introduced within the context of the ID process were identified in subcloned sequences of IGHV4-34 rearrangements from subset 4 cases. Therefore, one could reasonably argue that in discussing the molecular characteristics of ID in CLL we are essentially describing the operation of ongoing mutational activity in subset 4.
In keeping with our previous reports,17,18 subset 4 cases included in the present study had a young median age at diagnosis (51 years; range, 37-69 years) and were diagnosed in early clinical stages; they were uniformly CD38-negative and IgG-switched; finally, 5 of 11 cases with available data carried del(13q) as the sole cytogenetic abnormality (supplemental Table 7). As recently shown by our group,18 at a molecular level the distinctive features of this subset extend to SHM patterns in the clonotypic IG genes, including: (1) “noncanonical” distribution of R mutations, with low R/S mutation ratios in HCDRs (especially HCDR2); (2) complete lack of mutations, leading to preservation of the germline conformation, at the IGHV4-34-specific motif, which mediates superantigenic-like interactions with self-elements and exogenous elements47; and (3) recurrent, “stereotyped” hypermutations.18 On these grounds, we proposed that the SHM characteristics in subset 4 are indicative of a particular mode of interaction with and selection by distinct (super)antigenic elements.18
The results reported here reveal that the “subset-biased” impact of SHM extends to very distinctive patterns of ID. In particular, certain residues of the stereotyped IGHV4-34 rearrangements of subset 4 cases were identified as “hotspots” for ID; these “hotspots” were scattered throughout the V domain, including, somewhat unexpectedly, HFR4. Interestingly, the observed AA changes at these “hotspots” were generally restricted (“stereotyped”) and, with a few notable exceptions, conservative. This suggests that the permissible “options” at these positions are also restricted, perhaps by strong functional constraints for preservation of critical physicochemical properties.
From a different perspective, further evidence that SHM in subset 4 is functionally driven and also very precisely targeted is offered by our finding that certain positions across the entire VH domain remained essentially unaltered, despite intense ID in nearby positions. Perhaps the most illustrative example is provided by the couplet of basic AAs (RR, KR) located at the IGHD-IGHJ junctions of subset 4 IGHV4-34 rearrangements,18 which endow the corresponding HCDR3s with a positive charge. Prompted by the analogy to HCDR3 sequence motifs of IGHV4-34 antibodies against both apoptotic cells51,52 and DNA,53-55 we have previously postulated that the progenitors of CLL cases assigned to subset 4 may have originated as cells with reactivity against DNA or apoptotic bodies.17,18 Along these lines, it is worth mentioning that the single mutation observed at this dipeptide among 326 subcloned sequences from subset 4 rearrangements concerned a substitution of K for R. This cannot be attributed to an overall lack of ID within HCDR3, given that other HCDR3 positions were frequently targeted for CMs or UCMs.
Interestingly, this single R-to-K mutation, though “unconfirmed” following the definitions adopted in our study, was “nonrandom”: a simple inspection of the aligned subcloned sequences justifies considering this mutation as “confirmed by another case.” Based on the findings reported in “Intraclonal diversification of IGHV-D-J rearrangements at the AA level” and also depicted graphically in Figure 3, the same argument may hold true for several other mutations detected in single subcloned sequences. Therefore, although the practice of subdividing mutations into CM or UCM categories could be considered as a “safeguard” for reliability, finding mutations that are “unconfirmed at case level” yet “confirmed at subset level” raises the intriguing possibility that at least some UCMs should perhaps be counted as signs of true ID.
The strikingly different impact of ID in subset 4 cases versus all other cases analyzed in this study could be thought to reflect an inherent mutability of the IGHV4-34 gene or, alternatively, the mutated status or IgG-switched phenotype of these cases. However, on the evidence presented here, we argue that these findings cannot be attributed to IGHV4-34 usage, IGHV gene-mutated status, class-switch recombination, or BCR stereotypy in general. Indeed, this is evident from comparison of stereotyped subset 4 IGHV4-34 rearrangements with: (1) nonsubset 4 IGHV4-34 rearrangements of the common IgMD variant; (2) stereotyped IGHV4-34 rearrangements of IgG-switched cases in subset 16 (reference 18); and (3) both mutated and unmutated rearrangements using other IGHV genes. Therefore, the ID patterns observed in subset 4 represent a unique phenomenon strongly correlated with the distinctive IGHV4-34/IGKV2-30 BCR archetype expressed by subset 4 cases and allude to “stereotyped” interactions with the cognate antigen(s) not only during the preleukemic phase but also posttransformation.
The intense ID activity in subset 4 is also evidenced by the identification of distinct “clusters” of subcloned sequences with “cluster-specific” mutational profiles. Analysis of the SHM patterns in such clusters reveals their common ancestry yet indicates an early “branching” of the leukemic clone into distinct subclones, perhaps able to evolve along similar, although separate, pathways. Taking into account the polyreactivity of the clonotypic BCRs in CLL56-59 as well as our recent report that persistent activation by Epstein-Barr virus and cytomegalovirus may be specifically implicated in the history of subset 4 leukemic clones,60 one might consider this “branching” as evidence for special, selective pressures occurring in parallel in distinct subclones and thereby fine-tuning their BCR affinities.
In conclusion, our study demonstrates that the SHM mechanism may operate continuously in certain subsets of patients with CLL, especially patients expressing stereotyped IGHV4-34 rearrangements typical of subset 4. However, it is still difficult to reach definitive conclusions about the duration of exposure to and stimulation by antigen or about the functional impact of antigen on CLL evolution. That notwithstanding, the results reported here suggest a role for persistent antigenic stimulation rather than clonal expansion promoted by nonspecific stimuli, at least for subsets of CLL cases.
The online version of this article contains a data supplement.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
The authors thank Prof Marie-Paule Lefranc and Dr Veronique Giudicelli, Laboratoire d'Immunogenetique Moleculaire, Universite Montpellier II, Montpellier, France, for their enormous support and help with the large-scale immunoglobulin sequence analysis throughout this project; Jesper Jurlander, Karin Karlsson, Mats Merup, Lyda Osorio, Göran Roos, Christer Sundström, and Juhani Vilpo for providing samples and associated data concerning Swedish, Danish, and Finnish CLL patients; and Andreas Agathagelidis, Vasilis Bikos, Maria Gounari, Ulf Thunberg, and Gerard Tobin for the sequence analysis.
This work was supported by the Swedish Cancer Society, the Swedish Medical Research Council, the Medical Faculty of Uppsala University, Uppsala University Hospital, and the Lion's Cancer Research Foundation, Uppsala, Sweden; and the BioSapiens Network of Excellence (contract no. LSHG-CT-2003-503265) and the General Secretariat for Research and Technology of Greece (Program INA-GENOME). E.K. is a recipient of a fellowship from the Propondis Foundation, Athens, Greece.
Authorship
Contribution: L.-A.S. and E.K. performed research, analyzed data, and wrote the paper; A.H. and N.D. performed research; A.A. provided samples and associated data; A.T. supervised research; and R.R. and K.S. designed and supervised the research and wrote the paper.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Kostas Stamatopoulos, Hematology Department and HCT Unit, G. Papanicolaou Hospital, 57010 Thessaloniki, Greece; e-mail: stavstam@otenet.gr.
References
Author notes
L.-A.S. and E.K. contributed equally to this study as first authors.