Abstract
We analyzed the immunoglobulin (Ig) variable heavy (IGHV) and variable light chain genes used by the leukemia cells of 258 unrelated patients with chronic lymphocytic leukemia (CLL). These patients were identified among 1846 CLL patients examined because they expressed unmutated Ig heavy chains (IgH) encoded by a 51p1 allele of IGHV1-69. In each case, the expressed light chain gene had at least 98% homology to an identified germline IGKV or IGLV gene. Within the 258 IgH, we identified heavy chain CDR3 (HCDR3) motifs encoded by certain unmutated IGHD and IGHJ genes with restricted reading frames. Frequent and restricted use of particular IGKV and IGLV genes revealed nonstochastic pairing of disparate Ig light chains (IgL) with IgH that had restricted HCDR3 motifs, designated CLL69A, -B, -C, and -D. Eighty-six percent (19/22) of CLL cases that expressed motif CLL69B, encoded by IGHD2-2/IGHJ6, had a distinctive IgL encoded by IGKV1-39. Similarly, 83% (5/6) of samples with motif CLL69D, also encoded by IGHD2-2/IGHJ6, expressed IGKV3-11; 100% (25/25) with motif CLL69A, encoded by IGHD3-16/IGHJ3, used IGKV3-20; and 77% (10/13) with motif CLL69C, encoded by IGHD3-3/IGHJ6, expressed IGLV3-9. This study reveals nonstochastic pairing of IgH with particular IgL that is predicated upon HCDR3 structure, providing compelling evidence that the antibodies expressed in CLL are selected by conventional antigens.
Introduction
Several studies have demonstrated that leukemic B cells of patients with chronic lymphocytic leukemia (CLL) express a restricted immunoglobulin (Ig) heavy chain repertoire and that expression of Ig variable heavy chain (IGHV) genes in CLL is not random.1-5 One particular IGHV gene, the 51p1 allele of IGHV1-69, is expressed, generally with little or no somatic mutation, in 10 to 20% of all cases of this disease.6 Furthermore, CLL B cells that express 51p1 preferentially use certain diversity (IGHD) and joining (IGHJ) gene segments with restricted reading frames (RF). These encode relatively long third complementarity determining regions (CDR3) with conserved amino acid motifs that are not characteristic of the CDR3s of the Ig expressed by nonneoplastic tonsillar or blood B cells that use 51p1.3,7,8 The CDR3 is typically the most variable region of the Ig heavy chain and is generally directly involved in binding to conventional antigens. The expression of such conserved Ig with restricted CDR3 therefore argues that the Ig repertoire expressed in this disease is highly selected, suggesting a potential role for antigen in the development and/or progression of CLL.
More recently, we identified 15 unrelated CLL cases that expressed nearly identical Ig, with Ig heavy and light chain variable regions encoded by 51p1 and IGKV3-20 (A27), respectively.9 The heavy and light chains of these 15 samples were virtually identical to those of SMI, a previously characterized CLL that expressed a polyreactive IgM/κ autoantibody with low-affinity binding activity for a variety of self-antigens.10 The CDR3 regions of the heavy and light chains together form a central part of the Ig antigen-binding site. The prevalent use in CLL of the 51p1 allele of IGHV1-69 with little or no somatic mutation allowed us to examine the effect of different Ig heavy chain CDR3 (HCDR3) motifs on Ig light chain pairing in CLL cases that otherwise expressed nearly identical Ig heavy chain variable regions. To this end, we examined the IGKV and IGLV gene usage and CDR3 structure of 258 CLL cases that expressed unmutated Ig heavy chains encoded by 51p1. These cases were identified in a large cohort of 1846 nonselected CLL cases followed by the CLL Research Consortium (CRC).
Methods
Patient material
Blood was collected from consenting patients who satisfied the diagnostic and immunophenotypic criteria for B-cell CLL11 and who presented for evaluation at the referral centers of the CRC. Institutional review board approval from each participating institution and informed consent were obtained in all cases, in accordance with the Declaration of Helsinki. Peripheral blood mononuclear cells were isolated by density gradient centrifugation using Ficoll-Hypaque 1077 (Sigma-Aldrich, St Louis, MO), washed twice, and analyzed directly or suspended in fetal calf serum containing 10% dimethylsulfoxide (DMSO; Sigma-Aldrich) for storage in liquid nitrogen. All samples contained more than 90% CLL B cells as assessed by flow cytometric analyses, and the isotype of the expressed light chains was determined by flow cytometry, as described.9 The heavy and light chain sequences of samples CLL69A1 through CLL69A15 have been described previously and were designated CLL-A through CLL-O, respectively.9
IGHV, IGKV, and IGLV gene analyses
Total cellular RNA was isolated from 5 × 10⁶ CLL B cells using RNeasy reagents (Qiagen, Valencia, CA), per the manufacturer's instructions. First-strand cDNA was synthesized from one third of the total purified RNA using an oligo-dT primer and Superscript II RT (Invitrogen, Carlsbad, CA). The remaining RNA was removed with RNaseH, and the cDNA was purified using QIAquick purification columns (Qiagen). Purified cDNA was poly-dG-tailed using dGTP and terminal deoxynucleotidyl transferase (Roche, Indianapolis, IN). The IGHV gene expressed by the CLL B cells was determined by a reverse transcription-polymerase chain reaction (RT-PCR) enzyme-linked immunosorbent assay (ELISA) technique.12,13 The cDNA from each sample was amplified using IGHV, IGKV, or IGLV family-specific sense primers and antisense IGHM, IGKC, or IGLC consensus primers, respectively (Table S1, available on the Blood website; see the Supplemental Materials link at the top of the online article). The PCR products were size selected by electrophoresis in 2% agarose containing 0.5 μg/mL ethidium bromide (Invitrogen), and the expected products were excised and purified using QIAquick purification columns (Qiagen). Most PCR products were sequenced directly, although in several cases the amplified products were cloned into pGEM-T (Promega, Madison, WI) and analyzed, as described.14 Nucleotide sequencing was performed with the fluorescence dideoxy chain-termination method on an Applied Biosystems 377 automated DNA sequencer (ABI, Foster City, CA). Nucleotide sequences were analyzed using DNASTAR (DNASTAR, Madison, WI) and compared with the sequences deposited in the V BASE,15 ImMunoGeneTics (IMGT),16-18 and GenBank19 sequence databases. Somatic mutations were identified by comparison with the most homologous germline IGHV, IGKV, or IGLV gene. 
The percentage homology was calculated from the number of nucleotide differences between the 5′ end of framework 1 (FW1) and the 3′ end of FW3. IGHV, IGKV, and IGLV genes with at least 98% homology with the corresponding germline IGHV, IGKV, or IGLV sequence were considered unmutated. The 51p1-related alleles of IGHV1-69 are distinguished from the 1263-like alleles based on nonconservative differences in the CDR2 region20 and correspond to IGHV1-69*01, *06, and *12 in the IMGT database. The method of Corbett et al21 was used to assign IGHD genes of the longer gene families (IGHD2 and IGHD3), and a match of 7 consecutive nucleotides was required to assign genes of the shorter IGHD families. HCDR3 length was determined by the method of Kabat et al22 and defined as the number of amino acids between codon 94 at the end of FW3 and the conserved Trp at position 102 at the beginning of FW4. LCDR3 length was defined as the number of amino acids between codon 89 at the end of FW3 and the conserved Phe at position 97 at the beginning of FW4. Cluster analysis of all sequences was performed using MegAlign (DNASTAR). The criteria used to define subgroups with restricted HCDR3 motifs were as follows: (1) use of the same IGHD and IGHJ germline genes, (2) use of the same IGHD RF, and (3) at least 70% amino acid identity within the HCDR3. HCDR3 amino acid sequences are presented using single-letter abbreviations, and italicized amino acids (X) indicate that the amino acid was present in more than 67% of samples.
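The three subgroup criteria can be expressed as a pairwise test. The sketch below is illustrative only. The `HeavyChain` record and its field names are hypothetical. It also assumes that HCDR3 sequences have already been aligned to equal length without gaps, whereas the cluster analysis described here was performed with MegAlign.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeavyChain:
    """Hypothetical minimal record of one 51p1-encoded IgH rearrangement."""
    sample_id: str
    ighd: str      # assigned IGHD gene, e.g. "IGHD2-2"
    ighj: str      # assigned IGHJ gene, e.g. "IGHJ6"
    ighd_rf: int   # IGHD reading frame used (1, 2, or 3)
    hcdr3: str     # HCDR3 amino acid sequence, single-letter code


def percent_identity(a: str, b: str) -> float:
    """Percent amino acid identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def same_subgroup(h1: HeavyChain, h2: HeavyChain, min_identity: float = 70.0) -> bool:
    """True if two IgH meet all three subgroup criteria listed in Methods."""
    return (
        h1.ighd == h2.ighd                  # (1) same IGHD germline gene ...
        and h1.ighj == h2.ighj              # ... and same IGHJ germline gene
        and h1.ighd_rf == h2.ighd_rf        # (2) same IGHD reading frame
        and percent_identity(h1.hcdr3, h2.hcdr3) >= min_identity  # (3)
    )


# Hypothetical equal-length HCDR3 sequences, for illustration only
a = HeavyChain("hypothetical-1", "IGHD2-2", "IGHJ6", 3, "GGDIVVVPAAMGYYYGMDV")
b = HeavyChain("hypothetical-2", "IGHD2-2", "IGHJ6", 3, "GPDIVVVPAAMSYYYGMDV")
print(percent_identity(a.hcdr3, b.hcdr3), same_subgroup(a, b))
```

The two hypothetical HCDR3 sequences differ at 2 of 19 positions (about 89% identity), so they would fall in the same subgroup. Extending this pairwise test to whole groups would need an additional clustering step, which is not shown.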
Results
We examined the Ig heavy chains expressed by the CLL B cells of 1846 unrelated patients who had a single functional IgH rearrangement, as assessed by RT-PCR ELISA and nucleotide sequencing. We found that 50% (923 of 1846) were unmutated, having at least 98% homology to a known germline IGHV gene. Of these, 258 were encoded by a 51p1 allele of IGHV1-69, representing 14% (258 of 1846) of all patient samples and 28% (258 of 923) of the samples that expressed unmutated IGHV. Consistent with our previous studies, the samples in this cohort had restricted use of IGHD genes, such as IGHD2-2, IGHD3-3, and IGHD3-16. Together with IGHJ6, these genes encoded HCDR3s that had conserved molecular structures and a mean HCDR3 length of 19.6 codons (19.6 ± 3.1; range, 9-30 aa). Immunophenotypic analysis of these samples by flow cytometry revealed that 70% (182 of 258) and 30% (76 of 258) expressed κ and λ light chains, respectively, compared with 65% and 35%, respectively, of the entire set of 1846 CLL samples.
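The "at least 98% homology" definition of an unmutated V gene reduces to a simple identity calculation over the FW1 to FW3 span. The following is a minimal sketch under stated assumptions. It assumes the query and germline sequences have already been trimmed to that span and aligned to equal length, with gaps counted as differences. The `substitute` helper and the 300-nt germline string are hypothetical and are not real IGHV sequence.

```python
def percent_homology(query: str, germline: str) -> float:
    """Percent nucleotide identity to germline over the FW1-FW3 region.

    Assumes both sequences were already trimmed to the span from the 5' end of
    FW1 to the 3' end of FW3 and aligned to equal length ('-' marks a gap,
    which is counted as a difference).
    """
    if len(query) != len(germline) or not query:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    differences = sum(q != g for q, g in zip(query.upper(), germline.upper()))
    return 100.0 * (len(query) - differences) / len(query)


def is_unmutated(query: str, germline: str, threshold: float = 98.0) -> bool:
    """Apply the 'at least 98% homology' definition of an unmutated V gene."""
    return percent_homology(query, germline) >= threshold


def substitute(seq: str, positions) -> str:
    """Hypothetical helper: introduce a point substitution at each position."""
    bases = list(seq)
    for p in positions:
        bases[p] = "A" if bases[p] != "A" else "C"
    return "".join(bases)


germline = "ACGT" * 75  # hypothetical 300-nt FW1-FW3 span, not a real IGHV sequence
print(is_unmutated(substitute(germline, range(6)), germline))  # True: 6 of 300 differ
print(is_unmutated(substitute(germline, range(7)), germline))  # False: 7 of 300 differ
```

For a 300-nt span, 6 differences give exactly 98.0% homology and still qualify as unmutated, whereas 7 differences do not.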
To analyze mutational status, IGKV and IGLV gene use, and molecular characteristics of the light chain CDR3 (LCDR3), we sequenced the light chains of all 258 samples. Each CLL B-cell sample was found to have a single productive IGK or IGL rearrangement and expressed a light chain that was not somatically mutated, because all were greater than 98% homologous to the most similar germline IGKV or IGLV gene.
Twenty-three functional IGKV genes were identified in the 182 IGKV-J rearrangements analyzed (Table 1). Of the 182 samples that expressed a κ light chain, 45 (25%) used the IGKV3-20 (A27) gene, 39 (22%) used IGKV1-33/1D-33 (O8/O18), 27 (15%) used IGKV1-39/1D-39 (O2/O12), 15 (8%) expressed IGKV3-15 (L2), 12 (7%) used IGKV1-5 (L12), and 8 (4%) used IGKV3-11 (L6). The IGKV genes IGKV2-28 (A19), IGKV2-30 (A17), and IGKV4-1 (B3) were each used by 4 (2%) samples, whereas IGKV1-12 (L5) and IGKV2-24 (A23) were each used 3 times (2%). IGKV1-6 (L11), IGKV1-9 (L8), IGKV1D-13 (L18), IGKV1-27 (A20), and IGKV2-29 (A18) were each used twice (1%), and IGKV1-8 (L9), IGKV1D-8 (L24), IGKV1D-12 (L19), IGKV1-16 (L1), IGKV1D-17 (L14), IGKV2D-29 (A2), and IGKV3D-15 (L16) were each used only once (1%). Six IGKV genes accounted for 80% (146 of 182) of the IGKV-J rearrangements. Further restriction was evident in the distribution of IGKV gene use between the proximal and distal clusters of the IGK locus. Fourteen genes in the proximal cluster accounted for 57% (104 of 182) of the κ light chain rearrangements, whereas 6 distal IGKV genes accounted for only 7 (4%). For 70 (38%) cases, the position of the IGKV gene could not be identified because several pairs of genes in the proximal and distal clusters have identical coding regions and cannot be distinguished. These pairs include 2 of the most frequently used genes, IGKV1-33/1D-33 and IGKV1-39/1D-39. Several IGKV genes were expressed in this study at frequencies different from those observed by Stamatopoulos et al in a recent study of light chain use in unselected CLL cases that used disparate IGHV genes.23 Three IGKV genes, IGKV1-33, IGKV3-15, and IGKV3-20, were used in our selected set of 182 κ+ CLL cases at frequencies that appeared higher than those noted in the 179 κ+ CLL samples analyzed by Stamatopoulos et al23 or even in the 89 of these 179 cases that expressed unmutated Ig (Figure 1A). 
Conversely, 6 IGKV genes, IGKV1-5, IGKV1-8, IGKV2-28, IGKV2-30, IGKV3-11, and IGKV4-1, were each used less frequently in our set of 182 cases than in the 179 previously studied κ+ cases or even in the 89 of these 179 cases that expressed unmutated Ig.
Fifteen functional IGLV genes were identified in the 76 IGLV-J rearrangements analyzed (Table 2). Of the 76 CLL samples that expressed a λ light chain, 12 (16%) used IGLV3-9 (3j), 11 (15%) used IGLV3-21 (3h), and 8 (11%) each used IGLV1-51 (1b) and IGLV3-25 (3m). The IGLV genes IGLV1-47 (1g), IGLV3-1 (3r), and IGLV4-60 (4a) were each used by 6 (8%) samples, IGLV1-44 (1c) was found in 5 (7%) samples, and IGLV3-10 (3p) was used by 4 (5%) samples. The IGLV2-24 (2a2) gene was expressed by 3 samples (4%), IGLV1-40 (1e) and IGLV2-11 (2e) were each used twice (3%), and IGLV2-8 (2c), IGLV2-23 (2b2), and IGLV4-69 (4b) were each found once (1%). Four IGLV genes accounted for 51% (39 of 76) of the IGLV-J rearrangements. Of note, 5 IGLV genes (IGLV1-47, IGLV3-9, IGLV3-10, IGLV3-25, and IGLV4-60) were each expressed in this selected cohort of 76 λ+ CLL cases at a greater frequency than that noted in the 97 λ+ CLL samples analyzed by Stamatopoulos et al23 or even in the 53 of these 97 cases that expressed unmutated Ig (Figure 1B). Conversely, 3 other IGLV genes (IGLV1-40, IGLV2-8, and IGLV3-21) were each used in our cohort at a lower frequency than in these 97 previously studied λ+ cases or even in the 53 of these 97 cases that used unmutated Ig (Figure 1B).
We analyzed the light chain CDR3 (LCDR3) of all 258 samples. The mean LCDR3 length for the 182 Ig κ light chains analyzed was 9.2 codons (9.2 ± 1.0; range, 5-12 aa), and the mean LCDR3 length for the 76 Ig λ light chains was 10.3 codons (10.3 ± 1.1; range, 8-13 aa). Among the samples that expressed κ light chains, IGKJ4 was the most prevalent, being used in 26% (48) of the 182 κ+ cases. IGKJ1 and IGKJ2 were the next most common, each being used by approximately 24% (45 and 43, respectively) of the κ+ cases. IGKJ3 and IGKJ5 were used by 13% (23) and 11% (19) of such cases, respectively. The identical IGLJ2/3 genes were used by 66% (50 of 76) of the expressed λ light chains. IGLJ1 was used by 33% (25) of λ+ cases, and IGLJ7 was found only once (1%).
The frequent and restricted use of certain IGKV and IGKJ, or IGLV and IGLJ, resulted in similar or nearly identical LCDR3 for many of the sequences analyzed in our cohort. As such, we analyzed the amino acid sequence of the LCDR3 and correlated each with the HCDR3 structure of its corresponding Ig heavy chain. Cluster analysis of the amino acid sequences of the HCDR3 and LCDR3 of all 258 samples identified 4 subgroups of samples, designated CLL69A, -B, -C, and -D, that expressed highly restricted and nearly identical Ig heavy and light chains (Table 3). Of the 27 CLL samples that expressed κ light chains encoded by IGKV1-39, 19 (70%) also expressed heavy chains with HCDR3 encoded by IGHD2-2 and IGHJ6 (Table 3, CLL69B). Five of the 8 (63%) CLL samples that expressed κ light chains encoded by IGKV3-11 also expressed heavy chains with HCDR3 encoded by IGHD2-2 and IGHJ6 (CLL69D). In addition, 10 of the 12 (83%) samples that expressed IGLV3-9 (CLL69C) coexpressed 51p1-encoded heavy chains with HCDR3 encoded by IGHD3-3 and IGHJ6. Finally, 25 of the 45 (56%) samples that expressed IGKV3-20 genes had a HCDR3 encoded by IGHD3-16 and IGHJ3 (CLL69A). The difference in the frequency of expression of IGKV1-39, IGKV3-11, IGLV3-9, and IGKV3-20 with 51p1-encoded heavy chains that contain the respective associated HCDR3 motif, compared with 51p1-encoded heavy chains that do not contain the HCDR3 motif, is statistically significant for all 4 HCDR3 subgroups and associated light chains (P < .001, Fisher exact test; Table 3).
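The significance test in Table 3 compares light chain gene use between 51p1 IgH with and without each HCDR3 motif. A minimal, standard-library sketch of a two-sided Fisher exact test is shown below. The 2 × 2 counts are reconstructed from the numbers given in the text, not copied from Table 3. The row and column layout is an assumption about how the comparison was set up. The software the authors used for the test is not stated, so this is not presented as their implementation.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the two-sided P value sums the probabilities of every
    table that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance keeps tables tied with the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Rows: 51p1 IgH with / without the HCDR3 motif (258 samples in total).
# Columns: light chain encoded / not encoded by the associated V gene.
# Counts are reconstructed from the text, not taken from Table 3.
tables = {
    "CLL69B / IGKV1-39": (19, 22 - 19, 27 - 19, 258 - 22 - (27 - 19)),
    "CLL69A / IGKV3-20": (25, 0, 45 - 25, 258 - 25 - (45 - 25)),
}
for label, cells in tables.items():
    print(label, fisher_exact_two_sided(*cells))
```

A library routine such as `scipy.stats.fisher_exact` would serve the same purpose. The explicit version simply makes the hypergeometric reasoning visible.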
The restricted use of the genes in each group led to conserved amino acid motifs in the CDR3 of both the light and heavy chains. Each of the 19 samples that had light chains encoded by IGKV1-39 and heavy chains encoded by IGHD2-2 and IGHJ6 had the HCDR3 amino acid motif PDIVVVPAAIXYYYGMDV encoded by the third reading frame (RF3) of IGHD2-2 (Figure 2A, CLL69B). For the HCDR3 of each sample, 6 to 9 amino acids were encoded by IGHD2-2 and 6 to 8 amino acids were encoded by IGHJ6. The remaining amino acids were encoded by nontemplated nucleotides (N segments) generated during Ig rearrangement. All samples had 3 amino acids at the IGHV-D junction. The first 2 were not highly conserved and were probably encoded by N segments. However, 11 of the samples had either Glu or Asp in the first codon, which may be the result of a single nontemplated nucleotide added to the last 2 nucleotides, “GA,” at the end of FW3. Remarkably, 17 of the 19 samples had a Pro in the third position, probably the result of insertion of cytidine by terminal deoxynucleotidyl transferase (TdT). N segments and exonuclease activity also probably were responsible for the 1 or 2 amino acids at the IGHD-J junction that accounted for the primary differences among the 19 heavy chains. Of these 19 samples with the common HCDR3 motif, 17 also had nearly identical light chains with an LCDR3 of 9 amino acid residues and the consensus motif QQSYSTPRT. The other 2 light chains had LCDR3 of 8 and 10 amino acids but were still virtually identical to the other LCDR3 in this group, because the first 7 codons of 15 of the 19 samples were encoded by IGKV1-39. N segment additions probably encoded 1 or 2 amino acids at the IGKV-J junction for 12 and 3 of the 19 sequences, respectively. Four rearrangements apparently lacked N segments, because all amino acids probably were encoded by IGKV or IGKJ.
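Consensus motifs such as QQSYSTPRT are derived by keeping, at each aligned position, a residue present in more than 67% of samples. The sketch below illustrates that step under two stated assumptions. First, the sequences are already aligned to equal length. Second, positions without such a residue are written as X. This is a plain-text convention assumed here; the printed figures instead mark conserved residues with italics. The input sequences are hypothetical.

```python
from collections import Counter


def consensus(aligned: list[str], threshold: float = 0.67) -> str:
    """Consensus of aligned sequences; 'X' where no residue exceeds `threshold`.

    With the default threshold, tie-breaking in most_common() never matters:
    tied residues can each occur in at most 50% of sequences, which is below
    the threshold, so such positions become 'X' either way.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("expected a non-empty list of equal-length sequences")
    motif = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        motif.append(residue if count / len(column) > threshold else "X")
    return "".join(motif)


# Hypothetical aligned CDR3 fragments, for illustration only
fragments = ["PDIVVGY", "PDIVVSY", "PDIVVAY", "EDIVVGY"]
print(consensus(fragments))  # PDIVVXY
```

In this example, Pro at the first position is present in 3 of 4 sequences (75%) and is retained. No residue reaches the threshold at the sixth position, so it becomes X. A residue present in exactly 2 of 3 sequences (about 66.7%) would also fall just below the "more than 67%" cutoff.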
Five of the 6 (83%) samples that had light chains encoded by IGKV3-11 and heavy chains encoded by IGHD2-2 and IGHJ6 also had nearly identical HCDR3 encoded by RF3 of IGHD2-2 and by IGHJ6 (Figure 2B, CLL69D). All 5 heavy chains had virtually identical HCDR3 of 20 amino acids with the common sequence motif GGDIVVVPAAMXYYYGMDV. All 5 of these cases also had virtually identical light chains with an LCDR3 of 6 amino acids and the sequence motif QQRSNT. For the HCDR3 of each sample, 9 and 8 amino acids were encoded by IGHD2-2 and IGHJ6, respectively. One amino acid at the IGHD-J junction apparently resulted from exonuclease activity and nucleotide addition, except in CLL69D3, which appeared to have a direct juxtaposition of IGHD2-2 with IGHJ6. Each of the 5 samples had 2 amino acids at the IGHV-D junction that apparently resulted from N sequence insertions and exonuclease activity. Four of the 5 had the same 2 amino acids (Gly-Gly) in the first 2 positions of the HCDR3, whereas the fifth had Gly-Pro. In addition, exonuclease activity probably removed the last nucleotide of FW3, changing Arg to Ser/Thr in 4 of the 5 samples. The HCDR3 motif of subgroup CLL69D was nearly identical to that of subgroup CLL69B, as both had HCDR3 encoded by RF3 of IGHD2-2 and by IGHJ6. The HCDR3 of CLL69B and CLL69D differed only subtly. CLL69D had an additional Tyr at the IGHD-J junction and 2 Gly residues at the IGHV-D junction, compared with 3 amino acids at the IGHV-D junction in CLL69B. These small differences nevertheless apparently influenced which light chain each sample expressed. The 5 samples of CLL69D each expressed a light chain with a distinctive LCDR3, which apparently resulted from exonuclease activity that removed the last 2 or 3 codons of IGKV3-11. 
This created an LCDR3 of only 6 amino acids in length that was quite distinct from that used by the IGKV1-39-encoded light chains expressed by CLL69B.
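Reading-frame restriction of an IGHD segment can be illustrated by translating the segment in each of its three forward frames. The following is a minimal sketch with several assumptions. The IGHD2-2*01 germline sequence shown is assumed and should be verified against IMGT. Reading frames are numbered from the first nucleotide of the segment, which may not match the convention used for the figures. Only the standard genetic code is used.

```python
BASES = "TCAG"
# Standard genetic code: index 16*i + 4*j + k gives the amino acid for the
# codon BASES[i] + BASES[j] + BASES[k] ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nt: str) -> str:
    """Translate the complete codons of a DNA string; leftover bases are dropped."""
    nt = nt.upper()
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def reading_frames(segment: str) -> dict[int, str]:
    """Translate RF1-RF3, where RFn starts at nucleotide n of the segment."""
    return {rf: translate(segment[rf - 1:]) for rf in (1, 2, 3)}


# Assumed IGHD2-2*01 germline sequence; verify against IMGT before relying on it.
IGHD2_2 = "AGGATATTGTAGTAGTACCAGCTGCTATGCC"

for rf, peptide in reading_frames(IGHD2_2).items():
    print(f"RF{rf}: {peptide}")
# RF3 gives DIVVVPAAM, matching residues 3 to 11 of the CLL69D motif
```

Under these assumptions, RF1 contains stop codons and RF2 gives the Cys-containing GYCSSTSCYA peptide. Only RF3 yields the hydrophobic DIVVVPAAM stretch found in the CLL69D motif.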
Subgroup CLL69C contains 10 samples that expressed a light chain encoded by IGLV3-9 and an HCDR3 encoded by IGHD3-3 and IGHJ6 (Figure 2C). All 10 samples had the 18-amino-acid HCDR3 motif YDFWSGYYPNYYYYGMDV. In every HCDR3 except that of CLL69C7, 8 amino acids were encoded by each of IGHD3-3 and IGHJ6. The IGHD-J junction of each HCDR3 contained 2 amino acids (Pro-Asn) that were remarkably conserved, considering that they apparently resulted from N segment insertion. The Pro was probably due to the addition of cytidine by TdT, whereas the conserved Asn appeared to result from addition of 2 adenosines to the sequence encoded by IGHJ6. Seven of these 10 samples had nearly identical LCDR3 of 9 amino acid residues with the motif QVWDSSTEV. The other 3 had LCDR3 of 8 or 10 amino acid residues with virtually identical amino acid sequence motifs, because 7 of the LCDR3 amino acid residues were encoded by IGLV3-9. Seven of the 10 also had 1 amino acid at the IGLV-J junction that apparently was generated by N segment insertion to the final codon encoded by IGLJ.
Finally, 25 samples expressed light chains encoded by IGKV3-20 and HCDR3 encoded by IGHD3-16 and IGHJ3 (Figure 2D, CLL69A). Each had a 19-amino-acid HCDR3 motif GGXYDYIWGSYRPNDAFDI. Fifteen of the samples included here were previously described.9 Here we present 10 additional samples with HCDR3 that are nearly identical to the HCDR3 of the previous 15 samples and all used nearly identical light chains. These samples represent 25 of the 45 (56%) samples in our cohort that expressed light chains encoded by IGKV3-20 and all 25 samples that expressed HCDR3 with this signature amino acid sequence motif.
Discussion
Analysis of the VL genes and CDR3 structures of 258 CLL cases that expressed unmutated IgH encoded by 51p1 demonstrated skewed usage of IGKV and IGLV genes, revealing LCDR3 with conserved molecular structures and nonstochastic pairing of Ig heavy and light chains that was predicated upon the HCDR3.
In the present cohort of 258 CLL B-cell samples that expressed unmutated IgH encoded by 51p1, we observed the same restricted use of heavy chain genes noted previously. Ig heavy chains with HCDR3 encoded by IGHD2-2, IGHD3-3, and IGHD3-16 accounted for 60% (155 of 258) of the CLL B-cell samples analyzed. Many of the samples that expressed these specific D segments also had conserved HCDR3 motifs. Similar restriction was observed in the use of particular VL genes. IGKV3-20, IGKV1-33, IGKV1-39, IGKV3-15, and IGLV3-9 were used at high frequency by CLL B cells that expressed unmutated heavy chains encoded by 51p1. Moreover, each except IGKV1-39 appeared disproportionately overrepresented in this cohort compared with a previously described cohort of CLL cases not selected for use of any particular Ig heavy chain.23
The frequent use of some IGKV and IGLV genes by the 258 CLL B-cell samples analyzed in this study revealed nonstochastic pairing of disparate Ig light chains with 51p1-encoded Ig heavy chains that had distinctive HCDR3. We noted that light chains encoded by a particular IGKV or IGLV gene were each preferentially paired with 51p1-encoded IgH that had a distinctive HCDR3. For example, IGKV1-39-encoded κ light chains were paired with IgH with HCDR3 encoded by IGHD2-2 and IGHJ6. Similarly, light chains encoded by IGKV3-11 were paired with IgH that had HCDR3 encoded by IGHD2-2, IGKV3-20 with IGHD3-16, and IGLV3-9 with IGHD3-3. Conversely, most of the IgH that had a particular HCDR3 motif were paired with nearly identical light chains. Of the 22 samples that expressed the CLL69B HCDR3 motif encoded by IGHD2-2 and IGHJ6, 19 (86%) also had a distinctive light chain encoded by IGKV1-39. The same pattern held for the other 3 motifs:
5 of the 6 (83%) samples with the CLL69D motif expressed IGKV3-11 with IGHD2-2;
all 25 (100%) samples with the CLL69A motif expressed IGKV3-20 with IGHD3-16;
and 10 of the 13 (77%) samples with the CLL69C motif expressed IGLV3-9 with IGHD3-3.
It should be noted that the Ig light chains encoded by any of the IGKV or IGLV genes identified in this study were not restricted to pairing with 51p1-encoded IgH with distinctive HCDR3. Indeed, light chains encoded by these genes also were found associated with disparate Ig heavy chains that lacked any of the 4 conserved HCDR3 motifs described in this study. Among the samples that expressed IGKV1-39 with 51p1-encoded IgH that used IGHD2-2, 2 did not have the characteristic HCDR3 motif. Nevertheless, 1 of these 2 had the LCDR3 motif characteristic of subgroup CLL69B. In addition, 6 other CLL samples that had HCDR3 encoded by IGHD genes other than IGHD2-2, and 1 that used IGHD2-2 in a different RF, expressed light chains encoded by IGKV1-39. Similar cases were apparent for IGKV3-20 and IGLV3-9. Twenty of the 45 CLL samples that expressed light chains encoded by IGKV3-20 had Ig heavy chains that lacked the HCDR3 motif of CLL69A. This was not unexpected, because IGKV3-20 and IGKV1-39 are 2 of the most commonly used light chain variable region genes, each being used by more than 10% of normal B cells.24 Therefore, expression of Ig light chains encoded by such VL genes, or with such restricted LCDR3, is not unique to CLL B cells. Rather, the nonstochastic pairing of these particular light chains with specific heavy chains contributes to their higher use frequency in CLL cells that express 51p1-encoded Ig. This became apparent when we compared two frequencies. The first was how often a particular IGKV or IGLV gene encoded light chains paired with 51p1-encoded heavy chains that had conserved HCDR3 motifs. The second was how often the same gene encoded light chains paired with heavy chains that lacked these motifs. 
For example, IGKV3-20 was expressed by 17% (45 of 258) of the CLL cases in this group that expressed 51p1-encoded IgH, versus 9.8% (27 of 276) of CLL cases that expressed Ig heavy chains encoded by disparate IGHV genes without conserved HCDR3.23 However, the frequency at which IGKV3-20 was expressed by CLL cases that used unmutated 51p1-encoded IgH without a conserved HCDR3 motif was 10.4% (20 of 192). This is similar to the frequency of IGKV3-20 observed among CLL cases not selected for expression of a particular Ig heavy chain gene.
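The denominators in this comparison can be recovered from counts given elsewhere in the text, as the worked arithmetic below shows. Two points are inferences rather than stated facts. First, the 276 comparison cases are assumed to be the 179 κ+ plus 97 λ+ samples of Stamatopoulos et al. Second, the 20 IGKV3-20 cases without a motif are assumed to be the 45 IGKV3-20 cases minus the 25 in CLL69A, which is consistent with the stated 20 of 192.

```python
# Counts restated from the text.
motif_subgroup_sizes = {"CLL69A": 25, "CLL69B": 22, "CLL69C": 13, "CLL69D": 6}
total_51p1 = 258
without_motif = total_51p1 - sum(motif_subgroup_sizes.values())

igkv3_20_all = 45
igkv3_20_cll69a = 25
# Assumption: the remaining IGKV3-20 cases carry none of the 4 HCDR3 motifs
igkv3_20_without_motif = igkv3_20_all - igkv3_20_cll69a

# Assumption: comparison cohort = 179 kappa+ plus 97 lambda+ CLL samples
comparison_cohort = 179 + 97


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


print(without_motif)                                   # 192
print(pct(igkv3_20_all, total_51p1))                   # 17.4
print(pct(27, comparison_cohort))                      # 9.8
print(pct(igkv3_20_without_motif, without_motif))      # 10.4
```

Removing the 66 motif-bearing samples (25 + 22 + 13 + 6) from the 258 leaves 192. The IGKV3-20 frequency in those 192 samples (10.4%) is close to the unselected value (9.8%), not to the 17.4% seen across all 51p1 cases.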
The restricted pairing of heavy and light chains that had HCDR3 or LCDR3 motifs was not absolute, because some heavy chains that had these motifs were paired with light chains encoded by other IGKV or IGLV genes. Rather, the restricted pairing appears to be a consequence of the HCDR3, because all of the heavy chains analyzed used the same variable region, encoded by 51p1 with little or no somatic mutation. However, unmutated was defined as at least 98% homology to an identified germline variable region gene. We therefore examined whether the expressed 51p1-encoded heavy chains had nucleotide differences from the germline 51p1 sequence that resulted in nonconservative changes. First, 182 of the 258 samples had no nucleotide differences from germline 51p1. Twenty of the remaining 76 samples had only conservative nucleotide differences in the heavy chain variable region, whereas the other 56 heavy chains had nonconservative nucleotide changes that resulted in amino acid differences. For 40 of these 56 samples, the only difference was in the last codon of FW3. Only 6 of these 40 samples belonged to any of the 4 subgroups identified in this study, and 4 of these 6 were in subgroup CLL69D, as mentioned previously. Of the other 16 samples that had amino acid differences elsewhere in the 51p1 coding region, 8 had a single amino acid difference in either CDR1 or CDR2, 4 had a single difference in a FW region, and 4 had 2 differences. Only 3 of these 16 samples had a conserved HCDR3 motif. Second, we examined the distribution of 51p1 alleles of IGHV1-69 among the 258 heavy chain sequences. The several alleles of the IGHV1-69 locus can be grouped into 2 major allele groups based upon nonconservative differences in CDR2. Of the five 51p1-related alleles, 4 differ by only a single conservative nucleotide change and therefore have identical coding regions. 
Variant 7 differs from the other four 51p1 alleles by a Glu>Lys substitution in FW3 and was present in 14% (36 of 258) of the samples. Fifteen of the 25 samples in CLL69A used variant 7, whereas only 1 sample from the other 3 subgroups used this allele. The remaining 21 samples that expressed this allele used disparate IGKV and IGLV genes and had HCDR3 with little homology to the motifs that defined the 4 CLL69 subgroups.
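The distinction drawn above between conservative (silent) and nonconservative (replacement) nucleotide differences can be made codon by codon. The following is a minimal sketch under stated assumptions. The query and germline sequences must be ungapped, in frame, and of equal length. All differing nucleotides within a codon are counted together, as silent if the encoded amino acid is unchanged and as replacement otherwise. This simplifies codons that carry more than one difference. The example fragments are hypothetical and are not 51p1 sequence.

```python
BASES = "TCAG"
# Standard genetic code: index 16*i + 4*j + k gives the amino acid for the
# codon BASES[i] + BASES[j] + BASES[k] ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_differences(germline: str, query: str) -> tuple[int, int]:
    """Return (silent, replacement) nucleotide difference counts, codon by codon.

    Assumes ungapped, in-frame sequences of equal length. Every differing
    nucleotide in a codon is counted as silent if the codon still encodes the
    same amino acid, and as replacement otherwise.
    """
    germline, query = germline.upper(), query.upper()
    if len(germline) != len(query) or len(germline) % 3:
        raise ValueError("expected in-frame sequences of equal length")
    silent = replacement = 0
    for i in range(0, len(germline), 3):
        g_codon, q_codon = germline[i:i + 3], query[i:i + 3]
        n_diff = sum(g != q for g, q in zip(g_codon, q_codon))
        if n_diff == 0:
            continue
        if CODON_TABLE[g_codon] == CODON_TABLE[q_codon]:
            silent += n_diff
        else:
            replacement += n_diff
    return silent, replacement


# Hypothetical 2-codon fragments (not 51p1 sequence)
print(classify_differences("GAAAGG", "GAGAGC"))  # (1, 1): Glu->Glu silent, Arg->Ser
print(classify_differences("GAGAGA", "AAGAGA"))  # (0, 1): Glu->Lys
```

Under this scheme, a heavy chain with silent differences only would correspond to the 20 samples with "only conservative nucleotide differences." Any replacement count places a sample among the 56 with amino acid changes.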
Prior studies failed to reveal evidence for specific pairing between Ig heavy chains and either IGK or IGL subgroups in either naive or memory B cells.25,26 The random pairing of Ig heavy and light chains indicated that favored heavy-light chain interactions impose no specific or identifiable restrictions on the expressed Ig repertoire. However, biases in the use of individual IGKV24,27 and IGLV28-30 genes have been described for the expressed Ig light chain repertoire, particularly for the repertoire expressed by autoreactive B cells and, more recently, by CLL B cells.23 The restricted structures of the CLL B-cell receptors expressed in this study strongly suggest selection for a characteristic binding activity. Several other groups have also recently reported subsets of CLL cases, expressing either mutated or unmutated Ig, that have highly homologous or “stereotyped” CDR3.9,23,31-34 These include a recent study by Stamatopoulos et al34 estimating that more than 20% of patients with CLL express “stereotyped” receptors. As the probability of expressing such conserved Ig is less than 10⁻¹², we and others have previously suggested that expression of Ig with restricted structures may influence the development of CLL. Recent immunophenotypic data suggest that CLL B cells that express either mutated or unmutated Ig resemble activated, antigen-experienced B cells.35-37
Although analysis of primary sequence data is unlikely to yield definitive clues regarding the nature of the potential antigen(s) involved in the development of CLL, some clues have been derived from comparing the molecular characteristics of the Ig expressed by CLL B cells with those of antibodies of known reactivity, particularly antibodies expressed in various autoimmune diseases. CLL B cells frequently express IgM antibodies that display reactivity to self-antigens,38-40 including several 51p1-encoded Ig that are polyreactive.10,41-44 Several of the stereotyped Ig expressed by CLL B cells are highly similar in structure to antibodies with reactivity to self-antigens. One example is SMI, a previously characterized CLL that expressed a polyreactive IgM/κ autoantibody with low affinity for a number of self-antigens.10 SMI is encoded by Ig heavy and light chains that are nearly identical to the 25 CLL Ig that belong to subgroup CLL69A. Previous Ig structure-function studies used recombinant SMI IgM/κ and other 51p1- and IGKV3-20–encoded heavy and light chains. These studies demonstrated that polyreactivity is not merely the result of using certain combinations of unmutated Ig heavy and light chains encoded by particular genes. Rather, it is a selected specificity, because polyreactivity depended upon the CDR3 structure generated during Ig rearrangement.10 In addition, mutational analysis of specific amino acid residues demonstrated that certain CDR3 residues were critical for polyreactivity.38,45 In contrast, some antibodies are intrinsically autoreactive by virtue of their ability to bind to superantigens via framework region-specific residues.46 For example, the frequent expression of IgH encoded by IGHV4-34 led to speculation that superantigens may drive expansion of B cells that express Ig encoded by IGHV4-34,47 such as those that bind the N-acetyllactosamine (NAL) determinant of the I/i blood group antigen. 
Previous binding studies of recombinant IgM/κ molecules encoded by IGHV4-34 demonstrated that binding to NAL does not require in vivo somatic selection, because all IGHV4-34 heavy chains were able to bind to NAL independent of the HCDR3 structure and light chain.48 In contrast, the Ig expressed by the CLL B cells in this study reveal evidence for selection by conventional antigen rather than by a superantigen. Selection mediated by a superantigen would not result in the marked nonstochastic pairing of Ig heavy and light chains noted in this study.
Conceivably, identification of the antigen(s) recognized by the restricted repertoire of Ig expressed in CLL could reveal factors responsible for disease pathogenesis and new targets for immune therapy of patients with this disease.
The online version of this article contains a data supplement.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
We thank the investigators who have contributed samples to the CRC Tissue Core, Andrew Greaves for management of the Tissue Core Management System, and Esther Avery for excellent technical assistance.
This work was supported in part by National Institutes of Health grants 2 PO1-CA081534 (CLL Research Consortium) and R37-CA49870 (T.J.K.).
Authorship
Contribution: G.F.W. designed and performed research, analyzed data, and wrote the manuscript; C.J.G., T.L.T., and L.Z.R. performed research and analyzed data; W.G.W., J.C.B., M.J.K., J.G.G., and K.R.R. contributed patient samples and data; and T.J.K. designed the research, contributed patient samples, analyzed data, and wrote the manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Thomas J. Kipps, Moores Cancer Center, University of California, San Diego, 3855 Health Sciences Dr, #0820, La Jolla, CA 92093-0820; e-mail: tkipps@ucsd.edu.