Chronic lymphocytic leukemia (CLL) and other B-cell lymphoproliferative disorders display familial aggregation. To identify a susceptibility gene for CLL, we assembled families from the major European (ICLLC) and American (GEC) consortia to conduct a genome-wide linkage analysis of 101 new CLL pedigrees using a high-density single nucleotide polymorphism (SNP) array and combined the results with data from our previously reported analysis of 105 families. Here, we report on the combined analysis of the 206 families. Multipoint linkage analyses were undertaken using both nonparametric (model-free) and parametric (model-based) methods. After the removal of high linkage disequilibrium SNPs, we obtained a maximum nonparametric linkage (NPL) score of 3.02 (P = .001) on chromosome 2q21.2. The same genomic position also yielded the highest multipoint heterogeneity LOD (HLOD) score under a common recessive model of disease susceptibility (HLOD = 3.11; P = 7.7 × 10⁻⁵), which was significant at the genome-wide level. In addition, 2 other chromosomal positions, 6p22.1 (corresponding to the major histocompatibility locus) and 18q21.1, displayed HLOD scores higher than 2.1 (P < .002). None of the regions coincided with areas of common chromosomal abnormalities frequently observed in CLL. These findings provide direct evidence for Mendelian predisposition to CLL and evidence for the location of disease loci.

B-cell chronic lymphocytic leukemia (CLL [MIM no. 151400]) accounts for approximately 25% of all leukemias and is the most common form of lymphoid malignancy in Western countries.1  Family2-4  and epidemiologic5-9  studies provide strong support for the familial aggregation of CLL and other related B-cell lymphoproliferative disorders (LPDs) such as non-Hodgkin lymphoma (NHL [MIM no. 605027]) and Hodgkin lymphoma (HL [MIM no. 236000]).

The striking multiple-case families reported in the literature provide substantive evidence for an inherited predisposition to CLL2-4  and suggest the existence of susceptibility alleles with pleiotropic effects.2,10  Case-control and cohort studies that have systematically estimated the familial risk of CLL and other LPDs have shown that most B-cell LPDs display site-specific elevated familial risks,5-9  but particularly CLL, where risks are increased 3- to 7-fold in first-degree relatives of cases. Furthermore, such studies have demonstrated that familial associations exist between the different types of B-cell LPDs with risks of NHL and HL showing 2-fold increases in relatives of CLL cases.

These observations provide a strong rationale for searching for predisposition genes for CLL through linkage searches of multiple-case families. Two genome-wide linkage scans have been conducted to date. The first, reported by Goldin et al11  in 2003, used 359 microsatellite markers to genotype 18 CLL families. In 2005, a second genome-wide scan of 105 families segregating CLL with or without additional B-cell LPD cases was conducted using the Affymetrix Mapping 10Kv131 array, which contained approximately 11 500 single nucleotide polymorphisms (SNPs).12  In both studies, analyses provided evidence for susceptibility at a number of loci, but none achieved statistical significance, suggesting that a much larger familial sample was required to identify CLL predisposition loci.

To address this, we have undertaken a further genome-wide linkage scan of an additional 101 families ascertained through the International CLL Consortium (ICLLC) and the Genetic Epidemiology of CLL (GEC) consortia. This search was conducted using high-density SNP arrays, thereby allowing us to pool findings with data generated from our previous scan of 105 families and, in so doing, to create a dataset of 206 families, representing the majority of CLL families identified worldwide. Here, we report further evidence for a Mendelian predisposition to CLL and strong evidence for the location of novel disease loci.

Ascertainment and collection of families

For clarity, we refer to our previously reported genome-wide scan of 105 pedigrees12  as phase 1 and the current analysis of 101 pedigrees as phase 2. As for those in phase 1, phase 2 pedigrees consisted of families with B-cell CLL with or without the segregation of additional B-cell LPD cases. These families were ascertained through hematologists in the United Kingdom, United States, Norway, Israel, Italy, Germany, The Netherlands, Portugal, and Australia participating in the ICLLC (51 phase 2 families) and the GEC consortia (50 phase 2 families). The diagnoses of B-cell CLL and other B-cell LPDs in affected family members were established using accepted standard clinicopathological and immunologic criteria in accordance with current WHO classification guidelines.13  Blood samples were obtained from both the offspring and spouse of deceased affected family members wherever possible to facilitate the reconstruction of genotypes. DNA was extracted from venous blood samples using conventional methodologies. Research protocols were approved and informed consent was obtained according to each group's institutional review board (Multi-Centre Research Ethics Committee UK; National Cancer Institute; Mayo Clinic College of Medicine; Moores Cancer Center, University of California, San Diego; University of Texas M. D. Anderson Cancer Center; University “La Sapienza”; Nepean Hospital) in accordance with the Declaration of Helsinki.

Genotyping

Prior to genotyping, all DNA samples were quantified by PicoGreen (Invitrogen, Paisley, United Kingdom). A genome-wide linkage search of the 101 families in phase 2 was undertaken using the GeneChip Mapping 10K 2.0 Xba Array containing approximately 10 200 SNP markers (Affymetrix, Santa Clara, CA). SNP genotypes were obtained by following the Affymetrix protocol for the GeneChip Mapping 10K 2.0 Xba Array. Briefly, 250 ng genomic DNA isolated from peripheral blood was digested per sample with the restriction endonuclease XbaI for 2.5 hours. Digested DNA was mixed with Xba adapters and ligated using T4 DNA ligase for 2.5 hours. Ligated DNA was added to 4 separate polymerase chain reactions (PCRs), cycled, pooled, and purified to remove unincorporated ddNTPs. The purified PCR products were then fragmented and labeled with biotin-ddATP. Biotin-labeled DNA fragments were hybridized to the arrays for 18 hours in an Affymetrix 640 hybridization oven. After hybridization, arrays were washed, stained, and scanned using an Affymetrix Fluidics Station FS450 with images obtained by use of an Affymetrix GeneChip 3000 scanner. Affymetrix GCOS software (v1.4) was used to obtain raw microarray feature intensities. Feature intensities were processed using Affymetrix GTYPE (v4.0) software to derive SNP genotypes (Affymetrix).

Data manipulation and error checking

The phase 1 genome-wide linkage scan had been undertaken using the GeneChip Mapping 10Kv131 Xba array containing 11 555 SNP markers (Affymetrix). Phase 2 samples were genotyped over 10 204 markers on version 2.0 of the Affymetrix 10K array. Pooled linkage analysis of the 206 families was based upon the 10 204 SNPs common to both arrays. The pedigree relationship-testing program PREST (release 3.0)14  was used to detect pedigree errors. Non-Mendelian error checking of genotypes and generation of linkage format files from raw Affymetrix array files were performed using the program ProgenyLab (Progeny, South Bend, IN). The map order and distances between SNP markers were based on the UCSC Human Genome browser (March 2006 release). The program MERLIN15  was used to further search for and remove additional unlikely genotypes consistent with potential genotyping errors.

Investigation of linkage disequilibrium

Most linkage software for multipoint analyses assumes that markers are in linkage equilibrium. However, for closely spaced SNP markers this is not always the case. To identify markers in high linkage disequilibrium (LD), we calculated the pair-wise LD measure r² between consecutive pairs of SNP markers using the expectation-maximization algorithm to estimate 2-locus haplotype frequencies as previously described.12  A pair of SNPs was defined as being in high LD if they had a pair-wise LD measure of r² higher than 0.16 in accordance with criteria recently advocated.16  Linkage disequilibrium was then removed by considering each set of markers in LD (defined as sets where each consecutive marker pair in the set had r² > 0.16) and retaining one SNP from each set (the centrally positioned SNP). The impact of LD was investigated by considering linkage results calculated before and after the removal of the high-LD SNPs.
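The pruning rule described above can be sketched in a few lines of Python. This is an illustrative reading of the text rather than the authors' implementation: the function name, the input layout (a list of r² values for consecutive marker pairs), and the choice of the lower-middle marker for even-sized sets are all assumptions, and "central" is taken by marker order rather than physical position.

```python
def prune_high_ld(r2_consecutive, threshold=0.16):
    """Return the indices of markers kept after LD pruning.

    r2_consecutive[i] is r^2 between marker i and marker i + 1, so a map
    with n markers supplies n - 1 values (hypothetical input layout).
    """
    n_markers = len(r2_consecutive) + 1
    kept = []
    start = 0  # index of the first marker in the current LD set
    for i in range(n_markers):
        # The current set ends at marker i if i is the last marker or if
        # the link between marker i and marker i + 1 is not in high LD.
        at_end = i == n_markers - 1 or r2_consecutive[i] <= threshold
        if at_end:
            # Retain the centrally placed marker of the set
            # (lower middle when the set has an even number of markers).
            kept.append(start + (i - start) // 2)
            start = i + 1
    return kept


# Six markers: markers 1-3 form one high-LD set and markers 4-5 another.
print(prune_high_ld([0.02, 0.40, 0.55, 0.01, 0.30]))  # [0, 2, 4]
```

Markers that are not in high LD with either neighbor form sets of size one and are always retained.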

Linkage analysis

Multipoint linkage analysis was conducted by implementation of the Perl script SNPLINK,17  which performs fully automated nonparametric (mode-of-inheritance free) and parametric analyses before and after LD removal using the program ALLEGRO (v1.2).18  Although primary statistical analyses were based on NPL scores, parametric linkage in the presence of heterogeneity was assessed using heterogeneity LOD (HLOD) scores and their accompanying estimates of the proportion of linked families (α). These analyses require the specification of a disease-transmission model. We derived LOD scores under both dominant and recessive models of inheritance with reduced penetrance and 2 age categories dependent upon age at diagnosis (<65 and 65+ years). In the absence of a genetic model, we adopted a pragmatic approach to this analysis, choosing values that were consistent with the population age-specific risks of CLL and compatible with the range of familial risks. The lifetime risk (defined at age 84 years) for being diagnosed with CLL in the U.S. population using the SEER registry data is estimated to be approximately 0.37%.19  We assumed an allele frequency of either 0.005 or 0.05 under the dominant models, and 0.05 or 0.20 under the recessive models. To satisfy the constraints of the lifetime risk and familial relative risks, for the dominant models the penetrances of the rare and common alleles were assumed to be 4.2% and 2.8%, respectively, for individuals aged younger than 65 years and 9.0% and 6.0%, respectively, for those older than 64 years. For the recessive models, penetrances of the rare and common alleles were assumed to be 14.0% and 7.0%, and 30.0% and 15.0%, respectively, for the 2 liability classes. To allow for phenocopies, the penetrance of the normal genotypes under all models was set to 0.14% and 0.3%, respectively, for the 2 liability classes. All unaffected individuals were considered uninformative (ie, of unknown phenotype) in the analysis.
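One way to see how such parameters relate to a population risk is to compute the risk implied by a single-locus model under Hardy-Weinberg proportions. The sketch below is illustrative only: the function and argument names are hypothetical, and treating the penetrances of the older liability class as a stand-in for lifetime risk is an assumption of this example, not a statement of how the model parameters were derived.

```python
def population_risk(allele_freq, pen_risk_genotype, pen_normal_genotype, mode):
    """Population risk implied by a single-locus model (Hardy-Weinberg assumed)."""
    q = allele_freq
    if mode == "dominant":
        # One or two copies of the risk allele confer the higher penetrance.
        at_risk = 1.0 - (1.0 - q) ** 2
    elif mode == "recessive":
        # Only homozygotes for the risk allele have the higher penetrance.
        at_risk = q ** 2
    else:
        raise ValueError("mode must be 'dominant' or 'recessive'")
    return at_risk * pen_risk_genotype + (1.0 - at_risk) * pen_normal_genotype


# Rare-allele models, older liability class (values quoted in the text).
rare_dominant = population_risk(0.005, 0.090, 0.003, "dominant")
rare_recessive = population_risk(0.05, 0.300, 0.003, "recessive")
print(f"{rare_dominant:.2%}")   # 0.39%
print(f"{rare_recessive:.2%}")  # 0.37%
```

Under these assumptions the two rare-allele models imply a population risk close to the quoted lifetime risk of approximately 0.37%.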

Heterogeneity LOD scores follow a complex statistical distribution, which can be approximated by the maximum of 2 independently distributed χ² variables. To obtain significance estimates for HLODs, these were first converted to a χ² statistic, where $\chi^2 = 2 \ln(10) \times \mathrm{HLOD}$, and significance values ($P_1$) were then derived using the χ² distribution with one degree of freedom. The nominal P value for the HLOD score is then given by $0.5 \times [1 - (1 - P_1)(1 - P_1)]$.20 
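The conversion just described can be written out directly. The following Python sketch follows the formula as stated in the text; the function names are hypothetical, and the χ² tail probability with one degree of freedom is computed from the complementary error function in the standard library rather than from any particular linkage package.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def hlod_p_value(hlod):
    """Nominal P value for an HLOD score, following the formula in the text."""
    chi2 = 2.0 * math.log(10.0) * hlod      # convert the HLOD to a chi-square
    p1 = chi2_sf_1df(chi2)                  # significance with 1 df
    return 0.5 * (1.0 - (1.0 - p1) * (1.0 - p1))


print(hlod_p_value(2.56))
```

For an HLOD of 2.56 this evaluates to roughly 6 × 10⁻⁴.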

Results are reported in terms of an NPL statistic and its associated one-sided P value. Under the null hypothesis of no linkage, the NPL statistic is distributed asymptotically as a standard normal random variable. An estimate of the information content (IC) for each chromosome before and after high LD SNP removal was determined by use of marker set entropy information derived by MERLIN.21 
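Because the NPL statistic is treated as asymptotically standard normal under the null hypothesis, its one-sided P value is the upper-tail normal probability. A minimal Python sketch, with a hypothetical function name and using only the standard library:

```python
import math


def npl_p_value(npl):
    """One-sided (upper-tail) P value for a standard normal statistic."""
    # 1 - Phi(z) expressed through the complementary error function.
    return 0.5 * math.erfc(npl / math.sqrt(2.0))


print(round(npl_p_value(3.02), 3))  # 0.001
```

An NPL score of 3.02 therefore corresponds to a nominal P value of about .001.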

Description of families analyzed

The 206 families included in phase 1 (n = 105) and phase 2 (n = 101) comprised 155 CLL families and 51 families segregating CLL and other B-cell LPDs (Table 1). Within the 206 families, there were 487 individuals affected with CLL and 63 individuals affected with NHL or HL. A higher proportion of families in phase 2 were multigenerational compared with those in phase 1 (Table 1). The difference in composition of families between the 2 phases is not a consequence of predefined criteria for ascertainment of families, but in part reflects the ongoing development of ICLLC and GEC. Overall, 42% of the 206 families contained 3 or more affected individuals.

Table 1

Characteristics of the pedigrees analyzed in phase 1, phase 2, and the combined dataset

No. of affected individuals in pedigree/generation; No. of pedigrees: Phase 1, Phase 2, Combined
2    
––––1 71 (10) 49 (9) 120 (19) 
––––2 — 
3    
––––1 12 7 (4) 19 (4) 
––––2 15 (5) 18 (5) 33 (10) 
––––3 — 
4    
––––1 3 (2) 4 (2) 
––––2 3 (3) 9 (6) 12 (9) 
––––3 1 (1) 2 (1) 
5    
––––1 — 
––––2 6 (4) 7 (4) 
––––3 — 2 (1) 2 (1) 
10: 1 — 1 (0) 
12: 4 1 (1) — 1 (1) 
Total 105 (19) 101 (32) 206 (51) 

The number of pedigrees containing individuals affected with other confirmed B-cell LPDs is shown in parentheses.

— indicates no data.

The median age at diagnosis of CLL in the 206 families was 60 years, significantly less than the median value of 72 years for age at diagnosis observed in the general white population.19  Minimum age at diagnosis within a family is likely to be a superior indicator of the potential for existence of a susceptibility gene, since it is not influenced by older sporadic cases. In our families, the minimum age of diagnosis within the families ranged from 28 years to 81 years with a median value of 56 years.

Within phase 1, 203 (85%) of 238 family members affected with CLL were genotyped together with 17 (77%) of 22 of those affected with LPD and 3 unaffected individuals. In phase 2 families, 101 (41%) of 249 individuals affected with CLL and 22 (54%) of 41 of those affected with LPD were genotyped. In addition 51 unaffected family members were typed primarily to reconstruct genotypes of unavailable affected family members.

Data quality

In addition to the 223 Affymetrix 10K131 arrays run and used in the phase 1 analyses, a total of 171 Affymetrix 10Kv2.0 arrays were processed in phase 2. A number of parameters were used throughout the study to determine data quality, and all genotypes were housed within the pedigree storage program ProgenyLab. The average SNP call rate per array for phase 2 was 98.0% compared with 92.8% for phase 1. For DNA extracted from males, it was possible to examine the 309 markers on the X chromosome for errors due to miscalls or PCR contamination. No SNPs were heterozygous in male samples. Two hundred seventy-three markers were fixed or were without a single map location, leaving 9933 usable SNPs (97.3%), of which 9690 mapped to autosomes. After LD removal 7495 (77.4%) of 9690 markers remained. Less than 0.4% of the total SNP genotypes generated were considered unlikely by ProgenyLab and/or MERLIN. All such genotypes were removed from further analyses.
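The X-chromosome check mentioned above rests on the fact that males carry a single X, so a heterozygous call at an X-linked marker in a male sample points to a miscall or contamination. A minimal sketch of such a check follows; the data layout, the "AA"/"AB"/"BB" call coding, and all names are assumptions made for illustration, and markers in pseudoautosomal regions are assumed to have been excluded beforehand.

```python
def heterozygous_x_calls(genotypes, sex, x_markers):
    """List (sample, marker) pairs where a male has a heterozygous X-linked call.

    genotypes: {sample_id: {marker_id: call}}, calls coded "AA", "AB", "BB"
    sex:       {sample_id: "M" or "F"}
    x_markers: iterable of X-linked marker identifiers
    """
    flagged = []
    for sample, calls in genotypes.items():
        if sex.get(sample) != "M":
            continue  # only male samples are informative for this check
        for marker in x_markers:
            if calls.get(marker) == "AB":
                flagged.append((sample, marker))
    return flagged


# Toy data with hypothetical sample and marker identifiers.
genotypes = {
    "s1": {"x1": "AA", "x2": "BB"},
    "s2": {"x1": "AB", "x2": "AA"},  # female, heterozygosity is expected
    "s3": {"x1": "AA", "x2": "AB"},  # male with a suspicious call
}
sex = {"s1": "M", "s2": "F", "s3": "M"}
print(heterozygous_x_calls(genotypes, sex, ["x1", "x2"]))  # [('s3', 'x2')]
```

An empty result corresponds to the situation reported here, in which no SNPs were heterozygous in male samples.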

Linkage analysis

The IC derived for the phase 1 analyses from using only the 10 204 SNPs contained within the Affymetrix 10Kv2.0 array was not significantly different from that obtained using all original 11 555 markers on the 10Kv131 array. It is known that the presence of LD between markers can inflate multipoint linkage statistics if the vectors of inheritance have to be inferred on the basis of allele frequencies22,23  and where founders of many of the pedigrees are not available to genotype.

Multipoint nonparametric linkage analysis of all 206 families with and without the high-LD SNPs is shown in Figure 1. The panels within Figure 1 show that inclusion of high-LD SNPs in the analysis can lead to inflated linkage statistics; however, in most cases, the overall profile of the linkage statistics remains the same. Genome-wide mean IC scores were virtually identical with and without inclusion of high-LD SNPs in phase 1, phase 2, and the combined dataset (combined dataset: 0.645 before and 0.632 after LD removal).

Figure 1

NPL scores across each chromosome. In each plot, ---- shows NPL statistics obtained using all SNPs (with LD; n = 9690), while ▔ shows NPL statistics obtained after exclusion of high-LD SNPs (n = 7495).

Table 2 details the maximal NPL scores attained after removal of LD for all autosomes in phase 1, phase 2, and in the combined dataset. The best evidence for linkage was confined to 2q21.2, 5q23.2, 6p22.1, 11q12.1, and 18q21.1. Figure 2 shows transformed multipoint HLOD scores (−log10[P value]) generated using the most parsimonious dominant and recessive models and corresponding transformed multipoint NPL scores for these 5 chromosomes. The maximum NPL score obtained was 3.02 with a corresponding nominal P value of .001 at map position 2q21.2 (Figure 1). At the same position, a genome-wide significant HLOD of 3.11 (P = 7.7 × 10⁻⁵) under a common recessive model was obtained with 68% of families showing evidence of linkage. Support for the 2q21.2 locus was provided by both phase 1 (NPL = 1.64, HLOD = 1.26) and phase 2 (NPL = 2.60, HLOD = 1.75) data. In addition to chromosome 2, the 4 regions on chromosomes 5q23.2, 6p22.1, 11q12.1, and 18q21.1 attained significance levels compatible with thresholds recommended for genome-wide suggestive linkage24  (Tables 2 and 3; Figure 2). For each of the regions there was limited evidence that linkage was primarily generated by any specific families.

Table 2

Maximum NPL obtained after the removal of high LD SNPs for phase 1, phase 2, and the combined dataset of 206 pedigrees

Entries are NPL score (position, Mb). Affection status = CLL + LPD, 206 families; affection status = CLL, 182 families.

| Chromosome | CLL + LPD, Phase 1 | CLL + LPD, Phase 2 | CLL + LPD, Combined | CLL, Phase 1 | CLL, Phase 2 | CLL, Combined |
| --- | --- | --- | --- | --- | --- | --- |
| No. of families | 105 | 101 | 206 | 91 | 91 | 182 |
| 1 | 1.12 (91.5) | 1.41 (80.3) | 1.35 (80.3) | 0.96 (81.6) | 1.09 (80.3) | 1.05 (191.1) |
| 2 | 1.64 (135.3) | 2.60 (134.6) | 3.02 (134.6) | 1.67 (135.3) | 2.31 (134.6) | 2.84 (134.6) |
| 3 | 1.64 (58.0) | 0.99 (193.6) | 1.02 (109.5) | 1.44 (57.7) | 0.79 (103.8) | 0.97 (144.4) |
| 4 | 0.03 (178.6) | 2.22 (36.8) | 1.35 (71.2) | 0.00* | 2.19 (36.8) | 0.99 (67.4) |
| 5 | 1.95 (110.6) | 2.00 (171.9) | 2.12 (124.4) | 1.89 (108.7) | 2.18 (170.4) | 2.37 (109.0) |
| 6 | 2.16 (18.5) | 2.02 (31.6) | 2.80 (29.1) | 1.86 (131.4) | 1.81 (29.4) | 2.44 (29.1) |
| 7 | 0.99 (66.8) | 1.55 (128.3) | 1.29 (127.3) | 0.58 (78.7) | 2.45 (128.3) | 2.00 (127.3) |
| 8 | 0.70 (130.6) | 1.64 (50.9) | 1.55 (50.9) | 0.85 (27.4) | 1.66 (51.6) | 1.51 (50.9) |
| 9 | 0.02 (38.3) | 1.36 (34.6) | 1.03 (35.0) | 0.00* | 0.89 (0.2) | 0.39 (0.5) |
| 10 | 1.80 (118.0) | 1.27 (124.2) | 1.69 (123.4) | 1.59 (117.3) | 1.77 (124.2) | 1.93 (123.4) |
| 11 | 2.66 (48.7) | 1.22 (109.1) | 1.79 (59.5) | 2.75 (51.2) | 1.57 (109.1) | 1.54 (58.4) |
| 12 | 0.89 (97.3) | 2.14 (113.1) | 1.77 (112.5) | 0.79 (97.3) | 2.02 (112.5) | 1.75 (112.5) |
| 13 | 1.70 (42.2) | 1.85 (101.4) | 1.00 (85.1) | 1.21 (42.2) | 1.52 (85.1) | 1.78 (85.1) |
| 14 | 1.60 (102.7) | 0.30 (23.9) | 0.30 (25.5) | 1.28 (98.9) | 0.37 (26.5) | 0.74 (25.5) |
| 15 | 0.60 (59.3) | 1.27 (76.8) | 0.97 (60.2) | 0.64 (59.2) | 1.74 (55.9) | 1.62 (55.9) |
| 16 | 1.51 (68.5) | 1.45 (75.6) | 1.75 (75.4) | 1.16 (61.5) | 1.69 (76.4) | 1.58 (76.4) |
| 17 | 0.00* | 1.26 (11.7) | 0.57 (20.1) | 0.00* | 1.77 (3.9) | 0.79 (20.1) |
| 18 | 1.28 (46.0) | 2.81 (42.5) | 2.66 (42.5) | 1.10 (46.0) | 2.29 (41.8) | 2.21 (41.8) |
| 19 | 1.19 (56.2) | 1.58 (39.0) | 0.91 (56.2) | 0.84 (56.0) | 1.51 (39.0) | 0.84 (56.0) |
| 20 | 0.00* | 1.94 (59.2) | 0.80 (55.8) | 0.23 (8.5) | 2.18 (10.6) | 1.48 (8.5) |
| 21 | 0.00* | 0.63 (23.3) | 0.00* | 0.00* | 0.25 (23.5) | 0.00* |
| 22 | 0.00* | 0.77 (27.8) | 0.00* | 0.00* | 0.44 (33.5) | 0.00* |
* Where NPLs were 0, SNP marker identifier is not shown.

Figure 2

Plots of transformed linkage statistics (−log10[P value]) after the removal of high-LD SNPs for chromosomes 2, 5, 6, 11, and 18. Transformed HLOD scores under the best dominant model are shown in black; under the best recessive model, in red; and transformed NPL values, in blue. The best recessive model for each of the 5 chromosomes was chromosomes 2, 6, and 18 (common recessive) and chromosomes 5 and 11 (rare recessive). The best dominant model for each of the 5 chromosomes was chromosomes 2, 6, and 11 (common dominant) and chromosomes 5 and 18 (rare dominant).

Table 3

Location of maximum linkage metrics defined by either an NPL score of 2.8 or higher or HLOD score of 1.8 or higher after the removal of high-LD SNP markers from analyses

| Chromosome | Position, Mb | NPL | P (NPL) | HLOD | Model | P (HLOD) |
| --- | --- | --- | --- | --- | --- | --- |
| Affection status based on CLL and B-cell LPD, n = 206 families | | | | | | |
| 2q21.2 | 134.5 | 3.02 | .001 | 3.11 | Common recessive | 7.7 × 10⁻⁵ |
| 5q23.2 | 124.4 | 2.12 | .017 | 1.76 | Rare recessive | .004 |
| 6p22.1 | 29.1 | 2.81 | .003 | 2.22 | Common recessive | .001 |
| 11q12.1 | 59.5 | 1.79 | .037 | 1.95 | Rare recessive | .003 |
| 18q21.1 | 42.5 | 2.66 | .004 | 2.36 | Rare dominant | .001 |
| Affection status restricted to CLL, n = 182 families | | | | | | |
| 2q21.2 | 134.5 | 2.84 | .002 | 2.56 | Common recessive | 6.0 × 10⁻⁴ |
| 5q21.3 | 109.1 | 2.37 | .009 | 1.36 | Rare recessive | .013 |
| 6p22.1 | 29.1 | 2.44 | .007 | 1.63 | Common recessive | .006 |
| 11q12.1 | 58.4 | 1.54 | .062 | 1.71 | Rare recessive | .005 |
| 18q12.3 | 41.8 | 2.21 | .014 | 1.92 | Rare dominant | .029 |

α is the estimate of the proportion of families linked at a given genomic position.

For chromosome 6, the best-fitting model was attained imposing a common recessive allele with 72% of families being linked, with support coming from both phases (Table 2). HLODs for phase 1 and phase 2 were 1.35 and 1.22, respectively. For chromosomes 5 and 11, the best-fitting model was attained imposing a rare recessive allele with 85% and 82% of families being linked, respectively (Tables 2 and 3). Support for chromosome 5 linkage was not biased to either phase 1 or phase 2, but the region at which maximal linkage was attained was inconsistent. For chromosome 11, the majority of the support for linkage came from phase 1 data (NPL = 2.66, P = .004) and, as for chromosome 5, maximal linkage was obtained at different chromosomal locations in the 2 phases (Table 2). In contrast, for chromosome 18q21.1, the best-fitting model was attained imposing a rare dominant allele with 68% of families being linked, with most of the evidence coming from phase 2 data (NPL = 2.81, P = .003).
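The proportions of linked families quoted above are the α estimates that accompany each HLOD. As background, the widely used admixture formulation of the HLOD maximizes, over α, the sum across families of log10[α × 10^LOD + (1 − α)]. The Python sketch below illustrates that formulation with a simple grid search; it is not the ALLEGRO implementation, and the function name, grid resolution, and example per-family LOD scores are hypothetical.

```python
import math


def hlod(family_lods, steps=1000):
    """Grid-search the admixture LOD over alpha; returns (hlod, alpha).

    family_lods: per-family LOD scores at one map position under a given
    disease model (hypothetical input for illustration).
    """
    best_score, best_alpha = 0.0, 0.0  # alpha = 0 always gives a score of 0
    for k in range(steps + 1):
        alpha = k / steps
        score = sum(
            math.log10(alpha * 10.0 ** lod + (1.0 - alpha))
            for lod in family_lods
        )
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha


# One family supporting linkage and one arguing against it.
score, alpha = hlod([2.0, -1.0])
print(round(score, 2), round(alpha, 2))
```

When some families argue against linkage, the maximized score exceeds the simple sum of LOD scores and α falls below 1, which is why an HLOD is reported together with the estimated proportion of linked families.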

Restricting the analysis to affection status solely defined by CLL (n = 182 families) made no significant difference to the overall linkage statistics attained at 2q21.2, 5q23.2, 6p22.1, 11q12.1, and 18q21.1 (Tables 2,3).

Following publication of 2 previous linkage studies that failed to identify significant linkage, we combined the extant families from diverse institutions worldwide, and 2 existing consortia, to generate the largest collection of familial CLL to date.

Our results provide evidence for a major susceptibility locus on chromosome 2 influencing the risk of CLL—with characteristics consistent with an autosomal recessive model of inheritance. We did not find any significant evidence for linkage in the combined dataset to any of the regions of the genome commonly associated with cytogenetically detectable chromosomal losses (6q, 13q14, or 17p) or gains (trisomy 12) in CLL.25-27  In addition to linkage to 2q21.2, we found evidence of a recessively acting locus for CLL mapping to 6p22.1 and a dominantly acting locus mapping to 18q21.1 on the basis of presumptive Mendelian models of predisposition.

Here, we have made use of data generated from high-density SNP arrays to search for CLL predisposition loci by linkage. In addition to affording maximal power to detect linkage, the output from such arrays permits pooling of data from different scans to be efficiently conducted, avoiding the serious problems of microsatellite-based searches. The combined dataset of 206 families has permitted us to robustly identify a novel locus on chromosome 2 that had displayed linkage only at the 1% level in our previous search. Furthermore, we have increased evidence for linkage to chromosome 6 in a region that includes the HLA locus. Maximal evidence of linkage to 2q21.2 and 6p22.1 under assumption of recessive transmission may, however, in part be a consequence of the high proportion of the families analyzed containing affected sibships that favor recovery of a recessive model.

Although high-density SNP arrays represent a milestone in linkage analysis, the presence of LD between SNPs can potentially inflate linkage statistics. While there is no definitive consensus on the thresholds to be used to manage the issue of LD between SNPs, we excluded SNPs with high LD, defined as those with a pair-wise linkage disequilibrium measure of r² greater than 0.16. This can be viewed as conservative but is a threshold for triaging SNPs in high LD that has been recently recommended.16  When we originally reported analysis of the first 105 families,12  we imposed a less stringent criterion, advocated at the time, of r² greater than 0.40.28  Given that high LD between SNP markers impacts on linkage statistics but does not result in loss of information content within our dataset, we strongly endorse imposing stringent thresholds when using high-density arrays for linkage analyses.

Although speculative at this juncture, several interesting candidate genes involved in aspects of the regulation of cellular proliferation and differentiation of B cells map to the regions of linkage on 2q21 and 18q21. The region identified on chromosome 2 includes the chemokine receptor gene (CXCR4) whose expression is higher in CLL cells and that is thought to be associated with disease progression.29  Levels of CXCR4 have also been associated with Rai stage30  and with survival in familial CLL.31 CXCR4 germ-line mutations are responsible for the warts, hypogammaglobulinemia, infections, and myelokathexis syndrome (WHIM; MIM no. 193670). The chromosome 18 region contains the SMAD7 gene (mothers against decapentaplegic, drosophila, homolog of, 7; MIM no. 602932) whose expression has been implicated in growth arrest and apoptosis of B-lineage cells and Ig class switching.32-34  It is also intriguing that we found support for involvement of the MHC region by virtue of linkage at 6p22.1. Support for a role of HLA alleles in the development of B-cell LPD is provided by the observation of linkage in sibships with Hodgkin lymphoma,35  and some previous association studies have also implicated variants within or close to the MHC class II region in susceptibility to CLL.36 

Reduced expression of death-associated protein kinase 1 (DAPK1) through epigenetic silencing by promoter methylation and histone tail modification has been reported to occur in the majority of sporadic CLL cases. A rare, single-nucleotide germ-line mutation (c.1–6531A>G) upstream of DAPK1, which maps to 9q21.33, has recently been reported to segregate with CLL in a large family, suggesting that heritable predisposition to CLL may in part be mediated through germ-line variation in DAPK1.37  The contribution of inherited mutations in DAPK1 to familial risk is unclear; however, in our analyses, we found no evidence of linkage to this region of 9q21 (either in the complete dataset or in a restricted analysis based on only larger pedigrees with affection status solely defined by CLL and with 4 or more affected individuals), suggesting the contribution of this locus to the overall familial aggregation of the disease is small.

Our results suggest that more than one gene is contributing to risk of CLL in families. Such loci could be epistatic or acting independently. The observation of subclinical levels of monoclonal B-cell lymphocytosis (MBL) with an identical phenotype to indolent CLL detectable in 3% of healthy individuals but 14% of first-degree relatives in high-risk CLL families38  suggests this phenotype is a marker of genetic risk and may be an early event in the oncogenic process, consistent with a model based on epistatic interaction. As only a paucity of individuals from the 206 pedigrees have been tested for this phenotype, it was not possible to make use of MBL status in our current analysis. Future mapping studies of high-risk families incorporating data on MBL status on all available family members are therefore desirable to better characterize the model.

In conclusion, follow-up of linkage signals on 2q21, 18q21, and 6p22 is warranted along with screening of individuals for the presence of the MBL phenotype. In conjunction with conventional fine mapping of loci, as has been shown for DAPK1,37  it may be possible to also make use of expression data to identify novel disease genes. This should be possible through the ongoing collection of families from ICLLC and GEC consortia, as well as available population-based case-control collections.

An Inside Blood analysis of this article appears at the front of this issue.

The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

Grant support for the ICLLC and work at the Institute of Cancer Research was provided by Leukemia Research, the Arbib Foundation, and Cancer Research UK. The work of the GEC is supported by grant CA118444 from the National Cancer Institute (NCI) and by the Intramural Research Program of the NIH, National Cancer Institute.

We are grateful to all patients and their families for participation in this study. We thank all the clinicians for participating in the ICLLC and the GEC consortia, specifically, in ICLLC: Drs Robin Aitchison, Petra Antunovic, Jenny Arnold, Hasan Atrah, Martin Auger, Andrew Bell, Isaac Ben-Bassat, Alain Berrebi, Lee Bond, Mary Cahill, Silvano Capalbo, John Catalano, Claire Chapman, Patricia Chipping, Patricia Clark, Rosa Collado, Clare Dearden, Helen Dignum, Ian Douglas, Julio Esteban, Savio Fernandes, Elizabeth Gaminara, Milagros Garcia Diaz, Alfonzo Garcia de Coca, Lia Ginaldi, James Hamilton, Paul Hayes, Fredrick Jackson, Steven Johnson, Maria Junior, Eric Kanfer, Daniel Kennedy, Christopher Knechtli, Anil Lakhani, Maeve Leahy, Ray Lowenthal, Arumugam Manoharan, Leonora Mehes, Sophie Mepham, Jane Merceira, Ann Miller, Alison Milne, Philippe Mineur, Godfrey Morgenstern, Anne Morrison, Richard Murrin, Ann Nandi, Anne Parker, Kanthi Perera, Klas Quabeck, Saad Rassam, Cecil Reid, Isabel Ribeiro, Colin Rist, Richard Rosenquist, Martin Rowlands, Pinhas Stark, Rhona Stewart, Robert Stockley, Paul Stross, Geoffrey Summerfield, Helen Sykes, Daniel Thompson, Christopher Tiplady, Marilyn Treacy, Virginia Tringham, Eric Van Den Neste, David Westerman, Nicholas Wickham, James Wiley, and Barrie Woodcock; and in GEC consortia, Laura Fontaine, Fatima Abbasi, Maria Sgambati, Ola Landgren, David Ng, Jorge Toro, Mary Lou McMaster, and Joseph F. Fraumeni Jr, for their work with NCI's families. We also recognize and thank Drs James Cerhan, Celine Vachon, Neil Kay, as well as Marcia Mahlman, for their work with Mayo Clinic families. Finally, we are grateful to Emily Webb for statistical advice.

Contribution: G.S.S. designed and performed research, analyzed and interpreted data, and drafted the paper; L.R.G. designed research, collected data, contributed families, analyzed and interpreted data, and drafted the paper; R.W.W. designed and performed research; S.L.S. and R.S.H. designed research, collected data, contributed families, analyzed and interpreted data, and drafted the paper; L.R., S.S.S., F.R.M., G.E.M., S.F., M.L., T.K., M.J.K., and T.G.C. collected data and contributed families; M.J.S.D. collected data, contributed families, and drafted the paper; D.C. and N.C. designed research, collected data, contributed families, and drafted the paper.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Richard Houlston, Section of Cancer Genetics, Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, United Kingdom; e-mail: richard.houlston@icr.ac.uk; Neil Caporaso, Pharmacogenetics Section, Genetic Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, EPS 7116, 6120 Executive Blvd, Rockville, MD 20892; e-mail: caporasn@exchange.nih.gov.

1. Stevenson F, Caligaris-Cappio F. Chronic lymphocytic leukemia: revelations from the B-cell receptor. Blood. 2004:4389-4395.
2. Yuille MR, Matutes E, Marossy A, Hilditch B, Catovsky D, Houlston RS. Familial chronic lymphocytic leukaemia: a survey and review of published studies. Br J Haematol. 2000;109:794-799.
3. Jonsson V, Houlston RS, Catovsky D, et al. CLL family ‘Pedigree 14’ revisited: 1947–2004. Leukemia. 2005;19:1025-1028.
4. Aoun P, Zhou G, Chan WC, et al. Familial B-cell chronic lymphocytic leukemia: analysis of cytogenetic abnormalities, immunophenotypic profiles, and immunoglobulin heavy chain gene usage. Am J Clin Pathol. 2007;127:31-38.
5. Goldin LR, Pfeiffer RM, Li X, Hemminki K. Familial risk of lymphoproliferative tumors in families of patients with chronic lymphocytic leukemia: results from the Swedish Family-Cancer Database. Blood. 2004;104:1850-1854.
6. Cartwright RA, Bernard SM, Bird CC, et al. Chronic lymphocytic leukaemia: case control epidemiological study in Yorkshire. Br J Cancer. 1987;56:79-82.
7. Linet MS, Van Natta ML, Brookmeyer R, et al. Familial cancer history and chronic lymphocytic leukemia: a case-control study. Am J Epidemiol. 1989;130:655-664.
8. Pottern LM, Linet M, Blair A, et al. Familial cancers associated with subtypes of leukemia and non-Hodgkin's lymphoma. Leuk Res. 1991;15:305-314.
9. Radovanovic Z, Markovic-Denic L, Jankovic S. Cancer mortality of family members of patients with chronic lymphocytic leukemia. Eur J Epidemiol. 1994;10:211-213.
10. Gunz FW, Gunz JP, Veale AM, Chapman CJ, Houston IB. Familial leukaemia: a study of 909 families. Scand J Haematol. 1975;15:117-131.
11. Goldin LR, Ishibe N, Sgambati M, et al. A genome scan of 18 families with chronic lymphocytic leukaemia. Br J Haematol. 2003;121:866-873.
12. Sellick GS, Webb EL, Allinson R, et al. A high-density SNP genomewide linkage scan for chronic lymphocytic leukemia-susceptibility loci. Am J Hum Genet. 2005;77:420-429.
13. Müller-Hermelink H, Montserrat E, Catovsky D, Harris N. Chronic lymphocytic leukaemia/small lymphocytic lymphoma. In: Jaffe ES, Harris NL, Stein H, Vardiman JW, eds. World Health Organization Classification of Tumours: Pathology and Genetics of Tumours of Haematopoietic and Lymphoid Tissues. Lyon, France: IARC Press; 2001:127-130.
14. McPeek MS, Sun L. Statistical tests for detection of misspecified relationships by use of genome-screen data. Am J Hum Genet. 2000;66:1076-1094.
15. Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin: rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002;30:97-101.
16. Boyles AL, Scott WK, Martin ER, et al. Linkage disequilibrium inflates type I error rates in multipoint linkage analysis when parental genotypes are missing. Hum Hered. 2005;59:220-227.
17. Webb EL, Sellick GS, Houlston RS. SNPLINK: multipoint linkage analysis of densely distributed SNP data incorporating automated linkage disequilibrium removal. Bioinformatics. 2005;21:3060-3061.
18. Gudbjartsson DF, Jonasson K, Frigge ML, Kong A. Allegro, a new computer program for multipoint linkage analysis. Nat Genet. 2000;25:12-13.
19. Ries L, Eisner M, Kosary C, et al. SEER Cancer Statistics Review 1975–2000. Bethesda, MD: National Cancer Institute; 2003.
20. Faraway JJ. Distribution of the admixture test for the detection of linkage under heterogeneity. Genet Epidemiol. 1993;10:75-83.
21. Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet. 1996;58:1347-1363.
22. Evans DM, Cardon LR. Guidelines for genotyping in genomewide linkage studies: single-nucleotide-polymorphism maps versus microsatellite maps. Am J Hum Genet. 2004;75:687-692.
23. Huang Q, Shete S, Amos CI. Ignoring linkage disequilibrium among tightly linked markers induces false-positive evidence of linkage for affected sib pair analysis. Am J Hum Genet. 2004;75:1106-1112.
24. Lander E, Kruglyak L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet. 1995;11:241-247.
25. Karnolsky IN. Cytogenetic abnormalities in chronic lymphocytic leukemia. Folia Med (Plovdiv). 2000;42:5-10.
26. Gozzetti A, Crupi R, Tozzuoli D, Raspadori D, Forconi F, Lauria F. Molecular cytogenetic analysis of B-CLL patients with aggressive disease. Hematology. 2004;9:383-385.
27. Juliusson G. Common cytogenetic abnormalities. In: Faguet G, ed. Chronic Lymphocytic Leukemia. Totowa, NJ: Humana Press; 2004:163-171.
28. Schaid DJ, Guenther JC, Christensen GB, et al. Comparison of microsatellites versus single-nucleotide polymorphisms in a genome linkage screen for prostate cancer-susceptibility loci. Am J Hum Genet. 2004;75:948-965.
29. Burger JA, Kipps TJ. Chemokine receptors and stromal cells in the homing and homeostasis of chronic lymphocytic leukemia B cells. Leuk Lymphoma. 2002;43:461-466.
30. Ghobrial IM, Bone ND, Stenson MJ, et al. Expression of the chemokine receptors CXCR4 and CCR7 and disease progression in B-cell chronic lymphocytic leukemia/small lymphocytic lymphoma. Mayo Clin Proc. 2004;79:318-325.
31. Ishibe N, Albitar M, Jilani IB, Goldin LR, Marti GE, Caporaso NE. CXCR4 expression is associated with survival in familial chronic lymphocytic leukemia, but CD38 expression is not. Blood. 2002;100:1100-1101.
32. Ishisaki A, Yamato K, Nakao A, et al. Smad7 is an activin-inducible inhibitor of activin-induced growth arrest and apoptosis in mouse B cells. J Biol Chem. 1998;273:24293-24296.
33. Ishisaki A, Yamato K, Hashimoto S, et al. Differential inhibition of Smad6 and Smad7 on bone morphogenetic protein- and activin-mediated growth arrest and apoptosis in B cells. J Biol Chem. 1999;274:13637-13642.
34. Sebestyen A, Barna G, Nagy K, et al. Smad signal and TGFbeta induced apoptosis in human lymphoma cells. Cytokine. 2005;30:228-235.
35. Klitz W, Aldrich CL, Fildes N, Horning SJ, Begovich AB. Localization of predisposition to Hodgkin disease in the HLA class II region. Am J Hum Genet. 1994;54:497-505.
36. Machulla HK, Muller LP, Schaaf A, Kujat G, Schonermarck U, Langner J. Association of chronic lymphocytic leukemia with specific alleles of the HLA-DR4:DR53:DQ8 haplotype in German patients. Int J Cancer. 2001;92:203-207.
37. Raval A, Tanner SM, Byrd JC, et al. Downregulation of death-associated protein kinase 1 (DAPK1) in chronic lymphocytic leukemia. Cell. 2007;129:879-890.
38. Rawstron AC, Yuille MR, Fuller J, et al. Inherited predisposition to CLL is detectable as subclinical monoclonal B-lymphocyte expansion. Blood. 2002;100:2289-2290.