Abstract
Fetal hemoglobin (HbF) can inhibit the polymerization of sickle hemoglobin, and the HbF level is an important modulator of the severity and course of sickle cell anemia. The genetic regulation of HbF levels is complex and under active investigation. Although multiple quantitative trait loci have been discovered, an estimated half of the genetic variance in HbF levels remains unexplained. Genomic copy number variations (CNVs) are inherited duplications or deletions of DNA segments ranging from kilobases to megabases in length. They are a significant source of human genetic heterogeneity and may be involved in HbF regulation. CNVs can also invalidate assumptions about genotype frequencies in the regions they span, so locating them is important for many types of genetic association studies. Here, we present a novel method for the high-resolution discovery of CNVs related to HbF levels in sickle cell anemia, using genome-wide association study (GWAS) data. We used the Illumina 610K single nucleotide polymorphism (SNP) genotyping array to examine 727 adult subjects with sickle cell anemia, with or without α thalassemia, who were enrolled in the Cooperative Study of Sickle Cell Disease (CSSCD; aged 18 to 69 years, mean age 31 years; 44% male; not on hydroxyurea therapy). The array's ~610K probes span the entire genome. At each locus, the relative amount of DNA detected was compared with a reference and expressed as the log R ratio (LRR). Normal diploid regions of DNA have LRRs close to zero, whereas duplicated regions have higher LRRs and deleted regions have lower LRRs. Using LRR information in the context of a GWAS, we developed a novel two-step signal-processing technique that combines CNV discovery with subsequent phenotypic association analysis. First, the distribution of LRR values at each locus was stratified using a ±1.5 standard deviation (SD) band-pass filter.
This created three groups: a central major group of individuals with diploid DNA content, and two minor variant groups, one with elevated LRRs (suggesting >2 DNA copies) and one with decreased LRRs (suggesting <2 DNA copies) at that locus. To reduce noise, loci at which neither minor group contained >5% of the sample were excluded from further analysis. In the second step, a two-sample Student's t-test was used at each remaining locus to compare HbF levels between the major (diploid) group and each variant group containing >5% of the sample. Using this method, we examined chromosomes 2, 6, and 11, which include regions known to modulate HbF in patients with sickle cell anemia, in individuals with β thalassemia, and in the general population. We detected multiple clear duplications and deletions (approximately 1 per 6–22 Mbp, depending on the chromosome) that showed typical CNV LRR distributions, with >10% of the study population carrying the variant. Several of these were nominally associated with HbF levels (p<0.05), including deletions in ASB1 on chromosome 2 and HACE1 on chromosome 6, both of which encode ankyrin repeat-containing proteins involved in the ubiquitin ligase system, as well as an upstream duplication and an intragenic deletion involving HLA-DRB5 on chromosome 6. None of these clear CNVs, however, overlapped regions known to affect HbF concentration. Additional potential CNVs were detected throughout each chromosome, many of them exhibiting atypical LRR distributions that could not easily be classified as either normal diploid or clear CNV regions. Further studies are required to confirm the presence of a CNV at these atypical loci. With this method, we detected CNVs and CNV breakpoints across a population at single-probe resolution, in some cases to within 1 kb.
This resolution offers a distinct advantage over detection methods that use a multiple-probe sliding window to detect LRR deviations within an individual sample. In conclusion, this two-step method (high-resolution CNV detection followed by phenotypic association analysis) shows promise for explaining variation in protein expression levels, such as the variable HbF levels typical of sickle cell anemia, and possibly for future exploration of differences in HbF response to therapy in sickle cell anemia.
Disclosures: No relevant conflicts of interest to declare.