Abstract
Genome-wide association studies (GWAS) are an important tool for identifying loci underlying complex human diseases. A key variable in GWAS is the degree of admixture, the recent mixing of previously isolated populations. Undetected admixture can produce spurious associations, but admixture can also be exploited for mapping by admixture linkage disequilibrium (MALD). Developing reagents to detect admixture and to perform MALD may help identify genetic modifiers of sickle cell disease (SCD). We examined ancestry and genetic admixture in SCD patients living in North America. We first assessed ancestry from the family histories of 649 adults recruited to the Bethesda Sickle Cell Cohort Study in the eastern US, in which 61% of subjects are African American, 22% African, 12% Caribbean or South American and 1.8% of other origins. We hypothesized that the high proportion of subjects of African origin would reduce the expected degree of admixture in the Bethesda cohort relative to other SCD cohorts.
To assess admixture at the genetic level, we identified ancestry informative markers (AIMs) from 3,804,602 SNPs genotyped by HapMap Phase III in the CEU (European) and YRI (Nigerian) populations. These SNPs were combined with a published admixture mapping panel, yielding a new MALD panel of 2251 AIMs. We assessed the performance of this AIM panel by genotyping 221 HapMap individuals and 489 adults from the Bethesda cohort on Illumina iSelect arrays, with an overall completion rate of 98.51%. Genotypes for the HapMap samples were 98% concordant with publicly available data. Of the 2251 candidate markers, only 1806 passed all quality controls (289 SNPs could not be genotyped, 82 were not AIMs and 74 had low concordance with HapMap). Based on our HapMap genotype data, the final panel has a mean δ (difference in allele frequency between populations) of 0.712 (SD 0.118) and an average inter-marker distance of 1.862 Mb. We demonstrated the utility of this 1806-marker panel for genome-wide MALD in SCD with a proof-of-concept scan that mapped the known causal locus for SCD, the β-globin (HBB) gene, in the Bethesda cohort.
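The δ criterion and the quality-control tally above can be illustrated with a short Python sketch. It assumes hypothetical per-SNP records holding the CEU and YRI frequencies of the same allele, and a hypothetical minimum δ of 0.5 (the selection threshold used for this panel is not stated here). The values are illustrative, not HapMap data, and this is not the authors' selection pipeline. The last lines check that the three reported exclusion classes account for the drop from 2251 to 1806 markers.

```python
from statistics import mean, stdev

# Hypothetical records: (snp_id, chromosome, position_bp, freq_CEU, freq_YRI).
# Both frequencies refer to the same allele; values are illustrative only.
SNPS = [
    ("snpA", "1", 1_000_000, 0.05, 0.90),
    ("snpB", "1", 3_000_000, 0.10, 0.15),
    ("snpC", "1", 4_500_000, 0.95, 0.20),
    ("snpD", "2", 2_000_000, 0.80, 0.02),
]

def delta(freq_pop1: float, freq_pop2: float) -> float:
    """Absolute allele-frequency difference between two populations."""
    return abs(freq_pop1 - freq_pop2)

def select_aims(snps, min_delta):
    """Keep SNPs whose CEU-YRI delta meets a (hypothetical) threshold."""
    return [s for s in snps if delta(s[3], s[4]) >= min_delta]

def mean_intermarker_distance_mb(snps):
    """Mean gap between consecutive markers on the same chromosome, in Mb."""
    by_chrom = {}
    for _, chrom, pos, _, _ in snps:
        by_chrom.setdefault(chrom, []).append(pos)
    gaps = []
    for positions in by_chrom.values():
        positions.sort()
        gaps.extend(b - a for a, b in zip(positions, positions[1:]))
    return mean(gaps) / 1e6 if gaps else float("nan")

aims = select_aims(SNPS, min_delta=0.5)
deltas = [delta(s[3], s[4]) for s in aims]
print("AIMs:", [s[0] for s in aims])
print("mean delta:", round(mean(deltas), 3), "SD:", round(stdev(deltas), 3))
print("mean inter-marker distance (Mb):", mean_intermarker_distance_mb(aims))

# Candidate markers minus the three QC exclusion classes reported above.
passed_qc = 2251 - (289 + 82 + 74)
print("markers passing QC:", passed_qc)
```

Only gaps between markers on the same chromosome are averaged, since distances across chromosome boundaries are not meaningful for linkage-based mapping.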
To compare admixture in the Bethesda cohort with other SCD populations, we defined a subset of 360 AIMs that distinguish 6 major geo-ethnic groups in the Human Genome Diversity Panel (n=952 individuals), suggesting that these markers can detect contributions from ancestral groups other than African and European. Applying these markers to the Bethesda cohort, principal components analysis (PCA) showed that African American and Caribbean/South American subjects had a wider range of admixture than African subjects or the HapMap populations (CEU, YRI and CHB). We then compared admixture proportions in the Bethesda cohort with 2 additional SCD populations: 469 anonymous samples from a newborn-screening SCD cohort in the western US, and 439 adults with SCD from a clinical trial in North America and Europe (WalkPhasst). PCA of these SCD cohorts together with 3 HapMap populations showed that nearly 50% of the variation is explained by 2 major principal components, which distinguish European/African and Asian/Native American ancestry. Finally, we compared admixture proportions across all 3 SCD cohorts using the Structure program and observed significant differences in the proportions of ancestry from Africa, Europe and Asia/the Americas (P<0.001 by ANOVA). Specifically, African ancestry proportions were 0.640, 0.700 and 0.510 in the Bethesda cohort, WalkPhasst and the western newborns, respectively (all pairwise comparisons P<0.01). Notably, cohorts from different regions of the US also differed: the Bethesda cohort had 0.039 Asian/Native American ancestry compared with 0.085 in the western SCD newborns (P<0.001). We conclude that admixture differs significantly between SCD populations from the eastern and western US. Overall, SCD patients in North America show variable degrees of genetic admixture that could affect the interpretation of candidate gene studies and GWAS.
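The PCA step can be sketched with NumPy: standardize each SNP's 0/1/2 genotype counts by its allele frequency, take the singular value decomposition, and report the fraction of total variance carried by each component (the quantity behind the "nearly 50%" figure). The genotypes below are simulated from two hypothetical source populations plus admixed individuals, and the per-SNP normalization is one common choice, assumed here rather than taken from this study.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an individuals x SNPs matrix of 0/1/2 allele counts.

    Each SNP is centered at 2p and scaled by sqrt(2p(1-p)), where p is the
    sample allele frequency; monomorphic SNPs are dropped first.
    Returns (scores, fraction of total variance for each component).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, var_frac[:n_components]

def simulate(ancestry, freq_a, freq_b, rng):
    """Genotypes for individuals with the given fraction of ancestry from source A."""
    q = np.asarray(ancestry, dtype=float)[:, None]
    freq = q * freq_a + (1 - q) * freq_b     # individual-specific allele frequency
    return rng.binomial(2, freq)

rng = np.random.default_rng(0)
n_snps = 300
freq_a = rng.uniform(0.0, 0.3, n_snps)       # hypothetical source population A
freq_b = rng.uniform(0.7, 1.0, n_snps)       # hypothetical source population B (high delta)

pop_a = simulate(np.ones(30), freq_a, freq_b, rng)
pop_b = simulate(np.zeros(30), freq_a, freq_b, rng)
admixed = simulate(rng.uniform(0.2, 0.8, 30), freq_a, freq_b, rng)

scores, var_frac = genotype_pca(np.vstack([pop_a, pop_b, admixed]))
print("variance explained by PC1, PC2:", var_frac.round(3))
print("mean PC1 (A, B, admixed):",
      scores[:30, 0].mean().round(2),
      scores[30:60, 0].mean().round(2),
      scores[60:, 0].mean().round(2))
```

The sign of each component is arbitrary; only relative positions along an axis are meaningful. In this simulation PC1 places admixed individuals between the two source populations, roughly according to their ancestry fraction, which is the pattern read off when comparing cohorts by PCA.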
Furthermore, the degree of admixture adjustment required varies for SCD subjects from different geographic regions within the US. Finally, our proof-of-concept studies suggest that SCD may be ideal for whole-genome MALD scans to identify genetic modifiers.
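Why admixture adjustment matters for association testing can be shown with a small simulation. In this hypothetical sketch, a SNP's allele frequency depends on each person's global African ancestry fraction, and a quantitative trait depends on ancestry alone. Regressing the trait on genotype then yields a spurious genotype effect that largely disappears once the ancestry fraction is added as a covariate. Including global ancestry as a covariate is one common adjustment strategy, shown here as an assumption rather than as the method used in this study; all parameter values are illustrative.

```python
import numpy as np

def genotype_effect(phenotype, genotype, ancestry=None):
    """Least-squares slope for genotype, optionally adjusting for ancestry."""
    cols = [np.ones_like(genotype, dtype=float), genotype.astype(float)]
    if ancestry is not None:
        cols.append(ancestry)                # global ancestry as a covariate
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    return beta[1]                           # coefficient on genotype

rng = np.random.default_rng(1)
n = 2000
q = rng.uniform(0.3, 0.9, n)                 # hypothetical African ancestry fraction
freq = q * 0.8 + (1 - q) * 0.1               # allele frequency tracks ancestry
g = rng.binomial(2, freq)                    # 0/1/2 genotypes
y = 3.0 * q + rng.normal(0, 0.5, n)          # trait depends on ancestry only, not on g

unadjusted = genotype_effect(y, g)
adjusted = genotype_effect(y, g, ancestry=q)
print("genotype effect without ancestry covariate:", round(unadjusted, 3))
print("genotype effect with ancestry covariate:   ", round(adjusted, 3))
```

Because the size of the spurious effect scales with how strongly ancestry varies within a sample, cohorts with different admixture profiles would need different degrees of adjustment.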
No relevant conflicts of interest to declare.
Author notes
Asterisk with author names denotes non-ASH members.