Abstract
Fetal hemoglobin (HbF) is strongly associated with clinical severity in the β-hemoglobinopathies, including sickle cell disease (SCD). In recent years, the three major HbF genetic loci (at BCL11A on chromosome 2p, HMIP-2 on chromosome 6q and Xmn1-HBG on chromosome 11p) have been more clearly characterized and mechanisms of the likely causal variants better defined. In this study, we have combined this new biological understanding with statistical methods to create a genetic "predictor" for HbF in SCD.
We chose 7 variants to represent the 3 HbF quantitative trait loci (QTL) and investigated their utility in predicting HbF levels and, in turn, clinical severity of SCD. For BCL11A, we used 2 markers: rs1427407 (62kb downstream of BCL11A), which localizes to an erythroid-specific enhancer (Bauer et al. Science 2013), and rs6545816, which tags a second signal 58kb downstream of BCL11A. The genetic architecture of HMIP-2 as a QTL comprises two elements, A and B (Menzel et al. Ann Hum Genet 2014). We represented HMIP-2A with the 3bp deletion rs66650371, shown to be a causal variant (Stadhouders et al. JCI 2014), plus the ethnicity marker rs9376090. HMIP-2B is less well characterized; for it we selected rs9494142 (near the MYB enhancer) and rs9494145. For the β-globin locus, we used the long-established Xmn1 marker (rs7482144) in the proximal promoter, 158bp upstream of HBG2. This is likely not the causal variant itself, but is in tight linkage disequilibrium with the causal element.
Of 892 initial patients (516 females, 376 males), we excluded 17 children aged under 5 because of the non-linear relationship between age and HbF at a young age (we confirmed this finding in our cohort). This left 658 with HbSS, 206 with HbSC, 8 with HbSβ0 thalassemia, and 20 with HbSβ+ thalassemia. We then genotyped the 666 patients with HbSS or HbSβ0 thalassemia for the 7 genetic variants. For each patient, we selected 'validated' HbF levels, i.e. HbF not influenced by transfusion, drugs (especially hydroxyurea) or pregnancy. HbF levels were natural-log-transformed (Ln).
We then used multiple linear regression models to identify variants that were independently associated with Ln-HbF levels. A model using only age and sex as covariates had predictive power of r2~10%, which was orthogonal (i.e. additive) to the predictive power of the variants, so we did not include these covariates in subsequent analyses. Adding α-globin status to the model where known (N=272) left r2 unchanged, and α-globin status was not a significant predictor.
We then normalized the 7 variants to take account of the mean allele count (a strongly predictive but rare variant may not explain much of the total population variance). We performed multiple linear regression to reduce the 7 variants to those with independent effects, and found 4 markers (rs6545816, rs1427407, rs66650371 and rs7482144) independently contributing HbF-boosting alleles (see table). Combining these 4 variants into a genetic risk model, as per the table, explains 21.8% of the variability (r2) of HbF in our HbSS / HbSβ0 thalassemia patients.
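As an illustration of how such a score could be computed, the following minimal Python sketch sums per-allele weights over the 4 variants and reports the squared correlation between the score and Ln-HbF. The weights and patient records are made-up placeholders, not the values from the table, and the function names (genetic_score, r_squared) are hypothetical; the abstract does not specify the software used.

```python
import math

# Hypothetical per-allele weights (effect on Ln-HbF per HbF-boosting allele).
# Placeholders for illustration only; NOT the values from the abstract's table.
WEIGHTS = {
    "rs6545816": 0.10,
    "rs1427407": 0.30,
    "rs66650371": 0.40,
    "rs7482144": 0.35,
}


def genetic_score(genotype):
    """Weighted sum of HbF-boosting allele counts (0, 1 or 2 per variant)."""
    return sum(WEIGHTS[rsid] * genotype.get(rsid, 0) for rsid in WEIGHTS)


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. the fraction of variance in ys
    explained by a one-predictor linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Made-up patients: allele counts per variant and HbF (%).
patients = [
    ({"rs6545816": 0, "rs1427407": 0, "rs66650371": 0, "rs7482144": 0}, 2.1),
    ({"rs6545816": 1, "rs1427407": 0, "rs66650371": 0, "rs7482144": 0}, 4.5),
    ({"rs6545816": 0, "rs1427407": 1, "rs66650371": 0, "rs7482144": 0}, 3.2),
    ({"rs6545816": 1, "rs1427407": 1, "rs66650371": 0, "rs7482144": 1}, 9.8),
    ({"rs6545816": 0, "rs1427407": 2, "rs66650371": 1, "rs7482144": 0}, 7.4),
    ({"rs6545816": 2, "rs1427407": 1, "rs66650371": 1, "rs7482144": 1}, 12.6),
]

scores = [genetic_score(g) for g, _ in patients]
ln_hbf = [math.log(hbf) for _, hbf in patients]  # natural-log transform (Ln)
r2 = r_squared(scores, ln_hbf)
print(f"variance in Ln-HbF explained by the score: {r2:.1%}")
```

With a single combined score as the only predictor, the squared correlation equals the r2 of the corresponding linear regression, which is why no separate model fit is needed in this sketch.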
We validated the 4-variant risk score first with 5-fold cross-validation within the cohort, which gave a mean r2=22% across the 5 folds. We then replicated the findings in the cohort of HbSC patients (N=206), in whom the 4-variant model explained 27.5% of HbF variability (r2), i.e. approaching the r2=44% seen in non-anemic individuals. Thus, our 4-variant model provides a robust approach to genetic prediction of HbF in SCD.
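The 5-fold cross-validation step can be sketched as follows, assuming (the abstract does not say) that the model is refitted on four folds and scored on the held-out fifth. The data are synthetic, generated with an arbitrary slope and noise level, and the helper names (fit_line, held_out_r2, k_fold_r2) are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def held_out_r2(slope, intercept, xs, ys):
    """r2 of a previously fitted line on data it was not trained on."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=5, seed=0):
    """Shuffle indices, split into k folds, fit on k-1 folds, score on the rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    results = []
    for test_idx in folds:
        held = set(test_idx)
        train_idx = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        results.append(held_out_r2(slope, intercept,
                                   [xs[i] for i in test_idx],
                                   [ys[i] for i in test_idx]))
    return results


# Synthetic cohort: a genetic score per patient and a noisy Ln-HbF value.
rng = random.Random(42)
scores = [rng.choice([0.0, 0.3, 0.6, 0.9, 1.2]) for _ in range(200)]
ln_hbf = [1.0 + 0.8 * s + rng.gauss(0, 0.5) for s in scores]

fold_r2 = k_fold_r2(scores, ln_hbf)
mean_r2 = sum(fold_r2) / len(fold_r2)
print("per-fold r2:", [round(v, 3) for v in fold_r2])
print("mean r2:", round(mean_r2, 3))
```

A mean held-out r2 close to the in-sample r2, as reported above (22% vs 21.8%), suggests that the model is not substantially overfitted to the cohort.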
The predictive power appears to be larger in HbSC than in HbSS (r2=27.5% vs 21.8%), which may be related to stress erythropoiesis in HbSS patients releasing immature erythrocytes, a non-genetic factor modifying HbF levels.
This process is a first step towards creating a global genetic predictive score in SCD: stratifying patients with SCD early in life would enable us to offer curative therapy (i.e. hematopoietic stem cell transplant) to those identified as genetically severe.
No relevant conflicts of interest to declare.