Key Points
Expression of the Xg blood group protein is governed by rs311103, and its minor allele disrupts a GATA motif to cause the Xg(a−) phenotype.
These data elucidate the genetic basis of the last unresolved blood group system and make genotyping for Xga status possible.
Abstract
The Xga blood group is differentially expressed on erythrocytes from men and women. The underlying gene, PBDX, was identified in 1994, but the molecular background for Xga expression remains undefined. This gene, now designated XG, partly resides in pseudoautosomal region 1 and encodes a protein of unknown function from the X chromosome. By comparing calculated Xga allele frequencies in different populations with 2612 genetic variants in the XG region, rs311103 showed the strongest correlation to the expected distribution. The same single-nucleotide polymorphism (SNP) had the most significant impact on XG transcript levels in whole blood (P = 2.0 × 10⁻²²). The minor allele, rs311103C, disrupts a GATA-binding motif 3.7 kb upstream of the transcription start point. This silences erythroid XG messenger RNA expression and causes the Xg(a−) phenotype, a finding corroborated by SNP genotyping in 158 blood donors. Binding of GATA1 to biotinylated oligonucleotide probes with rs311103G but not rs311103C was observed by electrophoretic mobility shift assay and proven by mass spectrometry. Finally, a luciferase reporter assay indicated this GATA motif to be active for rs311103G but not rs311103C in HEL cells. By using an integrated bioinformatic and molecular biological approach, we elucidated the underlying genetic basis for the last unresolved blood group system and made Xga genotyping possible.
Introduction
Unraveling the molecular genetic bases of blood groups has facilitated implementation of novel concepts and diagnostic tools in transfusion medicine and related fields. Recent developments include the elucidation of several important blood group systems like JR,1,2 Lan,3 FORS,4 Vel,5-7 and AUG.8 Furthermore, understanding of the molecular genetic mechanism underlying P1 antigen expression was recently reported.9,10 Currently, all blood group systems except 1 have been resolved, and their polymorphic antigens have been predicted by genotyping efforts.11
Anti-Xga was first described in 1962 by Mann et al.12 Approximately 66% of men and 90% of women express the Xga antigen on their red blood cells (RBCs). Because of its skewed sex distribution, Xg was the first blood group system to be assigned to a specific chromosome: X. The underlying gene, PBDX, was identified by Ellis et al,13 but despite these landmark studies, Xg has remained the only system for which the genetic basis of antigen negativity is unknown. PBDX, now renamed XG, partly resides in pseudoautosomal region 1 (PAR1) on both sex chromosomes; the first 3 exons lie in PAR1, whereas the remaining 7 exist only on the X chromosome (Figure 1A). Thus, XG is disrupted on the Y chromosome and yields no protein product from it.14 Consistent with its partial location in PAR1, XG is one of the few X-borne genes that escape X inactivation. Despite >50 years of investigations into this enigmatic blood group, the presence or absence of Xga on RBCs cannot yet be predicted by genotyping, and the function of the Xg protein remains unknown.
Using an integrated bioinformatic and molecular biological approach, we aimed to establish the genetic basis underlying the Xg(a+) vs Xg(a−) phenotype. We hypothesized that Xga expression is transcriptionally regulated by a single SNP within the XG region, potentially disrupting an erythroid transcription factor binding site.
Study design
Calculated Xga allele frequencies in different populations were compiled from historical data based on Xga phenotyping (supplemental Table 1, available on the Blood Web site). Comparisons were made with frequencies for multiple XG variants as found in the 1000 Genomes Project.15 Expression quantitative trait loci were analyzed in the GTEx Portal.16 Transcription factor binding site analysis was performed in JASPAR.17
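The conversion from phenotype frequencies to allele frequencies can be sketched as follows. This is a minimal illustration that assumes Hardy-Weinberg proportions and a single silencing allele on the X chromosome; the function names are hypothetical, and the authors' actual procedure is described in their supplemental Methods. Men are treated as hemizygous, so the Xg(a−) allele frequency equals the fraction of Xg(a−) men, whereas women are Xg(a−) only when both X chromosomes carry the silencing allele.

```python
import math


def neg_allele_freq_from_males(male_pos_frac):
    """Xg(a-) allele frequency q from the fraction of Xg(a+) men.

    Assumes men are hemizygous for the relevant X-linked allele.
    """
    if not 0.0 <= male_pos_frac <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return 1.0 - male_pos_frac


def neg_allele_freq_from_females(female_pos_frac):
    """Xg(a-) allele frequency q from the fraction of Xg(a+) women.

    Assumes Hardy-Weinberg proportions: Xg(a-) women have frequency q**2.
    """
    if not 0.0 <= female_pos_frac <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return math.sqrt(1.0 - female_pos_frac)


def expected_female_pos_frac(q):
    """Expected fraction of Xg(a+) women for a given Xg(a-) allele frequency q."""
    return 1.0 - q ** 2


# Approximate figures quoted in the Introduction: 66% of men, 90% of women.
q_men = neg_allele_freq_from_males(0.66)      # about 0.34
q_women = neg_allele_freq_from_females(0.90)  # about 0.32
print(round(q_men, 3), round(q_women, 3), round(expected_female_pos_frac(q_men), 3))
```

With the approximate figures from the Introduction, the male-derived estimate (about 0.34) predicts roughly 88% Xg(a+) women, close to the quoted 90%, which is the kind of internal consistency expected of an X-linked trait.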
Phenotype/genotype correlation was performed on blood samples from 158 anonymized blood donors. An electrophoretic mobility shift assay with biotinylated oligonucleotide probes and mass spectrometric analysis of protein pulldowns were performed as described previously,9 and a luciferase reporter assay was run to assess function. Details on experiments are provided in supplemental Methods.
Results and discussion
Among 2612 investigated variants in the XG region, rs311103G/C, located 3709 bp upstream of the erythroid transcription start site (Figure 1A), was identified as the SNP with the strongest correlation to the expected distribution (supplemental Table 2). Furthermore, rs311103 not only showed the best fit to the 1000 Genomes Project super populations15 (Figure 1A) but was also identified as the eQTL with the most significant impact on XG transcript levels in whole blood in the GTEx Portal (normalized effect size, −0.59 for C, the minor allele; P = 2.0 × 10⁻²²; supplemental Table 3).16 This was in stark contrast to the other 47 tissues tested, where no effect of this SNP was noted (supplemental Figure 1). In addition, transcription factor binding site analysis identified disruption of a GATA family–binding motif by the minor allele rs311103C (supplemental Table 4), which lowered the relative binding energy score (Figure 1B). This drop is comparable to the decrease in binding score observed for c.−67T>C in the GATA1-binding site of the ACKR1 promoter, known to cause erythroid silencing of Fyb (FY*02N.01) and resistance to Plasmodium vivax invasion in individuals of African descent.18
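How a single-base change can lower a motif score may be illustrated with a toy position weight matrix. The sketch below is not the JASPAR analysis used in the study: the count matrix is invented for illustration around the generic GATA consensus AGATAA, the two 6-bp sites are hypothetical and are not the genomic sequence flanking rs311103, and the function name is an assumption. It only shows the general principle that replacing the conserved G with C swaps a high log-odds term for a strongly negative one.

```python
import math

# Invented counts for a 6-bp GATA-like site (consensus AGATAA).
# Illustrative only; this is NOT a JASPAR matrix.
TOY_COUNTS = {
    "A": [70, 2, 96, 2, 94, 60],
    "C": [10, 2, 1, 2, 2, 10],
    "G": [10, 94, 1, 2, 2, 20],
    "T": [10, 2, 2, 94, 2, 10],
}


def log_odds_score(site, counts=TOY_COUNTS, background=0.25, pseudocount=1.0):
    """Sum of per-position log2 odds of the motif model versus background."""
    width = len(counts["A"])
    site = site.upper()
    if len(site) != width or any(base not in counts for base in site):
        raise ValueError("site must be %d bases drawn from A, C, G, T" % width)
    score = 0.0
    for i, base in enumerate(site):
        column_total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        prob = (counts[base][i] + pseudocount) / column_total
        score += math.log2(prob / background)
    return score


intact = log_odds_score("AGATAA")     # hypothetical site with the conserved G
disrupted = log_odds_score("ACATAA")  # same site with G replaced by C
print(round(intact, 2), round(disrupted, 2), round(intact - disrupted, 2))
```

In this toy model the two sites differ only at the second position, so the score difference equals the difference between the G and C log-odds terms in that column.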
To test if rs311103 determines Xga phenotype, 158 blood donors anonymized other than for sex were serologically typed for Xga by hemagglutination and flow cytometry. Initially, Sanger sequencing and, subsequently, a TaqMan SNP genotyping assay were used to determine rs311103 genotype, and mRNA analysis was correlated with Xga genotype and antigen expression (Figures 1C-E; supplemental Figure 2).
All female Xg(a−) samples identified (n = 13 [17.6%] of 74) were homozygous for the minor allele (C), whereas all clearly Xg(a+) samples (n = 120) regardless of sex carried at least 1 copy of the major allele (G). A sample that phenotyped Xg(a−) with 2 anti-Xga reagents (but that demonstrated weak reactivity by flow cytometry) was heterozygous, indicating that XG genotyping may overcome the limited sensitivity of serological typing. Of the male Xg(a−) samples identified (n = 24 [28.6%] of 84), all carried at least 1 C allele, but in 11 of these 24 samples (45.8%), it was accompanied by G (Figure 1C). Because the X and Y chromosomes are assumed to be homologous in this region, the G allele in these samples is likely Y chromosome derived. However, attempts to obtain Y chromosome–specific amplicons by long-range polymerase chain reaction (∼40 kb) were unsuccessful. Real-time polymerase chain reaction was used to quantify mRNA transcripts from 59 samples (Figure 1D). Xg(a−) individuals regardless of sex had low to undetectable XG mRNA, suggesting that rs311103C prevents transcription of XG.
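The genotype-phenotype pattern reported in this cohort can be summarized as a small decision rule. The sketch below is an illustration only, not a validated typing algorithm: the function name and return labels are hypothetical, and the rule simply encodes the observations above, including the caveat that a G allele in a man with a G/C result may reside on the Y chromosome and therefore does not predict Xga expression. ASCII hyphens are used in the labels for convenience.

```python
def predict_xga(sex, genotype):
    """Predict Xga phenotype from sex ('F' or 'M') and rs311103 genotype.

    genotype is a 2-letter string of G/C alleles, e.g. 'GG', 'GC', 'CC'.
    Returns 'Xg(a+)', 'Xg(a-)', or 'indeterminate'.
    """
    sex = sex.upper()
    alleles = genotype.upper()
    if sex not in ("F", "M"):
        raise ValueError("sex must be 'F' or 'M'")
    if len(alleles) != 2 or any(a not in "GC" for a in alleles):
        raise ValueError("genotype must be two alleles drawn from G and C")

    g_copies = alleles.count("G")
    if g_copies == 0:
        # Homozygous C: no intact GATA motif on any X chromosome.
        return "Xg(a-)"
    if sex == "F":
        # At least one X chromosome carries the intact motif.
        return "Xg(a+)"
    if g_copies == 2:
        # Both the X and Y copies read G, so the X copy is intact.
        return "Xg(a+)"
    # Male G/C: the G allele may be Y derived, so the X allele is unknown.
    return "indeterminate"


for sex, gt in [("F", "CC"), ("F", "GC"), ("M", "GG"), ("M", "GC"), ("M", "CC")]:
    print(sex, gt, predict_xga(sex, gt))
```

Returning an explicit indeterminate result for men with a G/C genotype, rather than guessing, mirrors the 11 of 24 Xg(a−) men in whom both alleles were detected.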
To determine if the GATA motif in this enhancer region is functional, an electrophoretic mobility shift assay was performed. Strong binding was observed with biotinylated oligonucleotides corresponding to rs311103G but not rs311103C (Figure 2A; supplemental Figure 3). Supershifts were noted after addition of anti-GATA1, and conversely, all binding was inhibited by addition of unlabeled probe. Oligonucleotide probe/nuclear extract complexes were analyzed by liquid chromatography–tandem mass spectrometry, and GATA1 was identified in the complex bound by the wild-type oligonucleotide probe only (supplemental Tables 5 and 6). GATA1 binding at rs311103 was further corroborated by available ChIP-seq data (supplemental Figure 4).19 Finally, we used luciferase reporter assays to show that the intact GATA1-binding motif could drive transcription of a downstream gene (Figure 2B).
Taken together, the in vitro data support the in silico prediction that the Xg(a+) blood group phenotype depends on an intact GATA1-binding motif 3.7 kb upstream of the XG transcription start site. The Xg(a−) phenotype is therefore the consequence of markedly decreased erythroid transcript levels, which in turn follow from disruption of the GATA1 site. Of other possible candidate regulatory SNPs identified, most were linked to rs311103, and none of the top candidates disrupted other potential GATA-binding motifs (supplemental Table 3). The regulatory site XGR, between XG and its neighbor gene, MIC2, was postulated as early as 1987,20 even before Xga expression was linked to XG/PBDX.13 MIC2 (now CD99) encodes CD99, which is also expressed on RBCs and other tissues.21 Interestingly, the expression of CD99 on RBCs correlates with Xga antigen status and sex.22 Although the function of Xg protein is completely unknown, it shows moderate homology with CD99, a widely distributed adhesion molecule involved in leukocyte migration and lymphocyte maturation.23
A vast majority of protein blood groups depend on SNP-based structural changes in the antigen-carrying protein,21 which leads to risk of an immune response when individuals are exposed to foreign antigens during pregnancy, transfusion, or transplantation. Anti-Xga is a relatively unusual blood group specificity, given how often Xg(a−) individuals are transfused with Xg(a+) blood. Even if this were the result of weak antigenicity, our results offer a plausible explanation as to how trace amounts of Xg on RBCs, or Xg expression on nonerythroid cells, may prevent Xga immunization. Strikingly, the major antigens of the 2 last blood group systems to be resolved at the genetic level (P1 and Xga) both turned out to be quantitatively regulated to cause the antigen-negative phenotypes P2 and Xg(a−), respectively.9,10 In fact, the Erythrogene database reveals only 1 nonsynonymous XG SNP (c.178G>A; p.Asp60Asn) with a frequency ≥1% among the 2504 individuals in the 1000 Genomes database.24 Importantly, this SNP does not correlate with Xga status on RBCs (unpublished data).
We have solved a longstanding conundrum in the field of immunohematology and opened up the possibility of predicting Xga status of blood donors and transfusion recipients by rs311103 genotyping. Future studies are required to address the seemingly heterozygous men in whom prediction is hampered by the identical 5′ end of XG on the Y chromosome.
The online version of this article contains a data supplement.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
The authors acknowledge Marion Darlison for providing technical assistance.
This study was supported by the Knut and Alice Wallenberg Foundation (grant #2014.0312) (M.L.O.), the Swedish Research Council (grant #2014-71X-14251) (M.L.O.), and governmental Avtal om Läkarutbildning och Forskning grants (#ALFSKANE-446521) (M.L.O.) to University Healthcare in Skåne, Sweden.
Authorship
Contribution: M.M. performed bioinformatic analyses and interpreted data; Y.Q.L., K.V., S.K., L.B., and J.R.S. performed experiments and interpreted data; M.M., J.R.S., and M.L.O. designed the study; M.M., Y.Q.L., K.V., J.R.S., and M.L.O. wrote the paper; and all authors read, revised, and approved the manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Martin L. Olsson, Division of Hematology and Transfusion Medicine, Department of Laboratory Medicine, Lund University, BMC C14, SE-22184 Lund, Sweden; e-mail: martin_l.olsson@med.lu.se.