Abstract
In this study we analyzed the complete genomic sequences, except intron 1, and 2 regulatory regions of 6 common (ABO*A101, ABO*A201, ABO*B101, ABO*O01, ABO*O02, and ABO*O03) and 18 rare ABO alleles, 3 of which were new. This was done by phylogenetic analysis and by correlating sequence data with the ABO phenotypes. The study revealed multiple polymorphisms in noncoding regions. The intron-based phylogenetic analysis revealed 5 main lineages: ABO*A, ABO*B, ABO*O01, ABO*O02, and ABO*O03. The genomic sequences of most rare ABO alleles differed slightly from those of the common alleles. Singular mutations or hybrid alleles were most common, but a few alleles exhibited a mosaic sequence pattern containing multiple exon and/or intron motifs from other ABO lineages. Thus, both an accumulation of mutations and an assortment of the mutations by recombination seem to be responsible for the ABO gene diversity. The prevalence of replacement mutations indicates positive selection for allelic diversity. Phenotype-genotype correlation showed that sequence variations within the complete coding sequence can affect A- and B-antigen expression. All variant ABO*A/B alleles and one new ABO*O03-like allele were associated with weak ABO phenotypes. These findings suggest that a comprehensive coding sequence database is required for sequence-based phenotype prediction.
Introduction
The ABO blood group is the most important blood group system in transfusion and transplantation medicine. Its antigenic determinants are oligosaccharides located on glycoproteins and glycolipids that are expressed on erythrocytes and tissue cells and that occur in various body fluids and secretions. Depending on an individual's ABO blood type, immunoglobulin M (IgM) antibodies directed against the missing A and/or B antigens are regularly present in serum; they constitute an immunologic barrier against incompatible blood transfusion and organ transplantation. The ABO gene codes for the glycosyltransferases that transfer specific sugar residues to H substance, resulting in the formation of blood group A and B antigens. This gene maps to chromosome 9, position 9q34.1-q34.2. It consists of 7 exons, ranging in size from 28 to 688 base pairs (bp), and 6 introns with 554 to 12 982 bp (Figure 1).1-3 The last 2 exons (6 and 7), which comprise 823 of 1062 bp of the transcribed mRNA, encode the catalytic domain of ABO glycosyltransferases.
The 6 common ABO alleles in white individuals are ABO*A101 (A1), ABO*A201 (A2), ABO*B101 (B1), ABO*O01 (O1), ABO*O02 (O1v), and ABO*O03 (O2). In exons 6 and 7 they differ by only a few base positions. ABO*A201, which is responsible for blood group A2, is identical to ABO*A101 apart from a nonsynonymous substitution at nucleotide (nt) position 467 and a single deletion (1060delC) in exon 7. This deletion results in disruption of the stop codon and an A-transferase product with 21 extra amino acid (AA) residues at the C-terminus. ABO*B101 is distinguishable from ABO*A101 at 7 nt positions: 3 synonymous mutations at positions 297, 657, and 930; and 4 nonsynonymous mutations at positions 526, 703, 796, and 803. The nt sequence of ABO*O01 differs from that of ABO*A101 by a single base deletion at position 261 in exon 6; this deletion shifts the reading frame, thus generating a premature stop codon. ABO*O01 is thought to be either silent or translated into a truncated and catalytically inactive peptide. In contrast, the ABO*O03 allele lacks the 261delG polymorphism but possesses nonsynonymous mutations that may abolish the protein's enzyme activity by altering the nt sugar binding site.
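The exon 6 and 7 differences listed above can be written down as a small lookup table. The following Python sketch is illustrative only: it records just the differences from ABO*A101 that are named in this paragraph, and the labels such as "pos467" are a convention of the sketch (the base changes at those positions are not given here), not part of any official nomenclature.

```python
# Differences from ABO*A101 in exons 6 and 7, as named in the paragraph above.
# Positions whose base change is not stated in the text are stored as bare
# position labels ("pos467" etc.); these labels are hypothetical.
REFERENCE_ALLELE = "ABO*A101"

ALLELE_DIFFERENCES = {
    "ABO*A101": frozenset(),
    "ABO*A201": frozenset({"pos467", "1060delC"}),
    "ABO*B101": frozenset({
        "pos297", "pos657", "pos930",            # synonymous
        "pos526", "pos703", "pos796", "pos803",  # nonsynonymous
    }),
    "ABO*O01": frozenset({"261delG"}),
}


def match_allele(observed_differences):
    """Return the alleles whose listed differences equal the observed set."""
    observed = frozenset(observed_differences)
    return [
        name
        for name, differences in ALLELE_DIFFERENCES.items()
        if differences == observed
    ]


if __name__ == "__main__":
    print(match_allele({"261delG"}))
    print(match_allele({"pos467", "1060delC"}))
```

An exact-match lookup of this kind cannot recognize rare or hybrid alleles; it only restates the definitions of the common alleles given in the text.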
Eighty-three ABO alleles discriminated at 52 polymorphic sites within the coding region of the ABO gene have been reported in the literature so far.4-10 In most cases the investigators analyzed only exons 6 and 7. The number of described ABO alleles increases to 88 when nt differences within intron 6 are also considered. It has been shown that studies of the nt sequence of intron 6 are crucial for elucidation of the origin of some novel haplotypes.11-13 To our knowledge, there is no information available on sequence variation of the noncoding regions upstream from exon 6 and little data on mutations within the first 5 exons of the ABO gene and their relevance for the ABO phenotypes.14 In the present study, we therefore examined the complete exon/intron sequences (except for the huge intron 1 comprising 12 982 bp) and 2 regulatory regions of common and rare ABO alleles to evaluate the genetic diversity and diversification at the ABO locus. The genomic sequence data were first correlated with the associated ABO phenotypes then used for lineage definition.
Patients, materials, and methods
Collection of peripheral blood samples
A total of 55 peripheral blood samples were analyzed. Twenty-five blood samples were obtained from unrelated healthy blood donors with common ABO phenotypes, and 30 were obtained from unrelated patients (n = 19) or healthy blood donors (n = 11) with variant ABO blood groups. Blood was obtained with informed consent as approved by the local ethics committee of the Hannover Medical School. Samples from 18 of the latter individuals were already included in previous ABO genotyping studies.6,10 The ethnic origin of the patients/donors was predominantly German (n = 49). The remaining individuals originated from Africa (n = 1), Bosnia (n = 1), India (n = 1), Italy (n = 2), and Turkey (n = 1). Data on the patients' diagnoses were not collected.
Blood group serology
ABO typing of red blood cells (RBCs) was carried out as described previously, and the subgroups were classified according to current recommendations.6,10,15 The following commercial antisera and lectins were used: anti-A (immuClone, monoclonal; Immucor, Roedermark, Germany; Seraclone; Biotest, Dreieich, Germany; BioClone, monoclonal IgM, murine; Ortho-Clinical Diagnostics, Raritan, NJ), anti-B (immuClone, monoclonal; Immucor; BioClone, monoclonal IgM, murine; Ortho-Clinical Diagnostics), anti-AB (Seraclone Anti-AB; Biotest), Ahel from Helix pomatia (Immucor), anti-A1 from Dolichos biflorus (Lectin; Mast Diagnostics, Reinfeld, Germany), and monoclonal anti-H (Mast Diagnostics). All reagents were used according to the manufacturers' instructions. In cases where the subgroup status could not be determined due to the lack of saliva samples, the variant ABO phenotypes were assigned the more general terms Aweak and Bweak.
ABO genotyping of exons 6 and 7
The ABO genotype of all blood samples was first determined by screening exons 6 and 7 using a polymerase chain reaction method with sequence-specific primers (PCR-SSP).10 Twenty-five healthy blood donors with common ABO blood groups and genotypes ABO*A101/ABO*A101 (n = 3), ABO*A201/ABO*A201 (n = 2), ABO*A101/ABO*A201 (n = 2), ABO*B101/ABO*B101 (n = 2), ABO*A101/ABO*B101 (n = 2), ABO*O01/ABO*O01 (n = 4), ABO*O02/ABO*O02 (n = 3), ABO*O01/ABO*O02 (n = 4), and ABO*O02/ABO*O03 (n = 3) were thereby selected.
PCR amplification and sequencing
Strategy. Our aim was to analyze all genomic sequences (except intron 1) and 2 regulatory regions (promoter, from nt –118 to –1; enhancer CBF/NF-Y, a 215- or 344-bp region around nt –3800) of the ABO gene of each individual. First, blood samples homozygous for ABO*A101, ABO*B101, ABO*O01, or ABO*O02 (samples homozygous for the less frequent ABO*O03 allele were not available) were investigated using a generic sequence strategy (templates for sequence analysis were generated by amplifying both ABO alleles simultaneously) to establish an intermediate database. Then, samples heterozygous for common or rare ABO alleles were sequenced, first by generic sequencing then by haplotype-specific sequencing (templates for sequence analysis were prepared using allele-specific primers) of at least 1 of the 2 alleles to define the cis/trans linkage of the polymorphic sites. The design of the allele-specific amplification primers was based on the growing intermediate sequence database. Thus, each allele was sequenced from at least 2 independent PCR products. Most sequences were obtained by forward and reverse sequencing. All amplification and sequencing primers as well as primer pairs used for amplification are listed in Tables 1, 2, 3.
PCR amplification. The PCR mixture (final volume: 20 μL) contained 200 ng genomic DNA, PCR buffer (5 × buffer: 300 mM Tris-HCl, 75 mM ammonium sulfate, 12.5 mM MgCl2, pH 9.0), 200 μM of each dNTP (deoxynucleoside triphosphate), 5.0 pmol of each amplification primer, and 0.3 units Taq DNA polymerase (Platinum Taq; GIBCO BRL, Karlsruhe, Germany). PCR amplification was carried out in a GeneAmp PCR system 9600 (Perkin Elmer, Norwalk, CT). After an initial denaturation step at 94°C for 2 minutes, the samples were subjected to 30 cycles of amplification, consisting of ten 2-temperature cycles (94°C for 10 s and 65°C for 60 s) followed by twenty 3-temperature cycles (94°C for 10 s, 61°C for 50 s, and 72°C for 30 s). The amplification products were separated electrophoretically and visualized on a 2.5% agarose gel prestained with ethidium bromide (0.2 μg/mL). If 2 fragments occurred in the enhancer CBF/NF-Y region, the amplification products were purified from agarose gel for direct sequencing using the QIAquick gel extraction kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions.
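The reaction setup above implies some simple arithmetic, sketched here in Python. Two assumptions are made that the text does not state explicitly: the 5 × buffer is taken to be diluted to 1 × in the 20-μL reaction, and the cycling time counts only the programmed hold times, ignoring temperature ramping. All function and constant names are illustrative.

```python
FINAL_VOLUME_UL = 20.0
BUFFER_STOCK_FOLD = 5  # "5 x buffer"; assumed to be used at 1x final

# Composition of the 5x buffer stock in mM, taken from the text.
BUFFER_STOCK_MM = {"Tris-HCl": 300.0, "ammonium sulfate": 75.0, "MgCl2": 12.5}

PRIMER_PMOL = 5.0  # amount of each amplification primer per reaction

# Cycling program as (temperature in degrees C, hold time in seconds).
INITIAL_DENATURATION = (94, 120)
TWO_STEP_CYCLE = [(94, 10), (65, 60)]             # run 10 times
THREE_STEP_CYCLE = [(94, 10), (61, 50), (72, 30)]  # run 20 times


def buffer_volume_ul():
    """Volume of 5x buffer needed for a 1x final concentration."""
    return FINAL_VOLUME_UL / BUFFER_STOCK_FOLD


def final_buffer_concentrations_mm():
    """Final concentration of each buffer component in mM at 1x."""
    return {name: conc / BUFFER_STOCK_FOLD for name, conc in BUFFER_STOCK_MM.items()}


def primer_concentration_um():
    """pmol per microliter is numerically equal to micromolar."""
    return PRIMER_PMOL / FINAL_VOLUME_UL


def programmed_hold_seconds():
    """Sum of programmed hold times (ramp times are not included)."""
    two_step = sum(seconds for _, seconds in TWO_STEP_CYCLE)
    three_step = sum(seconds for _, seconds in THREE_STEP_CYCLE)
    return INITIAL_DENATURATION[1] + 10 * two_step + 20 * three_step


if __name__ == "__main__":
    print(buffer_volume_ul(), "uL of 5x buffer")
    print(final_buffer_concentrations_mm())
    print(primer_concentration_um(), "uM of each primer")
    print(programmed_hold_seconds() / 60, "min of programmed holds")
```

Under these assumptions the final reaction would contain 60 mM Tris-HCl, 15 mM ammonium sulfate, and 2.5 mM MgCl2, with each primer at 0.25 μM.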
Sequencing of PCR products. Sequencing was carried out by dye terminator cycle sequencing chemistry (DYEnamic ET terminators, Amersham Pharmacia Biotech, Freiburg, Germany) as described previously.5,9 Briefly, the sequencing reactions were performed in a 20-μL sample volume using 4 μL dye terminator mix containing fluorescence-labeled dideoxynucleoside triphosphate (ddNTPs), dNTPs, Thermo Sequenase II DNA polymerase, and 3 pmol biotinylated sequencing primer. This mixture was subjected to twenty-five 3-temperature cycles (96°C for 10 s, 50°C for 5 s, and 60°C for 4 min). For purification, the biotinylated products were attached to streptavidin-coated paramagnetic beads (Dynal, Oslo, Norway) according to the manufacturer's instructions. Samples were washed once with 60 μL of 70% ethanol and air dried. The single-stranded sequencing fragments attached to the beads were then resuspended in 4 μL loading dye (Amersham Pharmacia Biotech). The mixture was heated at 90°C for 2 minutes, and 1.5 μL was loaded onto a 0.2-mm–thick layer of 5% polyacrylamide-7M urea gel. Electrophoresis was run at constant 2700 V for 4 hours on an ABI 377 sequencer (PE Biosystem, Foster City, CA).
Phylogenetic analysis
The intron sequences of the 110 ABO alleles examined in this study were phylogenetically analyzed. The sequences were aligned using CLUSTAL W software (European Molecular Biology Laboratory, Heidelberg, Germany) run on the graphic interface program BioEdit version 5.0.9 (North Carolina State University, NC).16,17 Phylogenetic trees showing the relationships between the alleles were constructed according to the neighbor-joining method using Treecon version 1.3b (Ghent University, Ghent, Belgium).18,19 The trees were left unrooted so that their topology represents the relative genetic distances between the ABO alleles.
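For readers unfamiliar with the neighbor-joining method, the following Python sketch shows its core step on a precomputed distance matrix. It is not the Treecon implementation. The distance measure used here (the simple proportion of differing sites, with gapped columns skipped) is an assumption, since the distance model is not specified in the text, and all function names are illustrative.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites in two aligned sequences (gaps skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Build a symmetric distance table from {name: aligned sequence}."""
    names = list(alignment)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = p_distance(alignment[a], alignment[b])
    return dist


def first_nj_join(dist):
    """Pick the first pair joined by neighbor joining.

    Returns (i, j, branch_i, branch_j), where the branch lengths connect
    i and j to their new internal node.
    """
    names = list(dist)
    n = len(names)
    if n < 3:
        raise ValueError("neighbor joining needs at least 3 taxa")
    totals = {i: sum(dist[i][k] for k in names) for i in names}

    def q_value(pair):
        i, j = pair
        return (n - 2) * dist[i][j] - totals[i] - totals[j]

    # The pair with the smallest Q value is joined first.
    i, j = min(combinations(names, 2), key=q_value)
    branch_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    branch_j = dist[i][j] - branch_i
    return i, j, branch_i, branch_j


if __name__ == "__main__":
    toy_alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACTTACGA", "s4": "GCTTAC-A"}
    print(first_nj_join(distance_matrix(toy_alignment)))
```

A complete tree is obtained by replacing the joined pair with its internal node, recomputing the distances to that node, and repeating until three taxa remain. The toy sequences in the usage example are made up and are not ABO data.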
Classification and nomenclature of ABO alleles
The ABO alleles were named according to the unofficial nomenclature used in the Blood Group Antigen Gene Mutation Database (http://www.bioc.aecom.yu.edu/bgmut/abo.htm).
Results
A total of 24 different ABO alleles were analyzed in this study (Tables 4, 5). The number of alleles for each ABO allele specificity detected ranged from 1 to 21. Six common alleles (ABO*A101, ABO*A201, ABO*B101, ABO*O01, ABO*O02, ABO*O03) and 18 rare ABO alleles, including 3 new alleles (ABO*Aw07, ABO*Aw08, ABO*Bw08), were studied. Most of the ABO alleles analyzed originated from Germans. The rare alleles ABO*Aw06, ABO*Ael03, ABO*Bw07, and ABO*O06 and the common alleles ABO*A201, ABO*B101, ABO*O01, and ABO*O02 were identified in individuals from Germany and in persons from Bosnia, India, Italy, and Turkey, respectively. All rare ABO*A and ABO*B alleles, including the new alleles, were associated with weak A or B phenotypes. The nt positions of the exons are numbered according to the mRNA sequence, starting from the beginning of the coding region. The nt positions of the introns are numbered separately for each intron.
Sequence diversity of the coding region of the ABO gene
A total of 30 nt mutations were found in the coding region of the ABO gene. The mutations consisted of 15 transitions, 10 transversions, 3 single base deletions, and 2 single base insertions, most of which were located within exon 7. Twenty-one of the nt mutations led to alteration of the AA residue of ABO glycosyltransferase (data not shown).
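The distinction between transitions and transversions used in these counts can be made explicit with a short Python sketch. The parser accepts only simple substitutions written in the "592C>T" style used in this article; the function name is illustrative.

```python
import re

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

SUBSTITUTION_PATTERN = re.compile(r"^(\d+)([ACGT])>([ACGT])$")


def classify_substitution(notation):
    """Classify a substitution such as '592C>T' as transition or transversion."""
    match = SUBSTITUTION_PATTERN.match(notation)
    if match is None:
        raise ValueError(f"not a simple substitution: {notation!r}")
    ref, alt = match.group(2), match.group(3)
    if ref == alt:
        raise ValueError("reference and variant base are identical")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    # Purine<->purine or pyrimidine<->pyrimidine is a transition;
    # any purine<->pyrimidine exchange is a transversion.
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for notation in ("592C>T", "1037A>T", "488C>T", "526C>G"):
        print(notation, classify_substitution(notation))
```

The substitutions in the usage example are ones named elsewhere in this article; deletions and insertions such as 261delG are deliberately rejected by the parser because they belong to neither class.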
Three new mutations within the nt sequence of exon 7 were identified in this study (Table 4). A new phenotypically relevant nt mutation was detected in a new ABO*A allele named ABO*Aw07. This allele was found in a blood sample categorized as Aweak and had the same nt sequence as ABO*A201 except for an additional mutation at position 592, where C was replaced by T, causing Arg198Trp substitution. Another new nt mutation, 1037A>T, was found in an ABO*B101-like allele named ABO*Bw08, causing Lys346Met substitution. The patient who carried the new ABO*B allele showed very weak B-antigen expression and had weak reacting anti-B antibodies in his serum. A very interesting new mutation, 488C>T, was found in a blood donor initially genotyped as homozygous for ABO*O03. The predicted AA substitution, Thr163Met, was associated with weak A-antigen expression that was detectable only by adsorption/elution. All 3 new ABO alleles were found in individuals originating from Germany.
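The relation between the nt positions and the AA positions given for the 3 new mutations can be checked with a little arithmetic, sketched below in Python. The reference codons themselves are not given in the text, so the check only asks whether some codon of the reference amino acid, under the standard genetic code, yields the variant amino acid when the stated base is exchanged at the stated position. Function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Only the residues needed for the three new mutations.
THREE_TO_ONE = {"Arg": "R", "Trp": "W", "Lys": "K", "Met": "M", "Thr": "T"}


def codon_of(nt_position):
    """Map a 1-based coding nt position to (codon number, position in codon)."""
    return (nt_position - 1) // 3 + 1, (nt_position - 1) % 3 + 1


def is_consistent(nt_position, ref_base, alt_base, ref_aa, aa_number, alt_aa):
    """True if the nt change can explain the AA change at the given residue."""
    codon_number, offset = codon_of(nt_position)
    if codon_number != aa_number:
        return False
    for codon, aa in CODON_TABLE.items():
        if aa != THREE_TO_ONE[ref_aa] or codon[offset - 1] != ref_base:
            continue
        mutated = codon[: offset - 1] + alt_base + codon[offset:]
        if CODON_TABLE[mutated] == THREE_TO_ONE[alt_aa]:
            return True
    return False


if __name__ == "__main__":
    print(codon_of(592), is_consistent(592, "C", "T", "Arg", 198, "Trp"))
    print(codon_of(1037), is_consistent(1037, "A", "T", "Lys", 346, "Met"))
    print(codon_of(488), is_consistent(488, "C", "T", "Thr", 163, "Met"))
```

For example, nt 592 is the first base of codon 198 because 592 = 197 × 3 + 1, which agrees with the Arg198Trp designation.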
All nt sequence variations detected in the coding region upstream from exon 7, including 4 single point mutations (3 nonsynonymous, 1 synonymous) and a very rare insertion in exon 2, were already known. The 4 point mutations were present in exon 3 (106G>T), exon 4 (188G>A, 189C>T), and exon 5 (220C>T) of ABO*O02-like alleles. Only the 220C>T base substitution could be detected in ABO*O03-like alleles. The insertion, 87-89insG, was identified in exon 2 of an ABO*O allele that was tentatively called ABO*O41. This insertion causes a frame shift that produces a stop codon after nt 165 and premature termination of translation after AA 55. This abnormal ABO*O allele has been described only once before.4 We detected it in a German blood group O donor.
Sequence diversity of noncoding regions of the ABO gene
The noncoding regions of the ABO gene were found to contain 61 polymorphic sites, most of which were located in introns 4 and 6. These polymorphisms consisted of simple point mutations (40 transitions, 15 transversions), one 2-bp deletion, and 5 insertions of up to 13 bp (Table 5). The nt sequences in the promoter region of all common and rare ABO alleles were identical. Allele-specific motifs were found in the enhancer CBF/NF-Y region, as was described previously.5 All non-ABO*A101- and non-ABO*O03-like alleles exhibited a 344-bp fragment containing four 43-bp repetitive motifs. The ABO*O02-like alleles and 51% of the ABO*O01-like alleles also displayed a 41G>C mutation in the first (5′) 43-bp sequence. The ABO*A101- and ABO*O03-like alleles, on the other hand, had a 215-bp enhancer CBF/NF-Y fragment with only a single copy of the 43-bp sequence motif bearing a 41G>A substitution.
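The two enhancer fragment sizes are consistent with a difference of exactly three 43-bp repeat units (344 − 215 = 129 = 3 × 43). The Python sketch below expresses this relation; it assumes that the amplified fragments differ only in the number of repeat copies, which the reported sizes suggest but the text does not state outright. The function name is illustrative.

```python
REPEAT_UNIT_BP = 43
SINGLE_COPY_FRAGMENT_BP = 215  # fragment carrying one copy of the repeat


def enhancer_fragment_bp(repeat_copies):
    """Expected fragment length for a given number of 43-bp repeat copies."""
    if repeat_copies < 1:
        raise ValueError("at least one repeat copy is expected")
    return SINGLE_COPY_FRAGMENT_BP + REPEAT_UNIT_BP * (repeat_copies - 1)


if __name__ == "__main__":
    for copies in (1, 4):
        print(copies, "copies:", enhancer_fragment_bp(copies), "bp")
```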
As can be seen in Table 5, we identified 5 different allelic groups by a number of recurrent mutations, each group consisting of one major ABO allele and a few rare alleles. One ABO*A-, ABO*B-, ABO*O01-, ABO*O02-, and ABO*O03-like allele group, respectively, was distinguished, corresponding to 5 main lineages identified in the phylogenetic sequence analysis: ABO*A, ABO*B, ABO*O01, ABO*O02, and ABO*O03. These lineages were best discriminated by analyzing introns 5 and 6, both of which contained highly conserved lineage-specific sequence motifs (Figure 2). In introns 2 to 4, the ABO*B alleles clustered together with the ABO*A alleles, whereas in intron 6 they clustered with the ABO*O03 group.
Inter- and intra-lineage allelic diversity at the ABO locus
All sequence variations found in this study are summarized in Figure 3. We discriminated 5 different allelic groups, each of which displayed characteristic sequence motifs. The ABO*B allelic group differed from the ABO*A allelic group in that it had a series of additional mutations downstream from exon 5. The only exception was the rare ABO*B(A)03 allele, which lacked the characteristic ABO*B mutation in intron 5 but had a unique point mutation in intron 3. The ABO*O01-like alleles were characterized by 261delG polymorphism in exon 6 and by 9 additional point mutations, predominantly in intron 2. The ABO*O02-like alleles had multiple mutations extending from intron 2 to exon 7, including the A-glycosyltransferase activity abolishing deletion at nt position 261. The 2 ABO*O03 group alleles exhibited a patchwork sequence pattern containing ABO*A-, ABO*B-, ABO*O01-, and ABO*O02-specific exon/intron sequence motifs and additional characteristic mutations (eg, deletion of 2 bp at nt positions 1095 and 1096 in intron 3).
A number of rare alleles differed only slightly from the major ABO alleles by single exon and/or intron mutations (eg, ABO*A201, ABO*B(A)03, ABO*O011, and ABO*O06). By contrast, the 2 rare ABO alleles ABO*Ax08 and ABO*O12 seemed to be hybrid alleles that occur due to single recombination events; they were classified as ABO*A101-ABO*O02 (ABO*Ax08) and as ABO*A101-ABO*O02 or ABO*B101-ABO*O02 (ABO*O12), respectively. Their crossover positions were located between nts 261 and 297 of exon 6 (ABO*Ax08) and between nt 437 of intron 2 and nt 106 of exon 3 (ABO*O12), respectively. The sequence patterns of the rare ABO*O alleles ABO*O39 and ABO*O40 were more complex. Each of these alleles exhibited 2 exon and/or intron nt sequence motifs upstream from intron 5 that were specific for ABO*A and ABO*O01 group alleles, respectively. In 2 very unusual ABO*O alleles, ABO*O21 and ABO*O41, both belonging to the ABO*A group, only singular polymorphisms were detected, including A-transferase activity abolishing mutations in exon 6 (261delG) and exon 2 (88insG), respectively.
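The reasoning used to place a crossover between two markers can be sketched in Python as follows. The marker list in the usage example is hypothetical: only the 2 flanking exon 6 positions and the lineage names come from the text, and the lineage assignments of the other markers are made up for illustration. The function name is likewise illustrative.

```python
def crossover_intervals(markers):
    """Locate lineage switches along a haplotype.

    `markers` is a list of (label, lineage) tuples in 5' to 3' order, where
    lineage is None for markers that do not discriminate between lineages.
    Returns the pairs of adjacent informative markers whose lineages differ.
    """
    informative = [(label, lineage) for label, lineage in markers if lineage is not None]
    return [
        (left, right)
        for left, right in zip(informative, informative[1:])
        if left[1] != right[1]
    ]


if __name__ == "__main__":
    # Hypothetical marker calls for an ABO*Ax08-like hybrid haplotype.
    hybrid = [
        ("intron 5 motif", "ABO*A101"),   # illustrative assignment
        ("exon 6 nt 261", "ABO*A101"),
        ("exon 6 nt 280", None),          # uninformative, made-up marker
        ("exon 6 nt 297", "ABO*O02"),
        ("intron 6 motif", "ABO*O02"),    # illustrative assignment
    ]
    for left, right in crossover_intervals(hybrid):
        print("crossover between", left[0], "and", right[0])
```

A single lineage switch is compatible with a single recombination event, whereas several switches along one haplotype would correspond to the more complex mosaic patterns described for other alleles.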
Discussion
In the present study, we investigated the entire genomic sequence (except for intron 1) and 2 regulatory regions of 110 common and variant ABO haplotypes to identify variations in the noncoding regions and polymorphisms in the coding region possibly responsible for common and variant ABO phenotypes. Only known nt polymorphisms were detected in the coding region outside exon 7. Twenty-one known and 3 new ABO alleles were thereby identified. Each new allele was characterized by the addition of a new single base substitution in exon 7. A 592C>T substitution found on an ABO*A201 background was associated with very weak A-antigen expression (ABO*Aw07). This is the first time that a mutation relevant for ABO subgroup expression was detected between nts 548 and 641 of the coding sequence. This finding suggests that changes in this region can also affect the sugar binding domain of ABO glycosyltransferase. However, the question of whether the new mutation acts alone or in combination with typical ABO*A201 polymorphisms to yield altered A-antigen expression is still unanswered. The second nonsynonymous new mutation (1037A>T) was located near the 3′ end of the gene. The occurrence of this mutation in an ABO*B101-like allele was associated with a diminution of B-antigen expression (ABO*Bw08). This was not surprising since an alteration of B-antigen expression has already been described for a mutation at position 1036 in a variant ABO*B allele.14 The third phenotypically relevant mutation was found in a genotype with ABO*O03-defining mutations (297A>G, 526C>G, 802G>A) in homozygosity. While the mutation at position 802 is known to abolish the enzymatic activity of the ABO glycosyltransferase, the simultaneous presence of a new mutation at position 488 near the 5′ end of exon 7 seemed to reactivate the A-transferase activity. The resulting new ABO*A allele (ABO*Aw08) is the first reported ABO*O-like allele with an additional mutation that seems to be responsible for “reactivation” of A-antigen expression. Although mutations in the 5′ region of exon 7 were thought to be incapable of drastically altering enzymatic activity or specificity, our recent finding of a single base substitution at nt position 502 in 3 individuals with identical weak A phenotypes strongly suggests that changes in the AA structure in this region can have a significant effect on A-transferase activity.6
Some sequence variations in noncoding regions of the ABO gene have already been reported. These include variations in minisatellite repeats in the 5′ untranslated region (UTR) and single-nt polymorphisms in intron 6 and the 3′ UTR.11-13,20 The pattern of these variations was concordant with the major ABO alleles. Moreover, the discovery of ABO allele-specific haplotypes of intron 6 allowed for identification of break points of recombination events and localization of recombination hot spots around intron 6 and exon 7.11,13 However, allelic variations in introns 1 to 5 have not yet been examined. In this study, haplotype-specific sequence analysis of introns 2 to 6 of 24 different common and rare ABO alleles revealed extensive sequence variations in all investigated introns. While only a few of the observed intron mutations were specific for single rare alleles, highly conserved sequence motifs closely resembling the major ABO alleles were found in each intron. Accordingly, our phylogenetic analysis revealed 5 main ABO allele lineages: ABO*A, ABO*B, ABO*O01, ABO*O02, and ABO*O03. This finding is largely consistent with those of Saitou and Yamamoto,21 who phylogenetically analyzed the coding sequences of human and primate ABO alleles. However, our intron data may carry more weight than the exon data because substitutions in introns are presumably less susceptible to selection pressure, so they may reflect the phylogeny of the alleles more reliably.22
The frequency of replacement mutations (73%) indicates the occurrence of positive selection for allelic diversity. Two mechanisms are considered to be responsible for genetic diversity and diversification at the ABO locus: point mutations and genetic recombination (crossovers and gene conversion).9 By extending the sequence analysis to the entire coding region and to 5 of the 6 introns of the ABO gene, we were able to confirm the important role of both mechanisms in generating the extensive sequence variations at the ABO locus. The origin of a number of rare alleles (eg, ABO*Ael03, ABO*Bw07, ABO*O26, ABO*Aw08) can simply be explained by single mutations that occurred on the background of common ABO alleles, whereas other alleles (eg, ABO*Ax08, ABO*O40) arise mainly or solely due to one or more recombination events. Most interestingly, the ABO*O03-like alleles exhibited a mosaic structure containing large exon and/or intron sequence motifs from each of the other 4 allelic groups. Since only a few additional ABO*O03-specific polymorphisms were found, we hypothesize that ABO*O03 is the most recent allelic group and that it arose by recombination of ABO*A101, ABO*B101, ABO*O01, and ABO*O02 lineages. The fact that ABO*O03-like alleles have only been found at low frequencies in various white populations and in blacks but not yet in Chinese and American Indians supports this theory.4 Due to the apparently exceptional role of recombination in generating new alleles at the ABO locus, the lack of group-specific mutations in some rare alleles (eg, 703G>A in exon 7 of ABO*B(A)03) or the sharing of sequence motifs between lineages (eg, 261delG in exon 6 of ABO*O21) is most probably due to an inter-lineage nt sequence transfer rather than backward mutation or convergent evolution, as was previously suggested for the RH and HLA loci.23-25 Given the rarity of variant ABO alleles, the theory that such alleles are intermediate products of common ABO allele formation may be less plausible. Chi sequences and chi-like motifs in the 3′ end of intron 6 have been considered to be an initiator of crossover events.26 However, our findings in rare ABO alleles indicate that intragenic recombination events outside intron 6 may also account for genetic diversity at the ABO locus.
In conclusion, the comprehensive database of ABO sequences provided by this study indicates that there is a high level of recombinant activity in almost all parts of the ABO gene. The presence of a large number of recurrent mutations is characteristic for the considerable diversity of the ABO gene. Phenotype-genotype correlation revealed that an extensive heterogeneity underlies the molecular basis of various alleles that generate serologic ABO subgroups. ABO sequence variations also include phenotypically relevant replacement mutations in exons 2 to 5. Thus, ABO genotyping strategies would have to consider all variations distributed across the entire coding region to achieve safe phenotype prediction. Therefore, ABO genotyping remains mainly reserved as a complement to serology for determination of inherited ABO subgroups and exclusion of ABO*B allele markers in the acquired B phenotype. The data on highly conserved and lineage-specific intron sequence motifs provide a powerful base for elucidating the origin of variant ABO alleles and may prove valuable for anthropologic studies on the origins and movements of populations.
Prepublished online as Blood First Edition Paper, June 26, 2003; DOI 10.1182/blood-2003-03-0955.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 U.S.C. section 1734.
We are grateful to Dr Volker Lenhard (Biotest AG, Dreieich, Germany) for supplying multiple blood samples with rare ABO phenotypes. Part of this work will appear in Anke Kollmann's doctoral thesis.