Abstract
Background. The RHD and RHCE genes direct the expression of highly immunogenic antigens in the Rh system. These genes exhibit many types of genetic variation, including single nucleotide variants (SNVs), insertions and deletions, and larger structural variants (SVs) including RHD-RHCE hybrid alleles. Thus far, the high homology of these genes has confounded DNA-based testing methods, including next generation sequencing, in detecting some types of uncommon and complex variants at the RH locus. We applied BloodSeq, a new targeted next generation sequencing (NGS) method with custom genomic computational tools, to the systematic, unbiased characterization of the RH locus and the study of the causes of C expression.
Methods. The BloodSeq NGS targeted panel captures a 269kb region spanning the RH locus. Captured DNA is multiplexed and sequenced on Illumina HiSeq machines using paired-end 100-bp sequencing; reads are aligned to the human reference genome (hg19) using BWA-MEM, and SNVs are assessed using GATK. An RH-customized read-depth based method detects SVs including hybrid alleles. Four WHO DNAs (NIBSC) were used as references; 1135 samples were selected from a parent study of blood donors self-identified to be of Asian American or Native American descent. Donors were previously tested for D and C by serology and for CcEe by SNVs. RH hybrid alleles predicted by NGS were validated by quantitative multiplex PCR of short fluorescent fragment (QMPSF) analysis. Gene-specific long range PCRs were developed to confirm RHD-RHCE exon 2 hybrid alleles.
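The read-depth idea behind hybrid allele detection can be illustrated with a minimal sketch. It is not the BloodSeq implementation, whose details are not given here. It assumes per-exon mean depths have already been computed for RHD and RHCE, that a reference profile with two copies of each gene is available, and that an RHCE-RHD(2)-RHCE conversion appears as a reciprocal gain of RHD exon 2 depth and loss of RHCE exon 2 depth. The function names, the median normalisation, and the `tol` threshold are all hypothetical.

```python
from statistics import median


def copy_number_by_exon(sample_depth, reference_depth, ploidy=2):
    """Estimate per-exon copy number relative to a two-copy reference profile.

    Both arguments map exon number -> mean read depth (hypothetical inputs).
    """
    # Divide each profile by its own median to remove library-size effects.
    s_med = median(sample_depth.values())
    r_med = median(reference_depth.values())
    return {
        exon: ploidy * (sample_depth[exon] / s_med) / (reference_depth[exon] / r_med)
        for exon in sample_depth
    }


def call_exon2_conversion(rhd_cn, rhce_cn, tol=0.5):
    """Classify an exon 2 conversion from reciprocal depth changes.

    Assumes a baseline of two copies of both genes; tol is an arbitrary
    illustrative tolerance, not a validated threshold.
    """
    gain = rhd_cn[2] - 2   # extra RHD-like exon 2 copies
    loss = 2 - rhce_cn[2]  # missing RHCE-like exon 2 copies
    if gain >= 2 - tol and loss >= 2 - tol:
        return "homozygous"
    if gain >= 1 - tol and loss >= 1 - tol:
        return "heterozygous"
    return "none"


# Toy depths for exons 1-10; values are invented for illustration only.
reference = {exon: 100 for exon in range(1, 11)}
rhd_sample = {**reference, 2: 150}   # one extra RHD-like exon 2 copy
rhce_sample = {**reference, 2: 50}   # one missing RHCE-like exon 2 copy

rhd_cn = copy_number_by_exon(rhd_sample, reference)
rhce_cn = copy_number_by_exon(rhce_sample, reference)
print(call_exon2_conversion(rhd_cn, rhce_cn))
```

A real caller would also need to handle samples that lack RHD entirely and would normalise against control regions outside the RH locus; both are omitted here for brevity.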
Results. Analysis of BloodSeq NGS data in these 1139 DNAs found that all C+ samples (by serology) have RHCE-RHD-RHCE gene conversion events involving RHD exon 2. In total there were 822 (72.17%) samples with these events, including 439 heterozygotes and 383 homozygotes. Predicted breakpoints were largely consistent across individuals, with the most common RHCE-RHD(2)-RHCE allele having a conversion region spanning a ~4.9kb genomic region relative to RHCE. Other sized RHCE-RHD(2)-RHCE conversion events were detected, ranging from 1kb to 6.8kb relative to RHCE. Three multi-exonic RHCE-RHD-RHCE SV events inclusive of RHD exon 2 were also identified. Notably, C+ associated hybrid alleles did not affect exon 1, and an RHCE*02/RHCE*04 (i.e. C+ allele) exon 1 SNV, c.48G>C, showed only moderate specificity (88.6%) in predicting C+ by serology in this minority study population.
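The reported proportion can be reproduced from the counts above, and the specificity figure follows the usual definition TN / (TN + FP). The short sketch below shows both calculations. The helper names are illustrative, and because the underlying true-negative and false-positive counts for c.48G>C are not reported here, the specificity call uses placeholder counts that are purely hypothetical.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def specificity(true_neg, false_pos):
    """Standard specificity: fraction of true negatives correctly called negative."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the Results paragraph.
heterozygotes, homozygotes, total = 439, 383, 1139
carriers = heterozygotes + homozygotes
print(carriers, round(percent(carriers, total), 2))

# Hypothetical counts, used only to show how the function is called.
print(specificity(true_neg=90, false_pos=10))
```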
Gene-specific long range PCR confirmed the RHCE-RHD(2)-RHCE hybrid alleles identified by NGS. Informed by NGS-predicted hybrid breakpoints, Sanger sequencing of cloned gene-specific long range PCR products was used to characterize the common C+ RHCE-RHD(2)-RHCE intron 1 and 2 breakpoints. C+ specific variation was identified at each breakpoint. At the intron 1 breakpoint, there is a previously unreported SNV that is 100% specific for C+ in this study. At the downstream hybrid intron 2 region breakpoint, there is 109 bp of exogenous sequence consistent with the 109 bp intron 2 "RHCE insertion" known to predict C+ expression. In addition to these findings, numerous known and novel SNVs, indels, and SVs were identified in both RHD and RHCE.
Conclusions. This work identifies RHCE-RHD(2)-RHCE hybrid alleles as the primary cause of C+ expression. Variation in the regions affected by these hybrid alleles provides fine mapping information for the minimal genomic region sufficient to confer C+ expression. We further characterized the common RHCE-RHD(2)-RHCE hybrid allele breakpoints and identified a previously unknown C+ diagnostic SNV. This work illuminates causes of C+ expression and provides the foundation for a DNA-based, high resolution Rh blood-typing method for the detection of clinically relevant RH locus genetic variation.
No relevant conflicts of interest to declare.