Key Points
Rare variants causal of recessive hemostasis disorders have clinical consequences in carriers.
Common variants modify these consequences and are one of the reasons for different phenotypic expressivity.
Abstract
Rare genetic diseases affect millions, and identifying causal DNA variants is essential for patient care. Therefore, it is imperative to estimate the effect of each independent variant and improve its pathogenicity classification. Our study of 140 214 unrelated UK Biobank (UKB) participants found that participants carry a median of 7 variants previously reported as pathogenic or likely pathogenic. We focused on 967 diagnostic-grade gene (DGG) variants for rare bleeding, thrombotic, and platelet disorders (BTPDs) observed in 12 367 UKB participants. By association analysis, we estimated effect sizes on platelet count and volume, and odds ratios for bleeding and thrombosis, for a subset of these variants. Variants causal of some autosomal recessive platelet disorders revealed phenotypic consequences in carriers. Loss-of-function variants in MPL, which cause congenital amegakaryocytic thrombocytopenia if biallelic, were unexpectedly associated with increased platelet counts in carriers. We also demonstrated that common variants identified by genome-wide association studies (GWAS) for platelet count or thrombosis risk may influence the penetrance of rare variants in BTPD DGGs for their associated hemostasis disorders. Network-propagation analysis applied to an interactome of 18 410 nodes and 571 917 edges showed that GWAS variants with large effect sizes are enriched in DGGs and their first-order interactors. Finally, we illustrate the modifying effect of polygenic scores for platelet count and thrombosis risk on disease severity in participants carrying rare variants in TUBB1 and in PROC or PROS1, respectively. Our findings demonstrate the power of association analyses using large population datasets to improve pathogenicity classifications of rare variants.
Introduction
More than 9000 rare diseases have been described, affecting over 400 million people worldwide.1 High-throughput sequencing has enabled the resolution of the genetic etiology of over 50% of rare diseases.2 However, identification of pathogenic variants for many suspected inherited diseases, including hemostasis disorders, remains challenging, in part because there are often no reliable metrics to distinguish between loss-of-function and gain-of-function variants (LoF and GoF, respectively).3 Moreover, individuals carry many pathogenic variants without any obvious clinical sequelae, indicating either incomplete penetrance or incorrect variant classification.4-6
To improve classification of candidate variants for rare diseases, most diagnostic laboratories adopt standardized reporting practices in which pathogenicity evidence is considered by a multidisciplinary team (MDT), using knowledge from variant catalogs (eg, ClinVar, Human Gene Mutation Database [HGMD]) and the American College of Medical Genetics and Genomics (ACMG) guidelines.7-12 Variant pathogenicity classification within these catalogs is primarily based on published evidence, which is subject to several important constraints. Firstly, most studies of rare inherited disorders are based on few pedigrees or genetically independent cases.2 Secondly, reliable information on the minor allele frequency (MAF) remains inadequate for many variants, especially for populations of non-European ancestry.2,13-16 Thirdly, genetic admixture remains a significant cause of inflated pathogenicity assignments.17-19 Finally, the predicted consequence of nonsynonymous single nucleotide variants on protein function, inferred from in vitro models or structural studies, may not reflect human physiological processes.20-23
Challenges to reliable variant classification adversely impact reporting in all rare diseases and are well illustrated by hemostasis disorders. Clinical and laboratory phenotypes of these disorders are well characterized, yet systematic variant reporting in large cohorts of patients with bleeding, thrombotic, and platelet disorders (BTPDs), such as the National Institute for Health and Care Research (NIHR) rare disease BioResource, has yielded unequivocal identification of pathogenic or likely pathogenic variants in only ∼50% of cases.24 Initiatives like ClinGen aim to improve the accuracy of pathogenicity assignment through application of refined, disease-specific ACMG/Association for Molecular Pathology (AMP) rules; this immense manual-curation task has so far been completed for 3 BTPD diagnostic-grade genes (DGGs): RUNX1 and the Glanzmann thrombasthenia (GT) genes ITGA2B and ITGB3.25-28
Genome-wide association studies (GWAS) offer an additional approach to understanding the genetic architecture of BTPDs, as effect-size estimates for thousands of variants with MAF ≥0.1% are now available (GWAS-variants hereafter). These GWAS cover complete blood count parameters, such as platelet count and mean platelet volume (MPV), and have also identified hundreds of variants conferring risk of venous thromboembolism (VTE).29-31 However, the use of imputed genotypes reduces the power to estimate the effect sizes of rare variants compared with direct genotyping.15,31-33 With the release of whole exome sequencing (WES) genotypes for UK Biobank (UKB) participants, accurate rare-variant counts became available for use in association studies.15,32,34,35
Using a GWAS-like statistical framework and electronic health record (EHR) data, we estimated the clinical associations of rare variants in DGGs for inherited BTPDs, including over 100 variants for autosomal recessive (AR) platelet disorders, to improve current knowledge about carrier phenotypes.36-43 Moreover, using an interactome of 18 410 proteins and 571 917 interactions, we illustrated how the interplay between rare variants and hundreds of GWAS-variants explains, at least partially, the variable penetrance of rare variants.2,24,44 Ultimately, these findings narrow the distinction between dominant and recessive modes of inheritance (MOI), suggesting that variant effects are additive, and highlight how statistical-genomic approaches can be used to improve variant classification in clinical genetic reporting.45
Methods
Rare disease gene list
We compiled a list of 4849 genes implicated in rare Mendelian diseases, including 93 BTPD DGGs (supplemental Methods, available on the Blood website; supplemental Table 1).
Catalog of pathogenic and likely pathogenic variants
A list of 299 606 previously reported pathogenic and likely pathogenic variants (hereafter referred to as cataloged-variants), including single nucleotide variants and insertions/deletions <50 bp (indels), was compiled from: (1) ClinVar “pathogenic” or “likely pathogenic” variants that are not also classified as “benign” or “likely benign” (in cases of conflicting interpretations); (2) HGMD Pro 2019.4 disease-causing or probable/possible disease-causing variants; and (3) NIHR BioResource (NBR)-curated variants for BTPDs and European Association for Haemophilia and Allied Disorders resources.2,24,46 The NBR-curated variants that mapped to GRCh37 were remapped to GRCh38 using AssemblyConverter (https://www.ensembl.org/Homo_sapiens/Tools/AssemblyConverter, Ensembl v.100). Using the Ensembl Variant Effect Predictor (VEP; ensembl-vep: 100.2), we extracted the impact, transcript effect, and Combined Annotation Dependent Depletion (CADD) score for each variant.47,48
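The three inclusion rules above can be sketched in Python as a set-based filter. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Assertion` record, the identifier format, and the use of the HGMD class tags "DM" and "DM?" to stand for disease-causing and probable/possible disease-causing variants are all assumptions of this example.

```python
from dataclasses import dataclass

# Hypothetical, simplified record; real ClinVar/HGMD exports carry many more fields.
@dataclass(frozen=True)
class Assertion:
    variant_id: str  # illustrative ID, e.g. "chr1-12345-A-G" on GRCh38
    source: str      # "ClinVar", "HGMD" or "NBR"
    label: str       # source-specific classification string

CLINVAR_PATHOGENIC = {"pathogenic", "likely pathogenic"}
CLINVAR_BENIGN = {"benign", "likely benign"}
HGMD_KEEP = {"DM", "DM?"}  # assumption: disease-causing and probable/possible tags

def cataloged_variants(assertions):
    """Return the IDs of variants passing the three inclusion rules."""
    clinvar_labels = {}
    keep = set()
    for a in assertions:
        if a.source == "ClinVar":
            # Collect every ClinVar label first, so conflicts can be detected
            clinvar_labels.setdefault(a.variant_id, set()).add(a.label.lower())
        elif a.source == "HGMD" and a.label in HGMD_KEEP:
            keep.add(a.variant_id)
        elif a.source == "NBR":
            keep.add(a.variant_id)  # curated BTPD variants are taken as-is
    for variant_id, labels in clinvar_labels.items():
        # Rule (1): pathogenic/likely pathogenic and never also benign/likely benign
        if labels & CLINVAR_PATHOGENIC and not labels & CLINVAR_BENIGN:
            keep.add(variant_id)
    return keep
```

Gathering all ClinVar labels per variant before deciding is what lets the sketch drop a variant carrying both a pathogenic and a benign assertion, mirroring rule (1).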
The UKB cohort
UKB is a prospective cohort of 500 000 British individuals aged 40 to 69 years when recruited between 2006 and 2010.49,50 Genotype and EHR data were accessed under UKB application 13745. Variant call data derived from WES of 200 000 participants were downloaded from data-field 23151 (https://biobank.ndph.ox.ac.uk/ukb/field.cgi?id=23151; November 2020) and EHR data from “Category 2002” (March 2021). Quality control and filtering steps of UKB genotypes are described in supplemental Methods. Our analysis relies on extensive manual curation and interpretation of pathogenic or likely pathogenic variants. This work began after the release of WES data for 200 000 participants, limiting the analyses to variants observed in this subset. After checking a subset of variants in the entire cohort, we confirmed that the expanded cohort did not affect variant interpretation or the conclusions of this manuscript.
Curation of variants by MDTs
Three MDTs, each comprising a clinician, geneticist, and bioinformatician, manually curated the rare variants observed in UKB in (1) platelet, (2) thrombosis (PROC/PROS1/SERPINC1), and (3) bleeding/coagulation (VWF/F8/F9) DGGs. MDTs were blinded to EHR data, and review was performed on a variant-by-variant basis. After considering factors such as whether the MAF was compatible with the prevalence of the associated disorder, the MDTs assigned each variant a decision of “accept,” “undecided,” or “reject” as pathogenic or likely pathogenic (supplemental Table 3; supplemental Methods).
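The MAF-versus-prevalence check can be approximated with a simplified upper bound on the frequency of a single causal allele. The sketch below is an assumption-laden illustration, not the MDTs' actual criterion: it assumes Hardy-Weinberg equilibrium, a single allele explaining a stated share of cases, and, for recessive disorders, disease arising only from homozygosity of that allele (compound heterozygosity is ignored). The function names are hypothetical.

```python
import math

def max_credible_af(prevalence, inheritance, penetrance=1.0, allelic_share=1.0):
    """Simplified upper bound on the population frequency of one causal allele.

    Assumes Hardy-Weinberg equilibrium, that this allele explains at most
    `allelic_share` of cases, and (recessive) disease only in homozygotes.
    """
    for value in (prevalence, penetrance, allelic_share):
        if not 0 < value <= 1:
            raise ValueError("parameters must be proportions in (0, 1]")
    # Fraction of the population carrying a disease-causing genotype for this allele
    at_risk = min(1.0, prevalence * allelic_share / penetrance)
    if inheritance == "dominant":
        return at_risk / 2  # rare allele: carrier frequency ~ 2q
    if inheritance == "recessive":
        return math.sqrt(at_risk)  # homozygote frequency q^2
    raise ValueError("inheritance must be 'dominant' or 'recessive'")

def maf_compatible(observed_maf, prevalence, inheritance, **kwargs):
    """True if the observed MAF does not exceed the simplified bound."""
    return observed_maf <= max_credible_af(prevalence, inheritance, **kwargs)
```

Under these assumptions, a fully penetrant recessive disorder with a prevalence of 1 in 1 000 000 caps a single causal allele at a frequency of about 0.001; an observed MAF above the bound argues against pathogenicity or for reduced penetrance.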
Statistical analyses
Single-variant linear and logistic regression analyses were used to calculate effect sizes (in standard deviations [SD]) and odds ratios (ORs) for relevant phenotypes in unrelated UKB participants of European ancestry (n = 131 022). Phenotype and EHR selection are described in supplemental Methods. Analyses of the continuous traits platelet count and MPV were carried out for BTPD variants in platelet disorder genes with at least 5 carriers (ie, participants with the variant under analysis), after adjusting for relevant covariates and genomic principal components, and excluding participants with major blood disorders (supplemental Methods).29 For calculating ORs for bleeding and VTE, BTPD variants in VWF/F8/F9 and PROC/PROS1/SERPINC1, respectively, were included if present in at least 10 carriers (for F8/F9, females only) with at least 1 recorded event. Covariates used in the OR estimates are described in supplemental Methods. Nominal P values are reported, with significance at P < .05. We chose a nominal threshold because this is not a new discovery analysis; instead, we tested previously reported pathogenic and likely pathogenic variants in what is methodologically closer to a replication analysis.
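A single-variant test of this kind can be sketched with NumPy: ordinary least squares on a standardized trait gives β in SD units, and a logistic model fitted by Newton-Raphson gives an OR. This is a minimal sketch on synthetic data, not the study's code; it standardizes the raw trait rather than applying whatever transformation and covariate set the supplemental Methods describe, and the function names are hypothetical.

```python
import numpy as np

def linear_effect_sd(trait, genotype, covariates):
    """Single-variant OLS: effect size in trait SD units and its standard error.

    trait: (n,) raw values; genotype: (n,) allele counts 0/1/2;
    covariates: (n, k), e.g. age, sex and genomic principal components.
    """
    y = (trait - trait.mean()) / trait.std(ddof=1)  # standardize: beta is in SD units
    X = np.column_stack([np.ones_like(y), genotype, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

def logistic_odds_ratio(event, genotype, covariates, n_iter=25):
    """Single-variant logistic regression fitted by Newton-Raphson; returns the OR."""
    X = np.column_stack([np.ones(len(event)), genotype, covariates])
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted event probability
        grad = X.T @ (event - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        w = w + np.linalg.solve(hess, grad)
    return float(np.exp(w[1]))  # OR per allele of the variant

# Synthetic example: 20 000 people, ~1% carriers, two covariates
rng = np.random.default_rng(0)
n = 20_000
g = (rng.random(n) < 0.01).astype(float)
cov = rng.normal(size=(n, 2))
trait = 0.5 * cov[:, 0] - 1.0 * g + rng.normal(size=n)
event = (rng.random(n) < 1 / (1 + np.exp(-(-2 + 1.0 * g + 0.3 * cov[:, 0])))).astype(float)
beta, se = linear_effect_sd(trait, g, cov)
odds_ratio = logistic_odds_ratio(event, g, cov)
```

Because the simulated trait has an SD above 1, the recovered β is somewhat smaller in magnitude than the simulated raw effect of −1.0; expressing effects in SD units is what makes them comparable across traits.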
Interactome and polygenic scores (PGS)
We generated a protein-protein–interaction network by combining STRING (v11.0, score >0.75) with the interactome developed by the Open Targets project (www.opentargets.org; version November 2019),51,52 a human-focused compilation of physical interactions (IntAct; https://www.ebi.ac.uk/intact/home) with causal relationships and pathways (Reactome https://reactome.org/; SIGNOR https://signor.uniroma2.it/).53-55 All nodes were mapped to Ensembl Gene Identifiers, and duplicated edges and self-loops were removed. The network-propagation analysis is described in supplemental Methods. PGS for platelet count was calculated using a previously validated score (PGS Catalog PGP000078); the method to estimate additive effect sizes is explained in supplemental Methods.31
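Merging edge lists with identifier mapping, duplicate-edge and self-loop removal, followed by a propagation step, can be sketched as below. The study's propagation details are in the supplemental Methods and are not reproduced here; this example assumes random walk with restart, one commonly used network-propagation method, and the identifier mapping, restart probability, and function names are hypothetical. The dense adjacency matrix suits only toy graphs; an interactome of 18 410 nodes would call for a sparse representation.

```python
import numpy as np

def build_network(edge_lists, to_ensembl):
    """Merge edge lists from several sources into one undirected gene network.

    edge_lists: iterable of iterables of (protein_a, protein_b) pairs;
    to_ensembl: dict from source identifiers to Ensembl gene IDs (hypothetical).
    """
    edges = set()
    for source in edge_lists:
        for a, b in source:
            ga, gb = to_ensembl.get(a), to_ensembl.get(b)
            if ga is None or gb is None or ga == gb:  # unmapped node or self-loop
                continue
            edges.add(tuple(sorted((ga, gb))))  # canonical order drops duplicates
    nodes = sorted({node for edge in edges for node in edge})
    return nodes, edges

def random_walk_with_restart(nodes, edges, seeds, restart=0.5, tol=1e-10):
    """Steady-state visiting probabilities of a walker that restarts at the seeds."""
    index = {node: i for i, node in enumerate(nodes)}
    adjacency = np.zeros((len(nodes), len(nodes)))
    for a, b in edges:
        adjacency[index[a], index[b]] = adjacency[index[b], index[a]] = 1.0
    # Every node comes from an edge, so no column sum is zero
    transition = adjacency / adjacency.sum(axis=0, keepdims=True)
    seed_set = set(seeds)
    start = np.array([1.0 if node in seed_set else 0.0 for node in nodes])
    if start.sum() == 0:
        raise ValueError("no seed node is present in the network")
    start /= start.sum()
    p = start.copy()
    while True:
        p_next = (1 - restart) * transition @ p + restart * start
        if np.abs(p_next - p).sum() < tol:
            return dict(zip(nodes, p_next))
        p = p_next
```

Seeding the walk with DGG nodes gives every other node a score that decays with network distance from the seeds, which is one way to define the "first-order interactors" versus "periphery" contrast described in the Results.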
Results
UKB participants carry rare pathogenic and likely pathogenic variants
Of our 299 606 cataloged-variants, less than a quarter (n = 65 503) were in both ClinVar and HGMD (Figure 1A-B), highlighting their differing deposition strategies (supplemental Methods). The majority of cataloged-variants (89.6%) had high or moderate VEP-classified impact (Figure 1C) with high CADD scores (median PHRED CADD = 24.9, Figure 1D).48 A total of 82 415 cataloged-variants were observed in at least 1 of the 140 214 unrelated UKB participants; each participant had a median of 7 variants (interquartile range 5-10) (supplemental Figure 1). These 82 415 variants were located in 4150 (85.6%) of the 4849 rare disease genes; they were significantly depleted of high-impact variants (Figure 1C; Fisher exact test, OR = 0.342; P < 2.2 × 10⁻¹⁶) and had lower CADD scores than the 217 191 cataloged-variants not observed in our study population (Figure 1D; Kolmogorov-Smirnov test, D = 0.137, P < 2.2 × 10⁻¹⁶; supplemental Methods).
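The two test statistics quoted above can be computed directly: the sample odds ratio of a 2×2 table (for example, observed versus unobserved cataloged-variants by high versus other impact) and the two-sample Kolmogorov-Smirnov D between two CADD score distributions. This is a minimal pure-Python sketch with illustrative inputs, not study data. P values require the corresponding null distributions (for instance via `scipy.stats.fisher_exact` and `scipy.stats.ks_2samp`); note also that R's `fisher.test` reports a conditional maximum-likelihood OR, which can differ slightly from the sample OR computed here.

```python
from bisect import bisect_right

def sample_odds_ratio(a, b, c, d):
    """Sample OR of the 2x2 table [[a, b], [c, d]], i.e. (a/b) / (c/d)."""
    return (a * d) / (b * c)

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:  # the supremum is reached at one of the observed values
        gap = abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        d = max(d, gap)
    return d
```

An OR below 1 in the first row means that category is depleted relative to the second row, which is how an OR of 0.342 signals depletion of high-impact variants among those observed in UKB.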
We next performed a detailed analysis of the 12 765 cataloged-variants in DGGs for BTPDs, a subset of rare diseases in which we have thoroughly characterized the phenotypic features and genetic architecture (supplemental Figure 2).2,24,56 There was a positive correlation between the number of variants observed in UKB participants and the number of cataloged-variants in BTPD genes (Pearson correlation, estimate = 0.642, P = 2.76 × 10⁻¹¹; Figure 2A-C). Following variant filtering (supplemental Methods), 1465 rare variants in 79 of the 93 BTPD DGGs (hereafter referred to as BTPD variants) were observed in 18 300 (13.1%) of the 140 214 unrelated UKB participants (supplemental Table 3). Similar to our observations for all cataloged-variants, these BTPD variants were depleted of high CADD score variants compared with all 12 765 cataloged-variants in BTPD DGGs (Kolmogorov-Smirnov test, D = 0.153, P < 2.2 × 10⁻¹⁶; supplemental Figure 3).
MDTs considered the pathogenicity likelihood for 967 of the 1465 BTPD variants, comprising all BTPD variants in platelet disorder DGGs (n = 484) and those in F8/F9/VWF (n = 311) and PROC/PROS1/SERPINC1 (n = 172), the most commonly represented DGGs for the coagulation and thrombotic disorders, respectively. These 967 BTPD variants were observed in 12 367 UKB participants (supplemental Figure 2): 12 129 (98.1%) were heterozygous, 205 were males carrying a variant in an X-linked gene (F8, F9, FLNA, WAS), and 33 carried a variant in homozygosity. The MDTs accepted the pathogenic or likely pathogenic label for 67% of BTPD variants. The main reason for rejecting a pathogenic label was a MAF, in any of the main ancestries of UKB participants, that was incompatible with the prevalence of the disorder (supplemental Figures 2 and 4).
Variant effect sizes on platelet count and volume
There was a negative correlation between platelet count and MPV, as expected, in the 3359 UKB carriers of 128 platelet disorder variants included in the association analysis (Figure 3A).29,31 We detected significant associations for 24 variants (P < .05), with platelet count effect sizes ranging from −1.4 to +1.0 SD, equating to a change in platelet count of −83 × 10⁹/L to +59 × 10⁹/L, and MPV effect sizes ranging from −0.8 to +1.7 SD (−0.9 to +1.8 fL) (Figure 3A; supplemental Table 4).
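The SD-to-absolute conversions above amount to a single multiplication by the trait SD. As an illustrative check, the reported pairs imply a platelet count SD of roughly 59 × 10⁹/L and an MPV SD of roughly 1.1 fL; these implied values are back-calculated from the rounded figures in the text, not taken from the study.

```python
def implied_sd(delta_absolute, beta_sd):
    """Trait SD implied by an effect reported both in SD units and in absolute units."""
    return delta_absolute / beta_sd

def to_absolute(beta_sd, trait_sd):
    """Convert an effect size in SD units to the trait's absolute units."""
    return beta_sd * trait_sd

# Rounded pairs reported in the text: (absolute change, effect in SD units)
platelet_sd = [implied_sd(-83, -1.4), implied_sd(59, 1.0)]  # x 10^9/L per SD
mpv_sd = [implied_sd(-0.9, -0.8), implied_sd(1.8, 1.7)]      # fL per SD
print([round(v, 1) for v in platelet_sd])  # [59.3, 59.0]
```

The small spread between the two platelet estimates reflects rounding of the reported β values and counts.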
Eighteen of the 128 analyzed variants were in DGGs implicated in autosomal dominant (AD) thrombocytopenia disorders. Ten had significant effects on platelet count and/or volume in carriers, including 3 variants in TUBB1, 2 in RUNX1, 1 in ETV6, and 1 in MYH9 (Figure 3B). Rare variants in GP1BA and GP1BB cause both AD macrothrombocytopenia and AR Bernard-Soulier syndrome (BSS).36,37,57,58 There were no UKB carriers of BSS-variants in homozygosity or compound heterozygosity. In heterozygotes, the GP1BA premature stop p.Gln196Ter had the largest effect sizes on both platelet count (β = −1.4 SD, P = 8.3 × 10⁻⁵) and volume (β = 1.7 SD, P = 1.0 × 10⁻⁶), equating to an average reduction in count of 82 × 10⁹/L and an increase in MPV of 1.8 fL (Figure 3B; supplemental Figure 5). With 5 or 10 carriers, the detectable effect sizes for platelet count were >0.9 or >0.7 SD, respectively. Therefore, the effect sizes of the remaining 8 variants in AD thrombocytopenia disorder genes were either too modest to be detected or, contrary to their pathogenicity labels, these variants do not significantly affect platelet count or volume (Figure 3B).
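The dependence of the detectable effect size on carrier count can be sketched with a standard large-sample approximation: for a heterozygous rare variant carried by n individuals in a large sample, the standard error of β for a standardized trait is roughly 1/√n, so the smallest detectable |β| shrinks with √n. This is a back-of-the-envelope sketch under that assumption; it need not reproduce the 0.9 and 0.7 SD figures above, which come from the study's own covariate-adjusted analysis.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_effect(n_carriers, alpha=0.05, power=0.5):
    """Approximate smallest |beta| (trait SD units) detectable for a rare variant.

    Assumes heterozygous carriers form a tiny fraction of a large sample and the
    standardized trait has residual SD ~1, so SE(beta) ~ 1 / sqrt(n_carriers).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # 0 when power = 0.5
    return (z_alpha + z_power) / sqrt(n_carriers)

for n in (5, 10, 50):
    print(n, round(min_detectable_effect(n), 2))
```

With power left at 0.5 the bound is simply the significance threshold on the β scale; raising the power to 0.8 enlarges the detectable effect for a given carrier count.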
Interestingly, we also observed significant effects on platelet count and/or volume for 14 variants causal of AR platelet disorders (Figure 3A,C). First, we confirmed our previous finding that carriers of GP9 p.Asn61Ser have a reduced platelet count (Figure 3C).31 Monoallelic variants in GP9 are not currently deemed causal of AD macrothrombocytopenia; however, our association analysis identified variants in all 3 BSS DGGs (GP1BA, GP1BB, GP9) that reduce the count and increase the volume of platelets in carriers. Nevertheless, the impact of these BSS-variants was generally insufficient to diagnose macrothrombocytopenia (supplemental Figure 5).
We performed a similar association analysis for 13 LoF variants in the GT genes (GT-variants), which were carried in heterozygosity by 148 individuals in our study population (no participant had >1 GT-variant).59,60 Because patients with GT typically have normal platelet counts, it was notable that 3 variants in ITGB3 and 2 in ITGA2B had significant effects on platelet count in carriers (β range −0.4 to −1.0 SD), corresponding to average reductions of 26 × 10⁹/L to 56 × 10⁹/L (Figure 3C; supplemental Figure 6). These LoF variants are distinct from the GoF variants in ITGB3 and ITGA2B.61,62
Our analysis also revealed significant effects on platelet count for 5 of 13 monoallelic LoF variants in MPL, which in homozygosity or compound heterozygosity cause congenital amegakaryocytic thrombocytopenia (CAMT), a disorder characterized by profound thrombocytopenia and progression to aplastic anemia (Figure 3C).2,63,64 These 5 CAMT-variants were collectively carried by 274 UKB participants. Unexpectedly, and in sharp contrast to the reduced counts in carriers of some BSS- and GT-variants, we observed increased platelet counts, with effect sizes between 0.4 and 1.0 SD (Figure 3C), equating to an average increase of 22 × 10⁹/L to 57 × 10⁹/L (Figure 3D). This resulted in thrombocytosis (platelet count >450 × 10⁹/L) in 7 carriers. For 4 of these 5 CAMT-variants, the association with increased platelet count was corroborated in an extended analysis of 383 000 UKB participants (supplemental Methods). This analysis also revealed an association between increased platelet count and heterozygosity for an additional 8 CAMT-variants (supplemental Figure 7). Of the 17 CAMT-variants for which we report the outcome of association analysis, 4 were premature stops, 4 were splice-site variants, and 9 were missense variants. Structural data were available for 8 missense variants; assuming that carrying 1 CAMT-variant allows expression of the mutant MPL receptor, we predicted that 5 of the 6 amino acid changes that significantly increased platelet count in carriers have functional consequences (Figure 3E; supplemental Figure 7). None of the CAMT-variants reported here were GoF variants, which in MPL confer an increased risk of myeloproliferative disorders.2,63,64
Subsequently, we replicated the associations of the relevant platelet count variants in the full UKB cohort and observed good agreement between the effect sizes (Pearson R = 0.71, P = 6 × 10⁻⁶; supplemental Figure 8).
The risk of bleeding and VTE due to rare BTPD variants
Bleeding is a more heterogeneous and less standardized phenotype than platelet count and volume measured by complete blood count. Therefore, to assess the association between BTPD variants and bleeding, we used International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) codes to capture hospital episodes associated with bleeding over 23.5 years. We developed an additive ICD-BAT score, indicating the number of bleeding episodes across 19 domains (supplemental Methods; supplemental Table 2; supplemental Figure 9). We first investigated female carriers of F8 or F9 variants, to resolve uncertainty about the extent to which these variants cause abnormal bleeding.65,66 Of the 12 variants amenable to analysis, 1 (F9 p.Arg449Trp) significantly increased the risk of a higher ICD-BAT score (OR = 1.89, P = .04; Figure 4A). For von Willebrand factor (VWF) variants, 1151 male and 1302 female carriers were analyzed together. In contrast with F8/F9, there was no clear directionality in the ORs for bleeding, except for the in-frame indel p.Thr1034del, which was associated with increased bleeding in 21 carriers, none of whom had a second von Willebrand disease (VWD)-variant (OR = 2.17, P = .01; Figure 4B). Ten (47.6%) of the p.Thr1034del carriers presented to the hospital with bleeding; since the MDT review, this variant has been reported to cause AR type 3 VWD.67
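An additive episode-count score of this type can be sketched as below. The real ICD-BAT score uses 19 bleeding domains defined in the supplemental Methods; the three-code mapping here is a hypothetical, illustrative subset, and counting each episode once per domain is an assumption of this example. Code formatting (for example, with or without the dot) may differ in the source data.

```python
from collections import Counter

# Hypothetical, illustrative subset of an ICD-10 -> bleeding-domain mapping;
# the published score defines 19 domains (supplemental Methods).
ICD10_TO_DOMAIN = {
    "R04.0": "epistaxis",
    "K92.2": "gastrointestinal",
    "N92.0": "menorrhagia",
}

def icd_bat_score(episodes, mapping=ICD10_TO_DOMAIN):
    """Additive bleeding score for one participant.

    episodes: iterable of (episode_id, icd10_code) pairs from hospital records.
    Each episode counts at most once per domain (an assumption of this sketch).
    Returns the total score and the per-domain counts.
    """
    hits = {(episode_id, mapping[code]) for episode_id, code in episodes if code in mapping}
    per_domain = Counter(domain for _, domain in hits)
    return sum(per_domain.values()), dict(per_domain)

score, by_domain = icd_bat_score([
    ("e1", "R04.0"), ("e1", "R04.0"),  # duplicate code within one episode
    ("e2", "K92.2"), ("e3", "I10"),    # I10 is not a bleeding code here
    ("e4", "R04.0"),
])
```

Deduplicating on (episode, domain) keeps a single admission that carries the same bleeding code several times from inflating the score.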
We observed a novel bleeding risk in carriers of the HPS6 variants p.Ter776ArgextTer38 (stop-loss) and p.Leu22ArgfsTer33 (frameshift), and of an ANO6 splice-acceptor variant (supplemental Figure 10).60,68 We detected no increased risk of bleeding in carriers of variants for BSS, GT, CAMT, or AD thrombocytopenia disorders.
VTE is a leading cause of death worldwide, with an estimated 25 000 cases annually in the United Kingdom.69 Of the 257 BTPD variants in thrombosis DGGs, 172 (66.9%) were in genes encoding antithrombin (SERPINC1), protein C (PROC), and protein S (PROS1).70 Single-variant analysis of 24 variants in these 3 genes showed an increased risk of deep vein thrombosis for 7 variants (OR = 4.43-17.42, P < .05) and of pulmonary embolism for 1 variant (OR = 4.22, P = .048) in carriers; 4 had ORs >10 (Figure 4C).
The interplay between common and rare variants
Rare variants are embedded in a complex genetic architecture that also affects traits and diseases, as shown by GWAS. Because this architecture may alter the penetrance and effect of rare variants, we explored the interplay between GWAS-variants and rare variants using an interactome of 18 410 nodes and 571 917 edges (“Methods”). This interactome allowed us to evaluate whether common variants exert their effects on clinical traits through the same pathways as rare variants. Platelet count is associated with 658 common GWAS-variants (MAF >0.01), which together explain ∼19% of its variance.31 Summing the weighted allele counts for these variants provided a PGS for platelet count.31,71 A network-propagation analysis with the 93 BTPD DGGs as seed nodes showed that the GWAS-variants with the largest effect sizes (ie, top quartile) are enriched in nodes encoded by the 93 DGGs and their first-order interactors (Figure 5A-C). In contrast, GWAS-variants in genes encoding nodes on the interactome periphery had smaller effect sizes (Figure 5C). A similar observation was made for the 297 GWAS-variants used to calculate the PGS for VTE risk (Figure 5B-C). We conclude that GWAS-variants with large effect sizes for platelet traits and VTE are strongly enriched in BTPD DGGs or their immediate functional interactors.
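The weighted-allele-count sum that defines a PGS can be sketched with NumPy. The validated platelet count score (PGS Catalog PGP000078) supplies its own variant list and weights; the dosages and weights used here are placeholders, and the function names are hypothetical.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """PGS = sum over variants of effect-allele dosage x per-allele effect size.

    dosages: (n_individuals, n_variants) effect-allele counts in [0, 2];
    weights: (n_variants,) per-allele effect sizes from GWAS summary statistics.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("need a 2D dosage matrix with one weight per variant")
    return dosages @ weights

def standardize(scores):
    """Rescale scores to mean 0 and SD 1 within the sample."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std(ddof=1)

# Three individuals, three variants (placeholder values)
pgs = polygenic_score([[0, 1, 2], [2, 0, 0], [1, 1, 1]], [0.10, -0.20, 0.05])
```

Standardizing the score puts it on the same SD scale as the rare-variant effect sizes, which is what allows the two contributions to be added.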
This prompted us to explore the interplay between PGS and rare BTPD variants. Effect sizes of rare platelet gene variants on platelet count (Figure 3A-C) and ORs of rare thrombosis gene variants for VTE risk (Figure 4C) are at least 2 orders of magnitude larger than those observed for GWAS-variants (supplemental Figure 11).72 To explore whether the effects of rare BTPD- and GWAS-variants were additive, we first tested for interaction (ie, synergistic effect) between each of the 128 platelet gene variants included in the association analysis and the platelet PGS. Within the power limitation of this sample size, there were no significant interaction effects, consistent with independent, additive contributions (supplemental Table 4). For the 10 rare variants with the largest platelet count effect sizes, we combined their effects and frequencies with the PGS distribution to calculate the additive PGS contribution required to reduce platelet count below the clinical cutoff for thrombocytopenia (150 × 10⁹/L) (Figure 6A). We estimate that at least 3242 UK individuals have a count <150 × 10⁹/L due to the combination of one of these rare variants plus an unfavorable PGS (supplemental Methods). This interplay is illustrated by the TUBB1 premature termination variant p.Cys12LeufsTer23, which reduced platelet count by 1.1 SD in UKB carriers. Carriers with counts >150 × 10⁹/L (n = 9) had favorable PGS values, whereas the one carrier with thrombocytopenia (platelet count, 126 × 10⁹/L) had an unfavorable PGS that lowered the count by 1.29 SD.
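Under the additive model described above, the PGS contribution needed to push a carrier below the thrombocytopenia cutoff follows from simple arithmetic on the SD scale. The sketch below assumes a normally distributed platelet count with a hypothetical population mean, takes the PGS contribution as normally distributed with variance equal to the ∼19% of variance explained by the platelet GWAS-variants, and ignores all other sources of variation; it is not the calculation in the supplemental Methods, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def required_pgs_shift(beta_rare, mean_count, sd_count, cutoff=150.0):
    """PGS contribution (trait SD units) that, added to the rare-variant effect,
    brings the expected platelet count (x 10^9/L) down to the cutoff."""
    return (cutoff - mean_count) / sd_count - beta_rare

def fraction_below_cutoff(beta_rare, mean_count, sd_count, pgs_variance=0.19, cutoff=150.0):
    """Share of carriers whose PGS contribution is at or below the required shift,
    assuming the PGS contribution ~ Normal(0, pgs_variance) and nothing else varies."""
    shift = required_pgs_shift(beta_rare, mean_count, sd_count, cutoff)
    return NormalDist(0.0, sqrt(pgs_variance)).cdf(shift)

# Hypothetical mean of 250 x 10^9/L; SD of 59 x 10^9/L, roughly as implied by the text
shift = required_pgs_shift(-1.1, mean_count=250.0, sd_count=59.0)
share = fraction_below_cutoff(-1.1, mean_count=250.0, sd_count=59.0)
```

With β = −1.1 SD, such a carrier would need a PGS contribution of about −0.59 SD to fall below 150 × 10⁹/L under these assumptions; a larger rare-variant effect lowers the bar and raises the share of carriers who cross it.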
Similarly, we reasoned that, for carriers sharing the same rare variant and hence a constant shared risk of VTE, PGS could improve the prediction of VTE events. We estimated this improvement using a binary classifier and showed increased sensitivity and specificity when including PGS compared with a model based only on rare variants (DeLong test, Z = 11.31, P = 2.2 × 10⁻⁶; Figure 6B).30 The minor improvement in the predictive capacity of the PGS model due to the inclusion of rare variants is expected, because rare variants are not widely shared among individuals, which hampers their use in population-scale prediction. We then explored whether inclusion of the PGS improved interpretation of the clinical impact of the 3 rare variants with the largest ORs for VTE. Among the 46 carriers of these 3 variants (8 with and 38 without a VTE event), those with favorable PGS tended to have fewer events, although this difference was not statistically significant (Fisher exact test, P = .126). When reviewed per variant, the PGS contribution improved the distinction between carriers with and without VTE (Figure 6C). Therefore, individuals carrying rare variants with a large OR who also have an unfavorable PGS appear to have a higher risk of VTE.
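The quantities behind these comparisons can be sketched in pure Python: the ROC AUC of a classifier, computed as the Mann-Whitney probability that a case scores higher than a control, and a two-sided Fisher exact test on a 2×2 table such as favorable versus unfavorable PGS by VTE status. This is a minimal sketch with illustrative inputs; the DeLong test itself, which compares the correlated AUCs of two models fitted on the same individuals, is not implemented.

```python
from math import comb

def auc(scores_cases, scores_controls):
    """ROC AUC as the Mann-Whitney probability that a case outranks a control."""
    wins = 0.0
    for s in scores_cases:
        for t in scores_controls:
            wins += 1.0 if s > t else 0.5 if s == t else 0.0  # ties count half
    return wins / (len(scores_cases) * len(scores_controls))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables no more likely than the observed one (with a float tolerance)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

Summing the probabilities of all tables with the same margins that are no more likely than the observed one follows the convention of R's `fisher.test`; the small relative tolerance guards against floating-point ties.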
Discussion
Rare diseases collectively affect hundreds of millions of people worldwide.73,74 Incorporating genetic testing into clinical care has proved crucial in diagnosing patients with rare diseases, increasing the diagnostic rate to 50.8% for BTPDs.2,16,24,75,76 Genetic testing can inform treatment decisions, identify affected relatives, and influence the reproductive choices of families. Here we describe an approach to support and accelerate the generation of evidence to define the pathogenicity of rare variants in BTPDs. This approach leveraged WES genotypes and linked EHR data from UKB to reduce possible bias inherent in discoveries and pathogenicity classifications derived from extreme clinical cases. To generate evidence about pathogenicity, we have, for the first time, estimated effect sizes and ORs of rare BTPD variants for platelet count and volume, bleeding, and VTE risk in UKB participants of European ancestry. There were sufficient UKB carriers for association analysis for 91 of 3068 (3.0%) cataloged-variants in DGGs for AD BTPDs, including AD thrombocytopenia disorders, VWD, and deficiencies of antithrombin, protein C, and protein S (Figures 3 and 4). Variants accepted by the MDT as pathogenic or likely pathogenic were not enriched among those with significant effect sizes: we observed nonsignificant effects for accepted variants and significant associations for rejected variants (Figures 3 and 4). For example, VWF p.Thr1034del was rejected because of its high MAF in individuals of African ancestry (MAF = 0.015).77 This suggests that combining the results of association analyses with traditional MDT decision approaches can assist with pathogenicity classification during clinical-variant reporting.
We also systematically explored the phenotypic consequences of carrying a single BTPD variant for disorders with a recessive MOI. We showed a significant increase in bleeding in female carriers of the F9 variant p.Arg449Trp (Figure 4). Our association analysis provides compelling evidence that LoF variants in several AR platelet disorder DGGs are associated with altered platelet count or increased risk of bleeding when monoallelic. We show this both in DGGs with an established mixed MOI (eg, GP1BA, GP1BB) and in those for which a carrier phenotype has not previously been described (eg, ITGA2B, ITGB3). These findings support an additive effect of rare variants in disorders traditionally understood to be recessively inherited, and narrow the distinction between dominant and recessive MOIs.
Unexpectedly, LoF variants in MPL, which cause the AR disorder CAMT, increased platelet counts in UKB carriers, in some cases sufficiently to result in thrombocytosis (Figure 3). One CAMT-variant (p.Arg102Pro) has been reported in heterozygosity in a family with thrombocytosis.78 In the biallelic state, p.Arg102Pro blocks translocation of MPL to the cell surface; therefore, despite high circulating thrombopoietin (TPO) levels, the lack of MPL-TPO signaling suppresses megakaryopoiesis, explaining the profound thrombocytopenia in patients with CAMT.79 However, monoallelic p.Arg102Pro only reduces MPL cell surface expression and TPO clearance; through a negative feedback loop, this increases megakaryocyte proliferation and platelet production.78 Because TPO measurements are unavailable in UKB EHRs, we could not test whether reduced TPO clearance is the mechanism for increased platelet counts in CAMT-variant carriers.
The incomplete penetrance of pathogenic variants is recognized across rare diseases, including BTPDs, and may partially be explained by the modifying effect of common GWAS-variants. The effects of GWAS-variants for platelet traits and VTE are dispersed across hundreds of proteins in the interactome but can be summarized by the PGS value. Our network-propagation analysis showed that GWAS-variants with the largest effect sizes are enriched in the proximity of proteins encoded by the 93 BTPD DGGs (Figure 5). This is consistent with the recently proposed omnigenic model80 and supports the idea that traits, phenotypes, and diseases lie on a continuum regulated by common and rare variants and the interplay between them. We illustrated that the effects of the PGS for platelet count and of rare platelet gene variants are independent and additive in causing thrombocytopenia (Figure 6).31,81 For VTE, we observed that incorporating both the PGS for VTE and the ORs for rare variants in thrombosis genes improved predictive models compared with the model using BTPD variants alone (Figure 6). These observations support the interplay between BTPD variants and PGS, and are consistent with similar observations for other complex traits and diseases, such as type 2 diabetes and hemoglobin A1C levels, familial hypercholesterolemia, and some hereditary cancers, such as Lynch syndrome.82,83 Incorporating the effect of the relevant PGS when considering the clinical impact of rare BTPD variants can be readily achieved if whole genome sequencing is used in the diagnostic workup.2,76 There are other possible explanations for altered variant effect sizes, for example interactions between variants in different genes; however, these were beyond the scope of this manuscript.
To summarize, we have reported on effect sizes and ORs of rare pathogenic and likely pathogenic variants considered causal of rare inherited hemostasis disorders. Our analysis further challenges the dogma that rare variants causal of AR disorders are silent, shrinking the distinction between dominant and recessive inheritance. We also demonstrate that PGS for platelet count and VTE risk modify the clinical penetrance of rare BTPD variants causal of AD thrombocytopenia and VTE, respectively. UKB is a representative cohort of the general UK population of sufficient size to estimate rare variant effects in individuals of European ancestry. As non-European population cohorts increase and ancestry-specific PGS become available, studies of this kind can be replicated. Many variants considered in genomics MDTs will be amenable to association analyses. The integration of these results into clinical-variant reporting will assist with the pathogenicity classification of rare variants implicated in hemostasis disorders and other rare diseases.
Acknowledgments
This study uses genotype and phenotype data generated by the National Institute for Health and Care Research (NIHR) BioResource for the rare disease program; these data are available from relevant publications. This research has been conducted using the UK Biobank Resource under application number 13745. The authors gratefully acknowledge the participation of all NIHR BioResource volunteers and thank the NIHR BioResource centre and staff for their contribution. They also thank the National Health Service Blood and Transplant. The authors are grateful to Catherine Snow (Genomics England) and Arina Puzriakova (Genomics England) for their assistance with PanelApp and for sharing their knowledge about rare disease domains, to Jim Crawley (Imperial College, London) for commenting on the discussion and to Ernest Turro (Mount Sinai), Daniel Greene (Mount Sinai) and William Astle (University of Cambridge) for their feedback and knowledge on UK Biobank data. For the AstraZeneca UK Biobank work, the authors thank the participants and investigators in the UK Biobank study who made this work possible (Resource Application Number 26041); the UKB Exome Sequencing Consortium (UKB-ESC) members AbbVie, Alnylam Pharmaceuticals, AstraZeneca, Biogen, Bristol Myers Squibb, Pfizer, Regeneron, and Takeda for funding the generation of the data and Regeneron Genetics Center for completing the sequencing and initial quality control of the exome sequencing data; the AstraZeneca Centre for Genomics Research Analytics and Informatics team for processing and analysis of sequencing data. Support for title page creation and format was provided by AuthorArranger, a tool developed at the National Cancer Institute. Figure 1A and the visual abstract were created with BioRender.com.
Research in the Ouwehand laboratory received funding from the British Heart Foundation (BHF), the International Society on Thrombosis and Haemostasis, the Medical Research Council (MRC), National Health Service Blood and Transplant, and the NIHR. During his PhD, L. Stefanucci was supported by a BHF grant from the Cambridge BHF Centre of Research Excellence (RE/18/1/34212); J.C. and M.C.S. were awarded MRC Clinical Research Training Fellowships (MR/P02002X/1 and MR/R002363/1, respectively). K. Freson was supported by the Research Council of the University of Leuven (Special Research Fund [BOF] KU Leuven, Belgium, C14/19/096) and by an unrestricted grant from Sobi. L.P., H.H., K.P., and P.P. received funding from European Molecular Biology Laboratory core funding, Open Targets (grant agreements OTAR-044, OTAR02-048, and OTAR02-066), and Wellcome Trust grant 212925/Z/18/Z. P.B. is supported by the Helmut Horten Stiftung and the ETH Zurich Foundation. W.H.O. is a senior investigator of the NIHR. D.V. is a member of the Health Protection Research Unit in Chemical and Radiation Threats and Hazards, a partnership between Public Health England and Imperial College London, which is funded by the NIHR. M.F. is supported by the BHF (FS/18/53/33863) and by the National Institute for Health and Care Research Exeter Biomedical Research Centre. E.B. is supported by National Human Genome Research Institute grant U24HG003345 and Wellcome Trust grant 208349/Z/17/Z. T.J.C. was supported by funding from the National Library of Medicine (NLM T15LM009451 and T15LM007079).
The views expressed are those of the author(s) and not necessarily those of the funders, NIHR or the Department of Health and Social Care.
Authorship
Contribution: L. Stefanucci, J.C., and M.C.S. wrote the manuscript, analyzed data, participated in multidisciplinary teams (MDTs), and oversaw the analysis pipeline; I.B.-H., L. Sun, O.S.B., L.P., I.B.-H., N.G., T.J.C., R.A.L., P.B., J.S., and P.N.R. analyzed data and oversaw the analysis pipeline; I.B. and K. Fleming analyzed data; J.A.G., H.H., K.P., S.P., P.P., Q.W., K.C., X.W., E.D.A., and M.J.M. oversaw the analysis pipeline; M.F. reviewed the manuscript; K.G., M.L., and K.D. participated in MDTs; A.D.M. and K. Freson reviewed the manuscript and participated in MDTs; W.H.O. conceptualized the study and reviewed the manuscript; K.M. administered the project, curated data, wrote the manuscript, and participated in MDTs; E.B. wrote the manuscript, conceptualized the study, administered the project, and curated data; and D.V. wrote the manuscript, analyzed data, and participated in MDTs.
Conflict-of-interest disclosure: O.S.B., Q.W., K.C., P.P., K.M., and S.P. are current AstraZeneca employees and/or stockholders. L. Sun is a full-time employee at Regeneron Genetics Center, LLC. The remaining authors declare no competing financial interests.
The current affiliation for P.P. is R&D Data Office, Data Science and Artificial Intelligence, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom.
The current affiliation for K.M. is Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom.
A complete list of the members of the NIHR BioResource Consortium appears in the supplemental Material.
Correspondence: Dragana Vuckovic, Department of Epidemiology and Biostatistics, Imperial College London, Praed Street, London W2 1NY, United Kingdom; e-mail: d.vuckovic@imperial.ac.uk.
Author notes
∗L. Stefanucci and J.C. are joint first authors.
†M.C.S., I.B.-H., and L. Sun contributed equally to this study.
‡E.B. and D.V. are joint last authors.
Genotype and phenotype data are accessible at UK Biobank (https://www.ukbiobank.ac.uk/) and require an active project and application.
Data analysis scripts will be shared on reasonable request from the corresponding author, Dragana Vuckovic (d.vuckovic@imperial.ac.uk).
The online version of this article contains a data supplement.
There is a Blood Commentary on this article in this issue.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.