Figure 4.
Burden of deleterious variants by categories of genes and phenotypic severity profiles. (A) The CADD score distribution of variants identified in the study cohort (red) compared with gene-level MSC scores,68 indicating the lower limit of the 90% and 95% confidence intervals for deleterious CADD scores in individual genes (gray violin plots). Vertical dotted lines represent overall CADD scores of 5 and 15 and the violin plot median and interquartile ranges. Note that CADD scores rank the deleteriousness of all 9 billion single nucleotide variants, and millions of small indels and splice site variants, based on machine learning trained on diverse genomic features derived from surrounding sequence context, gene model annotations, evolutionary constraint, epigenetic measurements, and functional predictions.65-67 (B) The respective contributions of the 5 process-level categories in the study cohort are shown for the total number of genes and for all variants, with each box representing the category’s percentage for the row, as indicated by the heat scale. (C) The burden of deleterious (CADD >15) variants in the 5 categories of genes for each predefined phenotypic subcohort. Filled pie-chart regions quantify the number of CADD >15 variants per patient in the category (gray if pooling across all gene categories; red if gene category specific; fully filled if 1 variant per patient). The 10 variants with CADD scores of 1 to 5 (likely benign) and 36 with CADD scores between 5 and 15 (of uncertain significance) are effectively included in the remaining white pie chart regions to enhance clarity. For each set of 6 pie charts, gene category placements are consistent. The upper left pie chart indicates the total number of CADD >15 variants across all categories, and the remaining upper row pie charts focus on hemorrhage (upper middle for coagulation genes, upper right for platelet genes). 
The lower row presents the red cell gene categories (lower left for hemoglobin genes, lower middle for erythrocyte [red cell] membrane genes, and lower right for erythrocyte enzyme genes). The set of 6 gray-lined empty pie charts indicates that there were no patients in the predefined A1 category. (D) Heat maps of the number of variants per patient in each phenotypic subcohort, in which the precipitated and spontaneous H1/H2, T1/T2, and A1/A2 subcohorts have been pooled. Trends highlighted in the text are denoted by a white cross or black star. *P < .0286; **P < .01.