Abstract
Background: Genetic testing is an integral part of modern diagnostics. Genome and exome sequencing by different consortia has revealed novel aberrations relevant to hematologic classification, prognosis or therapy. However, a large number of low-frequency variants was also found in healthy populations. This challenged the distinction between population variation (polymorphism) and disease-associated changes, which had been based on early databases of limited scope. For diagnostic purposes, distinguishing somatic (acquired) variants from rare germline variants allows a move towards personalized genetic characterization, including molecular markers for individual follow-up.
Aim: (1) To present an approach for distinguishing somatic from germline variants by comparison with matched tissue (buccal swabs, nails), and (2) to define diagnostically relevant patterns for variant classification or database use.
Patients and Methods: Variants were initially classified in a three-tier system: (A) Protein-truncating variants (PTV) or changes with strong evidence in the literature (e.g. JAK2V617F) were defined as actionable/disease associated. (B) Criteria for polymorphism were met if population frequencies were available from two sources (1000 Genomes, ExAC). (C) The remaining, critical variants were sequenced in germline DNA (ACMG guidelines, Richards et al., 2015).
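As an illustrative sketch only, the three-tier triage described above could be expressed as follows. The class `Variant`, its field names and the function `classify_variant` are hypothetical and are not taken from the laboratory's workflow; the only logic encoded is the tier order stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record; field names are assumptions for illustration."""
    name: str
    is_ptv: bool = False                      # protein-truncating variant
    strong_literature_evidence: bool = False  # e.g. JAK2 V617F
    in_1000_genomes: bool = False             # population frequency available
    in_exac: bool = False                     # population frequency available


def classify_variant(v: Variant) -> str:
    """Return tier 'A', 'B' or 'C' following the order given in the text."""
    # Tier A: actionable / disease associated
    if v.is_ptv or v.strong_literature_evidence:
        return "A"
    # Tier B: polymorphism, frequencies available from both sources
    if v.in_1000_genomes and v.in_exac:
        return "B"
    # Tier C: critical variant, needs matched germline sequencing
    return "C"


examples = [
    Variant("JAK2 V617F", strong_literature_evidence=True),
    Variant("hypothetical common SNP", in_1000_genomes=True, in_exac=True),
    Variant("hypothetical rare missense", in_exac=True),
]
tiers = [classify_variant(v) for v in examples]
print(tiers)
```

In this sketch a variant listed in only one population database falls into tier C, which mirrors the requirement of two sources for tier B.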
We selected 88 patients with critical variants in peripheral blood (PB, n=29) or bone marrow (BM, n=59) and available DNA from buccal swabs (n=40), nails (n=31) or both (n=17). Samples were received for routine diagnostic assessment (suspected diagnosis: myelodysplastic syndrome or chronic myelomonocytic leukemia [n=56], myeloproliferative neoplasm [n=8], acute myeloid leukemia [n=6] or B cell malignancy [n=18]). From PB or BM, 829 analyses were performed by Sanger, 454 (Roche, Branford, CT) or Illumina (San Diego, CA) sequencing (1-49 genes per patient, median 6).
Results: In 88 patients, we identified 74 actionable/disease associated changes, 67 polymorphisms and one or two variants per patient (n=96 in total) that could not be assigned to either of these categories and therefore required matched germline DNA sequencing. We found that 35% (34/96) of these variants were also present in the germline, although they were not listed in common polymorphism databases. Consequently, these variants do not qualify as markers for clonality and follow-up.
Of note, 19% of nails and 14% of all buccal swabs received in our laboratory were not analyzable due to low DNA amounts (not included in cohort). Importantly, DNA from both sources can contain low levels of somatic mutations.
Next, we compared somatic and germline variants in terms of predicted effects on function, variant burden and population frequency, to identify patterns with relevance to future categorization.
Firstly, significantly more somatic variants (92%, 49/53) than germline variants (61%, 19/31; p<0.001) were predicted as damaging by the PolyPhen algorithm. Most variants (excluding PTVs) were found in TET2 (n=25). Of 11 confirmed somatic variants, 10 were located in conserved domains, while none of the germline variants was located in these domains.
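The comparison of PolyPhen predictions (49/53 somatic versus 19/31 germline variants predicted as damaging) can be checked with a test on the 2x2 table. The abstract does not state which test was used; the sketch below assumes a two-sided Fisher's exact test, implemented with the standard library only. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    # Exact integer weight of each possible table (hypergeometric numerator)
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, col1)


# Rows: somatic, germline; columns: predicted damaging, not damaging
p_value = fisher_exact_two_sided(49, 4, 19, 12)
print(f"two-sided p = {p_value:.5f}")
```

Working with exact integer weights avoids floating-point ties when deciding which tables count as at least as extreme as the observed one.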
Secondly, germline variants had a median burden of 50% (range 40-59%), or 98% in one case, which is the expected result for variants present on one or both alleles, respectively. For somatic variants, burdens between 2% and 100% (median 40%) were observed, reflecting the varying proportion of malignant cells in PB or BM. For comparison, disease associated variants showed a similar distribution: 3-90% (median 40%).
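To illustrate why variant burden alone is only a pattern and not a proof of origin, the observed ranges can be written as a simple rule. This is a sketch under stated assumptions: the 40-59% window is taken from the germline variants reported above, whereas the 90% cut-off for a homozygous-like burden and the function name `burden_pattern` are assumptions made for illustration.

```python
def burden_pattern(burden_percent: float) -> str:
    """Describe which pattern a variant burden (in %) is compatible with.

    The result is a pattern label only; a somatic variant can also show a
    burden in the heterozygous-like window, as the reported range of 2-100%
    for somatic variants makes clear.
    """
    if not 0 <= burden_percent <= 100:
        raise ValueError("burden must be between 0 and 100")
    if 40 <= burden_percent <= 59:   # window observed for germline variants
        return "heterozygous-like"
    if burden_percent >= 90:         # assumed cut-off, not from the abstract
        return "homozygous-like"
    return "outside germline pattern"


patterns = [burden_pattern(b) for b in (50, 98, 12, 40)]
print(patterns)
```

A somatic variant at the reported median burden of 40% would be labeled heterozygous-like by this rule, which is why matched germline sequencing remains necessary in individual cases.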
Thirdly, we compared variants to ExAC data, the largest available set of exonic variants in healthy individuals (over 60,000). Only 14/34 (41%) germline variants were found in the ExAC data (population frequencies <0.1%). However, 3/62 (5%) of our somatic variants also occurred in the ExAC set.
Conclusions: The growing volume of sequencing data has rendered the traditional distinction between polymorphism and mutation outdated. By comparison with DNA from buccal swabs or nails, we showed that somatic and germline variants have different global patterns (e.g. variant burden, predicted function), but a decision in individual cases based on in silico data alone can be misleading. Only sequencing of germline DNA distinguishes somatic from germline variants on a personalized level and, in future studies, allows strategies to define germline variants potentially contributing to tumorigenesis.
Baer: MLL Munich Leukemia Laboratory: Employment. Nadarajah: MLL Munich Leukemia Laboratory: Employment. Haferlach: MLL Munich Leukemia Laboratory: Employment, Equity Ownership. Kern: MLL Munich Leukemia Laboratory: Employment, Equity Ownership. Haferlach: MLL Munich Leukemia Laboratory: Other: Part Owner MLL Munich Leukemia Laboratory.