Validation of the Genetically-Defined DLBCL Subtypes and Generation of a Parsimonious Probabilistic Classifier

Chapuy, Bjoern; Stewart, Chip; Wood, Timothy; Dunford, Andrew; Wienand, Kirsty; Getz, Gad; Shipp, Margaret A.

doi:10.1182/blood-2019-131250

Diffuse large B-cell lymphoma (DLBCL) is a clinically and molecularly heterogeneous disease with recognized transcriptional subtypes associated with normal cells of origin, activated B-cell (ABC) and germinal center B-cell (GCB) tumors. Emerging data suggested that additional heterogeneity existed, prompting us to comprehensively characterize genomic signatures of 304 newly diagnosed DLBCLs from patients treated with state-of-the-art therapy. We integrated recurrent mutations, somatic copy number alterations (SCNAs) and structural variants (SVs) and identified 5 genetically distinct DLBCL clusters (C1-C5 DLBCLs; Chapuy, Stewart, Dunford, et al. Nat Med 2018). Specifically, we identified two genetically distinct ABC subtypes, including favorable-risk C1 DLBCLs with features of extrafollicular origin and alterations also seen in transformed marginal zone lymphomas (NOTCH2 and NF-κB pathway member mutations and BCL6 SVs). Unfavorable-risk C5 ABC DLBCLs harbored frequent 18q/BCL2 copy gain and co-occurring CD79B and MYD88^L265P mutations. We also identified two genetically distinct GCB subtypes, including unfavorable-risk C3 DLBCLs with frequent BCL2 SVs, mutations in chromatin-modifying enzymes (CREBBP, MLL2, EZH2) and BCR/PI3K signaling pathway members (including inactivating PTEN mutations and copy loss). Favorable-risk C4 GCB DLBCLs had frequent mutations in core and linker histones and signaling intermediates (SGK1, BRAF and STAT3). Additionally, we identified an ABC/GCB-independent subtype, C2 DLBCLs, characterized by frequent bi-allelic TP53 inactivation, 9p21.23/CDKN2A copy loss and associated genomic instability reflected in recurrent SCNAs, increased genome doublings and a distinct outcome following induction therapy.

A next step in utilizing the characterized genetic substructure was to confirm it in an independent series and develop a molecular classifier that allows prospective identification of C1-C5 DLBCLs. To this end, we accessed whole exome sequencing, copy number and SV data from a recent cohort of newly diagnosed DLBCLs (39 tumor-normal pairs, 462 tumor-only samples; Schmitz et al. NEJM 2018). All samples were re-analyzed using our mutational and SCNA pipelines and our newly generated tumor-only algorithm (Chapuy, Stewart, Dunford, et al. Nat Med 2018) to avoid batch effects and harmonize the datasets. SVs were used as reported. Purity and ploidy were inferred using ABSOLUTE, and samples with missing data or low purity were removed. For the combined cohort (579 samples), we assessed our previously characterized 158 genetic drivers (Chapuy, Stewart, Dunford, et al. Nat Med 2018) and confirmed equal distribution of their marginal frequencies (R=0.88, p=1.5e-51), excluding batch effects. Next, we applied non-negative matrix factorization (NMF) consensus clustering to the combined dataset (158 genetic drivers vs. 579 tumors) and confirmed the C1-C5 DLBCL genetic clusters. Notably, tumors from both series contributed at comparable frequencies to the respective C1-C5 DLBCLs. We also noted an enrichment of the alternative genetic labels from Schmitz et al. in 3 of our C1-C5 DLBCL subtypes (BN2 in C1 DLBCLs, p<0.0001; EZB in C3 DLBCLs, p<0.0001; MCD in C5 DLBCLs, p<0.0001). These data confirmed the identity of the C1-C5 DLBCL clusters in an independent cohort.
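The marginal-frequency comparison described above can be illustrated with a minimal sketch. It assumes each cohort is encoded as a binary samples-by-drivers matrix and that agreement is summarized with a Pearson correlation; the abstract does not state which correlation statistic was used, and the function names and toy data here are hypothetical.

```python
from math import sqrt


def marginal_frequencies(matrix):
    """Fraction of samples carrying each driver.

    matrix: list of rows (one per sample), each a list of 0/1 flags
    (one per driver). This encoding is an assumption for illustration.
    """
    n_samples = len(matrix)
    n_features = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_samples for j in range(n_features)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy data only: 4 hypothetical drivers in two small hypothetical cohorts.
cohort_a = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
]
cohort_b = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
]

freq_a = marginal_frequencies(cohort_a)
freq_b = marginal_frequencies(cohort_b)
r = pearson_r(freq_a, freq_b)
print(f"correlation of driver frequencies between cohorts: r = {r:.2f}")
```

A high correlation between the per-driver frequencies of two cohorts is consistent with the absence of a strong batch effect, although it does not rule out differences confined to a few drivers.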

Next, we developed a molecular classifier that prospectively identified C1-C5 DLBCLs using a minimum number of easy-to-measure features. The NMF-defined classes of the combined cohort were used as gold-standard training and validation datasets. We tested different models for classification and selected an artificial neural network approach which provides accurate classification of individual samples and well-calibrated confidence metrics. To minimize potential overtraining, we developed a reduced input feature set of the 22 most discriminating features, constructed confidence metrics for each sample and trained an ensemble of Feed-Forward Neural Networks via 10-fold cross validation. With this approach, our classifier had 84% accuracy for the total set and 94% accuracy for the high-confidence samples (70% of all samples).
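How accuracy can be reported both overall and for a high-confidence subset is sketched below. This is not the authors' classifier: no network is trained here, and the confidence definition (the maximum of the class probabilities averaged over ensemble members), the 0.7 threshold, the function names and the toy probabilities are all assumptions made for illustration.

```python
CLASSES = ["C1", "C2", "C3", "C4", "C5"]


def ensemble_predict(member_probs):
    """Average class probabilities over ensemble members.

    member_probs: list of probability vectors, one per ensemble member,
    each ordered as CLASSES. Returns (predicted_label, confidence), where
    confidence is the largest averaged probability (an assumed definition).
    """
    n_members = len(member_probs)
    mean = [sum(p[k] for p in member_probs) / n_members for k in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=lambda k: mean[k])
    return CLASSES[best], mean[best]


def accuracy(pairs):
    """Fraction of (predicted, true) pairs that agree."""
    return sum(pred == truth for pred, truth in pairs) / len(pairs)


def summarize(samples, threshold=0.7):
    """samples: list of (member_probs, true_label) tuples."""
    calls = [(ensemble_predict(probs), truth) for probs, truth in samples]
    all_pairs = [(label, truth) for (label, _), truth in calls]
    confident = [(label, truth) for (label, conf), truth in calls if conf >= threshold]
    return {
        "accuracy_all": accuracy(all_pairs),
        "accuracy_high_conf": accuracy(confident) if confident else None,
        "fraction_high_conf": len(confident) / len(calls),
    }


# Toy data only: four samples scored by a two-member ensemble.
samples = [
    ([[0.9, 0.025, 0.025, 0.025, 0.025], [0.8, 0.05, 0.05, 0.05, 0.05]], "C1"),
    ([[0.1, 0.1, 0.7, 0.05, 0.05], [0.05, 0.05, 0.8, 0.05, 0.05]], "C3"),
    ([[0.45, 0.3, 0.1, 0.1, 0.05], [0.3, 0.4, 0.1, 0.1, 0.1]], "C2"),
    ([[0.05, 0.05, 0.05, 0.05, 0.8], [0.1, 0.05, 0.05, 0.1, 0.7]], "C5"),
]

result = summarize(samples)
print(result)
```

In this toy example the one misclassified sample is also the one with low averaged confidence, which mirrors the pattern the abstract reports: accuracy is higher within the high-confidence subset than across all samples.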

The newly developed parsimonious classifier will allow prospective identification of the independently confirmed C1-C5 DLBCL subtypes in newly diagnosed patients, a necessity for clinical application.

Disclosures

Getz: Pharmacyclics: Research Funding; IBM: Research Funding; MuTect, ABSOLUTE, MutSig and POLYSOLVER: Patents & Royalties: MuTect, ABSOLUTE, MutSig and POLYSOLVER. Shipp: BMS: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Merck & Co.: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; AstraZeneca: Honoraria, Membership on an entity's Board of Directors or advisory committees; Bayer: Research Funding; Gilead Sciences: Honoraria, Membership on an entity's Board of Directors or advisory committees; Takeda Pharmaceuticals: Honoraria, Membership on an entity's Board of Directors or advisory committees.


2019
