Background:
Pediatric acute myeloid leukemia (AML) is a highly heterogeneous malignancy with disparate risk profiles between subtypes. While the advent of single-cell technologies has enabled granular evaluation of disease biology, differentiating malignant cells from their healthy analogs remains challenging. Reliable identification of malignant cells has the potential to enhance transcriptomic characterization of high-risk AML subtypes.
Methods:
We applied Support Vector Machine (SVM) learning to four single-cell RNA sequencing (scRNAseq) datasets (101 bone marrow biopsies: 75 diagnostic, 15 post-treatment, 11 healthy control) containing 204,836 cells to develop a gene set for discriminating malignant from non-malignant cells. AML cells were identified based on canonical gene expression, patient occupancy scoring, clustering relative to healthy bone marrow, and alterations inferred with inferCNV for known copy number variants. AML subtype clusters that persisted at end-of-induction (EOI) and relapse were correlated with clinical outcomes and underwent differential expression analysis to identify high-risk transcriptomic profiles. High-risk subtypes were validated by applying characteristic gene signatures to large bulk RNA sequencing datasets from the Therapeutically Applicable Research to Generate Effective Treatments initiative and the National Center for Biotechnology Information Gene Expression Omnibus.
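As a rough illustration of the classification step, the sketch below trains a linear SVM on simulated expression values and scores it by area under the ROC curve. It is not the authors' pipeline: the abstract does not state the kernel, software, or hyperparameters, so the from-scratch stochastic subgradient solver, the simulated data, and every function name here (`make_cells`, `train_linear_svm`, `decision_scores`, `roc_auc`) are assumptions made for illustration only.

```python
import random

def make_cells(n_cells, n_genes, shift, rng):
    """Simulate expression vectors: unit-variance noise around a per-gene mean."""
    return [[rng.gauss(shift, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=30, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss.

    Labels in y must be +1 (malignant) or -1 (non-malignant).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # The regularization term always shrinks the weights.
            w = [wj - lr * lam * wj for wj in w]
            # The hinge term contributes only when the margin is violated.
            if y[i] * score < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def decision_scores(X, w, b):
    """Signed distance-like score for each cell; larger means more malignant-like."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]

def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(42)
N_GENES = 17  # mirrors the size of the reported gene set; all values are simulated

# Assumption: malignant cells are simulated with a mean shift of 1.0 per gene.
train_X = make_cells(200, N_GENES, 1.0, rng) + make_cells(200, N_GENES, 0.0, rng)
train_y = [1] * 200 + [-1] * 200
test_X = make_cells(100, N_GENES, 1.0, rng) + make_cells(100, N_GENES, 0.0, rng)
test_y = [1] * 100 + [-1] * 100

w, b = train_linear_svm(train_X, train_y)
auc = roc_auc(decision_scores(test_X, w, b), test_y)
print(f"held-out AUC on simulated cells: {auc:.3f}")
```

The AUC depends only on how the decision scores rank the cells, so it can be compared across datasets without choosing a classification threshold. In practice an established SVM library would normally replace the hand-written solver.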
Results:
SVM classification using a 17-gene set (ARMH1, CLEC11A, NREP, AZU1, PRAME, IFITM2, CFD, DUSP6, SCN3A, CD163, SOX4, IFI30, HOXA9, CD44, SRGN, UBE2C, CXCL8) discriminated malignant from non-malignant cells with areas under the curve of 0.84-0.95. Malignant cells grouped into 1-5 clusters of monocyte-like, hematopoietic-stem-cell-like (HSC), and granulocyte-monocyte-progenitor-like (GMP) AML cells. One HSC-like and one monocyte-like AML cluster persisted through treatment, with both a >1.5-fold proportional enrichment at EOI or relapse and a clinical relapse rate of at least 50%. Differential expression analysis (log2FC > 1, adjusted p < 0.05) yielded characteristic gene sets (HSC: 20 genes; Mono: 56 genes). The high-risk HSC subtype showed increased expression of genes associated with chemoresistance (CD69: log2FC = 1.46, adjusted p = 3.48 x 10^-50), angiogenesis (MDK: log2FC = 1.04, adjusted p = 9.25 x 10^-50), and poor survival (BCL3: log2FC = 1.18, adjusted p = 1.00 x 10^-47). Patients with similar transcriptomes in bulk RNAseq had worse overall survival, with hazard ratios of 1.31-3.35 across datasets (Cox PH p-values 4.38 x 10^-2 to 7.20 x 10^-3).
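The two selection rules reported above (cluster persistence and differential expression thresholds) can be sketched as simple filters. This is a minimal illustration under stated assumptions: "proportional enrichment" is taken to mean the ratio of a cluster's cell fraction at EOI or relapse to its fraction at diagnosis, which the abstract does not define explicitly. The function names, the cell counts in the usage lines, and the entries `GENE_X` and `GENE_Y` are hypothetical. The CD69, MDK, and BCL3 values are the ones reported in the text.

```python
def fold_enrichment(n_cluster_dx, n_total_dx, n_cluster_later, n_total_later):
    """Ratio of a cluster's cell fraction at a later timepoint to its fraction at diagnosis.

    Assumed definition of "proportional enrichment"; not confirmed by the source.
    """
    frac_dx = n_cluster_dx / n_total_dx
    frac_later = n_cluster_later / n_total_later
    return frac_later / frac_dx

def is_high_risk(enrichment, n_relapsed, n_patients):
    """Apply both reported criteria: >1.5-fold enrichment and >=50% relapse rate."""
    return enrichment > 1.5 and n_relapsed / n_patients >= 0.5

def filter_de_genes(results, min_log2fc=1.0, max_adj_p=0.05):
    """Keep genes with log2FC > min_log2fc and adjusted p < max_adj_p."""
    return [
        gene
        for gene, (log2fc, adj_p) in results.items()
        if log2fc > min_log2fc and adj_p < max_adj_p
    ]

# gene -> (log2FC, adjusted p-value)
de_results = {
    "CD69": (1.46, 3.48e-50),   # reported in the text
    "MDK": (1.04, 9.25e-50),    # reported in the text
    "BCL3": (1.18, 1.00e-47),   # reported in the text
    "GENE_X": (0.40, 1e-10),    # hypothetical: significant but small effect
    "GENE_Y": (2.10, 0.20),     # hypothetical: large effect but not significant
}

selected = filter_de_genes(de_results)
print(selected)

# Hypothetical counts: a cluster holding 10% of cells at diagnosis and 20% at relapse.
enrichment = fold_enrichment(100, 1000, 200, 1000)
print(is_high_risk(enrichment, n_relapsed=3, n_patients=5))
```

Both thresholds are strict inequalities, matching the ">" and "<" in the text, so a gene with log2FC exactly equal to 1 or a cluster with exactly 1.5-fold enrichment would be excluded.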
Conclusions:
These findings illustrate that malignant AML cells can be differentiated from healthy analogs based solely on gene expression. Application of our 17-gene set revealed two high-risk transcriptomic profiles, with the HSC subtype having particularly poor outcomes. From this foundation, future work will identify malignant transcriptomic changes amenable to established chemotherapeutics.
Disclosures
Bhasin: Anxomics LLC: Current Employment, Current equity holder in private company.