Abstract
INTRODUCTION
The huge amount of data emerging from Next-Generation Sequencing (NGS) projects is bringing a revolution in cancer medicine, leading to the discovery of a large number of new somatic alterations that are associated with the onset and/or progression of cancer. However, researchers are facing a formidable challenge in identifying oncogenic driver genes among the variants generated by NGS experiments.
METHODS
To address this challenge, we developed OncoScore, a bioinformatics text-mining tool that automatically scans the biomedical literature by means of dynamically updatable web queries and measures gene-specific cancer association in terms of gene citations (submitted). OncoScore is distributed as an R Bioconductor package (https://bioconductor.org/packages/release/bioc/html/OncoScore.html) and as a web tool (http://www.galseq.com/oncoscore.html).
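As a rough illustration of citation-based scoring, the sketch below computes the percentage of a gene's citations that also mention cancer. This is an assumption made for illustration only: the abstract does not give the OncoScore formula, and the function name, gene labels and citation counts here are hypothetical.

```python
def cancer_citation_percentage(total_citations: int, cancer_citations: int) -> float:
    """Percentage of a gene's citations that also mention cancer.

    Illustrative stand-in only; NOT the published OncoScore formula.
    """
    if total_citations < 0 or cancer_citations < 0:
        raise ValueError("citation counts must be non-negative")
    if cancer_citations > total_citations:
        raise ValueError("cancer citations cannot exceed total citations")
    if total_citations == 0:
        # A gene that was never cited carries no literature evidence.
        return 0.0
    return 100.0 * cancer_citations / total_citations


# Hypothetical citation counts: gene -> (all citations, citations mentioning cancer)
counts = {"GENE_A": (1200, 900), "GENE_B": (400, 20)}
scores = {gene: cancer_citation_percentage(t, c) for gene, (t, c) in counts.items()}
```

A raw percentage of this kind is unstable for rarely cited genes, so a practical score would need some correction for low citation counts.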
To assess the ability of OncoScore to discriminate between cancer and non-cancer genes, we computed OncoScore values for the whole Cancer Gene Census (CGC; 507 genes) dataset and for a manually curated list of genes not associated with cancer (nCan; 302 genes). The distribution of OncoScore values differed significantly between the two groups (mean: 48.8 and 14.8 for CGC and nCan, respectively; p-value = 2.2e-16). The receiver operating characteristic (ROC) curve and the area under the curve (AUC) confirmed the excellent capability of OncoScore to discriminate cancer genes from non-cancer genes across cut-off values (OncoScore cut-off threshold = 21.09; AUC = 90.3%, 95% CI: 88.1-92.5).
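The ROC analysis above can be sketched in a few lines. The example computes the AUC as the probability that a randomly chosen cancer gene scores higher than a randomly chosen non-cancer gene, and selects a cut-off by maximizing Youden's J. The choice of Youden's J is an assumption, since the abstract does not state how the 21.09 threshold was derived, and the scores below are hypothetical.

```python
def auc(pos, neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_cutoff(pos, neg):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A gene is called positive when its score is >= cutoff.
    """
    best_c, best_j = None, -1.0
    for c in sorted(set(pos) | set(neg)):
        sensitivity = sum(s >= c for s in pos) / len(pos)
        specificity = sum(s < c for s in neg) / len(neg)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


# Hypothetical scores, not taken from the CGC or nCan lists.
cancer_scores = [55.0, 48.0, 30.0, 72.0, 18.0, 40.0]
non_cancer_scores = [5.0, 12.0, 25.0, 9.0, 15.0, 20.0]

area = auc(cancer_scores, non_cancer_scores)
cutoff, youden_j = best_cutoff(cancer_scores, non_cancer_scores)
```

The pairwise AUC is quadratic in the number of genes, which is acceptable for lists of a few hundred genes such as those used here.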
RESULTS
To test OncoScore, we generated whole-exome sequencing data on 33 Chronic Myeloid Leukemia (CML) patients, 23 in Chronic Phase (CP) and 10 in Blast Crisis (BC). To selectively identify somatic mutations, CP samples were filtered against the germline exome data, while BC samples were filtered against the corresponding CP exome (thus filtering out all passenger mutations present at diagnosis). A total of 107 and 34 nonsynonymous somatic variants were identified in CP and BC, respectively. The mean OncoScore value of the BC samples was higher than that of the CP samples (mean OncoScore = 35.6 ± 4.91 SEM vs. 19.2 ± 2.07 SEM; p=0.0007). Manual inspection of the 10 BC genes with the highest OncoScore values highlighted at least 5 genes (ABL1, NRAS, ASXL1, RUNX1, IKZF1) that have been demonstrated to be functionally associated with CML progression, suggesting that OncoScore correctly prioritized genes biologically relevant to CML progression.
The 107 nonsynonymous variants (range 0-11 per patient) identified in CP-CML patients at diagnosis were further analyzed. A positive correlation was observed between the number of mutations and patient age (r=0.4638; p=0.0258), indicating that several events were passenger mutations being expanded by neoplastic transformation. However, when OncoScore was used to weigh the oncogenic potential of each mutation, a significant correlation was observed between the Sokal score and OncoScore (r=0.6815, p=0.0003), suggesting that the identified mutations may have clinical impact. On long-term follow-up (>2 years), 17 CML patients remained on standard imatinib therapy, while 6 were shifted to 2nd/3rd-line therapy because of resistance to imatinib. The 6 imatinib-resistant patients had significantly higher OncoScore values at diagnosis than the 17 long-term imatinib responders (measured as the sum of all mutations with OncoScore values >21: 120±16 SEM vs. 50±15; p=0.0098). In contrast, no significant difference in the Sokal score was observed between the two groups (1.15±0.3 vs. 0.9±0.1, p=0.244). Among the identified genes with high OncoScore values, only ASXL1 was associated with hematological malignancies.
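The per-patient measure used in this comparison, the sum of OncoScore values above 21 across a patient's mutated genes, can be sketched as follows. The function name, patient identifiers and scores are hypothetical and serve only to illustrate the arithmetic.

```python
def oncoscore_burden(mutation_scores, threshold=21.0):
    """Sum the OncoScore values of a patient's mutated genes that exceed the threshold."""
    return sum(score for score in mutation_scores if score > threshold)


# Hypothetical patients: ID -> OncoScore of each mutated gene found at diagnosis
patients = {
    "P1": [10.0, 35.5, 60.0],  # two genes above the threshold
    "P2": [5.0, 18.0],         # no gene above the threshold
    "P3": [],                  # no somatic variants detected
}
burden = {pid: oncoscore_burden(scores) for pid, scores in patients.items()}
```

Summing only the scores above the threshold means that a patient with many low-scoring, likely passenger mutations does not accumulate a high burden.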
CONCLUSIONS
These data indicate that OncoScore is a useful prioritization tool for the annotation of somatic mutations in human cancer. Genes mutated selectively in BC samples had higher OncoScore values than genes mutated in CP at diagnosis (which also include many passenger mutations). In addition, OncoScore can identify some genes mutated in CML patients at diagnosis that could be relevant for response to treatment, although they are not usually linked to hematological malignancies.
Gambacorti-Passerini: Pfizer: Consultancy, Research Funding; Bristol-Myers Squibb: Consultancy.
Author notes
Asterisk with author names denotes non-ASH members.