Background: Transparent and objective reporting of treatment toxicities in cancer clinical trials is critical to inform patient-centred, shared decision-making. Previous studies have shown that toxicity reporting is inconsistent and incomplete in randomized controlled trials (RCTs) presented at major conferences in both hematologic (Chin-Yee et al. 2022; Skorupski et al. 2022) and gastrointestinal malignancies (Yu et al. 2023). The objectives of this study were to validate a natural language processing (NLP)-based algorithm to identify subjectively minimized toxicity language in conference abstracts, and to evaluate longitudinal changes in the prevalence of minimized language in RCTs presented at the American Society of Hematology (ASH) annual meeting.
Methods: For NLP models, data from prior systematic reviews of RCTs presented at ASH 2017-2021 were used as development (RCTs in acute leukemia: Chin-Yee et al. 2022) and validation (multiple myeloma and lymphoma: Skorupski et al. 2022) datasets. Because subjective minimizing language usually exhibits limited variability, we adopted a dictionary-based approach, which is highly interpretable. Two dictionaries were developed: the first to identify subjective minimizing toxicity language; the second to identify reporting of patient experiences through Patient-Reported Outcomes (PROs) or Quality-of-Life (QOL) measures. Primary minimizing terms were defined as: “tolerable”, “manageable”, “acceptable”, and “favorable”; secondary minimizing terms were: “feasible”, “safe”, and “limited” (Chin-Yee et al. 2022). The primary outcome was F1 score (the harmonic mean of precision and recall) for identification of primary minimizing language. Based on F1 score in the development set, we operationalized our dictionary to include 3 primary minimizing terms, “tolerable”, “manageable”, and “acceptable” (including relevant variants), while dropping “favorable” and all secondary minimizing terms. Precision, recall, F1 score, and accuracy were calculated for both dictionaries in each dataset (see Table 1 for definitions). Validated dictionaries for minimizing terms and for PRO/QOL measures were subsequently applied in a systematic review of RCT abstracts at ASH from 2009-2021 (representing the available time period indexed in Embase) across 3 diseases (acute leukemia, myeloma, and lymphoma) to assess changes in use of subjective minimizing language and reporting of PROs or QOL measures over 3 major time periods defined a priori: earliest available, middle, and most recent. Study inclusion/exclusion criteria have been described previously (Chin-Yee et al. 2022).
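The dictionary-based classification and its evaluation metrics can be illustrated with a minimal Python sketch. This is not the study's actual implementation: the stem patterns used to capture "relevant variants", and the function names `flags_minimizing` and `classification_metrics`, are assumptions made for illustration only.

```python
import re

# The three primary minimizing terms retained in the operationalized dictionary.
# ASSUMPTION: matching on word stems approximates "relevant variants"; the
# study's real variant list is not given in the abstract.
# The leading \b prevents matches inside negated forms such as "intolerable"
# or "unacceptable".
MINIMIZING_RE = re.compile(r"\b(?:tolerab|manageab|acceptab)\w*", re.IGNORECASE)


def flags_minimizing(abstract: str) -> bool:
    """Return True if the abstract text contains any dictionary term."""
    return MINIMIZING_RE.search(abstract) is not None


def classification_metrics(y_true, y_pred) -> dict:
    """Compute precision, recall, F1, and accuracy for binary labels.

    y_true: manually assigned labels (True = minimizing language present).
    y_pred: dictionary-assigned labels for the same abstracts.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

In use, `flags_minimizing` would be applied to each abstract and the resulting labels compared against the manual review labels with `classification_metrics`. Stem matching keeps the classifier fully interpretable, since every positive call can be traced to a specific matched term.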
Results: Study characteristics are reported in Table 1A for NLP development and validation sets. Our dictionary-based method showed a precision of 0.90, recall of 0.82, F1 of 0.86, and accuracy of 0.93 in the development set, values considered sufficient for validation of the NLP model. In the validation set, these values were 0.75, 0.75, 0.75, and 0.82, respectively. This NLP model was applied to evaluate RCTs presented at ASH from 2009-2021 across the 3 diseases (acute leukemia, myeloma, and lymphoma). Following abstract screening, inclusion criteria were met in 68 of 411, 82 of 443, and 82 of 581 acute leukemia, myeloma, and lymphoma RCTs, respectively. NLP-assessed subjective minimization was present in 89 (26.4%) of all studies from 2009-2021: in 26 (22.0%) RCTs from 2009-2012, 29 (25.4%) RCTs from 2013-2016, and 34 (32.3%) RCTs from 2017-2021 (Table 1B). Time-series analysis and results on PROs and QOL measures will also be presented at ASH.
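As a consistency check, the reported F1 scores can be recomputed from the reported precision ($P$) and recall ($R$), taking F1 as their harmonic mean. The symbols $P$ and $R$ are introduced here for illustration only.

```latex
F_1 = \frac{2PR}{P + R}

\text{Development set: } F_1 = \frac{2(0.90)(0.82)}{0.90 + 0.82} = \frac{1.476}{1.72} \approx 0.86

\text{Validation set: } F_1 = \frac{2(0.75)(0.75)}{0.75 + 0.75} = \frac{1.125}{1.50} = 0.75
```

Both values agree with the F1 scores reported for the development and validation sets.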
Discussion: NLP provides a novel, systematic, and scalable approach for evaluating use of subjective minimizing language in clinical trials. Our model showed good accuracy in identifying primary minimizing terms in RCTs presented at ASH from 2017-2021. Applying this model across a broader time period (2009-2021) suggests that use of subjectively minimized toxicity language is common in RCTs at ASH, similar across lymphoid and myeloid malignancies, and potentially increasing over time. NLP could provide an effective means of evaluating conference abstracts on a large scale with the ultimate goal of improving the objectivity of toxicity reporting in clinical trials. In conclusion, our findings suggest that the use of minimizing language is common and may be increasing over time, a concerning finding that implies downplaying of treatment toxicity by investigators.
^co-first and #co-last authors
Disclosures
Lyman: Amgen: Research Funding; Beyond Spring: Consultancy; G1 Therapeutics: Consultancy; Partner Therapeutics: Consultancy; Samsung Bioepis: Consultancy; Merck: Consultancy; Jazz: Consultancy; TEVA: Consultancy; Squibb: Consultancy; Sandoz: Consultancy; Seattle Genetics: Consultancy; Fresenius Kabi: Consultancy. Sholzberg: CSL Behring: Research Funding; Pfizer: Honoraria, Research Funding; Octapharma: Honoraria, Research Funding. Kuderer: Astra Zeneca: Consultancy; Janssen: Consultancy; Pfizer: Consultancy; BMS: Consultancy; Beyond Springs: Consultancy; G1 Therapeutics: Consultancy; Sandoz: Consultancy; Seattle Genetics: Consultancy; Fresenius: Consultancy.