Reinforcement Learning to Optimize the Treatment of Multiple Myeloma

Shain, Kenneth H.; Hart, Daniel; Siqueira Silva, Ariosto; Alugubelli, Raghunandanreddy; De Avila, Gabriel; Sudalagunta, Praneeth Reddy; Tungesvik, Alexandre; Kulkarni, Amit; Blancuicett, Carmelo; Dai, Hongyue; Nishihori, Taiga; Brayer, Jason B.; Blue, Brandon; Alsina, Melissa; Baz, Rachid; Dalton, William S.

doi:10.1182/blood-2019-132234

Over the last decade we have witnessed an explosion in the number of therapeutic options available to patients with multiple myeloma (MM). Despite the marked improvements in patient outcomes paralleling these approvals, MM remains an incurable malignancy for the vast majority of patients, following a course of therapeutic successes and failures. As such, there remains a dire need to develop new tools to improve the management of MM patients. A number of groups are leading efforts to combine big data and artificial intelligence (AI) to better inform patient care via precision medicine. At Moffitt, in collaboration with M2Gen/ORIEN (Oncology Research Information Exchange Network), we have begun to accumulate big data in MM. Patients opt in (consent) to the collection of rich clinical data (demographics, staging, risk, and complete disease-course treatment data) and, in the setting of bone marrow biopsy, to the allocation of CD138-selected cells for molecular analysis (whole exome sequencing (WES) and RNA sequencing), as well as peripheral blood mononuclear cells for WES. To date, we have collected over 1000 samples from over 800 individual patients with plasma cell disorders. In the setting of oncology, the ultimate goal of a model will be the selection of ideal treatments. We expect that AI analysis may allow validation of patient response to treatments and enable cohort selection, as real patient cohorts can be selected from those predicted by the model. One approach is to utilize reinforcement learning (RL). In RL, the algorithm attempts to learn which action to take in a defined state, weighing any tradeoffs, to obtain maximal reward. Our initial utilization of RL involved a relatively small cohort of 402 patients with treatment medication data. This encompassed 1692 lines of treatment, with a mean of 4.21 lines of therapy per patient (median of 4 lines per patient), and included 132 combinations of 22 myeloma therapeutics.
The heterogeneity in treatment is highlighted by the fact that no treatment pathways overlap after line 4. Each Q-value in the Q-table is the current reward for an action in a state plus the discounted anticipated future reward for taking that action. Iteration converges on the actual values of the future reward (and can be model-free). The end result is a policy, P(s), that gives the ideal action in state s. There is a near-infinite number of possible states, considering treatment history, age, gene expression profiling (GEP), cytogenetics, comorbidities, staging and others. We presume that the action is most intuitively defined as medication (treatment) only and that the reward should be some form of treatment response. We have begun the iterative process of trying different state and reward functions. Median imputation shows a 5% improvement in response accuracy over listwise deletion, but it throws off practical accuracy in the binary-reward case. We found that the exercise has great potential, but also that there is room for improvement (for example, multiple imputation). We will need to expand covariate analysis. Combinatorics need to be considered when applying machine learning to medium-sized data sets, and model-free machine learning is limited on medium-sized data. As such, combined resources and/or utilization of large networks such as ORIEN will be critical for the successful integration of RL or other AI tools in MM. We also learned that adding variables to the model does not necessarily increase accuracy. Future work will involve continued application of alternate state/reward functions, loosening the iQ-learning framework to allow for better covariate selection for state/reward functions, and improving imputation techniques to include more covariates and provide more certainty in model accuracy. We may also refine the accuracy metric to allow for prediction of bucketed response and temporal disease burden (M-spike vs. time). Updated data on a larger cohort will be presented at the annual meeting.
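The Q-table description above can be illustrated with a minimal sketch of plain tabular Q-learning run over a fixed set of logged transitions. This is not the iQ-learning framework used in the study; the state labels, regimen names, rewards, and the parameters `gamma`, `alpha`, and `sweeps` are all hypothetical and chosen only so the arithmetic is easy to follow.

```python
from collections import defaultdict

# Hypothetical logged transitions: (state, action, reward, next_state).
# States are line-of-therapy labels, actions are placeholder regimen names,
# and reward is a binary response (1.0 = responded). None marks the end
# of follow-up, after which no future reward is anticipated.
TRANSITIONS = [
    ("line1", "regimen_A", 1.0, "line2"),
    ("line1", "regimen_B", 0.0, "line2"),
    ("line2", "regimen_A", 0.0, None),
    ("line2", "regimen_B", 1.0, None),
]


def q_learning(transitions, gamma=0.9, alpha=0.5, sweeps=200):
    """Model-free tabular Q-learning over a fixed batch of transitions."""
    q = defaultdict(float)  # Q-table, keyed by (state, action)
    actions = {}            # actions observed in each state
    for s, a, _, _ in transitions:
        actions.setdefault(s, set()).add(a)

    for _ in range(sweeps):
        for s, a, r, s_next in transitions:
            # Best anticipated future value from the next state (0 if terminal).
            future = max((q[(s_next, b)] for b in actions.get(s_next, ())),
                         default=0.0)
            # Move Q(s, a) toward: current reward + discounted future reward.
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
    return q, actions


def greedy_policy(q, actions):
    """P(s): the action with the highest Q-value in each observed state."""
    return {s: max(sorted(acts), key=lambda a: q[(s, a)])
            for s, acts in actions.items()}


if __name__ == "__main__":
    q, actions = q_learning(TRANSITIONS)
    for key in sorted(q):
        print(key, round(q[key], 3))
    print(greedy_policy(q, actions))
```

In this toy setting the value of `regimen_A` at `line1` approaches 1 + 0.9 × 1 = 1.9, because it earns an immediate response and leads to a state where a responding regimen is still available. With real cohorts the state would need to encode far more than the line of therapy, which is where the combinatorial and sample-size limits noted above apply.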

Disclosures

Shain:Adaptive Biotechnologies: Consultancy; Celgene: Membership on an entity's Board of Directors or advisory committees; Bristol-Myers Squibb: Membership on an entity's Board of Directors or advisory committees; Amgen: Membership on an entity's Board of Directors or advisory committees; Takeda: Membership on an entity's Board of Directors or advisory committees; Sanofi Genzyme: Membership on an entity's Board of Directors or advisory committees; AbbVie: Research Funding; Janssen: Membership on an entity's Board of Directors or advisory committees. Dai:M2Gen: Employment. Nishihori:Novartis: Research Funding; Karyopharm: Research Funding. Brayer:Janssen: Consultancy, Speakers Bureau; BMS: Consultancy, Speakers Bureau. Alsina:Bristol-Myers Squibb: Research Funding; Janssen: Speakers Bureau; Amgen: Speakers Bureau. Baz:Celgene: Membership on an entity's Board of Directors or advisory committees, Research Funding; Karyopharm: Membership on an entity's Board of Directors or advisory committees, Research Funding; AbbVie: Research Funding; Merck: Research Funding; Sanofi: Research Funding; Bristol-Myers Squibb: Research Funding. Dalton:MILLENNIUM PHARMACEUTICALS, INC.: Honoraria.

Author notes

*

Asterisk with author names denotes non-ASH members.

2019
