Key Points
A BM examination is the gold standard for the diagnosis of MDS, but it is invasive and subjective.
A predictive algorithm/app using data of 10 readily available parameters from 1004 subjects was developed to help diagnose/rule out MDS.
Abstract
We present a noninvasive Web-based app to help exclude or diagnose myelodysplastic syndrome (MDS), a bone marrow (BM) disorder with cytopenias and leukemic risk, diagnosed by BM examination. A sample of 502 MDS patients from the European MDS (EUMDS) registry (n > 2600) was combined with 502 controls (all BM proven). Gradient-boosted models (GBMs) were used to predict/exclude MDS using demographic, clinical, and laboratory variables. Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were used to evaluate the models, and performance was validated using 100 times fivefold cross-validation. Model stability was assessed by repeating its fit using different randomly chosen groups of 502 EUMDS cases. AUC was 0.96 (95% confidence interval, 0.95-0.97). MDS is predicted/excluded accurately in 86% of patients with unexplained anemia. A GBM score (range, 0-1) of less than 0.68 (GBM < 0.68) resulted in a negative predictive value of 0.94, that is, MDS was excluded. GBM ≥ 0.82 provided a positive predictive value of 0.88, that is, MDS was predicted. The diagnosis of the remaining patients (0.68 ≤ GBM < 0.82) is indeterminate. The discriminating variables were age, sex, hemoglobin, white blood cells, platelets, mean corpuscular volume, neutrophils, monocytes, glucose, and creatinine. A Web-based app was developed; physicians could use it to exclude or predict MDS noninvasively in most patients without a BM examination. Future work will add peripheral blood cytogenetics/genetics, EUMDS-based prospective validation, and prognostication.
Introduction
An important trend in modern medicine is to develop less-invasive diagnostic and therapeutic techniques that can replace invasive procedures, while maintaining high accuracy and efficacy.1,2 In addition, patients expect to be involved and to have their preferences considered.3 The use of digital systems in clinical practice allows data collection, computer analysis, and machine learning as well as development of algorithms that were not possible in the past. These systems can improve diagnostic techniques and make them less invasive.4 Here, we propose a new paradigm to help in the diagnosis and exclusion of myelodysplastic syndromes (MDS).
MDS is a clonal bone marrow (BM) stem cell disorder, and the median age of onset is in the eighth decade of life.5-7 MDS is characterized by abnormal hematopoietic maturation and differentiation that leads to cytopenias, mainly symptomatic anemia, and the potential for leukemic transformation.7,8 The current gold standard for diagnosis is BM examination.8-10 Although considered a common and relatively straightforward procedure, it is still invasive, painful, and occasionally associated with infectious and bleeding complications.9,10-12 Such examination also depends on subjective interpretation of morphology. Many patients and their physicians prefer to avoid this examination. The lack of diagnosis or its delay may result in disease progression and may prevent patient access to effective treatment. In some countries, this may also prevent the patient from receiving the social and financial privileges accorded to those diagnosed with MDS.13,14
We have developed an algorithm to help in the diagnosis or exclusion of MDS based on demographic, clinical, and laboratory parameters that would obviate, in many patients, the need for a BM examination. In our previous work, we introduced a formula that incorporated 6 clinical variables (age, sex, hemoglobin [Hb], mean corpuscular volume [MCV], white blood cells [WBCs], and platelets [PLTs]). Using a logistic regression model, we were able to classify patients into 1 of 3 categories: probable MDS (pMDS), probably not MDS (pnMDS), and indeterminate.15 We performed internal validation with a new set of patients. Approximately 50% of the patients could be classified as either pMDS or pnMDS. That model could be improved by increasing the number of studied individuals, adding more variables, and using a more appropriate model, the gradient-boosted model (GBM).16,17 Here, we have done so, using a GBM, more variables, and many more patients. The resulting Web app would help a clinician diagnose, and especially rule out, MDS noninvasively, without BM examination, in ≈86% of patients.
Methods: patients and model development
Patients
For the model, 502 patients with a BM-based diagnosis of MDS were randomly selected from the European MDS (EUMDS) registry.5,6 The criteria for MDS diagnosis in the EUMDS registry have been published earlier.8 To choose controls, we reviewed consecutive reports from the BM registry of the Tel Aviv Sourasky Medical Center (TASMC).16,17 The control group included subjects aged 50 years and older who had undergone BM examination (BME) between January 2011 and December 2018, with BM reported as normal. The indication for BME in most of these individuals was the evaluation of an unexplained anemia; in some, it was staging of lymphoproliferative disorders. Patients with BM involvement as a part of a hematological or other disease or with any degree of BM dysplasia could not serve as controls. The characteristics of the control group (n = 502) and of the MDS patient study group are described in Table 1.
Variable | MDS, mean (SD) or % | Controls, mean (SD) or % | P |
---|---|---|---|
Age, y | 72.5 (9.9) | 69.3 (9.8) | <10⁻⁴ |
Sex, M/F, % | 57/43 | 58/42 | .85 |
Hb, g/dL | 10.0 (1.9) | 11.2 (2.2) | <10⁻⁴ |
WBC, ×10⁹/L | 5.1 (3.0) | 7.8 (5.4) | <10⁻⁴ |
Platelets, ×10⁹/L | 205 (154) | 213 (140) | .38 |
MCV, fL | 97.1 (10.6) | 89.9 (9.4) | <10⁻⁴ |
Neutrophils, ×10⁹/L | 3.0 (2.4) | 5.4 (4.7) | <10⁻⁴ |
Monocytes, ×10⁹/L | 0.45 (0.41) | 0.65 (0.54) | <10⁻⁴ |
Glucose, mg/dL | 111.6 (39.0) | 117.0 (51.9) | .075 |
Creatinine, mg/dL | 1.0 (0.42) | 1.3 (1.04) | <10⁻⁴ |
M/F, male/female ratio; SD, standard deviation.
The institutional review board of the Tel Aviv Sourasky Medical Center approved this study, which was conducted in accordance with the Declaration of Helsinki.
Model development
The clinical and laboratory variables listed in Table 1 (age, sex, Hb, MCV, WBC, PLT, neutrophil and monocyte counts, serum glucose and creatinine) were entered as explanatory variables into a logistic GBM,18,19 with case (MDS patient) or control (patient with MDS excluded) status as outcome, using the R package gbm.20 Most of the variables included in the model were selected from among those routinely measured in patients referred for BME, on the basis of their known association with MDS.8 The caret package21 was used to search for optimal model parameters and also to estimate out-of-sample model performance using 100 times fivefold cross-validation. The final model used an interaction depth of 5 and a shrinkage parameter of 0.001, and was constrained to have at least 10 observations at each terminal node. Because the caret training function requires a complete variable data set, missing values in the data were imputed using bagged tree models for each variable (using the caret function preProcess). Imputation was not required for the final model, as the gradient-boosted trees can naturally deal with missing data. As data on MDS patients (cases) and controls were obtained from separate sources, with different degrees of precision, all variables were rounded to a common precision. This ensured that the model was fitted to the values and not to their precision. Because of the stochastic nature of a GBM, the sensitivity of the model performance to the choice of the random number seed was examined.
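The repeated k-fold cross-validation scheme described above can be illustrated with a minimal Python sketch. This is not the authors' R/caret code: the function name `repeated_kfold_indices`, its parameters, and the toy sizes are assumptions made for illustration, and no stratification by case/control status is attempted here.

```python
import random


def repeated_kfold_indices(n_samples, n_folds, n_repeats, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper for illustration only; it is not part of caret or gbm.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        # Deal the shuffled indices into n_folds folds of nearly equal size.
        folds = [order[i::n_folds] for i in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


# Toy example: 20 subjects, fivefold CV repeated 3 times gives 15 fits.
splits = list(repeated_kfold_indices(n_samples=20, n_folds=5, n_repeats=3, seed=42))
print(len(splits))
```

In such a scheme, a model would be refit on each training split and scored on the held-out fold, and the per-fold scores averaged to estimate out-of-sample performance.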
Positive predictive values (PPVs) and negative predictive values (NPVs) were calculated assuming a 20% prevalence of MDS within the population of patients to which the model would be applied in practice: that is, patients with unexplained anemia, in whom other causes of anemia have been excluded, who would likely undergo BM examination in clinical practice.22,23 We also examined a 2-threshold system in which the model is predictive of MDS diagnosis with high PPV above the upper threshold and predictive of MDS exclusion with high NPV below the lower threshold. We targeted a PPV of 90% for the upper threshold and NPV of 95% for the lower threshold. Finally, we repeated the analysis with pretest probabilities of 10% and 30% in addition to the main analysis with 20% probability of disease. All analyses were performed using the software package R, version 3.5.2.24
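The dependence of PPV and NPV on the assumed pretest probability follows from Bayes' rule. The Python sketch below shows the standard calculation; the function name `predictive_values` and the sensitivity and specificity used in the loop are illustrative placeholders, not values taken from this study. In the 2-threshold system, the PPV would be computed at the operating point of the upper threshold and the NPV at that of the lower threshold.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Placeholder operating point (assumed for illustration, not from the paper),
# evaluated at the three pretest probabilities considered in the text.
for prevalence in (0.10, 0.20, 0.30):
    ppv, npv = predictive_values(sensitivity=0.80, specificity=0.97,
                                 prevalence=prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

As the formula makes explicit, lowering the prevalence reduces the PPV and raises the NPV for a fixed sensitivity and specificity.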
Results
In Figure 1, the distribution of scores from the GBM, stratified by known case/control status, is shown. The red bars on the right represent patients diagnosed with MDS (cases) and the green bars (left) represent patients for whom MDS has been ruled out by BME (controls). The lavender region represents the overlap between case and control patients. It is notable that there is an excellent separation between patients with and without MDS. Note that in this figure, case and control prevalence is assumed equal, to illustrate the score distributions most clearly; in practice, case prevalence is likely to be much lower (we have taken 20% as indicative in our calculations, see "Model development").
The area under the receiver operating characteristic curve (AUC) for the model fit on the full training data was 0.96 (95% confidence interval [CI], 0.95-0.97) (Figure 2).
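For readers less familiar with the AUC, it equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The short Python sketch below computes it by direct pairwise comparison; the function name `auc_from_scores` and the toy scores are assumptions for illustration and are unrelated to the study data.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores: one case (0.4) is outranked by one control (0.5).
toy_auc = auc_from_scores([0.9, 0.8, 0.4], [0.1, 0.5, 0.3])
print(round(toy_auc, 3))
```

The pairwise form is quadratic in the number of subjects, which is acceptable for a cohort of this size; rank-based formulas give the same value more efficiently.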
The relative influence of each of the 10 variables in the GBM18 is shown in Figure 3. Note that the first 3 variables (in order of importance: MCV, serum creatinine, and neutrophil count) are responsible for >55% of the influence on the predictive model. Other hematologic and chemistry variables, including lactate dehydrogenase, bilirubin, and other routine laboratory parameters were tested and found to have an insignificant contribution.
The model has a sensitivity of 88% and specificity of 95%. Assuming a case (MDS) prevalence of 20% in the population of patients with unexplained anemia,23 a probability threshold of 0.68 was set to target an NPV of 0.95: any patient with a predicted GBM probability (GBMP) of <0.68 is classified as predicted not to have MDS. A probability threshold of 0.82 (ie, GBMP ≥ 0.82) classifies a subject as predicted to have MDS and targets a PPV of 0.90 (at which point the NPV is also 0.90). In practice, the upper and lower thresholds achieved PPV and NPV of 88.4% and 94.3%, respectively (Table 2). These 2 thresholds define 3 regions: (1) for GBMP ≥ 0.82, a patient is predicted to have probable MDS (pMDS; Figure 1, red vertical line, on the right); (2) for GBMP < 0.68, a patient is predicted to be probably not MDS (pnMDS; Figure 1, green line); and (3) for 0.68 ≤ GBMP < 0.82, no prediction is made (between the 2 lines). Here, 5% of controls and 23% of MDS patients (14% of the entire group) lie in the no-prediction zone between these 2 thresholds. For comparison, in our earlier logistic regression model, ≈50% of the patients fell into this region.15
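The 3-region rule can be written compactly. The Python sketch below takes a GBM probability as given and applies the 2 thresholds reported here (0.68 and 0.82); it does not reproduce the fitted model or the Web app, and the function and constant names are hypothetical.

```python
LOWER_THRESHOLD = 0.68  # below this score: probably not MDS (pnMDS)
UPPER_THRESHOLD = 0.82  # at or above this score: probable MDS (pMDS)


def classify_gbm_score(score):
    """Map a GBM probability in [0, 1] to pMDS, pnMDS, or indeterminate."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("GBM score must lie between 0 and 1")
    if score >= UPPER_THRESHOLD:
        return "pMDS"
    if score < LOWER_THRESHOLD:
        return "pnMDS"
    return "indeterminate"


for example_score in (0.30, 0.75, 0.90):
    print(example_score, classify_gbm_score(example_score))
```

Note that the boundaries are asymmetric: a score of exactly 0.68 falls in the indeterminate zone, whereas a score of exactly 0.82 is classified as pMDS.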
Group | MDS, n (%) | No MDS, n (%) | PPV, % [95% CI] | NPV, % [95% CI] |
---|---|---|---|---|
Total (all patients) | 502 (100) | 502 (100) | 88.4 [79.9, 93.6] | 94.3 [93.4, 95.2] |
Cytopenia: WHO* | ||||
Anemia | 454 (90.44) | 354 (70.52) | 85.0 [74.8, 91.6] | 94.8 [93.8, 95.7] |
Neutropenia | 178 (35.46) | 66 (13.20) | 73.4 [51.5, 87.8] | 97.4 [95.9, 98.4] |
Thrombocytopenia | 210 (41.83) | 174 (34.66) | 86.8 [67.9, 95.3] | 93.2 [91.6, 94.5] |
Bicytopenia | 184 (36.65) | 112 (22.40) | 82.2 [60.2, 93.5] | 93.3 [91.5, 94.8] |
Pancytopenia | 83 (16.53) | 31 (6.20) | 72.3 [40.4, 91.0] | 98.2 [95.7, 99.2] |
Severe cytopenia: IPSS† | ||||
Anemia | 244 (48.61) | 151 (30.08) | 85.7 [69.4, 94.1] | 95.7 [94.3, 96.8] |
Neutropenia | 135 (26.89) | 48 (9.60) | 89.1 [54.0, 98.3] | 97.6 [95.7, 98.6] |
Thrombocytopenia | 124 (24.70) | 103 (20.52) | 90.0 [60.9, 96.3] | 93.8 [91.7, 95.4] |
Bicytopenia | 94 (18.73) | 41 (8.20) | 84.1 [48.7, 93.9] | 96.9 [94.6, 98.2] |
Pancytopenia | 25 (4.98) | 9 (1.80) | 57.5 [21.5, 79.8] | 97.5 [90.8, 99.4] |
IPSS, International Prognostic Scoring System; WHO, World Health Organization.
*Cytopenia according to WHO criteria: anemia (hemoglobin: <12 g/dL, women; <13 g/dL, men), neutropenia (absolute neutrophil count, <1.8 × 10⁹/L), and thrombocytopenia (platelets, <150 × 10⁹/L).
†Severe cytopenia, using IPSS criteria: anemia (hemoglobin, <10 g/dL), neutropenia (absolute neutrophil count, <1.5 × 10⁹/L), and thrombocytopenia (platelets, <100 × 10⁹/L).
To determine the robustness of this model, we have examined its predictive characteristics in a variety of situations. Although most patients being evaluated for MDS have anemia, others have deficiencies in other cell lines, or in multiple cell lines. Table 2 displays the PPV and NPV for patients with anemia, neutropenia, and thrombocytopenia, as well as bicytopenia and pancytopenia. Approximately 90% of the MDS patients had anemia; ≈35% to 40% of them had neutropenia, thrombocytopenia, or bicytopenia; and ≈15% had pancytopenia, all according to World Health Organization (WHO) criteria. Using the more severe cytopenia criteria as would be used for the International Prognostic Scoring System (IPSS) score, ≈50% of MDS patients were severely anemic; ≈20% to 25% were neutropenic, thrombocytopenic, or bicytopenic; and 5% were pancytopenic. In most of these subgroups, the PPV is lower than in the full cohort (88.4%), ranging from 72% to 90% (58% for severe pancytopenia), and the CIs broaden. Most important, however, is that the NPV and the lower limits of its 95% CI are all above 90%. This emphasizes the importance of this model at this stage as an effective “rule out” predictor.
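The subgroup definitions used in Table 2 can be expressed as simple cutoffs. The Python sketch below applies the WHO and IPSS thresholds exactly as stated in the table footnotes; the function name `cytopenia_flags`, the argument names, and the choice to report only a count of affected lineages (rather than a bicytopenia or pancytopenia label) are assumptions made for illustration.

```python
def cytopenia_flags(hb_g_dl, anc_1e9_l, plt_1e9_l, sex, criteria="WHO"):
    """Flag anemia, neutropenia, and thrombocytopenia by the stated cutoffs.

    hb_g_dl: hemoglobin in g/dL; anc_1e9_l and plt_1e9_l: counts in 10^9/L;
    sex: "F" or "M" (used only by the WHO hemoglobin cutoff).
    """
    if criteria == "WHO":
        hb_cutoff = 12.0 if sex == "F" else 13.0
        anc_cutoff, plt_cutoff = 1.8, 150.0
    elif criteria == "IPSS":
        # Severe cytopenia cutoffs as given in the Table 2 footnote.
        hb_cutoff, anc_cutoff, plt_cutoff = 10.0, 1.5, 100.0
    else:
        raise ValueError("criteria must be 'WHO' or 'IPSS'")
    flags = {
        "anemia": hb_g_dl < hb_cutoff,
        "neutropenia": anc_1e9_l < anc_cutoff,
        "thrombocytopenia": plt_1e9_l < plt_cutoff,
    }
    flags["n_lineages"] = sum(flags.values())  # number of affected cell lines
    return flags


# Hypothetical patient: mild anemia and thrombocytopenia by WHO criteria only.
print(cytopenia_flags(11.5, 2.0, 140.0, sex="F", criteria="WHO"))
print(cytopenia_flags(11.5, 2.0, 140.0, sex="F", criteria="IPSS"))
```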
Finally, we examined variation in pretest probability. We have assumed that the a priori prevalence of MDS in our patient population with unexplained anemia is ≈20%. Recognizing that this prevalence could vary according to age or other factors, we looked at the model’s performance with the full data set, also using 10% and 30% pretest probabilities.
Using an a priori prevalence of 10%, PPV = 77.2% (95% CI, 63.8%, 86.7%) and NPV = 97.4% (97.0%, 97.8%). With a 30% prevalence, PPV = 92.9% (87.2%, 96.2%) and NPV = 90.8% (89.3%, 92.1%).
To evaluate and internally validate the model, 100 times repeated fivefold cross-validation was used on the training data to obtain an estimate of out-of-sample performance. The cross-validation was applied to the GBM fitting process, under the assumptions of fixed shrinkage value and interaction depth. This gave an AUC of 0.88. For comparison, logistic regression achieved an AUC of 0.82 under similar repeated cross-validation. The choice of random number seed used in the GBM construction was examined and the model was found to be insensitive to this choice.
To translate this methodology to a practical tool for clinicians, we have developed a Web-based predictor calculator (Figure 4). Figure 4A provides both the Web address as well as the quick response code. Upon entering the Web site, a window opens into which the values for the 10 variables should be entered (Figure 4B). In Figure 5, 3 examples are shown demonstrating typical data for patients with pMDS (Figure 5A), pnMDS (Figure 5B), and indeterminate diagnosis (Figure 5C), respectively. Note that this figure is created assuming a case prevalence of 20% (as opposed to Figure 1, where 50% was assumed).
In summary, the model uses 10 simple parameters and has a sensitivity of 88% and a specificity of 95%. Assuming an MDS prevalence of ≈20% in the target population of patients with unexplained anemia, the NPV and PPV are 0.94 and 0.88, respectively. The model helps in exclusion or diagnosis of MDS in 86% of the tested individuals.
Discussion
In 1959, B. J. Davis reported on the use of machine learning to improve diagnostic hematology.25 Today, digital and computational techniques are revolutionizing medicine. The possibility of collecting and analyzing large amounts of data has allowed the development of predictive models for new diagnostic techniques.4,26 These are already being applied in several fields, such as imaging,27,28 nuclear medicine,29 and pathology.30 Digital tools can also improve monitoring, predict outcome and course, and assist in the treatment of disease. Several examples of the endless potential of these tools include: electrocardiographic imaging for monitoring arrhythmias from the body surface,31-33 a smart watch to detect atrial fibrillation,34 a computational algorithm that can predict septic shock,35 tools that can monitor and control hypertension,36 and the development of prostheses by 3-dimensional techniques.37
Less attention has been paid to another potential role of these tools: improving quality of life using less-invasive techniques, while maintaining high accuracy. Today, diagnostic procedures and treatments are assessed not only by their effect on morbidity and mortality, efficacy and toxicity, but also by their effect on quality of life, as well as parameters reported by the patients (patient-reported outcomes).38-44
Here, we propose a noninvasive tool that might, in some situations, obviate the need for a BME, the gold standard for the diagnosis of MDS.7,8 This approach may be appropriate as a predictive tool for the primary care physician evaluating anemic patients, especially those who may be reluctant to undergo a BME.
In clinical practice, we often encounter elderly patients with mildly symptomatic (especially macrocytic) anemia or pancytopenia, for whom the initial workup has excluded the common causes, such as iron, B12, or folate deficiencies, or hemolysis. These individuals have an unexplained anemia and a BME would be the next recommended diagnostic step. This is the patient population who might benefit from such a novel noninvasive diagnostic technique.
The developed computer app is based on an analysis of data collected from >1000 individuals (MDS patients and non-MDS controls), all BM proven. Several internal validations have confirmed the reliability of the predictive model. In practice, to help in the diagnosis or exclusion of MDS with this model, one needs only to enter 10 readily available clinical parameters: the patient’s age, sex, blood counts, and routine blood chemistry values. The result is a picture and a predictive conclusion (Figures 1 and 5): pMDS (the red area), pnMDS (green), or indeterminate (lavender). We have found that, in this patient population with unexplained anemia, ≈86% of patients can be classified as either pMDS or pnMDS. In the remaining indeterminate group, the patient and the physician would have to discuss whether the BME should be performed to make the definitive diagnosis. Although a long delay in diagnosis can be detrimental, postponing the decision for only 3 to 4 months is usually harmless in this lower-risk population.
We examined the model in patients with neutropenia and thrombocytopenia as well as in those with bicytopenia and pancytopenia. We found that the predictive model continues to be reliable especially with MDS exclusion in almost all of these categories, with NPV values all above 90% and relatively narrow 95% CIs. Moreover, the lower boundaries of the 95% CI are all above 90% as well.
As expected, for prediction of MDS in these groups the accuracy is somewhat diminished, and the 95% CIs are widened. This is in large part owing to the small numbers of patients in these groups. It is likely that for patients with multiple cytopenias, a BM evaluation would be indicated, irrespective of the model prediction.
Most of the variables found to be relevant and introduced into the model (Table 1; Figures 3 and 4) were expected to have an impact and help in the diagnosis. The likelihood of MDS is expected to increase as Hb, WBC, neutrophil, and platelet counts are reduced. The likelihood may also increase with increasing age, whereas sex is expected to have little effect. These relationships were seen in the model (Figure 3). However, the impact of 2 variables, creatinine and glucose, was less expected. A possible hypothesis for the inverse relationship between creatinine and the incidence of MDS is that normal serum creatinine excludes the anemia associated with renal failure and makes the diagnosis of MDS more likely. The association of glucose and MDS requires further investigation. It is worth mentioning that impaired glucose metabolism in red blood cells45 and involvement of glucose metabolism in erythropoiesis in MDS patients have already been reported.46-48 MCV in diabetes has been investigated, but no definitive conclusions have been reached. Although some studies reported lower MCV,49,50 others suggested that hyperosmolarity is associated with an increased MCV.51 One should bear in mind that variables with high predictive value do not necessarily imply causality. These unexpected findings, however, highlight the power of such computer-based analyses, where the data and the machine learning draw our attention to new biologic phenomena that we had not noticed previously.
The proposed predictive model has some limitations. Although it has a high potential to help in the diagnosis or exclusion of MDS, certain relevant information has not yet been integrated into the model, especially morphology, blast percentage, genetics, and cytogenetics. We and others have suggested that BM morphology is not only subjective, but may also be less important today than in the past.52,53 MDS is not the first hematologic disease diagnosed without a BME. Chronic lymphocytic leukemia is diagnosed using peripheral blood (PB) cytogenetics and flow cytometry,54 and polycythemia vera is diagnosed with the demonstration of JAK-2 mutation in PB.55 Although the BM blast percentage, cytogenetics, and mutational analysis would also not be available, these limitations could eventually be overcome by obtaining PB genetic information,56,57 by flow cytometry,58 and also by medical imaging.59,60
A recent study has demonstrated that specific morphologies may be associated with somatic mutations.61 Perhaps, conversely, specific genetic signatures reflect corresponding morphologic changes. Thus, such genetic mutational information, when obtained from PB, could be a complementary component on the way toward a noninvasive MDS diagnosis, avoiding BME. Other studies on using machine learning diagnostic models have recently been reported.62-64
Today, next-generation sequencing is available in many laboratories and helps in the diagnosis of MDS.8 However, this technique is still not a standard in much of the world and is still not a mandatory component of the diagnosis of MDS. Moreover, although myeloid mutations are increasingly seen with advancing age and are associated with a markedly greater incidence of MDS, their presence is still not sufficient for diagnosis because the vast majority of patients with such genetic signatures do not have MDS.65-67 Although the exact place of myeloid mutations is not fully determined at this time, their increasing importance makes it very likely that future incorporation of such information into our model will improve its predictive quality. In the meantime, such a predictive model might be applied by any physician in the community, without the need for performing mutation analysis.
At this time, the principal use of this method would be to help in ruling out MDS without a BME. A BME would be recommended for the indeterminate patients to make a diagnosis, and for those with pMDS, to obtain the morphologic and genetic information. Of course, a BME would also be necessary when the diagnoses of other diseases are under consideration. We envision that, in the future, as the methods for obtaining PB genetic information are perfected, our model would be used to make the diagnosis as well.
Another limitation of the proposed model relates to the control population and to the model’s generalizability. The predictive model and the thresholds set were based on our MDS and control patients, where we assumed a 20% prevalence of MDS in the population of unexplained anemia. Although there is a great deal of information on the prevalence of MDS in the general population, there is a paucity of such information in our population. Whether the prevalence is the same for various regions around the world is also not clear. Because of the paucity of data, we made assumptions of prevalence based on personal experience, the experience of colleagues, and the literature. Our experience, along with that of our colleagues, estimated the MDS prevalence to range from 10% to 30%. We found similar results in estimations and extrapolations from the literature and then chose 20% as the pretest probability for the model.22,23,68-70 The ideal control is the patient with unexplained anemia after the initial negative workup, who has a normal BME. In reality, however, not all control patients fell into that category. Although all of them were at least 50 years old and had a normal BME, a portion of them had undergone the procedure as a part of staging for lymphoproliferative disorder. A control group consisting only of patients with unexplained anemia and a negative workup could probably result in a more accurate diagnostic model. We used our control group and assumed a 20% prevalence of MDS knowing well that neither assumption is perfect. We also do not know for certain whether any of our control patients had a suspicious myeloid mutation or eventually developed MDS with time. It is also possible that some of them had idiopathic or clonal cytopenia of undetermined significance (ICUS or CCUS), but the numbers would be small given the small prevalence of these in the general population. 
The control group reflects a real-world situation, but to determine the dependence of our method on the a priori prevalence, we checked its performance using 10% and 30% prevalence in addition to the 20%. We found that the NPV remains high, but that PPV is reduced with lower pretest probability. Our future work will perform a prospective external validation using new patient data (MDS and controls) from various centers in the EUMDS group and eventually branch out to other world locations. At least a portion of these data will include genetic information, allowing us to fine-tune the model and examine its robustness.
Because of these limitations, it would still be important for the physician to follow the patient and, with time, if there is still a significant level of uncertainty, to consider performing a BME to make the definitive diagnosis.
Despite the limitations, the proposed model is indeed a step toward a less-invasive method to help in diagnosis or exclusion of MDS in the patient with unexplained anemia. Another group developed a basic MDS model with 4 variables using logistic regression, and its AUC for predicting confirmed MDS was 0.67.22,71 In our earlier logistic regression MDS model with 6 variables, the AUC was 0.75, the NPV was 0.87, and the PPV was 0.65.15 These compare with our current gradient-boosted MDS model, in which the AUC, NPV, and PPV are 0.96, 0.94, and 0.88, respectively.
This MDS model has the potential to be more than a helpful tool in the diagnostic process. In the future, this model could also be tested on patients for estimating prognosis (which at this time requires a BME) and for following the GBM score as disease progresses and as patients respond to therapy. Moreover, broadening the concept, it may serve as a platform or example of incorporating big data and machine learning into the diagnostic process of diseases in general, and can serve to stimulate research to use such databases to develop similar noninvasive predictive models for a variety of other diseases.
In summary, a Web-based computer app has been developed to help the physician primarily to exclude MDS in a cytopenic individual and also to predict the possibility of MDS without performing the invasive BME. The app is based on analysis of data collected from >1000 individuals. Ten readily available clinical variables of the suspected patients are introduced into the app to assess the probability that the patient has MDS. In the future, we plan to increase the number of measured variables (eg, red blood cell distribution width, whose relevance has recently been demonstrated72) to improve the predictive power of the model. Moreover, as planned by the EUMDS group, the model will be validated with independent prospective patient data, and applications will be developed to test using the model as a predictive prognostic tool in addition to diagnosis.
Acknowledgments
The authors thank Yocheved Akiva for assistance in preparing the manuscript and Nitzan Cohen Sagy for assistance as research coordinator. This work was carried out within the BM registry of the Tel Aviv Sourasky Medical Center (TASMC) and the EUMDS Registry. The authors acknowledge all patients whose data were contributed to these registries, as well as all local investigators and operational team members for their continuing contribution to the EUMDS registry. The EUMDS Registry is supported by an educational grant from Novartis Pharmacy B.V. Oncology Europe, Amgen Limited, Celgene International, Janssen Pharmaceutica, and Takeda Pharmaceuticals International.
Authorship
Contribution: H.S.O., M.M., and T.d.W. designed and performed the research, analyzed the data, and wrote the paper; S.C., A. Smith, and G.Y. contributed vital analytical tools and contributed to writing the paper; B.A.S., S.B., A.K., S.N., and J.B.-E. gathered the data; and P.F., A. Symeonidis, R.S., J.C., G.S., E.H.-L., L.M., S.L., U.G., M.S.H., K.M., A.G.-B., D.C., L.S., J.M., I.K., C.v.M., and D.B. were involved in study design and writing the paper.
Conflict-of-interest disclosure: The authors declare no competing financial interests for the work described in this manuscript. Potentially perceived conflicts of interest outside the submitted work are as follows. A. Smith received research funding from Novartis, Cilag-Janssen, and Boehringer Ingelheim. P.F. received research funding and/or honoraria from Aprea, Astex, Celgene Corporation, and Jazz Pharmaceuticals. A. Symeonidis received institutional research funding, honoraria and/or consulting fees from Abbvie, Amgen, Bristol-Myers Squibb, Celgene/GenesisPharma, Gilead, Janssen-Cilag, Merck Sharp & Dohme, Novartis, Pfizer, Roche, Sanofi/Genzyme, and Takeda. R.S. received research funding, honoraria and/or consulting fees from Celgene, Novartis, and Teva (Ratiopharm). E.H.-L. received research funding from Celgene. U.G. received research funding and/or honoraria from Amgen, Celgene, Jazz Pharmaceuticals, and Novartis. C.v.M., project manager of the EUMDS Registry, is funded from the EUMDS (educational grants from Novartis Pharmacy B.V. Oncology Europe, Amgen Limited, Celgene International, Janssen Pharmaceutica, and Takeda Pharmaceuticals International) and MDS-RIGHT (grant from EU’s Horizon 2020 program) project budgets. T.d.W. received research funding from Amgen, Celgene, Janssen, Novartis, and Takeda during the conduct of the study, as project coordinator EUMDS. M.M. received research funding and/or honoraria from Novartis. The remaining authors declare no competing financial interests.
Correspondence: Moshe Mittelman, Department of Medicine, Tel Aviv Sourasky Medical Center, 6 Weizmann St, Tel-Aviv 64239, Israel; e-mail: moshemt@tlvmc.gov.il; and Howard S. Oster, Department of Medicine, Tel Aviv Sourasky Medical Center, 6 Weizmann St, Tel-Aviv 64239, Israel; e-mail: howardo@tlvmc.gov.il.
References
Author notes
For data sharing, please contact the corresponding authors, Moshe Mittelman at moshemt@tlvmc.gov.il or Howard S. Oster at howardo@tlvmc.gov.il.