Background: Oral anticoagulants are increasingly prescribed for atrial fibrillation, venous thromboembolism (VTE), arterial thrombosis, and other indications. Existing bleeding risk models (e.g., HAS-BLED, ORBIT, VTE-BLEED) are often linear and drug-specific, and they may not adequately capture the complexity of modern treatment. Cancer patients remain a particularly high-risk subgroup.

To address these limitations, we leveraged one of the largest real-world multicenter cohorts of anticoagulated patients and applied interpretable machine learning (ML) models to more than 1,000 clinical and laboratory features, aiming to enable dynamic, individualized bleeding risk prediction.

Objective: To develop and validate a machine learning model that integrates clinical, laboratory, and treatment variables to predict bleeding complications in patients receiving anticoagulant therapy, with special focus on high-risk populations such as those with active malignancy.

Methods: We conducted a retrospective, multicenter cohort study using electronic health records from six Israeli medical centers (HY, BZ, SHAMIR, GMC, PORIA, BARZILAI). Adult patients (≥18 years) with at least 90 days of follow-up were included. A landmark design was applied, excluding patients who died during the initial 90 days to minimize immortal time bias.
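For illustration only, a minimal sketch of how such a landmark filter could be implemented is shown below. The column names (index_date, death_date, last_followup_date, age_at_index) are hypothetical placeholders, not the study's actual data schema.

```python
import pandas as pd

LANDMARK_DAYS = 90  # landmark period described in Methods

def apply_landmark_filter(cohort: pd.DataFrame) -> pd.DataFrame:
    """Keep adults (>=18 years) with >=90 days of follow-up who survived the landmark period."""
    followup_days = (cohort["last_followup_date"] - cohort["index_date"]).dt.days
    died_in_landmark = (
        cohort["death_date"].notna()
        & ((cohort["death_date"] - cohort["index_date"]).dt.days <= LANDMARK_DAYS)
    )
    mask = (
        (cohort["age_at_index"] >= 18)
        & (followup_days >= LANDMARK_DAYS)
        & ~died_in_landmark
    )
    return cohort.loc[mask]
```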

The primary outcome was major bleeding occurring between days 91 and 455 after the index date, identified via ICD-9 codes and validated clinical criteria. Feature extraction yielded 1,046 variables, including demographics, clinical conditions (ICD-9), chronic diseases, laboratory results, and temporal patterns of care. Features with >90% missing data were excluded, and interaction variables were created (e.g., anemia severity from hemoglobin values).
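A minimal sketch of the feature-preparation step is given below, assuming a patient-by-feature table. The hemoglobin cut-points and column names are illustrative placeholders and do not reflect the study's actual definitions.

```python
import pandas as pd

def prepare_features(features: pd.DataFrame) -> pd.DataFrame:
    """Drop features with >90% missing values and add an illustrative derived variable."""
    # Exclude columns missing in more than 90% of patients, as described in Methods.
    keep = features.columns[features.isna().mean() <= 0.90]
    out = features[keep].copy()

    # Illustrative derived feature: anemia severity graded from baseline hemoglobin (g/dL).
    # The cut-points below are placeholders, not the study's actual thresholds.
    out["anemia_severity"] = pd.cut(
        out["baseline_hemoglobin"],
        bins=[0, 8, 10, 12, float("inf")],
        labels=["severe", "moderate", "mild", "none"],
    )
    return out
```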

Three models were evaluated: logistic regression, random forest, and XGBoost. Data were split into training (80%) and testing (20%) sets, ensuring balanced hospital representation. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and precision. Interpretability was assessed using SHAP (SHapley Additive exPlanations) values. Validation included both pooled and hospital-specific analyses, and patients were stratified into quintiles of predicted risk.
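The sketch below illustrates this workflow for the XGBoost model under stated assumptions: X is a numeric feature matrix, y the major-bleeding labels, and hospital a per-patient site identifier; the hyperparameters are illustrative and are not reported in the abstract.

```python
import xgboost as xgb
import shap
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# 80/20 split stratified by hospital to keep site representation balanced,
# mirroring the approach described in Methods.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=hospital, random_state=42
)

model = xgb.XGBClassifier(
    n_estimators=300,      # illustrative hyperparameters only
    max_depth=4,
    learning_rate=0.05,
    eval_metric="auc",
)
model.fit(X_train, y_train)

# Discrimination on the held-out test set.
pred = model.predict_proba(X_test)[:, 1]
print("Test AUC:", roc_auc_score(y_test, pred))

# SHAP values for interpretability (TreeExplainer supports tree ensembles).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)
```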

Results: Among 163,596 patients on anticoagulation, 7,705 (4.7%) experienced significant bleeding, including 2,503 (1.5%) major bleeds. SHAP analysis identified cancer history, baseline creatinine, baseline hemoglobin, age, and anticoagulant class as the strongest predictors. Additional contributors included hypertension, prior bleeding, platelet count, stroke history, diabetes, and chronic anemia. The XGBoost model achieved an AUC of 0.70 (95% CI: 0.67–0.73). Performance varied across hospitals (AUC range: 0.60–0.74). Risk stratification showed strong discrimination: patients in the lowest risk quintile had a 0.4% bleeding rate compared to 7.3% in the highest quintile, an 18-fold difference.
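For context, the sketch below shows how observed bleeding rates per predicted-risk quintile, and the fold difference between the extreme quintiles, can be tabulated; it is illustrative only, and the rates reported above come from the study itself, not from this code. pred denotes predicted probabilities and observed the 0/1 bleeding outcomes.

```python
import pandas as pd

df = pd.DataFrame({"pred": pred, "observed": observed})
df["risk_quintile"] = pd.qcut(df["pred"], q=5, labels=[1, 2, 3, 4, 5])

# Observed bleeding rate within each quintile of predicted risk.
rates = df.groupby("risk_quintile", observed=True)["observed"].mean()
print(rates)
print("Fold difference (Q5 vs Q1):", rates.loc[5] / rates.loc[1])
```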

Based on these findings, we established a practical risk calculation tool that uses patient-specific variables to generate individualized bleeding probabilities, enabling clinicians to estimate risk in real time.
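A minimal sketch of such a calculator is shown below, assuming a fitted classifier exposing predict_proba (e.g., the XGBoost model above); the function name, feature names, and example values are hypothetical.

```python
import pandas as pd

def estimate_bleeding_risk(model, patient: dict, feature_columns: list) -> float:
    """Return an individualized predicted probability of major bleeding for one patient.

    Features absent from `patient` are left as NaN, which tree-based models
    such as XGBoost handle natively.
    """
    row = pd.DataFrame([patient]).reindex(columns=feature_columns)
    return float(model.predict_proba(row)[:, 1][0])

# Hypothetical usage:
# risk = estimate_bleeding_risk(
#     model,
#     {"age_at_index": 78, "baseline_creatinine": 1.6,
#      "baseline_hemoglobin": 10.2, "cancer_history": 1},
#     feature_columns,
# )
```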

Conclusion: This study, analyzing one of the largest cohorts of anticoagulated patients and incorporating over 1,000 clinical and laboratory features, is among the first to apply interpretable ML models for major bleeding prediction across multiple hospitals. By highlighting key risk factors, including malignancy and renal function, the model provides a scalable framework for precision risk stratification. Its implementation may enable targeted interventions for high-risk patients while reducing monitoring intensity in the lowest-risk quintile, where bleeding incidence was only 0.4%.
