Background Accurate diagnosis and risk stratification in myeloproliferative neoplasms (MPNs) rely on expert interpretation of complex clinical records. Outside major academic centers, specialist expertise is scarce, and time-consuming chart reviews extend clinic visits and delay treatment decisions, underscoring the need for automated, point-of-care phenotyping for MPN patients.

Methods Patients with ICD codes for polycythemia vera (PV), essential thrombocythemia (ET) or myelofibrosis (MF) were identified from the electronic health record. Cases were defined as patients who met the 2017 WHO diagnostic criteria for PV, ET or MF or who had a definitive MPN diagnosis assigned by a hematologist. Controls were patients who neither met the diagnostic criteria nor had a hematologist-assigned diagnosis. Two clinicians extracted demographic, clinical (symptoms, thrombotic history), laboratory (complete blood count, serum chemistry), genetic (mutations, karyotype), bone marrow morphology, diagnostic mention and prognostic score information from unstructured clinical notes and pathology reports. Prognostic scores included the conventional risk model for PV, the International Prognostic Score of Thrombosis (IPSET) for ET and the Dynamic International Prognostic Scoring System (DIPSS), DIPSS-plus or Mutation-enhanced International Prognostic Scoring System (MIPSS-70) for MF. Discrepancies in data extraction were resolved by consensus. A random sample of 510 patients (PV: 85, ET: 85, MF: 85, controls: 255) was split into a prompt development set (n=60) for iterative workflow engineering and a test set (n=450) for evaluation.
The workflow included: (i) a rule-based algorithm for automated retrieval of demographic details, the first bone marrow biopsy report, the first clinical note after biopsy and the latest laboratory results within the 90 days preceding biopsy; (ii) a HIPAA-compliant LLM (GPT-4 Turbo) for data extraction from the clinical notes and pathology reports; (iii) a regular expression-based algorithm for post-processing extracted data into structured tables; (iv) a clinician-informed source prioritization algorithm to select data from preferred sources; (v) a modified WHO-informed diagnostic characterization tool incorporating hematologist-assigned diagnoses to distinguish cases from controls; and (vi) an NCCN guidelines-based algorithm for prognostic tool selection and risk stratification based on available data. Structured prompts were used to extract demographic, clinical, laboratory, genetic and pathological data in a zero-shot deterministic setting. The performance of automated data extraction, diagnostic characterization and prognostication was compared against the manually curated data. Accuracy and F1 scores were computed for the data extraction and prognostication tasks, while sensitivity and specificity were calculated to assess diagnostic performance.
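The evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the one-vs-rest treatment of each diagnosis and the toy labels are assumptions made for illustration only, and the labels are not study data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # automated and manual labels both equal the target class
    fp: int  # automated label is the target class, manual label is not
    tn: int  # neither label is the target class
    fn: int  # manual label is the target class, automated label is not


def confusion(manual, automated, positive):
    """Count one-vs-rest outcomes for the class `positive` (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(manual, automated):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def safe_div(numerator, denominator):
    """Return NaN instead of raising when a class never occurs."""
    return numerator / denominator if denominator else float("nan")


def sensitivity(c):
    return safe_div(c.tp, c.tp + c.fn)


def specificity(c):
    return safe_div(c.tn, c.tn + c.fp)


def f1_score(c):
    return safe_div(2 * c.tp, 2 * c.tp + c.fp + c.fn)


def accuracy(manual, automated):
    matches = sum(t == p for t, p in zip(manual, automated))
    return safe_div(matches, len(manual))


# Toy labels (assumed): manual curation is the reference standard.
manual = ["PV", "PV", "ET", "MF", "control", "control"]
automated = ["PV", "PV", "ET", "MF", "PV", "control"]

pv = confusion(manual, automated, positive="PV")
print(f"PV sensitivity={sensitivity(pv):.2f} specificity={specificity(pv):.2f}")
print(f"PV F1={f1_score(pv):.2f} overall accuracy={accuracy(manual, automated):.2f}")
```

In this toy example one control is mislabeled as PV, so PV sensitivity stays at 1.00 while specificity drops to 0.75. A sensitivity of 100% paired with a specificity below 100% arises in the same way.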

Results A total of 450 (PV: 150, ET: 150, MF: 150) pathology reports and 172 (PV: 52, ET: 55, MF: 65) clinical notes were identified for the 450 test set patients. The test set data extraction accuracy was 95% (reports: 97%, notes: 92%) for PV, 93% (reports: 92%, notes: 94%) for ET and 98% (reports: 99%, notes: 97%) for MF. The test set F1 score was 0.95 (reports: 0.96, notes: 0.93) for PV, 0.95 (reports: 0.95, notes: 0.96) for ET and 0.98 (reports: 0.99, notes: 0.97) for MF. PV was diagnosed with 100% sensitivity and 98% specificity, whereas ET and MF were each diagnosed with 100% sensitivity and 100% specificity. The accuracy of risk stratification was 100% for PV and ET and 94% for MF. The F1 score for risk stratification was 1.00 for PV and ET and 0.94 for MF.

Conclusions The first-in-class automated framework successfully identified pertinent clinical information, performed accurate data extraction from unstructured records and enabled clinician-guided source prioritization for accurate diagnosis and risk stratification of MPN patients. Prospective validation of the framework is required for widespread clinical implementation.
