Abstract
Myeloproliferative neoplasms (MPNs) are a group of hematologic malignancies characterized by the clonal proliferation of one or more hematopoietic cell lineages, manifested as terminal myeloid cell expansion into the peripheral blood. The most commonly diagnosed Philadelphia chromosome-negative (Ph-) MPNs are essential thrombocythemia (ET), polycythemia vera (PV), and myelofibrosis (MF).
Response to treatment is categorized according to multiple factors, including laboratory values; transfusion dependence; degree of splenomegaly; mutational status and variant allelic frequency (VAF); cytogenetic abnormalities; and bone marrow morphological features, such as myeloblasts, fibrosis, and the volume ratio of hematopoietic stem cells in bone marrow [1]. Many of these components exist in electronic health records (EHRs), but they are documented with varying degrees of structure: some are recorded as tabular values, whereas others are captured purely in free text (Table 1). Nearly all clinical trials and reviews studying patients with MPNs therefore rely on expert adjudication of response based on these data points.
The current process for determining response involves manual review of the patient's EHR data, an arduous task requiring extensive human effort. To reduce this burden, the Richard T. Silver Myeloproliferative Neoplasm Center at Weill Cornell Medicine (WCM) worked in tandem with the Architecture for Research Computing in Health (ARCH) program to develop a method for assessing response in patients with MPNs. A research data repository (RDR) containing data from both outpatient and inpatient EHRs was designed to allow for computational assessment of response.
Structured data elements, including laboratory values, mutational data, and VAF, were extracted from the EHR. A natural language processing (NLP) pipeline using the Leo framework was developed [2] to extract data on cellularity, reticulin fibrosis, and myeloblast count from bone marrow biopsy pathology reports (Figure 1). Other data points, including splenomegaly and symptom burden, continue to require interpretation and manual collection by trained research personnel, who enter these data points into REDCap (Research Electronic Data Capture), a WCM-provisioned secure web application for managing clinical databases. These manual data are then pivoted and loaded into a Microsoft SQL Server environment [3]. Data generated by the MPN RDR were frequently reviewed for quality control, which drove subsequent iterations designed to minimize any identified errors.
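To illustrate the kind of free-text extraction described above, the following is a minimal sketch in Python. It is not the Leo-based pipeline: the regular expressions, the function name extract_marrow_features, and the sample report wording are all assumptions made for illustration, and real pathology reports vary far more than these patterns allow.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical patterns; actual report phrasing differs between pathologists and sites.
CELLULARITY = re.compile(r"cellularity[^.\d]{0,40}?(\d{1,3})\s*%", re.IGNORECASE)
FIBROSIS = re.compile(r"reticulin\s+fibrosis[\s:-]*(?:grade\s*)?([0-3])\b", re.IGNORECASE)
BLASTS = re.compile(r"blasts?[^.\d]{0,40}?(\d{1,2}(?:\.\d+)?)\s*%", re.IGNORECASE)

Value = Optional[Union[int, float]]


def extract_marrow_features(report: str) -> Dict[str, Value]:
    """Return the first cellularity, fibrosis grade, and blast percentage found.

    A feature that is not mentioned in the report is returned as None.
    """

    def first(pattern: "re.Pattern[str]", cast) -> Value:
        match = pattern.search(report)
        return cast(match.group(1)) if match else None

    return {
        "cellularity_pct": first(CELLULARITY, float),
        "reticulin_grade": first(FIBROSIS, int),
        "blast_pct": first(BLASTS, float),
    }
```

A sketch like this discards qualifiers such as "less than" or "approximately" and ignores negation, which is one reason extracted values would still need the kind of quality-control review described above.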
After acceptable confidence levels had been achieved, the MPN RDR was queried for data that contributed to response assessments [1] in a retrospective review of patients with PV. Fully automated response parameters included hematocrit, platelet, and white blood cell count values; cellularity; reticulin fibrosis; and JAK2V617F VAF. Partially automated response assessments included rates of phlebotomy. Manually collected response criteria included symptom burden, degree of palpable splenomegaly, and indications of hemorrhagic/thrombotic events. Extracted data were merged with manually collected data within clinically justified temporal windows and applied to PV response criteria. The process provided valuable insight into potential modifications to the extraction process.
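The temporal-window merge described above can be sketched as follows, in Python using only the standard library. Everything specific here is an assumption for illustration: the function names, the 30-day window, and the blood count thresholds are placeholders and are not taken from this abstract. The actual windows and criteria would come from clinical judgment and the response criteria cited in [1].

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Series = List[Tuple[date, float]]  # (collection date, value) pairs for one lab test


def nearest_within_window(target: date, observations: Series, window_days: int) -> Optional[float]:
    """Return the value collected closest to target, or None if none falls in the window."""
    window = timedelta(days=window_days)
    candidates = [(abs(d - target), v) for d, v in observations if abs(d - target) <= window]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]


def blood_count_remission(
    assessment_date: date,
    hematocrit: Series,
    platelets: Series,
    wbc: Series,
    window_days: int = 30,   # assumed window, not from the abstract
    hct_max: float = 45.0,   # illustrative threshold (%)
    plt_max: float = 400.0,  # illustrative threshold (x10^9/L)
    wbc_max: float = 10.0,   # illustrative threshold (x10^9/L)
) -> Optional[bool]:
    """Evaluate one blood count component; None means not evaluable at this date."""
    values = [
        nearest_within_window(assessment_date, series, window_days)
        for series in (hematocrit, platelets, wbc)
    ]
    if any(v is None for v in values):
        return None
    hct, plt, white = values
    return hct < hct_max and plt <= plt_max and white < wbc_max
```

Returning None rather than False when a value is missing from the window keeps "not evaluable" distinct from "criterion not met", so such cases can be routed to manual review.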
Future steps include extending these processes to criteria still dependent on manual collection. Efforts are currently under way to apply NLP to hepatosplenomegaly and cytogenetics reports. This process will also be used to assist in response assessments for the other MPN subtypes. A fully comprehensive computable approach to assessing response in MPNs may be neither entirely feasible nor even advisable. Rather, the objective is that clinical data collection, which has historically been onerous, become automated to the furthest extent possible, allowing research personnel to focus on the data elements that do require manual adjudication. Adaptation of a similar workflow may help other institutions expand their ability to assess response in patients with MPNs and, potentially, additional hematologic malignancies.
No relevant conflicts of interest to declare.