Data extraction from 13 included studies
Reference | Setting (country, hospital type) | Type of free-text data and language | Cohort size (number of reports), characteristics, true positives | Training approach | Text processing approach | ML approach | Performance measure∗ |
---|---|---|---|---|---|---|---|
Banerjee et al,16 2018 PMID 29175548 | United States, academic | Radiology (CT chest) reports, English | 4512 reports from 1 hospital; 254/858 true positives in external validation set | 3512 for training, 1000 for testing; 10-fold cross validation | Intelligent word embedding; combines semantic-dictionary mapping and neural embedding | Binary LR models (LASSO) | PE. Internal validation (n = 1000): AUC, 0.95; Precision, 97.25%; Recall, 96.70%; F1 score, 0.97. External validation (n = 858): AUC, 0.96; Precision, 93.03%; Recall, 93.02%; F1 score, 0.94 |
Banerjee et al,17 2019 PMID 30477892 | United States, academic | Radiology reports (CT chest), English | 4512 reports from 1 hospital; true positives not reported | 2512 reports for training, 1000 for calibration, 1000 for testing | Global Vectors for Word Representation (GloVe); novel domain phrase hierarchy | CNN model; HNN (without attention mechanism); A-HNN; DPA-HNN | PE (DPA-HNN). Internal validation (n = 1000): AUC, 0.99; Precision, 0.99; Recall, 0.99; F1 score, 0.99. External validation 1 (n = 1000): AUC, 0.94; Precision, 0.94; Recall, 0.81; F1 score, 0.86. External validation 2 (n = 1000): AUC, 0.93; Precision, 0.80; Recall, 0.80; F1 score, 0.80. External validation 3 (n = 858): AUC, 0.95; Precision, 0.87; Recall, 0.87; F1 score, 0.87 |
Chen et al,18 2018 PMID 29135365 | United States, academic | Radiology (CT chest) reports, English | 117 915 reports from 1 hospital; 38/1000 true positives in internal validation set; 279/859 true positives in external validation set | 2500 for training with resampling, 1000 reports for calibration, 1000 for testing | GloVe | CNN model using TensorFlow | PE. Internal validation (n = 1000): Sensitivity, 0.950; Specificity, 0.997; Accuracy, 0.995; F1 score, 0.938. External validation (n = 859): Sensitivity, 0.952; Specificity, 0.905; Accuracy, 0.921; F1 score, 0.891 |
Danilov et al,19 2022 PMID 35062094 | Russia, academic | All clinical notes, Russian | 621 medical cases from 1 hospital; 139/621 true positives | 300 for training with resampling, training/testing ratio 80%/20% | Semiautomatic IEA | RF, LR, SVM (linear, radial, and polynomial kernels), and K-nearest neighbors | PE (RF): Sensitivity, 0.959; Specificity, 0.976; PPV, 0.920; Accuracy, 0.950; F1 score, 0.937 |
Dantes et al,20 2018 PMID 29087984 | United States, academic | Radiology reports (duplex ultrasound of extremity, CTA chest, or MRI chest), English | 2551 reports from 1 hospital; true positives not reported | 4-5 reports for training | IDEAL-X | IDEAL-X online ML mode, not further specified | DVT/PE: Sensitivity, 92% (95% CI, 88.3-96.1); Specificity, 99% (95% CI, 98.5-99.4) |
Fiszman et al,27 1998 PMID 9929341 | United States, community | Radiology reports (V/Q lung scans), English | 572 reports from 1 hospital; true positives not reported | 200 for training, 372 for testing | Rule-based | Bayesian networks | PE: Precision, 0.88; Recall, 0.92 |
Pham et al,21 2014 PMID 25099227 | France, academic | Radiology reports (CTA/CTV chest), French | 573 reports from 1 hospital; true positives not reported | 100 randomly selected reports formed the test set. From the remaining reports, the number of positive reports was tripled and the number of negative reports increased to match; this formed the training set. | Human annotation with simple segmentation and tokenization | Naïve Bayes classifier (Weka) used initially to identify optimal feature sets, then Wapiti implementations of SVM and maximum entropy (MaxEnt) | DVT/PE (MaxEnt): Precision, 1.00; Recall, 0.96; F1 score, 0.98 |
Rochefort et al,22 2014 PMID 25332356 | Canada, academic | Radiology reports, English | 2000 reports from 1649 patients from 5 hospitals; 121/2000 true positives for PE, 259/2000 true positives for DVT | 10-fold cross validation | Bag of words | SVM | DVT: Sensitivity, 0.80 (95% CI, 0.76-0.85); PPV, 0.89 (95% CI, 0.85-0.93); AUC, 0.98 (95% CI, 0.97-0.99). PE: Sensitivity, 0.79 (95% CI, 0.73-0.85); PPV, 0.84 (95% CI, 0.75-0.92); AUC, 0.99 (95% CI, 0.98-1.00) |
Selby et al,23 2018 PMID 30056994 | United States, academic | Radiology reports (duplex ultrasound of extremity or CTA chest), English | 2746 reports from 2206 post-operative patients from 1 hospital; 27/506 true positives for PE, 259/2000 true positives for DVT | Data set split into 70% for training, 30% for testing | Bag of words | Weka; specific model not specified | DVT: Sensitivity, 85.1%; Specificity, 94.6%; PPV, 78.4%; NPV, 96.5%. PE: Sensitivity, 90.0%; Specificity, 98.7%; PPV, 81.8%; NPV, 99.3% |
Shah et al,26 2020 PMID 32600201 | United States, academic | All clinical notes, English | 1000 notes from 1 hospital; true positives not reported | 400 for training, 600 for testing | Rule-based | Model not specified; used the tool Extractor from CloudMedX | DVT/PE: Accuracy, 90.0%; Sensitivity, 97.0%; Specificity, 86.0% |
Weikert et al,24 2020 PMID 32135443 | Switzerland, academic | Radiology reports (CTA chest), German | 4397 reports from 1 hospital; 209/1377 true positives | 2801 reports (all reports from 2016-2017) used for training, 1377 reports (from 2018) used for testing; 3-fold cross validation | Term frequency-inverse document frequency (tf-idf) and word2vec model | SVM and RF using Scikit; CNN using TensorFlow | PE (CNN): Sensitivity, 97.7% (95% CI, 94.6-99.2); Specificity, 99.4% (95% CI, 98.8-99.8); PPV, 96.8% (95% CI, 93.5-98.4); NPV, 99.6% (95% CI, 99.0-99.8); Accuracy, 99.1% (95% CI, 98.5-99.6); F1 score, 0.972 (95% CI, 0.963-0.981) |
Wendelboe et al,28 2022 PMID 37206160 | United States, academic | Radiology reports (CTA chest, duplex ultrasound of extremity, V/Q lung scans), English | 1591 reports from 1 hospital and 1487 reports from another hospital, for a total of 3078 reports; 1204/3078 true positives | Training based on Dantes et al20 | IDEAL-X | IDEAL-X online ML mode, not further specified | DVT/PE: Accuracy, 93.7 (95% CI, 93.7-93.8); Sensitivity, 96.3 (95% CI, 96.2-96.4); Specificity, 92 (95% CI, 91.9-92); PPV, 89.1 (95% CI, 89-89.2); NPV, 97.3 (95% CI, 97.3-97.4) |
Yu et al,25 2014 PMID 25117751 | United States, academic | Radiology reports (CTA chest), English | 10 330 reports from 1 hospital; 1972/10 330 true positives | 50% for training, 50% for testing | Rule-based NILE system, output converted to numeric features | LR with adaptive LASSO penalty | PE: PPV, 0.95; NPV, 0.99; AUC, 0.998 ± 0.005; F1 score, 0.96 |
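A-HNN, attention-based hierarchical neural network; AUC, area under the receiver operating characteristic curve; CI, confidence interval; CNN, convolutional neural network; CT, computed tomography; CTA, computed tomography angiography; CTV, computed tomography venography; DPA-HNN, domain phrase attention-based hierarchical neural network; DVT, deep vein thrombosis; GloVe, Global Vectors for Word Representation; HNN, hierarchical neural network; IEA, information extraction algorithm; LASSO, least absolute shrinkage and selection operator; LR, logistic regression; MaxEnt, maximum entropy; ML, machine learning; MRI, magnetic resonance imaging; NILE, narrative information linear extraction; NPV, negative predictive value; PE, pulmonary embolism; PPV, positive predictive value; RF, random forest; SVM, support vector machine; tf-idf, term frequency-inverse document frequency; V/Q, ventilation/perfusion.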
∗If multiple models were used, the model with the best performance measure is reported.
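Several rows report precision (PPV), recall (sensitivity), and F1 score together. As a reading aid, the minimal sketch below shows how F1 follows from the other two as their harmonic mean. It is not taken from any of the included studies; the function name `f1_score` is a hypothetical helper introduced here, and the inputs are the values tabulated for Banerjee et al (2018, internal validation) and Pham et al (2014).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        # Conventional value when both precision and recall are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Tabulated precision and recall, expressed as proportions.
banerjee_internal = f1_score(0.9725, 0.9670)  # Banerjee et al, 2018, internal validation
pham_maxent = f1_score(1.00, 0.96)            # Pham et al, 2014, MaxEnt model

print(round(banerjee_internal, 2))  # 0.97
print(round(pham_maxent, 2))        # 0.98
```

Both results agree with the F1 scores listed in the table after rounding to 2 decimal places.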