Table 1.

Data extraction from 13 included studies

| Reference | Setting (country, hospital type) | Type of free-text data and language | Cohort size (number of reports), characteristic, true positive | Training approach | Text processing approach | ML approach | Performance measure |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Banerjee et al,16 2018; PMID 29175548 | United States, academic | Radiology (CT chest) reports, English | 4512 reports from 1 hospital; 254/858 true positives in external validation set | 3512 for training, 1000 for testing; 10-fold cross validation | Intelligent word embedding; combines semantic-dictionary mapping and neural embedding | Binary LR models (LASSO) | PE. Internal validation (n = 1000): AUC, 0.95; Precision, 97.25%; Recall, 96.70%; F1 score, 0.97. External validation (n = 858): AUC, 0.96; Precision, 93.03%; Recall, 93.02%; F1 score, 0.94 |
| Banerjee et al,17 2019; PMID 30477892 | United States, academic | Radiology reports (CT chest), English | 4512 reports from 1 hospital; true positives not reported | 2512 reports for training, 1000 for calibration, 1000 for testing | Global Vectors for Word Representation (GloVe); novel domain phrase hierarchy | CNN model; HNN (without attention mechanism); A-HNN; DPA-HNN | PE, DPA-HNN. Internal validation (n = 1000): AUC, 0.99; Precision, 0.99; Recall, 0.99; F1 score, 0.99. External validation 1 (n = 1000): AUC, 0.94; Precision, 0.94; Recall, 0.81; F1 score, 0.86. External validation 2 (n = 1000): AUC, 0.93; Precision, 0.80; Recall, 0.80; F1 score, 0.80. External validation 3 (n = 858): AUC, 0.95; Precision, 0.87; Recall, 0.87; F1 score, 0.87 |
| Chen et al,18 2018; PMID 29135365 | United States, academic | Radiology (CT chest) reports, English | 117 915 reports from 1 hospital; 38/1000 true positives in internal validation set; 279/859 true positives in external validation set | 2500 for training with resampling, 1000 reports for calibration, 1000 for testing | GloVe | CNN model using TensorFlow | PE. Internal validation (n = 1000): Sensitivity, 0.950; Specificity, 0.997; Accuracy, 0.995; F1 score, 0.938. External validation (n = 859): Sensitivity, 0.952; Specificity, 0.905; Accuracy, 0.921; F1 score, 0.891 |
| Danilov et al,19 2022; PMID 35062094 | Russia, academic | All clinical notes, Russian | 621 medical cases from 1 hospital; 139/621 true positives | 300 for training with resampling, training/testing ratio 80%/20% | Semiautomatic IEA | RF, LR, SVM with kernel types linear, radial, and polynomial (poly), and K-nearest neighbors | PE, RF: Sensitivity, 0.959; Specificity, 0.976; PPV, 0.920; Accuracy, 0.950; F1 score, 0.937 |
| Dantes et al,20 2018; PMID 29087984 | United States, academic | Radiology reports (duplex ultrasound of extremity, CTA chest, or MRI chest), English | 2551 reports from 1 hospital; true positives not reported | 4-5 reports for training | IDEAL-X | IDEAL-X online ML mode, not further specified | DVT/PE: Sensitivity, 92% (95% CI, 88.3-96.1); Specificity, 99% (95% CI, 98.5-99.4) |
| Fiszman et al,27 1998; PMID 9929341 | United States, community | Radiology reports (V/Q lung scans), English | 572 reports from 1 hospital; true positives not reported | 200 for training, 372 for testing | Rule-based | Bayesian networks | PE: Precision, 0.88; Recall, 0.92 |
| Pham et al,21 2014; PMID 25099227 | France, academic | Radiology reports (CTA/CTV chest), French | 573 reports from 1 hospital; true positives not reported | Randomly selected 100 reports to form test set. With the remaining set, tripled the number of positive reports and increased negative reports to match that number; this formed the training set. | Human annotation with simple segmentation and tokenization | Initially used a Naïve Bayes classifier using Weka to identify optimal feature sets, then used Wapiti implementations of SVM and maximum entropy (MaxEnt) | DVT/PE, MaxEnt: Precision, 1.00; Recall, 0.96; F1 score, 0.98 |
| Rochefort et al,22 2014; PMID 25332356 | Canada, academic | Radiology reports, English | 2000 reports from 1649 patients from 5 hospitals; 121/2000 true positives for PE, 259/2000 true positives for DVT | 10-fold cross validation | Bag of words | SVM | DVT: Sensitivity, 0.80 (95% CI, 0.76-0.85); PPV, 0.89 (95% CI, 0.85-0.93); AUC, 0.98 (95% CI, 0.97-0.99). PE: Sensitivity, 0.79 (95% CI, 0.73-0.85); PPV, 0.84 (95% CI, 0.75-0.92); AUC, 0.99 (95% CI, 0.98-1.00) |
| Selby et al,23 2018; PMID 30056994 | United States, academic | Radiology reports (duplex ultrasound of extremity or CTA chest), English | 2746 reports from 2206 post-operative patients from 1 hospital; 27/506 true positives for PE, 259/2000 true positives for DVT | Data set split into 70% training, 30% for testing | Bag of words | Weka; specific model was not specified | DVT: Sensitivity, 85.1%; Specificity, 94.6%; PPV, 78.4%; NPV, 96.5%. PE: Sensitivity, 90.0%; Specificity, 98.7%; PPV, 81.8%; NPV, 99.3% |
| Shah et al,26 2020; PMID 32600201 | United States, academic | All clinical notes, English | 1000 notes from 1 hospital; true positives not reported | 400 for training, 600 for testing | Rule-based | Model not specified, used the tool Extractor from CloudMedX | DVT/PE: Accuracy, 90.0%; Sensitivity, 97.0%; Specificity, 86.0% |
| Weikert et al,24 2020; PMID 32135443 | Switzerland, academic | Radiology reports (CTA chest), German | 4397 reports from 1 hospital; 209/1377 true positives | 2801 reports (all reports from years 2016-2017) used for training, 1377 reports (from year 2018) used for testing; 3-fold cross validation | Term frequency-inverse document frequency (tf-idf) and word2vec model | SVM and RF using Scikit; CNN using TensorFlow | PE, CNN: Sensitivity, 97.7% (95% CI, 94.6-99.2); Specificity, 99.4% (95% CI, 98.8-99.8); PPV, 96.8% (95% CI, 93.5-98.4); NPV, 99.6% (95% CI, 99.0-99.8); Accuracy, 99.1% (95% CI, 98.5-99.6); F1 score, 0.972 (95% CI, 0.963-0.981) |
| Wendelboe et al,28 2022; PMID 37206160 | United States, academic | Radiology reports (CTA chest, duplex ultrasound of extremity, V/Q lung scans), English | 1591 reports from 1 hospital, 1487 reports from another hospital for a total of 3078 reports; 1204/3078 true positives | Training based on Dantes et al20 | IDEAL-X | IDEAL-X online ML mode, not further specified | DVT/PE: Accuracy, 93.7 (95% CI, 93.7-93.8); Sensitivity, 96.3 (95% CI, 96.2-96.4); Specificity, 92 (95% CI, 91.9-92); PPV, 89.1 (95% CI, 89-89.2); NPV, 97.3 (95% CI, 97.3-97.4) |
| Yu et al,25 2014; PMID 25117751 | United States, academic | Radiology reports (CTA chest), English | 10 330 reports from 1 hospital; 1972/10 330 true positives | 50% for training, 50% for testing | Rule-based NILE system, output converted to numeric features | LR with adaptive LASSO penalty | PE: PPV, 0.95; NPV, 0.99; AUC, 0.998 ± 0.005; F1 score, 0.96 |

A-HNN, attention-based hierarchical neural network; CTA, computed tomography angiography; CTV, computed tomography venography; DPA-HNN, domain phrase attention-based hierarchical neural network; HNN, hierarchical neural network; IEA, information extraction algorithm; LASSO, least absolute shrinkage and selection operator; MaxEnt, maximum entropy; MRI, magnetic resonance imaging; NILE, narrative information linear extraction; PE, pulmonary embolism; RF, random forest; V/Q, ventilation/perfusion.

If multiple models were used, the model with the best performance measure is reported.
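
As a reading aid that is not part of the original table, the F1 score listed in the performance column is the harmonic mean of precision (PPV) and recall (sensitivity). The sketch below uses a hypothetical helper, `f1_score`, to recompute F1 for two rows whose reported inputs reproduce the reported value after rounding. Other rows may not match exactly, for instance if a study reported averaged or differently rounded figures.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0  # convention when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Pham et al (MaxEnt): precision 1.00, recall 0.96; table reports F1 0.98
pham_f1 = round(f1_score(1.00, 0.96), 2)

# Weikert et al (CNN): PPV 96.8%, sensitivity 97.7%; table reports F1 0.972
weikert_f1 = round(f1_score(0.968, 0.977), 3)

print(pham_f1, weikert_f1)  # 0.98 0.972
```

Because F1 ignores true negatives, it is less affected than accuracy by the low prevalence of positive reports in several of these cohorts.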
