Statistical data analyses using maximally selected χ² statistics and univariate CART analysis
Gene | Sample size, C | Sample size, B | Sample size, M | Maximally selected χ², corrected P* | Maximally selected χ², estimated normal range* | CART decision region, C† | CART decision region, B† | CART decision region, M† | CART overall error rate†
---|---|---|---|---|---|---|---|---|---
ATM | 5 | 29 | 14 | .25 | — | — | — | — | — |
BCL2 | 6 | 26 | 12 | .01 | 0.38-2.61 | — | — | — | — |
BAX | 6 | 24 | 12 | .11 | — | — | — | — | — |
CCND1 | 9 | 31 | 15 | .01 | 0.17-5.9 | — | — | > 34.4 | 0.11 |
CCND3 | 9 | 31 | 15 | .27 | — | — | — | < 0.23 | 0.33 |
CDK2 | 6 | 21 | 13 | .05 | — | — | — | > 3.13 | 0.30 |
CDK4 | 8 | 25 | 13 | < .001 | 0.38-2.65 | < 2.03 | 2.03-11.96 | > 11.96 | 0.15 |
TP53 | 9 | 34 | 14 | < .001 | 0.53-1.89 | — | — | — | — |
RB1 | 9 | 32 | 15 | .11 | — | — | — | — | — |
CDKN1B | 8 | 19 | 10 | .006 | 0.42-2.37 | < 1.85 | > 2.39 | 1.85-2.39 | 0.24 |
CDKN1A | 9 | 19 | 8 | .79 | — | — | — | — | — |
MYC | 7 | 31 | 14 | .002 | 0.39-2.55 | — | — | — | — |
SELL | 8 | 32 | 6 | .003 | 0.37-2.73 | — | — | — | — |
E2F1 | 6 | 24 | 12 | .24 | — | — | — | > 3.83 | 0.24 |
TFDP2 | 10 | 24 | 15 | .03 | — | — | — | — | — |
ETV5 | 8 | 28 | 15 | < .001 | 0.09-10.9 | — | — | — | — |
TNFSF10 | 6 | 26 | 11 | .13 | — | — | — | > 9.63 | 0.28 |
*Absolute values of the log-transformed normalized data were used to test whether a cutoff in expression exists that discriminates tumor from control samples by overexpression or underexpression. If the corrected P value of the maximally selected χ² statistic was less than or equal to .01, we assumed that such a cutoff existed. In these cases, the cutoff was estimated as the value at which the χ² statistic attains its maximum, and a one-sided 99% confidence interval for the cutoff was computed from 1000 bootstrap samples. The upper boundary of this confidence interval defined the limits for overexpression and underexpression on the log scale. Back-transformed into the original data scale, these limits describe a normal range of gene expression relative to the expression observed in tumors. Because the test used absolute log values, each range is symmetric on the log scale, so its lower limit is the reciprocal of its upper limit (for example, 0.38 ≈ 1/2.61 for BCL2).
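The cutoff procedure in the * footnote can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: natural logarithms are used, every observed |log x| is tried as a candidate cutoff, the Pearson χ² statistic has no continuity correction, bootstrap samples are drawn separately within the tumor and control groups, and the one-sided upper confidence bound is taken as the 99th percentile of the bootstrap cutoffs. The corrected P value, which must account for scanning many cutoffs, is not computed here. All function names and the example data are hypothetical.

```python
import math
import random

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def max_chi2_cutoff(values, is_tumor):
    """Try each observed |log x| as a cutoff; return (cutoff, maximal chi-square).

    A sample counts as 'abnormal' (over- or underexpressed) if |log x| > cutoff.
    Returns (None, -1.0) if there is no value to split on.
    """
    scores = [abs(math.log(v)) for v in values]
    n_tumor = sum(1 for t in is_tumor if t)
    n_control = len(is_tumor) - n_tumor
    best_cut, best_stat = None, -1.0
    for cut in sorted(set(scores))[:-1]:  # the largest score cannot split the data
        a = sum(1 for s, t in zip(scores, is_tumor) if t and s > cut)      # tumor, abnormal
        c = sum(1 for s, t in zip(scores, is_tumor) if not t and s > cut)  # control, abnormal
        stat = chi2_2x2(a, n_tumor - a, c, n_control - c)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat

def bootstrap_upper_bound(values, is_tumor, n_boot=1000, level=0.99, seed=1):
    """One-sided upper confidence bound for the cutoff (percentile bootstrap,
    resampling tumors and controls separately; an assumption of this sketch)."""
    rng = random.Random(seed)
    tumors = [v for v, t in zip(values, is_tumor) if t]
    controls = [v for v, t in zip(values, is_tumor) if not t]
    labels = [True] * len(tumors) + [False] * len(controls)
    cuts = []
    for _ in range(n_boot):
        sample = [rng.choice(tumors) for _ in tumors] + [rng.choice(controls) for _ in controls]
        cut, _ = max_chi2_cutoff(sample, labels)
        if cut is not None:
            cuts.append(cut)
    if not cuts:
        raise ValueError("no bootstrap sample admitted a cutoff")
    cuts.sort()
    return cuts[max(0, math.ceil(level * len(cuts)) - 1)]

def normal_range(upper_log_bound):
    """Back-transform a log-scale limit u into the range [exp(-u), exp(u)]."""
    return math.exp(-upper_log_bound), math.exp(upper_log_bound)

# Hypothetical example data (expression relative to a reference; not from the study).
controls = [0.8, 1.1, 0.9, 1.25, 1.0, 0.95]
tumors = [3.5, 0.2, 4.1, 1.05, 5.0, 0.15, 2.9, 6.2]
values = controls + tumors
is_tumor = [False] * len(controls) + [True] * len(tumors)

cut, stat = max_chi2_cutoff(values, is_tumor)
u = bootstrap_upper_bound(values, is_tumor)
print(f"cutoff={cut:.3f} chi2={stat:.2f} normal range={normal_range(u)}")
```

With these example values the maximal χ² (10.5) is reached at the largest control score, |log 0.8| ≈ 0.223. Back-transforming a log-scale bound of ln 2.61 with `normal_range` reproduces the reciprocal pattern in the table: approximately 0.38 to 2.61, as for BCL2.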
†For each gene, a univariate CART analysis was performed, yielding decision trees that discriminate the three sample groups (B-CLL, MCL, and control). The trees were grown using the entropy index and pruned using the cross-validated misclassification rate. Overall error rates are presented together with the decision regions of the trees. Error rates and regions are not shown if the tree could not be validated by 10-fold cross-validation.
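A univariate version of the CART step can be sketched as below. It is a simplified stand-in, not the authors' software: trees are grown with the entropy criterion on a single expression variable, but instead of CART's cost-complexity pruning the sketch selects the tree depth with the lowest 10-fold cross-validated misclassification rate. The function names, depth limit, minimum node size, and synthetic data are all hypothetical.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def grow(xs, ys, depth, min_size=2):
    """Grow a tree on one variable: ('leaf', label) or ('split', thr, left, right)."""
    if depth == 0 or len(set(ys)) == 1 or len(ys) < min_size:
        return ("leaf", majority(ys))
    pairs = sorted(zip(xs, ys))
    parent, best = entropy(ys), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates tied values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = parent - (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    if best is None or best[0] <= 0:
        return ("leaf", majority(ys))
    thr = best[1]
    lo = [(x, y) for x, y in zip(xs, ys) if x <= thr]
    hi = [(x, y) for x, y in zip(xs, ys) if x > thr]
    return ("split", thr,
            grow([x for x, _ in lo], [y for _, y in lo], depth - 1, min_size),
            grow([x for x, _ in hi], [y for _, y in hi], depth - 1, min_size))

def predict(tree, x):
    while tree[0] == "split":
        tree = tree[2] if x <= tree[1] else tree[3]
    return tree[1]

def cv_error(xs, ys, depth, k=10, seed=0):
    """k-fold cross-validated misclassification rate of a depth-limited tree."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    errors = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tree = grow([xs[i] for i in train], [ys[i] for i in train], depth)
        errors += sum(predict(tree, xs[i]) != ys[i] for i in fold)
    return errors / len(xs)

def fit_pruned(xs, ys, max_depth=4, k=10):
    """Pick the depth with the lowest CV error (ties go to the smaller tree)."""
    errs = {d: cv_error(xs, ys, d, k) for d in range(max_depth + 1)}
    best_depth = min(errs, key=lambda d: (errs[d], d))
    return grow(xs, ys, best_depth), errs[best_depth]

# Synthetic, well-separated groups (C = control, B = B-CLL, M = MCL); not study data.
xs = [1.0, 1.1, 1.2, 0.9, 1.05, 0.95, 1.15, 1.3,
      5.0, 5.5, 6.0, 4.8, 5.2, 6.1, 5.9, 4.9,
      20.0, 22.0, 19.0, 25.0, 21.0, 23.0, 24.0, 18.0]
ys = ["C"] * 8 + ["B"] * 8 + ["M"] * 8
tree, err = fit_pruned(xs, ys)
print(tree, err)
```

Here a depth-2 tree separates the three synthetic groups with zero cross-validated error, and its two thresholds play the role of the decision regions in the table. One plausible reading of "could not be validated" is that cross-validation favors a single leaf (depth 0), in which case no regions would be reported.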
C indicates control; B, B-CLL; and M, MCL.