Unsupervised cluster analysis of CD4+ TH populations. (A) Clustered heat map showing samples (columns) and T-cell subsets (rows). For each subset, the frequency data (the parent population is given in braces) are log-transformed and then expressed as the number of SDs above (yellow) or below (cyan) the mean across all samples. Samples are clustered within each tissue type by Euclidean distance, and subsets are clustered by correlation coefficient. Dendrograms are constructed using complete linkage. (B) Same as panel A, but all samples are clustered together to show how the samples group independently of tissue type. (C) Principal component (PC) analysis of the dataset. The first 2 PCs of the 21 original variables (normalized cell subset frequencies) are shown. Each symbol represents an NLN (green circle), FL (purple triangle), or RLN (red star) sample. Most of the variance in the data is explained by the first 3 PCs (see the Pareto plot inset, which shows that the first 3 PCs account for ∼80% of the total variance), indicating that several of the cell subsets are correlated with one another. Vectors radiating from the origin show how the original variables (cell subsets) relate to the first 2 PCs: the horizontal and vertical projections of each vector indicate the relative contribution of that variable to the first and second PC, respectively. The color of each vector's cell subset label indicates the parent population used to compute the frequencies (see the key, top right corner of the plot).
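The normalization and PC analysis described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the log base (10), the small offset `eps` added before taking the log, and the helper names `zscore_log` and `pca` are all hypothetical, and the input matrix is synthetic (30 samples by 21 subsets generated from 3 latent factors), not the study data. The hierarchical clustering of panels A and B is omitted here; with SciPy it would typically use `scipy.cluster.hierarchy.linkage` with `method="complete"`.

```python
import numpy as np

def zscore_log(freqs, eps=1e-6):
    """Log-transform frequencies (samples x subsets), then express each subset
    (column) as the number of SDs above or below its mean across samples.
    Assumption: log10 with a small offset eps to avoid log(0)."""
    logged = np.log10(freqs + eps)
    return (logged - logged.mean(axis=0)) / logged.std(axis=0, ddof=1)

def pca(z):
    """PCA via SVD of a column-centered matrix (samples x variables)."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                      # sample coordinates on each PC (scatter points)
    explained = s**2 / np.sum(s**2)     # fraction of total variance per PC (Pareto plot)
    loadings = vt.T                     # rows = original variables, cols = PCs (biplot vectors)
    return scores, explained, loadings

# Hypothetical synthetic data: 30 samples x 21 subsets driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 3))
weights = rng.normal(size=(3, 21))
freqs = np.exp(latent @ weights + 0.1 * rng.normal(size=(30, 21)))  # strictly positive

z = zscore_log(freqs)
scores, explained, loadings = pca(z)

# Cumulative variance explained by the first 3 PCs.
print(np.round(np.cumsum(explained)[:3], 3))
# Biplot vector for each subset: (contribution to PC1, contribution to PC2).
pc12_vectors = loadings[:, :2]
```

Because the standardized matrix already has zero column means, the squared singular values are proportional to the variance along each PC, which is what a Pareto plot of explained variance displays.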