Fig. 3.
Gene clusters in the space of the first 2 principal components.
Principal component analysis allowed us to present the multidimensional data (in this case, the 4-dimensional expression pattern of each gene) in a simple 2-dimensional graph. We derived the 4 principal components, each of which is a linear combination of the standardized expression intensities (zero mean and unit variance) at 0, 24, 48, and 72 hours. The first 2 principal components captured most of the variation in the data (approximately 85%), so the data can be displayed in a 2-dimensional graph with only a minor loss of information. The first and second principal components, c1 and c2, are given by the linear combinations c1 = 0.747 · n1 − 0.11 · n2 − 0.656 · n3 + 0 · n4 and c2 = 0.278 · n1 + 0.353 · n2 + 0.233 · n3 − 0.863 · n4, where n1, n2, n3, and n4 are the rescaled and standardized expression levels at 0, 24, 48, and 72 hours, respectively. The axis labels c1 and c2 denote the first 2 principal components. In this paper we used the Pearson correlation to measure the similarity of each gene to the idealized expression patterns, rather than the Euclidean distance used in our previous work (19), because the clusters were better separated with this measure. In both cases, we presented the data in the 2-dimensional space of the first 2 principal components. The data tended to be circularly distributed when we used the Pearson correlation as the distance measure.
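As a rough illustration of the two operations described in this caption, the sketch below projects one gene's standardized expression levels (n1 to n4) onto c1 and c2 using the loadings quoted above, and computes a Pearson correlation between a gene and a template pattern. The function names `project` and `pearson`, the example gene, and the template values are hypothetical and are not taken from the paper; the input to `project` is assumed to be already rescaled and standardized.

```python
import math

# Loadings copied from the caption: coefficients of n1..n4 for c1 and c2.
C1 = (0.747, -0.11, -0.656, 0.0)
C2 = (0.278, 0.353, 0.233, -0.863)


def project(n):
    """Map standardized levels (n1, n2, n3, n4) to the point (c1, c2)."""
    if len(n) != 4:
        raise ValueError("expected 4 time points (0, 24, 48, 72 hours)")
    c1 = sum(w * x for w, x in zip(C1, n))
    c2 = sum(w * x for w, x in zip(C2, n))
    return c1, c2


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical values, for illustration only (not data from the paper).
    gene = (1.2, 0.3, -0.9, -0.6)      # assumed already standardized
    template = (1.0, 0.0, -1.0, 0.0)   # hypothetical idealized pattern
    print("coordinates (c1, c2):", project(gene))
    print("similarity to template:", pearson(gene, template))
```

Because the Pearson correlation ignores the offset and scale of each expression profile, genes are compared by the shape of their time course alone, which is the property the caption relies on when contrasting it with the Euclidean distance.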