## Permutation testing

With the large number of comparisons being made, one can validly object that high correlation coefficients, or extreme values of other dissimilarity measures, may arise by chance alone. One way to counter this objection is to estimate how often random chance would generate correlation coefficients of the magnitude obtained in the original analysis. Typically, this is done through permutation testing.

One can take the entire array of cases and features (i.e., samples and genes, respectively) and randomly shuffle the values within each feature column, thus breaking the links between the features. The mean and standard deviation of the expression level for each gene remain the same. One can then repeat the entire analysis being performed (e.g., dendrogram or relevance network) 10, 100, or more times, recording the strongest dissimilarity measure calculated in each run.
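
A minimal sketch of this procedure in Python, assuming the data are held in a NumPy array with samples as rows and genes as columns, and using the absolute Pearson correlation as the measure (any other dissimilarity measure could be substituted). The function names `permute_columns`, `strongest_correlation`, and `null_maxima` are hypothetical, chosen for illustration.

```python
import numpy as np

def permute_columns(data, rng):
    """Shuffle each column (gene) independently across the rows (samples).

    Every gene keeps exactly the same values, so its mean and standard
    deviation are unchanged; only the pairing between genes is broken.
    """
    permuted = data.copy()
    for j in range(permuted.shape[1]):
        permuted[:, j] = rng.permutation(permuted[:, j])
    return permuted

def strongest_correlation(data):
    """Largest |Pearson r| over all distinct gene pairs (columns)."""
    r = np.corrcoef(data, rowvar=False)
    upper = np.triu_indices_from(r, k=1)  # skip the diagonal and duplicate pairs
    return float(np.max(np.abs(r[upper])))

def null_maxima(data, n_permutations=100, seed=0):
    """Strongest |r| found in each of n_permutations shuffled copies of data."""
    rng = np.random.default_rng(seed)
    return np.array([strongest_correlation(permute_columns(data, rng))
                     for _ in range(n_permutations)])

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = rng.normal(size=(60, 200))  # hypothetical: 60 samples x 200 genes of noise
    observed = strongest_correlation(data)
    maxima = null_maxima(data, n_permutations=20)
    print(f"observed max |r| = {observed:.3f}")
    print(f"permuted max |r| ranges from {maxima.min():.3f} to {maxima.max():.3f}")
```

Shuffling each column separately, rather than whole rows, is what destroys the gene-to-gene relationships while leaving each gene's own distribution intact. If the strongest coefficient in the real data lies well beyond the per-permutation maxima, chance alone becomes an unlikely explanation for it.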

For example, assume we are analyzing a microarray data set with 60 cancers and 2,000 genes measured in each of those cancers, and we want to perform a dendrogram analysis on this data set. As before, we would start by calculating the comprehensive pairwise comparisons. Within this large set of comparisons, are high correlation coefficients likely to be generated spuriously? One way to answer this is to randomly shuffle each gene's expression measurements among the cancers and then repeat the entire set of comprehensive pairwise calculations. This permutation, and the calculation of the triangle of pairwise measures, is repeated several times, and the distribution of values from each iteration is stored. After all the iterations, we have the distribution of correlation coefficients from the original data set alongside several distributions (or their average) from the permuted, randomly shuffled sets. This is shown in figure 4.15.
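
The procedure in this example can be sketched as follows, again assuming a samples × genes NumPy array, Pearson correlation, and histograms over the interval [-1, 1]. The helper names, the bin count, and the reduced gene count in the demo are hypothetical choices for illustration. Only the upper triangle of the gene-by-gene correlation matrix is used, since the matrix is symmetric and its diagonal is trivially 1.

```python
import numpy as np

def pairwise_r(data):
    """Pearson r for every distinct gene pair (upper triangle, diagonal excluded)."""
    r = np.corrcoef(data, rowvar=False)  # genes are columns
    return r[np.triu_indices_from(r, k=1)]

def permute_columns(data, rng):
    """Shuffle each gene's measurements independently among the samples."""
    out = data.copy()
    for j in range(out.shape[1]):
        out[:, j] = rng.permutation(out[:, j])
    return out

def compare_distributions(data, n_permutations=100, n_bins=40, seed=0):
    """Histogram of pairwise r for the original data and for permuted copies.

    Returns the bin edges, the observed counts, and the mean and standard
    deviation of the counts across permutations (usable as error bars).
    """
    rng = np.random.default_rng(seed)
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    observed, _ = np.histogram(pairwise_r(data), bins=edges)
    null_counts = np.array([
        np.histogram(pairwise_r(permute_columns(data, rng)), bins=edges)[0]
        for _ in range(n_permutations)
    ])
    return edges, observed, null_counts.mean(axis=0), null_counts.std(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in: 60 cancers x 300 genes (fewer than 2,000 to keep it fast).
    data = rng.normal(size=(60, 300))
    edges, observed, null_mean, null_sd = compare_distributions(data, n_permutations=20)
    for lo, obs, m, s in zip(edges[:-1], observed, null_mean, null_sd):
        print(f"r >= {lo:+.2f}: observed {obs:6d}, permuted {m:8.1f} +/- {s:.1f}")
```

Storing one histogram per permutation, rather than only the pooled values, is what makes it possible to report an average null distribution with error bars.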

*[Figure 4.15: Distribution of $r^2$]*

Figure 4.15: Distribution of $r^2$ in the original versus permuted gene expression data set. The distribution of $r^2$ calculated using an original gene expression data set is shown with solid circles. For each gene, expression measurements were independently randomly shuffled 100 times. The average distribution of $r^2$ is shown with error bars covering 2 SD. In this example, random permutation was unable to create an association with $r^2 > 0.80$ or $r^2 < -0.85$. (From Butte et al. [39].)

If there is structure in the expression data set that is more robust than would be expected by chance, the distribution of correlation coefficients from the original analysis should be wider than the distributions from the randomly permuted sets. We view the difference in area under the curve between the two distributions as a measure of the signal-to-noise ratio of the analysis. If the distribution of correlation coefficients in the randomly permuted data sets is as wide as the distribution of correlation coefficients from the original, unshuffled data set, it is a clear indication that the high correlation coefficients being observed could very well be due to random chance.
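
One way to make this comparison concrete is sketched below. Treating the "difference in area under the curve" as the share of the observed histogram not covered by the mean permuted histogram is an assumption on our part, not necessarily the calculation used by Butte et al. The second function counts gene pairs whose |r| exceeds the largest |r| produced by any permutation, in the spirit of the cutoffs quoted in the figure caption. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def pairwise_r(data):
    """Pearson r for every distinct pair of columns (genes)."""
    r = np.corrcoef(data, rowvar=False)
    return r[np.triu_indices_from(r, k=1)]

def permute_columns(data, rng):
    """Shuffle each gene's values independently across samples."""
    out = data.copy()
    for j in range(out.shape[1]):
        out[:, j] = rng.permutation(out[:, j])
    return out

def excess_area(observed_r, permuted_r, n_bins=40):
    """Share of the observed distribution lying above the mean permuted one.

    Assumed definition: both distributions become per-bin proportions on the
    same bins over [-1, 1]; the positive part of (observed - permuted) is summed.
    """
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    p_obs = np.histogram(observed_r, bins=edges)[0] / len(observed_r)
    p_null = np.mean([np.histogram(r, bins=edges)[0] / len(r) for r in permuted_r], axis=0)
    return float(np.clip(p_obs - p_null, 0.0, None).sum())

def beyond_null(observed_r, permuted_r):
    """Count observed pairs with |r| above the largest |r| seen in any permutation."""
    threshold = max(float(np.max(np.abs(r))) for r in permuted_r)
    return int(np.sum(np.abs(observed_r) > threshold)), threshold

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    # Hypothetical data: 60 samples x 50 genes; genes 0-9 share one hidden factor.
    data = rng.normal(size=(60, 50))
    factor = rng.normal(size=(60, 1))
    data[:, :10] = factor + 0.3 * rng.normal(size=(60, 10))
    observed = pairwise_r(data)
    permuted = [pairwise_r(permute_columns(data, rng)) for _ in range(100)]
    n_extreme, cutoff = beyond_null(observed, permuted)
    print(f"excess area: {excess_area(observed, permuted):.3f}")
    print(f"{n_extreme} pairs exceed the permutation maximum |r| = {cutoff:.3f}")
```

With pure noise the excess area stays small (it is not exactly zero because of sampling variation), whereas a block of co-regulated genes produces a tail of coefficients that no permutation reproduces.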

This permutation testing does not substitute for a p value, as traditionally used in tests for statistical significance, but it is one generally accepted way (in the functional genomics literature) to test a novel clustering algorithm and its application to determine whether findings are otherwise significant.