One traditional way to gauge whether the findings of a particular microarray analysis are meaningful is to test them for statistical significance. This involves using a statistical test to estimate the likelihood that a finding at least as striking as the one observed would arise from random chance alone. For example, in traditional clinical trials, when a study finding is reported as statistically significant with a p value of < .05, the interpretation is that, if there were no real effect, a result at least this extreme would be expected in fewer than 5 out of 100 such studies.
These analyses are problematic when applied to functional genomic data sets. Suppose we have 10 tissue samples from patients with a disease and 10 samples from patients without it. For a single gene, we can estimate how likely it is that the observed difference in mean expression of that gene between the two sets of samples could be attributable to chance. Comparing the test statistic (e.g., from a t-test) against its reference distribution provides this estimate. However, if a threshold of p < .05 is applied to each gene separately, then on a gene expression microarray measuring 10,000 genes we would expect about 500 genes (10,000 × .05) to show such a difference between the two sets of 10 samples solely due to chance, even if no gene were truly differentially expressed. Determining how to set the significance threshold for all the different bioinformatics techniques we have described above is an active area of research, but it has yet to make its way into the mainstream of published microarray analyses. Instead, pragmatic tests of the robustness of the results have been adopted, of the form described below.
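The arithmetic behind the 500-gene figure can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that no gene is truly differentially expressed and that the genes are tested independently, so that each gene's p value behaves like a uniform random number between 0 and 1. The function names and the fixed random seed are hypothetical choices, not part of any published method.

```python
import random


def expected_false_positives(n_genes, alpha):
    """Expected number of genes passing the threshold by chance alone."""
    return n_genes * alpha


def simulate_false_positives(n_genes, alpha, seed=0):
    """Count chance 'hits' on one simulated array.

    Assumption: every gene is null and independent, so each p value
    is drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_genes) if rng.random() < alpha)


if __name__ == "__main__":
    n_genes, alpha = 10_000, 0.05
    print("expected by chance:", expected_false_positives(n_genes, alpha))
    print("one simulated array:", simulate_false_positives(n_genes, alpha))
```

Under these assumptions the simulated count varies from run to run but stays close to 500. Real expression data are correlated across genes, so the independence assumption is a simplification.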