Data Analysis

The analysis of expression data is of primary importance but beyond the scope of this chapter; however, a few critical issues are highlighted here (Quackenbush 2001). Most studies are designed either to identify important genes that participate in a critical process or to classify samples into previously unrecognized subsets of biological or clinical importance. Both goals usually begin with reducing the expansive data sets, using statistical or metric thresholds, to those genes whose expression varies significantly between the samples of interest. A number of methods can then be used to identify samples or genes with the desired properties. Unsupervised algorithms search the data with few user-imposed restrictions in an effort to recognize molecular substructure and to identify previously unrecognized classes of genes or samples. Supervised methods apply prior knowledge, such as histology, phenotype, stage, or outcome, and identify genes with statistically significant expression differences between the known groups. However, because the number of genes measured is very large and the number of samples is relatively small, some of these associations are likely to be due to chance. For this reason, it is imperative that the significance of correlations be established by testing independent sample sets.
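The chance-association problem can be illustrated with a small simulation. The sketch below is not a method from this chapter: it assumes purely random "expression" values for 2000 hypothetical genes, two groups of 5 samples each, and a cutoff of |t| > 2.306 (the two-sided 5% critical value for 8 degrees of freedom). All function and variable names are illustrative. Because no gene is truly different between the groups, every gene that passes the cutoff is a false positive, and very few of them should pass again in an independent sample set.

```python
import random
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for two lists of expression values."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = (var_a / len(group_a) + var_b / len(group_b)) ** 0.5
    if standard_error == 0:
        return 0.0
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


def simulate_noise(n_genes, n_per_group, rng):
    """Pure-noise data set: one (group_a, group_b) pair of value lists per gene."""
    return [
        ([rng.gauss(0, 1) for _ in range(n_per_group)],
         [rng.gauss(0, 1) for _ in range(n_per_group)])
        for _ in range(n_genes)
    ]


def select_genes(data, cutoff):
    """Indices of genes whose |t| exceeds the cutoff (supervised selection)."""
    return {i for i, (a, b) in enumerate(data) if abs(welch_t(a, b)) > cutoff}


# Assumed, illustrative settings: many genes, few samples, no real signal.
N_GENES, N_PER_GROUP, T_CUTOFF = 2000, 5, 2.306
rng = random.Random(0)

discovery = simulate_noise(N_GENES, N_PER_GROUP, rng)   # original sample set
validation = simulate_noise(N_GENES, N_PER_GROUP, rng)  # independent sample set

hits = select_genes(discovery, T_CUTOFF)
replicated = hits & select_genes(validation, T_CUTOFF)

print(f"genes passing the cutoff by chance: {len(hits)} of {N_GENES}")
print(f"of those, also passing in the independent set: {len(replicated)}")
```

With a 5% cutoff, roughly 5% of the genes are expected to pass in the discovery set even though the data are noise, while only about 5% of those are expected to pass a second time. Real analyses would also apply a multiple-testing correction, which this sketch deliberately omits to keep the effect visible.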
