Expression Data Analysis

The use of DNA microarrays generates a large number of individual data points, which must then be analyzed and archived. Optimal analysis requires expertise in statistics and bioinformatics, and the time and effort required to progress from initial data acquisition to the extraction of relevant biological information are substantial. Key aspects include image processing, data normalization, differential expression analysis, and database management. Each of these areas is complex, and a comprehensive discussion is beyond the scope of this chapter but can be explored further in several references.11,12

The process begins with image acquisition and analysis, which is largely dependent on the technology platform used. Data normalization issues are specific to the microarray platform used for analysis, whether cDNA or oligonucleotide.

Figure 29-3. Hierarchical clustering analysis. Tumors can be divided based on similarities in gene expression patterns. Cells with log ratios of 0 are colored black, increasingly positive log ratios with reds of increasing intensity, and increasingly negative log ratios with greens of increasing intensity. Each gene is represented by a single row and each tumor sample by a single column.

Investigators have adopted two general strategies for interpreting microarray data, commonly referred to as "unsupervised" and "supervised," to establish the degree of similarity between the gene expression levels in multiple samples. Unsupervised analysis deliberately ignores previous knowledge about the samples, which avoids any a priori assumption about function, structure, or any other experimental variable. The purpose is to identify relationships between samples based solely on similarities in gene expression. The most widely used of these techniques is hierarchical clustering, the object of which is to group together genes or samples with similar properties. The results of this analysis are typically displayed as a dendrogram representing the relatedness of the genes expressed within samples or of the samples themselves (Figure 29-3). The genes are commonly represented by colors, the intensity of which is proportional to the level of gene expression. Eisen et al.13 developed a widely used hierarchical clustering and visualization software package called Cluster and TreeView, which can be used for analysis and display of microarray data.
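The grouping step in hierarchical clustering can be sketched in a few lines of Python. This is a minimal illustration only, not the algorithm used by Cluster and TreeView: it assumes average linkage and a distance of 1 minus the Pearson correlation, and the tumor names and log ratios are hypothetical values chosen for the example.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles.

    Assumes neither profile is constant (a zero variance would divide by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / sqrt(var_x * var_y)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merge history.

    profiles maps a sample name to its list of log ratios. Each history
    entry is (cluster_a, cluster_b, distance), where a cluster is a tuple
    of sample names. The history is the information a dendrogram displays.
    """
    clusters = [(name,) for name in profiles]
    history = []

    def linkage(pair):
        # Average of all pairwise distances between the two clusters.
        a, b = pair
        dists = [pearson_distance(profiles[i], profiles[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Merge the two closest clusters and record the step.
        a, b = min(combinations(clusters, 2), key=linkage)
        history.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history


# Hypothetical log ratios for four tumors across five genes.
tumors = {
    "T1": [2.0, 1.5, -1.0, -2.0, 0.1],
    "T2": [1.8, 1.7, -0.8, -2.2, 0.0],
    "T3": [-1.5, -2.0, 1.2, 1.9, 0.2],
    "T4": [-1.7, -1.8, 1.0, 2.1, -0.1],
}
merges = average_linkage(tumors)
for cluster_a, cluster_b, distance in merges:
    print(cluster_a, cluster_b, round(distance, 3))
```

In this toy data set the two pairs of tumors with similar profiles are joined first, and the two resulting groups are joined last at a much larger distance. The same procedure applied to rows instead of columns clusters genes rather than samples.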

Supervised methods, developed in statistics, are designed to classify samples according to key properties and can be used to test the strength of newly discovered sample relationships. They are also used to look for statistically significant differences between two or more groups of samples. These methods rely on prior assumptions about the samples, such as the grade or stage of the tumors.
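As one simple illustration of a supervised comparison, the sketch below ranks genes by how strongly their expression differs between two predefined classes. It assumes Welch's t statistic as the measure of difference, which is only one of many possible choices, and the gene names, grade labels, and log ratios are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for one gene measured in two sample groups."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(expression, labels, class_a, class_b):
    """Rank genes by |t| between two labeled classes, strongest first."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical log ratios; labels give the known tumor grade of each sample.
labels = ["low", "low", "low", "high", "high", "high"]
expression = {
    "GENE_A": [0.1, -0.2, 0.0, 2.1, 1.8, 2.4],
    "GENE_B": [0.3, 0.1, -0.1, 0.2, 0.0, 0.1],
    "GENE_C": [1.0, 1.2, 0.9, -0.8, -1.1, -1.0],
}
ranked = rank_genes(expression, labels, "high", "low")
for gene, t in ranked:
    print(gene, round(t, 2))
```

A positive statistic here means higher expression in the "high" class. With thousands of genes and few samples, large statistics will arise by chance, so a ranking like this is a starting point rather than evidence of significance.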

A typical microarray experiment yields expression data for thousands of genes from a relatively small number of samples, and gene-class correlations, therefore, can be revealed by chance alone. This issue can be addressed by collecting more samples from each class studied, but this is often difficult with clinical cancer specimens. Another approach is to perform exploratory data analysis on an initial data set and apply the findings to an independent test set. Findings confirmed in this fashion are less likely to be a result of chance alone. Permutation testing, which involves randomly permuting class labels and determining gene-class correlation, also has been used to determine statistical significance. Observed gene-class correlations that are stronger than those seen in permuted data are considered statistically significant.
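Permutation testing as described above can be sketched for a single gene as follows. This is a minimal example under stated assumptions: the gene-class correlation is measured as the absolute difference in mean expression between one class and the rest, the number of permutations and the random seed are arbitrary, and all data values are hypothetical.

```python
import random
from statistics import mean


def mean_difference(values, labels, class_a):
    """Mean expression in class_a minus mean expression in all other samples."""
    in_a = [v for v, lab in zip(values, labels) if lab == class_a]
    others = [v for v, lab in zip(values, labels) if lab != class_a]
    return mean(in_a) - mean(others)


def permutation_p_value(values, labels, class_a, n_permutations=2000, seed=0):
    """Estimate how often shuffled labels match or beat the observed difference."""
    rng = random.Random(seed)
    observed = abs(mean_difference(values, labels, class_a))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # randomly reassign class labels to samples
        if abs(mean_difference(values, shuffled, class_a)) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: four samples per class.
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
strong_gene = [2.0, 2.2, 1.9, 2.1, 0.1, -0.1, 0.0, 0.2]
null_gene = [0.5, -0.3, 0.2, -0.1, 0.4, -0.2, 0.3, -0.4]

p_strong = permutation_p_value(strong_gene, labels, "A")
p_null = permutation_p_value(null_gene, labels, "A")
print(p_strong, p_null)
```

The gene that separates the two classes cleanly should yield a small p-value, while the gene with no class structure should not. When thousands of genes are tested, the per-gene values still need to be interpreted with multiple comparisons in mind, for example by comparing the whole distribution of observed statistics against the permuted ones.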

Procedures to assess and maintain sample quality and to calibrate scanning instrumentation, which together assure the integrity of the data sets, have not been standardized. Microarray technology is evolving rapidly, and end users are only beginning to learn how to interpret changes in largely unfamiliar study endpoints. Therefore, prescriptive guidelines for data interpretation also are not standardized. Establishing such guidelines at this stage may be detrimental because it may limit exploratory research and the advancement of the science. Once we have a better understanding of study design, an accurate picture of the limitations of the technology, and an improved understanding of data interpretation, such common guidelines may be possible.
