## Data Processing

As mentioned, systematic errors (bias) are reduced as far as possible by an appropriate experimental design; however, further reduction by statistical processing is usually necessary. First, all dubiously low measurements (normally those lower than the background plus two standard deviations) must be either excluded or set to a base level above the background. Second, the data must be normalized, an extremely important step in ensuring the validity of the array readings. Normalization transforms microarray fluorescence intensities to account for systematic errors, and can be divided into "within-array" and "between-array" normalization. The systematic errors usually accounted for include differences in dye incorporation efficiency (Cy3 is much smaller and more easily incorporated into cDNA than Cy5), the quantity of RNA hybridized to the array, spot size, probe length, and spatial effects.
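The low-signal filter described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `filter_low_signals`, its parameters, and the choice of the threshold itself as the "base level" are assumptions made for the example.

```python
import numpy as np


def filter_low_signals(signal, background, n_sd=2.0, floor=False):
    """Handle spots below mean(background) + n_sd * SD(background).

    Hypothetical helper: with floor=False low spots are excluded (set to NaN);
    with floor=True they are raised to the threshold, used here as an assumed
    base level above the background.
    """
    signal = np.asarray(signal, dtype=float)
    background = np.asarray(background, dtype=float)
    # Sample standard deviation (ddof=1) of the background readings.
    threshold = background.mean() + n_sd * background.std(ddof=1)
    is_low = signal < threshold
    replacement = threshold if floor else np.nan
    return np.where(is_low, replacement, signal), is_low


# Illustrative values only (not real measurements).
background = [100.0, 110.0, 90.0, 100.0]
spots = [105.0, 120.0, 500.0]
excluded, low_mask = filter_low_signals(spots, background)
floored, _ = filter_low_signals(spots, background, floor=True)
```

Whether to exclude or floor a spot is left to the caller, mirroring the two options given in the text.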

Normalization can be done in several ways, using either statistical methods or strategies based on internal controls and invariant genes. The latter are less commonly used, because many genes previously thought to be invariant have been shown to vary quite significantly, and because data from a type 1 experiment still suffer from a dye bias. Type 2 experiments using gDNA are a promising method of overcoming these limitations.

A non-normalized data set will exhibit a bias towards spots with a strong fluorescent intensity, whereas a normalized data set should center on a log ratio (see below) of zero and be independent of spot intensity. Statistical, globally applied methods are most often used for within-array normalizations of this kind. One statistical transformation commonly used to calibrate microarray data is LOWESS (locally weighted scatterplot smoothing [36]), which smooths scatterplots of log ratios in a weighted least-squares fashion to remove intensity-dependent bias. Another is MAD (median absolute deviation), which provides a robust estimate of the standard deviation. Many reviews cover microarray normalization methods and the theory underlying them; Refs. [37, 38] are suggested as excellent starting points for a basic discussion, and Ref. [39] for more in-depth coverage.
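As a rough illustration of intensity-dependent normalization, the sketch below fits a tricube-weighted local linear trend to the log ratios as a function of mean log intensity and subtracts it. It is a simplified stand-in for LOWESS (it omits the robustness iterations of the full method), and the function names, the `frac` parameter, and the use of M (log ratio) and A (mean log2 intensity) as coordinates are assumptions made for the example. A MAD-based estimate of the standard deviation is included, using the usual consistency constant 1.4826 for normally distributed data.

```python
import numpy as np


def mad_sd(values):
    """Robust SD estimate: 1.4826 * median absolute deviation from the median."""
    values = np.asarray(values, dtype=float)
    return 1.4826 * np.median(np.abs(values - np.median(values)))


def local_linear_smooth(x, y, frac=0.4):
    """Simplified LOWESS: tricube-weighted local linear fit at each x.

    Illustrative only; it omits the robustness iterations of full LOWESS.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = min(n, max(3, int(np.ceil(frac * n))))  # points per neighbourhood
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        # Bandwidth: distance to the k-th nearest point (guarded against zero).
        h = max(np.partition(dist, k - 1)[k - 1], np.finfo(float).eps)
        weights = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube
        root_w = np.sqrt(weights)
        # Weighted least squares via row scaling by sqrt(weight).
        design = np.column_stack([np.ones(n), x - x[i]])
        coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
        fitted[i] = coef[0]  # intercept = fitted value at x[i]
    return fitted


def normalize_within_array(cy5, cy3, frac=0.4):
    """Return (normalized log2 ratios, mean log2 intensities) for one array."""
    cy5 = np.asarray(cy5, dtype=float)
    cy3 = np.asarray(cy3, dtype=float)
    m = np.log2(cy5 / cy3)        # log ratio per spot
    a = 0.5 * np.log2(cy5 * cy3)  # mean log2 intensity per spot
    return m - local_linear_smooth(a, m, frac), a
```

After subtraction, the log ratios should scatter around zero at every intensity. `mad_sd` of the normalized ratios could then serve as a robust scale estimate, for example when comparing the spread of different arrays.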

Once the data from technical and biological replicates have been compiled and the microarray data are ready for analysis, the fold difference in mRNA levels is calculated. This fold change in expression between the two samples hybridized to the array is represented as a ratio of the fluorescence intensities of the two signals. A problem with this raw ratio is that it is not symmetrical: a gene induced two-fold is given a ratio of 2, whereas a gene repressed two-fold is given a ratio of 0.5. Therefore, to make the ratios from up- and downregulated genes symmetrical, the data are transformed to their logarithm to the base 2 (log2). This centers the scale on zero: a two-fold increase is represented as a log2 ratio of 1, a two-fold decrease as a log2 ratio of -1, and a gene with an unchanged expression level as a log2 ratio of 0.
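The effect of the log2 transformation can be checked with a short example. The intensity values and the function name `log2_ratio` are made up for illustration.

```python
import numpy as np


def log2_ratio(sample, reference):
    """Log2 of the intensity ratio between two channels, spot by spot."""
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.log2(sample / reference)


# Three hypothetical spots: two-fold induced, two-fold repressed, unchanged.
sample = [2000.0, 500.0, 1000.0]
reference = [1000.0, 1000.0, 1000.0]

raw = np.asarray(sample) / np.asarray(reference)  # 2, 0.5, 1 (asymmetrical)
logged = log2_ratio(sample, reference)            # 1, -1, 0 (symmetrical)
```

The raw ratios squeeze all repression into the interval between 0 and 1, whereas the log2 ratios place induction and repression at equal distances on either side of zero.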

