Gene expression studies can be roughly divided into two categories: those in which samples are used to provide information on genes, and those in which genes are used to provide information on samples. The first permits an integrated view of biology, in which genetic regulation can be examined in the context of circuitry: a sophisticated network in which the interplay of positive and negative signals ultimately directs cellular fate. This approach is already revealing great elegance and efficiency in biological design. The second is revolutionising molecular medicine, in disease classification, diagnosis and prognostic prediction, and in a number of industrial and pharmaceutical applications.
Biologists are discovering that genes involved in common processes are often co-expressed. These include genes required for nutrition and stress responses, and genes encoding components of metabolic pathways. Similarly, the genes encoding subunits of several multi-subunit complexes, such as the ribosome, the proteasome and the nucleosome, are coordinately expressed (Alon et al., 1999; Brown and Botstein, 1999; Causton et al., 2001; Eisen et al., 1998; Hughes et al., 2000; Lashkari et al., 1997). In many cases, this is attributable to coordinate regulation by common factors. 'Waves' of co-expressed, temporally regulated genes have also been observed during the development of the rat spinal cord (Wen et al., 1998). Coordinate regulation is highly efficient: the components of a multi-subunit complex, or the factors required for a complex process, are usually needed at the same time and in a defined ratio, so expressing them together ensures that they are available together whenever they are needed.
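Co-expression of this kind is usually detected by correlating genes' expression profiles across many samples or conditions. The sketch below is a minimal illustration under stated assumptions: it computes the Pearson correlation between every pair of genes and links genes whose correlation meets a threshold, so that connected genes end up in the same group (single-linkage grouping). The gene names (`ribo1`, `stress1`, ...), the log-ratio values and the 0.9 threshold are all hypothetical, and the cited studies used more elaborate methods (Eisen et al., 1998, for example, applied hierarchical clustering), so this should not be read as their procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def coexpressed_groups(profiles, threshold=0.9):
    """Group genes linked by correlation >= threshold (single linkage via union-find)."""
    parent = {gene: gene for gene in profiles}

    def find(gene):
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving keeps trees shallow
            gene = parent[gene]
        return gene

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for gene in profiles:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical log2 expression ratios for five genes over five conditions.
toy_profiles = {
    "ribo1": [2.0, 1.5, -1.0, -2.0, 0.5],
    "ribo2": [1.8, 1.4, -0.9, -2.1, 0.6],
    "ribo3": [2.1, 1.6, -1.1, -1.9, 0.4],
    "stress1": [-1.0, 0.2, 2.5, 3.0, -0.5],
    "stress2": [-0.8, 0.1, 2.4, 2.8, -0.6],
}

print(coexpressed_groups(toy_profiles))
```

Single linkage was chosen here only because it is short to write; with real data, the choice of similarity measure and linkage rule noticeably affects which groups emerge.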
The gene expression profile, or signature, can be thought of as a precise molecular definition of the cell in a specific state (Young, 2000). Accurate, quantitative information on the transcriptional profile of biological samples is therefore of great utility. The expression profile is one of the few relatively accessible ways of describing a phenotype that can be used to characterise a wide variety of samples. Cellular phenotypes can be inferred from gene expression profiles, in part because defects in similar pathways or processes can be detected via their effects on the expression of similar groups of genes, and because agents that perturb these pathways also affect the same gene sets. A large reference collection of profiles against which gene expression data can be compared is therefore useful, but requires careful and accurate data generation, storage and description.
The 'compendium approach', in which large numbers of biological samples are profiled and pattern matching is used to predict the function of previously uncharacterised genes and putative drug targets, has been elegantly demonstrated in yeast (Gray et al., 1998; Hughes et al., 2000; Marton et al., 1999). Similarly, databases integrating gene expression data from 60 pharmacologically characterised human cancer cell lines (NCI60, http://dtp.nci.nih.gov/) treated with 70,000 agents, independently or in combination, have been used to link drug activity with mode of action, to correlate expression levels of individual transcripts with mechanisms of drug sensitivity and resistance, and to examine the variation in gene expression patterns between individuals. The same dataset was also used to classify cell lines by tissue of origin and to predict chemosensitivity or resistance (Ross et al., 2000; Scherf et al., 2000; Staunton et al., 2001; Weinstein et al., 1997).
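The pattern-matching step of the compendium approach can be illustrated as a nearest-neighbour search: a query profile (for example, from deleting an uncharacterised gene) is correlated with every reference profile, and the annotations of the best-matching references suggest a function ('guilt by association'). The sketch below assumes a tiny hypothetical compendium in which each entry is an annotation plus a log-ratio vector measured over the same set of genes; the entry names, annotations and values are invented, and published compendium analyses (e.g. Hughes et al., 2000) used far larger datasets and more careful statistics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rank_matches(query, compendium, top_k=3):
    """Return (r, name, annotation) for the top_k references most correlated with query."""
    scored = [
        (pearson(query, profile), name, annotation)
        for name, (annotation, profile) in compendium.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)  # highest correlation first
    return scored[:top_k]


# Hypothetical reference compendium: name -> (known annotation, profile).
compendium = {
    "mutant_A": ("ergosterol biosynthesis", [2.0, -1.0, 0.5, 1.5, -0.5, 0.0]),
    "mutant_B": ("cell wall integrity", [-1.0, 2.0, 1.0, -0.5, 0.5, 1.5]),
    "drug_X": ("ergosterol biosynthesis", [1.8, -0.8, 0.6, 1.4, -0.6, 0.1]),
}

# Hypothetical profile from deleting an uncharacterised gene.
query = [1.9, -0.9, 0.4, 1.6, -0.4, -0.1]

matches = rank_matches(query, compendium, top_k=3)
for r, name, annotation in matches:
    print(f"{name}\t{annotation}\tr={r:.2f}")
```

Here both top matches share one annotation, which is the kind of agreement that would make a functional prediction for the query more convincing than a single match.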
Gene expression has proved a highly robust 'reporter' of biological status for a wide range of samples under a variety of conditions, with the result that microarray technology is now used extensively in industry. Pharmaceutical companies use microarrays at numerous stages of drug development, from high-throughput screening of small molecules to identify candidate drugs, to drug target identification and the assessment of toxicity.
Gene expression data have proven highly informative of disease state, particularly in oncology, where accurate and early diagnosis, followed by appropriate treatment, can prove critical. Studies on clinical samples have shown that gene expression data can be used not only to distinguish between tumour types, define new (histologically indistinct) subtypes, and identify misclassified cell lines, but also to predict prognostic outcomes (Alizadeh et al., 2000; Bittner et al., 2000; Golub et al., 1999; Perou et al., 1999; Shipp et al., 2002). This approach is particularly powerful in offering the promise of 'personalised medicine', in which the specific underlying defect can be identified, the prognosis predicted, and treatment tailored to the genetic makeup of the individual and the specific defect in each patient, thus reducing the likelihood of unwanted side effects.
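Distinguishing tumour types from expression profiles is a supervised classification problem. As a hedged illustration only, the sketch below uses a nearest-centroid classifier: the mean profile of each known class is computed from labelled training samples, and a new sample is assigned to the class whose centroid is closest in Euclidean distance. This is a deliberately simple stand-in, not the method of any of the cited studies (Golub et al., 1999, for instance, used a weighted-voting scheme over selected informative genes). The class labels, the four 'genes' and all expression values are hypothetical.

```python
from math import sqrt


def centroid(samples):
    """Mean expression of each gene across a list of equal-length profiles."""
    n = len(samples)
    return [sum(values) / n for values in zip(*samples)]


def euclidean(a, b):
    """Euclidean distance between two profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(labelled):
    """Map each class label to the centroid of its training profiles."""
    return {label: centroid(samples) for label, samples in labelled.items()}


def predict(centroids, profile):
    """Assign the class whose centroid is nearest to the profile."""
    return min(centroids, key=lambda label: euclidean(centroids[label], profile))


# Hypothetical training data: class label -> expression of four genes per sample.
training = {
    "subtype_1": [[5.0, 1.0, 3.0, 0.5], [5.5, 1.2, 2.8, 0.4], [4.8, 0.9, 3.1, 0.6]],
    "subtype_2": [[1.0, 4.5, 3.0, 2.5], [1.2, 5.0, 2.9, 2.2], [0.8, 4.8, 3.2, 2.7]],
}

centroids = train(training)
print(predict(centroids, [4.9, 1.1, 3.0, 0.7]))
print(predict(centroids, [1.1, 4.6, 3.1, 2.4]))
```

With real data, which typically cover thousands of genes but relatively few samples, informative genes would normally be selected first and the classifier's accuracy estimated on samples not used for training, for example by cross-validation.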