The problem of comparing expression measurements across generations of microarrays is likely to be with us for several years. Microarray technology is changing so rapidly that a definitive platform, subject only to small incremental changes, is not likely in the immediate future. Given that laboratories will study the same biological system for several years, and therefore over several generations of microarrays, the most prudent and conservative advice is to:
• Decouple the hybridization schedule from tissue sampling and banking. Preferably, all hybridizations should be done within a few months, if not days, of one another. The effect of different hybridization times is often visible in subsequent analysis. For example, careful review of the clustering of the acute myelogenous leukemia (AML) and acute lymphocytic leukemia (ALL) data set of Golub et al. reveals that the test and training sets automatically cluster separately. That is, the differences between the gene profiles in the test and training sets are of the same order as the differences between AML and ALL. Reportedly, the only difference between the two sets was the date on which the hybridization was done. In addition to environmental and procedural variation, it is likely that each batch of microarrays was manufactured on a different date.
• Purchase or construct sufficient microarrays to last your entire study. Even if a newer and better generation of microarrays becomes available, the quality of the study is best served by staying with a single generation. Otherwise, the investigator will have to wander into the uncertain territory of estimating consistency across generations of microarray technology for each gene interrogated by the microarray, as outlined above. Unfortunately, some studies necessarily run so long that the logistics of this tactic become unrealistic.
• Maintain a reference RNA pool. This can be used to determine the consistency of measurements across microarray generations when all else fails. The selection of the reference pool should be informed by the need to maintain a large amount of reference RNA over years with minimal degradation. In addition, the pool should contain a distribution of transcript abundance that is appropriate for the experiments envisaged. Too much or too little of a transcript will cause skewed noise or variation profiles due to different degrees of cross-hybridization against different biological samples.
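The reference-pool check in the last item can be sketched in Python. This is a minimal illustration, not a method taken from the text. It assumes the same reference RNA pool has been hybridized in replicate on both microarray generations, that intensities are positive and already background-corrected, and that a median-centered log2 shift is an acceptable measure of per-gene consistency. The function names (`residual_shifts`, `flag_inconsistent`), the threshold of 1.0 log2 units, and the toy intensities are all hypothetical.

```python
import math
from statistics import mean, median


def residual_shifts(ref_old, ref_new):
    """Per-gene log2 shift of the reference pool between two generations,
    after removing the median shift shared by all genes.

    ref_old, ref_new: dict mapping gene -> list of positive replicate
    intensities of the SAME reference RNA pool on each generation.
    Only genes present on both generations are compared.
    """
    raw = {}
    for gene in sorted(ref_old.keys() & ref_new.keys()):
        old_mean = mean(math.log2(x) for x in ref_old[gene])
        new_mean = mean(math.log2(x) for x in ref_new[gene])
        raw[gene] = new_mean - old_mean
    if not raw:
        return {}
    # Assumption: an offset common to all genes reflects platform-wide
    # scaling, not gene-level inconsistency, so it is subtracted out.
    global_shift = median(raw.values())
    return {gene: shift - global_shift for gene, shift in raw.items()}


def flag_inconsistent(residuals, max_residual=1.0):
    """Genes whose residual shift exceeds max_residual (log2 units).
    The default threshold is an arbitrary illustrative choice."""
    return sorted(g for g, r in residuals.items() if abs(r) > max_residual)


# Toy intensities invented for illustration only (not real measurements).
ref_old = {"geneA": [100, 110, 95], "geneB": [2000, 2100, 1900], "geneC": [50, 55, 45]}
ref_new = {"geneA": [200, 220, 190], "geneB": [4000, 4200, 3800], "geneC": [800, 880, 720]}

residuals = residual_shifts(ref_old, ref_new)
flagged = flag_inconsistent(residuals)
print(flagged)  # ['geneC']
```

In this toy case every gene doubles in intensity on the newer generation except `geneC`, which rises sixteen-fold, so only `geneC` is flagged. Measurements of flagged genes would then be treated as not directly comparable across the two generations. A real analysis would also need to account for replicate noise and for the abundance-dependent cross-hybridization effects noted above.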