We know, then, that expression measurements for the overall set of probe sets are poorly consistent across generations of microarrays, but that there is also a subset of probe sets for which consistency is high. To salvage as much of the HuGeneFL array data as possible, the goal should be to determine which probe sets belong to that high-consistency subset. One reasonable heuristic is to pick a threshold correlation and then keep only those probe sets that have at least n probes in common, where n is chosen so that the correlation for n, as obtained from the histogram of figure 7.7, is above the threshold. Just what this threshold should be will be determined by a decision analytic procedure of the kind illustrated on page 2.9.
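The heuristic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the probe set identifiers, and the correlation values are hypothetical and are not read from figure 7.7. The sketch also assumes the histogram has been tabulated as a mapping from the number of shared probes n to the observed correlation, and it takes the cutoff to be the smallest n for which every larger tabulated n also clears the threshold.

```python
def minimum_shared_probes(correlation_by_n, threshold):
    """Smallest n such that n and every larger tabulated n meet the threshold.

    correlation_by_n: dict mapping number of shared probes -> correlation
    (hypothetical tabulation of the histogram). Returns None if even the
    largest n falls below the threshold.
    """
    n_min = None
    # Walk from the largest n downward and stop at the first failure.
    for n in sorted(correlation_by_n, reverse=True):
        if correlation_by_n[n] >= threshold:
            n_min = n
        else:
            break
    return n_min


def select_probe_sets(shared_probe_counts, correlation_by_n, threshold):
    """Return the probe set IDs that share at least n_min probes."""
    n_min = minimum_shared_probes(correlation_by_n, threshold)
    if n_min is None:
        return []
    return sorted(ps for ps, n in shared_probe_counts.items() if n >= n_min)


# Illustrative values only (assumptions, not data from the text).
example_correlation_by_n = {0: 0.1, 4: 0.3, 8: 0.5, 12: 0.7, 16: 0.9}
example_shared_counts = {"ps_A": 16, "ps_B": 10, "ps_C": 12, "ps_D": 0}

kept = select_probe_sets(example_shared_counts, example_correlation_by_n, 0.6)
print(kept)  # ['ps_A', 'ps_C']
```

With a threshold of 0.6 the cutoff is n = 12 in this made-up table, so only the two probe sets sharing 12 or more probes are kept. Raising the threshold trades coverage for consistency, which is the trade-off the decision analytic procedure is meant to settle.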
However, the above procedure will recover only a small subset of the genes represented by the various probe sets on each of the microarrays. There may be a substantially larger set of genes that are consistently reported across the generations of microarrays and that might be "salvageable." If a laboratory has duplicate hybridizations across these generations, then the technique described in section 3.2.2 can be adapted to provide an estimate of which probe sets are the most consistent.
That is, if the identical RNA extract has been hybridized to two generations of microarrays, then a simple program can be written to determine, for any measure of gene expression (e.g., the Cy3/Cy5 ratio, Average Difference, Log Average Ratio, or Present vs. Absent calls), the threshold (e.g., a minimal Cy3/Cy5 ratio) above which the reported expression levels have an acceptable level of consistency across the microarray generations. These thresholds will vary not only with the particular microarray platform but also with the laboratory performing the hybridizations and the tissue types used. Consequently, this procedure must be repeated in each laboratory, and the results may not generalize safely across laboratories.
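One way such a program might look is sketched below. It is not the procedure of section 3.2.2 itself: it assumes that consistency is measured by the Pearson correlation between the paired measurements, that a probe set counts as "above" a candidate threshold only when both of its measurements reach it, and that candidate thresholds are supplied by the user. The function names and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def consistency_threshold(gen1, gen2, candidates, min_correlation, min_points=3):
    """Lowest candidate threshold at which the retained pairs agree well enough.

    gen1, gen2: expression values for the same probe sets, in the same order,
    measured from one RNA extract on two array generations.
    Returns (threshold, correlation), or None if no candidate qualifies.
    """
    for t in sorted(candidates):
        # Keep a pair only if both generations report at least t.
        kept = [(x, y) for x, y in zip(gen1, gen2) if min(x, y) >= t]
        if len(kept) < min_points:
            break  # too few points left for a meaningful correlation
        xs, ys = zip(*kept)
        r = pearson(xs, ys)
        if r >= min_correlation:  # a NaN correlation never passes
            return t, r
    return None


# Illustrative values only: noisy at low expression, concordant at high.
older_array = [1, 2, 3, 4, 10, 20, 30, 40]
newer_array = [30, 5, 40, 1, 10, 20, 30, 40]

result = consistency_threshold(older_array, newer_array, [0, 5, 15], 0.9)
print(result[0])  # 5
```

In this made-up example the full set of pairs correlates poorly, while the pairs at or above 5 agree, so 5 is returned as the working threshold. Because the thresholds depend on platform, laboratory, and tissue, the inputs would have to come from each laboratory's own duplicate hybridizations.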