Noise is an informal term widely used in microarray data analyses and experiments to refer to:
• N1. Physical effects and parameters extraneous to the aspect of a system one is investigating, which have entered into the system or the measurement device that quantifies this aspect, and which, if detectable, would lead to,
• N2. Experimental outcomes which contradict a priori known facts, or more specifically and commonly, variations in experimental outcomes that should be "identical."
Strictly, N1 and N2 are origin (or cause) and effect respectively, i.e., N1, if detectable by the measurement device or microarray, is manifested to the laboratory observer as N2. Most studies do not distinguish the former from the latter in their use of the term noise. Typically, the presence of noise in a system is deduced by comparing an actual measurement outcome of the system against another comparable reference quantity or a known and confirmed fact. For instance, if we use a ruler to measure the length of a pencil twice and obtain slightly different readings each time, then we know that there must be "noise" in our measurement process, because the indisputable a priori fact is that we are measuring one and only one pencil, with a single fixed length. Here, N1 might include parallax error, and N2 is the different recorded lengths. Recalling our prior discussion on replicate experiments, the range of parameters that affects a biological system study is vast and not completely controllable, so that, in general, N1 may encompass any combination of conditions present at the time of a microarray assay, whether they are inherent in the measurement device or in the biological system.
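The pencil example can be sketched in a few lines of code: a fixed true value perturbed by random N1-type effects yields readings that differ from one another (N2). The true length and the noise spread below are hypothetical values chosen for illustration only.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_LENGTH_CM = 17.0  # the single fixed length of the pencil (hypothetical)
NOISE_SD_CM = 0.05     # assumed spread of N1 effects, e.g. parallax error

def measure(true_length, noise_sd):
    """One reading: the true value plus a random N1 perturbation."""
    return true_length + random.gauss(0.0, noise_sd)

readings = [measure(TRUE_LENGTH_CM, NOISE_SD_CM) for _ in range(5)]
spread = max(readings) - min(readings)

# The readings differ (N2) even though the underlying quantity is fixed;
# the spread reflects the unobserved N1 effects.
print(readings)
print(f"spread = {spread:.3f} cm")
```

The observer sees only `readings`; inferring that noise is present requires the a priori fact that a single fixed length was measured each time.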
For partly cultural reasons, there is a lack of appreciation for the challenge presented by noise in microarray expression data analysis. Historically, genomicists were fortunate that their first, and successful, foray into understanding patterns in the genome occurred at the level of the DNA sequence. DNA sequence components, being drawn from an alphabet of 4 characters (A, T, C, G), are unambiguous—and furthermore, when one assumes that these sequences code for function deterministically, one finds that the representation of information in a DNA sequence is essentially digital, or discrete. By contrast, the measure of mRNA species levels, and by extension the expression levels of associated genes, is intrinsically analog. Because of this, there is more room for measurement error, and a greater and generally unfavorable dependency on the measurement technology. RNA level is a continuous quantity and its measurement is no different from the measurement of any other biochemical process or product, such as the rate of lactic acid oxidation in muscle or endorphin levels in the bloodstream. Specifically, in the context of microarrays, each chip measures microquantities of upwards of 10⁵ different chemical species at a time on a hybridization medium with an area of less than 10⁻³ m². Bioinformaticians who make the transition from sequence to expression genomics are often surprised by the massive increase in the ambiguity and noise levels in the genomic data.
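The digital-versus-analog contrast can be made concrete: two sequence reads either match exactly or they do not, whereas two expression readings of the same transcript can only be compared within a tolerance, and the choice of tolerance is itself an analysis decision. All values and the 5% tolerance below are illustrative assumptions, not data from the text.

```python
# Discrete comparison: unambiguous equality over a 4-letter alphabet.
seq_a = "ATCGGATC"
seq_b = "ATCGGATC"
sequences_match = (seq_a == seq_b)  # simply True or False

# Continuous comparison: replicate intensities for the same transcript
# are never exactly equal, so "same level" needs a tolerance.
expr_a = 1532.7  # fluorescence intensity, arbitrary units (hypothetical)
expr_b = 1518.9  # same transcript, replicate chip (hypothetical)

TOLERANCE = 0.05  # assumed 5% relative tolerance; analysis-dependent choice
same_level = abs(expr_a - expr_b) / expr_a <= TOLERANCE

print(sequences_match)
print(same_level)
```

The discrete comparison needs no such parameter; the analog one cannot avoid it, which is one face of the dependency on measurement technology noted above.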
It is worth noting that noise is not entirely a negative phenomenon. In some situations, it can prompt the genomicist to reconsider a priori assumptions, postulates, or questions about a system that may not be entirely correct or well posed. For instance, referring to the earlier thought experiments for replicating the measurement of blood glucose levels in mice in section 3.2.1: while differences (noise) in the repeat measurements E5 point to measurement errors, which are unquestionably undesirable, differences in E1 and E4 (that are not measurement errors) might imply the existence of significant physiological variations between M1 and M2, suggesting that not all mice's glucose levels respond similarly to intravenous insulin; this knowledge inevitably refines or changes the study question and objective.