No one can step into the same river twice. Heraclitus, c. 540-480 B.C.
Nature does not make leaps. G. W. Leibniz, 1646-1716
Consider the following five thought experiments that aim to compare the levels of blood glucose in 6-month-old mice, 1 hour before and 1 hour after an intravenous insulin treatment at noon.
• E1. A 6-month-old mouse M has its glucose levels measured and recorded by a glucometer G, 1 hour before and 1 hour after an intravenous insulin treatment at noon. Call these readings R(M, G, Monday, 11), R(M, G, Monday, 13), respectively.
• E2. A 6-month-old mouse M2, not the same mouse as M, is subjected to the same experimental protocol as M in E1, but on the following day. Call these readings R(M2, G, Tuesday, 11) and R(M2, G, Tuesday, 13), respectively.
• E3. The same 6-month-old mouse M in E1 is subjected to the same experimental protocol the following day as it had undergone in E1 with glucometer readings R(M, G, Tuesday, 11) and R(M, G, Tuesday, 13), respectively.
• E4. Another 6-month-old mouse M2, not the same mouse as M, is subjected to the same experimental protocol as M in E1, at the same time as M but using another glucometer G2, to obtain readings R(M2, G2, Monday, 11) and R(M2, G2, Monday, 13), respectively.
• E5. During the time E1 was carried out, M has its glucose levels simultaneously measured and recorded by another glucometer G2 to produce readings R(M, G2, Monday, 11) and R(M, G2, Monday, 13).
Questions: Which of experiments E2 through E5 qualifies as a replicate of E1? Which is "the best" replicate of E1? Observe that E2 replicates the experimental protocol, except for the biological variation between mice M and M2 and the difference in days. E3 replicates E1 in terms of experimental protocol and the use of the same mouse, except for the difference in days and, perhaps more importantly, the fact that the physiological initial condition of mouse M on Tuesday could be measurably different following its Monday treatment. E1 and E4 are replicates up to experimental protocol and simultaneity in running both experiments, modulo the biological variation between M and M2 and the use of separate glucometers, which may be calibrated differently. Finally, E1 and E5 are simply repeat measurements of the glucose levels of M following the same protocol but with different glucometers. Typically in the study above, the absolute values R(mouse, glucometer, day, 11) and R(mouse, glucometer, day, 13) are themselves not important; rather, it is the relative change (difference or fold) from R(mouse, glucometer, day, 11) to R(mouse, glucometer, day, 13) that is of interest.
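As a minimal sketch of the two notions of relative change just mentioned, the readings can be stored under the same four labels used in R(mouse, glucometer, day, hour). The numerical values below are invented for illustration (they are not measurements from the text), the unit mg/dL is an assumption, and the function names `difference` and `fold` are hypothetical.

```python
# Hypothetical readings in mg/dL, keyed by (mouse, glucometer, day, hour).
# These numbers are invented; here glucometer G2 is assumed to read 4% higher than G.
readings = {
    ("M", "G", "Monday", 11): 150.0,
    ("M", "G", "Monday", 13): 90.0,
    ("M", "G2", "Monday", 11): 156.0,
    ("M", "G2", "Monday", 13): 93.6,
}


def difference(readings, mouse, glucometer, day):
    """Relative change as a difference: reading at 13:00 minus reading at 11:00."""
    return readings[(mouse, glucometer, day, 13)] - readings[(mouse, glucometer, day, 11)]


def fold(readings, mouse, glucometer, day):
    """Relative change as a fold: reading at 13:00 divided by reading at 11:00."""
    return readings[(mouse, glucometer, day, 13)] / readings[(mouse, glucometer, day, 11)]


# E1 uses glucometer G, E5 uses glucometer G2 on the same mouse at the same time.
diff_e1 = difference(readings, "M", "G", "Monday")
diff_e5 = difference(readings, "M", "G2", "Monday")
fold_e1 = fold(readings, "M", "G", "Monday")
fold_e5 = fold(readings, "M", "G2", "Monday")

print(diff_e1, diff_e5)
print(fold_e1, fold_e5)
```

In this invented case the two glucometers disagree on the absolute readings and on the difference, yet agree on the fold (up to floating-point rounding), because a purely multiplicative calibration offset cancels in a ratio. Whether real glucometers differ in that way is an assumption of the sketch, not a claim of the text.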
The point of the illustration above is to demonstrate that these questions are not well posed until we specify what we mean by a replicate. Clearly, there are different levels or gradations for deciding how one experimental setup replicates another. The way that one chooses to define a replicate experiment will be driven by, and dependent upon, the biological question that these experiments were designed to answer in the first place. Only after one has made this definition can one sensibly discuss (ir)reproducibility of the attendant experimental outcomes. The range of parameters that could potentially enter into, be manifested in, and be detectable in a complex biological system is arguably wider and less easily characterized or controlled than in systems in the physical sciences. In our example above, the day of the experiment or the individual physical condition of the mouse may additionally affect its blood glucose level. Without entering into the epistemologies of Bacon or Popper, the experimentalist normally attempts to resolve this difficulty by designing appropriate control conditions for the experiments and hoping that (to paraphrase Leibniz's aphorism as saying that measurable processes in nature do not change abruptly) the biological processes are not dramatically different from one normal mouse to another. Specific to our thought experiments, if the experimentalist has decided to replicate E1 in the sense of E4, then she or he would reasonably expect the mice physiologies (and by extension, their blood glucose levels in reaction to insulin) in E1 and E4 not to be radically different, so that relative changes such as
Δ(M2, G2) = R(M2, G2, Monday, 13) - R(M2, G2, Monday, 11),
Δ(M, G) = R(M, G, Monday, 13) - R(M, G, Monday, 11),
are "close" to one another. A measure of similarity such as a metric is used to determine and quantify the "closeness" of Δ(M2, G2) to Δ(M, G), and thereby the reproducibility of the experimental results. Detailed discussions of measures of similarity are found in section 3.6. While it is highly unlikely in a real laboratory situation to find Δ(M2, G2) = Δ(M, G), the experimentalist might expect that with a weaker measure of similarity (such as Δ(M2, G2) "=" Δ(M, G) when both Δ(M2, G2) and Δ(M, G) have the same sign) the experimental results are reproducible. Additionally, a statistical approach may be incorporated into the idea of reproducibility. For instance, if we carried out 1000 experiments E1 with a different mouse each time and followed the same protocol, we may find that 990/1000 or 99% of the mice that were sampled exhibited a decrease in glucometer readings after treatment, so that we can reasonably state that, assuming sufficiently fair and random sampling, most 6-month-old mice experience a drop in glucometer reading after an insulin treatment at noon. Postulating (and checking) further properties about the data readout distribution, standard (non-)parametric statistical tools such as Student's t-test and the Kruskal-Wallis test may be used to determine the statistical significance of this conclusion, specifically the average drop in blood glucose reading.
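The two measures of similarity discussed above, and the proportion of mice showing a decrease, can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed method: the readings are invented, the unit mg/dL is assumed, and the helper names (`delta`, `abs_distance`, `same_sign`, `fraction_decreasing`) are hypothetical.

```python
def delta(before, after):
    """Change in reading: value at 13:00 minus value at 11:00."""
    return after - before


def abs_distance(d1, d2):
    """A metric on changes: zero only when the two changes are identical."""
    return abs(d1 - d2)


def same_sign(d1, d2):
    """Weaker similarity: both changes are nonzero and point in the same direction."""
    return d1 * d2 > 0


def fraction_decreasing(pairs):
    """Share of (before, after) pairs whose reading dropped after treatment."""
    deltas = [delta(before, after) for before, after in pairs]
    return sum(d < 0 for d in deltas) / len(deltas)


# Hypothetical readings in mg/dL for E1 (mouse M, glucometer G)
# and E4 (mouse M2, glucometer G2).
delta_m_g = delta(150.0, 90.0)
delta_m2_g2 = delta(140.0, 95.0)

distance = abs_distance(delta_m_g, delta_m2_g2)
close_in_sign = same_sign(delta_m_g, delta_m2_g2)

# A small invented cohort standing in for the 1000 mice of the text.
cohort = [(150.0, 90.0), (140.0, 95.0), (160.0, 100.0), (145.0, 150.0)]
share = fraction_decreasing(cohort)

print(distance, close_in_sign, share)
```

With these invented numbers the two changes are not equal under the metric, but they are "equal" under the sign-based similarity, which is the sense in which E4 would be said to reproduce E1. The share of decreasing mice is only a descriptive summary; assessing significance would still require a test of the kind named above.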