## Comparison among Many Means: Analysis of Variance

Analysis of variance (ANOVA) allows comparison among more than two sample means. One-way ANOVA deals with a single categorical independent variable (or factor). Factorial ANOVA deals with multiple factors in many different configurations.

No doubt, on the occasions when you sat in front of your television set for an evening, it must have occurred to you to ask whether there really was any difference among the products advertised on the commercials. Is Cottonbelle bathroom tissue really softer? Do floors shine better with new clear Smear? Does Driptame really stop postnasal drip? If you set out to do the experiment, one problem might be to pick two products to compare because in the case of bathroom tissue, dirty floors, and runny noses, there are many brand names from which to choose. The real question is not so much "Is Brand A better than Brand B?" but rather "Is there any measurable difference at all among the brands?" So instead of a single comparison between two groups, what we are after is some overall comparison among possibly many groups.

Let's be a bit more specific. Suppose, as a family physician, you are interested in determining whether there is a significant difference in pain-relieving characteristics among the many acetylsalicylic acid-based over-the-counter medications. A visit to the local drugstore convinces you to include six different medications in your study: five brand-name drugs and one generic drug. A first comparison that you might wish to make would be between the brand-name drugs and the generic drug (ie, five possible comparisons). But you also might wish to compare Brand A with Brand B, A with C, A with D, and so forth. If you work it out, there are 15 possible comparisons. If there were eight drugs, there would be 28 comparisons; 10 drugs, 45 comparisons; and so on. The rub is that 1 out of 20 comparisons will be significant by chance alone, at the 0.05 level, so pretty soon you can no longer tell the real differences from the chance differences. (Actually, with 15 comparisons, the likelihood that at least one comparison will be significant is already approximately 54%, thanks to the arcane laws of probability.) The use of multiple t tests to do pairwise comparisons is inappropriate because the process leads to a loss of any interpretable level of significance. What we need is a statistical method that permits us to make a statement about overall differences among drugs, following which we could seek out where the differences lie. Our null hypothesis (H0) and alternative hypothesis (H1) take the following forms:
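The arithmetic behind these counts is just "k choose 2," and the chance of at least one spurious significant result follows from treating the comparisons as independent tests at the 0.05 level. A minimal sketch in Python, using only the numbers from the text:

```python
from math import comb

# Number of pairwise comparisons among k groups is k choose 2
for k in (6, 8, 10):
    print(k, "drugs:", comb(k, 2), "comparisons")  # 15, 28, 45

# Probability that at least one of 15 independent tests at
# alpha = 0.05 comes out "significant" by chance alone
p_any = 1 - (1 - 0.05) ** 15
print(round(p_any, 2))  # approximately 0.54
```

The complement trick (one minus the chance that all 15 tests stay non-significant) is where the approximately 54% figure comes from.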

H0: All the means are equal.

H1: Not all the means are equal.

We would proceed to assemble some patients, randomly allocate them to the six treatment groups, administer the various acetylsalicylic acid (ASA) preparations (each delivered in a plain brown wrapper), and then ask the patients to rate pain on a subjective scale from 0 = no pain to 15 = excruciating pain. The results of the experiment are shown in Table 5-1.

Five patients are assigned to each of the six groups. Each patient makes some pain rating (eg, 5.0 is the rating of patient 1 in the drug A group). The mean score in each group is obtained by averaging these ratings (eg, 7.0 is the mean rating in group A). Finally, we can obtain an overall mean, 8.0, from averaging all 30 ratings.

Now, if we want to know whether drug A stood out from the crowd, the first step is to find the difference between the mean pain score of drug A and the overall mean. Similarly, any difference between one drug and the rest can be detected by examining the difference between its group mean and the grand mean.

Table 5-1

Pain Ratings for Patients in Six Acetylsalicylic Acid Groups

| Patient | Drug A | Drug B | Drug C | Drug D | Drug E | Drug F |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 5.0 | 6.0 | 7.0 | 10.0 | 5.0 | 9.0 |
| 2 | 6.0 | 8.0 | 8.0 | 11.0 | 8.0 | 8.0 |
| 3 | 7.0 | 7.0 | 9.0 | 13.0 | 6.0 | 7.0 |
| 4 | 8.0 | 9.0 | 11.0 | 12.0 | 4.0 | 5.0 |
| 5 | 9.0 | 10.0 | 10.0 | 9.0 | 7.0 | 6.0 |
| Mean | 7.0 | 8.0 | 9.0 | 11.0 | 6.0 | 7.0 |

Overall mean = 8.0
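As a quick check, the group means and the grand mean in Table 5-1 can be reproduced in a few lines of Python; the ratings below are simply transcribed from the table:

```python
# Pain ratings from Table 5-1, one list per drug (columns A through F)
ratings = {
    "A": [5.0, 6.0, 7.0, 8.0, 9.0],
    "B": [6.0, 8.0, 7.0, 9.0, 10.0],
    "C": [7.0, 8.0, 9.0, 11.0, 10.0],
    "D": [10.0, 11.0, 13.0, 12.0, 9.0],
    "E": [5.0, 8.0, 6.0, 4.0, 7.0],
    "F": [9.0, 8.0, 7.0, 5.0, 6.0],
}

# Mean rating within each group
group_means = {drug: sum(vals) / len(vals) for drug, vals in ratings.items()}

# Grand mean across all 30 ratings
all_values = [v for vals in ratings.values() for v in vals]
grand_mean = sum(all_values) / len(all_values)

print(group_means)  # A: 7.0, B: 8.0, C: 9.0, D: 11.0, E: 6.0, F: 7.0
print(grand_mean)   # 8.0
```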

So to find the overall effect of the drugs, we take the differences between group means and the overall mean, square them (just as in the standard deviation) to get rid of the negative signs, and add them. The sum looks like the following:

(7 − 8)² + (8 − 8)² + (9 − 8)² + (11 − 8)² + (6 − 8)² + (7 − 8)² = 16.0

The sum is then multiplied by the number of subjects per group, 5, to obtain the Sum of Squares (between groups):

Sum of Squares (between) = sum of (group mean − grand mean)² × n = 16.0 × 5 = 80.0
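Plugging in the six group means from Table 5-1, the same figure falls out of a one-line computation (a small sketch in Python):

```python
group_means = [7.0, 8.0, 9.0, 11.0, 6.0, 7.0]  # drugs A through F, Table 5-1
grand_mean = 8.0
n_per_group = 5

# Sum of Squares (between) = n x sum of squared deviations of the group means
ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
print(ss_between)  # 80.0
```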

The next question is how to get an estimate of the variability within the groups. This is done by calculating, within each group, the sum of the squared differences between the individual values and the group mean, because this captures the variability of individual subjects around their own group's mean. Because it is based on variation within groups, it is called the Sum of Squares (within groups):

Sum of Squares (within) = sum of (individual value − group mean)² = (5 − 7)² + (6 − 7)² + (7 − 7)² + (8 − 7)² + (9 − 7)² + … + (6 − 7)² = 60.0

There are 30 terms in this sum. The larger the Sum of Squares (between) relative to the Sum of Squares (within), the larger the difference between groups compared to the variation of individual values. However, the Sum of Squares (between groups) contains as many terms as there are groups, and the Sum of Squares (within groups) contains as many terms as there are individual data in all the groups. So the more groups, the larger the Sum of Squares (between), and the more data, the larger the Sum of Squares (within). Since what we're really trying to do is get the average variation between groups and compare it to the average variation within groups, it makes sense to divide the Sum of Squares (between) by the number of groups and divide the Sum of Squares (within) by the number of data.
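Pooling the squared deviations over all 30 ratings can be sketched directly; the data are transcribed again here so the snippet stands alone:

```python
# Pain ratings from Table 5-1, one list per drug
ratings = {
    "A": [5.0, 6.0, 7.0, 8.0, 9.0],
    "B": [6.0, 8.0, 7.0, 9.0, 10.0],
    "C": [7.0, 8.0, 9.0, 11.0, 10.0],
    "D": [10.0, 11.0, 13.0, 12.0, 9.0],
    "E": [5.0, 8.0, 6.0, 4.0, 7.0],
    "F": [9.0, 8.0, 7.0, 5.0, 6.0],
}
means = {d: sum(v) / len(v) for d, v in ratings.items()}

# Sum of Squares (within): each rating's squared deviation from its own group mean,
# summed over all 30 terms
ss_within = sum((x - means[d]) ** 2 for d, vals in ratings.items() for x in vals)
print(ss_within)  # 60.0
```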

Actually, at this point, a little more sleight of hand emerges. Statisticians start with the number of terms in the sum, then subtract the number of mean values that were calculated along the way. The result is called the degrees of freedom, for reasons that reside, believe it or not, in the theory of thermodynamics. Dividing each Sum of Squares by its degrees of freedom results in a new quantity called the Mean Square. Finally, the ratio of the two Mean Squares is a measure of the variation between groups relative to the variation within groups and is called an F ratio.

F = mean square (between)/mean square (within)

The F test is something like a t test; that is, the bigger it is, the smaller the probability that the difference could occur by chance. And as with a t test, you have to look up the probability corresponding to the particular value in the back of a book (if the computer hasn't provided it for you). It has one important difference: the value depends on the degrees of freedom (df) in both the numerator and the denominator, so the table lists both numerator and denominator df. And it has one major similarity: if you do an analysis of variance (ANOVA) with only two groups, where you would expect the ANOVA to arrive at the same conclusion as the equivalent t test, it does. In fact, the value of the F test is the square of the equivalent t test.
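Both claims can be checked numerically. Using the Sums of Squares from the worked example (16.0 × 5 = 80.0 between; 60.0 within, as in Table 5-2), a sketch of the degrees of freedom, Mean Squares, and F ratio, plus the F = t² equivalence for a two-group comparison of drugs A and D:

```python
k, n = 6, 30                           # number of groups, total observations
df_between, df_within = k - 1, n - k   # 5 and 24
ms_between = 80.0 / df_between         # 16.0
ms_within = 60.0 / df_within           # 2.5
f_ratio = ms_between / ms_within
print(f_ratio)                         # 6.4

# With only two groups, F equals the square of the pooled two-sample t.
# Drugs A and D from Table 5-1:
a = [5.0, 6.0, 7.0, 8.0, 9.0]          # mean 7.0
d = [10.0, 11.0, 13.0, 12.0, 9.0]      # mean 11.0

def var(xs):
    """Sample variance (n - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

pooled = (var(a) + var(d)) / 2         # equal group sizes, so a simple average
t = (sum(a) / 5 - sum(d) / 5) / (pooled * (1 / 5 + 1 / 5)) ** 0.5
print(t ** 2)                          # 16.0, the F a two-group ANOVA would give
```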

This analysis is usually presented in an analysis of variance table, which looks something like Table 5-2.

The first column of Table 5-2 lists the Sums of Squares over which we agonized. The second column indicates the degrees of freedom, roughly equal to the number of terms in the Sum of Squares. The third column, the Mean Square, derives from dividing the Sum of Squares by degrees of freedom. Finally, the F test is the ratio of the two Mean Squares and can be looked up in yet another table at the back of the book. Because the method involves the analysis of only a single factor (eg, drug brand), it is called a one-way ANOVA.

These relationships may be presented graphically (Figure 5-1). Individual data from each group are shown to be normally distributed around each group mean, and all groups are projected downward onto an overall distribution centered on the grand mean of 8.0. Now the mean square (between) is related to the average difference between individual group means and the grand mean, and the greater the differences between groups, the larger this quantity. The mean square (within) comes from the difference between individual data and their group mean and so estimates the variance of the individual distributions. Finally, the F ratio is the ratio of these two quantities, so the larger the difference between groups, in comparison to their individual variance, the larger the F ratio and the more significant (statistically speaking) the result.

Table 5-2

Analysis of Variance

| Source | Sum of Squares | Degrees of Freedom | Mean Square | F |
| --- | --- | --- | --- | --- |
| Between groups | 80.0 | 5 | 16.0 | 6.4 |
| Within groups | 60.0 | 24 | 2.5 | — |
| Total | 140.0 | 29 | — | — |

Figure 5-1 Graphic interpretation of one-way ANOVA.

Let's go back to the initial question we posed. We wondered if there were any difference at all between various pain relievers. Having used ANOVA to satisfy ourselves that there is a significant difference somewhere in all that propaganda, the next likely question is "where?" Note that if the ANOVA does not turn up any overall differences, the rule is STOP! DO NOT PASS GO, DO NOT COLLECT $200, AND DO NOT DO ANY MORE NUMBER CRUNCHING! But supposing the F value was significant, there are a number of procedures, called "post hoc comparisons," that can be used to find out where the significant differences lie. The use of a t test, even with a significant ANOVA, is still forbidden ground when there are many groups.
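To make the post hoc idea concrete, one simple procedure (Bonferroni; Tukey and Scheffé are common alternatives, and the text does not commit to any one of them) divides the overall 0.05 level among the 15 pairwise comparisons, with each pairwise t built on the Mean Square (within) from the ANOVA. A sketch using only the standard library; the critical value for 24 df at the adjusted level would still come from a t table:

```python
from itertools import combinations

# Pain ratings from Table 5-1
ratings = {
    "A": [5.0, 6.0, 7.0, 8.0, 9.0],
    "B": [6.0, 8.0, 7.0, 9.0, 10.0],
    "C": [7.0, 8.0, 9.0, 11.0, 10.0],
    "D": [10.0, 11.0, 13.0, 12.0, 9.0],
    "E": [5.0, 8.0, 6.0, 4.0, 7.0],
    "F": [9.0, 8.0, 7.0, 5.0, 6.0],
}
ms_within, n_per_group = 2.5, 5        # from the ANOVA table
alpha = 0.05

pairs = list(combinations(sorted(ratings), 2))
bonferroni_alpha = alpha / len(pairs)  # 0.05 / 15, roughly 0.0033 per comparison
print(round(bonferroni_alpha, 4))

means = {d: sum(v) / len(v) for d, v in ratings.items()}
# Standard error of a difference between two group means, using MS (within)
se = (ms_within * (2 / n_per_group)) ** 0.5
t_stats = {(g1, g2): (means[g1] - means[g2]) / se for g1, g2 in pairs}
for (g1, g2), t in t_stats.items():
    # each |t| would be compared against the critical t for 24 df
    # at the Bonferroni-adjusted level, rather than at 0.05
    print(g1, "vs", g2, round(t, 2))
```

Using the error term from the whole ANOVA (rather than recomputing a pooled variance from each pair alone) is what distinguishes these comparisons from the forbidden run of ordinary t tests.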
