## [2 flfoiyjyJW2

and has values from -1 to +1. A value of r = +1 indicates a perfect positive correlation between x and y. Perfect negative correlation is denoted by r = -1. For calibration plots in analytical chemistry, using Beer's law data, for example, we generally would like to have r > 0.99. For this type of calibration data, values of r < 0.90 would usually suggest an unacceptably poor correlation between y and x.

We now examine a typical linear regression analysis of Beer's law data. The model used is eq. (2.2):

with $b_1 = \varepsilon l$, where the path length $l$ is 1 cm and $\varepsilon$ is the molar absorptivity in L mol$^{-1}$ cm$^{-1}$. The data represent a series of absorbance measurements $A_j$ over a range of micromolar concentrations $C_j$ of an absorber. The results of analysis of these data by the BASIC program for linear least squares in Appendix I are given in Table 2.2. This program makes the standard assumptions that random errors in y are independent of the value of y(meas) and that there are no errors in x. This corresponds to an unweighted regression, with $w_j = 1$ in the error sum $S$ (eq. (2.10)).

Least squares analysis of the experimental data in the first two columns of Table 2.2 gave a slope of $(9.94 \pm 0.05) \times 10^{-3}$. Because the path length is 1 cm and the concentration units are μM, we multiply by $10^6$ to convert the slope to $\varepsilon$ = 9,940 L mol$^{-1}$ cm$^{-1}$ for the absorber. The intercept is $(1.24 \pm 1.51) \times 10^{-3}$. The standard deviation of the intercept is larger than the intercept itself, suggesting that it is not significantly different from zero. A Student's t-test comparing the intercept to zero can be done to confirm this conclusion statistically [2].
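The t-test mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of reference [2]: it assumes a two-tailed test at the 95% confidence level with n - 2 degrees of freedom for the seven calibration points, and it hard-codes the critical value from standard t tables rather than computing it. The variable names are illustrative.

```python
# Intercept and its standard deviation as reported by the regression
# program for the Beer's law data (Table 2.2).
intercept = 1.239661e-03
sd_intercept = 1.51158e-03

n_points = 7            # seven calibration points
dof = n_points - 2      # two fitted parameters: slope and intercept

# t statistic for the null hypothesis "true intercept = 0".
t_stat = abs(intercept - 0.0) / sd_intercept

# Assumption: 95% confidence, two-tailed. Critical value of Student's t
# for 5 degrees of freedom, taken from standard tables.
T_CRIT_95_DOF5 = 2.571

significant = t_stat > T_CRIT_95_DOF5
print(f"t = {t_stat:.3f} (critical value {T_CRIT_95_DOF5}, {dof} d.o.f.)")
print("intercept differs from zero" if significant
      else "intercept is not significantly different from zero")
```

Because the computed t is well below the critical value, the test supports treating the intercept as zero within experimental error.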

Table 2.2 Linear Regression Analysis of Beer's Law Data

| $x_j$ (μM) | $y_j$(meas) (A) | $y_j$(calc) (A) | Difference, $y_j$(meas) - $y_j$(calc) |
|---|---|---|---|
| 1.0 | 0.011 | 0.01118 | -1.84E-04 |
| 5.0 | 0.049 | 0.05096 | -1.96E-03 |
| 10.0 | 0.102 | 0.10068 | 1.32E-03 |
| 20.0 | 0.199 | 0.20013 | -1.13E-03 |
| 30.0 | 0.304 | 0.29957 | 4.43E-03 |
| 40.0 | 0.398 | 0.39901 | -1.01E-03 |
| 50.0 | 0.497 | 0.49846 | -1.45E-03 |

| Regression statistic | Value |
|---|---|
| Slope | 9.944367E-03 |
| Standard deviation of the slope | 5.379905E-05 |
| Intercept | 1.239661E-03 |
| Standard deviation of the intercept | 1.51158E-03 |
| Correlation coefficient (r) | 0.9999269 |
| Standard deviation of the regression (SDR) | 2.438353E-03 |

Note that the correlation coefficient in Table 2.2 is very close to one. This confirms an excellent correlation of y with x. Because of the assumptions we made about the distribution of errors in y and the lack of errors in x, the standard deviation of the regression (SDR) is expressed in the units of y. To interpret this statistic in terms of goodness of fit, we compare it to the standard error ($e_y$) in measuring y. In this case, $e_y$ is the standard error in A. Suppose we use a spectrophotometer that has an error in the absorbance $e_A = \pm 0.003$, which is independent of the value of A. (This error would not be very good for a modern spectrophotometer!) We have SDR = 0.0024 from Table 2.2. Therefore, SDR < $e_A$, and we consider this as support for a good fit of the model to the data.

Although summary statistics such as the correlation coefficient and SDR are useful indicators of how well a model fits a particular set of data, they suffer from the limitation of being summaries. We need, in addition, to have methods that test each data point for adherence to the model. We recommend as a general practice the construction of graphs for further evaluation of goodness of fit. The first graph to examine is a plot of the experimental data along with the calculated line (Figure 2.1(a)). We see that good agreement of the data in Table 2.2 with the straight line model is confirmed by this graph. A very close inspection of this type of plot is often needed to detect systematic deviations of data points from the model.

A second highly recommended graph is called a deviation plot or residual plot. This graph provides a sensitive test of the agreement of individual points with the model. The residual plot has $[y_j(\text{meas}) - y_j(\text{calc})]/\text{SDR}$ on the vertical axis plotted against the independent variable. Alternatively, if data are equally spaced on the x axis, the data point number can be plotted on the horizontal axis [1]. The quantities $[y_j(\text{meas}) - y_j(\text{calc})]/\text{SDR}$ are called the residuals or deviations. They are sometimes given the symbols $\text{dev}_j$. The $\text{dev}_j$ are simply the differences of each experimental data point from the calculated regression line, normalized by dividing by the standard deviation of the regression. Division by SDR allows us to plot all deviation plots on similar y-axis scales, which are now scaled to the standard deviation in y. This facilitates comparisons of different sets of experimental data.

Figure 2.1 Graphical output corresponding to linear regression analysis results in Table 2.2: (a) points are experimental data, line computed from regression analysis; (b) random deviation plot.

In the example from Table 2.2, the residual plot (Figure 2.1(b)) has points randomly scattered around the horizontal line representing $\text{dev}_j = 0$. This random scatter indicates that the model provides a good fit to the data within the confines of the signal-to-noise ratio of the measurements. This type of plot is a highly sensitive indicator of goodness of fit.
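The normalization behind the residual plot can be illustrated with a short Python sketch. It takes the differences y(meas) - y(calc) and the SDR listed in Table 2.2 and divides one by the other. The function name `normalized_residuals` is illustrative, and the ±2 bound in the final comment is a common rule of thumb for normally distributed errors, not a criterion stated in the text.

```python
# Differences y_j(meas) - y_j(calc) and SDR as listed in Table 2.2.
differences = [-1.84e-04, -1.96e-03, 1.32e-03, -1.13e-03,
               4.43e-03, -1.01e-03, -1.45e-03]
sdr = 2.438353e-03


def normalized_residuals(diffs, sdr):
    """Return dev_j = [y_j(meas) - y_j(calc)] / SDR for each data point."""
    return [d / sdr for d in diffs]


dev = normalized_residuals(differences, sdr)
for j, d in enumerate(dev, start=1):
    print(f"point {j}: dev = {d:+.3f}")

# Rule of thumb (assumption): for random, normally distributed errors,
# nearly all dev_j should fall within roughly +/-2.
largest = max(abs(d) for d in dev)
print(f"largest |dev_j| = {largest:.3f}")
```

Plotting `dev` against concentration (or point number) gives the kind of deviation plot described for Figure 2.1(b).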

So far, we have discussed only data that agree very well with the chosen model. We now explore a second set of data that are fit less well by the linear Beer's law model. The data in Table 2.3 gave a slope of $(9.93 \pm 0.16) \times 10^{-3}$ when fit to the straight line model in eq. (2.2). As before, this gives $\varepsilon$ = 9,930 L mol$^{-1}$ cm$^{-1}$ for the absorber. The intercept is $(1.96 \pm 0.47) \times 10^{-2}$, which in this case suggests that there is a real positive intercept. The correlation coefficient is 0.9993, which might lead us to believe that the fit is good. However, one indication that there may be a problem with the fit of the model to these data is that SDR is 0.008, which is greater than our estimate of 0.003 for the error in A. Another subtle indicator of a poor fit is that the deviations in the last column of Table 2.3 seem to follow a trend; that is, two negative residuals are followed by four positive ones, then a final negative value.

Table 2.3

| $x_j$ (μM) | $y_j$(meas) (A) | $y_j$(calc) (A) | Difference, $y_j$(meas) - $y_j$(calc) | Diff./SDR |
|---|---|---|---|---|
| 1.0 | 0.021 | 0.02953 | -8.54E-03 | -1.129 |
| 5.0 | 0.067 | 0.06928 | -2.28E-03 | -0.300 |
| 10.0 | 0.126 | 0.11895 | 7.05E-03 | 0.929 |
| 20.0 | 0.220 | 0.21829 | 1.71E-03 | 0.225 |
| 30.0 | 0.325 | 0.31764 | 7.36E-03 | 0.970 |
| 40.0 | 0.421 | 0.41698 | 4.02E-03 | 0.530 |
| 50.0 | 0.507 | 0.51633 | -9.32E-03 | -1.227 |

| Regression statistic | Value |
|---|---|
| Slope | 9.934478E-03 |
| Standard deviation of the slope | 1.676636E-04 |
| Intercept | 1.960314E-02 |
| Standard deviation of the intercept | 4.710805E-03 |
| Correlation coefficient (r) | 0.9992887 |
| Standard deviation of the regression (SDR) | 7.590183E-03 |
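For readers who want to reproduce the statistics reported for Table 2.3, the following Python sketch implements an unweighted linear least-squares fit using the standard textbook formulas. It is not the BASIC program of Appendix I; the function name `linear_least_squares` and the returned dictionary keys are illustrative choices, and small differences in the last digits relative to the table are to be expected from floating-point precision.

```python
import math


def linear_least_squares(x, y):
    """Unweighted straight-line fit y = slope*x + intercept with statistics."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Standard deviation of the regression: n - 2 degrees of freedom.
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    sdr = math.sqrt(sum(r * r for r in residuals) / (n - 2))

    return {
        "slope": slope,
        "intercept": intercept,
        "sd_slope": sdr / math.sqrt(sxx),
        "sd_intercept": sdr * math.sqrt(sum(xi * xi for xi in x) / (n * sxx)),
        "r": sxy / math.sqrt(sxx * syy),
        "sdr": sdr,
        "dev": [r / sdr for r in residuals],   # normalized residuals
    }


# Concentrations (micromolar) and measured absorbances from Table 2.3.
conc = [1.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
absorbance = [0.021, 0.067, 0.126, 0.220, 0.325, 0.421, 0.507]

fit = linear_least_squares(conc, absorbance)
for name in ("slope", "sd_slope", "intercept", "sd_intercept", "r", "sdr"):
    print(f"{name:>12s} = {fit[name]:.6E}")
```

The `dev` entry holds the normalized residuals, so the same function also supplies the values needed for a deviation plot.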

Plots of the data in Table 2.3 can be used to support more solid conclusions about a rather poor fit to the model. Careful inspection of the calibration graph (Figure 2.2(a)) reveals that the first and last data points are slightly below the calibration line, while points 3-6 are slightly above the calibration line. The residual plot (Figure 2.2(b)) provides the same information in a clearer format. We see that the $\text{dev}_j$ are arranged roughly in the pattern of an upside-down parabola. This evident pattern in the residual plot, or deviation pattern, indicates that the model does not fit the experimental data well. There can be a number of reasons for this, and they will be discussed later.
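One informal way to put a number on a deviation pattern is to count runs of consecutive residuals with the same sign. The sketch below is an illustrative heuristic only, not a method prescribed in the text, and with seven points it can be no more than suggestive. The helper name `count_sign_runs` is hypothetical. It compares the residual signs from Table 2.3 with those from Table 2.2.

```python
def count_sign_runs(values):
    """Count runs of consecutive nonzero residuals that share the same sign."""
    signs = [v > 0 for v in values if v != 0]
    if not signs:
        return 0
    # A new run starts wherever the sign differs from the previous point.
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Diff./SDR column of Table 2.3 (poorer fit).
dev_table_2_3 = [-1.129, -0.300, 0.929, 0.225, 0.970, 0.530, -1.227]

# Difference column of Table 2.2 (good fit); only the signs matter here.
diff_table_2_2 = [-1.84e-04, -1.96e-03, 1.32e-03, -1.13e-03,
                  4.43e-03, -1.01e-03, -1.45e-03]

runs_2_3 = count_sign_runs(dev_table_2_3)
runs_2_2 = count_sign_runs(diff_table_2_2)
print(f"sign runs, Table 2.3: {runs_2_3}")
print(f"sign runs, Table 2.2: {runs_2_2}")
```

Fewer, longer runs (as in Table 2.3) are what a systematic curved pattern produces, whereas frequent sign changes are more typical of random scatter. The residual plot itself remains the primary evidence.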

Table 2.3 and Figure 2.2(b) illustrate a case where summary statistics provide somewhat confusing evidence concerning goodness of fit. The value of r suggests a good correlation between y and x, but SDR is somewhat larger than the errors in y. However, small systematic deviations of the data from the model are clearly apparent from the residual plot (Figure 2.2(b)). From consideration of all these criteria, we conclude that the model fits the data poorly. The residual plot is a major tool in establishing the poor quality of the fit. We will return to this theme often in later discussions of nonlinear regression analysis.

Figure 2.2 Graphical output corresponding to linear regression analysis results in Table 2.3: (a) points are experimental data, line computed from regression analysis; (b) nonrandom deviation plot.

Residual plots can also reveal trends in the errors in $y_j$. For example, the plot in Figure 2.3 shows nearly random scatter about $\text{dev}_j = 0$, but the envelope of positive and negative deviations increases with increasing x. This type of deviation plot suggests that the original assumption about the constant variance in y is incorrect [3]. The plot suggests that the errors in y depend on the magnitude of y, because the size of the deviations of y from the model increases with x. As discussed in Sections A.3 and B.5, the $w_j = 1$ assumption cannot be used in such cases, and the regression analysis needs to be weighted appropriately.

Figure 2.3 Deviation plot characteristic for an unweighted regression analysis when the errors in y depend on the size of y.
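A weighted straight-line fit can be sketched as follows. Since eq. (2.10) is not reproduced here, the sketch assumes the error sum has the usual form $S = \sum_j w_j [y_j(\text{meas}) - y_j(\text{calc})]^2$. The choice $w_j = 1/y_j^2$ used in the example is likewise an assumption, appropriate only if the errors are proportional to y; the appropriate weighting is the subject of the sections cited above. The function name `weighted_linear_fit` is illustrative.

```python
def weighted_linear_fit(x, y, w):
    """Minimize S = sum_j w_j * (y_j - (slope*x_j + intercept))**2."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

    # Solve the two weighted normal equations for slope and intercept.
    det = sw * swxx - swx ** 2
    slope = (sw * swxy - swx * swy) / det
    intercept = (swxx * swy - swx * swxy) / det
    return slope, intercept


# Table 2.3 data, used here only to exercise the function.
conc = [1.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
absorbance = [0.021, 0.067, 0.126, 0.220, 0.325, 0.421, 0.507]

# Unit weights reproduce the unweighted regression.
unit_weights = [1.0] * len(conc)
slope_unw, intercept_unw = weighted_linear_fit(conc, absorbance, unit_weights)

# Assumed weighting for errors proportional to y: w_j = 1 / y_j**2.
rel_weights = [1.0 / yi ** 2 for yi in absorbance]
slope_w, intercept_w = weighted_linear_fit(conc, absorbance, rel_weights)

print(f"unweighted: slope = {slope_unw:.6E}, intercept = {intercept_unw:.6E}")
print(f"weighted:   slope = {slope_w:.6E}, intercept = {intercept_w:.6E}")
```

With all weights equal to one, the function returns the same slope and intercept as the unweighted analysis, which is a convenient check before trying other weighting schemes.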
