The number of degrees of freedom in a regression analysis was defined previously as the number of data points (n) analyzed minus the number of parameters (p). It is intuitive that the number of degrees of freedom should be significantly larger than 1. Although it might be acceptable to use a 5-parameter model to analyze a set of 6 data points, better accuracy and precision of the parameters would probably be obtained with 10 or 20 data points. On the other hand, there is usually an upper limit on the number of degrees of freedom needed to provide parameters of the required accuracy and precision. This can be tested for a given analysis by using data for a standard system with known parameters or theoretical data with a realistic amount of error added to it.
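The test described above can be sketched in a few lines: generate theoretical data from a model with known parameters, add a realistic amount of noise, and see how parameter accuracy and precision change with the number of data points. This is a minimal sketch assuming SciPy; the function name, noise level, and x-range are illustrative assumptions, not from the text.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, top, half, slope):
    """3-parameter sigmoid: y = top / (1 + exp((half - x) / slope))."""
    return top / (1.0 + np.exp((half - x) / slope))

true = (1.0, 1.0, 0.0257)          # known parameters of the test system
rng = np.random.default_rng(0)

for n in (6, 10, 20, 50):          # degrees of freedom = n - 3 parameters
    x = np.linspace(0.9, 1.1, n)
    # theoretical data plus normally distributed absolute noise
    y = sigmoid(x, *true) + rng.normal(0.0, 0.005, n)
    popt, pcov = curve_fit(sigmoid, x, y, p0=(1.0, 1.0, 0.03))
    perr = np.sqrt(np.diag(pcov))  # standard errors of the parameters
    print(f"n = {n:2d}  params = {popt}  std errors = {perr}")
```

Comparing the recovered parameters and their standard errors against the known true values shows directly how many points a given analysis needs.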

Some researchers in the physical sciences may hold the incorrect opinion that an increase in the number of parameters will always lead to a better fit of the model to the data. Solid arguments against this simplistic view have been presented in several review articles [1,6]. Examples were presented to show that models of the wrong form with large numbers of parameters could not adequately fit data that was fit quite well by the correct model having fewer parameters.

An illustration of such an example is given in Figure 3.8, which graphically presents the results of fits of two different polynomials onto a sigmoid curve generated with the equation

y = 1/{1 + exp[(1 - x)/0.0257]}    (3.19)

to which has been added 0.5% normally distributed absolute noise. This model is the same as that for steady state voltammetry in eq. (3.4). The dashed lines in Figure 3.8 represent the best fits of five- and six-parameter polynomial equations onto the data. It is obvious that the fits of these polynomial equations are rather dismal. They simply do not fit the data. The model is incorrect.

Figure 3.8 Plots of best fits of regression analysis of polynomial models with five and six parameters onto data representing a sigmoid curve described by y = 1/{1 + exp[(1 - x)/0.0257]}, with 0.5% normally distributed absolute noise (solid line; the 24 data points evenly spaced between the two solid vertical bars were analyzed). Polynom(5): y = 14.2 - 25.8x + 2.60x^2 + 9.11x^3 + 0.354x^4; Polynom(6): y = 13.6 - 25.4x + 2.48x^2 + 9.11x^3 + 0.432x^4 + 0.233/x^2.

Table 3.13 Results of Regression Analyses of Sigmoidal Data in Figure 3.8

Regression equation                                            No. parameters   10^2 SD^a   Deviation plot
y = 1/{1 + exp[(1 - x)/0.0257]}                                      3             0.61      Random
y = 14.2 - 25.8x + 2.60x^2                                           3             3.9       Nonrandom
y = 14.2 - 25.8x + 2.60x^2 + 9.11x^3                                 4             2.3       Nonrandom
y = 14.2 - 25.8x + 2.60x^2 + 9.11x^3 + 0.354x^4                      5             2.3       Nonrandom
y = 13.6 - 25.4x + 2.48x^2 + 9.11x^3 + 0.432x^4 + 0.233/x^2          6             2.3       Nonrandom

^a Standard deviation of the regression.

The numerical results of these analyses (Table 3.13) show that the correct equation always gives the best standard deviation, regardless of the number of parameters in the incorrect polynomial equations. The four-parameter polynomial equation gives a better standard deviation than the three-parameter polynomial. However, the addition of further parameters does not improve the fit, as seen by the lack of improvement of the standard deviation for polynomials with four, five, and six parameters. Simply adding more parameters to the incorrect polynomial model does not provide a better fit!
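The comparison in Table 3.13 can be reproduced in spirit by fitting the correct sigmoid and incorrect polynomials of increasing order to the same noisy data and computing the standard deviation of the regression, SD = sqrt(SSR/(n - p)). This is a hedged sketch assuming SciPy and NumPy; the data-generation details (seed, x-range, noise level) are assumptions, not the book's exact data.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, top, half, slope):
    return top / (1.0 + np.exp((half - x) / slope))

rng = np.random.default_rng(1)
x = np.linspace(0.9, 1.1, 24)                    # 24 evenly spaced points
y = sigmoid(x, 1.0, 1.0, 0.0257) + rng.normal(0.0, 0.005, 24)

def reg_sd(resid, p):
    """Standard deviation of the regression: sqrt(SSR / (n - p))."""
    return np.sqrt(np.sum(resid**2) / (len(resid) - p))

# correct 3-parameter model
popt, _ = curve_fit(sigmoid, x, y, p0=(1.0, 1.0, 0.03))
sd_sigmoid = reg_sd(y - sigmoid(x, *popt), 3)

# incorrect polynomial models with 3, 4, and 5 parameters
sds = {}
for deg in (2, 3, 4):
    coef = np.polyfit(x, y, deg)
    sds[deg + 1] = reg_sd(y - np.polyval(coef, x), deg + 1)

print(sd_sigmoid, sds)
```

Running this shows the same pattern as the table: the correct model gives the smallest SD, and piling parameters onto the wrong polynomial yields only marginal improvement.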

The rather frivolous but often repeated statement that "with enough parameters you can fit an elephant" is exposed here as blatantly incorrect. If this statement were assumed to be correct, the results in Figure 3.8 and Table 3.13 would provide definitive, much needed proof that a sigmoid curve is not an elephant [6].

Of course, as implied by the discussion in Section C.1, inclusion of additional parameters in a model that already gives a relatively good fit tends to lower the standard deviation a bit and to introduce more scatter into the deviation plot. In such cases, the extra sum of squares test (eq. (3.15)) can be used to see whether the apparent improvement in the fit is statistically significant.
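The extra sum of squares test can be sketched as follows, written here in its standard form for nested models (the exact notation of eq. (3.15) may differ); the residual sums of squares in the example are illustrative numbers, not values from the text.

```python
from scipy.stats import f as f_dist

def extra_ss_test(ssr1, p1, ssr2, p2, n):
    """Extra sum of squares F-test for nested models fit to n points.

    Model 1 (p1 parameters, residual sum of squares ssr1) is nested in
    model 2 (p2 > p1 parameters, ssr2 <= ssr1).  Returns (F, P), where a
    small P means the extra parameters give a statistically real
    improvement in the fit.
    """
    F = ((ssr1 - ssr2) / (p2 - p1)) / (ssr2 / (n - p2))
    P = f_dist.sf(F, p2 - p1, n - p2)  # upper tail of the F distribution
    return F, P

# illustrative numbers: adding a 4th parameter drops SSR from 0.020 to 0.012
F, P = extra_ss_test(ssr1=0.020, p1=3, ssr2=0.012, p2=4, n=24)
print(F, P)
```

If P falls below the chosen significance level (commonly 0.05), the lower standard deviation of the larger model reflects a genuine improvement rather than the expected bookkeeping effect of extra parameters.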
