## B4 Example Regression Analysis for an Exponential Decay

Problem. The fluorescence intensity of a reactant is measured to obtain its concentration in millimolar (mM) vs. time for a suspected first-order decomposition reaction. The data are read from a file in the form of y = reactant concentration in mM and x = time in seconds. Assuming that errors in y are independent of y (i.e., errors are absolute), use nonlinear regression to see if first-order decay holds and to estimate the rate constant and the pre-exponential factor.

This example demonstrates how nonlinear regression analysis is used. The data were analyzed with a nonlinear regression program in the Mathcad environment. The program, data, graphs, and results are integrated in a single document, as shown in Box 2.2. We considered time to be free of error, so only the random errors in y are significant. Because errors in y are absolute, we minimize $S$ in eq. (2.9) with $w_j = 1$. The model is given in eq. (2.20), where b is assumed to be zero. There are two parameters: $b_1 = y_0$ and $b_2 = k$, the first-order rate constant in s$^{-1}$. These data were calculated from eq. (2.20), and normally distributed random noise was added to y. Thus, we know that the true values are $b_1 = 14.0$ mM and $b_2 = 15.0$ s$^{-1}$.
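For reference, the quantity being minimized can be written out. This is a sketch that assumes eq. (2.9) has the usual weighted least-squares form and that eq. (2.20) with b = 0 reduces to the function F defined in Box 2.2; neither equation is reproduced in this section.

$$S = \sum_{j=1}^{n} w_j \left[ y_j(\text{meas}) - y_j(\text{calc}) \right]^2, \qquad w_j = 1$$

$$y_j(\text{calc}) = F(x_j, b_1, b_2) = b_1 \exp(-b_2 x_j)$$

With $w_j = 1$, $S$ is the unweighted sum of squares, which Box 2.2 calls SSE.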

We now describe the elements of the Mathcad calculation. The first equation reads the data from a file called CURDAT1. Mathcad reads data from a file as a matrix. The following equations display this data matrix as M and define the data vectors Y and x for the program. These data vectors are $n \times 1$ matrices, also called column vectors. Y contains all the $y_j$, and x contains all the $x_j$.
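The same read-and-split step can be sketched outside Mathcad. The Python below is only an illustration: the helper name `read_pairs` is hypothetical, and the whitespace-separated "y x" line layout is an assumption based on the description of the data file in this section. An in-memory stand-in replaces the real file so the sketch is self-contained.

```python
import io

def read_pairs(handle):
    """Parse whitespace-separated 'y x' lines into a matrix and two columns."""
    rows = [[float(tok) for tok in line.split()]
            for line in handle if line.strip()]
    Y = [row[0] for row in rows]  # first column: concentration (mM)
    x = [row[1] for row in rows]  # second column: time (s)
    return rows, Y, x

# Stand-in for the first three lines of the data file (assumed layout)
sample = io.StringIO("12.02 0.01\n10.205 0.02\n8.985 0.03\n")
M, Y, x = read_pairs(sample)
n = len(Y)  # number of data pairs
print(n, Y, x)
```

To read a real file, the `io.StringIO` object would be replaced by an opened file handle, for example `open("CURDAT1")`, assuming the file follows the same layout.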

The number of data pairs (n) to be analyzed must be provided in the next section. Next, the initial values of the parameters $b_1$ and $b_2$ are guessed. These should be best guesses based on inspection of the data and can be adjusted by the user. These best guesses define the starting point of the regression analysis. The model function to be fit to the data, $F(x, b_1, b_2)$, is provided in the next section.
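A minimal Python sketch of this setup step follows. It assumes the exponential decay model and the initial guesses (16 and 28) listed in Box 2.2, and it uses the initial sum of squares as a numerical stand-in for visually comparing the model curve with the data; the variable names are illustrative only.

```python
import math

def F(x, b1, b2):
    """Model to be fit: first-order (exponential) decay."""
    return b1 * math.exp(-b2 * x)

# Data from Box 2.2: time (s) and concentration (mM)
x = [0.01 * (j + 1) for j in range(12)]
Y = [12.02, 10.205, 8.985, 7.745, 6.558, 5.751,
     4.943, 4.306, 3.623, 3.032, 2.8, 2.27]

# Initial guesses taken from Box 2.2
b1_guess, b2_guess = 16.0, 28.0

# Model curve for the initial parameters and its sum of squared errors
initial_curve = [F(xj, b1_guess, b2_guess) for xj in x]
initial_sse = sum((yj - fj) ** 2 for yj, fj in zip(Y, initial_curve))

for xj, yj, fj in zip(x, Y, initial_curve):
    print(f"{xj:.2f}  data={yj:7.3f}  model={fj:7.3f}")
print(f"initial SSE = {initial_sse:.2f}")
```

A large initial sum of squares is expected here, because the guess for $b_2$ is deliberately far from the final value.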

The first graph shows the model $F(x_j, b_1, b_2)$ calculated with the initial parameters, given by the dotted line, plotted together with the data points, $y_j$. If the agreement between the points and the line is reasonable, the user can begin the regression by telling the program to calculate the Mathcad document. (An Automatic Calculate mode can also be chosen.) In many cases, sufficient initial agreement may require simply that both plots fall in the same coordinate space, as shown in this example. If agreement is unsatisfactory, a new set of initial parameters can be chosen to give a better initial agreement.

Box 2.2 Nonlinear regression program using the MINERR function (Mathcad). Data are read from an ASCII file as x,y pairs.

Data matrix M (first column: Y, concentration in mM; second column: x, time in s):

| Y | x |
|---|---|
| 12.02 | 0.01 |
| 10.205 | 0.02 |
| 8.985 | 0.03 |
| 7.745 | 0.04 |
| 6.558 | 0.05 |
| 5.751 | 0.06 |
| 4.943 | 0.07 |
| 4.306 | 0.08 |
| 3.623 | 0.09 |
| 3.032 | 0.1 |
| 2.8 | 0.11 |
| 2.27 | 0.12 |

$$b_1 := 16 \qquad b_2 := 28$$

Function to be fit to data: exponential decay (change F to any function you wish to fit)

$$F(x, b_1, b_2) := b_1 \exp(-b_2 x)$$

Graph of data + model for initial parameters: [plot of $Y_j$ and $F(x_j, b_1, b_2)$ vs. $x_j$]

$$SSE(b_1, b_2) := \sum_j \left( Y_j - F(x_j, b_1, b_2) \right)^2$$

Given

$$SSE(b_1, b_2) = 0$$

$$\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} := \mathrm{Minerr}(b_1, b_2)$$

Values of the parameters: $b_1 = 13.907$, $b_2 = 14.846$

Sum of squares: $SSE(b_1, b_2) = 0.0659$

Root mean square error:

$$SD := \sqrt{\frac{SSE(b_1, b_2)}{n - 2}} \qquad SD = 0.0856$$

| $Y_j$ | $F(x_j, b_1, b_2)$ | $x_j$ | $Y_j - F(x_j, b_1, b_2)$ | $[Y_j - F(x_j, b_1, b_2)]/SD$ |
|---|---|---|---|---|
| 12.02 | 11.989 | 0.01 | 0.031 | 0.363 |
| 10.205 | 10.335 | 0.02 | -0.13 | -1.516 |
| 8.985 | 8.909 | 0.03 | 0.076 | 0.893 |
| 7.745 | 7.68 | 0.04 | 0.065 | 0.762 |
| 6.558 | 6.62 | 0.05 | -0.062 | -0.724 |
| 5.751 | 5.707 | 0.06 | 0.044 | 0.511 |
| 4.943 | 4.92 | 0.07 | 0.023 | 0.273 |
| 4.306 | 4.241 | 0.08 | 0.065 | 0.756 |
| 3.623 | 3.656 | 0.09 | -0.033 | -0.386 |
| 3.032 | 3.151 | 0.1 | -0.12 | -1.399 |
| 2.8 | 2.717 | 0.11 | 0.084 | 0.978 |
| 2.27 | 2.342 | 0.12 | -0.071 | -0.834 |

Deviation Plot (Residuals): [plot of $[Y_j - F(x_j, b_1, b_2)]/SD$ vs. $x_j$]

The next line in the program defines the error sum as the sum of squares of the errors, SSE($b_1$, $b_2$). The several lines following use the MINERR function of Mathcad, which employs the Marquardt-Levenberg algorithm to find the minimum in the error sum. (The reader is directed to the Mathcad manual for more details. See the end of this chapter for the source.) The remainder of the Mathcad program gives the final results, including parameter values, minimum error sum (sum of squares), and root mean square error. In cases such as this one, where the errors are absolute, the root mean square error is the same as the standard deviation of the regression.
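To make the minimization step concrete, here is a minimal pure-Python sketch of a Levenberg-Marquardt loop for this two-parameter model. It is not Mathcad's MINERR implementation: the damping schedule, the tolerances, and the name `fit_lm` are assumptions chosen for illustration. The data and starting values are those listed in Box 2.2.

```python
import math

# Data from Box 2.2: time (s) and concentration (mM)
x = [0.01 * (j + 1) for j in range(12)]
y = [12.02, 10.205, 8.985, 7.745, 6.558, 5.751,
     4.943, 4.306, 3.623, 3.032, 2.8, 2.27]

def sse(b1, b2):
    """Sum of squared errors for the model b1*exp(-b2*x)."""
    return sum((yj - b1 * math.exp(-b2 * xj)) ** 2 for xj, yj in zip(x, y))

def fit_lm(b1, b2, lam=1e-3, tol=1e-12, max_iter=200):
    """Simplified Levenberg-Marquardt iteration (illustrative only)."""
    current = sse(b1, b2)
    for _ in range(max_iter):
        # Accumulate J^T J and J^T r, with r = y - F and J = dF/db
        a11 = a12 = a22 = g1 = g2 = 0.0
        for xj, yj in zip(x, y):
            e = math.exp(-b2 * xj)
            r = yj - b1 * e
            j1 = e              # dF/db1
            j2 = -b1 * xj * e   # dF/db2
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        # Solve the damped 2x2 normal equations for the step (s1, s2)
        d11 = a11 * (1.0 + lam)
        d22 = a22 * (1.0 + lam)
        det = d11 * d22 - a12 * a12
        s1 = (g1 * d22 - a12 * g2) / det
        s2 = (d11 * g2 - a12 * g1) / det
        trial = sse(b1 + s1, b2 + s2)
        if trial < current:
            # Accept the step and relax the damping
            improvement = current - trial
            b1, b2, current = b1 + s1, b2 + s2, trial
            lam /= 10.0
            if improvement < tol:
                break
        else:
            # Reject the step and increase the damping
            lam *= 10.0
            if lam > 1e12:
                break
    return b1, b2, current

b1_fit, b2_fit, sse_min = fit_lm(16.0, 28.0)
print(f"b1 = {b1_fit:.3f}  b2 = {b2_fit:.3f}  SSE = {sse_min:.4f}")
```

Because the tabulated data are rounded to three decimals, the fitted values should agree with those in Box 2.2 only to within a small rounding difference.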

The second graph in the document is a plot of the data along with the model computed with the final parameters. This is followed by lists of all input and computed data, and a final plot of the residuals; that is, $[y_j(\text{meas}) - y_j(\text{calc})]/\mathrm{SD}$ on the vertical axis plotted against the independent variable on the horizontal axis [1].

The quantities $[y_j(\text{meas}) - y_j(\text{calc})]/\mathrm{SD}$ are sometimes given the symbol $\text{dev}_j$. The $\text{dev}_j$ are the differences of each experimental data point from the calculated regression line divided by the standard deviation (SD) of the regression. Since the input data had randomly distributed errors, the residual plot shows a random scatter of points about the zero residual axis; that is, $\text{dev}_j = 0$. As for linear regression, this type of plot is evidence that the model provides an acceptable fit to the data.
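The standardized residuals can be recomputed from the values reported in Box 2.2. The sketch below takes the final parameters and SD from the box as given; counting sign changes is an informal check of scatter added here for illustration, not a test used in the text.

```python
import math

# Data and final results as reported in Box 2.2
x = [0.01 * (j + 1) for j in range(12)]
Y = [12.02, 10.205, 8.985, 7.745, 6.558, 5.751,
     4.943, 4.306, 3.623, 3.032, 2.8, 2.27]
b1, b2 = 13.907, 14.846   # final parameter values
SD = 0.0856               # standard deviation of the regression

# dev_j = [y_j(meas) - y_j(calc)] / SD
dev = [(yj - b1 * math.exp(-b2 * xj)) / SD for xj, yj in zip(x, Y)]

# Informal scatter checks: sign changes between neighbors, largest |dev_j|
sign_changes = sum(1 for a, b in zip(dev, dev[1:]) if a * b < 0)
largest = max(abs(d) for d in dev)

for xj, dj in zip(x, dev):
    print(f"{xj:.2f}  {dj:+.3f}")
print(f"sign changes: {sign_changes}, largest |dev|: {largest:.2f}")
```

Frequent sign changes and the absence of long runs of same-signed residuals are consistent with random scatter about $\text{dev}_j = 0$.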

In our example, we chose a value of $b_2$ rather far from the final convergence value, but the nonlinear regression analysis converged rapidly to parameter values very close to the true values of $b_1 = 14$ mM and $b_2 = 15$ s$^{-1}$. The errors in the parameters are related to the random error added to the data. The standard deviation of the regression (SD) is about 0.7% of the largest y value. This is slightly less than the amount of random error added to the data, which was 1% of the largest y value. This is another indicator of an excellent fit; that is, SD < $e_y$, as discussed for linear regression. For real experimental data, $e_y$ is the estimate of random error in the measured signal.
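The percentages quoted above can be checked with a short calculation. It takes the largest measured value, 12.02 mM, as the reference, which is an assumption about which y value the percentages refer to:

$$\frac{\mathrm{SD}}{y_{\max}} = \frac{0.0856}{12.02} \approx 0.0071 \;(0.71\%), \qquad e_y \approx 0.01 \times 12.02 \approx 0.12\ \text{mM} > \mathrm{SD}$$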

Therefore, we see that the goodness of fit criteria for nonlinear regression analyses are similar to those for linear regression. Ideally, the residual plot should be random, and we should be able to achieve SD < $e_y$ for a good fit. The example also illustrates another important practice, that of testing an unfamiliar analysis model with theoretical data having known parameters before proceeding to analysis of experimental data. This concept will be illustrated further in Chapter 3.
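The practice of testing with theoretical data can be sketched as a round trip: generate data from known parameters, add noise, and check that the fit recovers them. The Python below is an illustration under stated assumptions. The noise level (0.12 mM, roughly 1% of the largest y), the random seed, and the search bracket are choices made here. The fit also uses a different technique from the Marquardt-Levenberg algorithm in the text: for each trial $b_2$ the best $b_1$ is computed in closed form, and a golden-section search minimizes the resulting sum of squares over $b_2$.

```python
import math
import random

def make_data(b1_true, b2_true, noise_sd, seed=0):
    """Theoretical decay data with normally distributed absolute noise."""
    rng = random.Random(seed)
    x = [0.01 * (j + 1) for j in range(12)]
    y = [b1_true * math.exp(-b2_true * xj) + rng.gauss(0.0, noise_sd)
         for xj in x]
    return x, y

def best_b1(b2, x, y):
    """For fixed b2 the model is linear in b1, so b1 has a closed form."""
    e = [math.exp(-b2 * xj) for xj in x]
    return sum(yj * ej for yj, ej in zip(y, e)) / sum(ej * ej for ej in e)

def profile_sse(b2, x, y):
    """Sum of squares at the best b1 for this b2."""
    b1 = best_b1(b2, x, y)
    return sum((yj - b1 * math.exp(-b2 * xj)) ** 2 for xj, yj in zip(x, y))

def fit(x, y, lo=1.0, hi=50.0, tol=1e-8):
    """Golden-section search for b2 on the bracket [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - g * (hi - lo)
    d = lo + g * (hi - lo)
    while hi - lo > tol:
        if profile_sse(c, x, y) < profile_sse(d, x, y):
            hi, d = d, c                 # minimum lies in [lo, d]
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d                 # minimum lies in [c, hi]
            d = lo + g * (hi - lo)
    b2 = 0.5 * (lo + hi)
    return best_b1(b2, x, y), b2

x, y = make_data(14.0, 15.0, noise_sd=0.12)
b1_fit, b2_fit = fit(x, y)
print(f"true: b1 = 14.0, b2 = 15.0   fitted: b1 = {b1_fit:.3f}, b2 = {b2_fit:.3f}")
```

If the fitted values land far from the known parameters at a realistic noise level, the problem lies in the model or the program rather than in the experimental data.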

The Mathcad program used in the preceding example is general and can be used with any model. The user needs only to enter the model $F(x, b_1, \ldots, b_k)$, provide the initial guesses for the parameters, and indicate a suitable data file. Note that the data file is structured as pairs of $y_j$, $x_j$, with one data pair per line, as in a typical ASCII file.