## Sources of Data and Background Contributions

We can envision at least two different sources of data that could be analyzed by nonlinear regression. One type of data is a typical response curve from a single instrumental experiment. This curve can be represented as a series of digital data points obtained as the response measured by the instrument vs. time or vs. an experimentally controlled variable. Some familiar examples are given in Table 3.1. The measured response is the dependent variable $y_j(\text{meas})$. The respective independent variables $x_j$ are listed in Table 3.1.

A second type of data could involve a physical quantity derived from a series of experiments vs. a series of values of an experimentally controlled variable, which is different in each experiment. In this case, one data point results from each experiment. Some examples might be the absorbance at a fixed wavelength vs. the concentration of absorber in a series of different solutions or measured diffusion coefficients in solutions of an aggregating protein vs. the concentration of the protein.

We distinguish between these two types of data because it is necessary to account for the background in analyses of the data. Either the regression models themselves or data manipulation prior to regression analysis must account for all contributions to the measured signal. Thus, a generic expression for a single equation model (cf. Chapter 2, Section B.1) is

$$y = \text{signal of interest} + \text{background}. \tag{3.1}$$

For example, for absorbance vs. concentration data, the background might take the simple form of a constant offset in y; that is, a blank value of the absorbance in a solution of zero concentration. The diffusion coefficient data mentioned previously, if properly corrected for artifacts of the measurement, may have a zero background term. On the other hand, a full instrumental response curve may contain background drift, nonrandom noise, and other instrumental bias added to the signals from the chemical or physical events being studied. Thus, the background term in eq. (3.1) may depend on x. Instrumental contributions may include finite background offset, drift, instability, and other characteristics of the measuring system that yield a finite signal not characteristic of the sample. We will refer to these instrumental signatures collectively as background.

Table 3.1 Variables for Some Typical Response Curves

| Experiment | Dependent variable | Independent variable |
|---|---|---|
| UV-Vis spectroscopy | Absorbance | Wavelength |
| Voltammetry | Current | Applied voltage |
| Fluorescence spectroscopy | Signal intensity | Wavelength or energy |
| Fluorescence decay | Signal intensity | Time |
| Photoelectron spectroscopy | Detector counts | Energy |
| Chromatography | Detector signal | Time |
| Mass spectrometry | Relative intensity | Mass/charge |
| FT-IR spectroscopy | Absorbance | Wave number |

If background contributions to an instrumental response curve are reasonably constant with time or if their variations with time are highly reproducible, they can sometimes be subtracted from the data. A response curve for a blank not containing the analyte might be recorded separately and subtracted from the response curve of the sample. However, situations in which this approach is successful are limited and require a highly reproducible background that is identical for the blank and the sample.

An alternative to subtraction is to include specific terms in the model accounting for the background. This approach assumes that the background can be described accurately in mathematical form. In the best cases, where the signal-to-noise ratio is large and the background drifts slowly, a simple linear background term may be successful [1, 2].

Suppose a measured exponential decay response has a background that is constant with time. An appropriate model is

$$y = b_1 \exp(-b_2 x) + b_3. \tag{3.2}$$

Consider an example set of exponential decay data similar to that in Section 2.B.4, but with a constant background term $b_3$ that is about 25% of the maximum y. A fit of such data was done to the model without background:

$$y = b_1 \exp(-b_2 x). \tag{3.3}$$

Table 3.2

| Parameter/statistic | Initial value | True value^a | Final value ($b_3 = 0$) | Final value ($b_3 = 3$) |
|---|---|---|---|---|
| $b_1$ | 16 | 14 | 13.91 | 16.32 |
| $b_2$ | 28 | 15 | 14.85 | 10.04 |
| SD ($e_y = 0.15$) | | | 0.085 | 0.246 |
| Deviation plot | | | Random | Nonrandom |

^a Data generated by using eq. (3.2) with absolute random noise at 1% of the maximum y. One set of data was generated with $b_3 = 3$, and the other with $b_3 = 0$.

The results from this analysis by a Marquardt nonlinear regression program are summarized in Table 3.2. The data are the same as those discussed in Section 2.B.4, but we have included a second data set with a constant background added. As seen previously, eq. (3.3) fit the data without any background ($b_3 = 0$) quite well. Final parameter values were close to the true values, SD < $e_y$ (Table 3.2), and the deviation plot had a random scatter of points.

On the other hand, the fit of the data with a constant background $b_3 = 3$ shows final parameter values that are in error by as much as 33% when compared to the true values (Table 3.2). A value of SD > $e_y$ and a nonrandom deviation plot (Figure 3.1) are clear indicators that the model provides a poor fit to the data. Because we generated the data with eq. (3.2), we can surmise that the reason for this poor fit is that the background $b_3$ is not included in the model. This simple example illustrates the fact that very large errors can result from the neglect of background in nonlinear regression.
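The comparison just described can be reproduced in outline with simulated data. The following is a minimal Python sketch, not the Marquardt program used for Table 3.2: it uses a small damped Gauss-Newton loop in the Levenberg-Marquardt style, and the x grid, random seed, noise generator, and starting value for $b_3$ are all assumptions made for illustration. It fits the model without background to data with and without $b_3$, then fits the model that includes $b_3$, so the numbers it prints will not match the table exactly.

```python
import math
import random

def solve(a, b):
    """Solve a small linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def model(x, b):
    """b1*exp(-b2*x), plus a constant background b3 when three parameters are given."""
    return b[0] * math.exp(-b[1] * x) + (b[2] if len(b) == 3 else 0.0)

def jacobian_row(x, b):
    """Partial derivatives of the model with respect to each parameter."""
    e = math.exp(-b[1] * x)
    row = [e, -b[0] * x * e]
    return row + [1.0] if len(b) == 3 else row

def sse(xs, ys, b):
    """Sum of squared residuals; infinite if the trial parameters overflow."""
    try:
        return sum((y - model(x, b)) ** 2 for x, y in zip(xs, ys))
    except OverflowError:
        return math.inf

def fit(xs, ys, b0, iterations=200):
    """Damped Gauss-Newton fit; returns (parameters, standard deviation of fit)."""
    b, lam = list(b0), 1e-3
    n = len(b)
    for _ in range(iterations):
        rows = [jacobian_row(x, b) for x in xs]
        res = [y - model(x, b) for x, y in zip(xs, ys)]
        jtj = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
        jtr = [sum(r[i] * e for r, e in zip(rows, res)) for i in range(n)]
        for i in range(n):
            jtj[i][i] *= 1.0 + lam          # Marquardt-style damping of the diagonal
        trial = [bi + di for bi, di in zip(b, solve(jtj, jtr))]
        if sse(xs, ys, trial) < sse(xs, ys, b):
            b, lam = trial, lam / 10.0      # accept the step and relax the damping
        else:
            lam *= 10.0                     # reject the step and damp more strongly
    return b, math.sqrt(sse(xs, ys, b) / (len(xs) - n))

rng = random.Random(0)                      # assumed seed
xs = [0.01 * i for i in range(50)]          # assumed x grid

def simulate(b3):
    """True b1 = 14, b2 = 15 (Table 3.2); noise SD 0.15 taken from e_y."""
    return [14.0 * math.exp(-15.0 * x) + b3 + rng.gauss(0.0, 0.15) for x in xs]

y_clean, y_bkg = simulate(0.0), simulate(3.0)
b_clean, sd_clean = fit(xs, y_clean, [16.0, 28.0])        # no background in data
b_two, sd_two = fit(xs, y_bkg, [16.0, 28.0])              # background neglected
b_three, sd_three = fit(xs, y_bkg, [16.0, 28.0, 1.0])     # background in the model
for label, b, sd in [("b3 = 0, two parameters", b_clean, sd_clean),
                     ("b3 = 3, two parameters", b_two, sd_two),
                     ("b3 = 3, three parameters", b_three, sd_three)]:
    print(label, [round(v, 2) for v in b], "SD =", round(sd, 3))
```

With these assumptions, the fit that neglects the background should return a visibly biased $b_2$ and an SD well above the noise level, whereas the fit that includes $b_3$ should return an SD close to the noise level.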

Figure 3.1 Nonrandom deviation plot corresponding to results in Table 3.2 for nonlinear regression of exponential data with a constant background to a model not considering background.


Table 3.3 Results of the Three-Parameter Model in Eq. (3.2) Fit onto Exponential Decay Data with a Constant Background

| Parameter/statistic | Initial value | True value^a | Final value ($b_3 = 3$) |
|---|---|---|---|
| $b_2$ | 28 | 15 | 14.89 |
| Deviation plot | | | Random |

^a Data generated by using eq. (3.2) with absolute random noise at 1% of the maximum y.

The correct model (eq. (3.2)) fits the decay data with $b_3 = 3$ with excellent results. This analysis (Table 3.3) gives final parameter values with small errors, a random deviation plot (Figure 3.2), and an SD that is 10-fold smaller than that from the fit to eq. (3.3).

Therefore, the fit of the model including the background term is successful. The goodness of fit of two- and three-parameter models such as eqs. (3.3) and (3.2) can also be compared by using the extra sum of squares F test [1], which is discussed in more detail later in this chapter (Section C.1).
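As a rough preview of that comparison, the sketch below computes the F statistic in its commonly used form for nested models: the drop in the residual sum of squares per added parameter, divided by the residual mean square of the larger model. The function name and the numerical inputs are hypothetical, and the later section may define the test with different notation.

```python
def extra_sum_of_squares_f(ss_simple, p_simple, ss_full, p_full, n_points):
    """F statistic for a simpler model nested within a fuller model.

    ss_* are residual sums of squares, p_* the numbers of parameters,
    and n_points the number of data points.
    """
    numerator = (ss_simple - ss_full) / (p_full - p_simple)
    denominator = ss_full / (n_points - p_full)
    return numerator / denominator

# Hypothetical residual sums of squares for a two- and a three-parameter fit
f_value = extra_sum_of_squares_f(ss_simple=10.0, p_simple=2,
                                 ss_full=5.0, p_full=3, n_points=23)
print(f_value)  # (10 - 5)/1 divided by 5/20, which is 20.0
```

The resulting value would then be compared with a tabulated F value for (p_full - p_simple) and (n_points - p_full) degrees of freedom.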

In the preceding example, we were lucky enough to be able to model the background in the exponential decay with a simple constant offset term. In general, the background may depend upon the independent variable. For example, the measured background may vary with time. In some cases involving approximately linear drift, a linear background term may suffice. In other situations, more complex expressions for background may be necessary. Some useful background expressions that can be added to regression models are listed in Table 3.4.

Figure 3.2 Random deviation plot corresponding to results in Table 3.3 for nonlinear regression of exponential data with a constant background onto the correct model.


Table 3.4 Common Background Expressions for Regression Models

| Background model | Shape | Possible application |
|---|---|---|
| $b_3$ | Constant offset | Constant blank |
| $b_3 \exp(\pm b_4 x)$ | Increasing or decreasing exponential | Voltammetry for peaks near the ends of potential window |
| $b_3 x + b_4$ | Linear | Drifting baseline |
| | Sigmoidal variation | X-ray photoelectron spectroscopy |
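One way to use such expressions in practice is to write each background as a small function and add it to the signal model, following eq. (3.1). The Python sketch below does this for the constant, exponential, and linear forms of Table 3.4. All function names are hypothetical, and the numbering of the parameters from $b_3$ upward assumes a two-parameter signal model such as the exponential decay discussed earlier.

```python
import math

def constant_background(x, b3):
    """Constant offset, for example a constant blank."""
    return b3

def exponential_background(x, b3, b4):
    """Exponential background; the sign of b4 sets whether it rises or falls."""
    return b3 * math.exp(b4 * x)

def linear_background(x, b3, b4):
    """Linear background, for example a drifting baseline."""
    return b3 * x + b4

def exponential_decay(x, b1, b2):
    """Signal of interest used in the exponential decay example."""
    return b1 * math.exp(-b2 * x)

def with_background(signal, background):
    """Build y = signal of interest + background from two component functions."""
    def model(x, signal_params, background_params):
        return signal(x, *signal_params) + background(x, *background_params)
    return model

# Exponential decay on a linearly drifting baseline (illustrative values)
decay_with_drift = with_background(exponential_decay, linear_background)
y0 = decay_with_drift(0.0, (14.0, 15.0), (0.5, 3.0))
print(y0)  # 14*exp(0) + (0.5*0 + 3) = 17.0
```

Keeping the signal and background as separate functions makes it easy to try several background forms against the same data before settling on one.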

In our experience, there seems to be no general way to predict how to account for background in every experiment. Each experimental situation is a little bit different. We recommend establishing the best way to deal with background well before analyzing the real experimental data. If the form of the background is well known, this can be approached initially by tests with computer-generated data, as already illustrated. The data generated should contain added noise, both to eliminate the effect of computer roundoff errors in final statistics and deviation plots and to mimic experimental conditions realistically.

A BASIC subroutine that can be used to add randomly distributed noise to data is described in Box 3.1. Random number generators on computers do not give a normally distributed set of numbers. The subroutine here uses the method described by Box and Muller [3] to convert random numbers generated by the BASIC RND function to N normally distributed numbers $z_i$ with a mean of 0 and a variance of 1. This set of numbers is used to add noise to a set of values of y.

SUBROUTINE 3.1. BASIC SUBROUTINE FOR ADDITION OF RANDOM NOISE TO DATA

    10 REM GENERATES A SET OF RANDOM NUMBERS WITH ZERO MEAN AND UNIT VARIANCE
    20 REM these numbers can be used to generate noise in data
    30 REM generate your data then add noise by adding to each point a value
    40 REM = (fraction of noise desired)*ZZ(I%)*(max value of y) for absolute noise for N1% data points
    50 REM This program may be used as a subroutine in the program you use to generate data
    60 REM NAME - RANDOM.BAS
    70 REM ASSUMED: N1% (EVEN) IS SET AND X1(), ZZ() ARE DIMENSIONED BY THE CALLING PROGRAM
    80 REM ASSUMED: LINES 100-110 FILL X1() WITH UNIFORM RANDOM NUMBERS FROM RND
    100 FOR J% = 1 TO N1%
    110 X1(J%) = RND
    120 NEXT J%
    130 PRINT "DATA POINT #","RANDOM #","DATA POINT #","RANDOM #"
    140 FOR I% = 1 TO N1%/2
    145 REM LOG IS THE NATURAL LOGARITHM, AS THE BOX-MULLER METHOD REQUIRES FOR UNIT VARIANCE
    150 ZZ(2*I%-1)=(-2*LOG(X1(2*I%-1)))^(1/2)*COS(6.283*X1(2*I%))
    160 ZZ(2*I%)=(-2*LOG(X1(2*I%-1)))^(1/2)*SIN(6.283*X1(2*I%))
    170 PRINT 2*I%-1,ZZ(2*I%-1),2*I%,ZZ(2*I%)
    180 NEXT I%
    190 RETURN

Box 3.1 BASIC subroutine for addition of random noise to data.
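For readers working in a more recent language, the same idea can be sketched in Python. This is an illustrative equivalent rather than a translation of the subroutine in Box 3.1: the function names, the seed, and the example x grid are assumptions, and the decay parameters are the true values quoted in Table 3.2. The transform uses the natural logarithm, and the noise is scaled as a fraction of the maximum y to give absolute noise.

```python
import math
import random

def normal_deviates(n, rng):
    """Return n numbers with zero mean and unit variance via the Box-Muller method."""
    z = []
    while len(z) < n:
        u1 = 1.0 - rng.random()               # uniform on (0, 1], so log(u1) is finite
        u2 = rng.random()                     # uniform on [0, 1)
        r = math.sqrt(-2.0 * math.log(u1))    # natural logarithm
        z.append(r * math.cos(2.0 * math.pi * u2))
        z.append(r * math.sin(2.0 * math.pi * u2))
    return z[:n]

def add_absolute_noise(y, fraction, rng):
    """Add noise with standard deviation fraction*max(y) to every point."""
    scale = fraction * max(y)
    return [yi + scale * zi for yi, zi in zip(y, normal_deviates(len(y), rng))]

rng = random.Random(1)                        # assumed seed, for repeatable output
x = [0.01 * i for i in range(50)]             # assumed x grid
y_exact = [14.0 * math.exp(-15.0 * xi) + 3.0 for xi in x]
y_noisy = add_absolute_noise(y_exact, 0.01, rng)   # 1% absolute noise
```

Passing the random number generator in explicitly keeps the simulated data reproducible, which is convenient when the same noisy data set is to be fitted with several candidate models.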

Once an initial survey with noisy theoretical data is completed, it is important to base the final decision on how to deal with background on real experimental data. If possible, the testing should be done by using data for a standard system with well-known parameters. For example, in the exponential decay case we might choose as our standard a chemical that decays with a well-known rate constant. We would have the most confidence in the method of background treatment that returns the values of the parameters with the least error under experimental conditions similar to those to be used in studies of our unknown systems. This procedure follows from common practice in the physical sciences. If we are developing a new method, we first test it with a reference standard.

An auxiliary approach may be used if representative background response curves can be obtained in the absence of the sample. For example, in the exponential decay experiment, one could measure the signal vs. time and fit these data to various background models. The model that best fits the background data can then be added to the model for the sample data. However, it is not always possible to obtain such background data. It must be ensured that the presence of the sample does not change the background.
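This auxiliary approach can be sketched as follows, under the assumption that a blank response curve is available. The blank data here are simulated with a linear drift, and only two candidate background models (constant and linear) are compared through the standard deviation of each fit. All names and numerical values are hypothetical.

```python
import math
import random

def fit_constant(x, y):
    """Least-squares constant background; returns (parameters, residuals)."""
    b = sum(y) / len(y)
    return (b,), [yi - b for yi in y]

def fit_linear(x, y):
    """Least-squares straight line; returns ((slope, intercept), residuals)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return (slope, intercept), [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]

def standard_deviation(residuals, n_params):
    """Standard deviation of the fit for a model with n_params parameters."""
    return math.sqrt(sum(r * r for r in residuals) / (len(residuals) - n_params))

rng = random.Random(3)                         # assumed seed
x = [0.01 * i for i in range(50)]              # assumed time grid
blank = [2.0 * xi + 3.0 + rng.gauss(0.0, 0.05) for xi in x]   # simulated drifting blank

const_params, const_res = fit_constant(x, blank)
lin_params, lin_res = fit_linear(x, blank)
sd_const = standard_deviation(const_res, 1)
sd_lin = standard_deviation(lin_res, 2)
print("constant background: SD =", round(sd_const, 3))
print("linear background:   SD =", round(sd_lin, 3))
```

For this simulated blank the linear model should give the smaller SD, so a linear term would be the one added to the model for the sample data. Deviation plots for the blank fits should be inspected in the same way as for the sample fits.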