Fitting models to data is central to the quantitative pursuit of modern science. Let us first consider what types of situations involving measurements might arise during a typical day's work in a laboratory. Suppose I want to measure the pH of an amino acid solution. I place a precalibrated pH electrode into the solution and shortly thereafter read its pH on the pH meter to which the electrode is connected. This type of simple "read out" experiment is an exception. Few instrumental methods in chemistry and biochemistry can directly give us the specific information we set out to find.

In most modern instrumental methods, the raw data obtained must be analyzed further to extract the information we need. Fortunately, this task can be handled automatically by a computer. For example, suppose we want to measure the diffusion coefficient of a protein in aqueous solution to get an idea of its molecular size. We have no "diffusion meter" available for this job. We need to do an experiment such as ultracentrifugal sedimentation, dynamic light scattering, or pulsed-field gradient spin-echo nuclear magnetic resonance and extract the diffusion coefficient by analyzing the data appropriately. Similarly, if we wish to know the lifetime of a transient species in a chemical or photochemical process, we cannot measure this quantity directly. The required information can be obtained by forming the transient species and measuring its decay with time. The resulting data are analyzed mathematically to obtain the lifetime.

Thus, extraction of meaningful parameters by computer analysis of experimental data is a widely required procedure in the physical sciences. The aim of this book is to describe general methods to analyze data and extract relevant information reliably in situations similar to those just described. These methods involve analyzing the data by using appropriate models. The models should be accurate mathematical descriptions of the response measured in the experiment that include all relevant contributions to the resulting signal. The goal of the computer data analysis is to reveal the relevant information in the raw experimental data in a form that is useful to the experimenter.

Computer modeling of experimental data, as the phrase is used in this book, involves using a computer program that can fine-tune the details of the model so that it agrees with the data as closely as the model's ability to describe the experiment allows. Procedures in this computer program are designed to provide the best values, in a statistical sense, of numerical parameters in the model. These parameters might include the diffusion coefficients or lifetimes discussed in the previous examples. In some cases, parameters will need to be interpreted further to obtain the desired information. For example, we might estimate decay lifetimes for a wide range of experimental conditions to elucidate a decay mechanism. A general block diagram of the analysis procedure is shown in Figure 1.1.

The computer programs utilized for modeling data should provide statistical and graphical measures of how well the model fits the data. These so-called goodness of fit criteria can be used to distinguish between several possible models for a given set of data [1, 2]. This might be the specific goal of certain analyses. In fact, goodness of fit parameters and graphic representations of deviations from models can be used as the basis for an expert system; that is, a computer program that finds the best model for sets of data on its own [1].

An example of a simple linear model familiar to chemists is Beer's law, which describes the absorbance (A) of light by a solution containing an absorbing molecule. The measured absorbance is linearly related to the product of the concentration of the absorber (C), the path length of light through the sample (b), and the molar absorptivity (ε) of the absorber:

$$A = \varepsilon b C$$

[Figure 1.1. Block diagram of the analysis procedure: collect raw data from experiment; analyze data to obtain parameters; interpret the parameters to obtain the desired information.]

This model is linear, and linear regression analysis [3] can be used to obtain the molar absorptivity for a system with a known path length. The data analyzed are the measured absorbance vs. the concentration of absorber. For such linear models, closed form equations enable rapid, one-step calculations of the parameters of the model, such as the slope and intercept of a straight line. In this example, the slope of the regression line is directly proportional to the molar absorptivity.
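As a minimal sketch of the one-step calculation described above, the following Python code applies the standard closed-form least-squares formulas for the slope and intercept of a straight line to absorbance vs. concentration data. The function name `fit_line`, the concentrations, the noise level, and the "true" molar absorptivity are all hypothetical values chosen for illustration; they are not measurements from the text.

```python
import numpy as np


def fit_line(x, y):
    """Closed-form least-squares slope and intercept for y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    sxx = np.sum((x - x_mean) ** 2)             # spread of x about its mean
    sxy = np.sum((x - x_mean) * (y - y_mean))   # co-variation of x and y
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical illustration values (assumptions, not experimental data).
path_length_cm = 1.0                                  # b, known path length
true_epsilon = 5000.0                                 # L mol^-1 cm^-1, assumed
conc = np.array([2e-5, 4e-5, 6e-5, 8e-5, 1e-4])       # C, mol L^-1

# Simulated absorbances: Beer's law plus a small amount of random noise.
rng = np.random.default_rng(0)
absorbance = true_epsilon * path_length_cm * conc + rng.normal(0.0, 0.002, conc.size)

slope, intercept = fit_line(conc, absorbance)

# The slope equals epsilon * b, so dividing by the known path length gives epsilon.
epsilon_est = slope / path_length_cm
print(f"slope = {slope:.1f}, intercept = {intercept:.4f}")
print(f"estimated molar absorptivity = {epsilon_est:.1f} L mol^-1 cm^-1")
```

No iteration or starting guess is needed here: the parameters follow directly from sums over the data, which is what makes the linear case a one-step calculation.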

Unfortunately for us, nature tends to be nonlinear. The preceding scenario involving a linear model is only a special case. Many models for experimental data turn out to be nonlinear. An example is the transient lifetime experiment discussed previously. If the decay of the transient species is first order, the rate of decay is proportional to the amount of transient, and the signal decreases exponentially with time. Also, at a given time, the signal falls as the lifetime decreases, but not in proportion to it, because the lifetime appears inside the exponential. The model is nonlinear. Linear regression cannot be used directly to obtain lifetimes from the intensity vs. time data.
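The nonlinearity can be made explicit by writing the first-order decay out. The symbols used here are assumed notation for illustration and do not come from the text: S is the measured signal, S_0 its value at time zero, k the first-order rate constant, and τ the lifetime.

$$
\frac{dS}{dt} = -kS
\quad\Longrightarrow\quad
S(t) = S_0\, e^{-kt} = S_0\, e^{-t/\tau},
\qquad \tau = \frac{1}{k}
$$

$$
\frac{\partial S}{\partial \tau} = S_0\, \frac{t}{\tau^{2}}\, e^{-t/\tau}
$$

The derivative of the signal with respect to the lifetime still contains the lifetime itself, so the model is not linear in that parameter. In Beer's law, by contrast, the derivative of the absorbance with respect to the molar absorptivity is simply the product bC, which does not depend on the molar absorptivity.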

Although linearization of a nonlinear model can be an option for analyzing data, it is often unwise. Reasons for this will be discussed in Chapter 2, where we also show that nonlinear models can be fit quite easily to data in a general way by using nonlinear regression analysis. The same principles apply as in linear regression (linear least squares), and in fact the two methods have a common goal. However, in nonlinear regression, we shall see that a stepwise, iterative approach to the least squares condition is necessary [4, 5].
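To illustrate what a stepwise, iterative approach to the least squares condition can look like, the sketch below fits a single-exponential decay using Gauss-Newton iteration. Gauss-Newton is chosen here only as one common example of such an algorithm; the text does not say which method the book uses. The model form, the function names, the simulated data, and the starting guesses are all assumptions made for illustration.

```python
import numpy as np


def decay_model(t, s0, tau):
    """Single-exponential decay: signal s0 at t = 0, lifetime tau."""
    return s0 * np.exp(-t / tau)


def gauss_newton_decay(t, y, s0_guess, tau_guess, max_iter=50, tol=1e-8):
    """Refine (s0, tau) step by step until the parameter changes become negligible."""
    params = np.array([s0_guess, tau_guess], dtype=float)
    for iteration in range(1, max_iter + 1):
        s0, tau = params
        expo = np.exp(-t / tau)
        residuals = y - s0 * expo
        # Jacobian columns: d(model)/d(s0) and d(model)/d(tau).
        jacobian = np.column_stack([expo, s0 * t / tau**2 * expo])
        # Each step is itself a *linear* least-squares problem for the correction.
        step, *_ = np.linalg.lstsq(jacobian, residuals, rcond=None)
        params = params + step
        if np.all(np.abs(step) <= tol * (np.abs(params) + tol)):
            break
    return params, iteration


# Simulated data (hypothetical values, not from any experiment).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 50)
y = decay_model(t, 1.0, 2.0) + rng.normal(0.0, 0.01, t.size)

# Starting guesses are required; these are deliberately a little off.
(s0_fit, tau_fit), n_iter = gauss_newton_decay(t, y, s0_guess=0.8, tau_guess=1.5)
print(f"s0 = {s0_fit:.3f}, tau = {tau_fit:.3f}, iterations = {n_iter}")
```

Unlike the linear case, the answer is reached through repeated corrections from an initial guess. Plain Gauss-Newton can fail to converge when the guesses are poor, which is one reason practical programs usually add safeguards such as step damping.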
