## A.1 Models for Peak Shape

This section discusses the analysis of data from single-channel chromatography. In these experiments, the effluent from the column passes through a single detector, and the output of the analysis is the response of the detector vs. time. The time of introduction of the sample onto the column is t = 0. In Section 3.B.2 we discussed various models for peak-shaped data. Because of the wide use of chromatography in modern science, we now focus specifically on the separation of overlapped peaks in this technique.

Full baseline resolution of all sample components is the ultimate goal of a chromatographic analysis, but it is not always achievable. This is especially true for complex samples containing many components with similar properties. If the shapes of chromatographic peaks are perfectly symmetrical, they can often be modeled using Gaussian or Gaussian-Lorentzian peak shapes. Models for overlapped peaks identical to those described in Section 3.B.2 can be employed in such cases.

The symmetric peak shape represents an ideal case in chromatography. Peaks frequently have some degree of asymmetry, featuring so-called tailing on the longer time side of the peak. Examples of symmetric and tailing peak shapes are given in Figure 14.1.

Research into models for unsymmetric chromatographic peaks has been conducted for several decades [1-3]. We will provide here only a brief summary of some of the more useful models considered. At the end of this section, we discuss simple, easy to compute, general models (Tables 14.1 and 14.2).

One appropriate and popular model for chromatographic peaks involves the convolution of a Gaussian peak with an exponential tail. This model, known as the exponentially modified Gaussian, can be expressed as

$$
R = \frac{A}{\tau} \int_{0}^{t} \exp\left[-\frac{(t' - t_0)^2}{2W^2}\right] \exp\left[-\frac{t - t'}{\tau}\right] dt' \qquad (14.1)
$$

where $A$ = amplitude factor, $R$ = response of the detector, $t$ = time following injection of the sample, $t_0$ = position of the peak maximum on the $t$ axis, $\tau$ = decay lifetime of the tail, and $W$ = width at half-height of the Gaussian component of the peak. The model in eq. (14.1) was found to be identical to a Gaussian model when $\tau/W < 0.4$. Approximate algorithms can be used to compute the model when $\tau/W > 0.4$; these are described in the original literature. A computer program for model computation has been published. The form of eq. (14.2) relates the model specifically to chromatographic parameters, including the retention time $t_G$.

Table 14.1 General Model for a Single Symmetric or Unsymmetric Peak

| Item | Content |
|---|---|
| Properties | $a > b > 0$ for a tailing peak; $a = b$ for a symmetric peak; $A$ is related to peak height, and $c$ controls the peak position; the term $b_5 t + b_6$ is the background |
| Regression equation | $R = \dfrac{A}{\exp[-a(t - c)] + \exp[b(t - c)]} + b_5 t + b_6$ |
| Regression parameters | $A$, $a$, $b$, $c$, $b_5$, $b_6$ |
| Data | Response vs. time |

Figure 14.1 Symmetric and tailing peak shapes (response vs. time) computed from the model in Table 14.1 with the following parameters:

| Peak | $A$ | $c$ | $a$ | $b$ |
|---|---|---|---|---|
| a | 1 | 5 | 1 | 1 |
| b | 1 | 5 | 1 | 0.6 |
| c | 1 | 5 | 1 | 0.3 |
| d | 1 | 15 | 2 | 0.6 |

Curve d has the same $a/b$ as curve c in Figure 14.1, but the values of $a$ and $b$ have been doubled and $c$ shifted to a larger value for clarity. This change illustrates the decrease in peak width given by this function as the absolute values of $a$ and $b$ are increased.

Overlapped peaks can be resolved by using a model consisting of the sum of two functions of the form in Table 14.1. This overlapped peak model (Table 14.2) provides rapid and stable convergence during nonlinear regression analysis using the Marquardt-Levenberg algorithm, provided that good initial guesses of peak positions are used. Initial guesses of the peak positions can be made using the minima of second derivatives, as discussed in Chapter 7. The peak position is relatively close to the parameter $c_i$ (see Table 14.1). The $A_i$ values are roughly twice the heights of the component peaks. An example of a two-peak fit to theoretical data illustrates the quality of the initial guesses required for good convergence.
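The initial-guess rules described above (peak positions from second-derivative minima, $A_i$ roughly twice the peak height) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program. It assumes the single-peak term has the form $A/\{\exp[-a(t-c)] + \exp[b(t-c)]\}$, a sign convention chosen here because it gives a tailing peak for $a > b > 0$. The function names, the `rel_depth` cutoff, and all parameter values are hypothetical, and the curve is noise-free; real data would normally need smoothing before differentiation.

```python
import math

def peak(t, A, a, b, c):
    """One background-free term of the Table 14.1 form (assumed sign convention)."""
    x = t - c
    return A / (math.exp(-a * x) + math.exp(b * x))

def second_derivative(y, dt):
    """Central-difference second derivative at the interior points of y."""
    return [(y[i - 1] - 2.0 * y[i] + y[i + 1]) / dt**2
            for i in range(1, len(y) - 1)]

def initial_guesses(t, y, rel_depth=0.1):
    """Rough (c, A) starting values from pronounced second-derivative minima.

    rel_depth is a hypothetical tuning knob: a local minimum is kept only if
    it is deeper than rel_depth times the deepest minimum found.
    """
    dt = t[1] - t[0]                      # assumes evenly spaced times
    d2 = second_derivative(y, dt)
    cutoff = rel_depth * min(d2)          # min(d2) is negative at a peak
    guesses = []
    for j in range(1, len(d2) - 1):
        local_min = d2[j] < d2[j - 1] and d2[j] <= d2[j + 1]
        if local_min and d2[j] < cutoff:
            i = j + 1                     # d2[j] belongs to data point j + 1
            guesses.append((t[i], 2.0 * y[i]))   # A is roughly twice the height
    return guesses

# Noise-free two-peak curve with illustrative (hypothetical) parameters.
t = [0.05 * i for i in range(201)]
y = [peak(ti, 2.0, 4.0, 2.0, 4.0) + peak(ti, 1.0, 3.0, 1.5, 5.5) for ti in t]
for c0, A0 in initial_guesses(t, y):
    print(f"c ~ {c0:.2f}, A ~ {A0:.2f}")
```

The guesses are only starting values: the second-derivative minimum lies near the peak maximum rather than exactly at $c_i$, and overlap inflates the apparent heights.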
Table 14.2 General Model for Multiple Symmetric or Unsymmetric Single Peaks

| Item | Content |
|---|---|
| Properties | $a > b > 0$ for a tailing peak; $a = b$ for a symmetric peak; $A$ is related to peak height, and $c$ controls the peak position; the term $b_5 t + b_6$ is the background |
| Regression equation | $R = \sum_{i=1}^{k} \dfrac{A_i}{\exp[-a_i(t - c_i)] + \exp[b_i(t - c_i)]} + b_5 t + b_6$ for $k$ peaks |
| Regression parameters | $A_i$, $a_i$, $b_i$, $c_i$ for $i = 1$ to $k$, and $b_5$, $b_6$ |
| Data | Response vs. time |

Special instructions:

1. The second derivative of the chromatogram can be used to identify the number of peaks and their initial positions in cases of severe overlap (see Chapter 7).
2. Test for goodness of fit for models with different numbers of peaks by comparing residual plots and by using the extra sum of squares F test (see Section 3.C.1).
3. Peak properties can be obtained from the formulae in Table 14.1; peak areas can be obtained by integration of the component peaks after finding their parameters in the regression analysis.

Figure 14.2 shows a set of data for two overlapping peaks computed from the multiple-peak model in Table 14.2 with normally distributed absolute error of 0.5% of $A_1$. Also shown in this figure is the theoretical curve computed from the initial guesses used to begin a regression analysis, whose results are illustrated in Figure 14.3. Although the curve computed from the initial guesses shows deviations of data from the model in all regions of the peak, the regression analysis converged to an excellent fit after six cycles. The parameters found in this analysis are given in Table 14.3. These results illustrate how close our initial guesses need to be to achieve rapid convergence of the model in Table 14.2. Larger errors in initial values of $A$, $a$, and $b$ caused excessively long convergence times using the Marquardt-Levenberg algorithm. In the example in Figures 14.2 and 14.3, errors of about 0.2 min in the initial values of the $c_i$ also caused an excessive number of cycles for convergence. Errors larger than this usually caused the convergence to fail.
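A fit of the kind described above can be sketched with SciPy, whose `curve_fit` uses a Levenberg-Marquardt routine when `method="lm"`. This is a hedged illustration rather than the authors' program. It assumes the regression equation is a sum of terms $A_i/\{\exp[-a_i(t-c_i)] + \exp[b_i(t-c_i)]\}$ plus the background $b_5 t + b_6$. The names `multi_peak` and `fit_peaks`, the true parameters, the starting values, and the random seed are all hypothetical and are not taken from Table 14.3.

```python
import numpy as np
from scipy.optimize import curve_fit

def multi_peak(t, *p):
    """Assumed Table 14.2 form: k peak terms plus a linear background.

    p = (A_1, a_1, b_1, c_1, ..., A_k, a_k, b_k, c_k, b5, b6)
    """
    t = np.asarray(t, dtype=float)
    r = p[-2] * t + p[-1]                       # background b5*t + b6
    with np.errstate(over="ignore"):            # exp may overflow far from a peak
        for A, a, b, c in np.reshape(p[:-2], (-1, 4)):
            x = t - c
            r = r + A / (np.exp(-a * x) + np.exp(b * x))
    return r

def fit_peaks(t, y, p_start):
    """Levenberg-Marquardt fit; returns best-fit parameters and standard errors."""
    p_fit, cov = curve_fit(multi_peak, t, y, p0=p_start, method="lm")
    return p_fit, np.sqrt(np.diag(cov))

# Synthetic two-peak data; every number below is illustrative, not from Table 14.3.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 201)
p_true = np.array([2.0, 4.0, 2.0, 4.0,   1.0, 3.0, 1.5, 5.5,   0.0, 0.0])
noise = rng.normal(0.0, 0.005 * p_true[0], t.size)   # 0.5% of A_1, absolute
y = multi_peak(t, *p_true) + noise

# Starting values deliberately offset from the true ones.
p_start = np.array([2.2, 3.5, 2.3, 4.1,   1.2, 2.6, 1.7, 5.6,   0.0, 0.0])
p_fit, p_err = fit_peaks(t, y, p_start)
names = ["A1", "a1", "b1", "c1", "A2", "a2", "b2", "c2", "b5", "b6"]
for name, value, err in zip(names, p_fit, p_err):
    print(f"{name}: {value:.3f} +/- {err:.3f}")
```

Moving the entries of `p_start` further from `p_true`, especially the two $c_i$ values, is a simple way to explore how sensitive convergence is to the starting point.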
Despite this sensitivity to starting values, the model in Table 14.2 provides excellent convergence when the second derivative is used to aid the initial guesses of the $c_i$ and careful attention is paid to the choice of initial parameters, so that the computation has a reasonable starting point (cf. Figure 14.2).

Figure 14.2 Curve from a two-peak model in Table 14.2 (dots), generated with normally distributed absolute noise of 0.5% of $A_1$, shown along with the starting line (solid) computed from the initial parameters of a regression analysis (see Table 14.3) onto the same model. Horizontal axis: time.

Figure 14.3 Curve from a two-peak model in Table 14.2 (dots), generated with normally distributed absolute noise of 0.5% of $A_1$, shown with the computed line (solid) from a successful regression analysis (see Table 14.3 for parameters) onto the same model. Horizontal axis: time.

Table 14.3 Initial and Final Parameters for Data in Figures 14.2 and 14.3

Columns: Parameter, Initial, True value (a), Final (b).

Entries: 0.001; 1.977 ± 0.014; 4.003 ± 0.002; 3.980 ± 0.017; 3.339 ± 0.068; 1.046 ± 0.024; 4.495 ± 0.002; 4.824 ± 0.111; 2.011 ± 0.014.

(a) Data computed with absolute normally distributed error of 0.5% of $A_1$.
(b) Value given with standard error from the regression analysis.

We end this discussion with a small caveat on the use of the model in Table 14.2. Although this model provides a symmetric peak shape when $a = b$, that shape is slightly different from a Gaussian or Lorentzian shape. This is illustrated by the results of fitting a two-peak model from Table 14.2 onto a set of slightly overlapped peaks. The first peak is Gaussian and the second is constructed from the model itself (Table 14.1). The fit to the first (Gaussian) peak is not as good as the fit to the second peak, which was computed from the model itself (Figure 14.4). The fitted function is slightly wider at the bottom than the Gaussian peak, and the peak height is slightly overestimated. The baseline offset, which was zero in the computed data set, is underestimated by the regression program. This may be acceptable in some, but probably not all, analyses where Gaussian peaks are encountered.

Figure 14.4 Curve from a two-peak model (dots) generated with normally distributed absolute noise of 0.5% of the first peak height. The first peak has a Gaussian shape and the second is computed from the model in Table 14.1. The solid line was computed from the best-fit parameters onto the two-peak model in Table 14.2. Horizontal axis: time.

We include the example above to make the point that, although the models in Tables 14.1 and 14.2 can provide a variety of shapes including tailing peaks, they may not be appropriate for all chromatographic data. Where they are not, the exponentially modified Gaussian and Gaussian-Lorentzian models should be considered.
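The difference between the symmetric ($a = b$) form and a Gaussian can be made concrete with a short numerical comparison. The sketch below assumes the symmetric peak reduces to $A/\{2\cosh[a(t-c)]\}$, which follows from taking the regression equation as $A/\{\exp[-a(t-c)] + \exp[b(t-c)]\}$ with $a = b$. For simplicity the Gaussian is matched to the same height and the same width at half height. That matching is an assumption made here and is not the least-squares fit shown in Figure 14.4. All function names and default values are hypothetical.

```python
import math

def symmetric_peak(t, A, a, c):
    """Table 14.1 form with a = b and no background: A / (2 cosh[a(t - c)])."""
    return A / (2.0 * math.cosh(a * (t - c)))

def gaussian(t, height, sigma, c):
    """Gaussian peak with the given height, standard deviation, and center."""
    return height * math.exp(-((t - c) ** 2) / (2.0 * sigma ** 2))

def half_width(a):
    """Half width at half height of the a = b peak: cosh(a x) = 2."""
    return math.acosh(2.0) / a

def matched_sigma(a):
    """Sigma of the Gaussian sharing that half width: x = sigma * sqrt(2 ln 2)."""
    return half_width(a) / math.sqrt(2.0 * math.log(2.0))

def wing_ratio(n, a=2.0, A=2.0, c=5.0):
    """Model / Gaussian response at n half widths from the common center."""
    t = c + n * half_width(a)
    return symmetric_peak(t, A, a, c) / gaussian(t, A / 2.0, matched_sigma(a), c)

for n in (0.5, 1.0, 2.0, 3.0):
    print(f"{n:.1f} half widths from center: model/Gaussian = {wing_ratio(n):.2f}")
```

By construction the ratio is 1 at one half width. It falls slightly below 1 nearer the center and grows rapidly in the wings, which is consistent with the fitted function being wider at the bottom than a Gaussian peak.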