## B1 Lysozyme

Determination of Component Bands. The protein spectrum is considered to be the sum of the individual absorption bands arising from specific structural components, such as α helices, β sheets, and turns. Fitting such a spectrum directly, without prior knowledge of the number of Gaussian bands, would be a daunting task. To aid in this task, as mentioned previously, we first examine the second derivative of the spectrum (Figure 7.3). The negative peaks in the second derivative spectrum (cf. Figure 7.1) correspond to the component bands in the original spectrum. Inspection of the second derivative yields an estimate of the number of component bands and the approximate positions of these bands, which are used as initial guesses for the band positions x₀,ᵢ.
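As a rough illustration of how negative second-derivative peaks can supply initial guesses for the band positions, the sketch below builds a synthetic two-band spectrum and locates the minima of its numerical second derivative. The band parameters, the 1 cm⁻¹ grid, and the threshold are invented for the example, and the function names are hypothetical rather than taken from any published code.

```python
import math

def gaussian_sum(x, bands, baseline=0.0):
    """Sum of Gaussian bands; each band is (height, center, width)."""
    return baseline + sum(h * math.exp(-((x - x0) ** 2) / (2.0 * w ** 2))
                          for h, x0, w in bands)

def second_derivative(y, dx):
    """Central-difference second derivative at the interior points of y."""
    return [(y[i - 1] - 2.0 * y[i] + y[i + 1]) / dx ** 2
            for i in range(1, len(y) - 1)]

def negative_peaks(x, d2, threshold):
    """x positions where d2 has a local minimum lying below -threshold."""
    found = []
    for i in range(1, len(d2) - 1):
        if d2[i] < -threshold and d2[i] < d2[i - 1] and d2[i] <= d2[i + 1]:
            found.append(x[i + 1])  # d2[i] was computed at x[i + 1]
    return found

# Synthetic, made-up spectrum: two overlapping bands on a 1 cm^-1 grid.
bands = [(1.0, 1655.0, 6.0), (0.7, 1637.0, 6.0)]
x = [1600.0 + i for i in range(101)]
y = [gaussian_sum(xi, bands) for xi in x]

d2 = second_derivative(y, dx=1.0)
# Ignore minima shallower than 20% of the deepest one (arbitrary choice).
guesses = negative_peaks(x, d2, threshold=0.2 * max(-v for v in d2))
print(guesses)
```

With real, noisy data the second derivative would normally need smoothing first, and the threshold would have to be tuned by inspection.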

The next step in the analysis is to enhance the resolution of the original spectrum by using the Fourier deconvolution (FD) algorithm developed by Kauppinen et al. [2]. Care was taken to choose the correct values for the band width and resolution enhancement factor used in this algorithm, so that the FT-IR spectrum is neither over- nor underdeconvoluted. Underdeconvolution is recognized by the absence of a major band previously indicated by a negative peak in the second derivative spectrum. Overdeconvolution can be recognized by the appearance of large side lobes in the baseline region of the FD spectrum, where no peaks should appear. This region should be flat, as in the original spectrum. As the deconvolution procedure progresses, analysis of the FD spectrum by nonlinear regression can be used in an iterative fashion to help choose the FD parameters [9,10].
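The role of the band width and the resolution enhancement factor can be seen in a minimal sketch of Fourier self-deconvolution. This is not the implementation of [2]: it assumes a Lorentzian intrinsic line shape, a triangular apodization window, and a noise-free synthetic band, and all names (`fourier_self_deconvolve`, `k_factor`, and so on) are hypothetical.

```python
import cmath
import math

def dft(values, inverse=False):
    """Plain O(N^2) discrete Fourier transform; fine for a few hundred points."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / n if inverse else 1.0
    return [scale * sum(v * cmath.exp(sign * 2j * math.pi * k * m / n)
                        for m, v in enumerate(values))
            for k in range(n)]

def fourier_self_deconvolve(y, dnu, half_width, k_factor):
    """Narrow Lorentzian bands of half-width `half_width` (cm^-1).

    Assumes triangular apodization, for which the output line has a full
    width of roughly 0.886 / L, where L is the apodization cutoff in cm.
    """
    n = len(y)
    cutoff = 0.886 * k_factor / (2.0 * half_width)
    weighted = []
    for k, value in enumerate(dft(y)):
        x = min(k, n - k) / (n * dnu)              # |path difference|, cm
        apod = max(0.0, 1.0 - x / cutoff)          # triangular window
        # Undo the exponential decay that a Lorentzian band produces.
        gain = math.exp(2.0 * math.pi * half_width * x) if apod > 0.0 else 0.0
        weighted.append(value * gain * apod)
    return [v.real for v in dft(weighted, inverse=True)]

def width_at_half_max(y, dnu):
    """Crude full width: grid spacing times the number of points >= half max."""
    peak = max(y)
    return dnu * sum(1 for v in y if v >= 0.5 * peak)

# Synthetic single Lorentzian band (made-up parameters).
n, dnu, gamma = 256, 1.0, 8.0
nu = [1522.0 + i * dnu for i in range(n)]
y = [gamma ** 2 / ((v - 1650.0) ** 2 + gamma ** 2) for v in nu]

fd = fourier_self_deconvolve(y, dnu, half_width=gamma, k_factor=2.0)
print(width_at_half_max(y, dnu), width_at_half_max(fd, dnu))
```

In this sketch, raising `k_factor` narrows the band further but also increases the exponential gain applied to the data, which is how noise and side lobes become amplified when a spectrum is overdeconvoluted.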

The FD spectrum of lysozyme is shown in Figure 7.3. Nonlinear regression analysis is done on the FD spectrum using the model in Table 7.1, where A₀ is a constant baseline offset. We first fix the set of x₀,ᵢ at the initial values found from the second derivative spectrum. The regression analysis on the FD spectrum is then done to obtain estimates of component peak widths and heights. Starting with these values and the x₀,ᵢ as initial parameter guesses, a second nonlinear regression is done on the FD spectrum in which all of the parameters are found. Finally, the parameters found in this second analysis are used in a nonlinear regression analysis on the original spectrum (raw data).
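The staged procedure (positions fixed first, then all parameters free) can be sketched as follows. The model form, a constant baseline plus Gaussian bands, is an assumption about Table 7.1. The crude coordinate search merely stands in for a proper nonlinear regression routine, and the data and starting values are synthetic.

```python
import math

N_BANDS = 2
WIDTH_INDICES = {3 + 3 * i for i in range(N_BANDS)}

def model(x, p):
    """A0 plus Gaussian bands; p = [A0, h1, x01, w1, h2, x02, w2]."""
    total = p[0]
    for i in range(N_BANDS):
        h, x0, w = p[1 + 3 * i], p[2 + 3 * i], p[3 + 3 * i]
        total += h * math.exp(-((x - x0) ** 2) / (2.0 * w ** 2))
    return total

def ssr(p, xs, ys):
    """Sum of squared residuals between data and model."""
    return sum((y - model(x, p)) ** 2 for x, y in zip(xs, ys))

def coordinate_search(p, free, xs, ys, step=1.0, tol=1e-3):
    """Derivative-free minimizer that varies only the indices in `free`."""
    p = list(p)
    best = ssr(p, xs, ys)
    while step > tol:
        improved = False
        for i in free:
            for delta in (step, -step):
                trial = list(p)
                trial[i] += delta
                if i in WIDTH_INDICES and trial[i] <= 0.0:
                    continue  # keep band widths positive
                value = ssr(trial, xs, ys)
                if value < best:
                    p, best, improved = trial, value, True
        if not improved:
            step *= 0.5
    return p, best

# Synthetic "FD spectrum" generated from made-up parameters.
true = [0.02, 1.0, 1655.0, 6.0, 0.7, 1637.0, 6.0]
xs = [1600.0 + i for i in range(101)]
ys = [model(x, true) for x in xs]

# Positions (indices 2 and 5) start from second-derivative guesses.
start = [0.0, 0.5, 1654.0, 4.0, 0.5, 1638.0, 4.0]
ssr_start = ssr(start, xs, ys)

# Stage 1: positions fixed; baseline, heights, and widths free.
stage1, ssr1 = coordinate_search(start, [0, 1, 3, 4, 6], xs, ys)
# Stage 2: all parameters free, starting from the stage 1 result.
stage2, ssr2 = coordinate_search(stage1, range(7), xs, ys)
print(ssr_start, ssr1, ssr2)
```

The same routine would then be run on the raw spectrum, starting from the stage 2 parameters. Staging keeps the early fits well conditioned, since band positions are often the parameters most likely to wander when every band is released at once.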

The FT-IR spectrum of lysozyme showing the amide I and amide II regions is shown in Figure 7.2. The outer envelope of the graph represents the experimental data. The underlying peaks represent the individual component bands making up the best fit to the model in Table 7.1, undertaken as discussed previously.

Quantitative criteria used to ensure a correct fit to the model are as follows:

1. Correlation of all the component bands and their positions with the negative second derivative peaks;

2. Agreement of FD and experimental baselines;

4. A successful fit to the original spectrum using fixed x₀,ᵢ and the other parameters found from the best fit of the FD spectrum.

In practice, attainment of these criteria may require several cycles of FD and regression, until an optimal fit is achieved.

Criterion 4 involves using the results of the regression analysis of the FD spectrum (Figure 7.2) to provide the number of bands and their frequencies, which are then fixed in a model to perform a nonlinear regression analysis of the original spectrum with the band widths and heights as parameters.

The final fit to the lysozyme FT-IR spectrum is illustrated by Figure 7.4 with its 29 component peaks. The inset shows that the residuals of the regression are reasonably random, indicating the model is a reliable fit to the data. Relative areas under the component bands of the original spectrum are in good agreement with those calculated from results of the regression analysis of the FD spectrum.

Figure 7.4 FT-IR spectrum showing amide I and amide II bands of lysozyme in aqueous solution (abscissa: wavenumber, cm⁻¹). One line of the double outer envelope is the original spectrum. The second line of the outer envelope is the computed best fit from nonlinear regression according to Table 7.1. The individual component bands underneath were constructed from the results of the nonlinear regression analysis. The inset shows a plot of residuals (connected by a line) of the differences between the calculated and experimental absorbances vs. frequency. (Reprinted with permission from [10], copyright by the American Chemical Society.)

Further validation of the calculated components of the amide I and II bands can be obtained by comparing the second derivative FT-IR spectrum with the second derivative calculated from the model using the parameters found in the nonlinear regression. The results of such a comparison for lysozyme are shown in Figure 7.3. The inset of this figure shows a reasonably random residual plot, which further establishes the reliability of this methodology for quantitatively resolving FT-IR spectra of proteins into their individual component bands.
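Residual randomness is judged visually in the text. One way to put a rough number on it is a runs test on the signs of the residuals, sketched here. The runs test is an added illustration rather than part of the published protocol, and the residual values are invented.

```python
import math

def runs_test(residuals):
    """Wald-Wolfowitz runs test on residual signs.

    Returns (observed runs, expected runs, z score). Requires at least one
    positive and one negative residual; exact zeros are ignored.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)
                / (n ** 2 * (n - 1.0)))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Made-up residuals with an obvious systematic trend: all positive, then
# all negative, as might happen when a component band is missing.
systematic = [0.3, 0.2, 0.4, 0.1, -0.2, -0.3, -0.1, -0.4]
runs, expected, z = runs_test(systematic)
print(runs, expected, z)
```

A strongly negative z score means far fewer sign changes than random scatter would produce, which points to a systematic misfit.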

Component Band Assignments. Now that the individual component bands of the lysozyme spectrum have been identified, their frequencies need to be assigned to the specific structural units that gave rise to them. The sum of the normalized areas under the peaks corresponding to a given structural feature can then be used to indicate the fraction of that particular feature in the protein [3, 4, 9, 10].

The band assignments given in Table 7.2 are based on theoretical considerations [5] as well as experimental correlations with X-ray crystal structures, as we shall demonstrate. The number of component bands found for a wide variety of proteins using the protocols outlined in Table 7.1 is consistent with the number of bands predicted by theory [9, 10]. The assignments in Table 7.2 can be used for structural analysis based on the FT-IR spectrum of any protein in water.
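The frequency ranges of Table 7.2 lend themselves to a simple lookup, sketched below. The labels are abbreviated from the table. The `tolerance` argument is an assumption added for the example, because the tabulated ranges leave gaps (for instance between 1653 and 1657 cm⁻¹), and the function name is hypothetical.

```python
# (low, high, assignment) in cm^-1, transcribed from Table 7.2.
TABLE_7_2 = [
    (1681.0, 1695.0, "turn"),
    (1673.0, 1679.0, "turn, twisted sheet"),
    (1667.0, 1669.0, "GLN/ASN C=O side chain, 3/10 helix, bent strand"),
    (1657.0, 1661.0, "alpha helix (A band)"),
    (1651.0, 1653.0, "GLN/ASN C-N side chain, alpha helix (E band)"),
    (1643.0, 1648.0, "disordered, irregular, gentle loop"),
    (1622.0, 1638.0, "extended strand, rippled and pleated sheets"),
]

def assign_band(frequency, tolerance=0.0):
    """Return the first Table 7.2 assignment whose range contains `frequency`.

    `tolerance` widens every range by the same amount on both sides;
    None is returned if no range matches.
    """
    for low, high, label in TABLE_7_2:
        if low - tolerance <= frequency <= high + tolerance:
            return label
    return None

for freq in (1691.0, 1655.0, 1637.0):
    print(freq, assign_band(freq), assign_band(freq, tolerance=2.0))
```

A band that falls in a gap returns `None` with zero tolerance, which flags it for manual assignment instead of forcing it into the nearest range.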

Note that the assignments include bands for the side chains of asparagine (ASN) and glutamine (GLN) amino acid residues that occur within the amide I envelope [9, 10]. The true fractions of GLN and ASN obtained from the protein's amino acid content must be subtracted from the fractional areas of component bands at the appropriate frequencies. Any excess area may be assigned to a relevant secondary structural feature of the polypeptide backbone. If the fractional area is less than the fraction of GLN and ASN residues present in the protein, then the experimental areas should be subtracted from the amide I envelope and all the remaining bands should be normalized so that their sum equals unity.

Table 7.2 Secondary Structure Assignments of Amide I FT-IR Bands

| Frequency, cm⁻¹ | Assignment to structural unit |
|---|---|
| 1681–1695 | Turn |
| 1673–1679 | Turn, twisted sheet |
| 1667–1669 | GLN (C=O) and ASN (C=O) side chain, 3/10 helix, bent strand |
| 1657–1661 | α helix (A band) |
| 1651–1653 | GLN (C–N) and ASN (C–N) side chain, α helix (E band) |
| 1643–1648 | Disordered, irregular, gentle loop |
| 1622–1638 | Extended strand, rippled and pleated sheets |

Assignments of component bands in the amide I and amide II envelopes of lysozyme obtained from the nonlinear regression analysis are listed in Table 7.3.
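The GLN/ASN side-chain correction can be sketched for a single band as follows. This is one reading of the rule in the text: renormalizing in both branches is an assumption, the fractional areas and side-chain fractions are invented, and the function name is hypothetical.

```python
def correct_side_chain_band(areas, band, side_chain_fraction):
    """Remove a GLN/ASN side-chain contribution from one band, then renormalize.

    areas: dict mapping band frequency (cm^-1) to fractional area (sum = 1).
    band:  key of the band that overlaps the side-chain absorption.
    """
    corrected = dict(areas)
    if corrected[band] >= side_chain_fraction:
        # The excess stays in the band and can be assigned to the backbone.
        corrected[band] -= side_chain_fraction
    else:
        # The band is smaller than the side-chain content: drop it entirely.
        del corrected[band]
    total = sum(corrected.values())
    return {freq: area / total for freq, area in corrected.items()}

# Made-up fractional areas for three amide I bands.
areas = {1668: 0.10, 1655: 0.40, 1637: 0.50}

excess_case = correct_side_chain_band(areas, 1668, side_chain_fraction=0.04)
deficit_case = correct_side_chain_band(areas, 1668, side_chain_fraction=0.15)
print(excess_case)
print(deficit_case)
```

In the first call the band keeps its excess area, and in the second it is removed entirely. In both cases the remaining fractions sum to unity.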

The band assignments in Table 7.3 can now be used to obtain the fractions of the different structural features. The area of each band is expressed as a fraction of the total band area of the amide I or amide II envelopes. From the relative fractional areas of these bands, we obtain estimates of the fractional amounts of the different structural features in the polypeptide chain. Replicate values of these fractions can be obtained from analysis of the amide I and amide II regions from the original and Fourier deconvoluted spectra, as shown in Table 7.4. Standard deviations suggest that the fractions for each feature are reproducible to ±10%.
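Converting band areas into structural fractions amounts to summing areas by assignment and dividing by the envelope total, as in the sketch below. It assumes Gaussian bands whose width parameter is the standard deviation, so the area is height × width × √(2π). The heights and widths are invented for illustration and are not lysozyme results.

```python
import math
from collections import defaultdict

def band_area(height, width):
    """Area of a Gaussian band, taking `width` as its standard deviation."""
    return height * width * math.sqrt(2.0 * math.pi)

def structure_fractions(bands):
    """bands: iterable of (height, width, feature) -> {feature: fraction}."""
    totals = defaultdict(float)
    for height, width, feature in bands:
        totals[feature] += band_area(height, width)
    envelope = sum(totals.values())
    return {feature: area / envelope for feature, area in totals.items()}

# Invented heights and widths; only the bookkeeping is the point here.
amide_i = [
    (0.30, 5.0, "helix"),
    (0.30, 5.0, "helix"),
    (0.20, 5.0, "extended"),
    (0.10, 5.0, "disordered"),
    (0.10, 5.0, "turn"),
]
fractions = structure_fractions(amide_i)
print(fractions)
```

Repeating the calculation for the amide I and amide II regions of both the original and the FD spectra would give the replicate values from which a mean and standard deviation can be formed.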

Lysozyme has a known crystal structure, and results from the FT-IR analysis can be compared to the X-ray crystal structure. The average FT-IR fractions agree reasonably well with the fractions from the crystallographic analysis. The agreement becomes a bit better when we realize that the X-ray data consider only one type of extended feature, but the FT-IR counts all such features. Also, the fraction of disordered regions is not measured directly by X-ray analysis but only as the difference between the sum of all other fractions and 1. Thus, the fraction of disordered regions will be an overestimate from the crystal structure, because the fraction of extended regions has been underestimated. Taking these discrepancies into account, we see that there is good agreement between the X-ray and FT-IR analyses (Table 7.4).

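Table 7.3 Assignments of Component Bands in the Amide I and Amide II Envelopes of Lysozyme

| Assignments (cm⁻¹) | Helix | Extended | Disordered (loops) | Turns |
|---|---|---|---|---|
| Amide I | 1660 | 1637 | 1646 | 1691 |
| | 1655 | 1629 | | 1683 |
| | | 1623 | | 1675 |
| | | | | 1668 |
| Amide II | 1540 | 1532 | 1548 | 1578 |
| | | 1525 | | 1571 |
| | | | | 1564 |
| | | | | 1556 |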

Number of Component Peaks. Because of the large number of component peaks contributing to the FT-IR spectra of proteins, care must be taken that the correct number is used in the model. Careful attention must be paid to goodness of fit criteria (Section 3.C.1) as specifically outlined for the FT-IR analysis in Table 7.1. In addition, the extra sum of squares F test (Section 3.C.1) can be used as a statistical indicator to find the correct number of component peaks in the model.

A detailed study of the number of peaks needed to fit FT-IR spectra of proteins has been reported [9]. The number found in an analysis following the protocols in Table 7.1 is consistent with the number of peaks predicted by theory [5-8]. In general, it is found that inclusion of too many peaks in the model causes the insignificant component peaks to attain near zero or negative areas in the regression analysis. Inclusion of too few peaks causes the sum of squares of the deviations and the standard deviation of the regression to become larger. This latter situation is best evaluated with the extra sum of squares F test, since models with different numbers of peaks have different degrees of freedom.
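The extra sum of squares F statistic for comparing a model with fewer peaks against one with more peaks can be computed as in this sketch. The parameter count (one baseline plus three parameters per Gaussian peak) is an assumption about the model, the sums of squares are invented, and the function name is hypothetical.

```python
def extra_sum_of_squares_f(ss_small, p_small, ss_large, p_large, n_points):
    """F statistic for nested models.

    ss_small, p_small: residual sum of squares and parameter count of the
                       model with fewer peaks.
    ss_large, p_large: the same for the model with more peaks.
    """
    df_extra = p_large - p_small          # parameters added
    df_large = n_points - p_large         # residual degrees of freedom
    return ((ss_small - ss_large) / df_extra) / (ss_large / df_large)

# Made-up example: 200 data points, 3 peaks (10 parameters) versus
# 4 peaks (13 parameters).
f_value = extra_sum_of_squares_f(ss_small=0.05, p_small=10,
                                 ss_large=0.04, p_large=13, n_points=200)
print(f_value)
```

The result is compared with the tabulated F value for (3, 187) degrees of freedom at the chosen significance level. The extra peak is retained only if the computed F exceeds that critical value.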
