## Experimental Validation of Cascade Recurrent Neural Network Models

This chapter examines cascade RNN models for modelling bench-scale fed-batch fermentation of Saccharomyces cerevisiae. The models are identified experimentally, trained and validated on data collected from experiments with different feed rate profiles. Data preprocessing methods are used to improve the robustness of the neural network models. The results show that the best biomass prediction ability is given by a DO cascade neural model.

### 6.1 Introduction

A large number of simulation studies of neural network modelling have been reported in the literature [104,105,106,107], and good results have generally been obtained. However, only a few such studies have taken the further step of experimental validation. Simulations allow systematic study of the complex bioreaction without conducting real experiments, but because of the inherent nonlinear dynamic characteristics of fermentation processes, the process-model mismatch could significantly affect the accuracy of the results.

The main objective of this study is to model a laboratory scale fed-batch fermentation by neural network models using the cascade recurrent structure proposed in Chapter 5.

The remaining sections of this chapter proceed as follows: in Section 6.2, the cascade RNN and mathematical models are given; in Section 6.3, the experimental procedure is described; in Section 6.4, the experimental model identification and various aspects of data processing are detailed; conclusions are drawn in Section 6.5.

L. Z. Chen et al.: Modelling and Optimization of Biotechnological Processes, Studies in Computational Intelligence (SCI) 15, 71-89 (2006)

www.springerlink.com © Springer-Verlag Berlin Heidelberg 2006

### 6.2 Dynamic Models

### Recurrent neural network models

Two neural network models employed in this work are shown in Figure 6.1 and Figure 6.2. Development of these kinds of neural models is described in Chapter 5. The difference between Figure 6.1 and Figure 6.2 is that model I uses Co, which is the concentration of DO, as its state variable, while model II uses the concentration of glucose Cs as its state variable.

Both models I and II use cascade structures containing two recurrent neural blocks. They model the dynamics from the inputs, F and V, to the key variable Co (or Cs) and the biomass concentration X. The first block estimates the trend of Co (or Cs), which provides important information to the second neural block. The topology of each neural block is the same as that of the soft-sensor developed in Chapter 4.

Fig. 6.1. Cascade recurrent neural network model I (Block 1 and Block 2).

Fig. 6.2. Cascade recurrent neural network model II (Block 1 and Block 2).

In each of the neural blocks, both feed-forward and feedback paths are connected through TDLs in order to enhance the dynamic behaviors. Sigmoid activation functions are used for the hidden layers and a pure linear function is used for the output layers. The structure of the neural blocks reflects the differential relationships between inputs and outputs.

The first neural block can be described as follows:

C(t + 1) = f1(C(t), C(t − 1), …, C(t − m), H1(t), H1(t − 1), …, H1(t − u), F(t), F(t − 1), …, F(t − n), V(t))   (6.1)


where f1(·) is the nonlinear function represented by the first block; H1 is a vector of the activation feedback values in block 1; C represents Co or Cs, the concentration of DO or glucose respectively; and u, m and n are the maximum numbers of activation feedback delays, output layer feedback delays and input F delays in the first block, respectively.

The second neural block has an additional input, Co in Figure 6.1 or Cs in Figure 6.2, as compared with the first block. The predicted biomass concentration can be described as:

X(t + 1) = f2(X(t), X(t − 1), …, X(t − p), C(t), H2(t), H2(t − 1), …, H2(t − v), F(t), F(t − 1), …, F(t − n), V(t))   (6.2)

where f2(·) is the nonlinear function represented by the second block; H2 is a vector of the activation feedback values in block 2; and v, p and n are the maximum numbers of activation feedback delays, output layer feedback delays and input F delays in the second block, respectively. In this study, m, n, p, u and v are chosen as 6, 4, 4, 1 and 1 respectively.
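To make the cascade recurrence of equations (6.1) and (6.2) concrete, the following Python sketch performs one prediction step. The weights are random placeholders (the trained values are not reproduced here), the delay-buffer wiring is a simplified assumption, and all names are illustrative:

```python
import numpy as np

def tansig(x):
    # Bipolar sigmoid activation used in the hidden layers
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

class NeuralBlock:
    """One recurrent block: tapped-delay-line inputs -> sigmoid hidden layer -> linear output."""
    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def step(self, x):
        h = tansig(self.W1 @ x + self.b1)        # hidden activations, fed back through a TDL
        return float(self.W2 @ h + self.b2), h   # linear output layer

# Delay lengths following the 6/4/4 description in the text:
# six feed-rate delays, four output delays per block, one activation-feedback delay.
N_F, M_C, P_X, U_H, V_H = 6, 4, 4, 1, 1
H1_SIZE, H2_SIZE = 12, 10   # hidden neurons in the first and third hidden layers

rng = np.random.default_rng(0)   # random placeholder weights, not the trained values
block1 = NeuralBlock((M_C + 1) + H1_SIZE * (U_H + 1) + (N_F + 1) + 1, H1_SIZE, rng)
block2 = NeuralBlock((P_X + 1) + 1 + H2_SIZE * (V_H + 1) + (N_F + 1) + 1, H2_SIZE, rng)

def cascade_step(C_hist, X_hist, F_hist, H1_hist, H2_hist, V):
    """One prediction step of equations (6.1) and (6.2)."""
    x1 = np.concatenate([C_hist, np.ravel(H1_hist), F_hist, [V]])
    C_next, h1 = block1.step(x1)    # block 1 estimates DO (or glucose)
    x2 = np.concatenate([X_hist, [C_next], np.ravel(H2_hist), F_hist, [V]])
    X_next, h2 = block2.step(x2)    # block 2 estimates biomass from the cascade
    return C_next, X_next, h1, h2
```

The key design point is that block 2 receives the output of block 1 as an extra input, so an accurate trend of the key variable directly improves the biomass estimate.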

### Dynamic mathematical model

To provide a comparison with the neural models, a mathematical model was also identified for optimization. In this experimental investigation, four state variables were available: the concentrations of biomass, DO and glucose, and the fermentation volume. To describe the fermentation system with this available information, a popular mass balance equation structure using simple Monod-like kinetics [108] was chosen. The required number of independent state variables was exactly the same as for model II described above. The mass balance equations take the form:

dX/dt = μ(S)X − (F/V)X   (6.3)

dS/dt = −(1/Yxs)μ(S)X − mX + (F/V)(SF − S)   (6.4)

dV/dt = F   (6.5)

where μ(S) = μmax S / (KS + S + S²/Ki); X and S are respectively the concentrations of biomass and glucose; SF is the glucose concentration in the feeding solution; V is the liquid volume in the fermentor and F is the volumetric feed rate; KS, Ki, μmax, Yxs and m are the model parameters.
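As a rough illustration, the mass-balance model above can be integrated with a simple explicit Euler scheme. Only KS and Yxs are taken from Table 6.1; μmax, Ki, m, the feed rate and the initial conditions below are assumed placeholder values, not the identified ones:

```python
import numpy as np

# Identified values from Table 6.1:
KS, YXS = 0.01811, 0.2086               # g/L and g/g
# The remaining parameters were also identified in the study, but their values
# are not reproduced here; the numbers below are ILLUSTRATIVE placeholders only.
MU_MAX, KI, M_COEF = 0.45, 50.0, 0.01   # 1/h, g/L, g/(g h)  (assumed)
SF = 56.56                               # g/L, glucose in the feed after sterilization

def mu(S):
    # Monod kinetics with substrate inhibition
    return MU_MAX * S / (KS + S + S**2 / KI)

def step(X, S, V, F, dt):
    """One explicit-Euler step of the mass-balance equations above."""
    dX = mu(S) * X - (F / V) * X                              # growth minus dilution
    dS = -mu(S) * X / YXS - M_COEF * X + (F / V) * (SF - S)   # consumption, maintenance, feed
    dV = F                                                    # volume grows with the feed
    return X + dt * dX, S + dt * dS, V + dt * dV

# Simulate 4 h of fed-batch at a constant feed rate (initial conditions illustrative)
X, S, V = 1.0, 5.0, 1.0   # g/L, g/L, L
dt, F = 0.05, 0.15        # h, L/h
for _ in range(80):
    X, S, V = step(X, S, V, F, dt)
```

With the substrate-inhibition term S²/Ki, the specific growth rate μ(S) peaks at S = √(KS·Ki) and declines at higher glucose concentrations, which is what motivates controlled feeding in the first place.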

The following initial culture conditions and feed concentrations have been used:

The GA was used to identify the parameters. The details of the identification method are described in Chapter 3. The identified parameters are listed in Table 6.1.

Table 6.1. The identified parameters that are used in the work.

| Parameter | Value |
|-----------|-------|
| KS | 0.01811 g/L |
| Yxs | 0.2086 |

### 6.3 Experimental Procedure

### Yeast strain and preservation

A pure culture of the prototrophic baker's yeast strain Saccharomyces cerevisiae CM52 (MATa his3-Δ200 ura3-52 leu2-Δ1 lys2-Δ202 trp1-Δ63) was obtained from the Department of Biological Sciences, The University of Auckland (Auckland, New Zealand). A serial transfer method [109] was used for preserving and maintaining the yeast strain. The pure culture was sub-cultured on YEPD agar slopes (bacteriological peptone: 20 g/l; yeast extract: 10 g/l; glucose: 20 g/l; agar: 15 g/l), which were autoclaved for 15 minutes at 121°C. These cultures were kept in an incubator at 30°C for 3 days, and the transferred cultures were then stored at 0-4°C. A pure culture can be kept in a fridge for 30 days, after which re-transferring is normally required. The stock cultures used for inoculum preparation are shown in Figure 6.3.

### Growth of inoculum

The preserved culture was first revived by growth in YEPD medium of the following composition: peptone, 20 g/l; yeast extract, 10 g/l; glucose, 20 g/l. A 250 mL shaker flask contained 100 mL of YEPD medium; both flask and medium were sterilized at 110°C for 20 minutes. Yeast cells on the surface of a refrigerated agar slope were washed into the 100 ml of sterilized YEPD medium and propagated in a digital incubator shaker (Innova 4000, New Brunswick Scientific Co., Inc., USA), as shown in Figure 6.4, at 30°C and 250 rpm for 12 hours. 100 ml of this culture was used as the inoculum for each fermentation experiment.

### Batch and fed-batch phases

To allow the initial yeast inoculum to adapt to the new environment of the bench-scale reactor and become sensitive to the feeding medium, a 12-hour batch fermentation was conducted first, after 100 ml of inoculum was added to the reactor containing one liter of YEPD medium. The medium was sterilized together with the reactor at 121°C for 25 minutes before the batch phase. During the batch phase, the temperature, agitation speed and air supply were maintained at 30°C, 500 rpm and 4.0 L/min respectively. The laboratory fermentor is shown in Figure 6.5.

The fed-batch fermentation was carried out under the same aeration and temperature conditions as the batch cultivation, except that a feeding medium was added to the bioreaction vessel, as shown in Figure 6.6, during the fed-batch cultivation. The feeding medium contained, per liter: peptone, 200 g; yeast extract, 100 g; glucose, 200 g; anti-foam, 20 drops. Due to the high glucose concentration in the feeding medium, the autoclave conditions were changed to 110°C for 20 minutes. After sterilization, the glucose concentration was measured as 56.56 g/L. 1.5 liters of this medium was fed into the fermentor using a controllable peristaltic pump (illustrated in Figure 6.7) with flow rates between 0 and 0.2988 L/h.

### Sampling time

Fermentation processes were run for 12.5 hours for model identification and 8 hours for optimal feed rate validation. Medium samples were taken approximately every 30 minutes to determine biomass and glucose concentrations. The choice of sampling time was based on practical operating conditions and on suggestions in the literature [110,111]. The choice was guided by, but not restricted to, the dominant time constant of the process: a good rule of thumb is to choose the sampling time Δt < 0.1τmax, where τmax is the dominant time constant. A 0.5-hour sampling time was chosen in this study, giving a total of 26 medium samples during a 12.5-hour fermentation run.
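As a quick numerical check of this rule of thumb (the 5-hour dominant time constant below is an illustrative assumption, not a measured value):

```python
def max_sampling_time(tau_max):
    """Rule of thumb from the text: choose the sampling time delta_t < 0.1 * tau_max."""
    return 0.1 * tau_max

# An assumed dominant time constant of 5 h would permit sampling up to every 0.5 h,
# consistent with the 30-minute interval used in this study.
limit = max_sampling_time(5.0)
```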

### Analysis

### Determination of culture dry weight

As a compromise between the effect of sampling on the total bioreaction volume and the accuracy of the biomass concentration measurements, two 2-ml culture samples were taken from the broth (as shown in Figure 6.8) and centrifuged at 45000 rpm for five minutes (Eppendorf centrifuge, Germany). After decanting the supernatant liquid, distilled water was added to the tubes and they were centrifuged again to wash the residual medium components off the yeast cells. The wet yeast was dried at 100°C for 24 hours before being weighed on a balance (Sartorius BP110S, Sartorius AG, Göttingen, Germany). The average dry weight was used to calculate the biomass concentration.

### Measurement of glucose concentration

Glucose was measured off-line by reacting the glucose in glucose (Trinder) reagent (Sigma Diagnostics, USA) to yield a colored (red) solution. The change in color was measured by a spectrophotometer (Hewlett-Packard 8452A Diode Array Spectrophotometer, Germany) at a wavelength of 505 nm. The concentration was calculated by comparing the change in absorbance with that of a known standard glucose solution, with a standard-curve fitting error of less than 2%.

### Monitoring of dissolved oxygen

The values of DO were monitored on-line by the DO electrode illustrated in Figure 6.9. Prior to measurement, the electrode was calibrated by saturating the fermentation medium (without yeast) with air and equating the instrument response with "air saturation". Instrument readings during the fermentation could then be expressed as "% of air saturation". The DO data were automatically logged into the computer every minute and stored in a process database.

### 6.4 Model Identification

### Nine different feed rates for experimental identification

Due to the data-intensive nature of neural network modelling, a sufficient amount of data is required. A database that provides information "rich" enough to build an appropriate and accurate input-output model is important [110].

Fig. 6.10. Different feed rates for system identification.

In this study, a small database was built by conducting nine experiments controlled by different feed rate profiles. The nine feed rates, illustrated in Figure 6.10, were designed to excite the fed-batch fermentation system. They were carefully chosen to cover the "experimental space" as fully as possible and to yield informative data sets. The data collected during the bioreaction course were used to explore the complex dynamic behavior of the fed-batch fermentation system.

They were used to train the neural networks and to identify the parameters of the mathematical model. The feed flow rate, measured data of DO, biomass, glucose and the calculated value of volume (using Equation 6.5) for one of the experiments are plotted in Figure 6.11.

Fig. 6.11. One of the experimental data sets.

### Training of neural network models

The aim of network training is to minimize the MSE between the measured value and the neural network's output. The LMBP training algorithm was employed to train the neural networks [34]. The explanation of neural network training and the LMBP algorithm are given in Chapter 4. In this section, a cross validation technique is emphasized.

An early stopping method is employed to prevent the neural network from being over-trained. A set of data independent of the training data sets is used as validation data, and the validation error is monitored during training. It normally decreases during the initial phase of training; however, when the network begins to over-fit the data, the error on the validation set typically begins to rise. When the validation error increases for a specified number of iterations, training is stopped, and the weights and biases of the network at the minimum of the validation error are retained. The rest of the data, which the neural network does not see during training and validation, is used to examine the merit of the trained network.
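A minimal sketch of this early-stopping loop, with `train_step` and `val_error` as hypothetical stand-ins for one LMBP iteration and the validation-set error, and the patience value an assumed choice:

```python
import numpy as np

def train_with_early_stopping(train_step, val_error, max_epochs=500, patience=10):
    """Generic early-stopping loop: stop when the validation error has not
    improved for `patience` consecutive epochs; return the best parameters."""
    best_err, best_params, stalled = np.inf, None, 0
    for _ in range(max_epochs):
        params = train_step()        # one optimisation step (e.g. one LMBP iteration)
        err = val_error(params)      # error on the independent validation set
        if err < best_err:
            best_err, best_params, stalled = err, params, 0
        else:
            stalled += 1             # validation error rising: over-fitting suspected
            if stalled >= patience:
                break
    return best_params, best_err
```

Returning the parameters at the validation-error minimum, rather than the final ones, is what prevents the late over-fitting phase from degrading the saved model.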

Figure 6.12 shows the network training procedure. In this work, a total of nine experimental data sets were available: seven were used for training, one for validation and one for testing. For each combination, 50 networks were trained and the one that generated the minimum test error was saved. Different combinations of the nine data sets were chosen in turn to train the network, and the network that produced the minimum test error over all trainings was selected as the model of the fed-batch fermentation process. The numbers of hidden neurons for the first and third hidden layers were chosen as 12 and 10 respectively. The 6/4/4 structure (six feed rate delays, four first-block output delays and four second-block output delays) was selected as the topology of the network [112].

### Data processing

Data interpolation

Due to the infrequent sampling of biomass and the unequal sampling times of DO and biomass, an interpolation method was needed to process the experimental data before they could be applied to the model. To preserve the monotonicity and the shape of the data, a piecewise cubic interpolation method [98] was adopted in this study. After interpolation, the time step was 6 minutes for all data sequences.
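This interpolation step can be sketched with SciPy's shape-preserving PCHIP interpolator (the sampled values below are illustrative, not measured data):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Off-line biomass samples taken roughly every 30 min (illustrative values)
t_sample = np.array([0.0, 0.5, 1.0, 1.5, 2.0])   # h
x_sample = np.array([1.0, 1.2, 1.6, 2.3, 3.1])   # g/L

# Shape-preserving piecewise cubic interpolation onto a 6-minute grid
pchip = PchipInterpolator(t_sample, x_sample)
t_grid = np.arange(0.0, 2.0 + 1e-9, 0.1)         # 6 min = 0.1 h
x_grid = pchip(t_grid)
```

Unlike an ordinary cubic spline, PCHIP does not overshoot between sample points, so a monotonically increasing biomass curve stays monotone after resampling.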

DO data normalization

DO was measured on-line and recorded every minute. The recorded DO values were generally located between 20% and 100%, but a few of the data sets were out of this range. This is due to difficulties in calibrating the DO sensor, initial air saturations varying from batch to batch, and bubble formation during the bioreaction. Unexpected changes in the DO range could significantly affect the accuracy of the biomass concentration prediction, because the DO value is the key information for biomass estimation in the proposed neural network model I. Assuming that the trend of the measured DO data was correct, a normalization method was used to bring the DO values that fell outside the boundary back into the range between 20% and 100%. The trend of the DO value was thus emphasized and the uncertainty in the biomass prediction was reduced. The mathematical formulation used for normalization is as follows:

Conew = 20 + 80 (Co − Comin) / (Comax − Comin)   (6.6)

where Conew, Co, Comin and Comax are the normalized value, original value, minimum value and maximum value of the original DO data, respectively.
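A minimal sketch of this normalization, assuming the simple min-max mapping onto [20, 100] implied by the text:

```python
import numpy as np

def normalize_do(do):
    """Map raw DO readings onto the expected 20-100 % air-saturation range
    while preserving their trend (min-max normalization)."""
    do = np.asarray(do, dtype=float)
    lo, hi = do.min(), do.max()
    return 20.0 + 80.0 * (do - lo) / (hi - lo)
```

Because the mapping is affine, the shape of the DO trajectory, which is the information the neural model actually exploits, is unchanged; only its level and span are corrected.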

Figure 6.13 shows the neural network predictions using the original data and the normalized data. It can be seen that a closer representation of the cell growth has been achieved by the model trained with normalized DO data.

Fig. 6.13. Comparison of biomass prediction based on the neural networks trained with the original DO data and the normalized DO data.

Pre-processing and post-processing of input and output data

For mathematical model identification problems, data processing normally means filtering noisy data. For neural network model identification, however, data processing mainly focuses on scaling the data into a specific range, namely the most sensitive region of the activation function. In the proposed neural networks, a sigmoid function is used in the activation layers:

y = (1 − e^(−βx)) / (1 + e^(−βx))   (6.7)

where x is the input to the neuron, y is the output of the neuron, and β ∈ ℝ.

The most sensitive input region of the above function is the range [−1, 1]. The equation used for input data scaling is:

xn = 2 (x − xmin) / (xmax − xmin) − 1   (6.8)

where xn, x, xmin and xmax are the processed value, original value, minimum value and maximum value of the original data, respectively.

As the inputs have been transformed into the range [−1, 1], the output of the trained network will also lie in [−1, 1]. The network outputs therefore have to be converted back to their original units using:

y = ymin + (yn + 1)(ymax − ymin) / 2   (6.9)

where y, yn, ymax and ymin are the converted value, network output value, maximum value of the target data and minimum value of the target data, respectively.
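The pre- and post-processing described above can be sketched as a pair of helper functions; the names are illustrative:

```python
import numpy as np

def scale_input(x, x_min, x_max):
    """Pre-processing: scale raw data into [-1, 1], the sensitive region of the sigmoid."""
    return 2.0 * (np.asarray(x, dtype=float) - x_min) / (x_max - x_min) - 1.0

def unscale_output(y_n, y_min, y_max):
    """Post-processing: convert network outputs in [-1, 1] back to original units."""
    return y_min + (np.asarray(y_n, dtype=float) + 1.0) * (y_max - y_min) / 2.0

# Example: a DO reading of 60 % with bounds [20, 100] maps to 0.0
do_scaled = scale_input(60.0, 20.0, 100.0)
```

Each input variable keeps its own (x_min, x_max) pair, so the scaling factors differ between inputs, exactly as described in the text.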

Some aspects of data scaling were discussed by Koprinkova and Petrova [113]. A major issue was the scaling factor S, defined as:

S = R / B   (6.10)

where B is the highest value of the input and R is the span of the specific range into which the input data are transformed.

Koprinkova and Petrova found that the smaller S was, the larger the prediction error of the neural network would be. There was no significant loss of information when S = 0.009, but the loss became unacceptable when S reached 0.00025. In the current study, the highest input value was the upper bound of DO, which was 100, and the specific range was [−1, 1], so the span R = 2 and the scaling factor S = 2/100 = 0.02, well above 0.009. Furthermore, Koprinkova and Petrova [113] used the same scaling factor for all inputs, whereas in this study each input had its own scaling factor, so that the scaled values were distributed uniformly over the whole range [−1, 1]. This further reduces the loss of information caused by the differing input ranges.

Improvement on initial value prediction

As can be seen from Figure 6.13, the initial value prediction of the trained neural network model was not satisfactory. One of the biomass predictions is shown in Figure 6.14; the feed rate profile is f6 as plotted in Figure 6.10. An overshoot clearly occurred at the beginning of the prediction. Since the initial culture conditions were the same for all experiments, the most likely cause of this problem was the different initial feed rates.

Although an initial prediction problem has been encountered in many neural network modelling studies, it has rarely been solved. An attempt was made by Dacosta et al. [73] to cope with the initial biomass prediction problem for a radial basis function network model: the different sizes of the initial inoculum were modelled by incorporating a single additional characterization input, the initial off-line biomass weight assay. However, the neural network used in this work has a recurrent structure, which makes that method unsuitable here.

To overcome this problem, a zero-appending method was used. A series of zeros was appended to the beginning of each feed rate sequence. Because the time length must be the same for all input variables, a series of 1s, equal to the initial value of the reaction volume (1 L), was likewise appended to the beginning of the fermentation volume sequence. The appended sequences comprised eight points at six-minute intervals, giving a total appended time length of 42 minutes. This length was determined by trial and error and reflects the unstable period of the network prediction; after this period, a stable and accurate prediction could be achieved. The result is shown in Figure 6.15.
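A minimal sketch of the zero-appending step; the example sequences are illustrative:

```python
import numpy as np

def append_startup(feed, volume, n_points=8):
    """Prepend a start-up segment: zeros to the feed-rate sequence and the
    initial volume to the volume sequence, as described in the text."""
    feed = np.asarray(feed, dtype=float)
    volume = np.asarray(volume, dtype=float)
    feed_ext = np.concatenate([np.zeros(n_points), feed])          # zero feed during start-up
    vol_ext = np.concatenate([np.full(n_points, volume[0]), volume])  # constant initial volume
    return feed_ext, vol_ext
```

The constant prefix gives the recurrent network a quiet start-up window in which its internal feedback states can settle before the real feed profile begins.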

### Results of model prediction

The predictions of three different models, neural network model I, neural network model II and the mathematical model, are compared in Figure 6.16. The parameters of the mathematical model, identified using the GA, are given in Table 6.1. Among the three biomass prediction curves, the prediction of neural network model I yielded the best agreement with the experimental data, whereas the mathematical model gave the worst prediction. The overall prediction MSEs for neural models I and II and the mathematical model were 0.1067, 0.3826 and 0.4684 respectively.

Fig. 6.15. Biomass prediction with zero appending at the beginning of feed rate.

Fig. 6.16. Results of biomass predictions using neural network model I, neural network model II and the mathematical model.

### 6.5 Conclusions

Cascade RNN models are proposed in this work to describe a bench-scale fed-batch fermentation of Saccharomyces cerevisiae. The nonlinear dynamic behavior of the fermentation process is modelled by the cascade RNN models with internal and external feedback connections. The structures of the models are identified through training and validation using data collected from nine experiments with different feed rate profiles. Data preprocessing methods are used to improve the robustness of the neural network model and to match the process dynamics in the presence of varying initial feed rates. The most accurate biomass prediction is obtained by the DO neural model. The results show that the proposed neural network model has a strong capability of capturing the nonlinear dynamics underlying the fed-batch process, provided that sufficient data, measured at appropriate sampling intervals, are available. The results also show that proper data processing and zero-appending can further improve the prediction accuracy.
