Fig. 4.2. Schematic illustration of the simulated fermentation model.
Five different feed rate profiles were chosen to excite the mathematical fermentation model: (1) a square-wave feed flow, (2) a saw-wave feed flow, (3) a stair-shape feed flow, (4) an industrial feeding policy and (5) a random-steps feed flow. These feed rates are shown in Figure 4.3. Each of the first four feed rate profiles yielded 150 input-output (target) pairs, corresponding to a six-minute sampling time over a 15-hour fermentation; the random-steps feed rate yielded 450 data pairs over a 45-hour fermentation with the same sampling interval.
A general procedure for developing neural networks consists of: (1) data preprocessing, (2) an appropriate training procedure, (3) generalization and (4) topology optimization.
Before training an RNN, the input and target data are pre-processed (scaled) so that they fall within a specified range, [-1, 1]. This range corresponds to the most sensitive region of the sigmoidal function used as the hidden-layer activation function. Consequently, the output of the trained network will also be in the range [-1, 1], and a post-processing step has to be performed to convert the output back to its original units.

Fig. 4.3. Plots of simulation data for five different feed rates: (a) square-wave feed, (b) saw-wave feed, (c) industrial feed, (d) stair-shape feed, (e) random-steps feed.
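The scaling and post-processing steps described above can be sketched as a linear min-max mapping into [-1, 1] and its inverse (a minimal illustration; the function and variable names are assumptions, not from the original):

```python
import numpy as np

def scale_to_range(x, x_min, x_max, lo=-1.0, hi=1.0):
    """Map raw data linearly from [x_min, x_max] into [lo, hi]."""
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)

def unscale(y, x_min, x_max, lo=-1.0, hi=1.0):
    """Inverse mapping: convert network output back to original units."""
    return x_min + (y - lo) * (x_max - x_min) / (hi - lo)

biomass = np.array([0.5, 3.2, 7.8, 12.4])   # illustrative biomass values
s = scale_to_range(biomass, biomass.min(), biomass.max())
recovered = unscale(s, biomass.min(), biomass.max())
```

The same minimum and maximum used for scaling the training data must be reused when post-processing the network output, otherwise the inverse mapping is inconsistent.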
The performance function used for training the neural networks is the mean square error (MSE):

MSE = (1/N) Σ_{i=1}^{N} (X_{a,i} − X_i)²

where N is the number of training data pairs, X_{a,i} is the target (actual) value of biomass and X_i is the corresponding estimate produced by the neural soft-sensor.
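The MSE performance function above translates directly into code (a trivial sketch; the argument names are illustrative):

```python
import numpy as np

def mse(x_actual, x_estimated):
    """Mean square error between target biomass and soft-sensor estimate."""
    x_actual = np.asarray(x_actual, dtype=float)
    x_estimated = np.asarray(x_estimated, dtype=float)
    return np.mean((x_actual - x_estimated) ** 2)

err = mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])  # ≈ 0.4167
```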
The Levenberg-Marquardt backpropagation (LMBP) training algorithm is adopted to train the neural networks due to its faster convergence and memory efficiency [34,93]. The algorithm can be summarized as follows:
1. Present the input sequence to the network. Compute the corresponding network outputs with respect to the parameters (i.e., weights and biases) X_k. Compute the error e and the overall MSE.
2. Calculate the Jacobian matrix J through the backpropagation of Marquardt sensitivities from the final layer of the network to the first layer.
3. Calculate the step size ΔX_k for updating the network parameters using:

ΔX_k = −[JᵀJ + μ_k I]⁻¹ Jᵀ e

where μ_k is initially chosen as a small positive value (e.g., μ_0 = 0.01).
4. Recompute the MSE using X_k + ΔX_k. If this new MSE is smaller than that computed in step 1, then decrease μ_k, let X_{k+1} = X_k + ΔX_k and go back to step 1. If the new MSE is not reduced, then increase μ_k and go back to step 3.
The algorithm terminates when (i) the norm of the gradient is less than some predetermined value, (ii) the MSE has been reduced to some error goal, (iii) μ_k becomes too large to be increased practically, or (iv) a predefined maximum number of iterations has been reached.
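Steps 3 and 4 of the algorithm can be sketched for a generic least-squares model as follows. This is an illustration only: the Jacobian is computed here by finite differences rather than by backpropagation of Marquardt sensitivities, and the model is a simple linear fit, not the RNN soft-sensor:

```python
import numpy as np

def jacobian_fd(residual, x, eps=1e-6):
    """Finite-difference Jacobian of the residual vector w.r.t. parameters x."""
    r0 = residual(x)
    J = np.zeros((r0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (residual(x + dx) - r0) / eps
    return J

def lm_step(residual, x, mu):
    """One LM iteration: accept the step if the MSE decreases, else raise mu."""
    e = residual(x)
    J = jacobian_fd(residual, x)
    while mu < 1e10:                       # bail out if mu grows too large
        dx = -np.linalg.solve(J.T @ J + mu * np.eye(x.size), J.T @ e)
        if np.mean(residual(x + dx) ** 2) < np.mean(e ** 2):
            return x + dx, mu / 10.0       # success: decrease mu
        mu *= 10.0                         # failure: increase mu and retry
    return x, mu

# toy demo: fit y = a*t + b to exact data generated with a=2, b=1
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * t + 1.0
res = lambda p: p[0] * t + p[1] - y
x, mu = np.array([0.0, 0.0]), 0.01
for _ in range(20):
    x, mu = lm_step(res, x, mu)
```

When mu is small the update approaches the Gauss-Newton step; when mu is large it approaches a small gradient-descent step, which is the trade-off that gives LMBP its fast, stable convergence.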
Data generated from the five feed rate profiles were divided into three groups: a training set, a validation set and a testing set. It is well known that the training set should cover the entire state space of the system as many times as possible. In this study, the random-steps feed rate, which excited the process the most, was used to generate the training set. The data generated from the stair-shape feed rate were used as the validation set. To prevent the neural network from being over-trained, an early stopping method was used: the error on the validation set was monitored during training. The validation error normally decreases during the initial phase of training; however, when the network begins to over-fit the data, the error on the validation set typically begins to rise. When the validation error had increased for a specified number of iterations, training was stopped and the weights and biases at the minimum of the validation error were retained. The remaining data sets, which the network had not seen during training and validation, were used to examine the trained network.
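The early-stopping logic described above can be sketched generically. The `train_one_epoch` and `validation_error` callables are hypothetical stand-ins for the actual training and evaluation code, and the demo below drives them with a toy U-shaped error curve:

```python
import copy

def train_with_early_stopping(train_one_epoch, validation_error, weights,
                              patience=5, max_epochs=500):
    """Stop when validation error has not improved for `patience` epochs;
    return the weights recorded at the minimum validation error."""
    best_err = float("inf")
    best_weights = copy.deepcopy(weights)
    bad_epochs = 0
    for _ in range(max_epochs):
        weights = train_one_epoch(weights)
        err = validation_error(weights)
        if err < best_err:
            best_err, best_weights, bad_epochs = err, copy.deepcopy(weights), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break          # validation error rising: over-fitting begins
    return best_weights, best_err

# toy demo: "weights" is just an epoch counter; the validation error is
# U-shaped, reaching its minimum at epoch 30 and rising afterwards
w, e = train_with_early_stopping(lambda w: w + 1,
                                 lambda w: (w - 30) ** 2, 0)
```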
There are no general rules or guidelines for selecting the optimal number of hidden neurons in RNNs [39,78]; the most commonly used method is trial and error. Too few neurons result in inadequate learning, while too many cause over-training and poor generalization. One straightforward approach adopted by many researchers is to start with the smallest possible network and gradually increase its size until performance begins to level off [39,94,95,96]. From an engineering point of view, however, the smallest network that can solve the problem is the desired end result. The opposite approach was therefore used in this work: starting with a reasonably large network, it was gradually "shrunk" until the error on the test data went beyond acceptance.
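The network-shrinking procedure can be sketched as a simple loop. Here `train_and_test` is a hypothetical function that trains a network with a given number of hidden neurons and returns its test-set error; the demo fakes it with a step function:

```python
def shrink_network(train_and_test, start_size, max_test_error, min_size=1):
    """Start with a reasonably large hidden layer and shrink it until the
    test error exceeds the acceptance threshold; return the smallest
    acceptable hidden-layer size."""
    best = None
    for n_hidden in range(start_size, min_size - 1, -1):
        if train_and_test(n_hidden) <= max_test_error:
            best = n_hidden          # still acceptable; keep shrinking
        else:
            break                    # error beyond acceptance: stop
    return best

# toy demo: pretend the test error jumps once fewer than 4 neurons are used
fake_error = lambda n: 0.01 if n >= 4 else 0.5
smallest = shrink_network(fake_error, start_size=10, max_test_error=0.05)
```

In practice each candidate size would be trained and evaluated with the early-stopping procedure described earlier, so the search is expensive but yields the most compact acceptable network.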