Predicting industrial‐scale cell culture seed trains–A Bayesian framework for model fitting and parameter estimation, dealing with uncertainty in measurements and model parameters, applied to a nonlinear kinetic cell culture model, using an MCMC method

For production of biopharmaceuticals in suspension cell culture, seed trains are required to increase cell number from cell thawing up to production scale. Because cultivation conditions during the seed train have a significant impact on cell performance in production scale, seed train design, monitoring, and development of optimization strategies are important. This can be facilitated by model‐assisted prediction methods, whose performance depends on the prediction accuracy. Accuracy can be improved by including prior process knowledge, especially when only few high‐quality data are available, and by describing inference uncertainty, which provides, apart from a “best fit” prediction, information about the probable deviation in the form of a prediction interval. This contribution illustrates the application of Bayesian parameter estimation and Bayesian updating for seed train prediction to an industrial Chinese hamster ovarian cell culture process, coupled with a mechanistic model. It is shown in which way prior knowledge as well as input uncertainty (e.g., concerning measurements) can be included and propagated to predictive uncertainty. The impact of available information on prediction accuracy was investigated. It has been shown that through integration of new data by the Bayesian updating method, process variability (i.e., batch‐to‐batch) could be considered. The implementation was realized using a Markov chain Monte Carlo method.


| INTRODUCTION
In bioprocessing, mathematical modeling, statistical data analysis, and IT-supported tools have become important instruments within the framework of process design, optimization, and control. They are also part of the process analytical technology (PAT) regulatory initiative for building in quality to pharmaceutical manufacturing, defined by the United States Food and Drug Administration. PAT methods are playing an important role, for example, in cell culture upstream processes for the production of biopharmaceuticals (Glassey et al., 2011). While optimization of the production scale has been in the focus for a long time, it turned out that the cost- and time-intensive cell proliferation process (the so-called seed train) also has an impact on the success rate in production (Brunner, Fricke, Kroll, & Herwig, 2017). There are various factors that influence the seed train (Le et al., 2012). Examples are selection of vessel and filling volumes of the seed train scales, differences in bioprocess engineering parameters between scales, inoculation cell densities, ratio of fresh medium to passaged medium, substrate and metabolite concentrations, point in time for cell passaging and corresponding viable cell density, apparent growth rate, and viability.
To maintain cell growth and product formation attributes within the seed train, monitoring and optimization strategies are required (Frahm, 2014). Temporary or longer lasting changes in cell behavior can occur, so that the seed train protocol has to be adapted. For new cell lines or new products or the transfer of the process to another production plant, seed train protocols also have to be developed or adapted, keeping in mind the reduction of time and costs during development of those protocols. Another application is to support the selection of the optimal clone for a new process and the development of a suitable seed train protocol.
Model building of dynamic bioprocesses such as cell culture seed trains faces many challenges due to factors such as a limited amount of high-quality experimental data (measurement uncertainty, offline data and large time steps between measurements, etc.), process nonlinearity, and the necessity of various model parameters characterizing the bioprocess. As already described in Liu and Gunawan (2017), these factors lead to significant uncertainty in the process model. Furthermore, prediction performance depends on the accuracy of the model, the variability of the biological process, and identifiability of model parameters. Nonidentifiability arises if many different combinations of model parameters can explain the experimental data equally well. The reason could be that the model contains too many parameters (overparameterization), leading to the problem that noise or random variations in the training data are interpreted and learned as concepts.
Often, estimation methods identifying only one set of model parameters based on the available dataset, like the Nelder-Mead simplex algorithm (Press, 1996), are applied. This type of optimization algorithm is a "best fit" estimator, a point estimator, meaning that only one value for each model parameter is identified, leading to one predicted value of the quantity of interest at each time step. No information about the output uncertainty is given this way, and most of these types of optimization algorithms could get stuck in a local minimum. Nevertheless, these methods have turned out to be useful tools, sometimes resulting in fast solutions, and they are already implemented in functions (e.g., in Matlab or R), which are easy to apply. They can be combined with statistical methods like Monte Carlo (MC) simulation, sensitivity, uncertainty, and/or identifiability analysis, in order to simulate output uncertainty and to gain more information about the process (Hines, 2015; Price, Nordblad, Woodley, & Huusom, 2013; Raue et al., 2009; Sin et al., 2009).
However, in many cases, there are only few data available for model building and parameter estimation (e.g., when planning a new production), but frequently there is some knowledge about the organism or process from literature or expert knowledge. It is desirable to quantify this information and include it in the model building process. Within a Bayesian context, this kind of knowledge is expressed by probability statements and it is combined with the available data, leading to a whole set of probable values for each model parameter. This procedure is also called Bayesian parameter estimation and the numerical implementation can be performed through a Markov chain Monte Carlo (MCMC) procedure. There are many ways of numerically implementing this method and some examples in the field of biochemical engineering can be found in Galagali and Marzouk (2015); Liu and Gunawan (2017); Vrugt (2016); and Xing, Bishop, Leister, and Li (2010).
Another similar technique to simulate process dynamics under uncertainty is the use of Gaussian processes, but they were not investigated in this work.
In this work, a Bayesian approach, facing the above-mentioned challenges in model building and parameter estimation dealing with uncertainties and possibly a lack of data, is applied to seed train prediction of an industrial Chinese hamster ovarian (CHO) cell culture process. It is shown in which way sources of uncertainty as well as prior knowledge (from experts or literature) can be considered, leading to predictions including inference uncertainty. Bayesian parameter estimation also provides a framework for detection of nonidentifiabilities, but as this is not the focus of this work, we refer to Raue, Kreutz, Theis, and Timmer (2013). Numerical implementation of the Bayesian approach was carried out via an MCMC procedure using an adaptive single component Metropolis algorithm.
These types of models have gained renewed attention because they can be considered as a structured representation of the available process knowledge (Glassey et al., 2011;Kroll, Hofer, Stelzer, & Herwig, 2017;Möller & Pörtner, 2017;Sanderson, Phillips, & Barford, 1996). They have been used for development of predictive control strategies, for example, to ensure high batch-to-batch reproducibility in animal suspension cell cultures (Aehle et al., 2012) and for the design of cell culture fed-batch control (Frahm et al., 2002). HERNÁNDEZ RODRÍGUEZ ET AL.

The industrial CHO cell culture process is used to illustrate in which way inference uncertainty can be derived and with which accuracy individual seed train scales and the whole seed train can be predicted, depending on the available information.

| Investigated suspension cell culture process
In this contribution, the subject of investigation is an industrial CHO cell culture process containing a seed train comprising five shake flask scales and three bioreactor scales as well as the production scale. The focus lies on the bioreactor part of the seed train, which is composed of bioreactor 1 (N-3, 40 L), bioreactor 2 (N-2, 320 L), and bioreactor 3 (N-1, 2,160 L). From experimental data (offline measurements), taken once a day, time profiles for viable cell density were divided into 10 seed trains for training and 10 seed trains for testing, choosing randomly one or two cultivations for training and one or two for testing per campaign. Additional datasets have been generated for modeling purposes. For this purpose, 12 cultivations in four flask scales (three cultivations each) having filling volumes of 40, 70, 300, and 1,500 ml were provided. They cover cultivation time spans of 264 hr (11 days) each, meaning that the stationary and death phases were also included. All datasets are labeled and listed in Table 1 to assign them correctly in this work.

| Cultivation conditions and analytics
Cell cultivation was carried out using a CHO cell line for the production of a therapeutic recombinant protein (cell line and product are not further specified for confidentiality reasons). Process conditions, which were the same for all investigated seed train cultivations, are listed in Table 1. Samples were taken once a day. Viable cell concentration and viability were measured using the Vi-CELL cell viability analyzer from Beckman Coulter. Glucose, glutamine, lactate, and ammonia were determined by a Nova Bioprofile 100+ Analyzer.

| Data cleansing/preparation
Data cleansing and preparation was performed by handling missing data. Within the parameter estimation process initial concentration values are required for solving the ordinary differential equation system (the model), but in some cultivation datasets, there are one or two missing initial concentrations.
Concerning datasets of 20 seed trains, with three bioreactor scales and six state variables each, 16% of the initial concentrations are missing in total (viable cell concentration 0%, viability 0%, glucose 0%, glutamine 23%, lactate 72%, and ammonia 0%). If initial concentrations are missing at the beginning of bioreactor scale 1, the relevant quantity is replaced by the mean of initial concentration values of training datasets. This decision is based on the fact that the same cultivation conditions are intended for each cultivation. If initial concentrations of bioreactor scale 2 or 3 are missing, then they are calculated based on the concentrations at the end of the previous scales and the volumes of the previous and the current scale.
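The two imputation rules above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not stated in the text: the complete content of the previous scale is transferred, the vessel is filled up with fresh medium of known composition, and all function names, argument names, and numbers are hypothetical.

```python
def initial_concentration(c_end_prev, v_prev, v_current, c_fresh=0.0):
    """Concentration at the start of a scale after passaging (simple mass balance).

    Assumption: the whole previous scale (volume v_prev) is transferred and
    topped up with fresh medium (concentration c_fresh) to v_current.
    """
    if not 0 < v_prev <= v_current:
        raise ValueError("expected 0 < v_prev <= v_current")
    v_fresh = v_current - v_prev
    return (c_end_prev * v_prev + c_fresh * v_fresh) / v_current


def mean_of_known(values):
    """Mean of the available initial values; None marks a missing value."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known)


# Illustrative numbers only (not taken from the process data).
# Missing lactate at the start of bioreactor scale 2 (40 L -> 320 L):
lactate_start_br2 = initial_concentration(20.0, v_prev=40.0, v_current=320.0)
# Glutamine, which is also supplied with the fresh medium:
glutamine_start_br2 = initial_concentration(1.0, 40.0, 320.0, c_fresh=4.0)
# Missing value at the start of bioreactor scale 1: mean of training data.
lactate_start_br1 = mean_of_known([1.5, None, 2.5, None, 2.0])
print(lactate_start_br2, glutamine_start_br2, lactate_start_br1)
```

If only part of the previous scale is transferred, `v_prev` would have to be replaced by the transferred volume.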
TABLE 1 List of available data, some used for training and others for testing, containing the following abbreviations: Systems (SF, shake flask; BR, bioreactor; ST, seed train), initial filling volumes (Volume), and cultivation labels (Label). Since process data from six campaigns were considered, they were labeled C1 (campaign 1) to C6 (campaign 6)

System | Volume (L) | Cultivation labels | Controlled process parameters | Usage

The specific growth rate includes a term for lag phase description, where t_Lag stands for the duration of the lag phase. a_Lag ∈ [0, 1] describes by which percentage the growth rate is decreased in the beginning of the lag phase and for t > t_Lag. The specific death rate contains constant minimum and maximum death rates as well as dependencies on glucose and glutamine concentration (similar to Frahm, 2014). Substrate uptake rates are expressed similar to substrate uptake rates presented in Frahm (2014) and Kern et al. (2016), describing a high glucose uptake at high glucose concentrations and low glucose uptake at low glucose concentrations, and analogously for glutamine. Metabolic production rates are also expressed similar to substrate uptake rates presented in Frahm (2014).

The ode23 function of MATLAB version 2017b (Matlab, 2017) was used for numerical computation.

| Bayesian parameter estimation and inference
The goal of Bayesian parameter estimation is to compute a maximum a posteriori (MAP) point estimate of each unknown model parameter (e.g., maximum growth rate μ_max) as well as the corresponding probability distribution (posterior distribution), describing how probable it is that certain parameter values are adopted, based on the measured data and prior knowledge. These estimates and distributions can then be used for prediction of new observations (e.g., viable cell density X_v; Gelman et al., 2013, chap. 1). Bayesian parameter estimation and prediction can be divided into the following main tasks, which will be explained afterward:
• Step 1: Quantification of prior knowledge including uncertainties
More details on different types of prior distributions are given in the Supporting Information Material.

| Bayesian parameter estimation/ determination of posterior distributions
In a second step, Bayesian parameter estimation using prior probabilities and experimental data has to be performed, obtaining posterior parameter distributions. The key element is the Bayes theorem, which is a theorem for the computation of conditional probabilities. Since in practice the applied mathematical models are complex and high dimensional, the calculation of the posterior parameter distributions turns out to be a nontrivial task. However, numerical solutions can be computed by application of MCMC methods.

FIGURE 1 Propagation of uncertainty. Uncertainty in model parameters and uncertainty resulting from measurement deviations are considered and a Bayesian approach, having the Bayes theorem as a key element, was applied to propagate these uncertainties, to estimate model parameters, and to include the information of uncertainty in the prediction of the quantities of interest in the form of prediction intervals forming a prediction band

The concept of MCMC simulation is to create a random process whose stationary distribution is the specified target distribution and to run the simulation long enough that the distribution of the current draws is close enough to this stationary distribution (Gelman et al., 2013, chap. 11). Different types of algorithms realizing this principle exist; in this work, the single component Metropolis-Hastings algorithm was applied (Gilks et al., 1998). Its output consists of posterior samples of the model parameters, which represent the posterior distributions of the model parameters to be estimated. The sample size should be chosen large enough so that the Markov chain (MC) standard error is less than 5% (more details on convergence diagnostics are given in the Supporting Information Material).
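To make the idea of a single component Metropolis-Hastings sampler more concrete, the following Python sketch updates one parameter at a time with a symmetric random-walk proposal. It is a simplified illustration, not the implementation used in this work: it has no adaptation of the step sizes, it replaces the kinetic model by plain exponential growth fitted on the log scale, and the data, the gamma priors, the noise level, and all names are hypothetical.

```python
import math
import random


def log_gamma_pdf(x, shape, scale):
    """Log density of a gamma distribution (used here as prior)."""
    if x <= 0:
        return -math.inf
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def make_log_posterior(times, log_xv, sigma, priors):
    """Unnormalized log posterior for theta = (mu, x0), exponential growth."""
    def log_posterior(theta):
        log_prior = sum(log_gamma_pdf(p, *prior) for p, prior in zip(theta, priors))
        if log_prior == -math.inf:
            return log_prior
        mu, x0 = theta
        log_lik = sum(-0.5 * ((y - (math.log(x0) + mu * t)) / sigma) ** 2
                      for t, y in zip(times, log_xv))
        return log_prior + log_lik
    return log_posterior


def single_component_mh(log_post, start, step_sizes, n_iter, rng):
    """Random-walk Metropolis-Hastings, proposing one component at a time."""
    theta = list(start)
    current = log_post(theta)
    chain = []
    for _ in range(n_iter):
        for j in range(len(theta)):
            proposal = list(theta)
            proposal[j] += rng.gauss(0.0, step_sizes[j])
            candidate = log_post(proposal)
            # Symmetric proposal: accept with probability min(1, posterior ratio).
            if math.log(1.0 - rng.random()) < candidate - current:
                theta, current = proposal, candidate
        chain.append(tuple(theta))
    return chain


# Synthetic data (hypothetical): mu = 0.03 1/h, x0 = 0.3, fixed noise offsets.
times = [0.0, 24.0, 48.0, 72.0, 96.0]
offsets = [0.05, -0.04, 0.02, -0.03, 0.01]
log_xv = [math.log(0.3) + 0.03 * t + e for t, e in zip(times, offsets)]
priors = [(4.0, 0.0075), (11.1, 0.027)]  # (shape, scale) for mu and x0
log_post = make_log_posterior(times, log_xv, sigma=0.1, priors=priors)
chain = single_component_mh(log_post, start=(0.03, 0.3), step_sizes=(0.001, 0.02),
                            n_iter=4000, rng=random.Random(1))
kept = chain[1000:]  # discard burn-in
mu_mean = sum(sample[0] for sample in kept) / len(kept)
print(f"posterior mean of mu: {mu_mean:.4f} 1/h")
```

The list `kept` plays the role of the posterior sample described above; an adaptive variant would tune `step_sizes` during the run.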
For example, prediction intervals can be computed using quantiles of the posterior predictive distributions at different points in time, leading to prediction bands over the considered time span (see Figure 1).
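A minimal Python sketch of this quantile-based construction is given below. It assumes that posterior parameter samples are already available (here they are simply drawn from normal distributions as a stand-in), that the model is plain exponential growth, and that measurement noise is multiplicative with a fixed coefficient of variation; all names and numbers are hypothetical.

```python
import math
import random


def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def prediction_band(parameter_samples, times, model, noise_cv, rng,
                    lower=0.025, upper=0.975):
    """Lower quantile, median, and upper quantile of the predictive draws per time."""
    band = []
    for t in times:
        draws = sorted(model(theta, t) * (1.0 + rng.gauss(0.0, noise_cv))
                       for theta in parameter_samples)
        band.append((quantile(draws, lower), quantile(draws, 0.5),
                     quantile(draws, upper)))
    return band


def exponential_growth(theta, t):
    """Stand-in model: X_v(t) = x0 * exp(mu * t)."""
    mu, x0 = theta
    return x0 * math.exp(mu * t)


rng = random.Random(2)
# Stand-in for MCMC output: samples of (mu, x0).
samples = [(rng.gauss(0.03, 0.002), rng.gauss(0.3, 0.02)) for _ in range(2000)]
times = [0.0, 24.0, 48.0, 72.0, 96.0]
band = prediction_band(samples, times, exponential_growth, noise_cv=0.05, rng=rng)
for t, (lo, med, hi) in zip(times, band):
    print(f"t = {t:5.1f} h: {lo:.2f} .. {med:.2f} .. {hi:.2f}")
```

Connecting the lower and upper quantiles over time gives the prediction band; other interval levels follow by changing `lower` and `upper`.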

| Bayesian updating
As a fourth step (if additional data are provided), Bayesian updating can be executed, which is an important characteristic of Bayesian statistics: the ability to learn from new data by adding information to the present knowledge and thus to update the current state of information.
This is realized by repetition of Steps 2 and 3 using the current posterior distributions as new prior distributions and executing MCMC simulation to obtain new posterior parameter distributions.
Put simply, Bayesian updating is performed by "taking the posterior from today as the prior for tomorrow." This is described in literature by terms like Bayesian updating, Bayesian learning, or the sequential nature of Bayes (Luce, Anthony, & Dennis, 2003; O'Hagan, 2008).
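The sequential principle can be illustrated with a deliberately simple conjugate case in Python, in which the update has a closed form and no MCMC is needed. The sketch assumes a normal prior on a single parameter and normally distributed observations with known standard deviation; this is a swapped-in textbook example, not the model of this work, and all numbers are hypothetical.

```python
def update_normal(prior_mean, prior_sd, data, noise_sd):
    """Posterior mean and sd for a normal mean with known noise sd."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(data) / noise_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + sum(data) / noise_sd ** 2) / post_precision
    return post_mean, post_precision ** -0.5


prior = (0.030, 0.010)                         # prior mean and sd (hypothetical)
batch_1 = [0.026, 0.029, 0.027, 0.028]         # first dataset
batch_2 = [0.025, 0.027, 0.026, 0.028, 0.026]  # data arriving later
noise_sd = 0.003

posterior_1 = update_normal(*prior, batch_1, noise_sd)
# The posterior from "today" becomes the prior for "tomorrow".
posterior_2 = update_normal(*posterior_1, batch_2, noise_sd)
# Same result as using all data at once with the original prior.
all_at_once = update_normal(*prior, batch_1 + batch_2, noise_sd)
print(posterior_1, posterior_2, all_at_once)
```

In this conjugate setting the two routes agree exactly; with MCMC the posterior samples have to be summarized as a new prior distribution first, so the agreement is only approximate.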
More detailed information, such as formulas and implementation of the adaptive single component Metropolis-Hastings algorithm, is provided as Supporting Information Material.

| Evaluation
To evaluate the prediction performance regarding accuracy and precision, three criteria are presented in this work. The first criterion quantifies the amount of predictive uncertainty, based on the available information and is expressed in this work by the relative half bandwidth of the prediction interval at a specified point in time.
Supposing that y_pred is the posterior predictive sample, q_0.975 its 97.5% quantile, and q_0.025 its 2.5% quantile, then: with k the number of measurements.
All criteria can be computed for one or more quantities.
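The three criteria can be sketched in Python as follows. The exact formulas are assumptions based on the verbal descriptions in this work: the half width of the prediction interval is related to the predicted value, the within band score is the percentage of measurements inside the band, and the relative error is the mean absolute deviation between prediction and measurement relative to the measurement. Function names and numbers are hypothetical.

```python
def relative_half_bandwidth(lower, upper, predicted):
    """Half width of the prediction interval relative to the prediction, in %."""
    return 100.0 * (upper - lower) / (2.0 * predicted)


def within_band_score(measured, lower, upper):
    """Percentage of measurements that fall inside the prediction band."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(measured, lower, upper))
    return 100.0 * inside / len(measured)


def relative_error(measured, predicted):
    """Mean absolute deviation relative to the measurements, in %."""
    return 100.0 * sum(abs(p - y) / y
                       for y, p in zip(measured, predicted)) / len(measured)


# Four illustrative measurements of one quantity (k = 4).
predicted = [1.0, 2.0, 4.0, 8.0]
lower = [0.8, 1.6, 3.2, 6.4]
upper = [1.2, 2.4, 4.8, 9.6]
measured = [1.1, 2.1, 5.0, 8.0]

rhb = relative_half_bandwidth(lower[-1], upper[-1], predicted[-1])
score = within_band_score(measured, lower, upper)
error = relative_error(measured, predicted)
print(f"rel. half bandwidth {rhb:.1f}%, within band {score:.0f}%, rel. error {error:.1f}%")
```

With four measurements, a single value outside the band (here the third one) already reduces the within band score to 75%.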
Furthermore, the coefficient of variation is used for the quantification of uncertainty. The coefficient of variation of a sample y is calculated by dividing its standard deviation by its mean, cv = s_y / ȳ.

| RESULTS AND DISCUSSION
This section shows how an industrial cell culture seed train that is described by a mechanistic model is predicted (simulated) using Bayesian parameter estimation. The prediction is complemented by corresponding prediction intervals describing the expected deviation from the predicted values based on the available information.
Furthermore, the performance of the predictions is analyzed concerning prediction precision and accuracy. Moreover, it is investigated how taking additional data into account improves the prediction.

First, it is demonstrated in Section 3. a_Lag and the duration of the lag phase t_Lag were kept as fixed parameter values, while μ_max, K_S,Glc, and K_S,Gln were set as "free" parameters.
Concerning the death rate, the minimum death rate μ_d,min was kept fixed, while the maximum death rate μ_d,max was kept "free". Moreover, the cell lysis constant K_Lys was kept fixed, based on Kern et al. (2016).
Concerning production and uptake of lactate and ammonia, the parameters q_Lac,uptake and q_Amm,uptake were kept fixed because they describe lactate and ammonia uptake at the end of the death phase, which is not relevant during the process. Because they would increase the complexity of parameter estimation, we decided to estimate them once from training data and keep them fixed later on, whereas k_Amm was set "free" as well as kinetic production constants
In a next step, these quantities were used to adopt a gamma distribution for each free model parameter following the methods explained in Section 2.5, meaning that the distributions were calculated according to Equation (2 Table 3A.
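One common way to adopt a gamma distribution from a prior mean and a coefficient of variation is moment matching, sketched below in Python. Whether this corresponds exactly to the equation referred to above cannot be verified from the text, so it should be read as an assumption; the example values are hypothetical and are not taken from Table 3A.

```python
import random
import statistics


def gamma_from_mean_cv(mean, cv):
    """Shape and scale of a gamma distribution with the given mean and cv.

    For a gamma distribution, mean = shape * scale and cv = 1 / sqrt(shape).
    """
    shape = 1.0 / cv ** 2
    scale = mean * cv ** 2
    return shape, scale


# Hypothetical prior: maximum growth rate of 0.03 1/h with 30% cv.
shape, scale = gamma_from_mean_cv(mean=0.03, cv=0.3)

# Check by sampling: random.gammavariate takes (shape, scale).
rng = random.Random(3)
draws = [rng.gammavariate(shape, scale) for _ in range(20000)]
sample_mean = statistics.fmean(draws)
sample_cv = statistics.stdev(draws) / sample_mean
print(f"shape {shape:.2f}, scale {scale:.5f}, "
      f"sample mean {sample_mean:.4f}, sample cv {sample_cv:.3f}")
```

A gamma prior has positive support only, which suits kinetic parameters such as growth rates and saturation constants.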

| Prior knowledge of starting concentrations
The starting concentration values (of the modeled time courses, e.g., initial viable cell density X_v,0, glucose concentration c_Glc,0, …) are also set as random variables because it is assumed that the measurement errors have a significant impact on prediction performance. The prior distributions of the measurement error were derived from trend chart data of the Vi-CELL in case of viable cell density (sample size N = 236, gathered from two instruments) and of Nova BioProfile 100+ analyzers for glucose, glutamine, lactate, and ammonia concentrations (sample size N = 1,065, gathered from three instruments) using resampling techniques (bootstrapping).
The corresponding means and standard deviations of coefficients of variation are listed in Table 3B.
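A bootstrap estimate of the mean and standard deviation of a coefficient of variation could look like the following Python sketch. The trend chart data are replaced by synthetic values (236 simulated readings of a reference standard), and the number of resamples and all names are hypothetical choices.

```python
import random
import statistics


def coefficient_of_variation(sample):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(sample) / statistics.fmean(sample)


def bootstrap_cv(sample, n_resamples, rng):
    """Mean and standard deviation of the cv over bootstrap resamples."""
    cvs = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(sample, k=len(sample))
        cvs.append(coefficient_of_variation(resample))
    return statistics.fmean(cvs), statistics.stdev(cvs)


rng = random.Random(7)
# Synthetic stand-in for trend chart data: true cv of 5%.
trend_chart = [rng.gauss(1.0, 0.05) for _ in range(236)]
cv_mean, cv_sd = bootstrap_cv(trend_chart, n_resamples=1000, rng=rng)
print(f"cv = {100 * cv_mean:.1f}% +/- {100 * cv_sd:.1f}%")
```

The resulting pair (mean and standard deviation of the cv) is the kind of summary from which a prior distribution for the measurement error can then be set up.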
In case of viable cell density X_v, the coefficient of variation (cv) from Table 3B

3.2 | Bayesian parameter estimation/determination of posterior distributions and Bayesian updating
The starting parameter values were also sampled randomly from these prior distributions. Table 3A.
The corresponding distributions are illustrated in Figure
It can be seen that the steps from prior 1 to posterior 1 and from prior 2 (= posterior 1) to posterior 2 lead to narrower distributions (→ less uncertainty, more precision) in case of the maximum growth rate μ_max. In case of the other parameters
In each MCMC run, convergence diagnostics were applied using visual methods such as history plots and autocorrelation plots.
Furthermore, the MC error was controlled and an MC error less than 5% was satisfied in each run, which indicates convergence. This procedure was applied during every parameter estimation (update step) via MCMC.
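The following Python sketch shows one way such a check could be carried out. It assumes that the 5% criterion relates the MC error to the posterior standard deviation (a common rule of thumb, not confirmed by the text), estimates the MC error by batch means, and uses an autocorrelated AR(1) series as a stand-in for real MCMC output; all names and settings are hypothetical.

```python
import math
import random
import statistics


def batch_means_mc_error(chain, n_batches=20):
    """MC standard error of the chain mean, estimated from batch means."""
    size = len(chain) // n_batches
    means = [statistics.fmean(chain[i * size:(i + 1) * size])
             for i in range(n_batches)]
    return statistics.stdev(means) / math.sqrt(n_batches)


def autocorrelation(chain, lag):
    """Sample autocorrelation of the chain at the given lag."""
    mean = statistics.fmean(chain)
    numerator = sum((chain[i] - mean) * (chain[i + lag] - mean)
                    for i in range(len(chain) - lag))
    denominator = sum((x - mean) ** 2 for x in chain)
    return numerator / denominator


# Stand-in for one parameter's MCMC chain: AR(1) series with correlation 0.9.
rng = random.Random(4)
chain, value = [], 0.0
for _ in range(50000):
    value = 0.9 * value + rng.gauss(0.0, 1.0)
    chain.append(value)

mc_error = batch_means_mc_error(chain)
relative_mc_error = mc_error / statistics.stdev(chain)
lag1 = autocorrelation(chain, 1)
print(f"MC error / posterior sd = {100 * relative_mc_error:.1f}%, "
      f"lag-1 autocorrelation = {lag1:.2f}")
```

A high autocorrelation means that more iterations are needed to reach the same MC error, which is why both diagnostics are usually inspected together.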

| Prediction of a single seed train bioreactor scale based on available information
Following the above presented procedure, a comprehensive study concerning prediction accuracy and prediction precision based on available information from cultivation training data is presented.
For the prediction of one cultivation scale at a time (reactor scale 1 (N-3) = 40 L, 2 (N-2) = 320 L, or 3 (N-1) = 2,160 L), the initial concentrations of a scale are known and the following two sources of

TABLE 3 A) Prior knowledge of model parameters expressed by prior means and coefficients of variation (cv) in % as well as posterior knowledge expressed by posterior means (based on eight specific experiments performed for modeling; after parameter estimation) and posterior coefficients of variation (cv) in %.

presented. The labels of the cultivation data used for training and testing correspond to the labels listed in Table 1.
It should be mentioned that evaluation scores, half bandwidth, within band score and rel. error, were first calculated comparing one test dataset at a time (e.g., concerning viable cell density often only four measurements were compared with the predictions at the same points in time, meaning if one measurement falls outside the prediction band, the within band score is reduced to 75%). Afterwards, the average over 10 cultivations was calculated.
3.3.1 | Example: Prediction based on shake flask data versus prediction based on shake flask data and one bioreactor scale
As an example, Figure 3 shows the predicted temporal courses

FIGURE 3 Predicted time courses of six state variables, viable and total cell concentration X_v and X_t, concentration of glucose c_Glc, glutamine c_Gln, lactate c_Lac, and ammonia c_Amm, as well as performance measures of prediction, for the smallest seed train bioreactor scale (40 L filling volume). As measures of accuracy, the within band score (percentage of test data falling within the prediction band) and the rel. error (the relative deviation between predicted values and subsequently added test data) are presented. The amount of uncertainty (only presented by numbers on the right) is expressed by the relative half bandwidth, that is, the half width of the prediction interval of viable cell density at the last time point of each scale, describing how much deviation from the predicted value is expected. The prediction was performed given data from shake flask scales (six diagrams above) and from data of another cultivation from the same campaign in the same bioreactor scale (six diagrams below)

scale data (SF1.1-SF4.3). This means that the posterior 1 distributions of the model parameters (compare with Figure 2) were used for prediction. The six diagrams in Figure 3 (below) show the prediction based on the updated information, based on shake flask data and another cultivation from the same campaign in the same bioreactor scale (here R1.3, 40 L). This means that the posterior 2 distributions (compare with Figure 2) were used for prediction. In both cases the starting concentration values were varied according to the coefficients of variation presented in Table 3.
Comparing the six diagrams above and below in Figure 3 shows that, especially for viable cell density X_v as well as for total cell density X_t, the amount of uncertainty, represented by the width of the prediction band, is reduced significantly, meaning that the precision is increased.

| Prediction performance of a single bioreactor scale
The impact of available information on prediction performance of three single bioreactor scales (N-3 : 40 L, N-2 : 320 L, and N-1 : 2,160 L) was investigated for 10 seed trains comprising these bioreactor scales. The labels used in this section are listed in Table 1.
Prediction of a single bioreactor based on information from shake flask data, which was expressed by the corresponding model parameter distributions (compare with posterior 1 in Figure 2), led to the results presented in the first column (SF) of Table 4. To investigate the impact of information from bioreactor scales of other seed trains, 10 seed trains ST1, …, ST10 were used as training data and seed trains ST11, …, ST20 as test data, whereby only one out of 10 seed train bioreactor scales was considered at a time
Table 4.
It should be noted that no scale up parameters were considered within the underlying model, although differences in cell growth at different bioreactor scales were not excluded. In addition, process variability due to biological variability ("batch-to-batch variability") was expected. But such differences or variabilities would be expressed by corresponding changes in model parameter distributions (e.g., by the increase or decrease of the average maximum growth rate), as soon as respective data would be included for updating parameter distributions. These aspects will be discussed later on based on the presented findings.
Several aspects concerning propagation of uncertainty as well as prediction accuracy become apparent from the results presented in Table 4. Prediction of single bioreactor scales, only based on shake flask scale data was possible showing relative errors not exceeding 15% (for X v ) and 10% (in total), concerning predictions based on the Bayes estimator (MAP estimator, see Section 2.5). At least 88% (in case of X v ) and 91% (in total) of the test data are falling within the 90% prediction band. Nevertheless, predictions include between 34% and 44% of uncertainty (represented by the relative half bandwidth).
The inclusion of information from one bioreactor scale of another seed train bioreactor of the same campaign led to a reduction of predictive uncertainty (=increased precision) to 22-29% relative half bandwidth.
Rel. error for X_v (in total; %)

| Predicted scale | SF | BR 1 | BR 2 | BR 3 | Previous scale |
|---|---|---|---|---|---|
| BR 1 | 15 (10) | 7 (8) | 14 (10) | 13 (10) | - |
| BR 2 | 7 (9) | 5 (8) | 7 (8) | 6 (8) | 5 (8) |
| BR 3 | 8 (9) | 11 (10) | 9 (9) | 8 (9) | 9 (8) |

Note: As a measure for precision the relative half bandwidth for viable cell density before transfer was computed (low percentage, less uncertainty and therefore high precision) and as measures of accuracy the within band score and the rel. error were computed and presented in %, both for X_v and in total, meaning averaged over all six variables (viable and total cell concentration X_v and X_t, concentration of glucose c_Glc, glutamine c_Gln, lactate c_Lac, and ammonia c_Amm). The following scales were used for training: Shake flasks (SF), shake flasks and another bioreactor scale (BR 1, BR 2 or BR 3), the previous bioreactor scale of the same cultivation (Previous scale).

In terms of prediction accuracy, what stands out most is that, when predicting a bioreactor scale 1, a significantly higher accuracy was reached if another bioreactor scale 1 dataset was used for training (see row 7 [BR 1] in Table 4). This indicates that there are sometimes small effects when cells are passaged from shaken conditions to stirred conditions (here this happens between shake flask scales and bioreactor scale 1). Prediction of a bioreactor scale 2 instead shows a high accuracy (rel. error: 5-8%, within band score 88-95%) independently of which bioreactor scale was used for training or even if information from shake flask scales was used. Prediction of a bioreactor scale 3 turned out to perform best if another bioreactor scale 3 or even shake flask scales were used for training, but good results were also reached if another bioreactor scale 2 was considered. A brief posterior analysis (after analyzing prediction performance) revealed a lower cell growth on average in reactor scale 1 compared with reactor scale 3, which was expressed by corresponding probability distributions.
This predictive performance has been further improved by using the information from the previous scale of the running cultivation to update the posterior distributions of model parameters one more time (see Table 4, last column). When predicting bioreactor scale 2, 100% (in case of X_v) and 90% (in total) of the test data fall within the 90% prediction band, which was reduced to 21% relative half bandwidth, and the relative error is 5% (for X_v) and 8% in total.
When predicting bioreactor scale 3, 88% (in case of X_v) and 93% (in total) of the test data fall within the 90% prediction band, which was reduced to 23% relative half bandwidth, and the relative error is 9% (for X_v) and 8% in total. These results reveal that batch-to-batch variability can be considered by adaptation of model parameter distributions through Bayesian updating. It has to be mentioned that for each Bayesian update, only 4-5 measurements per quantity of a training dataset were used as additional information. It is expected that by adding more process data describing similar cell growth, the amount of predictive uncertainty decreases further. On the other hand, less measurement uncertainty would also lead to less uncertainty in the model's outcome, because input uncertainty is propagated to uncertainty in the outcomes.

| Prediction of seed trains
In the previous sections it has been shown how single bioreactors could be predicted and how these predictions could be updated integrating information from additional data via Bayesian updating. Now, the complete bioreactor part of the seed train, comprising three consecutive bioreactor scales (40, 320, and 2,160 L) before the production bioreactor, is predicted. It should be noted, that in addition to the already considered sources of uncertainty (in model parameters and initial concentrations) uncertainty in the passaging process, which can be caused by different reasons like unknown volume when flushing the sampling valve or deviation of actual substrate concentration in the medium from the intended value (e.g., in case of glutamine in media), must be considered for the prediction of more than one seed train scale. This uncertainty was estimated evaluating the passaging processes of four seed trains (used as training data). In a first step, an exemplary seed train prediction, only based on small shake flask scale data will be illustrated. Afterwards, prediction performance is evaluated, taking further data from bioreactor scales into account.
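How uncertainty accumulates over consecutive scales can be illustrated with a small Monte Carlo simulation in Python. The sketch is a strong simplification under hypothetical assumptions: unlimited exponential growth instead of the kinetic model, a fixed cultivation time per scale, complete transfer of the previous scale, normally distributed relative errors for the initial cell density and for each passaging step, and invented numbers throughout.

```python
import math
import random
import statistics


def simulate_seed_train(mu, x0, volumes, duration_h, passage_cv, rng):
    """Viable cell density at the end of each scale for one Monte Carlo draw."""
    x = x0
    finals = []
    for i, volume in enumerate(volumes):
        if i > 0:
            # Passaging: dilution into the next scale with a random relative error.
            dilution = volumes[i - 1] / volume
            x *= dilution * (1.0 + rng.gauss(0.0, passage_cv))
        x *= math.exp(mu * duration_h)
        finals.append(x)
    return finals


def relative_half_width(draws):
    """Half width of the 2.5%-97.5% interval relative to the median, in %."""
    cuts = statistics.quantiles(draws, n=40)  # cuts[0] = 2.5%, cuts[-1] = 97.5%
    return 100.0 * (cuts[-1] - cuts[0]) / (2.0 * statistics.median(draws))


rng = random.Random(5)
volumes = [40.0, 320.0, 2160.0]  # L, the three bioreactor scales
runs = [simulate_seed_train(mu=rng.gauss(0.03, 0.002),
                            x0=0.3 * (1.0 + rng.gauss(0.0, 0.05)),
                            volumes=volumes, duration_h=72.0,
                            passage_cv=0.05, rng=rng)
        for _ in range(5000)]
widths = [relative_half_width([run[i] for run in runs]) for i in range(3)]
for volume, width in zip(volumes, widths):
    print(f"{volume:6.0f} L: relative half width {width:.0f}%")
```

In this toy setting the relative half width grows from scale to scale, which is the reason why updating with data from the running cultivation is attractive for whole seed train predictions.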

| Seed train prediction based on shake flask data-Example and performance
A seed train prediction, only based on small scale shake flask data and considering the above-mentioned sources of uncertainty is illustrated as an example in Figure 4

The optimized prediction of an exemplary seed train is also illustrated in Figure 4. Time profiles for all six state variables, viable and total cell concentration, X v and X t , concentration of glucose c Glc , glutamine c Gln , lactate c Lac , and ammonia c Amm , were predicted at the beginning of the seed train (top right), after collecting data from scale 1 (bottom left), and after collecting data from scale 2 (bottom right).
Here again, predictive time profiles are shown by solid lines, and 90% prediction bands are illustrated by dashed lines. The corresponding precision and accuracy values are shown in Table 5, row 3 (ST13).
It can be seen that the prediction uncertainty for the remaining "future" time span is reduced (from 42% to 24% half bandwidth, see Table 5, row 3) after each update step indicated by narrow prediction bands (in the Figure, the "past" is shown in light gray, the "future" in the dark gray). Also notable is the fact that the high accuracy (within band score 96% in total [i.e., concerning all six variables] and 92% for X v [i.e., concerning only viable cell density]) and rel. error of 10% in total and 7% for X v , achieved for the prediction of bioreactor scale 1, 2, and 3 has been further improved by updating after cultivation of each scale.
After two updating steps the within band score yielded 100% for X v and 96% in total and the rel. error yielded 1% for X v and 5% in total.

Following the same procedure, 10 seed trains (from six different campaigns) were predicted, each based on one seed train used as training data (e.g., ST11 was predicted based on information from ST1 and so on), and updated as soon as data from one scale of the current cultivation were available. The corresponding results concerning prediction performance are presented in Table 5.

FIGURE 4 Prediction of an exemplary seed train, that is, three consecutive bioreactor scales of 40, 320, and 2,160 L for the six state variables, viable and total cell concentration X v and X t , concentration of glucose c Glc , glutamine c Gln , lactate c Lac , and ammonia c Amm , for four scenarios: Only based on initial concentrations at reactor scale 1 (40 L) and posterior parameter distributions from shake flask scales (top left); based on initial concentrations at reactor scale 1 and posterior parameter distributions from another seed train (top right); based on initial concentrations at reactor scale 2 or 3 and including parameter distributions from the previous reactor scale (bottom left, bottom right)
Application of Bayesian updating led to a significant narrowing of the prediction bands, meaning a reduction of uncertainty, while reaching or maintaining a high prediction accuracy. The relative half bandwidth was reduced from 41% to 31% on average after a Bayesian update step using process data of the previous scale 1 for prediction of scale 2 (see last row "Mean" of Table 5). It was further reduced to 21% on average after a Bayesian update step using process data of the previous scale 2. This improvement of precision was achieved without a loss of accuracy: at least 90% (for X v ) and 87% (in total) of the test data fell within the prediction band, while the rel. error even decreased from 13% to 8% (for X v ) and from 15% to 9% (in total) on average over 10 seed trains.
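The narrowing effect of a Bayesian update step can be reproduced with a toy example. The following sketch is deliberately much simpler than the six-state kinetic model used in this work: it assumes pure exponential growth with a single unknown growth rate, log-normal measurement error with a known standard deviation, and a random-walk Metropolis sampler. The posterior of one scale is summarized by its mean and standard deviation and reused as a normal prior for the next scale, which is itself a simplifying assumption. All names and numbers are hypothetical.

```python
import math
import random
import statistics

def log_posterior(mu, prior_mean, prior_sd, times, xv_obs, x0, sigma=0.1):
    """Unnormalized log posterior: normal prior on mu, log-normal data error."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_lik = 0.0
    for t, y in zip(times, xv_obs):
        resid = math.log(y) - (math.log(x0) + mu * t)  # model: X(t) = x0 * exp(mu * t)
        log_lik += -0.5 * (resid / sigma) ** 2
    return log_prior + log_lik

def metropolis(prior_mean, prior_sd, times, xv_obs, x0, n_iter=20000, step=0.02, seed=1):
    """Random-walk Metropolis sampler for the growth rate mu."""
    rng = random.Random(seed)
    mu = prior_mean
    lp = log_posterior(mu, prior_mean, prior_sd, times, xv_obs, x0)
    chain = []
    for _ in range(n_iter):
        cand = mu + rng.gauss(0.0, step)
        lp_cand = log_posterior(cand, prior_mean, prior_sd, times, xv_obs, x0)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            mu, lp = cand, lp_cand
        chain.append(mu)
    return chain[n_iter // 4:]  # discard the first quarter as burn-in

def update_over_scales(prior_mean, prior_sd, datasets, x0):
    """Sequential updating: the posterior of one scale is the prior of the next."""
    summary = [(prior_mean, prior_sd)]
    for k, (times, xv_obs) in enumerate(datasets):
        chain = metropolis(prior_mean, prior_sd, times, xv_obs, x0, seed=k + 1)
        prior_mean, prior_sd = statistics.mean(chain), statistics.stdev(chain)
        summary.append((prior_mean, prior_sd))
    return summary
```

With synthetic data from two consecutive scales, the standard deviation of the growth rate returned by `update_over_scales` shrinks after each step, which mirrors qualitatively the reduction of the half bandwidth reported above.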
Nevertheless, the results show that not all seed trains could be predicted well based only on the information (updated parameter distribution) from another randomly sampled seed train. This is the case for seed train 17 (ST17), which was predicted based on information from ST7 (see row 4, ST17, first column of within band score and first column of relative error in Table 5

| CONCLUSION
In this contribution, the application of a Bayesian approach for parameter estimation and prediction of an industrial cell culture seed train, enabling the integration of prior knowledge and the consideration of uncertainty, is presented. Subject of investigation is the bioreactor part of an industrial CHO cell culture seed train comprising three consecutive bioreactor scales (40, 320, and 2,160 L filling volume) under consideration of shake flask experiments under equal cultivation conditions.

Note (Table 5): Within band score and rel. error only for X v and in total (viable and total cell concentration X v , X t , concentration of glucose c Glc , glutamine c Gln , lactate c Lac , and ammonia c Amm ); relative half bandwidth of prediction interval at last point in time for X v .
I) Predictions based on initial concentrations at R1 and on posterior parameter distributions from another R1 cultivation of the same campaign.
II) Prediction after running R1 and update using initial concentrations of R2 and posterior parameter distributions of R1 for prediction of R2.
III) Prediction after running R2 and update using initial concentrations of R3 and posterior parameter distributions from previous scales.
It has been shown that Bayesian parameter estimation, performed using MCMC simulations, in combination with a mechanistic model describing the time profiles of viable and total cell density as well as concentrations of glucose, glutamine, lactate, and ammonia, is a suitable statistical method for seed train prediction. It allows the information content provided by prior knowledge and experimental data (including input uncertainty) to be propagated to prediction uncertainty, expressed by prediction intervals. This way, process-relevant decisions can be made based on probabilities of certain events. It should be noted that the same mechanistic model was applied for all scales, from shake flask scales to large bioreactor scales (up to 2,160 L filling volume). It became apparent that, despite batch-to-batch variability (e.g., due to biological variability), a high predictive accuracy can be reached by taking data of the running seed train cultivation into account through Bayesian updating.
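As an illustration of a decision based on event probabilities, the sketch below estimates the probability that the viable cell density reaches a target value from predictive samples and returns the first time point at which this probability is high enough. The decision rule, the 90% probability requirement, and the function names are assumptions made for this example and are not taken from the presented study.

```python
def exceedance_probability(samples, threshold):
    """Share of predictive samples at or above the threshold."""
    return sum(1 for x in samples if x >= threshold) / len(samples)

def earliest_passaging_time(times, samples_per_time, target_xv, min_prob=0.9):
    """First time point where P(X_v >= target_xv) reaches min_prob, else None."""
    for t, samples in zip(times, samples_per_time):
        if exceedance_probability(samples, target_xv) >= min_prob:
            return t
    return None
```

Here `samples_per_time` would hold, for each time point, the viable cell densities simulated from the posterior parameter samples. A stricter `min_prob` delays the suggested passaging time, which makes the trade-off between risk and duration explicit.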
This approach provides various practical advantages for applications within the field of bioprocessing. One potential advantage is the ability to design robust and optimal seed train protocols, saving experimental work by using prior knowledge (which is currently subject of investigation