Team:Heidelberg/Model/RNA Combinatorics


Curve fitting





Growth curve model

Background

Bacterial growth

The basic model of bacterial growth is a four-phase model, often shown as a growth curve. It distinguishes a lag phase, a log (exponential) phase, a stationary phase, and a death phase.
The lag phase is the period in which bacteria adapt to the growth conditions without dividing. During the log phase, bacteria divide at a constant doubling rate, from which the growth rate characteristic of the strain can be calculated. The stationary phase is reached when growth-limiting conditions occur, such as a lack of nutrients or space. If nutrients remain scarce or other harmful conditions arise, the bacteria may enter the death phase [1]. This may sound trivial at first, but it is the basis for working with bacteria. The findings of this model are widely used, for example when preparing competent cells or exploiting natural transformation efficiency in the exponential phase.

For our project, the growth behavior of our bacterial strains Escherichia coli DH5α, Escherichia coli BL21, Bacillus subtilis DSM 10, Bacillus subtilis 168 and Acinetobacter baylyi ADP1 plays an essential role in several of our experiments, such as the co-culture and selective advantage experiments. As it is hard to imitate the gut microbial environment under laboratory conditions, we aim to model these complex relations as realistically as possible. Therefore, a good model of bacterial growth behavior is required.

Curveball and its models

The growth curves were calculated using the curveball package (https://curveball.yoavram.com). To make the process easier, we decided to use preprocessed plate reader data. The fit_model() function in curveball.models needs a data frame with the time and optical density (OD).

When no specific model is selected, the fit_model() function analyses the data using every available model. The models implemented in curveball (https://curveball.yoavram.com/models) are:

The logistic model

The logistic model is the most basic model. The ordinary differential equation (ODE) that describes the model is:
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$
$$N(t) = \frac{K}{1 - \left(1 - \frac{K}{N_0}\right)e^{-rt}}$$
N: population size
N0: initial population size
r: initial per capita growth rate
K: maximum population size
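
As a minimal sketch, the closed-form logistic solution can be evaluated directly. The function name `logistic` and the parameter values are illustrative assumptions, not fitted values from our data.

```python
import math


def logistic(t, n0, r, k):
    """Closed-form logistic growth N(t) = K / (1 - (1 - K/N0) * exp(-r*t))."""
    return k / (1.0 - (1.0 - k / n0) * math.exp(-r * t))


# Hypothetical parameters (not fitted values): initial size 0.05,
# growth rate 0.8 per hour, maximum population size 1.2.
n0, r, k = 0.05, 0.8, 1.2
for hours in (0, 2, 4, 8, 16):
    print(f"t = {hours:2d} h   N = {logistic(hours, n0, r, k):.3f}")
```

At t = 0 the expression returns N0, and for large t it approaches K, which matches the roles of the two parameters listed above.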

The Richards model

The Richards model extends the logistic model with a curvature parameter ν. The formula is:
$$\frac{dN}{dt} = rN\left(1 - \left(\frac{N}{K}\right)^{\nu}\right)$$
$$N(t) = \frac{K}{\left[1 - \left(1 - \left(\frac{K}{N_0}\right)^{\nu}\right)e^{-r\nu t}\right]^{1/\nu}}$$
N0: initial population size
r: initial per capita growth rate
K: maximum population size
ν: curvature of the logistic term

The Richards model therefore contains the logistic model as a special case: for ν = 1, the two are identical.
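
The relation between the two models can be checked numerically. The sketch below implements both closed-form solutions (function names and parameter values are assumptions chosen for illustration) and compares them on a time grid.

```python
import math


def logistic(t, n0, r, k):
    """Closed-form logistic growth."""
    return k / (1.0 - (1.0 - k / n0) * math.exp(-r * t))


def richards(t, n0, r, k, nu):
    """Closed-form Richards growth; nu is the curvature parameter."""
    base = 1.0 - (1.0 - (k / n0) ** nu) * math.exp(-r * nu * t)
    return k / base ** (1.0 / nu)


# Hypothetical parameters, for illustration only.
n0, r, k = 0.05, 0.8, 1.2
times = [0.5 * i for i in range(41)]  # 0 to 20 h in 0.5 h steps

# With nu = 1 the Richards curve should coincide with the logistic curve.
max_gap = max(abs(richards(t, n0, r, k, 1.0) - logistic(t, n0, r, k)) for t in times)
print(f"largest difference at nu = 1: {max_gap:.2e}")

# Other values of nu change how quickly the curve approaches K.
for nu in (0.5, 1.0, 2.0):
    print(f"nu = {nu}: N(4 h) = {richards(4.0, n0, r, k, nu):.3f}")
```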

The Baranyi-Roberts model

This model in turn extends the Richards model by taking a lag phase into account. A lag phase results from the population adapting to the new medium it is placed into; during this phase, growth is slower than usual. The formula is:
$$\frac{dN}{dt} = r\,\alpha(t)\,N\left(1 - \left(\frac{N}{K}\right)^{\nu}\right)$$
$$N(t) = \frac{K}{\left[1 - \left(1 - \left(\frac{K}{N_0}\right)^{\nu}\right)e^{-r\nu A(t)}\right]^{1/\nu}}$$
$$A(t) = \int_0^t \alpha(s)\,ds = \int_0^t \frac{q_0}{q_0 + e^{-vs}}\,ds = t + \frac{1}{v}\log\left(\frac{e^{-vt} + q_0}{1 + q_0}\right)$$
Here, α(t) is the adjustment function with which the Baranyi-Roberts model extends the Richards model:
$$\alpha(t) = \frac{q_0}{q_0 + e^{-vt}}$$
N0: initial population size
r: initial per capita growth rate
K: maximum population size
ν: curvature of the logistic term
q0: initial adjustment to current environment
v: adjustment rate

Because the logistic model is a special case of the Baranyi-Roberts model, curveball also implements logistic models with a lag phase.
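
The effect of the lag term can be sketched by evaluating the Baranyi-Roberts solution at the adjusted time A(t). In the code, `nu` is the curvature parameter ν and `v` the adjustment rate. The function names and all parameter values are assumptions for illustration, not curveball's own implementation.

```python
import math


def adjustment_integral(t, q0, v):
    """A(t) = t + (1/v) * log((exp(-v*t) + q0) / (1 + q0))."""
    return t + math.log((math.exp(-v * t) + q0) / (1.0 + q0)) / v


def baranyi_roberts(t, n0, r, k, nu, q0, v):
    """Richards-type solution evaluated at the lag-adjusted time A(t)."""
    a = adjustment_integral(t, q0, v)
    base = 1.0 - (1.0 - (k / n0) ** nu) * math.exp(-r * nu * a)
    return k / base ** (1.0 / nu)


# Hypothetical parameters; nu = 1 gives a logistic curve with a lag phase.
n0, r, k, nu, v = 0.05, 0.8, 1.2, 1.0, 0.8
for q0 in (0.01, 1.0, 1e6):
    a4 = adjustment_integral(4.0, q0, v)
    n4 = baranyi_roberts(4.0, n0, r, k, nu, q0, v)
    print(f"q0 = {q0:g}: A(4 h) = {a4:.3f} h, N(4 h) = {n4:.3f}")
```

A small q0 means the population starts poorly adjusted, so A(t) lags well behind t and growth is delayed. For very large q0, A(t) approaches t and the lag disappears.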

The models are evaluated using the Bayesian information criterion (BIC). The model with the lowest BIC value should be chosen for further analysis. Both a larger fitting error and a larger number of parameters increase the BIC value. In this way, the BIC balances accuracy against the number of parameters to prevent overfitting.
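
To illustrate this trade-off, the sketch below uses a common least-squares form of the criterion, BIC = n ln(RSS/n) + k ln(n), where n is the number of data points, RSS the residual sum of squares and k the number of fitted parameters. We assume this form for illustration; the exact expression used by the fitting backend may differ. The residuals are made up.

```python
import math


def bic_least_squares(residuals, n_params):
    """BIC = n * ln(RSS / n) + k * ln(n) for a least-squares fit."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)


# Hypothetical residuals: the six-parameter model fits slightly better,
# but pays a larger penalty for its extra parameters.
simple_fit = bic_least_squares([0.1] * 50, n_params=3)
complex_fit = bic_least_squares([0.095] * 50, n_params=6)
print(f"3 parameters: BIC = {simple_fit:.2f}")
print(f"6 parameters: BIC = {complex_fit:.2f}")
print("preferred:", "3 parameters" if simple_fit < complex_fit else "6 parameters")
```

In this example the small reduction in error does not outweigh the penalty for three additional parameters, so the simpler model is preferred.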






Our function





Results

To model growth curves, one needs experimentally obtained data. In our case, we fitted models to growth curves from our co-culture experiments, which comprise four bacterial strains grown in co-cultures of two strains each. We selected four experiments with slightly different growth behaviors (Fig. 1). First, we chose growth curves of E. coli RFP measured at 635 nm in a co-culture with Bacillus subtilis GFP. The growth curves from this experiment are sigmoidal (Fig. 1 A). Second, we used data from A. baylyi mCherry in a co-culture with E. coli GFP (Fig. 1 B). All curves in figure 1 B increase steadily, even though they do not reach the uniformity of figure 1 A. As examples of less uniform data, we selected Bacillus subtilis GFP against E. coli RFP (Fig. 1 C) and against A. baylyi mCherry (Fig. 1 D). The growth curves in these last two experiments do not increase steadily, but rather fluctuate.

Figure 1: Growth curves used for testing the modeling. These growth curves from bacterial mono- and co-cultures represent the wide variety of growth curves that can be observed in biological experiments. (A) E. coli RFP and Bacillus subtilis GFP co-culture, measured at 635 nm with gain set to 60. The co-cultures and the E. coli monocultures grow in a sigmoidal curve. (B) A. baylyi mCherry and E. coli GFP co-culture, measured at 635 nm with gain set to 60. These growth curves show the exponential and stationary phases of bacterial growth, but with more variation than in (A). (C) Bacillus subtilis GFP and E. coli RFP co-culture, measured at 535 nm with gain set to 70. (D) Bacillus subtilis GFP and A. baylyi mCherry co-culture, measured at 535 nm with gain set to 70. In (C) and (D), all growth curves first decrease before the typical bacterial growth is observed.

The growth curves from these experiments are used for the modeling. Our Python function growth_model() takes two-dimensional data and fits it with the models of the Python package curveball. Curveball implements six models, which can be compared by their BIC. The lower the BIC, the better the model fits the data.
To start our analysis, we used the growth curve of the Bacillus subtilis GFP monoculture shown in figure 1 A. We applied all six curveball models to determine the best fit (Fig. 2). The best model, with a BIC of 838.384, is the BaranyiRoberts model (Fig. 2 A), and the worst is the Logistic model (BIC: 945.906) (Fig. 2 F). With increasing BIC, the overlap between experimental data and model decreased.

Figure 2: Modeling of E. coli monoculture growth curve with different models. Our function contains six different models. All models were applied to Bacillus subtilis GFP monoculture data, and a difference in quality was observed. The BIC value describes the quality of a model; a lower value indicates a higher quality. (A) BaranyiRoberts (B) RichardsLag1 (C) Richards (D) LogisticLag2 (E) LogisticLag1 (F) Logistic.

In general, it only makes sense to model data that can actually be observed; meaningless data resulting from unmeasurable conditions should not be modelled. In our data, the biologically meaningless measurements include the growth of GFP-expressing bacteria recorded only at 635 nm, although GFP emission is only detectable at 535 nm. However, to see what happens in such a case, we modelled the growth of E. coli with GFP from figure 1 B. The resulting model is shown in figure 3. For E. coli with GFP, the Logistic model was selected as the best, but even this "best" model does not describe the observed behavior of this bacterium. This is most likely due to the lack of emission at 635 nm, and we therefore advise pre-sorting the data before modelling.

Figure 3: Example of what a modelled growth curve should not look like. This growth curve was modelled on emission data at 635 nm, although the bacterium used, E. coli with GFP, does not emit at this wavelength. Even the best model, shown here, does not represent the observed growth.

Unsuitable data is not the only problem one can encounter during modelling. All curves shown in figure 1 C+D show a decrease in emission during the first five hours. We tested how strongly this decrease affects the modelling by either including these measurements (Fig. 4 A) or excluding them (Fig. 4 B) when modelling a B. subtilis GFP monoculture (from Fig. 1 C). Excluding them yields a BIC about ten times lower than including them. We therefore advise users to pre-process their data and cut out any death phase that precedes the growth to be measured.

Figure 4: Selection of data can improve modelling quality. Both growth curves were modelled on a B. subtilis with GFP monoculture. (A) Between 0 and 5 h the bacterial growth seems to decrease; after 5 h an increase is measured. This causes problems for the modelling. (B) Leaving out the first five hours gives a much better model: the BIC is now 23.375 instead of 235.096.
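
Such a pre-processing step can be sketched in a few lines. The helper names `drop_before` and `trim_initial_decline` and the example measurements are hypothetical. The first helper applies a fixed cutoff, as we did with the first five hours. The second cuts at the lowest measured value, which is one possible way to automate the choice but can be misled by noisy data.

```python
def drop_before(times, values, t_start):
    """Keep only the measurements taken at or after t_start."""
    kept = [(t, y) for t, y in zip(times, values) if t >= t_start]
    return [t for t, _ in kept], [y for _, y in kept]


def trim_initial_decline(times, values):
    """Drop every measurement before the global minimum of the signal."""
    start = min(range(len(values)), key=values.__getitem__)
    return times[start:], values[start:]


# Hypothetical signal: decreases during the first 5 h, then grows.
times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
values = [0.30, 0.26, 0.22, 0.18, 0.15, 0.13, 0.16, 0.24, 0.40, 0.62, 0.80]

fixed_t, fixed_y = drop_before(times, values, t_start=5)
auto_t, auto_y = trim_initial_decline(times, values)
print("fixed cutoff starts at", fixed_t[0], "h with", len(fixed_t), "points")
print("minimum-based cutoff starts at", auto_t[0], "h with", len(auto_t), "points")
```

The trimmed time and signal lists can then be passed on to the model fit in place of the full series.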

We applied the six curveball models to all biologically relevant growth curves, whether from monoculture or co-culture, and compared the BIC values to determine the best and worst models. The B. subtilis GFP monoculture in figure 1 C+D is based on the same data, resulting in a total of 15 modelling trials. The counts for each model (best and worst) are shown in table 1. In eight of the 15 growth curves the RichardsLag1 model had the lowest BIC, followed by BaranyiRoberts with five counts; however, these two were not always among the top three models for every growth curve. Determining the worst model was far easier: in two thirds of the cases the Logistic model was the worst, and in the remaining five it was among the last three.

Table 1: Best and worst models for modelling growth curves. For each of the 15 trials one count goes to the best and one to the worst model fitting our data. This table shows the counts for all specific models. Only data that is meaningful in the biological context was included.

| Model | Best model (count) | Worst model (count) |
| --- | --- | --- |
| BaranyiRoberts | 5 | 0 |
| RichardsLag1 | 8 | 1 |
| Richards | 0 | 3 |
| Logistic | 0 | 10 |
| LogisticLag1 | 1 | 1 |
| LogisticLag2 | 1 | 0 |
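
The counting behind such a table can be sketched as follows. The helper name `tally_best_and_worst`, the curve labels and all BIC values are hypothetical and serve only to show the procedure; they are not our results.

```python
from collections import Counter


def tally_best_and_worst(bic_by_curve):
    """Count how often each model has the lowest and the highest BIC."""
    best, worst = Counter(), Counter()
    for bics in bic_by_curve.values():
        best[min(bics, key=bics.get)] += 1
        worst[max(bics, key=bics.get)] += 1
    return best, worst


# Hypothetical BIC values for three curves (illustrative only).
bic_by_curve = {
    "curve_1": {"BaranyiRoberts": 120.0, "RichardsLag1": 118.5, "Logistic": 160.2},
    "curve_2": {"BaranyiRoberts": 95.1, "RichardsLag1": 97.3, "Logistic": 130.8},
    "curve_3": {"BaranyiRoberts": 210.4, "RichardsLag1": 205.0, "Logistic": 204.1},
}
best, worst = tally_best_and_worst(bic_by_curve)
print("best: ", dict(best))
print("worst:", dict(worst))
```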

Discussion

To sum up, the best model is difficult to define because the data vary strongly. The RichardsLag1 and BaranyiRoberts models worked well for many curves, but not for all. The model that performed worst overall was Logistic, which gave the worst fit for 10 of the 15 growth curves.

As advice for users of this application, the data should be chosen carefully and pre-processed if necessary. The fitting works best if the measured curve contains only the standard phases: the lag phase, the exponential (log) phase and the stationary phase.

Our modelling approach of fitting growth curves is just the tip of the iceberg. Growth curve modelling could also be used to focus on interactions between strains. In our co-culture experiments we showed in detail the growth of two bacterial strains next to each other. When evaluating three or more bacterial strains in one culture, the fast and efficient calculation of a growth curve helps to detect interactions that lead to growth advantages or disadvantages between the strains.

Additionally, we performed selective advantage experiments in the wet lab. Because samples were taken at only a few time points, these experiments are not optimal for evaluation by modelling. In a plate reader assay we could measure at more time points, for example at intervals of 30 minutes, as was done for the data used in the modelling here. The effect of a selection solution on bacterial growth can manifest in specific parameters, such as the maximum population size reached in the stationary phase or the relative growth rate. This topic is therefore interesting for modelling as well.

References

[1] Zwietering, M. H., Jongenburger, I., Rombouts, F. M., & van 't Riet, K. (1990). Modeling of the bacterial growth curve. Applied and environmental microbiology, 56(6), 1875–1881. https://doi.org/10.1128/aem.56.6.1875-1881.1990