Updated on 2021-11-18: You might found this file, which we prepared for the Judging Session, helpful to understand our model.
#1. Key Problems
Our T7 expression model enables host bacterial enzyme production with 3 basic parts: genes encoding T7 phage polymerase, inhibitor protein gp, and target Bst (GFP in our lab experiments) with T7P promoter.
Gp2 and gp5.7 are two inhibitors of host bacteria genes by binding specifically to transcription factors σ70 and σs. Thus, to purify our target protein, it’s necessary to downregulate the endogenous protein of E. coli. We designed two ways to express Igp5.7 which will inhibit ss, and the first aim of our model is to demonstrate the better one.
At the same time, T7P which can produce our target protein Bst is influenced by the concentration of s70 and ss. If the concentration of the two transcription factors decreases, then we will expect that the concentration of T7P mRNA would decrease along with them and then the T7P concentration. Here we can expect that if the duration of induction is long enough, and then the decrease of endogenous transcription will affect our target protein which can only be expressed by T7P. the second aim of our modeling is to find a proper time to harvest Bst post-induction.
In Brief, Two Main Aims for Our Model:
- High Expression and Purity of Target Protein Bst (written GFP in our model).
- Better Way to Downregulate Endogenous Transcription
#2. Key Parameters and Processes
To accurately describe our reactions and system behavior, our model is based on Kinetic Equations and Modeling. Our model basically involves four parts: 1. Kinetic Constant, 2. Key Variable, 3. Key Process, 4. Key Differential Equation.
# Key Constants in Model
Table 2.1 List of all parameters used and our rationale. Kinetic constants for matching processes are divided into 7 categories: transcription, translation, dissociation, binding(inhibition), degradation, active transportation, and others. Assumptions and references are listed below in 1.1 and 1.2.
# Assumptions
- The materials given are sufficient to conduct all reactions mentioned (peptide, nucleotide, ribosome, etc.)
- Quasi-steady state. We assumed that the material used for production is sufficient for our bacteria. Thus, to steady the whole model, long-term induction or expression of proteins must converge to fixed values.
- Translation kinetic constants for all proteins vary in a negligible range and are assumed to be equal in our model. E.g. Kσs = Kσ70 = KP1 [4]
- Transcription kinetic constants for all mRNA with the same promoter vary in negligible range. E.g. K70MP1 = K70Igp2 etc.
- Degradation kinetic constants for all protein/mRNA vary in negligible range. E.g., λ mRNA1 = λ mRNA2, λ Protein1 = λ Protein 2 etc.
- Elements involved in reaction don’t work in quantization. In our model, we assumed a value, however small(e.g. 1.03 * 10^-5 or even smaller), for every variable. This is how during simulation we can find the curve smooth and derivable at every point. Numeric floating points in simulation can also participate in biochemical reactions.
- Bacteria stress in metabolism is an abstract factor often overlooked in model simulation We visualize the stress as the rate of endogenous protein expression. Decrease of the protein to some extend can reflect bacteria is close to stationary phase.
- All inducers (lactose & arabinose) are taken into bacteria through active transport. To simplify simulation, we set this process as linear.
# References of Constants
[1]: Stamatakis M, Mantzaris N V. Comparison of Deterministic and Stochastic Models of the lac Operon Genetic Network[J]. Biophysical Journal, 2009, 96(3):887-906.
[2]: újvári, Andrea, Martin C T. Thermodynamic and Kinetic Measurements of Promoter Binding by T7 RNA Polymerase[J]. Biochemistry, 1996, 35(46):14574-14582.
[3] Lee J, Ramirez W F. Mathematical modeling of induced foreign protein production by recombinant bacteria[J]. Biotechnology and bioengineering, 1992, 39(6): 635-646.
[4] https://2019.igem.org/Team:Fudan-TSI/Model#
# Key Variables in Model
Table 2.2 List of all the key variables in 5 categories: transcription factors, mRNAs, proteins, inducers, complexes. Concentration unit is mmol/mL.
References of Variables
[5] Battesti A, Majdalani N, Gottesman S. The RpoS-mediated general stress response in Escherichia coli[J]. Annual review of microbiology, 2011, 65: 189-213.
[6] Tabib-Salazar A, Liu B, Barker D, et al. T7 phage factor required for managing RpoS in Escherichia coli[J]. Proceedings of the National Academy of Sciences, 2018, 115(23): E5353-E5362.
# Key Processes in Model
To clearly simulate our model, we set the processes below to explain our understanding of the whole reaction. All processes are presented with a kinetic model. Parameters on the left show the Substances Involved in the Reaction. Parameters on the arrow with a “K” are the kinetic constant present the Speed of this reaction. Parameters on the right are the Products of corresponding reaction. Thus, we can briefly introduce our model.
Table 2.3 List of all key processes divided (similar to parameters) into 8 categories.
Figure 2.1 Describes our summary of all the biochemical processes involved, further simplified into 3 categories: state transitions, processes that facilitate transitions, and processes inhibiting transition. All reactions apart from degradation and dissociation are marked accordingly in this network.
# Key Differential Equations in Model
We assumed all process as first-ordered reactions. To simulate the whole processes including transcription, translation, induction, degradation, inhibition, release. We next arranged the processes into differential equations of Time to solve our problem. The left shows the Rate of Concentration of corresponding parameter. The right shows all parameters that influence the rate.
Parameter itself represents its own concentration. To: 1. Choose the best plan for Igp5.7 expression, 2. Seek the best induction time. We here list some main parameters.
Table 2.4 List of all main parameters found in multiple equations.
To simiplifed the presentation on thie wiki page, we only explained the important equations. Below is the link for all equations:
Equations in modeling ver.2021.10.21
#3. Parameter Tuning and Validate
# Simulate with Runge-Kutta Method
Due to the complexity of the whole model, the analytical solution of function cannot be analyzed in elementary function. Floating points are essential in our model as we mentioned. Therefore, Euler's Method as a general simulation method cannot meet the accuracy. To match real curve to the most, we choose Runge-Kutta Method rather than Euler’s Method.
# Parameter Tuning
We assumed that the material in the model is sufficient to produce no matter inter or external products. Thus, to steady the model, the concentration of every parameter must be convergent. Fundamentally the convergence of mRNA is dependent on degradation kinetic constant and transcription kinetic constant while the initial concentration of transcription factors merely influences convergence value. In conclusion, mRNA degradation and transcription are tightly paired during adjustment. Degradation kinetic constants of certain proteins were found in articles [1], and thus transcription, translation, and degradation kinetic constants are determined.
The translation is adjusted the same way as transcription. Although it’s clear that there are several mechanisms to degrade protein due to molecular size, we here assumed that all protein involved shares the same degradation kinetic constant. RpoS (σs) levels are very low and increase as bacteria enter stationary phase. [3] We considered σs in our model. s70 is replaced by σs when bacteria get into stationary phase (Figure 2.1).
The figure below is simulated with ode45 of Matlab. We can find initiation concentration of σ70 gradually goes down and finally convergents to 1.4. At the same time, the concentration of σs transcription factor goes up and reaches a steady phase with a concentration of 1.5. Endogenous protein produced by endogenous transcription factors, although some fluctuations were seen, can steady at 2.8. We thus complete our initiation and parameter tuning.
Endogenous Protein = P1 + P2
Figure 3.1 Concentration of σ70 & Mσ70, σs & Mσs and Endogenous protein convergents within limited time
σ70, free s70 transcription factor. σ70bind, bound s70 in transcription. Mσ70, mRNA of s70.
σs, free ss transcription factor. σsbind bound ss in transcription. Mσs, mRNA of σs.
Endogenous protein, protein expressed by both s70 & σs.
# Validation
As we can find in the constant table, most kinetic constants are involving transcription, translation, and degradation. We examined the kinetic curve of our model with experiments. Although the absolute concentration is not the same as we get from the experiment, the result matches the experiments satisfyingly in total 6 parallel experiments after normalization. We also found that the overall expression time is not totally matched. Because the plasmid will not start expression immediately after transduction. Due to complexity, we did not simulate this period of time in the modeling. Therefore, we look for evidence from the real-time fluorescence data in other literature.[7] Generally, the experimental data can be seen within two hours under some experimental conditions.
With our assumptions, we find the kinetic constants of transcription, translation, and degradation paired. In this circumstance, the relationship between the parameters is determined and validated.
Figure 3.2 Model protein matches well with experiment data after normalization and linear time correction
P1, blue line, model protein.
GFP1~6 are experiment data, points
Reference:
[7]: Arnoldini M, Heck T, Blanco-Fernández A, et al. Monitoring of dynamic microbiological processes using real-time flow cytometry[J]. PloS one, 2013, 8(11): e80117.
#4. Results
# Induction Addition
Two states of inducers include intercellular and extracellular. We assumed active transport for inducer transmembrane. Set aside the quantity of internal ATP, we considered the kinetic modeling of transport as linear. Thus, the intercellular inducer concentration can be measured more effectively. Here we show a figure of lactose. It's clear that extracellular lactose is transported effectively while intercellular lactose is consumed indicating that the protein induced by lactose begins to express.
Figure 4.1 Transport of inducer.
lac. Inlac, intercellular lac.
Outlac, extracellular lac
# Inhibition of Transcription Factors
After initiation, we gave value to concentration of inducers. We can find an evident decrease of both transcription factors (blue lines) to one-third of the original concentration after inducer addition.
Figure 4.2 Concentration of σ70 & Mσ70, σs & Mσs decreases after inducer addition.
σ70, free s70 transcription factor. σ70bind, bound s70 in transcription. Mσ70, mRNA of s70.
σs, free ss transcription factor. σsbind bound ss in transcription. Mσs, mRNA of σs.
# Igp5.7 Expression Simulation
Two ways of Igp5.7 expression are simulated with our established model. In the absence of inducer, we find that Igp5.7 without inducer will evidently inhibit σs and decrease the concentration while at the same time we can still find active transcription of σs mRNA. With low transcription factors, the endogenous protein will also decrease, suggesting a low growth rate of bacteria. Here we suppose that Igp5.7 without inducer would largely influence the growth rate of bacteria and finally reduce the efficiency of production.
Figure 4.3 The plan that Igp5.7 with inducer shows better growth rate of bacteria than the other
σ70, free s70 transcription factor. σ70bind, bound s70 in transcription. Mσ70, mRNA of s70.
σs, free ss transcription factor. σsbind bound ss in transcription. Mσs, mRNA of σs.
Endogenous protein, protein expressed by both s70 & σs, can represent the growth rate of bacteria.
Igp5.7, the inhibitor. MIgp5.7, the mRNA of Igp5.7. σsIgp5.7, σs-Igp5.7 complex.
# Harvest Target Protein with High Purity
First, We measure the purity of our target protein with the equation:
Apart from purity, the absolute concentration of GFP is also considered. We can find two normal distribution-shaped
curves that represent the ratio and absolute concentration respectively. To steady the model, as we mentioned, we set degradation kinetic constant to convergent the terminal value while in experimental circumstance
with limited material protein expressed may not degrade in such a rapid way. Thus, the range of harvest time may not satisfy real-time experiments. Whereas in the assumption that all protein shares the same degradation
kinetic constant, parameter adjustment merely brings linear-like transform to our model, we can hold the point that the best harvest time may be in an interval rather than the longer the better.
Figure 4.4 Concentrations of GFP and Purity of GFP are normal distribution like curves
The range marked is considered the proper time to harvest
Then, another important question we hope to answer is how the timespan between adding inducers lactose and arabinose could influence the purity of target protein. We set an additional condition: following lactose induction, arabinose is only transported through membrane after concentration of extracellular lactose has dropped below a certain threshold. As the concentration of extracellular lactose decreases linearly under our conditions, we can effectively simulate adding lactose and arabinose at separate times and adjust this timespan by setting different threshold values.
We derived a third-order function showing how target protein purity changes temporally when lactose and arabinose are added under different timespans.
Figure 4.4 Time-dependent purity of GFP curve with different inducer addition interval
x-label: time unit: min
y-label:timespan
z-label: purity unit: 1
The most important findings from our results are that the purity-time curve varies very little along the timespan axis. This would indicate a counterintuitive conclusion: the timespan between adding inducers has little influence on the purity of target protein.
Figure 4.5 Time-dependent purity of GFP curve
x-label: time unit: min
z-label: purity unit: 1
The result is more visually evident if we observe the purity-time quadrant and compare the curves. Significant differences can be found only over a long period of time (over 100 min), where a longer timespan between inducer additions leads to a slightly less decrease in purity following the peak.
It's important to note that the optimum harvesting time falls under 100 min, meaning to achieve maximum target protein purity, the harvesting time would be too low for timespan between inducer additions to make any difference.
This would mean that harvesting protein at the time interval we identified is sufficient for satisfactory yield and purity. There is one less major factor than we originally anticipated influencing Bst production, which is an important insight for our experiment and even future implementation.
# Limitations
Assumptions above can bring this model error. Some assumptions are unwritten but fundamental for protein expression. For example, physiologically inducer promoters can still bind to transcription factors and start to transcript [2]. Whereas, in our model, we gave tacit consent to the fact that protein would not leak without inducer addition. For general proteins, this numerical fluctuation would not influence the whole model. However, leakage of proteins like gp5.7 that are designed to inhibit internal protein expression would cause error.
Figure 4.6 below shows the relation between free gp5.7 and Total gp5.7 (Free + Bound). In our model, we can tell that although the total concentration of gp5.7 is considerable, free gp5.7 concentration is extremely low compared to total concentration due to high affinity and kinetic constant to bind transcription factors. Maybe only a little leakage can bring a considerable difference to transcription factor. But after all, the leakage is relatively neither hard to operate nor to detect for its low dose. In the end, we bring up the affinity and kinetic constant in transcription factor inhibition to compensate for the error brought by gp leakage.
A
B
Figure 4.6
A: Comparison between Free gp5.7 and Total gp5.7. Concentration of total is far more than that of free
B: Concentration of free gp5.7
Total = Free + Bound.
#5. Summary
Please download our modeling script and result figures from https://github.com/Fudan-iGEM/2021-model.
To summarize, we achieved an accurate description of the T7-E. coli expression model with a complex network of interrelated biochemical reactions. We included all processes (transcription, translation, inhibition, transmembrane transportation, etc.) in our ODEs and chose to run Runge-Kutta method with ode45 to achieve optimum fitting.
We derived all parameters reliably through references and rationale. We further tuned our parameters based on several important and reasonable assumptions:(i)We reasoned that the ideal curve for all variables must converge in a system with unlimited materials. This gave us a direct way to examine whether unknown parameters have been tuned correctly. (ii) We described bacteria metabolic stress with the rate of endogenous protein expression. (iii) We took minimum values into consideration to reflect how extremely small amounts of elements could participate in reactions.
Our model was validated with experimental data. In turn, we instructed the experiment by (i) finding the best strategy for composite parts; (ii) predetermining the optimum time interval for protein harvest; (iii) discovering that the timespan between inducer addition had little effect on the yield and purity of target protein.
Future teams can reference our validated host bacteria vector system model with other objectives, and learn from our insightful assumptions and rationale when tuning their own parameters.