Team:TU Darmstadt/model

Genetic Circuit – TUDA iGEM 2021

Genetic Circuit

Goal of the Model

The core of our project is a genetic circuit that triggers bacteriophage release upon induction through a quorum sensing (QS) signal. Knowledge of the relationship between the number of QS signaling molecules as input at the cell exterior and the number of bacteriophages as output is crucial for the design of our genetic circuit. Therefore, we have developed a deterministic model in which the concerning molecular processes are represented as ordinary differential equations. The model was thought to improve the design of our genetic circuit as well as help us to gain a deeper understanding of the molecular processes we are using to build our phage-defense system against invading pathogens.

In this model we considered the following parts to simulate our genetic circuit:

  1. Diffusion of QS signaling molecules inside the cell
  2. Binding of signaling receptors to signaling molecules
  3. Expression of target genes

As in our proof of concept experiments, we used the green fluorescent protein as representative genetic output. In Figure 1, you can find a schematic overview of the molecular processes we considered in our model. However, our model provides potential for expansion. The model could be completed by adding differential equations that represent phage expression or even phage release. As a closer step the expression of RecA730 could be modeled instead of GFP. We or future iGEM Teams just need to change the concerning parameters. Therefore, our model can be easily modified.

Figure 1. Representation of the biochemical reactions considered in our mathematical model. 1) The diffusion of the QS molecules acyl homoserine lactone (AHL) through the E. coli cell wall can be described by the diffusion coefficient D. 2) Formation of the QscR-QS molecule complex, described with a Langmuir-Hill equation. 3) The expression of the target protein GFP can be described by the dependency of the concentration from the transcription factor complex bound to the inducible promoter. Transcription (ktransk), translation (ktransl) and respective degradation rates (dmRNA, dProt) are also taken into consideration.

To simplify the processes in the cell we made some assumptions. To verify these we also developed an extensive model. Moreover, we incorporated fitting parameters based on laboratory results in order to improve our model. We were going to determine the basal expression of the promoter PPA1897 as well as Plux which gets activated by the allosteric transcription factor (aTF) QscR respectively LasR in complex with the signaling molecule. Also, we were going to compare the lab results of fluorescent measurements to the results from our model. Unfortunately, our lab time was limited and we were not able to get the needed results. Therefore, we used parameters from literature.

Since the concentration of QS molecules released by pathogens can be very low, we wanted to examine the need for cascades as well as optimization of promoters, ribosome bindings sites (RBS) and signaling receptors.

The model is based on equations from the Modeling Webinars of the iGEM Engineering Committee.​1​ All variables used in the equations are listed in Table 1. Before iGEM none of us had prior experience in modeling. The Webinars have been very helpful and we want to encourage future iGEM Teams to have a go at project modeling despite lack of experience. Our model helped us to understand our genetic circuit more deeply and improved the design process.

Table 1. Definition of the used abbreviations.
AbbreviationDefinition
AAcyl homoserine lactone (AHL)
AextExtracellular AHL
AintIntracellular AHL
CN Copy number
DDiffusion coefficient
KdDissociation constant
KoffRate constant for dissociation
KonRate constant for binding reaction
nHill coefficient
QQscR
QAQscR-AHL complex
ρPercentage of volume in a culture that is occupied by cells
θ Proportion of QscR molecules bound to AHL

Diffusion in E. coli

\begin{equation}\dot{A_{int}} = D\cdot A_{ext}-D\cdot A_{int} \label{E1} \end{equation}

\begin{equation} \dot{A_{ext}} = D\cdot A_{int} – D\cdot \rho \cdot A_{ext} \label{E2} \end{equation}

As explained in the previous section, our genetic circuit depends on the presence of signaling molecules inside the sleeper cell. In our project, we use AHL quorum sensing molecules. The sleeper cell contains the genetic circuit we designed for sensing pathogens, as explained here. Inside the cell, the signaling molecule binds to an allosteric transcription factor (aTF). The resulting complex then initiates the transcription of a target gene.​2​

As mentioned above, the aTF-signaling molecule complex is formed inside the cell. Therefore, the amount of complex depends on the intracellular amount of the signaling molecule, which in turn depends on its extracellular amount. In our case, AHL are secreted by the pathogen Pseudomonas aeruginosa. Depending on the number of pathogenic cells, the extracellular concentration of AHL varies. The signaling molecule must overcome the cell membrane to get into the sleeper cell. In ;P. aeruginosa the AHL quorum sensing molecules diffuse freely across the cellular membrane.​3​ It has also been shown that these quorum sensing molecules are able to diffuse into Escherichia coli cells.​2​ Therefore, we can assume that the diffusion of AHLs into the E. coli sleeper cells is free as well.

We can set up the following biochemical equation shown in Figure 2 to describe the diffusion process:

Figure 2. Diffusion of the QS molecules (AHL) from the cell exterior in the cell. The molecules pass the cell membrane freely.

The diffusion of the signaling molecules is described by the diffusion coefficient D. This coefficient depends on the cell surface area of the E. coli sleeper cell, the membrane permeability of the signaling molecule and the cell volume.​3​ Moreover, the diffusion process is influenced by the factor ρ, which describes the percentage of volume in a cell culture that is occupied by our sleeper cells.​4​This factor is needed to adjust the extracellular concentration to an intracellular concentration. Considering this, the differential equations [1] and [2] can be derived to describe the rate of change in the number of AHL inside and outside the cell, respectively.

These equations combined can be used to describe the time-dependent change of signaling molecules inside the sleeper cell in dependence of the external amount of signaling molecules.

When plotting the solution of differential equations [1] and [2], it becomes apparent that the diffusion process is very fast compared to the following transcription and translation processes of the target gene (Figure 3). Therefore, we can assume a steady state of signaling molecules inside the cell for describing our genetic circuit mathematically.

Figure 3. a. The graph shows the number of AHL molecules outside the cell depending on time. The extracellular number of the signaling molecule decreases due to diffusion. After a short period of time a steady state is reached. b. The graph shows the number of AHL molecules inside the cell depending on time. The intracellular number of signaling molecules increases due to diffusion. After a short period of time a steady state is reached.

All in all, by using equation [1] we can calculate the intracellular amount of signaling molecules for a given extracellular concentration. The intracellular amount is crucial for the following formation of a transcription factor complex and therefore the initiation of the target gene expression.

The Tran­scrip­tion Factor Com­plex

\begin{equation} \Theta = \frac{[A]^n}{K_d+[A]^n} = \frac{[QA]}{[Q]} \label{E3} \end{equation}

Our sensing system is based on QscR, an allosteric transcription factor (aTF). This means that QscR alone does not activate our expression system. Only when it is bound to its corresponding signaling molecule, AHL, QscR reaches its active form. Transcriptional activation is then performed by the resulting aTF-signaling molecule complex. In consequence, the activation of our transcription system is not a direct result of the aTF concentration, but rather of the concentration of the aTF-signaling molecule complex (Figure 4).​5​

Figure 4. Formation of the aTF-signaling molecule complex by QscR and AHL. Kon and Koff are the rate constants of binding and dissociation reaction.

In order to describe the binding of aTFs to their signaling molecule, we decided to use the Hill-Langmuir equation. This equation serves as a simple model for quantifying the reversible association of binding proteins to their ligand molecules.​6​ Due to its simplicity, the equation only requires the ligand’s concentration as well as the specific equilibrium constant Kd of the described reaction. Kd is equal to the quotient of the respective constants Kon and Koff. With Kon being the second order rate constant for the binding reaction and Koff describing the dissociation as a first order rate constant (Figure 4).​7​

\begin{equation} K_d = \frac{K_{on}}{K_{off}} \label{E4} \end{equation}

As a function of this ligand concentration, it then delivers θ, the proportion of binders that are bound to their ligand. In our case QscR is the binding protein and AHL is the ligand. θ is then equivalent to the proportion of QscR molecules that exist in an aTF-signaling molecule complex with AHL.

Since we do not want to obtain a proportion, such as θ, but the absolute concentration of QscR-AHL complexes, we add another calculation to this equation. By multiplying θ by the concentration of QscR in total, the wanted QscR-AHL complex concentration can be obtained:

\begin{equation} [QA] = \Theta\cdot [Q] \label{E5} \end{equation}

Hill Equation

\begin{equation} [GFP]=\alpha_{max}\left(\beta_0+(1-\beta_0)\frac{[AQ]^{n}}{K_d+[AQ]^n}\right) \label{E6} \end{equation}

The gene expression regulation system of our sleeper cells is described mathematically by a Hill function based on the iGEM Modeling Webinar Series.​1​ The regulation is based on the availability of signaling molecules inside the cell, more specifically its complex formed with the regulator protein QscR. The whole expression system is depicted schematically in Figure 5.

Figure 5. Scheme of all biochemical reactions inside the genetic circuit for the expression of our target protein. The expression of the target protein GFP can be described by the dependency of the concentration from the transcription factor complex bound to the inducible promoter. Transcription (ktransk), translation (ktransl) and respective degradation rates (dmRNA, dProt) are also taken into consideration.

Assumptions:

  • the RNA polymerase as well as the ribosomes are not limiting the kinetics
  • much faster binding/unbinding than translation/transcription
  • much faster mRNA degradation than protein degradation
  • protein degradation includes growth associated dilution

Derivation of the equations:

The kinetics of the reactions in our genetic circuit can be represented by the following differential equations:

\begin{equation} [\dot{P}] = -k_{on}[P][AQ]+k_{off}[PAQ] \label{E7} \end{equation}

\begin{equation} [\dot{AQ}] = -k_{on}[P][AQ]+k_{off}[PAQ] \label{E8} \end{equation}

\begin{equation} [\dot{PAQ}] = k_{on}[P][AQ]-k_{off}[PAQ] \label{E9} \end{equation}

\begin{equation} [\dot{mRNA}] = -k_{1}[PAQ]-d_{1}[mRNA] \label{E10} \end{equation}

\begin{equation} [\dot{GFP}] = -k_{2}[PAQ]-d_{2}[GFP] \label{E11} \end{equation}

If the equations for the promoter as well as the promoter-transcription factor complex are added up, it follows that:

\begin{equation} [\dot{PAQ}]+[\dot{P}]=0 \label{E12} \end{equation}

Therefore:

\begin{equation} [P]=C_N-[PAQ] \label{E13} \end{equation}

This is logically consistent with the fact that the number of promoters in the cell does not change over time.

The rate of change of the complex concentration can be neglected because of the large difference in time scales between formation of the complex and transcription/translation. With this quasi-stationary state, it follows that:

\begin{equation} 0=k_{on}[P][AQ]-k_{off}[PAQ] \label{E14} \end{equation}

Additionally, equation \(\eqref{E13}\) can be inserted to replace the promoter concentration:

\begin{equation} 0=k_{on}(C_N-[PAQ])[AQ]-k_{off}[PAQ] \label{E15} \end{equation}

This can be solved for the concentration of the complex, which yields:

\begin{equation} [PAQ]=C_N \frac{[AQ]}{K_d+[AQ]} \label{E16} \end{equation}

which in turn can be used to simplify equation \(\eqref{E10}\) :

\begin{equation} [\dot{mRNA}]=k_1C_N\frac{[AQ]}{K_d+[AQ]}-d_1[mRNA] \label{E17} \end{equation}

For applying this to the translation step of the system, the mRNA concentration is approximated as constant, too:

\begin{equation} 0=k_1C_N\frac{[AQ]}{K_d+[AQ]}-d_1[mRNA] \label{E18} \end{equation}

and solved for the mRNA concentration:

\begin{equation} [mRNA]=\frac{k_1}{d_1}c_N\frac{[AQ]}{K_d+[AQ]} \label{E19} \end{equation}

For simplification purposes, the constants are summarized in \(\alpha\):

\begin{equation} \alpha=K_2\frac{K_1}{d_1}C_N \label{E20} \end{equation}

Equation \(\eqref{E19}\) is now used to simplify equation \(\eqref{E11}\) :

\begin{equation} [\dot{GFP}]=\alpha\frac{[AQ]}{K_d+[AQ]}-d_2[GFP] \label{E21} \end{equation}

To be able to compare this with experimental data, the steady state approximation is used again to get the protein concentration in the equilibrium state:

\begin{equation} [\dot{GFP}]=0 \label{E22} \end{equation}

\begin{equation} [GFP]=\frac{\alpha}{d_2}\frac{[AQ]}{K_d+[AQ]} \label{E23} \end{equation}

This equation can be expanded by adding the basal expression as well as the Hill coefficient \(\eqref{E6}\). The constants \(\alpha\) and d2 are summarized in \(\alpha_{max}\), which represents the maximal protein expression.

\begin{equation} [GFP]=\alpha_{max}\left(\beta_0+(1-\beta_0)\frac{[AQ]^{n}}{K_d+[AQ]^n}\right) \label{E24} \end{equation}

\begin{equation} \alpha_{max}=k_2\frac{k_1}{d_1d_2}c_N \label{E25} \end{equation}

The used approximations enable the calculation of the protein output with equation \(\eqref{E24}\) and without having to solve every differential equation separately. Since the accuracy of this model and the parameters would have been tested through the measurement of protein concentrations, this is the most efficient way to approach the genetic circuit mathematically.

Extensive model

The biochemical reactions that represent our genetic circuit are described via a system of differential equations, displayed by the equations \(\eqref{E1}\) and \(\eqref{E7}\)- \(\eqref{E11}\). For the establishment of the simple and easy to use Hill equation \(\eqref{E24}\) we made several assumptions. To justify those, we decided to develop a more extensive model in which the system of differential equations is solved using the Python module scipy.​8​ This model can be used to compare the time steps needed to reach a steady state in the different sections of the circuit (namely diffusion of the AHL into the cell, complex formation, transcription and translation). Figure 6 shows the individual concentrations of the molecules described above as they reach an equilibrium. Note that the scaling of the time axis differs due to different kinetics of the considered processes.

Figure 6. The number of the individual molecules as a function of time. (a) Extracellular and (b) intracellular AHL.
Figure 6. The number of the individual molecules as a function of time. (c) QscR molecules and (d) AHL-QscR-complex.
Figure 6. The number of the individual molecules as a function of time. (e) Free promoters and (f) AHL-QscR-Promoter-complex
Figure 6. The number of the individual molecules as a function of time. (g) mRNA and (h) output protein.

The figure shows that the diffusion of AHL into the cell is faster than translation and transcription, supporting the assumption of a constant concentration of AHL molecules inside the cell. This means that the diffusion equation can be replaced with a linear relation to describe the AHL concentration inside the cell when only calculating the protein output.

This difference in timescales also occurs for the binding reactions, meaning the formation of the QscR-AHL complex as well as the binding of this complex to the promoter. Both reach a steady state after approximately 60 seconds (the binding to the promoter can also be much faster depending on the AHL concentration). In the derivation of the Hill equation, the concentration of the mRNA is also approximated as constant in respect to the translation reaction. This is shown by the plots (Figure 6) as well. The amount of mRNA reaches a steady state at about 15 minutes, whereas the protein concentration takes over 6 hours to reach an equilibrium. In contrast to the previously considered equilibria, however, this is achieved by the fact that the degradation rate of the protein is significantly lower than that of the mRNA and not by a large difference in the rate constants.

In conclusion, the data obtained from the more extensive model shows, that the assumptions used for the derivation of the final Hill equation in the modeling webinar are feasible for our own system. This enables us to calculate the concentration of the output protein more effectively and with a much simpler code.

Results and Discussion

Our initial goal was to establish a mathematical model which not only represents our sensing circuit but also can help us to adapt our experimental setup in the lab.

Therefore, we valued a close connection between our model and experiments. Our assays performed in the lab were based on fluorescence measurements. When signaling molecules are present in the cell culture, the inducible promoter is activated by an aTF-signaling molecule complex. This should lead to the expression of GFP resulting in an increased fluorescence signal. Our model utilizes differential equations to describe the reporter expression and determine the concentration of the output protein. To use our model as strong prediction tool for the DBTL optimization of our genetic circuit we used parameters (transcription and degradation rates) specific for GFP for our calculations. Moreover, we wanted to determine the required parameter of basal expression of the inducible promoter PPA1897 through fluorescence measurements. GFP for example could therefore be used as a quantitative fluorescence reporter, obeying a linear relation of fluorescence intensity to concentration.​9​ Unfortunately, we were not able to conduct these measurements due to very limited lab access. This is why we used values found in literature for these parameters making a direct comparison between the calculated and observed outputs unrealistic.

Nevertheless, our model helps us to learn how the used parameters, especially the input concentration and translation rate, influence the concentration of the output protein. Thanks to that, we know which output to expect when testing different variations of our circuit and the experimental setting can be adapted to reach the desired output.

Figure 7 shows the number of the output protein depending on the particle concentration of the input molecule (signaling molecule).

Figure 7: Number of the output protein GFP depending on the concentration of signaling molecule. This graph also shows the effect of varying translation rates on the number of output protein.

The graphs calculated for different translation rates all follow a sigmoidal course, which is typical for induced expression. The output remains constant for low particle concentrations (up to 1×1015 molecules/L) of signaling molecule slightly varying from zero due to basal expression of the inducible promoter.

Starting at 1×1015 signaling molecules/L increasing inducer amounts lead to continuously higher expression rates of GFP and thus increasing output concentrations. After the rapid increase the curve flattens until approaching minimal effects for higher signaling molecule concentrations. This effect can be observed at signaling molecule concentrations of about 2×1016 molecules/ L. In this range, the amount of aTF-signaling molecule complex rises and the expression of the output protein is initiated. The plateau is reached because all aTF are bound to a signaling molecule, so an increase in the amount of signaling molecules has no significant impact on gene expression. To sum up, our model enables the determination of the output concentration by adjusting the used amount of signaling molecule.

Additionally, different translation rates influence the maximum output concentration, but not the behavior of the genetic circuit itself. In Figure 7 graphs for translation rates between 2 min-1 and 5 min-1 are shown. Those values are the lower and upper limit for translation rates for GFP expected in Escherichia coli, respectively. In our genetic circuit, the translation rate can be adapted by using different RBS sequences or adapting the used codons.​10​ If a certain output concentration is necessary for an application of the circuit, a suitable RBS with an appropriate translation rate can be chosen.

Generally, in our model further parameters can be varied to evaluate their effect on inducible gene expression.

Our model is currently specific for the expression of GFP as the output protein of the established sensing circuit in E. coli. By adjusting the used parameters to any output protein and host organism, our model gets operational for different expression systems. The only requirement is that the considered system relies on an inducible promoter which is activated by an aTF-signaling molecule complex. This is also a reason why we wanted to keep our model as simple as possible.

As explained in the section Extensive model, we also used a system of ordinary differential equations (ODE) to describe our circuit. This extensive model is computationally intensive and not suitable for easy usage. But it allowed us to verify the assumptions made in the iGEM Modeling Webinar​1​ specifically in regard to our own system. To compare both models, we plotted the number of output protein depending on the number of input molecules using the simplified model and the extensive ODE model. The resulting graphs are shown in Figure 8. As can be seen, both graphs share the same course. The deviation of the graphs amounts to 3% and can be traced back to the assumptions explained in the section Hill equation made to simplify the model.

Figure 8: The number of output protein depending on the number of input molecules is shown. The graphs were established using the Hill equation (pink) and extensive ODE model (yellow).

Moreover, our model can be expanded to simulate multiple gene expression systems within our sleeper cells. For instance, the genetic circuit for controlled phage induction could be considered. For that, the ongoing biochemical reactions can be described by differential equations and integrated into our model.

For the implementation of our model in our actual sleeper cell, the response of the sensing circuit to signaling molecules could be extended by adding a module describing phage expression and assembly. This extended model could help us to optimize the timing of phage release by holin expression induced at threshold phage concentrations and many more.

In summary, we provided a mathematical model to describe a genetic circuit which is based on an inducible promoter. The number of output protein can be simulated for a certain input concentration. Moreover, influencing parameters can be varied to see the effect on the circuit. We studied the effect of varied translation rates on the number of output protein, but also the variation of other parameters is possible. Because we designed our model as simple as possible, it can be adapted to describe diverse expression systems in different host organisms.

Overall, our model was very valuable and again we want to encourage future iGEM Teams to have a go at project modeling despite lack of experience.

References

  1. 1. Vignoni A. Modeling Webinar Series. igem.org. [accessed 2021 Oct 12]. https://2021.igem.org/Engineering/Webinars
  2. 2. Wu Y, Wang C-W, Wang D, Wei N. A Whole-Cell Biosensor for Point-of-Care Detection of Waterborne Bacterial Pathogens. ACS Synthetic Biology. 2021:333–344. http://dx.doi.org/10.1021/acssynbio.0c00491. doi:10.1021/acssynbio.0c00491
  3. 3. Boada Y, Vignoni A, Picó J. Promoter and transcription factor dynamics tune protein mean and noise strength in a quorum sensing-based feedback synthetic circuit. 2017. http://dx.doi.org/10.1101/106229. doi:10.1101/106229
  4. 4. Dockery J. A Mathematical Model for Quorum Sensing in Pseudomonas aeruginosa. Bulletin of Mathematical Biology. 2001:95–116. http://dx.doi.org/10.1006/bulm.2000.0205. doi:10.1006/bulm.2000.0205
  5. 5. Oinuma K-I, Greenberg EP. Acyl-Homoserine Lactone Binding to and Stability of the Orphan Pseudomonas aeruginosa Quorum-Sensing Signal Receptor QscR. Journal of Bacteriology. 2011:421–428. http://dx.doi.org/10.1128/JB.01041-10. doi:10.1128/jb.01041-10
  6. 6. Gesztelyi R, Zsuga J, Kemeny-Beke A, Varga B, Juhasz B, Tosaki A. The Hill equation and the origin of quantitative pharmacology. Archive for History of Exact Sciences. 2012:427–438. http://dx.doi.org/10.1007/s00407-012-0098-5. doi:10.1007/s00407-012-0098-5
  7. 7. Corzo J. Time, the forgotten dimension of ligand binding teaching. Biochemistry and Molecular Biology Education. 2006:413–416. http://dx.doi.org/10.1002/bmb.2006.494034062678. doi:10.1002/bmb.2006.494034062678
  8. 8. Virtanen P, Gommers R, Oliphant TE. SciPy 1.0: fundamental algorithms for scientific computing in Python. nature methods. 2020;17:261–272. doi:https://doi.org/10.1038/s41592-019-0686-2
  9. 9. Bongaerts RJM, Hautefort I, Sidebotham JM, Hinton JCD. Green fluorescent protein as a marker for conditional gene expression in bacterial cells. Bacterial Pathogenesis Part C: Identification, Regulation, and Function of Virulence Factors. 2002:43–66. http://dx.doi.org/10.1016/S0076-6879(02)58080-0. doi:10.1016/s0076-6879(02)58080-0
  10. 10. Reeve B, Hargest T, Gilbert C, Ellis T. Predicting Translation Initiation Rates for Designing Synthetic Biology. Frontiers in Bioengineering and Biotechnology. 2014. http://dx.doi.org/10.3389/fbioe.2014.00001. doi:10.3389/fbioe.2014.00001

Eppendorf hilgenberg Zymo Research New England Biolabs Inc.
IDT Integrated DNA Technologies Snapgene Biebertaler Blutegelzucht Promega
DWK Life Sciences Science Birds Twist Bioscience Microsynth SEQLAB
TU Darmstadt
Supertext Brand
Carl Roth Sparkasse
 Quantum Design Europe