Team:TEC COSTA RICA/Model

Model

The main purposes to develop our mathematical model were to gain better understanding of our project, predict if it is likely to work and optimize parameters for the lab. We adopted different approaches to fulfill these requirements.

Our system is a genetic circuit which is based on recombinases, promoters and terminators. Specifically we are relying on serine integrases, because they have the property that in the presence of their recombination directionality factor(RDF), they can switch back and forth between one state or the other, which is essential in the design of our system. We adopted the convention that the two states are called “PB”, and “LR”, to switch from PB to LR you just need the recombinase, to transition from LR to PB you need the recombinase plus its RDF.

We present a detailed model of serine recombination based on the paper “A simplified mathematical model of directional DNA site-specific recombination by serine integrases” [1].

Serine integrases have recently attracted much attention as potential tools for experimental and applied genetic manipulations, because of their recombination efficiency, their short DNA recombination sites (att sites) (typically 40–50 bp) and absence of host factor requirements[1].

Here are the equations we are basing on:

\[ \frac{d[LRI_1]}{dt} = k_{+r}[PBI] - k_{-r1}[LRI_1] + k_{-syn}[LRI_2] - k_{+syn}[LRI_1] \]

\[ \frac{d[PBIR_1]}{dt} = k_{+r}[LRIR] - k_{-r2}[PBIR_1] + k_{-synr}[PBIR_2] - k_{+synr}[PBIR_1] \]

\[ \frac{d[PB_{tot}]}{dt} = k_{-r1}[LRI_1] - k_{+r}[PBI] + k_{+r}[LRIR] - k_{-r2}[PBIR_1] \]

\[ [IR] = \frac{[I][R]}{K_{ir}} \]

\[ [PBI] = \frac{[I]^4[PB]}{K_{bI}} \]

\[ [LRI_2] = \frac{[I]^4[LR]}{K_{bI}} \]

\[ [LRIR] = \frac{[IR]^4[LR]}{K_{bI}} \]

\[ [PBIR_2] = \frac{[IR]^4[PB]}{K_{bI}} \]

For detailed explanation on how they modeled the process of serine recombination and how the equations above came up, please refer to our Annex at the bottom of the page.

We made some changes to the equations, because we wanted to model serine recombination in vivo and not in vitro as presented in the paper. For that we modified the equations of the integrase and RDF to follow the central dogma of molecular biology, with the respective processes of synthesis and degradation, and not assuming them as constants, as if it were in in vitro conditions, as they did in the paper.

Here are the modifications we implemented:

\[ \frac{d[mRNA_{Recombinase}]}{dt} = k_{tscr}Pulse(t) - k_{rna}[mRNA_{Recombinase}] \]

\[ \frac{d[Recombinase]}{dt} = k_{int}[mRNA_{Recombinase}] - k_{dil}[Recombinase] \]

\[ \frac{d[mRNA_{RDF}]}{dt} = k_{tscr}Pulse(t) - k_{rna}[mRNA_{RDF}] \]

\[ \frac{d[RDF]}{dt} = k_{rdf\_tsl}[mRNA_{RDF}] - k_{dil}[RDF] \]

\[ Pulse(t) = 0.5(tanh(\frac{t-ara_{on}}{k_t}) - tanh(\frac{t-ara_{off}}{k_t})) \]

The first four equations represent the process of transcription and translation of the Recombinases and its respective recombination directionality factor(RDF), following the central dogma of molecular biology.
The last equation is how we took into account the action of the promoter, which contributes to the synthesis of the mRNA molecules. Basically we modeled it as a pulse with some on and off state, as presented in [2].

  • Kinetics of recombinases PhiC31, Bxb1 and TP901 are similar
  • The mRNA of RDF has similar behaviour as mRNA of its integrase
  • Recombinase protein and its Recombination Directionality Factor(RDF) have the same dilution/degradation rate.
Now here begins the part in which we used modeling to obtain useful information about our system.

We are interested in the recombinases Bxb1 and TP901, {proof of concept}. First we calculated the optimal RBS for the recombinase, for that we did a sweep of a range of values. We measured the percentage of recombination at 1 hour after the process began, and found out the best RBS to be 4 times higher than the one that was reported in the paper. Also, it throwed another interesting result stating that the RBS of the RDF should be around 10 times the one from recombinase{Wet Lab}. You can find the code we used for this optimization in our Github page, in the file “Sweep.m”

We characterized the time it would take the two promoters we would like to use(Arabinose promoter(Pbad) and the cell cycle promoter (Pnrd)) to produce recombination.

For the Pbad, the concentration of the pulse was the same as the total DNA present, and the duration of the pulse was 12 minutes, as reported in [1]. For the cellular cycle promoter(Pnrd) we calculated the relative strength of the promoter with respect to the Pbad, using our developed method proposed in {Measurement}. We found out the strength to be about 10 times less than the Pbad.

Figure 1. Forward recombination with Pnrd promoter with Bxb1 recombinase. LR,PB and Inducer units are normalized by the total DNA, which is 10nM. Integrase is normalized against its maximum value to be able to see it in the plot, in reality Integrase levels are 3 times higher than shown.

Defining recombination time as the time when the LR state reaches 70%, we can see from the Figure 1 that it would take the counter around 28 minutes to reach this level, so the cell cycle promoter would be non-viable for our circuit, because it expresses every 20 minutes(doubling time)[2]{Wet Lab}.

So hereafter we use Pbad for the modeling,which we predict to need around 23 minutes to produce recombination{Wet Lab}.

In order to calculate this threshold at which recombination occurs, we developed a script in Matlab called “Threshold.m”, which you can access in our Github page.

Also, we characterized the optimal pulse of inducer concentration and the time of exposure of the promoter, as seen in the plot below, which happened to be at 1 uM of concentration relative to the total DNA concentration and with a pulse length of 0.2 hours. In the graph red means higher values and blue means lower values.

In order to do this optimization, we developed a script in Matlab called “Plot3D.m”, which you can access in our Github page.

Figure 2. Percentage of recombination as a function of inducer concentration and induced time.

Now we wanted to simulate the process of how the species´ concentrations would behave as if we were doing the counting, and also to know if that would be sufficient to reach recombination.

Our models are based on using two recombinases, Bxb1 and TP901, so with we would be able to count up to four using our genetic counting circuit, making three steps of transitions of recombinations: Bxb1 forward, TP901 forward and Bxb1 reverse(using its RDF), see {slides Marco}. Reminder: our counter with n recombinases allows counting up to 2^n.

Figure 3. Forward recombination of Bxb1 using Pbad promoter. LR,PB and Inducer units are normalized by the total DNA, which is 10nM. Integrase is normalized against its maximum value to be able to see it in the plot.

In reality Integrase levels are 30 times higher than shown, agreeing with the experimental results reported in [1], which state that in order for recombination to happen, typically much higher concentrations(more than 10-fold) than the concentration of the DNA substrate are needed.

As seen in Figure 3, recombination is effectively performed. After about half an hour, recombination levels remain almost constant.

In order to do this plot, we developed a script in Matlab called “single_counter1.m”, which you can access in our Github page.

Figure 4. Forward recombination of TP901 using Pbad promoter. LR,PB and Inducer units are normalized by the total DNA, which is 10nM. Integrase is normalized against its maximum value to be able to see it in the plot.

In reality Integrase levels are 30 times higher than shown, agreeing with the experimental results reported in [1], which state that in order for recombination to happen, typically much higher concentrations(more than 10-fold) than the concentration of the DNA substrate are needed.

In order to do this plot, we developed a script in Matlab called “single_counter2.m”, which you can access in our Github page.

Figure 5. Reverse recombination of Bxb1 using Pbad promoter. Integrase and RDF are normalized against the maximum value of RDF, to be able to see it in the plot.



In reality Integrase and RDF levels are 30 and 42 times higher than shown, respectively, agreeing with the experimental results reported in [1], which state that in order for recombination to happen, typically much higher concentrations(more than 10-fold) than the concentration of the DNA substrate are needed, and the RDF concentration needs to be higher than the Integrase´s one.

In order to do this plot, we developed a script in Matlab called “single_counter3.m”, which you can access in our Github page.

As seen in the figures above, satisfactory recombination occurs in each of the three steps, so with those initial concentrations of reactants it is predicted that our circuit should perform the counting well.

Parameter Meaning Value
\(k_{int}\) Recombinase translation rate \(3 \mu M h^{-1}\)
\(k_{dil}\) Recombinase/RDF dilution/degradation rate \(2h^{-1}\)
\(k_{tscr}\) Recombinase mRNA transcription rate \(120h^{-1}\)
\(k_{rna}\) Recombinase mRNA degradation rate \(4h^{-1}\)
\(k_{rdf\_{tsl}}\) RDF translation rate \(0.3h^{-1}\)
\(k_{t}\) Characteristic time of inducer dilution due to cell division \(0.3 h\)
\(ara_{on}\) Time at which pulse of inducer is turn on \(0 h\)
\(ara_{off}\) Time at which pulse of inducer is turn off \(0.2 h\)
Values estimated on 20 minutes doubling time. Extracted from [2].

All of the following is a excerpt from [1]:

Serine integrases catalyse integration of a circular bacteriophage genomic DNA molecule into the bacterial host chromosomal DNA, by recombination between an attP site in the phage DNA and an attB site in the host DNA. In the resulting ‘lysogenic’ state, the phage genome is integrated in the host genome, and is flanked by recombinant attL and attR sites, each consisting of an attP half and an attB half (figure 1a). To resume its replicative life cycle, the phage DNA must be excised from its bacterial host genome. To accomplish this, a phage-encoded recombination directionality factor (the RDF protein) is expressed together with the integrase protein. RDF interacts with integrase and alters its properties so that it recombines the attL and attR sites to release the circular phage genomic DNA with an attP site, and leave an attB site in the host genome (figure 1a). (Hereafter, we refer to recombination between attP and attB as PxB recombination, and recombination between attL and attR as LxR recombination)

PxB recombination is efficient (i.e. most of the substrate molecules are recombined) when only integrase protein is present, and the reaction is unidirectional (i.e. no LxR recombination is observed under these conditions). However, when RDF is also present,LxR recombination is efficient and most of the sites are converted to attP and attB products. In addition to stimulating the LxR reaction, the presence of RDF inhibits the PxB reaction.

Figure 1. Schematic representations of serine integrase-mediated recombination reactions. (a) Overview, showing integrase (I) reacting to convert a PB (attP + attB) substrate to LR (attL + attR) product in the absence of RDF (red), whereas integrase plus RDF (I + R) converts LR (attL + attR) to PB (blue). (b) Scheme of reaction steps in Model M. Blue and red solid arrows show integrase-catalysed steps with and without RDF respectively. PBI, PBIR1, PBIR2 and LRI1, LRI2, LRIR are complexes containing four molecules of the integrase protein with PB or LR DNA substrates (with or without four molecules of RDF). The PxB(-R) reaction starts with the binding of four molecules of integrase (I) to PB substrate, followed by a recombination step and formation of the final product LRI1. The LxR(+R) reaction starts with the binding of four molecules of an integrase–RDF complex (IR) to LR substrates, followed by a recombination step and formation of the final product PBIR1. The ‘forbidden’ PxB(+R) and LxR(-R) reactions form ‘blocked’ LRI2 and PBIR2 complexes, which delay recombination due to their very slow conformational change to the productive LRI1 and PBIR1 complexes (grey dotted arrows). The favourable directions of reaction steps are shown by big arrowheads. Step names are shown near arrows. The cartoons show hypothetical structures of intermediates; note that alternative structures might be involved. (c) Core structure of reactions after reduction of the fast variables. Substrates and products are shown by cartoons. Very slow steps are indicated by grey lines.

The minimal model (referred to hereafter as ‘Model M’) was fitted to in vitro experimental data on recombination by phiC31 integrase (I) with its RDF gp3 (R). In the experiments used to produce these data, the extent of recombination at different concentrations of integrase and RDF was determined after 3 h reactions. Recombination was intramolecular, between attP and attB sites (PB) or attL and attR sites (LR) in inverted repeat orientation on supercoiled plasmid substrates.

Model M consists of a set of reversible reaction steps, each described by first or second order kinetics. The PxB(-R) reaction starts by binding of four molecules of I to the PB substrate (two molecules to each att site) and formation of the PB synapse (PBI), in which the attP and attB sites in the PB plasmid are held together by an integrase tetramer. Model M describes the formation of PBI synapse from free PB substrate as a single binding step (step ‘b1’), although the process requires multiple molecular events. This simplification was based on our earlier assumption that integrase binding is faster than synapsis and therefore is not a rate-limiting step. The next step (recombination ‘r1’) transforms PBI to LRI1, comprising recombinant attL and attR sites bound together by an integrase tetramer which synapses the two sites (figure 1b). Here again, this single step describes a series of molecular events. Model M proposes that LRI1 is the predominant endpoint of the PxB(-R) reaction at short reaction times (of the order of 3 h), because the next step on the pathway is very slow.

In Model M, the ‘forbidden’ attL attR reaction in the absence of RDF (LxR(-R)) starts with the binding of four I molecules to LR (step b2 in figure 1b), forming the LRI2 complex. Crucially, the LRI2 complex is conformationally distinct from LRI1, the immediate product of P B recombination. Conversion of LRI2 into LRI1 (step ‘syn’) and its reverse are assumed to be very slow reactions, of the order of days). A hypothetical structure-based interpretation of these two complexes has been presented (figure 1c). The very slow interconversion of LRI1 and LRI2 complexes is the key feature that explains why the LxR(-R) reaction does not yield detectable levels of PB recombination product in short reactions (of the order of hours).

In Model M, the reaction of the LR substrate in the presence of RDF (LxR(+R)) starts by binding of integrase–RDF complexes to the attL and attR sites. We assume (based on published data) that each integrase monomer interacts in solution with one RDF monomer to form a 1:1 complex (IR). Therefore, four IR complexes bind to LR, forming a synaptic complex LRIR (figure 1b, step ‘b3’. Synapsis is followed by strand exchange step ‘r2’, forming a PB product synapse comprising IR-bound recombinant attP and attB sites, PBIR1 (figure 1b). Analogously to our hypothesis for PxB(-R) recombination (see above), we propose that PBIR1 is the typical endpoint of the LxR(+R) reaction. In the ‘forbidden’ PxB(+R) reaction, we assume that a different complex PBIR2 is formed by binding four IR complexes to PB, and that conversion of PBIR2 to PBIR1 (step ‘synr’) and its reverse are very slow.

We assume that all reaction steps except recombination (r1, r2) and the slow synaptic conformational change steps (syn,synr) are fast. This assumption allows us to reduce the dimensionality of the model by applying rapid equilibrium approximations to the other steps, including formation of integrase–RDF complex IR in solution and binding of integrase(with or without RDF) to DNA. After these simplifications, four slowly changing variables (the concentrations of LRI1, PBIR1, total PB and total LR) remain, of which only three are independent, because of conservation of the total DNA pool. We chose the three quantities LRI1, PBIR1 and total PB as the independent slow dynamic variables, with total LR being determined by the difference between total DNA and total PB.

Applying the rapid equilibrium approximation yields expressions for the concentrations of the equilibrating complexes as functions of free I, R, LR and PB. Kir, KbI and KLRi are the dissociation constants for the respective complexes. For simplicity and because it was sufficient to describe the data, we assume that all of the complexes of integrase (or integrase and RDF) with DNA (complexes formed in steps b1, b2, b3, b4) have the same dissociation constant KbI.

For more detail please refer to the paper in [1].

[1]Pokhilko, Alexandra & Zhao, Jia & Stark, W. & Colloms, Sean & Ebenhöh, Oliver. (2017). A simplified mathematical model of directional DNA site-specific recombination by serine integrases. Journal of The Royal Society Interface. 14. 20160618. 10.1098/rsif.2016.0618.


[2] Zhao J, Pokhilko A, Ebenhöh O, Rosser SJ, Colloms SD. A single-input binary counting module based on serine integrase site-specific recombination. Nucleic Acids Res. 2019 May 21;47(9):4896-4909. doi: 10.1093/nar/gkz245. PMID: 30957849; PMCID: PMC6511857.


[3] Pokhilko A, Zhao J, Ebenhöh O, Smith MCM, Stark WM, Colloms SD. 2016.The mechanism of φC31 integrase directionality: experimental analysis and computational modelling. Nucleic Acids Res. 44, 7360–7372. (doi:10.1093/nar/gkw616) PubMed, ISI, Google Scholar


[4] W Marshall Stark, Making serine integrases work for us, Current Opinion in Microbiology, Volume 38, 2017, Pages 130-136, ISSN 1369-5274, https://doi.org/10.1016/j.mib.2017.04.006.


[5] Bowyer, Jack & Zhao, Jia & Subsoontorn, Pakpoom & Wong, Wilson & Rosser, Susan & Bates, Declan. (2016). Mechanistic Modeling of a Rewritable Recombinase Addressable Data Module. IEEE Transactions on Biomedical Circuits and Systems. 10. 1-10. 10.1109/TBCAS.2016.2526668.


[6] Pokhilko, Alexandra & Ebenhöh, Oliver & Stark, W. & Colloms, Sean. (2018). Mathematical model of a serine integrase-controlled toggle switch with a single input. Journal of The Royal Society Interface. 15. 20180160. 10.1098/rsif.2018.0160.

Traditional continuous and deterministic biochemical rate equations do not accurately predict cellular reactions since they rely on bulk reactions that require the interactions of millions of molecules. They are typically modeled as a set of coupled ordinary differential equations.

We would like to explore the different possible outcomes and have a look at the extreme cases of what could happen to our genetic circuit assuming the premise of randomness of the reactions to be true, to properly model the inherent uncertainty of the system. For this part we used the Gillespie Algorithm, which generates a statistically correct trajectory (possible solution) of a stochastic equation system for which the reaction rates are known.

Figure 1. Stochastic trajectory for forward recombination
Figure 2. Stochastic trajectory for reverse recombination

Figures 1,2 assume an initial population of 50 molecules. We can see how the stochastic trajectories are in good agreement with the ones from the deterministic approach. As one increases the number of molecules, the concentrations´ trajectories are seen cleaner and converge to the results obtained from the ODE model. So we would expect even if we use a small number of concentrations the process of recombination still would happen.

Here we present a document with all the steps to execute and reproduce our implementation of the Gillespie Algorithm, hopefully it will be helpful for some iGEM teams.

Also, you can find our code and instructions on how to execute it in the files “GillespiePB.m” and “GillespieLR.m” in our Github page.

A Markov model is a stochastic model used to model pseudo-randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it (that is, it assumes the Markov property). Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. It is highly used in modeling finite state machines. We think that this description perfectly fits in our case, because as we are working with a sequential logic device, our circuit is basically a state machine.
In Figure 1 we can see the transition probabilities diagram for the case of two recombinases. Transition probabilities were extracted from [1].

Figure 1. Transition probabilities diagram for the proposed Markov model



Based on Figure 1 we can fill our transition probability matrix:

( 0.01 0.99 0 0 0 0.1 0.9 0 0 0 0.19 0.81 0 0 0 1 )

Where each row represents the probability to move to other states, being the first row for state 1, row 2 state 2, row 3 state 3 and row 4 state 4. The rows must sum up to 1. Columns represent the state to which we arrive, being column 1 state 1 and so on.

Our initial vector would be [1 0 0 0], which means that we want to start at the state 1 of our counter and what we want to calculate is the probability to be at state 4(end of the counter) after a certain number of steps, let's say n steps. Following the Markov processes theory this can be found by multiplying the initial vector times the probability transition matrix raised to the nth power, that would give us a vector and as we are interested in finding the probability to reach the final state so we have to retrieve the last entry of the vector.

Figure 2. Cumulative probability of having reached the final state of the counter.

In Figure 2 we can see the probability of bacteria to complete the counting cycle for 2 recombinases, in a certain number of steps. As there is a certain probability that in a step does not occur recombination, so that step would need to be completed again, so the Markov model takes into account all those possible combinations of delays in the counting sequence. As seen in Figure 2, by step 7 we can expect the whole population to finish the process. Notice that for example for the ideal case(making only 3 recombinations to finish the process) there is a probability of 70% to happen. For this model is only needed as an input the number of recombinases and their respective probability to recombine.

Figure 3. Probability to reach the final state of the counter in a certain number of steps.

Figure 3 is a variation of Figure 2, in this case it is useful to know the proportion of the whole population of bacteria that will complete the process in exactly a certain number of steps. As we can see a greater percentage of the bacteria will finish the counting without any delay(3 recombinations), and from this point on the probability is decreasing with every step.

In order to do the plots and calculate the probabilities, we developed a script in Matlab called “MarkovModel.m”, which you can access in our Github page.

References
[1] Song-Yuan, Zhang & Qiu, Jianhui. (2018). Design of recombinase and terminator-based genetic switches for cell state control.

We thought of modeling to help us at all stages of the project, therefore we imagined the situation in which we already liberated the bacteria with our counter and how the population would change in time in those cases. For example, if we want to use bacteria to decompose oil in the open sea, how many initial bacteria would we need to do such a task? How do we know if that bacteria will eventually die, and not colonize the ecosystem?

To answer these questions, we propose this model, which predicts the population of bacteria over time. It is tightly connected with the markov model we presented, because it uses the probabilities given by it. The model works as follows: Each 20 minutes(doubling time) we perform a “killing” stage given by the probability of the markov model of the percentage of the whole population that should be alive given that it already completed the counting cycle and therefore activated the killswitch mechanism and died, but also the quantity of bacteria left after that process duplicates following binary fission. It is important to mention that we assume that every child cell inherits the counting stage from its parent cell.

We did an application aiming to be useful if someone wants to simulate such a scenario. You only need to proportionate the number of recombinases you are using and the initial population of bacteria. You can find it as “Max_population.mlapp” in our Github repository.

Figure 1. Population of bacteria over time, using 4 recombinases with an initial population of a million bacteria.

This model presents a consensus between the growth of the population and how the counting circuit would affect this, assuming the bacteria are in an exponential phase.

We also kept track of the total population of bacteria that was present in the process, which was for this case 89564 million of bacteria.

As a part of the engineering cycle, we got some feedback about this model, and realised that the premise of exponential growth is not applicable in many cases, due to lack of infinite resources in order for the bacteria to grow indefinitely. For that reason, we also implemented a scenario in which the bacteria follow the typical growth curve and its phases(lag, exponential, log and dead), to be more realistic and useful for example in environments such as laboratories where no infinite resources or space exists.

Figure 2. Population of bacteria over time, using 4 recombinases with an initial population of a million bacteria, following bacterial growth phases.

As we see in Figure 2, although the form of the graph does not change, the maximum value attained is 100 times smaller than the one reported in Figure 1. The total population of bacteria that was present in the process was 1399 million of bacteria, about 64 times less than the reported for the case in Figure 1. These results were expected as in this limited growth condition the overall population will be fewer than in the infinite resource case.

This model allows us to predict the sequence of the counting circuit(which of the integrase expresses at each time). Also, as the way the circuit does the counting seems to not make sense, we can translate the way it looks into the number it represents to have a clear conceptualization, and also we can determine given a specific decimal number, how it would translate into the architecture of our counter.

To develop a single model that can give us the answer to all of the above, we proposed a concept used in computer science and other branches of engineering and mathematics, a decision tree. A decision tree is a graph, in which each node has two children, and it is constructed in a manner that follows simple rules. For example, in our tree the parent node(the one at the top), will always be the recombinase that encompasses the others in our genetic circuit or equivalently the one that is furthest to the right in the initial state. We are following a convention of colors inspired by this animation {link slides Marco}, in which green recombinase encompasses the blue one, the pink encompasses the green one and the blue, and the yellow recombinase encompasses all of the former. In that way we construct each layer of the tree. Now, to construct how we could walk the tree, we can ask a simple question: Is the terminator corresponding to each recombinase down in the sequence? If the answer is “no”, then go always to the left of the node, if the answer is “yes” then go always to the right.

Figure 1. Construction of our recombinase-based decision tree.

By following this construction we found that all of the numbers from 1 to 2^n(where n is the number of recombinases) can be achieved, and more surprisingly it does the counting in an ordered way, as seen in Figure 1.

Another important and interesting aspect of this construction is its modularity. For example, look in Figure 1 that the tree for 4 recombinases is just composed by two decision trees of 3 recombinases, the tree for 3 recombinases is composed by two trees of 2 recombinases and so on. This modularity makes it possible to construct the tree recursively regardless of its size.

We can derived a way to translate from the genetic circuit sequence to a decimal number:

Figure 2. Animation of how the counting is done and what recombinases are expressed at each time.

As seen in Figure 2, we can know (based on the rules of construction of the tree) which terminators are going to be up or down in every step of the counting, so just by seeing them we can know which decimal number they represent. It is important to notice that each combination of terminators up or down for every recombinase is unique by the construction rules of the tree i.e. you can not have two numbers that possess the same combination of terminators up or down.

Figure 3. Representation of decimal number 11 in our circuit architecture

For example, by seeing only the genetic circuit in Figure 3, how can we know which number represents that configuration? Well, we just have to ask ourselves the question, are terminators down? So for the yellow recombinase that would be a yes, for the pink no, for the green yes and for the blue no. So we come back to Figure 2 and search for the path yes->no->yes->no, and as we see we encounter number 11 at the end of the road.

Also, we can translate from the decimal number into how it will look in our genetic circuit:

By the construction of our circuit, we designed it in a way such that when the furthest recombinase to the right in the initial state recombinases(or flips), the others are flipped with it as well, because its recombinant sides are placed on the edges of the circuit. Notice that in this case of 4 recombinases, once the yellow recombinases flip there is nothing we can do to change its state because there is nothing in the counter which allows us to flip it again, so if we see the terminators of the counter as four empty ordered boxes, the terminator for the yellow recombinase can be only either in the las box up of in the first one down, if it already made the process of recombination. The same reasoning applies for the rest, the pink recombinase can only be either in the last or in the first box available depending on if it has already flipped and so on.

So, let's suppose we want to know how number 7 would look like in our architecture. So again we refer to Figure 2, we stand on where number 7 is and walk the tree upwards saving the “yes” and “no”´s that we find. That would be: yellow no, pink yes, green yes and blue no.
So as the yellow terminator is up, it means it has not been flipped or has not made the process of recombination yet, so the yellow terminator would be in the last of the four available spaces up. The pink terminator is down, which means it has already flipped and has to be in the first space on the left that is available, so that would be position one. The green terminator is down, so it is in the second position from left to right and finally the blue terminator can only be in the third position. We can see our correct prediction in Figure 4.

Figure 4. Representation of decimal number 7 in our circuit architecture

Finally we found another interesting property, taking a concept from binary search trees, if we traverse the tree In-Order, we get the sequence of what recombinase is expressed at what time. This is very important because it allows us to predict how many times a recombinase will flip, and base on that take decisions, for example the blue recombinase expresses once every two recombinations, the green once every four, the pink once every 8, the yellow once every 16 recombinations and so on, so it makes sense to put the recombinase that exhibits the best flipping efficiency as the first one in our circuit to increase the probability of success, since it is the one which will be more used.

Figure 5. In-Order traversal of our tree.

The algorithm for In-Order traversal is as follows:
1. Traverse the left subtree, i.e., call Inorder(left-subtree)
2. Visit the root.
3. Traverse the right subtree, i.e., call Inorder(right-subtree)

A more casual and easy way to visualize it is if we imagine that there is a huge piano at the top of the tree, and when it falls crashes the tree nodes projecting them into the floor, as seen in Figure 5.

As stated by Austin UTexas iGEM team 2019: “A major problem in synthetic biology is evolutionary instability: when a genetic construct is transformed into a cell, some resources are allocated towards expression of the construct. This introduces a cellular resource competition between it and the host genome, creating additional cellular burden. This makes engineered bacterial populations less fit than the wild type. Over time, cells accumulate loss-of-function mutations within the construct, freeing cellular resources for the genome and increasing cell fitness. The population cannot withstand the burden associated with the construct for a sustained number of generations, so individual cells' constructs evolve as a way to relieve this burden and, eventually, the mutation sweeps the population” [1].



We used their approach in order to study the impact of metabolic burden in our system´s performance. The 2020 Austin UTexas iGEM team created a list of measurements of burden value for 304 biobricks. We were able to find 7 that are part of our circuit. You can find the entire list in [2].

Part Burden Value
BBa_B0010 -2.1 +- 3.4%
BBa_E1010 -1.1 +- 2.5%
BBa_B0030 0.2 +- 0.9%
BBa_B0032 3 +- 6.2%
BBa_B1006 4.3 +- 4.4%
BBa_I0500 6.7 +- 9.5%
BBa_J23100 20 +- 9.9%
Table 1. Parts of our circuit with their respective metabolic burden value.

The model shows when parts break within a population of E. coli according to that part’s burden value. Therefore, the model marries the concepts of burden and evolutionary stability within our project to create a predictive tool simulating the evolutionary stability of parts based on their burden values.
Higher burden values (percent reduction in growth rate), correspond to parts breaking at an earlier generation time. For example, a part that has a burden value of 0.4 and thus imparts a 40% reduction in growth rate.
Also they made a tool to simulate the behaviour of the part, you can change the burden value,mutation rate, the culture in which you are going to perform the experiment and stochastic simulations. You can use the tool in [3].
As to date only individual parts burden has been studied, it's currently hard to predict the burden of an entire device from that of its parts. So we adopted an approach of modeling the part with the higher burden value, in our case BBa_J23100, to have an idea that in a best case scenario how our circuit is going to behave.

Figure 1. Evolutionary Failure of an Engineered Cell Population

As seen in the above figure, it is predicted that our circuit would likely break at 100 generations or cell doublings. The red lines are stochastic simulations and the black one the deterministic. The straight blue lines mean the number of generations at which saturation is reached, depending on the environment in which we perform the experiment, from left to right they are colony, test tube, flask and bioreactor.

References

[1] https://2019.igem.org/Team:Austin_UTexas/Model
[2] https://2020.igem.org/Team:Austin_UTexas/Contribution
[3] https://barricklab.org/shiny/burden-model/