Overview
The goals of our modeling work are to:
- Build an independent, easy-to-use simulation of every part of our design related to SCFA detection.
- Analyze their function both qualitatively and quantitatively.
- Predict their behavior and response under a range of conditions.
- Use the data produced to improve the design of the wet lab work.
G Protein-Coupled Receptor (GPCR) Module
For the modeling of our FFAR2 GPCR, we started from a previously documented and developed SBML model (Heitzler et al., 2012), which we obtained by consulting the BioModels database (Li et al., 2010). Many modifications were needed, though, because the scarcity of documentation on FFAR2 meant our research could only be narrowed down to receptors that conduct β-arrestin-mediated signaling in general, rather than to the specific FFAR2 type. We also removed various second messengers and signals and fine-tuned the model to better depict our project design.
Figure 1: The FFAR2 GPCR genetic circuit. Since the model contains no protein degradation step, we assume that receptor degradation after cleavage does not occur; we plan to include such reactions in an updated version of the model.
The simulations below indicate that the main effectors of the detection module, β-arrestin and the GPCR, need to be present at an optimal ratio at the protein level for the best signal to be produced. In future iterations, the wet lab team will experiment with different combinations of promoters/RBSs and inducer concentrations to achieve optimal signal production.
Figure 2: Normal function of the GPCR. The yellow curve represents β-arrestin 2 concentration and the red curve the LacI output signal.
Figure 3: The GPCR’s output signal (LacI) as the concentration of SCFAs increases.
Figure 4: The GPCR’s output signal (LacI) as β-arrestin 2 concentration increases. Higher β-arrestin 2 concentrations result in faster convergence of LacI to its final value and oscillations of smaller amplitude.
lac operon: the NOT Gate
To model the NOT gate of our detection system, we employed components of the lac operon, a well-documented paradigm of genetic regulation with bi-stable, switch-like behavior that is also used in the wet lab design. We focused on the core of the negation logic, namely the repressor-operator interaction in the lac operon and the chemical kinetics surrounding it. The reactions, parameters and rate rules of the first version of our model were derived from Stamatakis and Mantzaris (2009).
Degradation reactions:
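Following the first-order degradation assumption listed in Technical Details and the λ constants tabulated below, they take the form sketched here (degradation products are omitted; the exact reaction set is that of Stamatakis and Mantzaris, 2009):

$$
M_R \xrightarrow{\lambda_{M_R}} \varnothing,\qquad
M_Y \xrightarrow{\lambda_{M_Y}} \varnothing,\qquad
R \xrightarrow{\lambda_{R}} \varnothing,\qquad
R_2 \xrightarrow{\lambda_{R_2}} \varnothing,
$$
$$
Y \xrightarrow{\lambda_{Y}} \varnothing,\qquad
YI_{ex} \xrightarrow{\lambda_{YI_{ex}}} \varnothing,\qquad
I_2R_2 \xrightarrow{\lambda_{I_2R_2}} \varnothing .
$$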
Given that [O]T = [O] + [R2O] (the total operator concentration is constant and equal to the free operator concentration plus that bound to the repressor), the mass balances are depicted in the following equations:
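Two representative balances, written with the symbols and rate constants defined in the tables below, illustrate their form (the full set follows Stamatakis and Mantzaris, 2009):

$$
\frac{d[M_R]}{dt} = k_{sM_R} - \lambda_{M_R}[M_R],
\qquad
\frac{d[R]}{dt} = k_{sR}[M_R] - 2k_{2R}[R]^2 + 2k_{-2R}[R_2] - \lambda_{R}[R],
$$

with the repressor-bound operator eliminated through the conservation relation $[R_2O] = [O]_T - [O]$.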
| Symbol | Species Denoted |
| --- | --- |
| MR | LacI repressor mRNA |
| R | LacI repressor monomer |
| R2 | LacI repressor dimer |
| O | lacO operator |
| R2O | Repressor-operator complex |
| I | Intracellular inducer IPTG |
| Iex | Extracellular inducer IPTG |
| I2R2 | Repressor-inducer complex |
| MY | LacY permease mRNA |
| Y | LacY permease |
| YIex | Permease-inducer complex |
The parameters that appear in the reaction model:
| Symbol | Value | Units | Description |
| --- | --- | --- | --- |
| [O]T | ~2.8 | nM | total operator concentration |
| ksMR | 0.23 | nM·min⁻¹ | lacI transcription rate |
| ksR | 15 | min⁻¹ | LacI monomer translation rate constant |
| k2R | 50 | nM⁻¹·min⁻¹ | LacI dimerization rate constant |
| k-2R | 10⁻³ | min⁻¹ | LacI dimer dissociation rate constant |
| kr | 960 | nM⁻¹·min⁻¹ | association rate constant for repression |
| k-r | 2.4 | min⁻¹ | dissociation rate constant for repression |
| kdr1 | 3×10⁻⁷ | nM⁻²·min⁻¹ | association rate constant for 1st derepression |
| k-dr1 | 12 | min⁻¹ | dissociation rate constant for 1st derepression |
| kdr2 | 3×10⁻⁷ | nM⁻²·min⁻¹ | association rate constant for 2nd derepression |
| k-dr2 | 4.8×10³ | nM⁻¹·min⁻¹ | dissociation rate constant for 2nd derepression |
| ks1MY | 0.5 | min⁻¹ | lacY transcription rate constant |
| ksOMY | 0.01 | min⁻¹ | leak lacY transcription rate constant |
| ksY | 30 | min⁻¹ | lacY translation rate constant |
| kp | 0.12 | nM⁻¹·min⁻¹ | LacY-IPTGex association rate constant |
| k-p | 0.1 | min⁻¹ | LacY-IPTGex dissociation rate constant |
| kft | 6×10⁴ | min⁻¹ | IPTG facilitated-transport rate constant |
| kt | 0.92 | min⁻¹ | IPTG passive-diffusion rate constant |
| λMR | 0.462 | min⁻¹ | lacI mRNA degradation constant |
| λMY | 0.462 | min⁻¹ | lacY mRNA degradation constant |
| λR | 0.2 | min⁻¹ | repressor monomer degradation constant |
| λR2 | 0.2 | min⁻¹ | repressor dimer degradation constant |
| λY | 0.2 | min⁻¹ | LacY degradation constant |
| λYIex | 0.2 | min⁻¹ | LacY-inducer complex degradation constant |
| λI2R2 | 0.2 | min⁻¹ | repressor-inducer complex degradation constant |
Figure 5: The standard lac operon genetic circuit.
To create a detectable signal, we used GFP (Green Fluorescent Protein) in our simulations. Specifically, we replaced the model’s LacY output: instead of producing LacY and activating the IPTG positive feedback loop, our engineered system expresses GFP. We therefore needed to define a standard input and a standard output for this module, and the solution was to “open” the lac operon genetic circuit by cutting off the feedback loop the permease is responsible for.
Nevertheless, explicit literature on the mathematical modeling of GFP expression kinetics proved difficult to find, mainly because most of the literature we encountered features GFP as a reporter module in case studies of cells of a particular type and provides, at best, experimental expression/fluorescence data fitted with various methods, rather than deterministic kinetic equations for its expression.
Lastly, we set LacI at the protein level as the system’s input, ignoring only the mRNA_LacI species, since LacI arrives as an input from the preceding in-line detection module rather than being expressed here, and we updated our equations with all of the above changes to ensure they still follow mass balance principles. For a comprehensive design of the Detection System, see our Engineering page.
The diagrams below show the concentrations of the various species during model simulations, with and without induction by IPTG, the system’s fail-safe inducer.
Figure 6: The lac operon modified, “open” genetic circuit.
Figure 7.1: Comments on figure 7.2.
Figure 7.2: The response of the system when it is not induced. In both cases the extracellular IPTG is equal to zero; the difference is that in the first case LacI is present and in the second it is absent. As a result of non-induction, the final quantity of GFP is very close to zero, meaning that no reporting sequence is triggered. The repression, on the other hand, takes place regardless, through the stray repressor and operator quantities present in the model.
Figure 8.1: Comments on figure 8.2.
Figure 8.2: The response of the system when it is induced. There is a non-zero initial quantity of extracellular IPTG in both cases. In the first case LacI is absent, while in the second it is present. With a little tuning of the initial extracellular IPTG value, we can see that when initial LacI is equal to zero, LacO (the operator) is temporarily repressed by stray quantities of LacI dimer and then produced normally, whereas when initial LacI is non-zero, LacO is repressed to a negligible value. This encapsulates the NOT gate logic. Of course, in contrast to the previous diagrams, here we observe a very large production of GFP.
Reporter Module
At first, in order to understand how our reporter module behaves in modeling terms, we went through a vast literature dealing with the electronic interrogation of biological systems that perform redox signaling. Redox signaling is signal transduction that involves redox reactions, and consequently changes in a solution’s charge, and depends strictly on the kinetics and thermodynamics of electron movement. In our research we also encountered systems that, interestingly enough, perform redox cycling, that is, a reduction and an oxidation reaction occurring back and forth in a periodic manner, a behavior that describes systems such as genetic oscillators. So, one intriguing question we had to answer was: “Does our reporter perform redox cycling?”. It turned out, after some discussion with the wet lab, that in our system there are no oscillatory redox phenomena: the production of L-DOPA quinone from L-DOPA is a one-way reaction (oxidation), and the release of electrons is constant, producing a steady signal. Note that this realization was the fruit of a better understanding of the system’s inherent characteristics, rather than an assumption.
That being said, the final version of our lac operon module has to produce tyrosine to activate the reporting sequence, so GFP needs to be replaced with tyrosine. We need to be cautious not to lose any information in this “replacement” step, though: GFP is an imaging, semi-quantitative indicator of a biological process. Its expression is not linearly related to the process it reports, but is an approximate gauge of how much of it is happening. Its expression also introduces a lot of noise; across the literature it is modeled only with the help of stochastic methods and algorithms, and the only way to extract a kinetic formula for it is by fitting experimental data. The questions that arise are: “Does tyrosine behave the same way?” and “What are the differences in tyrosine’s expression kinetics?”. In our modeling, all of the lac operon’s expression kinetics are deterministic equations based on mass balance principles, and we make no exception for tyrosine or GFP. In fact, our initial equations are more likely to predict tyrosine’s kinetics accurately (tyrosine production is usually assumed to follow formulas such as Michaelis-Menten) than GFP’s vague, random expression. Adopting our initial equations for tyrosine therefore rests on an assumption that is in no way far-fetched, and results in more precise modeling overall.
Figure 9: The Reporter Module genetic circuit.
To simulate the reporter module’s first-order reactions deterministically, we used Mass Action kinetics, one of the most general-purpose, versatile options SimBiology provides for a reaction. We also tuned the tyrosine and tyrosinase initial values experimentally so that the results appear within a specific time frame (approx. 12 minutes). As expected, the concentration curves change very smoothly and converge with certainty to their final values, while L-DOPA quinone, our final product, whose concentration quantifies the final signal, reaches a saturation level that can be translated into a steady, constant signal. An interesting observation is that with the ode45 solver we get curves that are more angular and less precise, and the simulation runs for longer, whereas with ode15s the results are smoother, more refined and obtained more quickly, meaning that a solver made for stiff problems works better for this module. For more on solvers, see the Technical Details section.
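To illustrate the comparison, the following minimal sketch (not our actual SimBiology project; the two-step chain, rate constants and initial values are hypothetical placeholders standing in for the reporter reactions) solves the same mass-action ODEs with both solvers and reports their run times:

```matlab
% Minimal sketch: a two-step mass-action chain standing in for the reporter
% reactions, tyrosine -> L-DOPA -> L-DOPA quinone. The rate constants k1, k2
% and the initial values are hypothetical placeholders.
k1 = 1.2;  k2 = 0.8;                         % assumed first-order rate constants (1/min)
rhs = @(t, y) [ -k1*y(1);                    % d[tyrosine]/dt
                 k1*y(1) - k2*y(2);          % d[L-DOPA]/dt
                 k2*y(2) ];                  % d[L-DOPA quinone]/dt
y0    = [50; 0; 0];                          % assumed initial concentrations (nM)
tspan = [0 12];                              % minutes, matching the time frame above

tic; [tA, yA] = ode45(rhs, tspan, y0);  timeNonstiff = toc;   % non-stiff solver
tic; [tB, yB] = ode15s(rhs, tspan, y0); timeStiff    = toc;   % stiff solver

plot(tB, yB(:,3), '-', tA, yA(:,3), '--');
legend('L-DOPA quinone (ode15s)', 'L-DOPA quinone (ode45)');
xlabel('time (min)'); ylabel('concentration (nM)');
fprintf('ode45: %.3f s, ode15s: %.3f s\n', timeNonstiff, timeStiff);
```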
For the stochastic simulation, after some searching we selected the implicit tau solver as the best fit for this case (we will not expand on the implicit tau-leaping method, as it would most likely take our analysis off topic, into concepts that are not easily digestible). Using the same set of initial values as before, the curves appear to follow roughly the same paths, but in a more linear, steeper fashion. In the L-DOPA quinone concentration specifically, we observe a spike-like rise somewhere between 2 and 8 minutes in most runs. Note that the moment this rise appears, as well as its slope, vary from run to run of the simulation. Also, in a sizable fraction of runs the concentration of L-DOPA quinone remains at zero, exhibiting the effects of intrinsic noise and randomness: noise is always present, but those are the runs in which it is observed at full strength.
Figure 11: The results of stochastic simulation with the implicit tau solver, for 25 runs. Paths that appear more intense and bold are more frequently followed.
Having solved the problem in two different ways, we can now compare them. We are confident that such a comparison will at least provide a helpful tool for better understanding the problem, if not the ideal solution. The deterministic solution guarantees smooth, predictable curves. They can easily be discretized, quantized down to measurement points at given moments, and then recreated with little or no loss of information using mathematical methods (interpolation, regression etc.). This makes for an effortlessly processable final signal. Nevertheless, it is not as realistic as it should be, considering it treats the problem as a textbook one, isolated from any source of external noise or disturbance, a factor that is well integrated in the stochastic solution. Stochasticity depicts fairly the effects of randomness in the genetic circuit. In spite of that, it does not grant us high-quality, precise data: when a circuit’s components are few in number and the circuit is relatively small and simple, the possible routes the curve can follow on the 2D plane (or, formally, the potential states of the system) are drastically reduced, and so is the noise, making a stochastic simulation seem inaccurate and pointless. That happens because the factors contributing to any divergence from a standard curve are very few, just like the circuit’s components. This proportionality between the number of components and the “realism” of noise discourages us from using a purely stochastic solution, in the same way that the lack of random phenomena or noise prevents us from applying a genuinely deterministic one. If forced to choose between the two, we would opt for determinism, for its quality of data, consistency and ease of processing. We regard both methods as suitable or unsuitable for different reasons, though, which is why the ideal approach, offering optimal data quality, would be a hybrid one combining the two methods. This remark certainly inspires a future version of our modeling of the Reporter Module.
Signal Processing: A Noise Study
Thermal Noise: Also known as Johnson-Nyquist noise, this is the electronic noise caused by the thermal motion of electrons in an electrical conductor, and it is present in electronic circuits regardless of any applied voltage.
Biological Noise: Our main field of study. Biological noise is the unwanted divergence in a measurement, behavior or process at the gene level, due to the instability and randomness of protein transcription and translation. In the work of Swain, Elowitz & Siggia (2002), a clear distinction is made between intrinsic and extrinsic noise: intrinsic noise is the inherent stochasticity of transcription and translation, while extrinsic noise covers the variations caused by fluctuations in the amounts or states of other cellular components.
Roundoff Noise: A purely computational term. Roundoff noise is the noise produced by quantization errors within a filter when it is applied to a signal. Simply put, it is the information lost because of tolerances in the design of a digital filter; after all, nothing is 100% accurate in computer simulations. We explain the definition and use of filters later in this section.
This study primarily focuses on biological noise, and especially on intrinsic biological noise. Even though intrinsic noise is normally hard to distinguish from extrinsic noise, our model only simulates isolated processes of gene expression, so the only simulation noise observed will be intrinsic. We will also very briefly concern ourselves with roundoff noise.
The next question that arises is “How and why do we process a signal?”. There are, of course, numerous techniques, but if we had to pick one starting point it would definitely be Fourier analysis. Fourier analysis is the representation and analysis of a signal with Fourier series, a mathematical way to describe a signal that is initially a function of time, s(t), in terms of the signal’s spectrum. Or, formally:
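In standard notation (treating the signal as periodic with period P, and writing S[k] for the coefficients discussed below), the series takes the form:

$$ s(t) \;=\; \sum_{k=-\infty}^{\infty} S[k]\, e^{\,i 2\pi k t / P} $$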
Basically, it uses separate periodic signals, sinusoidal ones in fact, in the form of complex exponentials e^(i2πkt/P), where the integer k indexes the harmonics of the fundamental period P, multiplies each one by a different coefficient, and adds them up to make a new, approximate mathematical representation of the signal: a linear combination of periodic functions. But what is the spectrum quantity, or, in other words, the set of coefficients S[k] of that linear combination, and what does it represent?
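In its standard form, written as a function of continuous frequency f:

$$ S(f) \;=\; \int_{-\infty}^{\infty} s(t)\, e^{-i 2\pi f t}\, dt $$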
This is the Continuous Time Fourier Transform (CTFT), also known as the spectrum of a signal. Complex exponentials have by nature the ability to describe periodically changing quantities with ease, since they are periodic themselves, and that is one good intuitive reason to justify their presence in the formula. Each individual complex exponential is a periodic signal that oscillates in time with a certain frequency; as a piece of information, it represents a particular frequency that can be observed in the initial signal (the initial signal hardly ever “contains”, or better yet ranges over, just one frequency). An infinite summation (i.e. integration) over all these elementary “coefficient” signals gives us the initial signal’s spectrum, a new signal that represents the initial one as a function of frequency rather than time. With this algebraic manoeuvre we officially leave the time domain and enter the frequency domain. Of course, the process is completely invertible, so we can retrieve the signal’s time-domain representation at will.
Having distinguished “continuous” from “discrete”, we should notice that the quantity shown above is purely continuous with respect to time, being a weighted integral of complex exponentials. As a rule of thumb, quantities that contain a sum are discrete sequences, while ones that integrate over a variable are continuous. Bearing this informal yet on-point observation in mind, one might say that the Fourier series formula is discrete, but that is not the case, since it is an infinite sum. Hence, both representations are continuous-time representations, something we could not claim for them when examined with respect to frequency: as noted before, the initial signal is assumed to be periodic, so its frequency content is sampled (at the harmonics indexed by k). Generally, each method is used in correspondence with the data we process; the CTFT, for example, is applied only to continuous-time data. To move on, we will need a more computationally realistic representation, one that is discrete not only with respect to frequency but also with respect to time, and that would be the Discrete Fourier Transform (DFT):
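In its standard form, for a length-N sequence s[n]:

$$ S[k] \;=\; \sum_{n=0}^{N-1} s[n]\, e^{-i 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1 $$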
Admittedly, it differs from the previous representations only in small yet crucial details. In order to compute it efficiently, we used the Fast Fourier Transform (FFT). The FFT, as its name suggests, is an algorithm designed to compute the DFT in a much shorter execution time; it is the most popular way to perform this computation programmatically and is often cited as one of the most important algorithms developed in the 20th century.
As the DFT expression indicates, the calculations it performs are linear and polynomial: it is a summation of different components multiplied by specific coefficients. Without immersing our analysis in linear algebra, we will pay brief homage to the Fast Fourier Transform simply by emphasizing that its ingenuity lies in the way it represents these components as polynomials in vector notation and optimizes the multiplications and additions by breaking the two operations into smaller and smaller ones, a technique widely known as “divide and conquer”; there are many possible variations and implementations of this algorithm. In terms of algorithm analysis and complexity, the FFT performs O(N·log₂N) operations, whereas the conventional DFT takes O(N²). Roughly, this means that to compute the DFT on N points, the number of algebraic operations needed by the FFT is proportional to N·log₂N, orders of magnitude lower than the N² operations the direct DFT would require.
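As a small illustration of that gap (not part of our analysis pipeline; the test signal is an arbitrary placeholder), one can time a direct evaluation of the DFT sum against MATLAB’s built-in fft:

```matlab
% Illustrative sketch of the complexity gap between a naive DFT and the FFT.
N = 4096;
s = randn(N, 1);                                    % arbitrary test signal
n = (0:N-1).';

tic;
S_naive = zeros(N, 1);
for k = 0:N-1
    S_naive(k+1) = sum(s .* exp(-1i*2*pi*k*n/N));   % direct evaluation of the DFT sum
end
timeNaive = toc;

tic; S_fft = fft(s); timeFft = toc;                 % FFT-based computation

fprintf('naive DFT: %.3f s, fft: %.5f s, max |difference|: %.2e\n', ...
        timeNaive, timeFft, max(abs(S_naive - S_fft)));
```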
As we have already mentioned, the simulation data we process are categorized not only as continuous or discrete, but also as deterministic or stochastic. Strictly speaking, the methods examined above apply to deterministic signals, so, on paper, we had to follow another route: the autocorrelation function:
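In its standard form, for a stochastic process {Xt}:

$$ R_{XX}(t_1, t_2) \;=\; \mathrm{E}\!\left[\, X_{t_1}\, \overline{X_{t_2}} \,\right] $$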
In this case, our information signal is represented as a stochastic process {Xt}, in which the bar notation represents complex conjugation and E[·] is the expected value. The autocorrelation function measures the statistical correlation between values of a given stochastic process at two points in time, meaning that it quantifies the extent to which two chronologically neighboring values of that process are codependent and statistically affect one another. Intuitively, it is a representation that helps us understand how abruptly the signal of interest changes, and a crude measure of how random it is. Beyond providing a time-domain representation of the signal’s information, its real contribution to our analysis is that it enables us to extract a formal frequency-domain representation through the Power Spectral Density:
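In its standard (Wiener-Khinchin) form, assuming a wide-sense stationary process so that the autocorrelation depends only on the lag τ:

$$ S_{XX}(f) \;=\; \int_{-\infty}^{\infty} R_{XX}(\tau)\, e^{-i 2\pi f \tau}\, d\tau $$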
This is a common tactic worth mentioning: it takes the place of the Fourier transform and covers the need for a tool that lets us study signals in the frequency domain in many disciplines dealing with stochastic phenomena. Theoretically, this was the way to go, but in computational terms things are a little different. In order to write a MATLAB script that calculates the autocorrelation function, we had to use a function which, at a low level, uses the Fourier transform (the FFT algorithm) to compute the autocorrelation in the frequency domain and then converts back to the time domain using an inverse Fourier transform, as we learned from the function’s documentation (see “Autocorrelation function in the MATLAB language” in the References). Consequently, there was no point in implementing our code this way, since we gained nothing in performance or quality of the solutions. That said, it is easy to see why we chose to use the FFT directly from the beginning instead of taking the other route; to be fair, this method is certainly not unusual in applications that address non-deterministic signal processing, such as speech and sound processing.
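The following sketch illustrates that equivalence (it assumes the Signal Processing Toolbox for xcorr; the test signal is an arbitrary placeholder): the biased autocorrelation computed directly and the one computed through the FFT agree to numerical precision.

```matlab
% Sketch of the equivalence between a direct autocorrelation estimate and the
% FFT-based route (discrete Wiener-Khinchin relation).
N = 1024;
x = randn(N, 1);

% (a) biased autocorrelation estimate computed directly
acfA = xcorr(x, 'biased');              % lags -(N-1) ... N-1
acfA = acfA(N:end);                     % keep non-negative lags 0 ... N-1

% (b) the same quantity through the frequency domain
X    = fft(x, 2*N - 1);                 % zero-pad to avoid circular wrap-around
acfB = ifft(abs(X).^2) / N;             % inverse transform of the power spectrum
acfB = real(acfB(1:N));                 % non-negative lags

fprintf('max |difference| between the two routes: %.2e\n', max(abs(acfA - acfB)));
```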
The application of a filter in the frequency domain (known as filtering) can be as simple as a multiplication of two signal functions. As a curve, the filter is equal to zero in the frequency bands where the noise is present, and non-zero in the bands that carry the actual information. This multiplication represents the process during which the signal passes through the filter. A full explanation of this “passage” would also carry a heavy intuitive load and a time-domain perspective, but we describe the process only in the frequency domain for the sake of convenience, given that the time domain makes things far more complicated and requires the introduction of concepts that are unnecessary for the reader.
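As a toy illustration of this multiplication (the signal, the noise level and the cutoff frequency below are hypothetical placeholders, not values from our pipeline):

```matlab
% Toy illustration of filtering as a multiplication in the frequency domain.
fs   = 100;                                 % sampling rate (Hz)
t    = (0:1/fs:10-1/fs).';
x    = 1 ./ (1 + exp(-(t - 5)));            % slow, sigmoid-like "information" signal
xn   = x + 0.1*randn(size(x));              % the same signal corrupted by white noise

N    = numel(xn);
f    = (0:N-1).' * fs / N;                  % frequency of each DFT bin
fcut = 1;                                   % assumed cutoff frequency (Hz)
H    = double(f <= fcut | f >= fs - fcut);  % ideal low-pass mask (keep the mirrored bins too)

xf   = real(ifft(fft(xn) .* H));            % multiply the spectra, return to the time domain

plot(t, xn, ':', t, xf, '-');
legend('noisy signal', 'filtered signal'); xlabel('time (s)');
```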
To find the optimal technique for our analysis, we once again carried out a literature search and found an extremely wide range of filtering applications on biological signals for a variety of purposes. We observed that, for attenuating intrinsic noise created by a stochastic simulation, high-end techniques mostly use non-linear filtering (i.e. filters whose outputs are not linear functions of their inputs). As for the way they cut off the undesirable frequencies, these techniques feature low-pass (cutting off high frequencies), band-pass (cutting off all frequencies except a specific band) and band-stop (cutting off a specific band) filters. The vast majority of the methods we encountered dictated the application of a low-pass filter, which in some special cases comes with the risk of non-negligible signal distortion on account of accuracy issues. Particularly popular filtering techniques are Bessel filters (Chung & Kennedy, 1991; Bierwirth & Goellner, 2007), Kalman filters (Andrews, Yi & Iglesias, 2006) and Butterworth filters used for E. coli chemotaxis optimization (Chung & Kennedy, 1991). Honorable mentions include Gaussian and Laplacian filters used for image analysis (Walker, Nghe & Tans, 2016; Bokde, 2015) and Least Mean Squares filters used for cancelling noise in ECG signals (Bokde, 2015).
After considering all the possible options and consulting Oppenheim & Schafer (2010), we came to the conclusion that a very good solution to the noise problem is to design and use an Infinite Impulse Response (IIR) low-pass Butterworth filter, since it is a type of filter that combines simplicity, popularity and a relatively small loss of information in various applications (general-purpose or not). And so we did, by constructing the following data pipeline:
SimBiology offers us the ability to run multiple simulations, store all the results into datasheets, extract them to an .xlsx file, and save them there as independent and static data. With this in mind, we developed a MATLAB script that performs an automated analysis on them. Firstly, it parses the data, meaning it “reads” them and stores them into data structures the MATLAB language can manipulate. Then, it stores the average of the reporter module’s output values of all the runs for each specific moment (time value) in the simulation - averaging is a useful tool here, considering we are talking about a stochastic simulation. Finally, it transforms the data to frequency-domain data, applies the Butterworth filter as we previously described, converts them back to their time-domain representation and plots them accordingly, leaving us with smoother, more regular curves.
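A condensed sketch of what such a script might look like is shown below; the file name, sheet layout, filter order and cutoff frequency are assumptions for illustration, not the exact values of our implementation, and the filter-design functions assume the Signal Processing Toolbox.

```matlab
% Condensed sketch of the analysis pipeline described above.
runs = readmatrix('reporter_runs.xlsx');     % assumed layout: column 1 = time, columns 2..end = runs
t    = runs(:, 1);
y    = mean(runs(:, 2:end), 2);              % average the stochastic runs at each time point

fs   = 1 / mean(diff(t));                    % sampling rate implied by the time grid
fc   = 0.05 * (fs/2);                        % assumed low-pass cutoff (fraction of Nyquist)
[b, a] = butter(4, fc/(fs/2), 'low');        % 4th-order IIR Butterworth design

N    = numel(y);
H    = freqz(b, a, N, 'whole');              % filter response at the N DFT frequencies
yf   = real(ifft(fft(y) .* H));              % apply the filter as a frequency-domain product
% (an alternative is filtfilt(b, a, y), which applies the filter in the time domain with zero phase)

plot(t, y, ':', t, yf, '-');
legend('averaged stochastic output', 'filtered output'); xlabel('time (min)');
```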
There is one important thing to observe here: as mentioned in the Reporter Module section, due to the module’s simplicity there are few possible sources contributing to noise, and therefore a stochastic model gives us really steep, switch-like curves that stay at zero until they ascend linearly to a final value at a significant rate. This results in many samples of the curves being equal or close to zero, which strongly affects the averaging process and pulls the whole outcome towards zero. This supports the point made in the previous section about complexity in the reporter module, and confirms that a purely stochastic simulation of the reporter module takes a toll on accuracy. Please also note that this lack of accuracy is genuinely a consequence of the stochastic simulation and not of the noise-reduction process we designed, which makes the latter a tool that can be applied independently. Any magnitude distortion observed is a consequence of the filtering process but, again, may be negligible for a more complex reaction network. Lastly, the seemingly irrational fluctuations of the curve at the beginning and the end of the measurements, which also become negative in particular intervals, are most probably an overshoot caused by the peculiar manner in which the Fourier series behaves at jump discontinuities, widely known as the Gibbs phenomenon. Of course, it is reduced as we manually increase the accuracy of the FFT, at the cost of increased execution time.
Technical Details
| Scope | Assumption |
| --- | --- |
| lac operon | Repressor dimer binds to operator with 1-1 stoichiometry |
| lac operon | Inducer binding reaction with repressor vs repressor-operator: cooperativity is equal to 2 |
| lac operon | Operator-repressor complex does not degrade |
| lac operon | IPTG is non-metabolizable; it does not degrade |
| lac operon | Repressor does not degrade when bound to the operator |
| lac operon | All degradation reactions follow first-order kinetics |
| lac operon | The total DNA and operator concentrations are constant; the cell’s volume is constant; cell division and growth are neglected |
| lac operon | Extracellular IPTG concentration is constant |
| lac operon | All the species follow Mass-Action kinetics |
| Reporter Module | The charge present is directly proportional to L-DOPA concentration |
| Noise Study | The intrinsic biological noise is white and Gaussian |
To solve these problems, we first had to opt for the right simulation platform, and there are certainly numerous tools to choose from when it comes to genetic circuit simulation (Appleton et al., 2017; MacDonald et al., 2011; Marchisio & Stelling, 2009). Our main criteria were: level of abstraction, language, usability, compatibility and popularity. We chose MATLAB’s SimBiology tool: its level of abstraction is on the genetic circuit scale (a tool working on the DNA sequence scale would be too “low-level”, and a tool unable to simulate on the genetic scale would be too “high-level”), the programming language it uses is the well-known MATLAB language, it is very user-friendly, it is compatible with many document types (SBML files being a good example) and, finally, it is very popular for synthetic biology applications, with a big community supporting it. Lastly, it gives us the option of either creating the model and the simulation programs programmatically with MATLAB scripts or using its Graphical User Interface; we used each option as we saw fit.
Now, in the numerical methods domain, the capriciousness of the biological systems we mentioned before manifests itself as stiffness. An ODE problem is stiff when the equation(s) describing it are very sensitive to specific terms they include, which results in large, unpredictable variations between nearby solutions (solutions found at iterations close to one another in time). This, in turn, can make the error skyrocket, so the method of interest needs to adjust (reduce, essentially) its step to follow the rapid changes in the dynamically improving solutions. Stiffness has a major impact on a method’s stability: many methods are ruled out as inappropriate when a stiff problem is at hand, because they fail to converge when solving one, or fail to do so efficiently. The methods that are appropriate take many small steps rather than a few big ones, so stiffness is mainly an efficiency issue, considering that we can solve such problems, but at the cost of a significant rise in simulation time.
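A textbook illustration (not taken from our model) is the equation

$$ \frac{dy}{dt} = -1000\,\bigl(y - \cos t\bigr), $$

whose solutions relax toward cos t on a time scale of about 10⁻³, while the component of interest varies on a time scale of order 1; an explicit, non-stiff solver is forced to keep its step at the fast scale even after the fast transient has died out.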
Choosing a numerical solving method in our case means choosing a solver. When we started building a deterministic model of the lac operon using a set of ODEs, we used the standard ode45 (Dormand-Prince) solver, which under the hood uses an explicit Runge-Kutta method and is pretty much everyone’s first choice for solving an ODE problem (Runge-Kutta is a family of iterative methods that numerically solve ODE problems, whose “RK4” member is usually labeled “the classic Runge-Kutta method” or simply “the Runge-Kutta method”). Nevertheless, we observed the enormous simulation times that choice brought, soon suspected the stiffness of our equations, and considered experimenting with other solvers. The ones that stood out as the wisest options were ode23s and ode15s; the “s” in their names stands for “stiff”, meaning they specialize in solving such problems. Dr. Stamatakis later confirmed our switch of solvers, as well as our suspicion about the nature of the problem. Although there is no easy choice between the two, we stuck with ode15s, sacrificing a little speed for some accuracy.
The species-oriented philosophy of our model, as well as the useful advice of Dr. Stamatakis, led us to choose Mass Action kinetics to depict its dynamics. Mass Action kinetics is popular for transcription units whose proteins (monomers), once synthesized, dimerize, and it can provide a model receptive to assumptions that simplify it without distorting the information we extract. Mass Action, based on the homonymous law, introduces a set of ODEs, each associated with one species; the system is then represented compactly with a stoichiometric matrix and solved numerically. Certain enzymatic reactions were modeled with Michaelis-Menten kinetics, which is usually employed when an enzyme converts a substrate into a product. In these cases, a Mass Action treatment would require the solution of a full system of ODEs, something that is not necessary with Michaelis-Menten, since it allows us to make simplifications: when the binding of the enzyme to the substrate is much faster than both the dissociation of the enzyme from the substrate and the conversion of the complex into the product with release of free enzyme, the enzyme-substrate complex changes so slowly over time that it can be treated as if it were at equilibrium (Marchisio M. A., 2018).
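Under that assumption, the familiar Michaelis-Menten rate law follows, with [S] the substrate concentration, [E]_T the total enzyme concentration and K_M the Michaelis constant:

$$ v \;=\; \frac{d[P]}{dt} \;=\; \frac{V_{\max}\,[S]}{K_M + [S]}, \qquad V_{\max} = k_{cat}\,[E]_T . $$

Under the rapid-equilibrium assumption described above, K_M reduces to the dissociation constant of the enzyme-substrate complex.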
References
- Amalthea: A modular platform for monitoring gastrointestinal health. (2020). iGEM 2020, Team Thessaly. https://2020.igem.org/Team:Thessaly/Model
- Andrews, B. W., Yi, T. M., & Iglesias, P. A. (2006). Optimal noise filtering in the chemotactic response of Escherichia coli. PLoS Computational Biology, 2(11), 1407–1418. https://doi.org/10.1371/journal.pcbi.0020154
- Appleton, E., Madsen, C., Roehner, N., & Densmore, D. (2017). Design automation in synthetic biology. Cold Spring Harbor Perspectives in Biology, 9(4). https://doi.org/10.1101/cshperspect.a023978
- Bartley, B. A., Kim, K., Medley, J. K., & Sauro, H. M. (2017). Synthetic Biology: Engineering Living Systems from Biophysical Principles. Biophysical Journal, 112(6), 1050–1058. https://doi.org/10.1016/j.bpj.2017.02.013
- Bierwirth, M., & Goellner, J. (2007). Modeling of electrochemical noise transients. Materials and Corrosion, 58(12), 992–996. https://doi.org/10.1002/maco.200704093
- Bokde, P. R. (2015). Implementation of adaptive filtering algorithms for removal of noise from ECG signal. 6(1), 51–56.
- Cabrita, L. D., Gilis, D., Robertson, A. L., Dehouck, Y., Rooman, M., & Bottomley, S. P. (2007). Enhancing the stability and solubility of TEV protease using in silico design. Protein Science, 16(11), 2360–2367. https://doi.org/10.1110/ps.072822507
- Chung, S. H., & Kennedy, R. A. (1991). Forward-backward non-linear filtering technique for extracting small biological signals from noise. Journal of Neuroscience Methods, 40(1), 71–86. https://doi.org/10.1016/0165-0270(91)90118-J
- Dogra, S., Sona, C., Kumar, A., & Yadav, P. N. (2016). Tango assay for ligand-induced GPCR-β-arrestin2 interaction: Application in drug discovery. In Methods in Cell Biology (Vol. 132). Elsevier Ltd. https://doi.org/10.1016/bs.mcb.2015.11.001
- Gomez-Uribe, C., Verghese, G. C., & Mirny, L. A. (2007). Operating regimes of signaling cycles: Statics, dynamics, and noise filtering. PLoS Computational Biology, 3(12), 2487–2497. https://doi.org/10.1371/journal.pcbi.0030246
- Heitzler, D., Durand, G., Gallay, N., Rizk, A., Ahn, S., Kim, J., Violin, J. D., Dupuy, L., Gauthier, C., Piketty, V., Crépieux, P., Poupon, A., Clément, F., Fages, F., Lefkowitz, R. J., & Reiter, E. (2012). Competing G protein-coupled receptor kinases balance G protein and β-arrestin signaling. Molecular Systems Biology, 8(590). https://doi.org/10.1038/msb.2012.22
- Hsiao, V., Swaminathan, A., & Murray, R. M. (2018). Control theory for synthetic biology: Recent advances in system characterization, control design, and controller implementation for synthetic biology. IEEE Control Systems Magazine, 38(3), 32–62. https://doi.org/10.1109/MCS.2018.2810459
- Ji, Z., Yan, K., Li, W., Hu, H., & Zhu, X. (2017). Mathematical and computational modeling in complex biological systems. BioMed Research International, 2017. https://doi.org/10.1155/2017/5958321
- Li, C., Donizelli, M., Rodriguez, N., Dharuri, H., Endler, L., Chelliah, V., Li, L., He, E., Henry, A., Stefan, M. I., Snoep, J. L., Hucka, M., Le Novère, N., & Laibe, C. (2010). BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models. BMC Systems Biology, 4, 92. https://doi.org/10.1186/1752-0509-4-92
- MacDonald, J. T., Barnes, C., Kitney, R. I., Freemont, P. S., & Stan, G. B. V. (2011). Computational design approaches and tools for synthetic biology. Integrative Biology, 3(2), 97–108. https://doi.org/10.1039/c0ib00077a
- Marchisio, M. A., & Stelling, J. (2009). Computational design tools for synthetic biology. Current Opinion in Biotechnology, 20(4), 479–485. https://doi.org/10.1016/j.copbio.2009.08.007
- Marchisio, M. A. (2018). Introduction to Synthetic Biology (Learning Materials in Biosciences). Springer. http://www.springer.com/series/15430
- Nam, H., Hwang, B. J., Choi, D. Y., Shin, S., & Choi, M. (2020). Tobacco etch virus (TEV) protease with multiple mutations to improve solubility and reduce self-cleavage exhibits enhanced enzymatic activity. FEBS Open Bio, 10(4), 619–626. https://doi.org/10.1002/2211-5463.12828
- O'Connor, C. M., & Adams, J. U. (2010). Essentials of Cell Biology. Cambridge, MA: NPG Education.
- Oppenheim, A. V., & Schafer, R. W. (2010). Discrete-Time Signal Processing (3rd ed.). Pearson.
- Stamatakis, M., & Mantzaris, N. V. (2009). Comparison of deterministic and stochastic models of the lac operon genetic network. Biophysical Journal, 96(3), 887–906. https://doi.org/10.1016/j.bpj.2008.10.028
- Swain, P. S., Elowitz, M. B., & Siggia, E. D. (2002). Intrinsic and extrinsic contributions to stochasticity in gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99(20), 12795–12800. https://doi.org/10.1073/pnas.162041399
- Vanarsdale, E., Hörnström, D., Sjöberg, G., Järbur, I., Pitzer, J., Payne, G. F., Van Maris, A. J. A., & Bentley, W. E. (2020). A coculture based tyrosine-tyrosinase electrochemical gene circuit for connecting cellular communication with electronic networks. ACS Synthetic Biology, 9(5), 1117–1128. https://doi.org/10.1021/acssynbio.9b00469
- Walker, N., Nghe, P., & Tans, S. J. (2016). Generation and filtering of gene expression noise by the bacterial cell cycle. BMC Biology, 14(1), 1–10. https://doi.org/10.1186/s12915-016-0231-z
- Autocorrelation function in the MATLAB language. MathWorks Documentation.
- Solvers in SimBiology. MathWorks Documentation.