Team:TU Darmstadt/phoenix

Phoenix – TUDA iGEM 2021

Phoenix

Software Introduction

Introducing our Software Concept Phoenix

In nature, bacteria frequently occur in microbial communities where different strains and even different species co-exist and benefit from each other.​1,2​ Examples for such communities include biofilms, and as such, we thought about the varying aspects of creating biofilm co-cultures and how they could support our project.
The terms ‘co-culture’, ‘community’ and ‘consortium’ in the following sections are used interchangeably, referring to strains/species of bacteria sharing an extracellular space and actively exchanging metabolites.

There are many useful reasons for bacteria to naturally form microbial communities or rather, for us to design such microbial communities for our own benefit. They can aid in researching human health, advancing metabolic engineering and environmental sustainability.​3​ One major aspect is the ability of bacteria to engage in cross-feeding, which means exchanging metabolites amongst each other. This reduces the need for any individual strain to produce every essential metabolite themselves and instead allows them to divide that labor, i.e., they specialize. Consequently, the communities are provided an evolutionary advantage by surviving under constraints that each strain could not individually compete in. Diversity within these communities is naturally maintained and tasks which are metabolically intensive are allocated in separate strains.​1​

From an engineering perspective, such division of labor can increase efficiency and reduce the cost of high-throughput bacterial productions. If applied to our biofilm, we could ensure a longer life span of our sleeper cells due to the lessened metabolic burden on the cells.​2​
One facet we carefully considered is metabolic dependency. For example, using different Bacillus subtilis strains that are dependent on each other for essential metabolites, would ensure an even and homogenous spread of the sleeper cells within the biofilm. Thus, any invading pathogenic cell could be targeted regardless of the place from which it entered and ensure easy access for the subsequent phage infection.
Simultaneously, there is less of a risk of individual cells escaping the biofilm, hence ensuring biosafety on top of our kill-switch.

In our research we came across DOLMN​1,3​, a constraint-based modeling approach to explore feasible (multi-strain) metabolic networks, and FLYCOP​2​, a software framework for modeling microbial communities.
DOLMN can be used to model metabolic dependency, which would be a great addition to our project as outlined above. FLYCOP can assess the long-term stability of multi-strain metabolic networks. A combination of these programs would not only enhance our biofilm design but also allow us to evaluate which community configurations would remain stable and thus most suitable to our project.

However, both programs require very specific, nonstandard input formats where files need to be specifically parsed for that purpose. This is a very tedious and time-consuming process which requires previous programming knowledge. Hence, the bar of entry is quite high, especially for people with limited experience in informatics, like many iGEM students. Therefore, we conceptualized the idea of a software which would combine both programs and manage the pre- and post-processing involved in the design of microbial communities. It would significantly enhance ease of use and enable future iGEM Teams to generate feasible metabolic networks and assess their long-term stability. A flow chart of the steps involved in our software can be seen in figure 1. We named this software Phoenix in honor of our project acronym PHIRE BYRD.

Figure 1. A flow chart of our software concept to design co-dependent, long-term stable consortia. First, data is acquired from the database BiGG​4​, then parsed and formatted into a community genome-scale metabolic network (cGMN). These are fed into an algorithm based on DOLMN​1​ which is run multiple times to generate different results of metabolic dependencies within said community. Afterwards, a chosen preselection of the output vectors is formatted into suitable consortia configurations for FLYCOP​2​ along with a corresponding fitness function. FLYCOP then analyzes these configurations, predicts long-term stability and evaluates the fitness and quality of the results before finally giving one optimal consortium configuration as readout in the form of a cGMN.

The BiGG database​4​ forms the foundation of our software concept. It’s a knowledgebase of genome-scale metabolic network reconstructions. It allows our software to pull all the data needed for analysis of metabolic models, like the entirety of metabolic reactions taking place in a bacterial strain as well as the corresponding metabolites.

Using a parser, we can then convert all the individual species’ data pulled from BiGG into a suitable format for consortia: a community genome-scale metabolic network, a cGMN​5​. A cGMN is a combination of n genome-scale models.

This cGMN is used as input for the modeling of metabolic dependence based on an algorithm called DOLMN​1,3​, which stands for Division of Labor Metabolic Network. Part of this algorithm is a community flux balance analysis (cFBA)​6​. A cFBA can predict many aspects of a consortium, for example the growth and growth rate, the required rates of metabolic reactions within and between the microbes and the relative species’ abundance.​6​

The algorithm solves a non-deterministic polynomial problem and thus falls into the heuristic category, as such this process can be reiterated as needed. We settled on a repeat process of 5 to 20 times to allow for a number of results for the subsequent step without taking too long or at too high of a computational cost.​1​

From these different results of metabolic dependencies configurations, Phoenix chooses a preselection as input for FLYCOP which analyzes the suggested consortia configurations and evaluates the generated solutions in terms of fitness and quality. FLYCOP enables researchers to design microbial consortia and browse thousands of possible consortium configurations with multiple criteria to consider at every corner. The result is one optimal consortium configuration, an optimized cGMN, with the desired metabolic dependencies, yield and other attributes.​2​

Creating the Community Scale Matrix

In order to be able to use genomic scale metabolic networks as a base for modeling the metabolic dependency of a co-culture, it is necessary to obtain information about the metabolites and reactions of the involved bacteria.  To make this process user-friendly we decided to write a program, that can retrieve the necessary information from the BiGG Database, which has an Application Programming Interface (API), making the data easily accessible. Following that, a parser converts the downloaded reactions and metabolites of one organism into a matrix describing the Genome scale Metabolic Network (GMN). For this, the GMNs are downloaded in the format JSON.  The reactions are set up as follows: A + B -> C (Figure 2 R3). With the help of the libraries JSON and Pandas a Flux Balance Analysis (FBA) is modeled in Python which leads to the mentioned matrix. The mathematical background of the FBA is based on Khandelwal et al.​6​, while our explanation follows Dr. Meghan Thommes​3​.

The stoichiometric matrix (S) (Figure 2) consists of the metabolites forming the rows, and the reactions between them forming the columns. Therefore, the stoichiometric matrix reaches a dimension of m × n with m = number of metabolites and n = number of reactions. To describe the flow of metabolites through the network we introduced a vector (x), which represents the metabolic fluxes. In the next step, upper (xup) and lower (xlb) constraints were set to reduce the number of possibilities to a realistic range. The last factor for the basic description of the FBA determines the weighting of metabolic flux in achieving the specified goal. This weight coefficient vector (c) is then used with (x) to formulate a goal function.

In order to reach the specified goal, for example the maximation of biomass gain, a target function has to be maximized. In this example the goal function would be (cTx). To simplify the problem a quasi-steady-state assumption (QSSA, (Sx=0)) is made.

$$maximize_x\quad c^Tx \\ such\ that\quad Sx=0\\ x_{lb}\leq x\leq x_{ub}$$

Figure 2. Simplified representation of the stoichiometric matrix of one species (‘i’). The reactions (R1-R5) form the columns and the metabolites (Ai-Di) the rows. The columns with only positive values are highlighted as import reactions (green), while the columns containing only negative values are highlighted as export reactions (red). For further simplifications all values shown in the matrix are set to 1.

Figure 3. Simplified representation of the community scale matrix. The extracellular space of the community scale matrix is created using the import (green) and export (red) reactions of two organisms (species ‘i’, pink, and species ‘j’, yellow). This leads to an exchange of metabolites (such as glucose or fructose) between the two organisms, making them metabolically dependent on one another.

In Figure 3 the construction of the community scale matrix (Sc) is shown. It is build using the matrices for each organism (pink for species i and yellow for species j) generated by the parser and matrices describing the extracellular space (purple). Therefore, just the reactions, where metabolites are only produced (green) or consumed (red) by one of the organisms, are considered. We can assume, that these are export (red) or import (green) reactions because a metabolite cannot emerge from nothing (only positive values in one reaction) or vanish into nothing (only negative values in one reaction). To create the extracellular matrix the import and export reactions are used, however with an inverted sign.  Metabolic dependency arises from the fact that the extracellular space allows the two organisms to exchange metabolites that can only be produced by the other one (Figure 3 right side). Transport reactions combine the previously defined export and import reactions.

This results in a redefinition of the quantity of metabolites and reactions. Thus, the quantity of metabolites is now composed of intracellular and extracellular metabolites, whereas the quantity of reactions is composed of intracellular, extracellular, and transport reactions. This leads to a matrix (S) containing all the metabolites and reactions, regardless of which strain they come from.

A community matrix (Sc) is then created from this matrix (S) by defining that all strains use the same extracellular space and all strains can theoretically run all reactions. This leads to (Sc) having a sub-matrix for extracellular reactions and metabolites and sub-matrices with internal and transport reactions, one for each organism (Figure 3).

Modeling Metabolic Dependence

Our goal is to model metabolic dependencies, thereby obtaining a strategy for interdependence of two or more organisms. Genomic scale metabolic networks are used, which are taken from the BiGG database​4​. These are converted to a community scale matrix (Sc)​1​ Using the mathematical background also used by the program DOLMN​1,3​ we conduct flux balance analysis in order to determine how to approach metabolic dependency between the different species of strains (i.e. which metabolic pathways (and thus genes) need to be knocked out to create metabolic dependency). To obtain metabolic dependence of two organisms after creating a community scale matrix we need to reduce the amount of allowed internal and external reactions to a desired goal after restricting the organism to biomass creating reactions. For better efficiency the obtained results are then striped of unnecessary reactions. Through this approach, a local optimum is determined. After the community matrix has been established, it is necessary to determine which reactions take place in the individual organisms, for this purpose a vector (t) is introduced. This vector can only take the values 0 or 1 in each field and describes whether a reaction occurs in an organism or not. Therefore, (t) spans the whole width and indicates for each organism whether the respective reaction is utilized by it. By decreasing the number of internal reactions, we can create a metabolic dependency between organisms. Due to the limitation of internal and transport reactions for each organism, the organisms are dependent on exchanging substances with one another, resulting in a metabolic dependency. Consequently, our next problem is to determine which reactions should take place in which organism, mathematically this means determining the vector (t). This results in a modified version of the equation X. Where the matrix (S) is replaced with (Sc) and the additional constraints for (t). To be able to multiply (t) with (x), we convert the vector (x) into a diagonal matrix (diag(x)) with (x) being the main diagonal. To limit the number of active reactions in (t) the regularization matrix R is introduced, and the mentioned constraints are set (tmin & tmax).

$$maximize_x\quad c^Tx \\ such\ that\quad Sx=0\\ x_{lb}\leq x\leq x_{ub}\\ diag(x_{lb})t\leq x\leq diag(x_{ub})t\\ t_j \in\{0,1\}\\ t_{min}\leq Rt\leq t_{max}$$

For a successful co-culture the biomass of all organisms in the community has to increase. This is done by constraining the biomass flux (xbiom) of each organism (k) to a positive value:

$$x_{biom_1}=…=x_{biom_K}\leq 0.1$$

The constraint for the maximal allowed number of reactions per organism (k) is divided into a condition for the internal and the transport reactions. Therefore, the constraints for (Rt) are substituted with:

$$\sum_{j\in TR_k}t_j\leq T_{TR},\quad k\in\{1, …\ K\}\\ \sum_{j\in IN_k}t_j\leq T_{IN},\quad k\in\{1, …\ K\}$$

With (TTR) being the upper limit for the transport reactions and (TIN) being the upper limit for internal reactions. The best solutions from this optimisation problem (x* and t*) are now used to decrease the number of redundant reactions using the l1 norm:

$$such\ that\quad S^cx=0\\ x_{lb}\leq x\leq x_{ub}\\ x_{biom_1}=x_{biom_K}^{*}, k\in\{0,1\}\\ diag(x_{lb})t^{*} \leq x\leq diag(x_{ub})t^{*} $$

With the used equations, the internal and external reactions could now be limited, which promotes the formation of division of labor. In addition, redundant reactions could be stripped out by minimization.

FLYCOP

FLexible sYnthetic Consortium OPtimization

As already explained in the introduction, one aspect of our software flow (Figure 1) involves a software framework called FLYCOP (FLexible sYnthetic Consortium OPtimization)​2​. From the previously generated co-dependency configurations of our cGMNs using DOLMN, a preselection is prepared as input for FLYCOP. FLYCOP then analyzes, evaluates and optimizes given configurations with a fitness function and assesses the quality of the given values, also examining the long-term stability of consortia. Finally, the best configuration of a specific microbial consortium found by FLYCOP is given as output alongside a report which includes further details.​2​

Since the output of our co-dependency algorithm comes in the form of binary reaction vectors and continuous flux vectors for all reactions​1​, those vectors need to be prepared into a suitable format for FLYCOP. For proper comparison, FLYCOP needs many consortium configurations as input, specifically the genome-scale metabolic models (GEMs) of the microbial strains within said community and the corresponding parameters, which each have various values with a suitable range. Taking the evolution of such consortia into account also requires a fitness function which allows FLYCOP to compare generated solutions. This function is also vital to defining the optimization objective (or in the case of multiple objectives, the weighted sum of the individual objectives) of the consortium specific to the entire purpose of the engineering process in question.​2​
The optimization process consists of a so-called stochastic local search to approach a multiple objective problem instead of modulating each data point itself. Afterwards, the results are iteratively assessed with a score which determines their quality, taking into consideration how certain alterations in values influence the consortium as a whole.​2​

Usually, to work with FLYCOP, the following steps need to be done (as outlined in García-Jiménez et al. 2018​2​):

  1. Community conceptual design, where the consortium and its strains as well as the cross-feeding relationships are defined to suit the aim of the engineering endeavor.
  2. Single strain design, which involves the base genome-scale models of selected bacterial species and modifies those for co-living in a community by adding, removing or modulating required reactions and their bounds.
  3. Culture medium composition and uptake of metabolites definition, for clearly defining the framework conditions of the consortium as well as limiting factors, such as oxygen.
  4. Selecting parameters and range of values, to represent variables whose values’ combinations characterize each evaluated consortium.  Only discrete values, not continuous ones, are defined so that the process complexity is reduced and less effort is wasted on too similar solutions.  
  5. Defining the optimization objective and fitness function, so that the fitness can be computed depending on the consortium’s targeted purpose.

Our software concept would automate steps 1 through 4, significantly reducing the work usually involved to apply a framework such as FLYCOP.

Using these inputs, FLYCOP then updates the single metabolic models with COBRA​7​, performs consortium simulations with COMETS​8​, and finally selects and evaluates the candidate consortium configurations using SMAC​9​.

Typically, around 500 different configurations are explored to converge a solution and generate solid data. The aforementioned summary report provides further information for additional data mining analysis, such as scatterplots with the explored values, correlation plots, ellipse plots, growth curves, tables of the configurations with parameters, values, evaluated fitness, metabolite concentrations and other such metrics. Furthermore, alternative solutions with similarly high fitness values are included in this summary report.​2​

Finally, the actual output of FLYCOP and ultimately of our whole software concept is just one configuration: the best configuration determined for our needed purpose. This configuration can then be applied to actual lab experiments, for example our project PHIRE BYRD. Dividing the labor of the sensing of the pathogens, the production of the phages and ensuring the homogeneity of the biofilm would reduce the individual stress on the cells and result in a more resistant sleeper cell biofilm which could more effectively target Pseudomonas cells.

Closing Remarks

In regards to our flow chart (Figure 1), we have not managed to implement every step we originally envisioned. Comprehending the literature surrounding DOLMN and FLYCOP required high-level knowledge of both programming and mathematics. As the software task force consists of Bachelor students with no prior programming experience, we had difficulties understanding the source codes of the original programs, which were written in MATLAB. We first had to learn how to code with Python and MATLAB and strived to gain an understanding of the complex mathematics behind microbial consortia configurations. We also put quite an effort into conceptualizing an automation process which would feasibly combine both of the existing algorithms, carefully considering the different input and output formats, the parsing involved in pulling information from databases and the validity of the results. This took a lot of time which is why we were not able to implement part of the post-processing of Phoenix. So for now, Phoenix unfortunately remains a concept.

Nevertheless, we gained significant knowledge of coding with Python and interpreting code. We began to migrate the code for DOLMN from MATLAB to Python, which required us to learn many new skills. All things considered, working on Phoenix was an extremely enriching learning experience for everyone involved.

The concept we have developed could be extremely useful in applications of synthetic biology and microbiology, optimizing the design of communities and biofilms. Phoenix could measurably increase the biological safety involved in such projects and aid iGEM students in the years to come. Perhaps another iGEM Team could pick up where we left off (see our GitHub), just as we did with the B. subtilis kill-switch. Not only are the chosen strains for the microbial communities dependent on each other for survival, thus ensuring biocontainment, but they are also able to divide labor and lighten the metabolic burden on individual strains. This could be useful in any high throughput processes by lengthening the life span of the bacterial cells and increasing their productivity.

References

  1. 1. Thommes M, Wang T, Zhao Q, Paschalidis IC, Segrè D. Designing Metabolic Division of Labor in Microbial Communities Dutton RJ, editor. mSystems. 2019. http://dx.doi.org/10.1128/msystems.00263-18. doi:10.1128/msystems.00263-18
  2. 2. García-Jiménez B, García JL, Nogales J. FLYCOP: metabolic modeling-based analysis and engineering microbial communities. Bioinformatics. 2018:i954–i963. http://dx.doi.org/10.1093/bioinformatics/bty561. doi:10.1093/bioinformatics/bty561
  3. 3. Thommes M. Strategies for engineering microbial communities. Boston: Boston University; 2020. https://open.bu.edu/handle/2144/41030
  4. 4. King ZA, Lu J, Dräger A, Miller P, Federowicz S, Lerman JA, Ebrahim A, Palsson BO, Lewis NE. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Research. 2015:D515–D522. http://dx.doi.org/10.1093/nar/gkv1049. doi:10.1093/nar/gkv1049
  5. 5. Hao T, Wu D, Zhao L, Wang Q, Wang E, Sun J. The Genome-Scale Integrated Networks in Microorganisms. Frontiers in Microbiology. 2018. http://dx.doi.org/10.3389/fmicb.2018.00296. doi:10.3389/fmicb.2018.00296
  6. 6. Khandelwal RA, Olivier BG, Röling WFM, Teusink B, Bruggeman FJ. Community Flux Balance Analysis for Microbial Consortia at Balanced Growth Vera J, editor. PLoS ONE. 2013:e64567. http://dx.doi.org/10.1371/journal.pone.0064567. doi:10.1371/journal.pone.0064567
  7. 7. Heirendt L, Arreckx S, Pfau T, Mendoza SN, Richelle A, Heinken A, Haraldsdóttir HS, Wachowiak J, Keating SM, Vlasov V, et al. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nature Protocols. 2019:639–702. http://dx.doi.org/10.1038/s41596-018-0098-2. doi:10.1038/s41596-018-0098-2
  8. 8. Temprosa M, Moore SC, Zanetti KA, Appel N, Ruggieri D, Mazzilli KM, Chen K, Kelly RS, Lasky-Su JA, Loftfield E, et al. COMETS Analytics: An Online Tool for Analyzing and Meta-Analyzing Metabolomics Data in Large Research Consortia. American Journal of Epidemiology. 2021. http://dx.doi.org/10.1093/aje/kwab120. doi:10.1093/aje/kwab120
  9. 9. Hutter F, Hoos HH, Leyton-Brown K. Sequential Model-Based Optimization for General Algorithm Configuration. Lecture Notes in Computer Science. 2011:507–523. http://dx.doi.org/10.1007/978-3-642-25566-3_40. doi:10.1007/978-3-642-25566-3_40
Eppendorf hilgenberg Zymo Research New England Biolabs Inc.
IDT Integrated DNA Technologies Snapgene Biebertaler Blutegelzucht Promega
DWK Life Sciences Science Birds Twist Bioscience Microsynth SEQLAB
TU Darmstadt
Supertext Brand
Carl Roth Sparkasse
 Quantum Design Europe