Team:Wageningen UR/Software


iGEM Wageningen 2021




The iGEM PIPE

In Cattlelyst, we engineered a number of metabolic pathways to alter the behaviour of Escherichia coli and Pseudomonas putida. This required a lot of research. Wouldn't it be nice to design your Genetically Engineered Machine with just a few clicks? That’s why the Cattlelyst team worked on a program that does this design work for you.

The iGEM Pipeline for Improved Pathway Engineering (the iGEM PIPE) is a Python package that offers tools to explore engineering strategies that make a chassis organism able to grow on atypical carbon sources (e.g. pollutants such as methane) and produce targets (e.g. high-value compounds). The goal of the iGEM PIPE is to suggest the most favourable engineering strategy for obtaining strains suited to both bioremediation and bio-production. The tool was built to ease the transition from computational design to in vivo validation of the strategies. For this purpose, a wikibase of standardised biological parts was connected to the PIPE, which can therefore suggest the Biobricks that have to be introduced into the chassis to reproduce the designed pathway. Additionally, a variant of a codon harmonisation algorithm has been implemented in the iGEM PIPE to generate codon-harmonised, restriction site-free Biobrick sequences.

On this page you can find answers to questions such as:

  • Why was there need for such a program?
  • How does it work?
  • How do I install it and use it?

We provide a manual and a set of tutorials to help users get started with this Python package and to guide them step by step through its main functionalities. The tutorials have been validated by two other iGEM teams, Groningen 2021 and Maastricht 2021. We are very grateful to them, since they helped us by pointing out bugs and making the installation simpler.


The iGEM PIPE was developed while working on an original engineering objective: a strain of E. coli that uses methane as the sole carbon source and produces L-lactate. We validated the PIPE by using it to identify engineering designs in three test cases on the models of two organisms, E. coli and P. putida. Additionally, the PIPE has been used to identify possible engineering strategies for establishing auxotrophic co-dependency between the same two microorganisms: we asked the PIPE to suggest how to alter the metabolism of each bacterium so that it produces an amino acid for which the other organism is auxotrophic. This should give you an idea of its relevance for the Cattlelyst project.


If you want to know more about the test cases and their validation, and about how we used the iGEM PIPE in our project, follow the links below.

Why is the iGEM PIPE needed?

Biotechnology is particularly suited to targeting two societal needs: the degradation of pollutants and the production of commercially relevant chemicals. Microorganisms have great potential for both tasks [1], [2]. Their natural variability is harnessed by means of metabolic engineering and synthetic biology, with the aim of creating strains with improved phenotypes for bio-degradation and bio-production [3], [4]. Currently, numerous tools are available for the in silico design of microorganisms that produce compounds relevant for industry [5], [6]. Nonetheless, few tools exist for the identification of organism-specific strategies for strains to be used in bioremediation. Tools that suggest designs involving both addition and deletion of reactions exist only in the context of improved (growth-coupled) production. Thus, one of the novel aspects of the PIPE of the WUR iGEM team 2021 is that it suggests engineering strategies that allow genome-scale metabolic models to grow on uncommon substrates (e.g. pollutants and xenobiotics).

Additionally, this Python package aims to ease the transition from the in silico design of an engineering strategy to its in vivo application in the laboratory. This transition usually consists of using biological parts, Biobricks, to confer the new functionalities (e.g. gene knock-ins), and other molecular techniques to knock out other genes. Identifying the Biobricks that match the reactions suggested by the iGEM PIPE could be a time-consuming task if done manually. This is why the PIPE automates the search, making use of a new version of the Biobrick database, the Biobrick wikibase.

Biobricks can encode proteins or regulatory elements such as promoters. When they encode proteins, these are most often enzymes. However, successful expression of heterologous proteins in a host organism is not a trivial task [7]. Many factors can hamper protein expression and cause misfolding, which ultimately results in the formation of inclusion bodies that affect cell growth [8]. Differences in codon usage frequencies between organisms are often recognised as the cause of poor expression [8]. Hence, the strain design pipeline we present not only includes the identification of the Biobricks within its workflow, but also takes into consideration one of the most commonly used methods for optimising protein expression, codon harmonization. Functions of this Python package codon-harmonize the coding sequences of the Biobricks in order to increase the chance of successful expression in the new host.

Collaborations

The PIPE was made accessible to other iGEM teams during the final stage of development. This collaboration led to the resolution of several bugs and to an easier installation procedure for the PIPE and all its required packages.

  • Bye-Monia: iGEM team Groningen 2021
  • Methagone: iGEM team Maastricht 2021

Read more about how they helped us improve our tool on the Collaborations page!

Go to Collaborations

Target users

The iGEM PIPE is a Python package built on the CobraPy package, so the user should be familiar with the principles of constraint-based analysis. Since there is no graphical user interface, the user should ideally have some basic knowledge of Python coding, which makes it easier to use the functions of the PIPE independently. To help users get familiar with this Python package, we created tutorials in the form of Jupyter notebooks. The Jupyter notebooks are available at the GitLab repository of our Cattlelyst project. Follow this link to go to the repository of the PIPE in our WUR iGEM 2021 GitHub page.


Some useful links that we used ourselves to become familiar with the platforms mentioned above are given below:



The PIPE's Architecture

The iGEM PIPE performs its functions in a series of steps that the program follows one after the other. Here is a brief explanation of these steps and of how they enable the PIPE to do its work.

Figure 1: The flowchart consists of seven steps: 1) Initialization; 2) Reaction search within a database using the Gapfilling algorithm [9], [10]; 3) Pathway definition and thermodynamic feasibility assessment; 4) Comparison of the metabolic engineering strategies involving reaction knock-ins; 5) Query of the Biobrick wikibase to find biological parts matching the added reactions; 6) Codon harmonisation of the coding sequences from the Biobricks; 7) Optional identification of reaction knock-outs with the Optknock algorithm [11]. Each step is explained in detail in the following paragraphs.


Figure 1 shows a simplified overview of the PIPE's set-up. The program takes models of organisms as input. It then analyses which reactions to add and which to remove in order to reach the goal set by the user, and in this way generates several design strategies. The next steps evaluate the strategies and identify the Biobricks needed to reproduce the engineering strategy in vivo. The output consists of comma-separated values (CSV) files with detailed information on the analyses and the suggested strategies.

1. Initialization


    The PIPE takes one input: a CSV file that contains the user-supplied information needed for the analysis, such as what the microbe should use to grow (i.e. the carbon source) and what it should produce (i.e. the target compound).



    During the initialization phase, the model of the reference organism and the BiGG reaction database are loaded and prepared for the analysis. The reaction database is also called the universal model, since it is the sum of all the information stored in the BiGG (Biochemically, Genetically and Genomically [16]) Models database. The model of the microorganism is also obtained from BiGG.
    The user should also indicate reactions that are known to be relevant for the conversion. These are added to the reaction database if they are not already present. This expansion phase is needed because the BiGG Models database, while its models are well curated, hosts a limited number of them, so some specific reactions might be missing from the universal model.
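The initialization logic described above can be sketched with the standard library alone. This is only an illustration: the column names, the reaction ID `MMO_demo` and the function names are hypothetical and are not taken from the iGEM PIPE manual, and real reactions would be model objects rather than strings.

```python
import csv
import io

# Hypothetical input layout: the column names are illustrative only.
EXAMPLE_INPUT = """model,carbon_source,target,run_optknock
iJO1366,ch4,lac__L,yes
"""

def read_settings(handle):
    """Return the first data row of the input CSV as a dictionary."""
    return next(csv.DictReader(handle))

def expand_universal(universal, user_reactions):
    """Add user-supplied reactions that the universal set does not contain yet.

    Both arguments map a reaction ID to a reaction equation string.
    Returns the list of IDs that were actually added.
    """
    added = []
    for rxn_id, equation in user_reactions.items():
        if rxn_id not in universal:
            universal[rxn_id] = equation
            added.append(rxn_id)
    return added

settings = read_settings(io.StringIO(EXAMPLE_INPUT))
universal = {"PGI": "g6p_c <=> f6p_c"}
user_reactions = {
    "PGI": "g6p_c <=> f6p_c",  # already present, so it is skipped
    # Hypothetical ID and equation, used only for this demo:
    "MMO_demo": "ch4_c + o2_c + nadh_c + h_c --> meoh_c + nad_c + h2o_c",
}
added = expand_universal(universal, user_reactions)
print(settings["carbon_source"], settings["target"], added)
```

Keeping the "add only if missing" check in one small function makes the expansion step idempotent: running it twice on the same input leaves the universal set unchanged.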


2. Reaction addition

The PIPE first uses flux balance analysis (FBA [12]) to check whether the model of the organism can grow on the indicated carbon source and produce the target. If the optimization is infeasible, it does the following:

  1. It uses the Gapfilling algorithm to scan the reaction database for solutions that allow assimilation of the indicated carbon source;
  2. It looks for reactions that allow production of the target compound.
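For reference, the FBA check mentioned above is conventionally written as the linear programme below. The symbols follow the standard textbook notation and are not taken from the PIPE's source code: S is the stoichiometric matrix, v the flux vector, c the objective coefficients (e.g. selecting the biomass or target reaction), and v^lb, v^ub the flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{s.t.}\quad & S\,v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

In this picture, Gapfilling can be read as extending S with columns taken from the reaction database until the programme admits a solution with the required uptake or production.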
3. Pathway definition and thermodynamic feasibility


    Running Gapfilling yields a list of alternative designs, each involving the addition of reactions that are heterologous to the reference organism. Criteria are therefore needed to select the best ones. One of the criteria used by the iGEM PIPE is thermodynamic feasibility, which is assessed by a dedicated module of the PIPE. This module first identifies the reactions that are active in the conversion of the substrate into the product. It does so by simulating the production of the target in the model while minimising the total cellular activity, i.e. the sum of all fluxes. This corresponds to the constraint-based method called parsimonious flux balance analysis (pFBA) [13]. In simpler terms, after the candidate reactions have been added, the algorithm answers the question “what is the minimum number of steps needed to go from the substrate to the product?”.


    This leads to a list of reactions, from which transporters between cellular compartments and reactions for cofactor regeneration are eliminated [14]. In this way, only the reactions involved in the key steps of the conversion are selected, and these constitute the pathway.
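The filtering step can be illustrated with a minimal sketch. It assumes that the pFBA result is available as a mapping from reaction ID to flux and that the transport and cofactor-regeneration reactions are supplied as a set; how the PIPE actually recognises those reactions is not described here, and all reaction IDs below are hypothetical.

```python
def extract_pathway(fluxes, excluded, tolerance=1e-9):
    """Keep reactions that carry flux and are not flagged for exclusion."""
    return [
        rxn_id
        for rxn_id, flux in fluxes.items()
        if abs(flux) > tolerance and rxn_id not in excluded
    ]

# Hypothetical pFBA result: reaction ID -> flux
fluxes = {
    "CH4t_demo": 10.0,        # transport between compartments
    "MMO_demo": 10.0,
    "MDH_demo": 10.0,
    "NADH_regen_demo": 4.0,   # cofactor regeneration
    "PGI": 0.0,               # inactive, carries no flux
}
excluded = {"CH4t_demo", "NADH_regen_demo"}

pathway = extract_pathway(fluxes, excluded)
print(pathway)
```

A small tolerance is used instead of comparing with exactly zero, because solvers typically return tiny non-zero values for inactive reactions.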


    The resulting list of reactions is used to calculate the max-min driving force (MDF) value [15]. A positive MDF value indicates feasibility: it means that, within the specified cellular concentration ranges of the metabolites, every reaction of the pathway can have a negative ΔG [15]. The MDF is calculated using a Python version of the eQuilibrator tool. The equilibrium constants of the reactions are automatically retrieved from eQuilibrator [16].
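For readers unfamiliar with the MDF, the optimisation problem defined by Noor et al. [15] can be summarised as follows. The notation is the usual one and is an assumption with respect to the PIPE's code: x is the vector of log-concentrations of the metabolites, S the stoichiometric matrix of the pathway, Δ_rG'° the vector of standard transformed Gibbs energies, and B the driving force of the least favourable reaction.

```latex
\begin{aligned}
\max_{x,\,B}\quad & B \\
\text{s.t.}\quad & -\left(\Delta_r G'^{\circ} + RT\, S^{\top} x\right) \ge B, \\
& \ln C_{\min} \le x \le \ln C_{\max}
\end{aligned}
```

The optimal B is the MDF. If it is positive, there is a set of concentrations within the allowed ranges at which all pathway reactions run in the forward direction.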

4. Evaluation of knock-in engineering strategies

The results from the identification of knock-in strategies and from the pathway evaluation are used to compare the model variants of the knock-in mutants with each other. The comparison criteria include:

  1. the number of reaction knock-ins;
  2. the consumption rate (negative flux) of the substrate when growth is the model’s objective;
  3. the production rate of the target;
  4. the total number of reactions involved in the conversion (pathway length);
  5. the MDF value of the pathway.

The values of each criterion are first normalised with formulas that return a number between 0 and 1, with 1 indicating the best and 0 the worst among the models.


The normalised scores $S_{i,j}$ of each variant $i$ for criteria 1-5 are combined with an additive scoring function that includes user-defined weights ranging from 0 to 1. Equation 1 states that the weights of all criteria sum to one:

$$\sum_{j} w_j = 1 \qquad (1)$$

where $w_j$ is the weight of criterion $j$. As shown in Equation 2, for each model $i$ the normalised scores $S_{i,j}$ are multiplied by $w_j$, and the weighted scores are summed to obtain the final score $S_i$:


$$S_i = \sum_{j} w_j \, S_{i,j} \qquad (2)$$
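The weighted sum described above can be sketched in a few lines of Python. The normalisation formulas used by the PIPE are not given on this page, so min-max scaling is used here as an assumption, as is the choice of which criteria count as "higher is better". The criterion names and values are invented for illustration.

```python
def min_max(values, higher_is_better=True):
    """Scale values to [0, 1], with 1 assigned to the best variant."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # all variants tie on this criterion
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def final_scores(criteria, weights, higher_is_better):
    """Return one weighted score per model variant."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(weights)
    normalised = {n: min_max(criteria[n], higher_is_better[n]) for n in names}
    n_models = len(criteria[names[0]])
    return [
        sum(weights[n] * normalised[n][i] for n in names)
        for i in range(n_models)
    ]

# Three hypothetical model variants, five criteria (values are made up).
criteria = {
    "knock_ins": [2, 3, 4],
    "uptake": [10.0, 8.0, 6.0],
    "production": [1.0, 3.0, 2.0],
    "pathway_length": [5, 4, 6],
    "mdf": [1.0, 2.0, 3.0],
}
higher_is_better = {
    "knock_ins": False, "uptake": True, "production": True,
    "pathway_length": False, "mdf": True,
}
weights = {name: 0.2 for name in criteria}  # equal weights, summing to 1

scores = final_scores(criteria, weights, higher_is_better)
best = scores.index(max(scores))
print(scores, best)
```

Changing the weights lets the user express priorities, for example putting most of the weight on `mdf` when thermodynamic feasibility matters most.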

5. Interaction with Biobrick wikibase


    This module is the core of the iGEM PIPE's strength in facilitating the transition from computational design to experimental application. Adding a reaction to a model corresponds, in lab terms, to knocking in genes encoding enzymes and cofactors. The Biobrick wikibase is used to find the biological parts that match the heterologous reactions of each design strategy. The link between Biobricks and reactions is given by the Enzyme Commission (EC) numbers, which are associated with both. The reaction knock-ins are obtained from the universal model. The Biobrick wikibase is then queried to find parts with the same EC number as the heterologous reactions. Reactions correspond to enzymatic functions, which may be catalyzed by enzyme complexes, hence multiple genes could be required to introduce a given functionality in vivo.


    The presence of Biobricks for each engineering strategy is evaluated by calculating the percentage of EC number-matching parts over the number of reaction knock-ins in each model variant. There might be a trade-off between the optimal design strategy resulting from the evaluation of the knock-in mutants and the best engineering option according to Biobrick availability. For instance, the best strategy could involve 3 reactions, one of which has no corresponding Biobrick available, while the second-best option consists of 4 reactions that all have matching Biobricks. The user could then either clone the missing gene required for the first option, or choose the second strategy, which would save time. To help users make this choice, the iGEM PIPE can graphically visualize the metabolic score and the Biobrick availability score of each knock-in design strategy.
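The percentage described above can be sketched as follows. The interpretation used here, "share of knock-in reactions that have at least one EC-matching part", is an assumption, as are the function name, the data layout and every identifier in the example: the EC numbers and part names are placeholders, not results of a wikibase query.

```python
def biobrick_coverage(knock_in_ecs, parts_by_ec):
    """Percentage of knock-in reactions with at least one EC-matching part.

    knock_in_ecs: reaction ID -> EC number of the heterologous reaction
    parts_by_ec:  EC number -> list of part names found for that EC number
    """
    if not knock_in_ecs:
        return 100.0  # no knock-ins needed, so nothing has to be cloned
    matched = sum(1 for ec in knock_in_ecs.values() if parts_by_ec.get(ec))
    return 100.0 * matched / len(knock_in_ecs)

# Placeholder query result: EC number -> matching parts
parts_by_ec = {
    "1.1.1.27": ["part_demo_1"],
    "1.1.1.244": ["part_demo_2", "part_demo_3"],
    "4.1.2.43": ["part_demo_4"],
    "5.3.1.27": ["part_demo_5"],
}

# Option 1: three knock-ins, one without any matching part.
option_1 = {"R1": "1.14.13.25", "R2": "1.1.1.244", "R3": "1.1.1.27"}
# Option 2: four knock-ins, all with matching parts.
option_2 = {"R1": "1.1.1.244", "R2": "4.1.2.43", "R3": "5.3.1.27", "R4": "1.1.1.27"}

cov_1 = biobrick_coverage(option_1, parts_by_ec)
cov_2 = biobrick_coverage(option_2, parts_by_ec)
print(round(cov_1, 1), cov_2)
```

This reproduces the situation from the text: the shorter option has incomplete part coverage, while the longer one is fully covered.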


    Figure 2 shows an example of a two-axis plot with possible outputs labelled model0.1, model0.2 and model0.3. The rectangle and the circles represent alternative cases. If a strategy is found at the location of model0.1, it must be considered the best strategy for the design problem, because it has both the highest score for the knock-in engineering strategy and the highest percentage of Biobricks matching the heterologous reactions. However, if only the two strategies highlighted with ovals are found, the choice of the best solution is less straightforward, since arguments can be made for both model0.2 and model0.3. On the one hand, model0.2 has the highest metabolic score derived from the evaluation of the five criteria, but only half of the reaction knock-ins it requires have a matching part. On the other hand, model0.3 has a lower metabolic score, but all its heterologous reactions have a corresponding Biobrick part.

    Figure 2: Scatter plot for the visualization of the trade-off between metabolic score and percentage of Biobricks. The metabolic scores are plotted on the ordinate, while the percentage of Biobricks is shown on the abscissa. The different engineering strategies are indicated with labels (e.g. A). In this example, Model A and Model B represent the best solutions.
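One way to read such a plot programmatically is to keep only the strategies that are not beaten on both axes at once (the Pareto front). The page only states that the PIPE draws the plot, so this filter is an illustrative addition rather than a described feature, and the scores below, as well as the extra label model0.4, are made up.

```python
def pareto_front(strategies):
    """Return the names of strategies that no other strategy dominates.

    strategies: name -> (metabolic score, percentage of matching Biobricks)
    A strategy is dominated if another one is at least as good on both
    axes and strictly better on at least one.
    """
    front = []
    for name, (score, pct) in strategies.items():
        dominated = any(
            s >= score and p >= pct and (s > score or p > pct)
            for other, (s, p) in strategies.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front

# Case without a clear winner: two strategies remain to choose from.
without_winner = {
    "model0.2": (0.9, 50.0),
    "model0.3": (0.6, 100.0),
    "model0.4": (0.5, 50.0),  # hypothetical extra strategy
}
# Case with a strategy that is best on both axes.
with_winner = dict(without_winner, **{"model0.1": (0.9, 100.0)})

print(pareto_front(without_winner))
print(pareto_front(with_winner))
```

When the front contains more than one strategy, the final choice depends on how much cloning effort the user is willing to invest.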

6. Restriction site-free codon harmonisation

The Biobricks suggested by the previous module can encode genes from different organisms, so we suggest codon harmonization to improve gene expression in the host of choice. The Codon Harmonizer tool has been implemented in the iGEM PIPE to provide codon-harmonized versions of the coding sequences for the heterologous functions. The recent rewrite of the Codon Harmonizer tool [17] was chosen for this implementation.

This algorithm minimises a parameter called the codon harmonization index (CHI), which evaluates the differences in codon frequencies between the expression host and the native organism: the lower the CHI, the better. The codon harmonization algorithm implemented in the PIPE has an additional feature, the elimination of restriction sites (RSs). This is done by adding a check step after the harmonisation of the sequence, in which the algorithm looks for one RS at a time. When a match is found, it substitutes the first codon involved in the RS with a synonymous one that has the next-best impact on the CHI.

Codon harmonization does not guarantee optimal expression, but by providing sequences that are already harmonized we expect to speed up the trial-and-error cloning tests that remain necessary.
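The restriction-site check can be illustrated with the sketch below. It is not the Codon Harmonizer code. Several points are assumptions made for the demo: the list of sites (the four RFC 10 Biobrick sites), the tiny synonym table, the frequency numbers, and the `penalty` function that stands in for a codon's contribution to the CHI. As a further deviation, the sketch moves on to the next codon of the site when the first one has no usable synonym.

```python
SITES = ["GAATTC", "TCTAGA", "ACTAGT", "CTGCAG"]  # EcoRI, XbaI, SpeI, PstI

# Partial synonym table, sufficient for the demo sequence only.
SYNONYMS = {"GAA": ["GAG"], "GAG": ["GAA"], "TTC": ["TTT"], "TTT": ["TTC"]}

def find_site(seq, sites):
    """Return (start, site) of the left-most restriction site, or None."""
    hits = [(seq.find(site), site) for site in sites if site in seq]
    return min(hits) if hits else None

def pick_replacement(codons, start, site, synonyms, penalty):
    """Choose (codon index, new codon) that breaks the site at lowest penalty."""
    first, last = start // 3, (start + len(site) - 1) // 3
    for idx in range(first, last + 1):
        ranked = sorted(synonyms.get(codons[idx], []),
                        key=lambda codon: penalty(idx, codon))
        for candidate in ranked:
            trial = "".join(codons[:idx] + [candidate] + codons[idx + 1:])
            if trial[start:start + len(site)] != site:
                return idx, candidate
    return None

def remove_sites(seq, sites, synonyms, penalty, max_rounds=100):
    """Remove restriction sites one at a time with synonymous substitutions."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for _ in range(max_rounds):
        current = "".join(codons)
        hit = find_site(current, sites)
        if hit is None:
            return current
        start, site = hit
        replacement = pick_replacement(codons, start, site, synonyms, penalty)
        if replacement is None:
            raise ValueError(f"cannot remove {site} at position {start}")
        idx, codon = replacement
        codons[idx] = codon
    raise RuntimeError("restriction sites remain after max_rounds")

# Illustrative numbers only, not measured codon usage data.
HOST_FREQ = {"GAA": 0.69, "GAG": 0.31, "TTC": 0.43, "TTT": 0.57}
NATIVE_FREQ = [1.0, 0.40, 0.55, 0.70, 1.0]  # one value per codon position

def penalty(idx, codon):
    """Toy stand-in for the CHI contribution of placing codon at position idx."""
    return abs(HOST_FREQ.get(codon, 0.0) - NATIVE_FREQ[idx])

harmonised = "ATGGAATTCAAATAA"  # contains an EcoRI site (GAATTC)
cleaned = remove_sites(harmonised, SITES, SYNONYMS, penalty)
print(cleaned)
```

The sequence is re-scanned after every substitution, so a site that is accidentally created by a replacement is caught in a later round.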

7. Reaction deletion


    The PIPE can also suggest reaction knock-outs that could couple production to growth. The algorithm used for this analysis is Optknock [11], which finds reactions whose deletion forces the target to be produced when growth is simulated [11]. A Python version of the algorithm, written by Ruben Van Heck, is implemented in the iGEM PIPE. The PIPE looks for knock-outs only if the user has requested this in the input file. Optknock is run on the models of the knock-in mutants that are already able to produce the target compound (thus including any added reactions).



    For each model, six Optknock iterations are computed. The number of allowed reaction knock-outs starts at 1 and is increased by one each round, up to a maximum of 6 simultaneous reaction knock-outs. Optknock solutions do not always succeed in coupling growth and production, so they are verified with the production envelope, which helps to evaluate the trade-off between production rate and growth rate in a model variant [12], [13].
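The iteration scheme can be sketched as a plain loop. The solver-backed parts are passed in as callables, here replaced by stubs, because the actual Optknock and production envelope functions of the PIPE are not shown on this page. The coupling test used below (a strictly positive minimum production at the highest growth rate) is one common criterion and is an assumption, as are the reaction IDs and all numbers.

```python
def growth_coupled(envelope, tolerance=1e-6):
    """Check coupling on a production envelope.

    envelope: list of (growth rate, minimum production, maximum production).
    The design counts as coupled if production is forced (> 0) at the
    highest achievable growth rate.
    """
    growth, min_production, _ = max(envelope, key=lambda point: point[0])
    return growth > tolerance and min_production > tolerance

def search_knockouts(run_optknock, compute_envelope, max_knockouts=6):
    """Run one search per allowed number of knock-outs, keep verified designs."""
    verified = []
    for allowed in range(1, max_knockouts + 1):
        knockouts = run_optknock(allowed)
        if not knockouts:
            continue  # no design found for this number of knock-outs
        if growth_coupled(compute_envelope(knockouts)):
            verified.append((allowed, knockouts))
    return verified

# Stubs standing in for the solver-backed steps (hypothetical results).
def fake_optknock(allowed):
    return {1: ["PFL"], 2: ["PFL", "ACKr"]}.get(allowed, [])

def fake_envelope(knockouts):
    if len(knockouts) >= 2:
        return [(0.0, 0.0, 12.0), (0.4, 5.0, 9.0), (0.7, 8.0, 8.0)]
    return [(0.0, 0.0, 12.0), (0.8, 0.0, 3.0)]

designs = search_knockouts(fake_optknock, fake_envelope)
print(designs)
```

Separating the search from the verification keeps the loop independent of the solver, so the same check can be reused for any knock-out design.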

Technical info

Usage

The full iGEM PIPE workflow can be run from the command line by calling the script "call_full_script_cmd.py". Alternatively, the functions that constitute the iGEM PIPE can be used from any Python IDE in combination with the CobraPy package.


Have a look at the iGEM PIPE’s manual and make use of the tutorials 0 – 6 in the GitHub repository of the WUR iGEM project 2021 to get started.

Installation

This pipeline is packaged as a .whl file. It can be installed with pip.

pip install ./dist/cattlelyst2021-0.1.1-py3-none-any.whl


The scripts that constitute the iGEM PIPE can be accessed from the GitHub repository of Cattlelyst. Gurobi is used as the solver for the PIPE, so a Gurobi licence is necessary. A free academic licence can be obtained at this link; only registration is required. The licence can then be saved on your PC by following the procedure indicated on the website.

  • References
    1. F. Chen, L. Yuan, S. Ding, Y. Tian, and Q. Hu, “Data-driven rational biosynthesis design : from molecules to cell factories,” Brief. Bioinform., no. July, 2019, doi: 10.1093/bib/bbz065.
    2. P. J. Strong, S. Xie, and W. P. Clarke, “Methane as a resource: Can the methanotrophs add value?,” Environmental Science and Technology, vol. 49, no. 7. American Chemical Society, pp. 4001–4018, Apr. 07, 2015, doi: 10.1021/es504242n.
    3. R. García-Granados, J. A. Lerma-Escalera, and J. R. Morones-Ramírez, “Metabolic Engineering and Synthetic Biology: Synergies, Future, and Challenges,” Front. Bioeng. Biotechnol., vol. 7, no. MAR, p. 36, Mar. 2019, doi: 10.3389/fbioe.2019.00036.
    4. A. Dasgupta, N. Chowdhury, and R. K. De, “Metabolic pathway engineering: Perspectives and applications,” Comput. Methods Programs Biomed., vol. 192, p. 105436, 2020, doi: 10.1016/j.cmpb.2020.105436.
    5. M. H. Medema, R. Van Raaphorst, E. Takano, and R. Breitling, “Computational tools for the synthetic design of biochemical pathways,” Nature Reviews Microbiology, vol. 10, no. 3. Nature Publishing Group, pp. 191–202, Mar. 23, 2012, doi: 10.1038/nrmicro2717.
    6. M. R. Long, W. K. Ong, and J. L. Reed, “Computational methods in metabolic engineering for strain design,” Current Opinion in Biotechnology, vol. 34. Elsevier Ltd, pp. 135–141, Aug. 01, 2015, doi: 10.1016/j.copbio.2014.12.019.
    7. A. H. A. Parret, H. Besir, and R. Meijers, “Critical reflections on synthetic gene design for recombinant protein expression,” Curr. Opin. Struct. Biol., vol. 38, pp. 155–162, 2016, doi: 10.1016/j.sbi.2016.07.004.
    8. E. Angov, C. J. Hillier, R. L. Kincaid, and J. A. Lyon, “Heterologous Protein Expression Is Enhanced by Harmonizing the Codon Usage Frequencies of the Target Gene with those of the Expression Host,” 2008, doi: 10.1371/journal.pone.0002189.
    9. V. S. Kumar and C. D. Maranas, “GrowMatch: An Automated Method for Reconciling In Silico/In Vivo Growth Predictions,” PLoS Comput Biol, vol. 5, no. 3, p. 1000308, 2009, doi: 10.1371/journal.pcbi.1000308.
    10. J. L. Reed et al., “Systems approach to refining genome annotation,” Proc. Natl. Acad. Sci. U. S. A., vol. 103, no. 46, pp. 17480–17484, Nov. 2006, doi: 10.1073/pnas.0603364103.
    11. A. P. Burgard, P. Pharkya, and C. D. Maranas, “OptKnock: A Bilevel Programming Framework for Identifying Gene Knockout Strategies for Microbial Strain Optimization,” 2003, doi: 10.1002/bit.10803.
    12. J. D. Orth, I. Thiele, and B. Ø Palsson, “What is flux balance analysis?,” 2010. doi: 10.1038/nbt.1614.
    13. N. E. Lewis et al., “Omic data from evolved E. coli are consistent with computed optimal growth from genome‐scale models,” Mol. Syst. Biol., vol. 6, no. 1, p. 390, Jan. 2010, doi: 10.1038/msb.2010.47.
    14. R. van Heck, “Metabolic modelling to understand and redesign microbial systems,” 2017.
    15. E. Noor, A. Bar-Even, A. Flamholz, E. Reznik, W. Liebermeister, and R. Milo, “Pathway Thermodynamics Highlights Kinetic Obstacles in Central Metabolism,” PLoS Comput. Biol., vol. 10, no. 2, p. e1003483, Feb. 2014, doi: 10.1371/journal.pcbi.1003483.
    16. A. Flamholz, E. Noor, A. Bar-Even, and R. Milo, “eQuilibrator-the biochemical thermodynamics calculator,” Nucleic Acids Res., vol. 40, 2012, doi: 10.1093/nar/gkr874.
    17. N. J. Claassens et al., “Improving heterologous membrane protein production in Escherichia coli by combining transcriptional tuning and codon usage algorithms,” PLoS One, vol. 12, no. 9, Sep. 2017, doi: 10.1371/journal.pone.0184355.
    18. J. S. Edwards, R. Ramakrishna, and B. O. Palsson, “Characterizing the Metabolic Phenotype: A Phenotype Phase Plane Analysis,” Biotechnol Bioeng, vol. 77, pp. 27–36, 2002, doi: 10.1002/bit.10047.
    19. S. Klamt, S. Müller, G. Regensburger, and J. Zanghellini, “A mathematical framework for yield (vs. rate) optimization in constraint-based modeling and applications in metabolic engineering,” Metab. Eng., vol. 47, pp. 153–169, May 2018, doi: 10.1016/j.ymben.2018.02.001.
    20. “The Historical Development of Dutch agricultural Exports : Focusing on Post-war Growth,” J. Ind. Econ. Bus., 2015.
    21. PBL, Zure regen, een analyse van dertig jaar verzuringsproblematiek in Nederland. Bilthoven: PBL publicatie, 2010.
    22. A. Sikkema, “The nitrogen problem in five questions,” Resource, WUR, 2019.
About Cattlelyst

Cattlelyst is the name of the iGEM 2021 WUR team. Our name is a mix of 1) our loyal furry friends, cattle, and 2) catalyst, which is something that increases the rate of a reaction. We are developing “the something” that converts the detrimental gaseous emissions of cattle, hence our name Cattlelyst.

Are you curious about our journey? We have written about our adventures in our blog, which you can find here: