Team:Open Science Global/Software

Software


Introduction

Many engineering fields rely on repeated design-build-test-learn (DBTL) cycles to reach optimal results. Biofoundries are the infrastructure that lets synthetic biology and biotechnology use the DBTL cycle as the main workhorse for organism engineering. Automation is the key element of a biofoundry: it enables high-throughput processing of a wide range of designs, experiments, and tests, which later feed integrated reports that determine whether the desired goal was achieved. But what should the final objective of biofoundry software be? In our view, the future depends directly on the creation of autonomous biofoundries, especially frugal ones.

Autonomy is a concept most people know from cars. Fully autonomous vehicles are the pinnacle of automation: they turn a very human-dependent activity into a completely automatic one. What if synthetic biology could be similarly automated? What if protocols could be executed by an integrated infrastructure? What if they could be adapted to each specific laboratory setup? What if the design of genetic parts and experiments could be corrected while being produced?

We understand that frugal biofoundries will need open software that allows these kinds of integration: software whose community actively communicates and develops its own non-proprietary, free, easy-to-use, high-quality solutions that can handle high-throughput work and large volumes of data. Beyond that, we need synthetic biology developers who will build the next generation of tools for biotech infrastructure.

A quick walkthrough of the toolkit

To do that, we decided to use Poly, an open-source Go package for organism engineering. As a Go package, Poly is easy to reuse, broadly compatible, and performs well. Poly also has a lively and fun community, guided by a very active maintainer and creator of the package, Timothy Stiles. Its commitment to good-quality code, combined with the ambition to become the most complete and open collection of computational synthetic biology tools, makes Poly a very attractive option for what we’re trying to create. In fact, most of the Friendzymes software team either is or has become a Poly contributor.

For this MVP (Minimum Viable Product) part of the Friendzymes project, we set two main objectives:

Create software that is easy to learn and adapt for people interested in becoming SynBio developers, so they are empowered to solve their own and their community’s problems; and,
Create software that demonstrates how processes in the DBTL cycle can be automated.

The main goal of our project is the democratization of biotechnology. With people of different backgrounds and levels of programming knowledge in mind, we created 1) the Friendzymes Cookbook, a collection of Jupyter Notebooks with the scripts we developed this iGEM season to support the Design team’s work, and 2) the Friendzymes Actions, a collection of GitHub Actions for synthetic biology that support continuous integration.

The Friendzymes Cookbook

During the iGEM season, the Friendzymes software and design teams worked together to automate steps that are complicated, time-consuming, and error-prone when done by hand (for example, a single typo can compromise your sequence). In this process, we wrote many scripts to address our specific needs and shared them as Colab notebooks so others could copy, modify, and reuse them.

However, many of these tasks recur across biological circuit design: codon optimization, primer design, and searching for forbidden sequences (e.g. an EcoRI site outside the BioBrick standard prefix and suffix), among others. Hence, we thought it prudent to write tutorials that could help people beyond our own project, and so we created the Friendzymes Cookbook. It is not only a collection of scripts for design automation but also an educational tool (see the Education section), so that newcomers to the Software Team, interested people from Friendzymes, teams from iGEM and the iGEM Design League, and others in the SynBio community all have a way to learn more about Poly, about common problems, and about how to design new tools!
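
As a rough illustration of the forbidden-sequence search mentioned above, here is a minimal sketch in plain Python. It does not use Poly and is not the Cookbook's implementation; the helper names (`reverse_complement`, `find_sites`, `scan`) and the example sequence are hypothetical. EcoRI (GAATTC) comes from the text; BsaI (GGTCTC) is added here only as an assumed example of a non-palindromic site, which has to be searched on both strands.

```python
# Hypothetical helpers for illustration only; this is not Poly's API.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FORBIDDEN = {"EcoRI": "GAATTC", "BsaI": "GGTCTC"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for each occurrence of `site`.

    The "-" strand is searched via the reverse complement of the site and is
    skipped for palindromic sites, which would otherwise be reported twice.
    """
    seq, site = seq.upper(), site.upper()
    patterns = {"+": site}
    if reverse_complement(site) != site:
        patterns["-"] = reverse_complement(site)
    hits = []
    for strand, pattern in patterns.items():
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


def scan(seq: str) -> dict[str, list[tuple[int, str]]]:
    """Scan one sequence for every site in FORBIDDEN."""
    return {name: find_sites(seq, site) for name, site in FORBIDDEN.items()}


example = "ATGGAATTCAAAGAGACCTAA"  # made-up test sequence
report = scan(example)
print(report)  # {'EcoRI': [(3, '+')], 'BsaI': [(12, '-')]}
```

To respect the BioBrick prefix and suffix, a caller would scan only the part of the sequence between them; that slicing is left out of this sketch.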

The Cookbook is a collection of Colab notebooks, currently comprising:

Understanding Poly

Poly is the key tool behind our software. We deliberately built workflows that integrate with Poly, both to show ways of using the package and to create some new features. It is therefore very important that you understand how the Poly package works and how it is structured before you start working with it. For that reason, we wrote this brief overview of Poly, its sub-packages, and a collection of use cases. We strongly recommend doing the tutorials in the order they appear.
Codon Optimization

Codon optimization is a very common task in part design. Here we show how to create customized codon tables and how to use them to codon-optimize a given coding sequence (CDS).
Annotation of problematic sequences

Have you designed your sequence? Now it is time to deal with the short forbidden subsequences that can hinder you, not only when sequencing (e.g. hairpins, repetitive regions) but also when cloning (e.g. restriction sites). This tutorial shows how to annotate these problems automatically. It gives you a GenBank file, with the annotations attached, that you can drop into your favorite viewer, such as Benchling or SnapGene.
CDS fix

In this notebook, you input your CDS sequence(s) and receive them back with the problematic sequences removed. This is done by replacing codons with synonymous ones, so the amino acid sequence stays the same. A kind reminder: this tutorial was NOT written for non-coding sequences such as promoters, RBSs, and terminators. If you find problematic sequences in those, review them case by case and be careful not to lose biological meaning.
Automatically create parts with correct overhangs

How about designing your final plasmid without worrying about each separate part, and letting a script add the restriction sites, spacers, and overhangs? That’s what you find here!
Golden Gate Simulation

In this notebook, you will run a simulation of a Golden Gate reaction and see if everything is theoretically acceptable before physically synthesizing your parts.
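
The synonymous-codon idea behind the CDS fix recipe in the list above can be sketched in a few lines. This is a minimal, hedged illustration in plain Python, not the notebook's code and not Poly's API: the function names are hypothetical, the standard genetic code is assumed, and the greedy strategy (accept the first synonymous swap that lowers the number of hits) is a simplification chosen for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of
# this sketch; other organisms may need a different translation table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS: dict[str, list[str]] = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(cds: str) -> str:
    """Translate an in-frame CDS into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def site_positions(seq: str, targets: set[str]) -> list[int]:
    """All start positions of any target string, overlaps included."""
    return sorted(i for t in targets
                  for i in range(len(seq) - len(t) + 1) if seq.startswith(t, i))


def remove_site(cds: str, site: str) -> str:
    """Greedily remove `site` (both strands) using synonymous codon swaps."""
    cds, site = cds.upper(), site.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    targets = {site, site.translate(COMPLEMENT)[::-1]}
    while True:
        hits = site_positions(cds, targets)
        if not hits:
            return cds
        start, fixed = hits[0], None
        # Try every codon that overlaps the first hit.
        for pos in range(start - start % 3, start + len(site), 3):
            original = cds[pos:pos + 3]
            for alt in SYNONYMS[CODON_TO_AA[original]]:
                candidate = cds[:pos] + alt + cds[pos + 3:]
                # Accept only swaps that strictly reduce the number of hits,
                # so the loop always terminates.
                if alt != original and len(site_positions(candidate, targets)) < len(hits):
                    fixed = candidate
                    break
            if fixed:
                break
        if fixed is None:
            raise ValueError(f"no synonymous swap removes the site at {start}")
        cds = fixed


cds = "ATGGAATTCAAATAA"            # made-up CDS: M E F K stop, with an EcoRI site
fixed = remove_site(cds, "GAATTC")
print(fixed, translate(fixed))     # ATGGAGTTCAAATAA MEFK*
```

Because every swap is synonymous, `translate(fixed)` equals `translate(cds)`. As the recipe itself warns, this approach only makes sense for coding sequences.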

We wrote all these ‘recipes’ as Jupyter Notebooks with Google Colab in mind, a free platform for running notebooks on Google’s infrastructure. This way, people don’t need to install or configure anything to run, adapt, and develop their own tools. We also set up a repository where people can contribute by proposing new chapters of the cookbook, fixing bugs, and maintaining this whole collection of tools. Feel free to visit our repo and suggest anything you’d like!

Friendzymes Actions

As we write this text, a script behind the scenes is checking whether the words are used correctly, and this integration is so seamless that people take it for granted. This isn’t magic; it is automation. In software engineering, an entire field of study is dedicated to automating processes that were previously manual. Pipelines make that automation simpler.

A pipeline can be understood as a chain of steps in which each output is used as the input of the next, so the final result is the product of the whole workflow. Current pipeline managers try to make these workflows context-independent (most of the time by using containers), so developers can easily migrate and scale pipelines from a local computer to a cluster or a cloud server.
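
The "each output is the next input" idea can be sketched as plain function composition. This is only a toy illustration in Python, not how GitHub Actions or any particular pipeline manager is implemented; the step names (`normalize`, `add_length`, `add_gc`) and the record layout are hypothetical.

```python
from functools import reduce
from typing import Callable

Step = Callable[[dict], dict]


def run_pipeline(steps: list[Step], record: dict) -> dict:
    """Feed the record through each step; every output is the next input."""
    return reduce(lambda acc, step: step(acc), steps, record)


def normalize(record: dict) -> dict:
    """Uppercase the sequence and drop spaces."""
    return {**record, "sequence": record["sequence"].upper().replace(" ", "")}


def add_length(record: dict) -> dict:
    """Attach the sequence length."""
    return {**record, "length": len(record["sequence"])}


def add_gc(record: dict) -> dict:
    """Attach the GC fraction (0.0 for an empty sequence)."""
    seq = record["sequence"]
    gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    return {**record, "gc": gc}


result = run_pipeline([normalize, add_length, add_gc],
                      {"name": "demo", "sequence": "atg gcc taa"})
print(result["sequence"], result["length"], round(result["gc"], 3))
# ATGGCCTAA 9 0.444
```

Each step returns a new record instead of mutating its input, which keeps steps independent of one another and easy to reorder or run elsewhere.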

To meet the high-throughput demand inside biofoundries, software engineers implement pipelines that process thousands upon thousands of designs, experiments, and data analyses every week. We believe Jupyter Notebooks are good for some scenarios, but they can’t provide a framework scalable enough for this demand. With this in mind, we tried to avoid a solution tied to a specific cloud service provider or a tool that needs too many configuration steps. For these reasons, we decided to use GitHub Actions.

GitHub Actions is a free-to-use feature of GitHub that lets you automate tasks inside your repository. In essence, it is a way to run pipelines that may or may not be related to the code you share on the platform. For us, this means a free pipeline manager, with minimal configuration steps, where people can automate processes of the DBTL cycle in an open-source environment.

To demonstrate the potential of this tool, we created three GitHub Actions:

DNA Annotator

This action lets users process GenBank files and reannotate them with problematic regions such as hairpins, repetitive sequences, and forbidden restriction sites, so DNA designers can easily find those regions and make a well-informed decision about which subsequences are worth correcting.
Is this DNA Synthesizable?

Instead of copying and pasting each of your sequences into the IDT gBlock Analyzer page to see whether it is synthesizable, you can let this action check that for you. It has some optional features, such as a break-the-pipeline option that halts the process if the software finds a non-synthesizable sequence. The action then exports a JSON file with the score of each sequence and the problems found.
Codon Optimization

With this action, you can automatically codon-optimize a list of sequences for different organisms based on the codon tables you share. This is useful if you’re working with multiple organisms at the same time.
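
To make the codon optimization idea concrete, here is a minimal sketch in plain Python. It is not the action's code and does not use Poly: it assumes the standard genetic code, builds each codon table by counting codons in a few reference CDSs, and applies a simple "most frequent codon per amino acid" strategy, which may differ from what Poly does. The function names and the tiny reference sequences for `organism_a` and `organism_b` are made up for illustration and are not real usage data.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(cds: str) -> list[str]:
    """Split an in-frame CDS into codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def build_codon_table(reference_cds: list[str]) -> Counter:
    """Count codon usage across a set of reference coding sequences."""
    counts: Counter = Counter()
    for cds in reference_cds:
        counts.update(codons(cds))
    return counts


def preferred_codons(counts: Counter) -> dict[str, str]:
    """Pick the most used codon per amino acid (ties: first in TCAG order)."""
    best: dict[str, str] = {}
    for codon, aa in CODON_TO_AA.items():
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def optimize(cds: str, counts: Counter) -> str:
    """Replace every codon with the preferred synonymous codon."""
    best = preferred_codons(counts)
    return "".join(best[CODON_TO_AA[c]] for c in codons(cds))


# Toy reference sets, invented for this example only.
tables = {
    "organism_a": build_codon_table(["ATGAAAGAAAAATAA", "ATGGAAAAGTAA"]),
    "organism_b": build_codon_table(["ATGAAGGAGAAGTGA", "ATGGAGAAATGA"]),
}
sequences = ["ATGAAGGAATAG"]  # made-up CDS encoding M K E stop

optimized = {name: [optimize(s, table) for s in sequences]
             for name, table in tables.items()}
print(optimized)
# {'organism_a': ['ATGAAAGAATAA'], 'organism_b': ['ATGAAGGAGTGA']}
```

Both outputs still encode the same protein as the input; only the codon choice changes with the organism's table.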

On our development roadmap, we envision new actions that automatically generate Opentrons protocol files, already parameterized to assemble (BUILD) and to amplify and validate (TEST) your sequences as a whole experiment. We are also creating more and more modular tools for automating the design of genetic parts. We believe that with these tools, synthetic biologists can start automating labor-intensive manual tasks and bring the benefits of software development to their field.

For the future

For us, this feels like a beginning. We are only starting to implement these ideas but, as stated above, we know where we want to go. We want frugal biofoundries to be on par with full-size biofoundries, including in automation. Furthermore, we don’t want software that is merely ‘good enough’ for being open source; we want software that is majestic for individuals and big companies alike. We want to integrate hardware and software so that frugal biofoundries can automate sequencing workflows from end to end, for example the processing of SARS-CoV-2 sequences. We want to make oligo pool assembly easy and less error-prone, so people can synthesize parts 10x or 20x more cheaply. We want living protocols that run and show whether an experiment already has inconsistencies, so you don’t have to waste more reagents before realizing you made a mistake.

How could software improve synthetic biology? How much impact could software have in helping humanity become carbon-negative and in making a (why not?) solarpunk future a reality?

Software is a big piece of this puzzle. Together we could build it.

If you’re interested, don’t hesitate: leave a message, send an e-mail, or open a GitHub issue, and we will be glad to present what we’re doing right now and show you ways to contribute to the Friendzymes project.