Team:Open Science Global/Wetware/Design-Inspirations

Antecedents: Inspirations for our Bacillus Toolkit

The design of our Bacillus toolkit is informed by several prior wetware collections and projects.

Inspiration 1: The pHT43 E. coli to B. subtilis shuttle vector

To achieve high levels of recombinant protein expression, host cells generally need to carry multiple copies of the gene encoding the protein of interest. There are several ways to do this, including integrating the gene into the cell’s genome multiple times, or selecting for tandem duplications of vectors integrated into the genome by single-crossover recombination. However, the simplest way to maintain multiple copies of a gene inside a cell is to put it on a replicating plasmid.

The plasmid pHT43 is one of the design inspirations for our project. pHT43 is a shuttle vector that can replicate in both B. subtilis and E. coli because it carries a separate origin of replication and selection marker for each host. Shuttling from E. coli to B. subtilis is very useful when building and assembling genetic constructs: we can clone, sequence-verify, and amplify them in E. coli first, and only then transfer them to B. subtilis for testing. E. coli remains the most convenient host for this kind of cloning work.

pHT43 has a composite promoter consisting of a strong B. subtilis promoter combined with the lac operator, and it also carries the lacI repressor gene under a constitutive B. subtilis promoter. All the machinery required for lactose- or IPTG-based induction of recombinant gene expression is therefore contained on pHT43 itself.

Importantly, pHT43 replicates in Bacillus by theta replication rather than rolling circle replication. This matters because B. subtilis readily uses single-stranded DNA as a substrate for homologous recombination, and rolling circle replication generates long single-stranded DNA intermediates, whereas theta replication does not. pHT43 is therefore much more genetically stable in B. subtilis than many other plasmids.

Inspiration 2: The FreeGenes Open Yeast Collection

Our open Bacillus wetware toolkit was directly inspired by the FreeGenes Open Yeast Collection (OYC). OYC was designed by our team’s primary PI, Dr. Scott Pownall (with Pichia part designs contributed by team co-lead Isaac Larkin), and was itself inspired by the modular yeast toolkit published by the Dueber lab in 2015.

OYC is a highly flexible and extensible collection of DNA parts for building not just single transcription units for expression in Saccharomyces cerevisiae and Pichia pastoris, but entire multi-gene pathways for metabolic engineering. OYC is freely distributed under the terms of the OpenMTA. It includes parts for controlling the expression and localization of protein coding sequences (CDSs) in yeast, such as promoters and terminators, some model CDSs, and parts for yeast two-hybrid protein interaction experiments. These transcription unit (TU) component parts are all designed for Golden Gate assembly using BsaI cut sites and the compatible 4 base pair (bp) overhangs of the Modular Cloning (MoClo) assembly standard.
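To make the BsaI mechanics concrete, here is a minimal Python sketch (not part of OYC or any FreeGenes software) that scans a linear sequence for BsaI sites in either orientation and reports the 4 nt overhang each site would leave. It assumes only the well-known BsaI cut pattern GGTCTC(N1)/(N5); the function name and the placeholder insert are hypothetical, and the AATG and GCTT flanks are simply the standard MoClo CDS boundary overhangs used as an illustration.

```python
from typing import List, Tuple

# BsaI recognizes GGTCTC and cuts GGTCTC(N1)/(N5): 1 nt past the site on the
# recognition strand and 5 nt past it on the complementary strand, leaving a
# 4-nt 5' overhang.
BSAI_SITE = "GGTCTC"
BSAI_SITE_RC = "GAGACC"  # the same site read on the other strand


def bsai_overhangs(seq: str) -> List[Tuple[int, str, str]]:
    """Return (site_start, strand, overhang) for every BsaI site in a linear sequence.

    Overhangs are written as top-strand sequence. Sites too close to either end
    to leave a complete overhang are skipped; circular topology is not handled.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(BSAI_SITE) + 1):
        window = seq[i:i + len(BSAI_SITE)]
        if window == BSAI_SITE and i + 11 <= len(seq):
            # Site on the top strand: overhang starts 1 nt downstream of the site.
            hits.append((i, "+", seq[i + 7:i + 11]))
        elif window == BSAI_SITE_RC and i >= 5:
            # Site on the bottom strand: overhang ends 1 nt upstream of the site.
            hits.append((i, "-", seq[i - 5:i - 1]))
    return hits


# Hypothetical level 0 insert: CCCCCC is a placeholder "CDS" flanked by the
# MoClo AATG and GCTT overhangs and two inward-facing BsaI sites.
example = "GGTCTC" + "A" + "AATG" + "CCCCCC" + "GCTT" + "A" + "GAGACC"
print(bsai_overhangs(example))  # [(0, '+', 'AATG'), (22, '-', 'GCTT')]
```

Reporting both overhangs as top-strand sequence matches how MoClo-style standards usually name part boundaries.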

OYC’s general assembly standard can be adapted for use with any species, because some parts, such as the assembly connectors and E. coli backbone parts, are species-agnostic. The collection contains modular parts in a standard architecture for building E. coli to yeast shuttle vectors, including E. coli selection markers, E. coli origins of replication (ori), Saccharomyces and Pichia origins and homology arm pairs for genomic integration, yeast selection markers, and even an origin of transfer (oriT) for conjugative transfer of the vector from E. coli to yeast. OYC can also be easily adapted for use with other yeast species, such as Yarrowia lipolytica and Schizosaccharomyces pombe.

This flexibility comes from an extension of the established MoClo assembly standard: additional part types outside of the transcription unit are defined with new 4 bp BsaI overhang sequences. These new overhangs were predicted to assemble with very high (~98%) fidelity using NEB’s Ligase Fidelity Viewer™ (v2, based on research from Pryor and Potapov et al.).
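NEB’s tools score overhang sets against measured ligation frequencies, which cannot be reproduced here. As a much cruder stand-in, the hypothetical Python sketch below applies three common rules of thumb for any Golden Gate overhang set: no palindromic overhangs, no duplicates or reverse-complement pairs, and no two junctions within a single mismatch of pairing with each other. It is an illustration of those heuristics only, not the method used to design OYC.

```python
from itertools import combinations
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def overhang_problems(overhangs: List[str]) -> List[str]:
    """Flag simple rule violations in a set of 4-nt Golden Gate overhangs.

    Rule-of-thumb checks only; unlike NEB's fidelity tools, no ligation
    frequency data is used.
    """
    ohs = [o.upper() for o in overhangs]
    problems = []
    for o in ohs:
        if o == revcomp(o):
            problems.append(f"{o} is palindromic and can ligate to itself")
    for a, b in combinations(ohs, 2):
        if a == b:
            problems.append(f"{a} is used more than once")
        elif a == revcomp(b):
            problems.append(f"{a} and {b} are reverse complements")
        elif min(hamming(a, b), hamming(a, revcomp(b))) == 1:
            # One mismatch away from pairing with the wrong junction.
            problems.append(f"{a} and {b} differ by a single base")
    return problems


moclo = ["GGAG", "TACT", "AATG", "GCTT", "CGCT"]
print(overhang_problems(moclo))             # []
print(overhang_problems(moclo + ["GATC"]))  # ['GATC is palindromic and can ligate to itself']
```

Passing these checks is necessary but not sufficient; a data-driven fidelity estimate is still needed to pick among otherwise valid overhangs.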

OYC enables the construction of level 2 multi-gene assemblies through 7 pairs of ‘Assembly Connector’ parts, each of which contains a single BbsI cut site within a short, unique, non-repetitive spacer sequence. During level 1 assembly, a pair of assembly connectors is chosen to flank each transcription unit. The BbsI sites are then used in level 2 assembly to concatenate 2 to 6 transcription units into one construct, which can be episomal or integrative depending on the options chosen.
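One way to see why 7 connectors cap a level 2 assembly at 6 TUs is a chain in which TU k is flanked by connector k on its 5’ side and connector k + 1 on its 3’ side, so that neighboring TUs release matching BbsI overhangs. The Python sketch below plans such a chain. The chain model and the connector labels C1 to C7 are assumptions for illustration; the real OYC connector names and the connections to the level 2 backbone may differ.

```python
from typing import List, Tuple

# Hypothetical connector labels; actual OYC connector part names may differ.
CONNECTORS = [f"C{i}" for i in range(1, 8)]  # 7 connectors


def plan_level2(tus: List[str]) -> List[Tuple[str, str, str]]:
    """Assign (5' connector, TU, 3' connector) so that neighboring TUs share a connector."""
    max_tus = len(CONNECTORS) - 1
    if not 2 <= len(tus) <= max_tus:
        raise ValueError(f"expected 2 to {max_tus} TUs, got {len(tus)}")
    # TU k sits between connector k and connector k + 1, so its 3' end and the
    # next TU's 5' end carry the same connector and release matching overhangs.
    return [(CONNECTORS[k], tu, CONNECTORS[k + 1]) for k, tu in enumerate(tus)]


for row in plan_level2(["TU_a", "TU_b", "TU_c"]):
    print(row)
# ('C1', 'TU_a', 'C2')
# ('C2', 'TU_b', 'C3')
# ('C3', 'TU_c', 'C4')
```

Under this model, n TUs consume n + 1 connectors, which is consistent with the 2 to 6 TU range stated above.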

Such multi-gene inserts are essential for building complex genetic circuits in synthetic biology, but they are also highly useful for recombinant protein production. For instance, users who want to build strains that inexpensively manufacture Type IIS restriction enzymes (such as those that power the Golden Gate reactions behind OYC and other FreeGenes collections) will need to co-express an appropriate methyltransferase to prevent the restriction enzyme from cleaving the host cell’s genome. For some target proteins, users may also want to co-express chaperones that assist in protein folding or secretion, or to assemble multiple non-repetitive but synonymous CDSs for the same protein to boost expression levels.

Inspiration 3: The FreeGenes Protein Expression Toolkit

Another important inspiration for our project is the FreeGenes E. coli Protein Expression Toolkit (PET), designed in 2020 by Jenny Molloy, Chiara Gandini, Fernan Federici, Isaac Núñez, Tamara Matute, Anibal Arce, Isaac Larkin and Scott Pownall.

Designed at the beginning of the COVID-19 pandemic, the PET contains E. coli codon-optimized CDSs for most of the off-patent enzymes used in molecular diagnostic assays. These CDSs are available both in pre-built vectors with polyhistidine tags and IPTG-inducible promoters, and as standalone parts that can be assembled into multi-tagged TUs with other parts in the collection. These other parts include E. coli vector backbones; regulatory elements such as promoters (inducible and constitutive), ribosome binding sites (both standalone and paired bicistronic start codons), and terminators; and a large collection of peptide tags, including periplasmic export tags, affinity purification tags (for purification with nickel-NTA resin, silica, starch/maltose, chitin, cellulose or calmodulin), fluorescent and chromoprotein reporter tags, and protease cleavage tags that allow other tags to be proteolytically removed from the protein.

To enable the assembly of multiple composite tags onto CDSs, an expanded and modified version of the MoClo assembly standard, called FreeGenes Protein Cloning (FG ProClo), was designed for the PET collection. Five additional 4 bp overhangs were added to the assembly standard, two of which replaced the 3’ overhangs in the ‘RBS’ and ‘CDS’ part definitions. These additions made space for up to 3 N-terminal and 3 C-terminal tag parts (defined generically as Tag1 to Tag6) to be appended to ProClo CDS parts in a single Golden Gate reaction. As with OYC, the choice of additional overhang sequences capable of high-fidelity assembly with the standard MoClo overhangs was guided by NEB’s Golden Gate utility tools. In addition, the ProClo RBS 3’ overhang was required to end with the beginning of a start codon, and every other new overhang had to be able to encode a pair of small, hydrophilic amino acids (glycine, serine, or threonine), so that the scar sequence between tags would be less likely to affect the function of the tags or of the protein they are attached to.
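The amino-acid constraint on the new ProClo overhangs can be checked mechanically. A 4 nt overhang plus 2 nt contributed by the neighboring parts forms a 6 nt, two-codon scar. The hypothetical Python sketch below lists every such scar whose two codons both encode glycine, serine, or threonine. The `offset` parameter (how many of the two extra nucleotides sit before the overhang in the reading frame) is an assumption, since the ProClo frame is not specified here, and the overhangs in the usage lines are arbitrary examples.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code entries for the three allowed residues.
SMALL_HYDROPHILIC = {
    "G": {"GGA", "GGC", "GGG", "GGT"},
    "S": {"TCA", "TCC", "TCG", "TCT", "AGC", "AGT"},
    "T": {"ACA", "ACC", "ACG", "ACT"},
}
CODON_TO_AA = {c: aa for aa, codons in SMALL_HYDROPHILIC.items() for c in codons}


def gst_scars(overhang: str, offset: int = 1) -> List[Tuple[str, str]]:
    """List 6-nt scars with `overhang` starting at position `offset` whose
    two codons both encode G, S or T, as (scar, dipeptide) pairs."""
    overhang = overhang.upper()
    if len(overhang) != 4 or offset not in (0, 1, 2):
        raise ValueError("expects a 4-nt overhang and offset 0, 1 or 2")
    results = []
    # Try all 16 possible fill-in bases from the neighboring parts.
    for fill in product("ACGT", repeat=2):
        scar = "".join(fill[:offset]) + overhang + "".join(fill[offset:])
        codons = (scar[:3], scar[3:])
        if all(c in CODON_TO_AA for c in codons):
            results.append((scar, "".join(CODON_TO_AA[c] for c in codons)))
    return results


print(gst_scars("AATG"))       # []
print(len(gst_scars("GCTC")))  # 8
print(gst_scars("GCTC")[-1])   # ('GGCTCT', 'GS')
```

A candidate overhang passes if the returned list is non-empty for the chosen frame; the separate start-codon rule for the RBS overhang is not covered by this sketch.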

The ability to quickly and simply fuse composite tags to a CDS could decrease the cost and increase the speed of the design-build-test cycle when developing high-expressing strains for a protein of interest. For instance, combining an affinity peptide, a fluorescent reporter, and a protease cleavage site in one construct could enable rapid measurement of the expression levels of different constructs, rapid testing of different protocols for purifying the target protein, and comparison of the function of tagged and untagged versions of the purified protein.

Inspiration 4: Brockmeier et al.’s 2006 paper on comprehensively screening B. subtilis secTags

B. subtilis has 173 native secretion signal peptides (secTags) that direct various proteins for secretion by the Sec pathway. In 2006, Brockmeier et al. fused all 173 secTags to two different heterologous proteins, a fungal cutinase and an esterase. Of the 148 secTag-cutinase fusions that could be expressed in B. subtilis, the top secTag secreted ~10 times as much cutinase as the median secTag, while roughly a quarter of the secTags secreted no detectable cutinase at all. Importantly, when the top-performing secTag for cutinase was fused to the esterase, it gave only 5% of the esterase secretion achieved with the best esterase secTag; likewise, the top secTags for the esterase gave little to no secretion when fused to the cutinase. As Brockmeier et al. reported, ‘the best [secTag] for the secretion of one target protein is not automatically the best, or even a sufficient [secTag], for the secretion of another different target protein.’ This means that finding a good secTag for the proteins we want to manufacture would likely require comprehensive screening of B. subtilis secTags.

Inspiration 5: The Bacilloflex Toolkit

The Bacilloflex toolkit is a Golden Gate collection of genetic parts for engineering B. subtilis, designed and characterized by Wicke et al. in 2017. It contains descriptions and sequences for numerous useful parts, including promoters, RBSs, terminators, homology arms for genomic integration, and linkers for multi-TU assembly. The Bacilloflex collection, together with a number of B. subtilis plasmid maps shared with us by Dan Ziegler, serves as inspiration for our continued efforts to expand our toolkit of useful B. subtilis parts.

Inspiration 6: Software and wetware tools to increase the predictability of bioengineering

Finally, we were inspired by wetware and software work from the Voigt lab and the Salis lab aimed at making biology more engineerable. Specifically, the Voigt lab has developed ribozyme insulators that make the expression of genes from a given promoter more predictable and consistent, independent of the CDS being expressed, by cleaving off the 5’ untranslated region of the mRNA and thereby preventing it from forming secondary structures with the rest of the transcript. The Salis lab’s Ribosome Binding Site Calculator (RBScalc) and Non-Repetitive Parts Calculator (NRPcalc) are easy-to-use tools for tuning (or, in our case, maximizing) expression from a given CDS, and for reducing genetic instability caused by DNA sequence repeats. These tools were directly useful to us: RBScalc was used to design RBSs for the secTags in our library plasmids. They also inspired our software team’s work to rebuild RBScalc in Go (Golang) for the Poly package, and to develop a pipeline that identifies and removes sequence repeats and predicted secondary structures that could impair the synthesizability and genetic stability of our parts, particularly the coding sequences.
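As a rough illustration of what a repeat screen looks for, the Python sketch below reports every k-mer that occurs more than once in a sequence, grouping direct and inverted (reverse-complement) repeats together. This is a simple k-mer scan chosen for illustration; it is not NRPcalc’s algorithm or our software team’s pipeline, and the function name and default k = 12 are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeats(seq: str, k: int = 12) -> Dict[str, List[int]]:
    """Map each repeated k-mer (canonical form) to its top-strand start positions."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Canonical form: the lexicographically smaller of the k-mer and its
        # reverse complement, so direct and inverted repeats share one key.
        positions[min(kmer, revcomp(kmer))].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


print(find_repeats("GATTACAGATTACA", k=7))         # {'GATTACA': [0, 7]}
print(find_repeats("GATTACA" + "TTTT" + "TGTAATC", k=7))  # {'GATTACA': [0, 11]}
```

The second call finds an inverted repeat, which is the kind of sequence that can fold into a hairpin; whether a hit is direct or inverted can be read off by comparing the k-mer at each position with the canonical key.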
