We re-coded the open reading frame of the optimized gene to account for the microbiome’s codon preferences, thus enhancing translation efficiency in the optimized organisms while decreasing it in the deoptimized organisms. You can read more about this in the Translation Optimization Modeling Section. We then cloned each of the modified sequences into a plasmid, which was transformed into our chassis organisms to test their effect on protein expression and growth rates.
All the presented data is the average of three biological replicates, performed with three technical replicates of each ORF.
As explained in our engineering cycle, the sequences for this analysis were produced using a previous version of our software.
We engineered the fluorescent mCherry gene to optimize its translation for B. subtilis while deoptimizing it for E. coli, and tested different CUB measurements along with optimization techniques.
Specifically, we calculated 3 CUB profiles: CAI, tAI, TDR.
We then optimized the sequences according to 2 different optimization strategies:
- R optimization: the new score for each codon is calculated as the ratio between the CUB score in B. subtilis and the CUB score in E. coli. The synonymous codon with the highest ratio score is selected.
- D optimization: the new score for each codon is calculated as the difference between the CUB score in B. subtilis and the CUB score in E. coli. The synonymous codon with the highest difference score is selected.
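The two strategies can be illustrated with a minimal Python sketch. The function name `pick_codon`, the example codons, and the CUB scores below are hypothetical and chosen only for illustration; they are not values or code taken from our software. The sketch also assumes that all CUB scores are strictly positive, so that the ratio is always defined.

```python
def pick_codon(synonymous_codons, cub_optimized, cub_deoptimized, strategy="R"):
    """Select one codon among synonymous alternatives.

    cub_optimized / cub_deoptimized map codon -> CUB score (e.g. CAI, tAI or TDR)
    in the organism to optimize for and in the organism to deoptimize for.
    Strategy "R" ranks codons by the ratio of the two scores,
    strategy "D" ranks them by the difference.
    """
    if strategy not in ("R", "D"):
        raise ValueError("strategy must be 'R' or 'D'")

    def score(codon):
        if strategy == "R":
            return cub_optimized[codon] / cub_deoptimized[codon]
        return cub_optimized[codon] - cub_deoptimized[codon]

    # The synonymous codon with the highest score is selected.
    return max(synonymous_codons, key=score)


# Hypothetical scores for three synonymous leucine codons (illustrative only).
leucine = ["CTG", "TTA", "CTT"]
b_subtilis = {"CTG": 0.9, "TTA": 0.3, "CTT": 0.5}
e_coli = {"CTG": 0.6, "TTA": 0.1, "CTT": 0.4}

choice_r = pick_codon(leucine, b_subtilis, e_coli, strategy="R")
choice_d = pick_codon(leucine, b_subtilis, e_coli, strategy="D")
```

With these made-up scores the two strategies disagree: the ratio favors TTA, which scores low in both organisms but relatively much lower in E. coli, while the difference favors CTG, which has the largest absolute gap.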
In order to visualize the differences between the sequences, we created the following plots:
Figure 1: Multiple sequence alignment, highlighting different positions
Figure 1.2a: The number of different positions (out of 711 in the full coding sequence)
*Note: E stands for E. coli and B for B. subtilis
During our measurement process, both OD and fluorescence were measured once every 20 minutes using a plate reader. This sampling rate was sufficient for obtaining the growth curve; however, it was not frequent enough to yield derivatives of usable quality, due to random noise in the measurements. Thus, we had to devise a new automatic strategy to find the logarithmic growth phase.
During the logarithmic growth phase, the growth is optimal and linear, thus the growth rate is maximal. To find that region, in each iteration a point was removed from one of the edges of the graph, and a linear trendline was fitted to the remaining curve (Fig. 1.3). If removing the point caused the slope of the trendline to increase, the point was considered not part of the log phase. The iterations continued until one of the following occurred:
- Only ⅛ of the graph is left.
- Two iterations did not change the slope
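The procedure can be sketched in Python as follows. This is a minimal illustration rather than our actual analysis code: the function names are hypothetical, the choice of which edge to trim is made greedily (the edge whose removal raises the slope most), and the "two iterations did not change the slope" criterion is interpreted here as neither of the two possible edge removals increasing the slope.

```python
import numpy as np


def fit_slope(t, y):
    """Slope of the least-squares linear trendline through (t, y)."""
    return np.polyfit(t, y, 1)[0]


def find_log_phase(t, y, min_fraction=1 / 8, tol=1e-9):
    """Trim edge points while doing so increases the trendline slope.

    Returns (lo, hi, slope): the detected window is t[lo:hi] and
    slope is the fitted growth rate inside that window.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    lo, hi = 0, len(t)
    # Stop once only min_fraction of the points is left (never fewer than 2).
    min_points = max(2, int(np.ceil(len(t) * min_fraction)))
    slope = fit_slope(t[lo:hi], y[lo:hi])

    while hi - lo > min_points:
        slope_without_left = fit_slope(t[lo + 1:hi], y[lo + 1:hi])
        slope_without_right = fit_slope(t[lo:hi - 1], y[lo:hi - 1])
        # Neither removal raises the slope: the window is taken as the log phase.
        if max(slope_without_left, slope_without_right) <= slope + tol:
            break
        if slope_without_left >= slope_without_right:
            lo, slope = lo + 1, slope_without_left
        else:
            hi, slope = hi - 1, slope_without_right
    return lo, hi, slope
```

Calling `find_log_phase(times, od_values)` on a curve sampled every 20 minutes returns the index range of the detected window and its slope; by construction the returned slope is never smaller than the slope of the trendline fitted to the full curve.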
Figure 1.3: Two examples of automatically detected logarithmic growth phases, for growth curves of B. subtilis with the TAI-R ORF version and E. coli with the TDR-D ORF version. Calculated slopes indicate the growth rate values.
We calculated p-values with both Student’s t-test and a permutation test. Since t-tests are not traditionally performed on data sets of this size, we also calculated an empirical p-value using a permutation test, adapted to our data as follows. For every optimization, the three experiments from the same organism were averaged and the difference between E. coli and B. subtilis was calculated. Then the same splitting, averaging, and difference calculations were performed for all random splits of the six results, to see whether the separation between E. coli and B. subtilis is significant; in other words, how often a random split produces two groups that differ more than the two organisms do (the initial split). The p-value is the fraction of splits in which the difference between the two groups is larger than the difference in the original split. Larger p-values mean that many random splits separated the data better than our optimization did, so the optimization is not significant; the significance of our results increases as the p-value decreases. Download summary of results here.
Note: Statistically, the p-value can never be 0; a value of 0 only means that the p-value is below the detection threshold of the analysis, which in this case is 0.0083.
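A minimal Python sketch of such a permutation test is given below. It is an illustration under stated assumptions, not our analysis script: the function name is hypothetical, every split of the pooled values into two groups of the original sizes is enumerated exactly, and the mean difference is compared in absolute value. The number of splits enumerated here (20 for two groups of three) and therefore the smallest attainable non-zero p-value need not match the detection threshold quoted above.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b):
    """Fraction of splits whose mean difference exceeds the observed one."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    larger = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        split_a = [pooled[i] for i in idx]
        split_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count only splits that separate the data strictly better.
        if abs(mean(split_a) - mean(split_b)) > observed:
            larger += 1
    return larger / total
```

When the two organisms are perfectly separated, no split exceeds the observed difference and the function returns 0, which should be read as "below the resolution of the test" rather than as a true probability of zero.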
Altering ORF sequences hinders the growth of deoptimized bacteria:
Growth curves obtained from three biological repeats for each bacterium and GOI version were plotted against time (Fig. 1.4A). The logarithmic phase for each ORF type was detected as described above (see computational analysis), and growth rates were calculated as the slopes of the linear trendlines.
In B. subtilis, for which the ORFs were optimized, growth rates and maximal OD600 nm values were comparable to those of the original ORF (Fig. 1.4B). In E. coli, for which the ORFs were deoptimized, both growth rates and maximal OD600 nm values were significantly reduced in the TAI-D and TDR-D versions (Fig. 1.4B). In particular, the TAI-D version reduced E. coli’s growth rate approximately 7-fold, possibly due to ribosomal traffic jams (Fig. 1.4C). In this case, ribosomes stall during mRNA translation while waiting for a low-abundance tRNA to bind. As a result, endogenous protein synthesis is diminished, which in turn restricts cellular growth. To assess the degree of optimization in B. subtilis relative to the deoptimization in E. coli, the growth rate fold changes in B. subtilis were divided by those in E. coli:
optimization ratio = (fold change in B. subtilis) / (fold change in E. coli)
As the graph clearly shows, TAI-D is by far the version most strongly optimized for B. subtilis and deoptimized for E. coli in terms of growth rates (Fig. 1.4D).
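The division of fold changes described above can be written as a short Python sketch. The function names and the numbers in the example are hypothetical; the fold change is taken relative to the unmodified mCherry ORF, as in Fig. 1.4B.

```python
def fold_change(variant_value, reference_value):
    """Fold change of an ORF variant relative to the original mCherry ORF."""
    return variant_value / reference_value


def optimization_ratio(variant_bs, reference_bs, variant_ec, reference_ec):
    """Fold change in B. subtilis divided by fold change in E. coli.

    Values greater than one indicate that the variant performs relatively
    better in B. subtilis than in E. coli.
    """
    return fold_change(variant_bs, reference_bs) / fold_change(variant_ec, reference_ec)


# Made-up growth rates: unchanged in B. subtilis, halved in E. coli.
example_ratio = optimization_ratio(1.0, 1.0, 0.5, 1.0)
```

In this made-up example the variant grows as fast as the original in B. subtilis but half as fast in E. coli, giving a ratio of 2.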
Figure 1.4: A. Representative growth curves for E. coli and B. subtilis from one of the biological repeats. Control (black dashed curve) stands for bacteria containing the same plasmid backbone without the mCherry gene. mCherry (red dashed curve) is the original (unmodified) version of the gene, and CAI, TAI-D, TAI-R, TDR-D, and TDR-R are modified versions of the mCherry gene. B. Right: fold change in bacterial growth rates of each ORF version relative to the growth rates in mCherry. Left: the same as the right but calculated for the average maximal density. C. Illustration of stalled ribosomes during mRNA translation, which hinders bacterial growth. D. Fold of growth rates in B. subtilis relative to E. coli. Values greater than one indicate success of the ORF version in terms of bacterial growth rates.
Significant variations in GOI expression correspond with ORF modifications:
mCherry’s fluorescence intensity indicates that its expression levels changed substantially in all modified versions, increasing in B. subtilis and decreasing in E. coli (Fig. 1.5A). In B. subtilis, the TAI-D version exhibited the highest average maximal fluorescence intensity compared to the original version, while in E. coli, the TDR-R version exhibited the lowest (Fig. 1.5B). To attribute the changes in fluorescence intensity to variations in the ORF sequence rather than to differences in bacterial density, fluorescence was normalized to bacterial density, i.e. fluorescence intensity was divided by the OD600 nm values. Then, the average of the normalized fluorescence intensity was determined. Higher values of this ratio represent greater GOI expression per bacterium and vice versa. When normalized, TAI-D was still ranked highest as the expression-optimized version in B. subtilis. However, in E. coli, CAI was ranked as the most deoptimized version, just ahead of TAI-R (Fig. 1.5C).
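The normalization step can be sketched in Python as follows. This is an illustrative sketch only: the function name is hypothetical, and the `min_od` cutoff, which skips time points with an OD at or below the given value to avoid dividing by zero, is an assumption that is not part of the described procedure.

```python
import numpy as np


def mean_normalized_fluorescence(fluorescence, od600, min_od=0.0):
    """Average of fluorescence / OD600 over the time points of one curve.

    Time points whose OD600 is not above min_od are skipped (assumption).
    """
    fluorescence = np.asarray(fluorescence, dtype=float)
    od600 = np.asarray(od600, dtype=float)
    usable = od600 > min_od
    return float(np.mean(fluorescence[usable] / od600[usable]))
```

For example, a culture whose fluorescence rises exactly in proportion to its density, such as `mean_normalized_fluorescence([10, 20, 30], [1, 2, 3])`, has a constant per-density signal, so the increase in raw fluorescence is explained by growth alone.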
Finally, we evaluated the optimization degree (B. subtilis relative to E. coli) in terms of GOI expression by using the formula mentioned above, with the average normalized fluorescence intensity values. All five ORF versions, but especially CAI, TAI-R, and TDR-D, showed successful optimization in B. subtilis (Fig. 1.5D).
Figure 1.5: A. Representative fluorescence intensity plots of all ORF variants in B. subtilis (left) and in E. coli (right) from one of the biological repeats. Note that the control lacked the mCherry gene and thus did not exhibit fluorescence; it served for background subtraction. B. Fold change in average maximal fluorescence intensity of each ORF version relative to mCherry. C. The same as B, but calculated for the average normalized fluorescence. D. Fold of average normalized fluorescence in B. subtilis relative to E. coli. Values greater than one indicate success of the ORF version in terms of GOI expression.
We used our software in order to find endogenous E. coli and B. subtilis promoters that can be selectively transcribed only in one organism and not in the other.
We scanned the endogenous promoters of B. subtilis and E. coli searching for 2 different types of genetic motifs:
- Motifs that have enhanced presence in one organism compared to the other, i.e. selective motifs
- Motifs that are found in highly expressed genes, i.e. transcription motifs
These two sets of motifs are then joined together, and only selective motifs that had similar representation in the transcription motif set were used, creating a final set of motifs that are transcription enhancing in a selective manner.
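The intersection of the two motif sets can be illustrated with the following Python sketch. It rests on several assumptions that are not taken from our software: motifs are represented as position weight matrices of identical shape, similarity is measured as the Pearson correlation of the flattened matrices, and the `threshold` value of 0.8 is arbitrary. The function names are hypothetical.

```python
import numpy as np


def motif_correlation(pwm_a, pwm_b):
    """Pearson correlation between two equally shaped position weight matrices."""
    return float(np.corrcoef(np.ravel(pwm_a), np.ravel(pwm_b))[0, 1])


def selective_transcription_motifs(selective, transcription, threshold=0.8):
    """Keep selective motifs that resemble at least one transcription motif.

    selective / transcription map motif name -> PWM (array-like).
    The transcription set is assumed to be non-empty.
    """
    kept = []
    for name, pwm in selective.items():
        best = max(motif_correlation(pwm, other) for other in transcription.values())
        if best >= threshold:
            kept.append(name)
    return kept
```

A selective motif survives the filter only if some motif from highly expressed promoters is sufficiently similar to it, which mirrors the idea of keeping motifs that are both selective and transcription enhancing.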
In order to build this set, we had to define which promoters are considered highly expressed:
Figure 2.1: Selective and transcription enhancing motif sets are compared using pairwise motif correlation, for different definitions of highly expressed promoters. In each subplot, the columns are the selective and transcription enhancing motifs of E. coli, and the rows are the transcription enhancing and selective motifs of B. subtilis. Brighter cells represent motifs that are more similar to each other.
- Fig. 2.1a: only promoters belonging to the top 5% of expression are taken
- Fig. 2.1b: only promoters belonging to the top 10% of expression are taken
- Fig. 2.1c: only promoters belonging to the top 25% of expression are taken
- Fig. 2.1d: only promoters belonging to the top 50% of expression are taken
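Each of these definitions amounts to keeping the promoters above an expression quantile. A minimal Python sketch is shown below; the function name, the promoter names, and the expression values are hypothetical, and the cutoff uses NumPy's default linear interpolation between data points, which is an assumption about how ties and boundaries are handled.

```python
import numpy as np


def highly_expressed(expression, top_fraction):
    """Names of promoters whose expression lies in the top `top_fraction`.

    expression maps promoter name -> expression level;
    top_fraction=0.5 keeps the top 50%, 0.05 the top 5%, and so on.
    """
    names = list(expression)
    values = np.array([expression[name] for name in names], dtype=float)
    cutoff = np.quantile(values, 1.0 - top_fraction)
    return [name for name, value in zip(names, values) if value >= cutoff]


# Made-up expression levels for four promoters.
example_expression = {"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 4.0}
top_half = highly_expressed(example_expression, 0.5)
top_quarter = highly_expressed(example_expression, 0.25)
```

A stricter definition (smaller `top_fraction`) yields fewer promoters to learn motifs from, which is the trade-off weighed when choosing among the four options.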
In order to decide which definition is optimal, two factors have been taken into account:
- The total number of motifs in each set: A larger number of significant motifs allows us to scan and tailor the endogenous promoters more effectively, as explained in the transcription model.
- The correlation patterns: Generally speaking, the correlation between selective and transcription enhancing motifs is higher within the same organism than across organisms. However, in all cases the transcription enhancing motifs correlated well across organisms, which shows that a large proportion of transcription-related motifs are non-selective and emphasizes the need to intersect the two motif sets.
We chose option d, filtering out the lowest 50% of our promoters. These motifs were then used to rank candidate promoters according to both selectivity and expression ability, and synthetic point changes were introduced into them to enhance their activity. You can read more about how we did this on the Transcription Optimization Modeling Page.
We then cloned each of the modified promoters into a plasmid, which was transformed into our chassis organisms to test their effect on protein expression and growth rates.
It is important to mention that the new promoters replaced the existing P43 promoter in our plasmid. However, when the new promoters were inserted, a synthetic ribosome binding site (SynthRBS) (Fig. 2.2) was not removed during linearization of the vector by PCR (see cloning protocols). We suspect that having a synthetic RBS in addition to the equivalent sites in our promoters led to poor protein expression and impaired our results. Nevertheless, we repeated our POC assay twice, plotted the graphs, and calculated growth rates as described in the translation tab.
Figure 2.2: Schematic representation of the modified region in our plasmid: P43 is the original promoter that was removed by PCR and replaced with the new promoters. Downstream of P43 and joined to the mCherry gene is the synthetic ribosomal binding site that was accidentally retained in the newly constructed plasmids.
The following table lists the promoters together with the bacterium each one was predicted to be optimized for; each promoter is therefore deoptimized for the other bacterium:
| Promoter | Description | Optimized for |
|---|---|---|
| A2 | Ribosomal protein L4 | B. subtilis |
| C | cAMP phosphodiesterase | E. coli |
| D1 | L-lactate permease | B. subtilis |
| D2 | Synthetic cAMP phosphodiesterase | E. coli |
| E | Synthetic L-lactate permease | B. subtilis |
| K | 50S ribosomal subunit protein L27 | E. coli |
| M | RNA polymerase transcription factor DK | E. coli |
| T | 16S rRNA pseudouridine synthase | E. coli |
Subtle deviations in growth rates regardless of the optimization direction:
As indicated by the growth curves and the slopes of the logarithmic phases, none of the promoter versions had a significant effect on the growth of E. coli compared to the original P43 promoter (Fig. 2.3A, B). However, the lag phase appears to be shorter than with the original P43 promoter. In B. subtilis, these measures increased in most of the promoter versions (Fig. 2.3A, B), regardless of the promoter optimization direction.
Figure 2.3: A. Representative growth curves for E. coli and B. subtilis from one of the biological repeats. Control (black dashed curve) stands for bacteria containing the same plasmid backbone without the mCherry gene and the P43 promoter. P43 (red dashed curve) is the original (unmodified) version of the promoter, and A2, D1, and E are promoters optimized for B. subtilis (deoptimized for E. coli). C, D2, K, M, and T are promoters optimized for E. coli (deoptimized for B. subtilis). B. Right: fold change in bacterial growth rates of each promoter version relative to the growth rates in the P43 variant.
Reduction in GOI expression in all promoter versions
Fluorescence intensity in both bacteria dropped approximately 10-fold for all promoter versions (Fig. 2.4A). It is possible that retaining the synthetic RBS (see explanation in the background above) largely impaired translation of the GOI and hence diminished protein expression. Not surprisingly, the average normalized fluorescence showed a dramatic decrease compared to the original promoter. In an attempt to find optimization preferences, the fold changes of average normalized fluorescence in B. subtilis were divided by those in E. coli (as explained in the translation results). Values greater than one indicate an optimization preference toward B. subtilis, and values below one a preference toward E. coli. Indeed, D1 and E showed optimization toward B. subtilis, with D1 performing better than its synthetic version E. Only promoter versions A2 and D2 showed an optimization preference toward E. coli. Thus, the A2 promoter failed, since it was predicted to be deoptimized in E. coli.
Figure 2.4: A. Representative fluorescence intensity plots of all promoter variants in B. subtilis (left) and in E. coli (right) from one of the biological repeats. B. Fold change in average normalized fluorescence intensity of each promoter version relative to P43. C. Fold of average normalized fluorescence in B. subtilis relative to E. coli.
The Communique project aims to develop software that allows safe and secure bioengineering of microbial communities in a selective manner.
The process was particularly challenging for several reasons:
First, a microbiome consists of tens and even hundreds of organisms - how does one create a model of biological processes for a whole microbiome? How does one then selectively optimize these in specific members of the community?
Second, optimization in the desired species is not sufficient for effective expression isolation - gene expression has to be deoptimized in the undesired species group. This is especially important in this model as ribosomes are a limited resource in the cell, so the introduction of non-optimal codons will create an mRNA sequence that is translated very inefficiently and slowly, decreasing the number of ribosomes free to translate other sequences. This actually slows down the cell’s growth rate, impairing fitness and creating innate selection against the presence of the gene in the deoptimized organisms.
Despite these obstacles, we managed to tackle these goals and implement them into one comprehensive software tool, including in silico analysis of a potential implementation and experimental tests. Within this software, we were also able to include appropriate plans for biosafety measures, our yet-to-be-released safety scan based on the SafetyNet software of the Heidelberg 2017 iGEM team.
Experimental results show that our strategy indeed works, as we were able to successfully optimize the ORF selectively. Our ORF variants robustly increased protein expression in the optimized bacteria, while substantially decreasing it in the deoptimized bacteria.
Surprisingly, growth rates were relatively unhindered in the optimized organisms, while we noticed a decrease in growth rate in the deoptimized organisms. While we would have expected the expression of a non-essential gene (such as mCherry) to have a negative effect on growth as it adds a metabolic burden on the cell and its machinery, increased fluorescence in comparison to the control did not cause a decrease in growth rate in B. subtilis, which points to the possibility of successful engineering of essential genes as well.
In addition, the decrease in growth rates of the deoptimized E. coli was interesting as well, as it may lead to a drop in abundance of the deoptimized organisms in the population, and thus also lower the possibility of plasmids reaching them via HGT.
In some cases, a user may want to use our software to optimize and deoptimize phylogenetically proximal species. As we saw in our software analysis, optimization becomes less effective as phylogenetic proximity increases. We would, however, be happy to take on this challenge in the future. It should be noted that such a scenario is unlikely to occur in a real-world application, as very similar species tend not to occupy the same niches due to resource competition [1]. Were there to be a microbiome like this, we could use the same induced toxicity method described in the Safety section, which would cause the deoptimized organism to die.
[1] N. Tromas, Z. E. Taranu, B. D. Martin, A. Willis, N. Fortin, C. W. Greer, and B. J. Shapiro, “Niche separation increases with genetic distance among bloom-forming cyanobacteria,” Frontiers in Microbiology, vol. 9, 2018.