

Engineering Success

Figure 1: The engineering cycles

In our project, our long-term goal is to broaden what is achievable in microbiome engineering and to advance biological development in the real world. Because of this ambitious goal, we had to research every building block and component we used, then analyse them together, and finally design the engineering process in a user-friendly, configurable manner.

Each of these three engineering cycles was carefully designed based on the previous one, and each greatly influenced the following cycle and our general insights into this approach. In this section, we introduce the engineering process that happened behind the scenes, describe the challenges we faced, and explain the reasoning behind our strategies.

Part 1: Designing multi-organism optimizations

As previously stated, to achieve our goal we decided to model and selectively optimize the core biophysical processes related to gene expression. Most of these processes have been modeled to some degree for a single organism; however, this knowledge must be extended to multiple organisms through the following consecutive steps:

Learn: Finding variability among the species in the microbiome
Design: Understanding multi-organism preferences and trends
Build: Re-designing genetic components according to the novel information, implementing optimization and deoptimization simultaneously
Test: Testing our model

This process is demonstrated in all 3 main models:

Entrance into the cells

Figure 2: Interfering restriction sites

Our purpose was to optimally edit restriction sites. This problem can easily be solved for one organism, as the degrees of freedom within different genetic components are known.

However, in our approach, we wanted to insert restriction sites recognized by the restriction enzymes of the deoptimized organisms and avoid any sites recognized by those of the optimized organisms.

Learn: We analysed the restriction enzyme database to determine the origin of different restriction sites, and matched each organism with the restriction sites it recognizes.
Design: We designed a dual-organism model for one optimized and one deoptimized species, coordinating synonymous changes that insert some restriction sites and avoid others.
Build: Different restriction sites interfere with each other, creating conflicts and dependencies between different sites and organisms. Thus, we built a novel algorithm that takes into account the variation in the placement and number of sites from each organism.
Test: When applied, this model did not insert sites from the optimized organisms (apart from rare cases documented in the software log files).
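As an illustration of the Test criterion above, the sketch below counts, in a candidate sequence, the restriction sites of the deoptimized organisms (which we want present) and of the optimized organisms (which we want absent). It is a minimal sketch under our own assumptions, not the team's algorithm: sites are exact, unambiguous DNA strings (IUPAC degenerate sites are not handled), both strands are scanned, and a palindromic site is counted once per position. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_positions(seq, site):
    """Forward-strand start positions where `site` occurs on either strand.

    Using a set means a palindromic site is reported once per position.
    """
    seq = seq.upper()
    positions = set()
    for pattern in {site.upper(), reverse_complement(site)}:
        start = seq.find(pattern)
        while start != -1:
            positions.add(start)
            start = seq.find(pattern, start + 1)
    return positions


def restriction_score(seq, deopt_sites, opt_sites):
    """Return (deoptimized-organism sites present, optimized-organism sites present)."""
    present = sum(len(site_positions(seq, s)) for s in deopt_sites)
    violations = sum(len(site_positions(seq, s)) for s in opt_sites)
    return present, violations
```

A design loop built on this would try to increase the first number while keeping the second at zero; the interference between sites described in the Build step is exactly what makes that trade-off non-trivial.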

Transcription

Different organisms use different transcription factors to promote transcription initiation, and each of these factors recognises a different set of genomic sequences, described as “motifs”.

Learn: We reviewed different motif-finding tools and characterized them as part of our contribution for future teams.
Design: We picked the MEME-suite tools and built a dual-organism model that intersects the set of motifs selectively found in one organism compared to the other with the set of motifs found in promoters compared to intergenic regions.
Build: We used this selective-motif principle to devise a novel algorithm, described in the transcription optimization model, that takes both required features (efficiency and selectivity) into account using a motif set construction strategy.
Test: We tested our model in our POC; go check out our results!
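The set logic of the Design step can be sketched with plain Python sets. This is a hypothetical illustration: motifs are represented as exact strings rather than the matrices MEME-suite reports, the `avoid` set (motifs selective for the deoptimized organism) is our own assumption about how the second organism would be handled, and the score simply counts exact occurrences.

```python
def select_motifs(opt_selective, deopt_selective, promoter_enriched):
    """Return (motifs to include, motifs to avoid) as sets.

    opt_selective:     motifs found selectively in the optimized organism
    deopt_selective:   motifs found selectively in the deoptimized organism
    promoter_enriched: motifs enriched in promoters vs. intergenic regions
    """
    promoter = set(promoter_enriched)
    include = set(opt_selective) & promoter
    avoid = set(deopt_selective) & promoter  # assumption: mirror rule for Z
    return include, avoid


def count_occurrences(seq, motif):
    """Number of (possibly overlapping) exact occurrences of motif in seq."""
    return sum(1 for i in range(len(seq) - len(motif) + 1)
               if seq.startswith(motif, i))


def motif_score(seq, include, avoid):
    """Hypothetical promoter score: desired motif hits minus undesired hits."""
    return (sum(count_occurrences(seq, m) for m in include)
            - sum(count_occurrences(seq, m) for m in avoid))
```

Intersecting the two sets is what combines selectivity (organism-specific motifs) with efficiency (promoter-enriched motifs) in a single criterion.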

Translation

During evolution, the cellular machinery has adapted to translate certain codons more efficiently than others. Different types of genetic information can be used to infer the profile of preferred codons (known as the codon usage bias, or CUB) of each organism.
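To make this concrete, here is a minimal sketch of one such profile, the codon adaptation index (CAI), as it is commonly defined: each codon's relative adaptiveness w is its count in a reference gene set divided by the count of its most frequent synonymous codon, and a gene's CAI is the geometric mean of w over its codons. The choice of reference genes, the pseudocount of 0.5 for unseen codons, and skipping Met, Trp and stop codons are assumptions of this sketch; tAI and TDR are not shown.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """w(codon) = count(codon) / count of its most frequent synonym."""
    counts = Counter(c for gene in reference_genes
                     for c in split_codons(gene) if c in CODE)
    weights = {}
    for aa in set(AMINO):
        synonyms = [c for c, a in CODE.items() if a == aa]
        top = max(counts[c] + pseudocount for c in synonyms)
        for c in synonyms:
            weights[c] = (counts[c] + pseudocount) / top
    return weights


def cai(gene, weights):
    """Geometric mean of w over codons, skipping Met, Trp and stop codons."""
    logs = [math.log(weights[c]) for c in split_codons(gene)
            if c in CODE and CODE[c] not in "MW*"]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


if __name__ == "__main__":
    w = relative_adaptiveness(["CTGCTGCTGAAAAAA"])
    print(cai("CTGAAA", w))  # 1.0: only the preferred codons are used
    print(cai("CTGAAG", w))  # about 0.447, i.e. sqrt(0.2)
```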

Learn: We researched different CUB measures: traditional measures used in codon harmonization, such as CAI (codon adaptation index) and tAI (tRNA adaptation index), as well as a newer approach based on ribo-seq profiling called TDR (typical decoding rate).
Design: We devised an initial selective optimization strategy similar to the single-codon approach, and examined two implementations: optimizing the score difference between the optimized and deoptimized organisms, and optimizing the score ratio.
Build: We took the mCherry fluorescent gene and optimized its translation in B. subtilis while simultaneously deoptimizing it for E. coli, for each of the CUB measures.
Test: We tested the results in the wet lab and found that tAI worked really well (see Results).
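The two implementations from the Design step can be compared with a hypothetical per-codon sketch. It assumes each organism is described by a table of codon weights (for example the relative adaptiveness values used by CAI) and, at every position, picks the synonymous codon that maximizes either the weight difference or the weight ratio between the optimized and deoptimized organisms. The names `w_opt`, `w_deopt` and `selective_optimize` are illustrative, not the team's software.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def selective_optimize(cds, w_opt, w_deopt, mode="difference"):
    """Choose, per position, the synonymous codon with the best selective score.

    mode="difference": maximize w_opt[c] - w_deopt[c]
    mode="ratio":      maximize w_opt[c] / w_deopt[c]
    Stop codons are left unchanged.
    """
    scorers = {
        "difference": lambda c: w_opt[c] - w_deopt[c],
        "ratio": lambda c: w_opt[c] / w_deopt[c],
    }
    if mode not in scorers:
        raise ValueError("mode must be 'difference' or 'ratio'")
    score = scorers[mode]
    out = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3].upper()
        aa = CODE[codon]
        if aa == "*":
            out.append(codon)  # keep the original stop codon
            continue
        synonyms = [c for c, a in CODE.items() if a == aa]
        out.append(max(synonyms, key=score))
    return "".join(out)
```

The two modes can disagree. With hypothetical Lys weights AAA = 1.0, AAG = 0.2 in the optimized organism and AAA = 0.5, AAG = 0.05 in the deoptimized one, the difference mode keeps AAA (0.5 vs 0.15) while the ratio mode picks AAG (2 vs 4): the ratio favours codons that are rare in the deoptimized organism even when they are not the best choice in the optimized one.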

Part 2: Analysis

Due to the computational and biological novelty of our two goals (whole-microbiome design, and simultaneous optimization and deoptimization), this engineering cycle was not conducted perfectly; we had to revisit steps in response to feedback and unsatisfactory results. However, we recognise the importance of this imperfect learning and engineering cycle, as it demonstrates the wide potential for further analysis and improvement of our models, and for advances that can be made using them.

Additionally, we hope that this thinking process will inspire future iGEM teams not to stick to traditional methods, to develop basic assessment strategies for their innovations, and to broaden the arsenal of computational synthetic biology analysis methods to keep up with the rapid evolution of this field.

Initial cycle

This engineering cycle bridges the individual models and the analysis process.

Learn: When analysing our experimental results, we examined the performance of the designed sequences in the lab through many analyses, which you can find in our results. A consistent conclusion was that the two best codon usage bias scores for our optimization were CAI and tAI (Figure 3):
Figure 3: Optimized fluorescence ratio shows optimization success
As seen in this graph, the fluorescence in B. subtilis was nearly 12 times higher than in E. coli for CAI optimization, and 9 times higher for tAI-ratio optimization.

After finding that tRNA-based optimizations worked exceptionally well, we tried to find the best way to calculate tAI precisely without reducing the number of organisms the software can optimize. We compared tGCN (tRNA gene copy number) data from NCBI and from GtRNAdb, only to find that the quality of the tRNA data was worth its limitations.
Design: We devised strategies for comparing CUB measurements. Our initial optimization score was a traditional implementation of the statistical Z-score.


Optimization index 1 - Z-score ratio:

Z-score:

The Z-score (standard score) describes the number of standard deviations by which a specific value lies from the mean of a population of interest. A negative Z-score indicates that the original value is below the mean, and a positive Z-score indicates that it is above the mean. The standardization is given by the formula:

$$Z = \frac{x - \mu}{\sigma} \qquad \text{(Equation 1)}$$

Where x is the original value, and μ and σ are the mean and the standard deviation of the population, respectively [ref].
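As a quick worked example of Equation 1 with hypothetical numbers: if the CAI values of an organism's genes have mean μ = 0.50 and standard deviation σ = 0.05, a sequence with CAI x = 0.60 lies two standard deviations above the mean:

$$Z = \frac{0.60 - 0.50}{0.05} = 2$$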

Z-score ratio:

The calculation is performed according to the following algorithm. For each organism in the community:

I. Creating the population by calculating the CUB for each gene of the organism.

II. Calculating the CUB of the original sequence, and standardizing it according to the population.

III. Calculating the CUB of the optimized sequence, and standardizing it according to the population.
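Steps I-III can be sketched as follows, assuming the CUB scores (for example CAI) have already been computed for every gene and for the original and optimized sequences under each organism's codon weights. The dictionary layout and function names are hypothetical; we use the population standard deviation (`statistics.pstdev`) to match the definition of σ above, and an organism whose genes all share the same score would make σ zero and raise `ZeroDivisionError`.

```python
import statistics


def z_score(value, population):
    """Equation 1: (x - mu) / sigma, using the population standard deviation."""
    mu = statistics.mean(population)
    sigma = statistics.pstdev(population)
    return (value - mu) / sigma


def community_z_scores(proteome_cub, original_cub, optimized_cub):
    """Steps I-III for every organism in the community.

    proteome_cub:  {organism: [CUB score of each of its genes]}   (step I)
    original_cub:  {organism: CUB of the original sequence}       (step II)
    optimized_cub: {organism: CUB of the optimized sequence}      (step III)
    Returns {organism: (Z_original, Z_optimized)}.
    """
    return {
        org: (z_score(original_cub[org], scores),
              z_score(optimized_cub[org], scores))
        for org, scores in proteome_cub.items()
    }
```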

Let Y be an organism in which we want to optimize the gene expression, and let Z be an organism in which we want to deoptimize the gene expression. The calculation is given by the following formula:

Optimization index 1 =

Equation 2

Where:

ZY' is the Z-score of the CAI of the optimized sequence according to organism Y population,

ZZ' is the Z-score of the CAI of the optimized sequence according to organism Z population,

ZY is the Z-score of the CAI of the original sequence according to organism Y population,

ZZ is the Z-score of the CAI of the original sequence according to organism Z population.

In the case of multiple organisms, Equation 2 should be computed for each pair of Y and Z organisms, and the returned index is the mean value.
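Because the formula of Equation 2 is not reproduced in this text, the sketch below takes the pairwise index as a function argument and only shows the multi-organism rule: evaluate the index for every (Y, Z) pair and return the mean. The `shift_difference` placeholder (change in Z for Y minus change in Z for Z) is a hypothetical stand-in for illustration, not Optimization index 1.

```python
from itertools import product
from statistics import mean


def community_index(z_scores, optimized, deoptimized, pair_index):
    """Mean of a pairwise optimization index over all (Y, Z) organism pairs.

    z_scores:    {organism: (Z_original, Z_optimized)}
    optimized:   organisms Y in which expression should increase
    deoptimized: organisms Z in which expression should decrease
    pair_index:  callable(z_y, z_y_opt, z_z, z_z_opt) -> float, standing in
                 for the pairwise formula (Equation 2)
    """
    return mean(
        pair_index(*z_scores[y], *z_scores[z])
        for y, z in product(optimized, deoptimized)
    )


def shift_difference(z_y, z_y_opt, z_z, z_z_opt):
    """Hypothetical placeholder index, NOT Optimization index 1."""
    return (z_y_opt - z_y) - (z_z_opt - z_z)
```

Averaging over all pairs weights every optimized/deoptimized combination equally, so a single poorly separated pair can pull the community score down.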

Build: We built multi-organism optimization functions based on single-codon optimizations.
Test: We performed a novel whole-microbiome performance analysis of the Arabidopsis soil microbiome, provided in our software analysis.

At this point, we were very puzzled: our experimental POC exceeded our expectations, but the scale-up from two organisms to a complete microbiome failed completely. The performance test results were disappointing, and our consultations made it clear that although our score is statistically valid, it is counter-intuitive and hard to analyse.

This was one of the major challenges we faced during our project, as our computational results simply did not align with traditional codon harmonization methods. We decided to revisit the Design step:

Design - second attempt

We re-designed our optimization score (the current version) to be more user-friendly and biologically correct, by fitting the mathematical nature of the measure to the behavior of selective translation efficiency optimization.

Test - second attempt

After testing the correlation between the new statistical optimization score and evolutionary distance on our Arabidopsis microbiome, we still received problematic results:

Figure 4: Correlation between optimization score and evolutionary distance (measured as number of different positions in the sequence alignment of the 16S rRNA).

For more information about this analysis, see the software analysis page.

In Figure 4:

Red points = optimization failed
Green points = optimization succeeded

It is clear from this graph that the optimization is not very successful in most cases, and that the chances of successful optimization are relatively low for closely related organisms.

After this encounter, we understood that single-codon optimization does not take into account the significance of a protein's codon usage bias score change in the context of the whole microbiome. That is, if an organism has very similar codon usage bias scores for all of its proteins, even a small change in CUB can be significant.

For example (Figure 5):

Figure 5: The standard deviation of the CAI scores of the proteome of each bacterium in the A. thaliana root microbiome used for software analysis.

As seen in Figure 5, the standard deviation of the CAI scores of the endogenous proteins differs by an order of magnitude between bacteria in the analysed A. thaliana root microbiome.
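Why this spread matters follows directly from Equation 1: within one organism μ cancels and only σ remains, so a change in CAI from x to x' shifts the Z-score by (x' - x)/σ. With hypothetical values spanning an order of magnitude, as in Figure 5, the same change ΔCAI = 0.01 is a large shift in one organism and a small one in another:

$$Z' - Z = \frac{x' - \mu}{\sigma} - \frac{x - \mu}{\sigma} = \frac{x' - x}{\sigma}$$

$$\sigma = 0.005:\ \frac{0.01}{0.005} = 2 \qquad\qquad \sigma = 0.05:\ \frac{0.01}{0.05} = 0.2$$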

We then returned to the Build step:

Build - third attempt

We devised a whole-microbiome hill climbing optimization approach to achieve our goal.
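A whole-microbiome hill climbing search can be sketched as steepest ascent over single synonymous codon swaps. The objective here is our own assumption, chosen to reflect the reasoning above: each organism's CAI is converted to a Z-score using the mean and standard deviation of its proteome's CAI, and the score is the mean Z-score of the optimized organisms minus that of the deoptimized ones. The data layout, names and stopping rule are hypothetical; the team's implementation may differ.

```python
import math
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {aa: [c for c, a in CODE.items() if a == aa] for aa in set(AMINO)}


def cai(codons, w):
    """Geometric mean of codon weights w, skipping Met, Trp and stop codons."""
    logs = [math.log(w[c]) for c in codons if CODE[c] not in "MW*"]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0


def objective(codons, organisms, optimized, deoptimized):
    """Hypothetical score: mean CAI Z-score in optimized organisms minus mean
    CAI Z-score in deoptimized organisms.

    organisms: {name: (w, mu, sigma)}, where w maps every codon to its weight
    and mu/sigma summarize the CAI distribution of that organism's proteome.
    """
    def z(name):
        w, mu, sigma = organisms[name]
        return (cai(codons, w) - mu) / sigma

    return (sum(z(n) for n in optimized) / len(optimized)
            - sum(z(n) for n in deoptimized) / len(deoptimized))


def hill_climb(cds, organisms, optimized, deoptimized, max_rounds=1000):
    """Steepest-ascent hill climbing over single synonymous codon swaps."""
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - len(cds) % 3, 3)]
    best = objective(codons, organisms, optimized, deoptimized)
    for _ in range(max_rounds):
        best_move = None
        for i, codon in enumerate(codons):
            for alt in SYNONYMS[CODE[codon]]:
                if alt == codon:
                    continue
                trial = codons[:i] + [alt] + codons[i + 1:]
                score = objective(trial, organisms, optimized, deoptimized)
                if score > best:
                    best, best_move = score, (i, alt)
        if best_move is None:
            break  # local optimum: no single swap improves the score
        i, alt = best_move
        codons[i] = alt
    return "".join(codons), best
```

Each round evaluates every single-codon swap, so the cost grows with sequence length times the number of synonyms per codon, and steepest ascent stops at a local optimum; random restarts are one common way to mitigate that.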

Test - third attempt

Once we finished implementing the new optimization strategy, we analysed the results and found that it improved both the correlation between performance and evolutionary distance and the overall optimization capabilities, especially for closely related organisms. Review our results for all optimization strategies on the software analysis page.

Team:TAU Israel/Engineering