Team:Sydney Australia/Engineering


Due to the NSW lockdowns, we were unable to enter the lab or complete any lab work. However, because our project has so many components, we can still show how we would apply the engineering design cycle in our experimental protocols, and how it was of use in our bioinformatics analysis.

Putative Lab-based Design Cycles


We have designed a novel salicylate-inducible promoter system that has tighter repression and modifiable expression levels, as described in Design. However, we do not know whether combining these two sets of mutations will have unforeseen effects, so we will need to conduct an assay to determine the resulting expression levels.

The following is a representation of the design cycle that we project would occur.

The fliK landing pads will be used to test the promoters, as this would be doubly efficient: if they work, we can insert Cluster 1 straight away, since that insertion would remove the Psal promoter and fuGFP. Both landing pads will contain a selectable marker and other elements, but one will contain the promoter system with 34% of the WT expression and the other the system with 100%. In both, the promoter will regulate the expression of fuGFP.

These will be directly ordered as gBlocks from TWIST.

After they have been transformed into the JM109 cells, they will be plated separately on regular LB media supplemented with salicylate as shown below.

The yellow represents inoculation patterns, and the control would be JM109 cells on which no recombineering has been attempted. From there, a visual comparison can be made to see whether the 34% variant generates lower expression than the 100% Psal promoter. For quantitative analysis, quantitative Western blots can be performed to determine the amount of protein produced by both strains.
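As a rough illustration of the quantitative comparison, the sketch below normalises fuGFP band intensities to a loading control and expresses the 34% variant as a fraction of the 100% variant. The intensity values, the use of a loading control and the function names are all assumptions made for illustration; none of them are measured data.

```python
from statistics import mean


def normalised_signal(gfp_bands, loading_bands):
    """Divide each fuGFP band intensity by the loading-control band of the same lane."""
    return [gfp / load for gfp, load in zip(gfp_bands, loading_bands)]


def relative_expression(variant, reference):
    """Mean normalised signal of a variant as a fraction of the reference strain."""
    return mean(variant) / mean(reference)


# Hypothetical densitometry readings (arbitrary units), three replicates each.
full_gfp, full_load = [980.0, 1010.0, 1005.0], [500.0, 505.0, 498.0]
low_gfp, low_load = [340.0, 352.0, 335.0], [502.0, 499.0, 501.0]

ratio = relative_expression(
    normalised_signal(low_gfp, low_load),
    normalised_signal(full_gfp, full_load),
)
print(f"34% variant relative to 100% variant: {ratio:.2f}")
```

A ratio close to 1 would correspond to the "same across the variants" outcome, while a ratio well below 1 would support a real difference in expression.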

Possible outcomes include no GFP signal at all, or a GFP signal that is the same across the variants. In the first case, we could learn that either the recombineering or the salicylate-inducible promoter has failed. In the second, we could learn that the base changes in the salicylate promoter do not actually generate any difference in expression.

If these occurred, some revisions in the design would be necessary:

For the first example, to determine whether the recombineering was at fault, an experiment could be performed in which the fuGFP gene is inserted with a constitutive promoter instead. If that works, then recombineering worked and the salicylate promoter system is at fault. If it does not, a re-design from here could use restriction enzyme sites or CRISPR-Cas systems instead of recombineering.

For the second example, we could re-order the fliK regions, but this time without combining the TTC mutations with the mutations that change expression levels, using just the expression level mutations. If this generates different expression levels, we could deduce that the TTC mutation was causing the trouble. A re-design from here would be to adjust all subsequent clusters to have only the mutations responsible for changed expression levels.

Cluster Insertion

While recombineering would have been tested already with the insertion of the fliK landing pad (described before), we would need to employ the engineering design cycle with our Babushka Block method, which uses novel primers. The system requires both the successful annealing of primers to a cluster, and then its homologous recombination into the cell.

The following is a representation of the design cycle that we project would occur:

The design is described in Cluster Insertion.

Gene clusters will be ordered in pTwist-chlor-MC medium copy CmR vectors from TWIST. This will allow us to have as many copies as we need to use.

Cluster 1 will be amplified out of the Twist plasmids by PCR. Because one of the primers carries the homology arm, the amplified DNA will carry the homology arm as well. The amplified DNA will then be transformed into JM109 cells carrying the pKD46 plasmid via heat shock or electroporation. The linear DNA should then be picked up by the bet and exo gene products, which incorporate it into the genome via homologous recombination.

Cells will be cultured, and then plated on LB agar plates with the antibiotic that corresponds to the one in the gene cluster. For Cluster 1, it would be trimethoprim.

There are numerous avenues to test the successful incorporation of the clusters. A simple method would involve the DNA extraction of the E. coli chromosome of the recombineered strain and a control strain (no recombineering has occurred), and then sequencing both. If using short reads, the reads can be assembled bioinformatically via Velvet, and an alignment can be performed. If there is an insertion in the recombineered strain between the fliK homology regions, and it matches the Cluster 1 design, we can state that recombineering via the Babushka method was successful.
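The comparison described above can be sketched as follows. This is a minimal illustration that assumes an assembled contig is already available as a plain string and uses exact string matching in place of a real alignment, so it would not tolerate sequencing errors. The sequences, function names and result labels are hypothetical.

```python
def between_arms(contig, left_arm, right_arm):
    """Return the sequence between the two homology regions, or None if either is missing."""
    start = contig.find(left_arm)
    if start == -1:
        return None
    start += len(left_arm)
    end = contig.find(right_arm, start)
    if end == -1:
        return None
    return contig[start:end]


def classify(recombinant, control, left_arm, right_arm, expected_insert):
    """Compare the region between the fliK homology regions against the design and the control."""
    observed = between_arms(recombinant, left_arm, right_arm)
    if observed is None:
        return "homology regions not found"
    if observed == expected_insert:
        return "matches Cluster 1 design"
    if observed == between_arms(control, left_arm, right_arm):
        return "no difference from control"
    return "sequence differs from design"


# Toy sequences standing in for the fliK homology regions and the Cluster 1 design.
LEFT, RIGHT = "ATGCCGTA", "GGTTAACC"
DESIGN = "CCCGGGAAATTT"
control_contig = "TTTT" + LEFT + "ACAC" + RIGHT + "AAAA"
recombinant_contig = "TTTT" + LEFT + DESIGN + RIGHT + "AAAA"

print(classify(recombinant_contig, control_contig, LEFT, RIGHT, DESIGN))
```

The three non-ideal labels correspond loosely to the failure scenarios considered for this test; detecting an insertion elsewhere in the genome would need a genome-wide alignment, which this sketch does not attempt.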

It is likely that the ideal results will not be obtained. There could be no difference between the control and the recombined strain, there could be an insertion elsewhere, or the inserted sequence might not match the design.

In the first scenario, we could learn that recombineering works, but that the homology arms themselves do not. The second scenario could indicate an off-target match in the genome at which recombination is occurring. The third could indicate a synthesis error in the gBlocks purchased from TWIST.

Future steps and redesigns could include:

With the first scenario, the primer hybridisation idea would need to be re-assessed. The homology arms could just be synthesised directly onto the clusters, and while that would lose the flexibility of cluster insertions, that could resolve some problems.

With the second scenario, the primers can be redesigned from new random DNA and BLASTed against the ADP1 genome prior to their use to ensure they are completely unique. For the third scenario, a preventative strategy should be prioritised: all subsequent clusters should be sequenced before insertion to ensure they are as expected.
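A very simplified version of the uniqueness check might look like the sketch below. It uses exact k-mer matching on both strands as a stand-in for BLAST, which would also report near-matches, so it can only flag the most obvious off-target sites. The toy genome, the candidate arms, the function names and the choice of k are all assumptions.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_hits(genome, query):
    """Count exact, possibly overlapping, occurrences of the query on either strand."""
    total = 0
    # A set avoids counting a palindromic query twice.
    for strand in {query, reverse_complement(query)}:
        pos = genome.find(strand)
        while pos != -1:
            total += 1
            pos = genome.find(strand, pos + 1)
    return total


def shared_kmers(genome, arm, k=12):
    """Return the k-mers of a candidate arm that also occur in the genome."""
    kmers = {arm[i:i + k] for i in range(len(arm) - k + 1)}
    return sorted(kmer for kmer in kmers if count_hits(genome, kmer) > 0)


# Toy genome and candidate arms (hypothetical sequences).
toy_genome = "ACGTTGCAAGGCTTAACCGGATCGATCGTTAGC"
unique_arm = "TTTTTTTTCCCCCCCC"
offtarget_arm = "GGGGAAGGCTTAACCAAAA"

for name, arm in [("unique_arm", unique_arm), ("offtarget_arm", offtarget_arm)]:
    hits = shared_kmers(toy_genome, arm, k=8)
    print(name, "shares", len(hits), "8-mers with the genome")
```

Any arm that shares k-mers with the genome would be discarded and regenerated before ordering.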

Natural Transformation

We hypothesise that there should be a low natural transformation rate after all the competency genes have been inserted, and a moderate to high transformation rate after all the pilus genes have been inserted. This would need to be tested throughout the whole recombineering process and troubleshot if the results are not as expected.

The design is described in Cluster Insertion

Gene clusters will be ordered in plasmids from TWIST. This will allow us to have as many copies as we need to use.

Cluster 1 will be excised out of the plasmids and the first primer will be hybridised to it. They will then be transformed into a competent, recombineering strain of E. coli via heat shock or electroporation. The linear DNA should then be picked up by the bet and exo gene products to incorporate it into the genome via homologous recombination.

Cells will be cultured, and then plated on LB agar plates with the antibiotic that corresponds to the one in the gene cluster. For Cluster 1, it would be trimethoprim.

The E. coli strain without any cluster insertions will be tested first. We will follow our Coli Transformation Protocol and incubate the cells with the small iGEM plasmid pSB1C3, which carries an antibiotic resistance marker. We will then plate them on agar containing the antibiotic and record the number of colonies after a few days. This will be repeated with the recombined strain after the insertion of Cluster 1, then Cluster 2, and so on.
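To compare strains, the recorded colony counts could be converted to transformants per microgram of plasmid DNA, as in the sketch below. The colony counts, DNA amount and plated fraction are placeholders chosen only to make the example run; no plates were poured, and the function name is hypothetical.

```python
def transformants_per_ug(colonies, dna_ug, plated_fraction):
    """Scale a colony count to the whole transformation mix, per microgram of plasmid DNA."""
    if dna_ug <= 0 or not 0 < plated_fraction <= 1:
        raise ValueError("dna_ug must be positive and plated_fraction must be in (0, 1]")
    return colonies / (dna_ug * plated_fraction)


# Placeholder colony counts only: these are not experimental results.
hypothetical_counts = {"no clusters": 0, "Cluster 1": 4, "Clusters 1-2": 9}
DNA_UG = 0.5            # assumed amount of pSB1C3 added to the cells
PLATED_FRACTION = 0.25  # assumed fraction of the mix spread on the plate

frequencies = {
    strain: transformants_per_ug(count, DNA_UG, PLATED_FRACTION)
    for strain, count in hypothetical_counts.items()
}
for strain, value in frequencies.items():
    print(f"{strain}: {value:.0f} transformants per ug DNA")
```

Normalising in this way keeps the comparison fair if different amounts of DNA or different plating volumes are used between rounds.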

There is a good chance that there might not be any natural transformation whatsoever. According to Seitz & Blokesch (2013), the three genes inserted in Cluster 1, ComA, ComEA and ComF, should together generate some natural transformation, so if this isn't the case, we might learn the following about our methods: the genes are not being transcribed, the genes are not being translated, or the proteins are not forming the correct structures in the cell.

For future re-designs, we could do the following:

For scenario one, we could analyse transcriptomics data from the cells to determine whether the genes really are not being transcribed. If they are not, we would need to try a different promoter, ideally one that is inducible and has varying expression levels.

For scenario two, proteomics would help us determine whether the proteins are being produced. If the genes are being transcribed but not translated, something that we inserted into the chromosome might be interfering, and we might try to recombineer a simpler gBlock to see whether it works.

For scenario three, it might be a matter of expression levels being too high or too low. We could try more Psal expression variants to match the natural expression levels more closely and so improve the formation of the pilus and competence structures.

Dry-lab Design Cycles

Recombineering Strategy

The development of the Babushka Blocks method took many iterations. See below the last design cycle before the final strategy was discovered:

One of the first designs for our recombineering strategy did not involve primers, but instead started off with clusters with many homology arms, and then gradually fewer and fewer:

With this design, when Cluster 1 is inserted into the fliK landing pad, the red and purple arms match and the genes are inserted:

Because the 5' homology arms of the rest of the clusters were still kept downstream of the Cluster 1 genes, any of the other clusters could be inserted. It allowed for clusters to be skipped too, see below with Cluster 1 skipped:

Sample structures were created on SnapGene® software (from Insightful Science; available at

While there was no lab component, the testing involved drawing out diagrams like the ones shown above and working as a team to point out limitations/errors.

We discovered several problems with this initial design:

The cluster insertion was not truly non-sequential. Clusters could be skipped, but the order could not be changed. E.g. 1-2-4-5 is possible, but not 1-2-4-3. Observe what happens when Cluster 1 is inserted after Cluster 2:

Cluster 1 would remove the genes in Cluster 2!

There are too many off-target homology arms. Recall the image that showcased how Cluster 1 would be inserted:

Those are not the only homology arm combinations possible. Homologous recombination might occur like this:

Or even this:

Thus, we learnt that there were numerous limitations to this design.

We thus went back to our fundamental ideas. We knew that we needed unique selectable markers for each cluster for true non-sequential insertion, as well as unique 5' homology arms. Furthermore, we also knew that if we put more than one unique homology arm on a gene cluster, it increased the possible recombination patterns and some regions of the gBlock might be omitted.
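The effect of extra homology arms on the number of possible recombination patterns can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not our actual arm layout: a cassette is an ordered list of elements, arms are labelled with an "arm:" prefix, and any pair of cassette arms that appears in the same order in the genome is treated as a possible crossover pair. All labels and names are hypothetical.

```python
from itertools import combinations


def recombination_outcomes(cassette, genome_arms):
    """List every (left arm, right arm, genes delivered) pairing that the toy model allows."""
    arm_positions = [(i, e) for i, e in enumerate(cassette) if e.startswith("arm:")]
    outcomes = []
    for (i, left), (j, right) in combinations(arm_positions, 2):
        if left not in genome_arms or right not in genome_arms:
            continue
        if genome_arms.index(left) < genome_arms.index(right):
            # Only the genes lying between the two arms used are carried into the genome.
            genes = [e for e in cassette[i + 1:j] if not e.startswith("arm:")]
            outcomes.append((left, right, genes))
    return outcomes


# One arm at each end: a single possible outcome that delivers every gene.
two_arm = ["arm:A", "gene1", "gene2", "arm:Z"]
# An extra internal arm: more outcomes, some of which omit part of the cassette.
three_arm = ["arm:A", "gene1", "arm:B", "gene2", "arm:Z"]

for cassette, genome in [(two_arm, ["arm:A", "arm:Z"]),
                         (three_arm, ["arm:A", "arm:B", "arm:Z"])]:
    for outcome in recombination_outcomes(cassette, genome):
        print(outcome)
    print("---")
```

In this model the two-arm cassette has exactly one outcome, whereas the three-arm cassette has three, two of which leave out a gene.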

K-means Clustering

We wanted to use k-means clustering to group genes based on a myriad of parameters, like expression, toxicity and biological function. To begin with, we focused on looking at just native expression levels, as clusters will be assigned different Psal promoter variants depending on this.

We used R's built-in k-means algorithm, with k set to 5, on the transcriptome concentration data. This formed 5 clusters. The other 3 clusters were chosen by hand to accommodate individual constraints that could not be worked around: ComC has a length of 4352 and so cannot sensibly be combined with other genes within our 5kb limit, ComP has an extremely high transcription rate (6352), and ComEA, ComA and ComF make up our first chosen cluster.
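For readers unfamiliar with k-means, the sketch below shows the idea on one-dimensional expression values. It is a plain Python re-implementation of Lloyd's algorithm with a deterministic starting point, not the R call used in our analysis, and the expression values are made up for illustration.

```python
def assign(values, centroids):
    """Put each value in the group of its nearest centroid."""
    groups = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on a list of numbers; returns (centroids, groups)."""
    ordered = sorted(values)
    # Deterministic start: k evenly spaced values from the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        groups = assign(values, centroids)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)


# Hypothetical transcript concentrations, not our real data.
expression = [10, 12, 11, 100, 105, 98, 400, 410, 1500, 1520, 6000, 6100]
centroids, groups = kmeans_1d(expression, k=5)
for centre, members in zip(centroids, groups):
    print(f"centroid {centre:.1f}: {sorted(members)}")
```

R's `kmeans` chooses random starting centres by default, so repeated runs on real data may give slightly different groupings unless a seed is set.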

This is one of the initial tables generated from our analysis, using both length and concentration in our algorithm.

We also experimented with using 6 clusters (i.e. k = 6).

Our tests revealed that the natural groupings of genes by expression do not lead to clusters that are under 5kb. The expression levels of some genes are quite unique, which can lead to clusters like Cluster 3 in the second diagram, while others are very similar, which leads to clusters like Cluster 1 in the first diagram.
We learnt that we need to add our own constraints to ensure that the clusters formed stay under 5kb.
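One simple way to impose such a length constraint is sketched below: genes are sorted by expression and packed greedily into clusters, starting a new cluster whenever the next gene would push the total past the limit. This is only an illustration of the idea and not the procedure we followed. It counts gene lengths alone, and the gene names, expression values and lengths are hypothetical.

```python
def pack_under_limit(genes, limit_bp=5000):
    """Group (name, expression, length_bp) tuples by expression without exceeding limit_bp."""
    clusters, current, current_len = [], [], 0
    for name, _, length in sorted(genes, key=lambda g: g[1]):
        if length > limit_bp:
            raise ValueError(f"{name} alone exceeds the {limit_bp} bp limit")
        # Close the current cluster if this gene would push it over the limit.
        if current and current_len + length > limit_bp:
            clusters.append(current)
            current, current_len = [], 0
        current.append(name)
        current_len += length
    if current:
        clusters.append(current)
    return clusters


# Hypothetical genes: (name, expression level, length in bp).
example_genes = [
    ("gene_a", 10, 2000),
    ("gene_b", 12, 2500),
    ("gene_c", 15, 1500),
    ("gene_d", 300, 4352),
    ("gene_e", 320, 900),
]
print(pack_under_limit(example_genes))
```

Because the packing follows expression order, genes with similar expression stay together where the limit allows, and a long gene ends up in a cluster of its own.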

We found that removing length from our algorithm decreased the likelihood of forming large clusters. This gave the following results:

This formed the basis for our cluster design. After considering the biological function of some genes and manually shifting them around, we were able to come to a final decision on these genes.

The minimal spread as shown in the above figure shows that the algorithm has quickly grouped together genes with similar transcriptome levels, and thus gives our method the best chance of success.


Seitz, P., & Blokesch, M. (2013). DNA-uptake machinery of naturally competent Vibrio cholerae. Proceedings of the National Academy of Sciences, 110(44), 17987–17992.