Team:Sydney Australia/Design


The design for USYD's Free coli involves transforming twenty-three genes from A. baylyi into E. coli, informed by literature searches. The genes are as follows:

Table 1. Names of the twenty-three genes from A. baylyi to be inserted into JM109 E. coli cells to induce natural transformation (NT). Genes were first acquired from Vesel and Blokesch (2021), who BLASTed NT genes from A. baumannii against the ADP1 genome. They were then identified via locus tags.

These genes will be organised into eight clusters, which will then be sequentially inserted into the E. coli chromosome using a novel recombineering strategy (described below). After each cluster insertion, the natural transformability of the new E. coli strain will be tested. This will continue until the final cluster has been inserted, after which our findings and results will be analysed and presented. Our design stages can be divided into:

  1. Gene selection
  2. Novel recombineering strategy (Babushka Blocks)
  3. Promoter design
  4. Primer design
  5. Cluster design
  6. Selectable marker design
  7. Novel parts

Gene Selection

E. coli actually has most of the transformation and competency genes, but they are largely unexpressed under lab conditions, and attempts to induce their expression have not been successful (Sinha & Redfield, 2012).

Thus, we decided that the entire natural transformation system of another, phylogenetically similar bacterium could be incorporated into E. coli. We chose Acinetobacter baylyi, not only because it is evolutionarily similar to E. coli, but also because it has lower pathogenicity than other relatives (Chen et al., 2008).

Transformation and competency in Acinetobacter baylyi involve 23 genes. Several studies (Busch et al., 1999; Friedrich et al., 2001; Seitz & Blokesch, 2013) have highlighted that competence genes, i.e. genes beginning with 'com', are absolutely crucial for natural transformation. However, a fully functional pilus structure potentially increases uptake rates 10,000-fold (Seitz & Blokesch, 2013).

Thus, our method includes all 23 genes.

Cluster Insertion

The next step is determining the method for inserting these genes of interest into the E. coli chromosome. There are several initial challenges, detailed below:

  • The genes cannot be directly excised out of A. baylyi and inserted into E. coli. Only a few genes, like ComN/ComO/ComL, cluster together and could be removed in one go, and even if this was done, the genes themselves contain too many illegal restriction enzyme sites to edit out.
  • The genes cannot be inserted all at once, or individually. Twenty-three iterations of cloning genes into cells is too lab-intensive, and the genes cannot realistically all be fused together and inserted in one go, as that would be more than 30kb of linear DNA.

These two problems can be solved with the use of synthetic gene clusters. gBlocks can be ordered from TWIST, and can contain the sequences of the genes that we want to insert. This means that any gene can be clustered together with any other, and additional edits like removing restriction enzyme sites can be made before ordering them. Because this particular company sets a 5kb limit on the size of gene clusters that can be ordered, we used that as the general size limit for our clusters.

We used k-means clustering to determine which genes belong in each cluster, as well as their order; this is described on the modelling page. This yielded eight clusters to insert.
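As a rough illustration of the clustering idea only, the sketch below runs a one-dimensional k-means (Lloyd's algorithm) over expression values. The gene names and numbers are invented placeholders, the initialisation scheme is an assumption, and the real analysis on the modelling page also had to respect the 5kb size limit and natural gene order, which this sketch ignores.

```python
def kmeans_1d(values, k, iterations=100):
    """Group scalar values into k clusters; returns one cluster index per value."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread initial centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


# Hypothetical expression levels (arbitrary units), not measured data.
expression = {"geneA": 1.0, "geneB": 1.2, "geneC": 0.9, "geneD": 5.0,
              "geneE": 5.3, "geneF": 9.8, "geneG": 10.1}
labels = kmeans_1d(list(expression.values()), k=3)
groups = dict(zip(expression, labels))
print(groups)
```

Genes with similar expression end up sharing a label, which is the property the promoter design later relies on when one promoter drives a whole cluster.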

From there, we can now deal with the problem of inserting the clusters into E. coli. The following problems arise:

  • The genes needed to be inserted non-stochastically, meaning they could not be inserted randomly into the chromosome. Random insertion could disrupt the function of other crucial genes.
  • Each gene cluster needed to contain a selectable marker, but we did not want these to remain in the final product. We needed to be able to select colonies in which the cluster had been successfully integrated, but cells with resistance to multiple antibiotics would not be ideal.
  • The clusters needed to be insertable non-sequentially. This means that if one cluster did not work, we would be able to skip it and insert the next one. If our insertion design did not account for this, the whole project would depend on all eight clusters being successfully incorporated in order, which we cannot assume.

With these in mind, we developed a novel recombineering strategy called Babushka Blocks. Named after the Russian dolls that fit into one another, our idea involves sequentially inserting each gene cluster inside the previous one. This controls where the clusters are inserted, knocks out the antibiotic resistance marker of the previous block, and, with primers, allows clusters to be inserted in any order!

We opted for recombineering because it removes the constraints of restriction enzyme sites, and large fragments of DNA can be inserted. A study by Juhas and Ajioka (2016) has shown that up to 50kb of DNA in total could be inserted into E. coli with high efficiency and "did not have a negative effect on the growth of E. coli."

The particular recombineering strategy we have employed in our design is the bacteriophage λ Red recombineering system, and we are inserting our gene clusters into the fliK gene.

  • In the same study by Juhas and Ajioka (2016), the fliK gene in E. coli was shown to be an optimal location for recombineering, and 15kb was successfully inserted there in one iteration.
  • The bacteriophage λ Red recombineering system is described in the diagram below, and many strains of E. coli have these systems already in place (Sharan et al., 2009). We decided to use the JM109 strain, with the recombineering functions supplied by the pKD46 plasmid.

Figure 1. Recombineering system using the bacteriophage λ Red system. According to Sharan et al. (2009), the homology arms need to be only 50bp for successful recombination. Additionally, only three genes, gam, bet and exo, are involved. The gene product of gam "prevents an E. coli nuclease, RecBCD, from degrading linear DNA fragments", which allows linear DNA to survive in vivo for recombination. The roles of exo and bet are shown above, with the gene product of bet, Beta, being an "ssDNA binding protein" and exo having "5' to 3' dsDNA exonuclease activity". This method was first described by Murphy (1998).

With that in mind, let's see the Babushka block design in action:

The above diagram is an initial look into the Babushka blocks.

  • The first structure is known as the fliK landing pad. We have designed the ends of this block to be homologous with regions in the fliK gene, so that this structure can be inserted into the chromosome at a designated location. The nahR gene is there to assist with gene expression regulation, and more information can be found in the promoter design section.
  • The second block shown is the first gene cluster, which will be inserted into the landing pad. All gene clusters have been designed to have a common 5' homology arm (always grey), the genes, a unique homology arm (orange here), a selectable marker, and then a common 3' homology arm (always purple).
  • The final structure is the primer. This will help the recombination of clusters into the E. coli chromosome. More information on its design can be found in the primer design section.

To insert into the fliK landing pad, Cluster 1 must be first hybridised with the primer:

As we can see now, there are two matching homology arms between the landing pad and Cluster 1: the red and purple arms. As a result, Cluster 1 is able to be inserted into the landing pad, and the end result has the genes of interest inside the landing pad:

This insertion removes selectable marker A of the previous cluster and instead replaces it with the genes, the new homology arm, and a different selectable marker B.

Let's see this with the next cluster:

An orange primer has instead been hybridised with Cluster 2, since orange is the new homology arm brought in by Cluster 1. The orange and purple arms match, and antibiotic resistance marker B is replaced by the new cluster. With this design, it is the purple 3' homology arm that anchors where each cluster is inserted, and each new cluster brings in a new, different homology arm for the next cluster to match.

You may be wondering at this point why the primers exist, and why we need to have the additional step of hybridising them to clusters. Imagine if the cells started dying right after Cluster 2's insertion, so we needed to skip it for now and insert Cluster 3 after Cluster 1:

As the primers all attach to the gene clusters using the same grey homology arm, any cluster can be attached to the orange primer here! With primers, clusters can be inserted non-sequentially if something goes wrong, which overcomes the last hurdle of this design. So long as the selectable markers are unique, a cluster can always be inserted.

Thus, using the Babushka Block method, all eight clusters will be able to be inserted into the E. coli chromosome.
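The insertion logic walked through above can be sketched as a toy model in which the chromosome is a list of labelled segments. This is only an illustration of the bookkeeping, not a simulation of recombination: the segment labels, the blue arm, the marker names and the position of nahR within the landing pad are assumptions, and the grey arm that the primer anneals to is left out for simplicity.

```python
PURPLE = "purple_3'_arm"  # common 3' arm shared by the landing pad and every cluster


def insert_cluster(chromosome, primer_arm, cluster):
    """Return a new chromosome with `cluster` recombined between `primer_arm` and the purple arm.

    Whatever currently lies between the two arms (the previous selectable
    marker) is replaced by the cluster's genes, its new unique arm and its marker.
    """
    if primer_arm not in chromosome or PURPLE not in chromosome:
        raise ValueError("chromosome lacks a matching homology arm")
    left = chromosome.index(primer_arm)
    right = chromosome.index(PURPLE)
    if left >= right:
        raise ValueError("5' arm must lie upstream of the purple 3' arm")
    payload = [cluster["genes"], cluster["new_arm"], cluster["marker"]]
    return chromosome[:left + 1] + payload + chromosome[right:]


# Hypothetical layout of the fliK landing pad and two clusters.
landing_pad = ["fliK_5'", "nahR", "red_arm", "marker_A", PURPLE, "fliK_3'"]
cluster_1 = {"genes": "cluster_1_genes", "new_arm": "orange_arm", "marker": "marker_B"}
cluster_3 = {"genes": "cluster_3_genes", "new_arm": "blue_arm", "marker": "marker_D"}

# Cluster 1 uses the red primer; Cluster 3 then uses the orange primer, skipping Cluster 2.
strain = insert_cluster(landing_pad, "red_arm", cluster_1)
strain = insert_cluster(strain, "orange_arm", cluster_3)
print(strain)
```

In this model each insertion leaves exactly one selectable marker in the chromosome, and skipping a cluster only changes which primer arm is paired with the next cluster.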

Promoter Design

A key aspect of this project was the design of the promoters. We wanted to address the following concerns with our design:

  • We wanted the natural transformation system to be tightly inducible. This would help during the recombineering of Free coli, as cells would not immediately die from the gene products of a cluster after insertion if they were toxic. Additionally, from a safety perspective, labs planning on using Free coli can minimise the risk of accidentally transforming dangerous DNA fragments by keeping the system turned off.
  • We wanted the clusters to be controlled by the same inducible promoter. If there was a different inducible promoter for each gene cluster, users of Free coli would need to expose their cells to eight different chemicals to induce natural transformation. Not only is this logistically challenging, it also runs counter to the idea that Free coli is cheaper than competent cells.
  • We wanted each cluster to have different expression levels. As described in the modelling page, genes were clustered based on expression levels in A. baylyi. Thus, we should aim to reflect these expression levels as accurately as possible.

We have developed a novel promoter design that addresses all of these! We began with a system that uses the nahR gene and sal promoters to activate transcription in response to the inducer salicylate. It is shown below:

Figure 2. Inducible promoter design. According to Cebolla et al. (1997), nahR produces a transcription factor that controls the expression of genes regulated by sal promoters. In the presence of salicylate, expression of those genes is facilitated.

From this, we have made the following iterations, informed by the results of two studies:

We have incorporated variants to generate different expression levels. A study has described how subtle base changes in the Psal promoter can induce different expression levels (Schell & Poser, 1989), and this is described below:

In our design, we have used the 34% variant and the WT 100% variant to match the expression levels of our genes.

We have simultaneously used a variant of the Psal promoter dubbed PsalTTC (Meyer et al., 2018), which provides tighter repression and greater expression when fully induced. It is simply the addition of T residues and a C residue in the promoter, and since it targets a region outside of the region described in the Schell paper, we have decided to use this too.

We have thus combined these two findings to create our own promoter system. The nahR gene is inserted right at the start with the fliK landing pad, described in Cluster Insertion. Each gene cluster is controlled by a single Psal promoter, and depending on the average expression level of the cluster, it is assigned the 34% or the 100% variant. Further information can be found in the documentation for our composite parts, and our allocations are listed in the table below:

Table 2. Assigned PsalTTC variants to gene clusters.
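The text does not state the exact rule used to choose between the two variants, so the following is only one plausible sketch: each cluster's average expression is scaled against the strongest cluster, and the variant with the nearest relative strength is picked. The cluster names and expression values are invented for illustration.

```python
# Relative strengths of the two promoter variants named in the text.
VARIANT_STRENGTH = {"34% variant": 0.34, "WT (100%) variant": 1.00}


def assign_variant(cluster_expression, max_expression):
    """Pick the variant whose relative strength is closest to the cluster's relative expression.

    The nearest-strength rule is an assumption made for this sketch.
    """
    relative = cluster_expression / max_expression
    return min(VARIANT_STRENGTH, key=lambda name: abs(VARIANT_STRENGTH[name] - relative))


# Invented average expression levels (arbitrary units) for three hypothetical clusters.
example_clusters = {"cluster_X": 120.0, "cluster_Y": 30.0, "cluster_Z": 95.0}
strongest = max(example_clusters.values())
allocation = {name: assign_variant(level, strongest)
              for name, level in example_clusters.items()}
print(allocation)
```

With only two variants available, any such rule reduces to a single cut-off; the sketch simply makes that cut-off explicit.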

Primer Design

A key element of the Babushka Block design is the use of primers.

As described in Sharan et al. (2009), only 50bp is needed for homologous recombination. It would be very simple to randomly generate 50bp seven times for the seven different cluster insertions (besides Cluster 1) and use them directly as primers. However, in primer form, these homology arms are at high risk of unwanted self-dimerisation and hairpin formation. This can reduce the ability of the primers to function optimally and hybridise with clusters.

To mitigate this, two programs were used to generate ideal primers: Primer3 (Untergasser et al., 2012) and PrimerSelect (Graham & Holland, 2005). Primer3 generated an initial primer to work with, while PrimerSelect was used to estimate the likelihood of self-dimer and hairpin formation, measured in kcal/mol (abbreviated kc/m). We aimed for each primer to have a self-dimer score greater than -2 kcal/mol and a hairpin score greater than 1 kcal/mol (the higher the score the better, as the structure is energetically less likely to form). These thresholds are only a rule of thumb, but meeting them should help the primers work as effectively as possible.

The following method was employed:
1. Generate 5kb of randomised DNA at 0.45 GC content.
2. Insert into Primer3 with the following settings:

3. Find the outlined probe, and arbitrarily capture bases upstream/downstream of it to make up 50bp (the website caps it at 36bp). Avoid repetitive regions and long regions of the same base (e.g. ATATA, TTTTTTT) as that leads to worse self-dimer and hairpin scores.
4. Insert into PrimerSelect and observe the initial dimerisation/hairpin scores. If they are too strongly negative (negative double digits), redo steps 1-3. The closer the scores are to the optimal values, the easier the following steps will be.
5. Observe the bases that are causing dimerisation/hairpin formation. Change one of these bases into a different one and repeat step 4 to see whether it improves or worsens the values. Repeat until the values are satisfactory.
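Steps 1 and 3 of this method can be approximated in a few lines of Python. The sketch below generates random DNA at a target GC content and looks for a 50bp window free of long single-base runs and short tandem repeats. It does not call Primer3 or PrimerSelect and does not compute any free-energy score; the function names and the run/repeat cut-offs are assumptions chosen for illustration.

```python
import random
import re


def random_dna(length, gc_content=0.45, seed=None):
    """Random DNA in which each base is G or C with probability `gc_content`."""
    rng = random.Random(seed)
    return "".join(rng.choice("GC") if rng.random() < gc_content else rng.choice("AT")
                   for _ in range(length))


def has_low_complexity(seq):
    """True if seq contains 5+ identical bases in a row or a 2-base unit repeated 3+ times."""
    homopolymer = re.search(r"(.)\1{4,}", seq)        # e.g. TTTTT
    dinucleotide = re.search(r"(..)\1{2,}", seq)      # e.g. ATATAT
    return bool(homopolymer or dinucleotide)


def first_clean_window(seq, size=50):
    """Return (start, window) for the first window of `size` bases without low-complexity runs."""
    for start in range(len(seq) - size + 1):
        window = seq[start:start + size]
        if not has_low_complexity(window):
            return start, window
    return None


template = random_dna(5000, gc_content=0.45, seed=1)
candidate = first_clean_window(template)
print(candidate)
```

A window found this way is only a starting candidate; it would still need to go through the dimerisation and hairpin checks in steps 4 and 5.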

Using this method, most primers had self-dimer formation scores above -2 kcal/mol (with some as high as -1.5 kcal/mol) and hairpin formation scores above 1 kcal/mol (with some as high as 2.5 kcal/mol)!

Further information on this can be found under the documentation for our basic parts but a table can be found below of primer scores:

Table 3. Self-dimerisation and hairpin formation scores of primers used to attach clusters together. Values determined from PrimerSelect (Graham & Holland, 2005).

Thus, the use of programs Primer3 and PrimerSelect has assisted us in the creation of primers that have lower chances of forming self-dimers and hairpins.

Selectable Marker Design

Each gene cluster needs a selectable marker so that we can select for colonies that have successfully incorporated it into the E. coli chromosome. As described in Cluster Insertion, the Babushka Block method requires every cluster to have a unique antibiotic resistance marker. There are many selectable markers available, but we also needed to keep most clusters under 5kb.

We found that the following configuration was possible:

* The MalS gene converts starch into simpler sugars, and can be used in lieu of an antibiotic resistance gene to select for transformed colonies. If the bacteria are grown on minimal media with starch, only those expressing the MalS gene will be able to survive. To avoid Free coli having any kind of antibiotic resistance, MalS could be used as the final selectable marker. This would be possible if Cluster 8 was inserted first, then Cluster 7, but without experimental data, the viability of this approach cannot yet be determined.

Clusters 4 and 8 both exceed the 5kb limit by a small margin; however, they could still be synthesised as gBlocks if the lab component were possible.

Cluster Design

The following describes the general practices followed when constructing the gene clusters in SnapGene® software (from Insightful Science):

  • Restriction enzyme sites were removed from the transformation genes manually. The replacement bases were chosen to most closely match the codon frequency of the original codon, so as to minimise the effect on the expression of the gene.
  • A BBa_B0024 terminator was inserted between the selectable marker and the 3' homology arm. This ensured that transcription of one cluster did not affect other clusters or genes.
  • The genes were ordered from highest to lowest expression level within the clusters, so that more highly expressed genes were closer to the promoter. However, if genes were stuck together, meaning that the genes themselves or their RBSs overlapped, they were not separated and re-ordered, and their natural order was preserved.
  • Each cluster began with a KpnI restriction enzyme site, had a HindIII site between the last A. baylyi gene and the homology arm for the next cluster, and finally a BamHI site between the selectable marker and the terminator before the 3' homology arm. This would allow any necessary cuts to be made in the cluster if, for example, it was not working optimally and experimental diagnostics were needed to determine the problem.
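A check like the one below could be used to confirm that a finished cluster sequence contains the three diagnostic sites once each and in the expected order. It is a minimal sketch, not part of the SnapGene workflow: the function names and the toy sequence are made up, and only the three enzymes named above are scanned. Because GGTACC, AAGCTT and GGATCC are palindromic, searching one strand is sufficient.

```python
# Recognition sequences of the three diagnostic enzymes used in the cluster layout.
SITES = {"KpnI": "GGTACC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site occurrence in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # step by 1 so overlapping sites are found
        hits[enzyme] = positions
    return hits


def layout_ok(seq):
    """True if KpnI opens the cluster and HindIII and BamHI each follow exactly once, in order."""
    hits = find_sites(seq)
    if any(len(positions) != 1 for positions in hits.values()):
        return False
    return hits["KpnI"][0] == 0 and hits["HindIII"][0] < hits["BamHI"][0]


# Toy stand-in for a cluster: KpnI, "genes", HindIII, "marker", BamHI, remainder.
toy_cluster = "GGTACC" + "ATGCAT" + "AAGCTT" + "ACGT" + "GGATCC" + "TTT"
print(find_sites(toy_cluster))
print(layout_ok(toy_cluster))
```

Any enzyme reported more than once points to an internal site that would still need to be removed by a synonymous base change.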
References

    Busch, S., Rosenplänter, C., & Averhoff, B. (1999). Identification and Characterization of ComE and ComF, Two Novel Pilin-Like Competence Factors Involved in Natural Transformation of Acinetobacter sp. Strain BD413. Applied and Environmental Microbiology, 65(10), 4568–4574.

    Cebolla, A., Sousa, C., & de Lorenzo, V. (1997). Effector Specificity Mutants of the Transcriptional Activator NahR of Naphthalene Degrading Pseudomonas Define Protein Sites Involved in Binding of Aromatic Inducers. Journal of Biological Chemistry, 272(7), 3986–3992.

    Chen, T. L., Siu, L. K., Lee, Y. T., Chen, C. P., Huang, L. Y., Wu, R. C. C., Cho, W. L., & Fung, C. P. (2008). Acinetobacter baylyi as a Pathogen for Opportunistic Infection. Journal of Clinical Microbiology, 46(9), 2938–2944.

    Coleman, N., & Somerville, M. (2019, May). The Story of Free Use GFP (fuGFP). Small Things Considered.

    Friedrich, A., Hartsch, T., & Averhoff, B. (2001). Natural Transformation in Mesophilic and Thermophilic Bacteria: Identification and Characterization of Novel, Closely Related Competence Genes in Acinetobacter sp. Strain BD413 and Thermus thermophilus HB27. Applied and Environmental Microbiology, 67(7), 3140–3148.

    Graham, K. J., & Holland, M. J. (2005). PrimerSelect: A Transcriptome-Wide Oligonucleotide Primer Pair Design Program for Kinetic RT-PCR–Based Transcript Profiling. Methods in Enzymology, 544–553.

    Juhas, M., & Ajioka, J. W. (2016). Lambda Red recombinase-mediated integration of the high molecular weight DNA into the Escherichia coli chromosome. Microbial Cell Factories, 15(1).

    Meyer, A. J., Segall-Shapiro, T. H., Glassey, E., Zhang, J., & Voigt, C. A. (2018). Escherichia coli “Marionette” strains with 12 highly optimized small-molecule sensors. Nature Chemical Biology, 15(2), 196–204.

    Murphy, K. C. (1998). Use of Bacteriophage λ Recombination Functions To Promote Gene Replacement in Escherichia coli. Journal of Bacteriology, 180(8), 2063–2071.

    Schell, M. A., & Poser, E. F. (1989). Demonstration, characterization, and mutational analysis of NahR protein binding to nah and sal promoters. Journal of Bacteriology, 171(2), 837–846.

    Seitz, P., & Blokesch, M. (2013). DNA-uptake machinery of naturally competent Vibrio cholerae. Proceedings of the National Academy of Sciences, 110(44), 17987–17992.

    Sharan, S. K., Thomason, L. C., Kuznetsov, S. G., & Court, D. L. (2009). Recombineering: a homologous recombination-based method of genetic engineering. Nature Protocols, 4(2), 206–223.

    Sinha, S., & Redfield, R. J. (2012). Natural DNA Uptake by Escherichia coli. PLoS ONE, 7(4), e35620.

    Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B. C., Remm, M., & Rozen, S. G. (2012). Primer3—new capabilities and interfaces. Nucleic Acids Research, 40(15), e115.

    Vesel, N., & Blokesch, M. (2021). Pilus Production in Acinetobacter baumannii Is Growth Phase Dependent and Essential for Natural Transformation. Journal of Bacteriology, 203(8).