Bioinformatics
Introduction
For successful implementation of biocontainment, the process of
conjugation must be inhibited to fortify against horizontal gene
transfer between microbes. If a microbe carrying recombinant DNA is
capable of conjugation, it can directly transfer synthetic plasmids
and other nucleic acids to microbes in the wild, spreading the
engineered material throughout the population. Therefore,
disabling conjugation is imperative to the biocontainment of
microbes intended for unsupervised release, such as in wild
environments or in gut microbiomes.
We devised a system to disable conjugative plasmids that hinges on
CRISPR-Cas9. We would create a small array coding for sgRNAs that
direct Cas9 to cleave conjugative plasmids at locations essential to
conjugation. Once this essential region is broken, expression of the
targeted gene would be inhibited and the conjugative plasmid would be
rendered useless. To minimize the number of sgRNAs required, we needed
to find regions that are highly conserved across many conjugative
plasmids, so that function is knocked out in the greatest number of
plasmids with the least possibility of off-target knockout. The
bioinformatics team was created to take on this computationally
demanding task.
Literature Review
To begin identifying regions of conservation on conjugative plasmids,
we first established a classification system to organize plasmids.
Research into plasmid classification revealed two ways plasmids are
characterized: the MOB typing system and the Incompatibility type system.
The MOB system partitions conjugative plasmids by their relaxase,
an enzyme that is crucial for initiating conjugation [1,2]. However, the
classifications are based upon the amino acid sequence of the relaxase,
so plasmids in the same MOB group may still differ in nucleotide
sequence because of codon degeneracy, rendering the MOB system
unsuitable for our purposes.
The Incompatibility system partitions plasmids into groups depending on
whether they can coexist with one another in the same host cell [2,3].
If two plasmids are too similar, they are incompatible. Because there
is far more information on incompatibility groups than on MOB groups, we
were able to categorize the collected plasmid sequences more reliably.
It is for this reason we focused on Incompatibility typing.
Among the incompatibility groups, we chose to focus on Incompatibility
P (IncP) plasmids because the IncP plasmid RP4 is so heavily
characterized [4].
After narrowing our sights on the Incompatibility P (IncP)
plasmid family, we sought to identify regions on plasmids that are
heavily conserved. Initial research led us to a region of plasmids
known as the Origin of Transfer (OriT) site [4]. The OriT is a
nucleotide region on the conjugative plasmid where the relaxase
enzyme attaches, initiating the transfer of DNA from
the host to the new recipient. OriT regions, however, were seldom
labelled on conjugative plasmid sequences. The conjugative
relaxase binds this region, and genes coding for relaxases are
far more frequently labelled.
This then led us to search for the
gene coding for the conjugative relaxase of IncP plasmids, TraI.
We were then able to identify the catalytic center of the
TraI conjugative relaxase, which is its Tyrosine 22 residue [5]. As
expected, the catalytic center is the most highly conserved amino acid
of the relaxase, and nucleotide variation from codon degeneracy should
therefore be minimal in this CDS region. Consequently, the sites we
direct Cas9 to cut should be highly conserved across the plasmid
population.
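To make the codon degeneracy argument concrete, the following minimal sketch lists the synonymous codons for an amino acid under the standard genetic code. The names `CODON_TABLE` and `synonymous_codons` are illustrative and not part of the project's tooling. Tyrosine is encoded by only two codons, so a conserved tyrosine can vary at a single nucleotide position, whereas a residue such as leucine has six possible codons.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    # Tyrosine (Y), leucine (L) and tryptophan (W) span low to high degeneracy.
    for aa in "YLW":
        print(aa, synonymous_codons(aa))
```

A residue with few synonymous codons leaves fewer nucleotide variants for a guide set to cover, which is the property the text relies on.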
Methods
To identify nucleotide conservation across populations of plasmids,
we used several methods to assess the degree of nucleotide
similarity between plasmids. Clustal Omega was used to generate
multiple sequence alignments of many sizes, from up to 30 whole plasmid
sequences to hundreds of gene sequences. This technique was ultimately
our most successful, generating many alignments which were used for
exploratory data analysis as well as for our final list of sequences
for Cas9 to target. All-against-all BLAST searches were also performed
on the datasets, and the results were fed into the program
Cytoscape to generate sequence similarity networks [6]. These networks
are designed to show the relationships between plasmid sequences, but
in the networks built from our all-against-all BLAST searches the
placement of sequences did not reflect any relationship we could
interpret. As a consequence this approach was scrapped. Phylogenetic
trees were also constructed from sequence alignment data using the program
RAxML [7]. The trees were initially constructed from small
sequence alignments, but they were likely subject to errors and were
discarded. Had there been time, RAxML would have been used to create
phylogenies of gene sequences to analyze relationships between genes
instead of whole plasmids.
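As an illustration of the all-against-all BLAST step, the sketch below turns BLAST tabular output (the 12-column `-outfmt 6` layout) into an undirected edge list of the kind a network tool such as Cytoscape can import as a delimited table. This is not the team's actual pipeline: the function name `blast_to_edges`, the e-value cutoff, and the example hits are all assumptions made for illustration.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blast_to_edges(handle, max_evalue=1e-10):
    """Collapse all-against-all hits into undirected, weighted edges.

    Self-hits are dropped and only the best bitscore is kept per pair.
    max_evalue is an illustrative threshold, not a value from the project.
    """
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        if row["qseqid"] == row["sseqid"]:
            continue
        if float(row["evalue"]) > max_evalue:
            continue
        pair = tuple(sorted((row["qseqid"], row["sseqid"])))
        score = float(row["bitscore"])
        if score > best.get(pair, 0.0):
            best[pair] = score
    return [(a, b, s) for (a, b), s in sorted(best.items())]


# Hypothetical hits, for illustration only.
EXAMPLE = (
    "pA\tpA\t100.0\t900\t0\t0\t1\t900\t1\t900\t0.0\t1660\n"
    "pA\tpB\t95.0\t880\t44\t0\t1\t880\t1\t880\t0.0\t1380\n"
    "pB\tpA\t95.0\t880\t44\t0\t1\t880\t1\t880\t0.0\t1375\n"
    "pA\tpC\t71.0\t120\t35\t2\t5\t124\t9\t128\t2e-04\t60.2\n"
)

if __name__ == "__main__":
    for source, target, bitscore in blast_to_edges(io.StringIO(EXAMPLE)):
        print(f"{source}\t{target}\t{bitscore}")
```

Collapsing reciprocal hits into one edge keeps the network undirected. The choice of edge weight and cutoff strongly shapes how such a network clusters.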
Conjugative plasmid sequences were
originally collected by hand from the NCBI nucleotide database [8]
and from PLSDB, a database of complete bacterial plasmids [9].
PLSDB allowed us to collect plasmid sequences categorized by
Incompatibility type, which was of use for both intra- and inter-family
sequence alignments. The NCBI nucleotide database was used again
later in the process to procure sequences of particular genes, such
as TraI, rather than entire plasmids, to create more applicable
sequence alignments.
We used Benchling to generate and visualize small
sequence alignments composed of one to five plasmids. Clustal Omega [10]
was used to generate multiple sequence alignments of larger plasmid
datasets including the complete list of all gene-specific alignments.
These larger alignments were then visualized using the program Mega-X
[11].
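For readers who want to reproduce the Clustal Omega step from a script, the sketch below assembles a `clustalo` command line and runs it with `subprocess`. It assumes the `clustalo` executable is installed and on the PATH. The function names and file names are hypothetical, and the options shown (input, output, sequence type, output format, threads, overwrite) are one reasonable choice rather than the settings the team used.

```python
import subprocess


def build_clustalo_command(fasta_in, aln_out, threads=1):
    """Assemble a Clustal Omega call for a nucleotide FASTA file."""
    return [
        "clustalo",
        "-i", str(fasta_in),      # input sequences (FASTA)
        "-o", str(aln_out),       # output alignment file
        "--seqtype=DNA",          # treat input as nucleotide sequences
        "--outfmt=fasta",         # write the alignment as aligned FASTA
        f"--threads={threads}",   # number of processors to use
        "--force",                # overwrite an existing output file
    ]


def run_alignment(fasta_in, aln_out, threads=1):
    """Run Clustal Omega and raise CalledProcessError if it fails."""
    subprocess.run(build_clustalo_command(fasta_in, aln_out, threads), check=True)


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    print(" ".join(build_clustalo_command("traI_genes.fasta", "traI_genes.aln.fasta", 4)))
```

Separating command construction from execution makes the chosen parameters easy to log alongside each alignment.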
Multiple Sequence Alignment
Multiple sequence alignment initially served as a method to identify
regions of conservation that may not have been covered in initial
literature review about conjugative plasmid sequences and their gene
contents. We first took whole plasmid genomes and aligned them against
one another to identify if and where there were continuous sequences
of nucleotide homology. The nucleotide sequences were then BLAST
searched, on the NCBI website, to determine the genes and regions
most commonly conserved in said alignment.
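The search for continuous stretches of homology can be sketched as a scan over alignment columns. The example below is a simplified illustration, not the procedure the team ran: it scores each column by the fraction of sequences sharing the most common base and reports runs of columns that meet a threshold. The function names, thresholds, and toy alignment are assumptions.

```python
def column_identity(alignment):
    """Fraction of sequences sharing the most common base in each column.

    Gap characters ('-') never count toward the most common base.
    """
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        bases = [b for b in column if b != "-"]
        top = max((bases.count(b) for b in set(bases)), default=0)
        scores.append(top / n)
    return scores


def conserved_runs(alignment, min_identity=1.0, min_length=20):
    """Return half-open (start, end) column ranges of conserved runs."""
    runs, start = [], None
    # A trailing sentinel score closes any run that reaches the last column.
    for i, score in enumerate(column_identity(alignment) + [-1.0]):
        if score >= min_identity:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    return runs


# Toy alignment of equal-length rows, for illustration only.
TOY = [
    "ACGTTACTATGGA-CC",
    "ACGATACTATGGATCC",
    "ACGTTACTACGGATCC",
]

if __name__ == "__main__":
    print(conserved_runs(TOY, min_length=4))
```

For a Cas9 target, a `min_length` near the length of a protospacer plus PAM would be the natural setting. The toy example uses a shorter value only so that it returns something.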
Initially this method allowed us to observe conserved nucleotide sequences
in small alignments of 2-5 sequences on Benchling, which we used to
visualize the degree of homology within and between plasmid families.
These alignments gave mixed results, however: they were very imprecise,
with plasmids of the same family frequently failing to align. This
difficulty led us to question our continued use of Benchling for
even preliminary sequence alignments. We presumed the failures were
due to the immense size of the plasmids in combination with hardware
limitations, given that Benchling is a web browser-based application,
and that we would need greater computational capacity to run the
alignments we wanted. As a consequence, we began creating our alignments
with Clustal Omega.
Clustal Omega was used to construct larger
sequence alignments, which gave us a more detailed picture of the degree
of nucleotide homology among conjugative plasmids. Clustal Omega was used to
create our initial 24-sequence IncP plasmid alignment, as well as our
38-sequence multi-family alignment. Still, errors and software issues
persisted. Most noticeably, memory allocation errors began
appearing, suggesting that we had not supplied the software with enough
memory to run these alignments. However, discussion with other
bioinformaticians revealed that much larger alignments could
be run with the parameters we had set: the only difference was that their
alignments used many small sequences rather than a few large sequences.
Persistently stonewalled by these errors, we began to look into the
validity of our current method of data exploration: performing
alignments of large sequences to then comb through the resulting
alignment for highly conserved sequences.
That method turned out to be flawed.
After researching our problem, we discovered that multiple
sequence alignment programs were not designed to align sequences above
a few thousand base pairs. When large alignments are performed, many
programs alter parameters to expedite the alignment at the
expense of accuracy or precision. The whole plasmid sequences
we had been running up to that point regularly exceeded 100,000 base
pairs. As a consequence, the number of such sequences we could align
was likely capped at a small size. Though merely 20 to 30
sequences may offer basic insights, we wanted our alignments
to pull from larger sample sizes to rigorously address the question.
To rectify this limitation, we decided to align only gene sequences,
given their small size. Instead of using multiple sequence
alignment of whole plasmids to discover conserved sites, we performed
a deeper exploration of the literature to identify notable regions of
conservation. Through this process, we were
able to identify a region of interest: the TraI gene of Incompatibility
P conjugative plasmids, which codes for an enzyme known as the
relaxase. This enzyme is pivotal to conjugation, and was therefore
an excellent target for knockout.
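The switch from whole plasmids to gene-sized inputs can be checked before an alignment is attempted. The following sketch parses FASTA text and separates records by length. The function names and the 5,000 bp cutoff are assumptions, with the cutoff standing in for the "few thousand base pairs" mentioned above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, max_length=5000):
    """Separate records short enough to align from those that are too long.

    max_length is an illustrative cutoff, not a documented program limit.
    Returns two lists of (header, length) tuples: (short, too_long).
    """
    short, too_long = [], []
    for header, seq in records:
        target = short if len(seq) <= max_length else too_long
        target.append((header, len(seq)))
    return short, too_long


if __name__ == "__main__":
    # Hypothetical records: a short gene and a longer "plasmid".
    demo = [">gene1", "ACGT", "acgt", ">plasmid1", "A" * 12]
    print(split_by_length(read_fasta(demo), max_length=10))
```

In practice `read_fasta` would be given an open file handle. Any iterable of lines works.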
Results
The bioinformatics sub-team performed a gene homology assessment on 284 plasmids in the Incompatibility P family. We assessed the homology of the TraI gene, which codes for the conjugative relaxase, an enzyme essential for conjugation and DNA transfer. We identified the catalytic center of the relaxase, Tyrosine 22 [5], and created a list of 10 sequences to target for knockout with CRISPR-Cas9. Our consensus sequence is as follows:
AAGGATAATTACTATGTGCTGGG
Another 9 sequences have been selected from the IncP plasmid multiple sequence alignment, being the following:
AAAGACAATTACTACGTCATCGG
AAGGATAATTACTACGTCATAGG
AGGGATAACTACTACGTGCTGGG
AAGGATAATTACTATGTACTGGG
TCCGATAACTACTATTTTCTGGG
AAGGATAATTACTATGTCTTGGG
AAAGATAATTACTACGTCATCGG
GAAGACAACTACTATGCCAGCGG
AAGGATAATTACTACGTTATCGG
These sequences, in combination, should make for excellent targets for Cas9. The consensus sequence should direct cleavage of a plurality of IncP plasmids in this essential region, while the remaining nine act as redundancies to ensure cleavage of IncP plasmids that carry nucleotide substitutions due to codon degeneracy. With this nucleotide region knocked out, the relaxase should cease to function: it will not bind the conjugative plasmid’s DNA and will not catalyze its unfurling from circular to linear form [5]. As a consequence, plasmids will not be able to replicate for the purposes of conjugation, halting their proliferation through conjugation. The cut should simultaneously induce a frameshift mutation, further impairing the relaxase. We hope that downstream effects would further impair the relaxase and, by extension, the relaxosome’s ability to catalyze the process of transfer. Together, we hope that disabling this region will inhibit conjugation enough to significantly strengthen the biocontainment of our host.
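The ten targets listed above can be sanity-checked with a short script. This sketch assumes each 23-nucleotide entry reads as a 20-nucleotide protospacer followed by a 3-nucleotide NGG PAM. The text does not state this layout explicitly, although every listed sequence ends in GG. The function names are illustrative. The script confirms length and PAM, and counts how many protospacer positions each alternate differs from the consensus.

```python
CONSENSUS = "AAGGATAATTACTATGTGCTGGG"
ALTERNATES = [
    "AAAGACAATTACTACGTCATCGG",
    "AAGGATAATTACTACGTCATAGG",
    "AGGGATAACTACTACGTGCTGGG",
    "AAGGATAATTACTATGTACTGGG",
    "TCCGATAACTACTATTTTCTGGG",
    "AAGGATAATTACTATGTCTTGGG",
    "AAAGATAATTACTACGTCATCGG",
    "GAAGACAACTACTATGCCAGCGG",
    "AAGGATAATTACTACGTTATCGG",
]


def split_target(site):
    """Split a 23-nt site into (protospacer, PAM), assuming a 3' PAM."""
    if len(site) != 23:
        raise ValueError(f"expected 23 nt, got {len(site)}: {site}")
    return site[:20], site[20:]


def has_ngg_pam(site):
    """True if the last three bases match the pattern NGG."""
    return split_target(site)[1][1:] == "GG"


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    reference, _ = split_target(CONSENSUS)
    for site in [CONSENSUS] + ALTERNATES:
        spacer, pam = split_target(site)
        print(site, pam, has_ngg_pam(site), mismatches(spacer, reference))
```

The mismatch counts give a rough sense of how far each redundant target departs from the consensus.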
Future Avenues
Once our new methodology was secured and proven to work in creating alignments, only a few weeks were left in the program. We would be thrilled if our method of identifying plasmid regions amenable to knockout by CRISPR-Cas9 could be used to identify regions of homology in other plasmid incompatibility families, such as the IncN, IncW, and IncX families, which have considerable structural similarities in their conjugative apparatus [12]. This system of knockout targets was possible with IncP plasmids because the TraI relaxase has a single catalytic center: to disable it, only one amino acid needs to be knocked out. A close relative of the TraI relaxase, the TrwC relaxase, requires two separate amino acids to be knocked out in order to cease functioning [13]. Regions that can only be disabled with multiple knockouts are therefore unsuited for the conjugation prevention system. Had there been more time, we would have developed a more rigorous system for choosing the binding regions, perhaps using a script to comb through all possible targeting regions and determine the most frequent nucleotide variants in the region targeted. Finally, regions other than the conjugative relaxase could be investigated for knockout. Other members of the TraI gene family could be targeted to incapacitate the conjugative apparatus much as targeting TraI does.
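The script proposed above was not written during the project. One possible shape for it is sketched here, under the assumption that the target window has already been extracted from each plasmid's gene sequence. The function names, the coverage threshold, and the toy input are hypothetical. The sketch counts each distinct variant of the window and picks the most frequent variants until a requested fraction of plasmids has an exact match.

```python
from collections import Counter


def variant_counts(region_sequences):
    """Count how often each distinct target-window sequence occurs."""
    return Counter(seq.upper() for seq in region_sequences)


def guides_for_coverage(region_sequences, target_fraction=0.9):
    """Choose the most frequent variants until coverage is reached.

    Returns (chosen_variants, covered_fraction), where coverage means an
    exact match between a chosen variant and a plasmid's target window.
    """
    counts = variant_counts(region_sequences)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    for variant, n in counts.most_common():
        if covered / total >= target_fraction:
            break
        chosen.append(variant)
        covered += n
    return chosen, covered / total


if __name__ == "__main__":
    # Toy windows with made-up frequencies, for illustration only.
    toy = ["TACTATGTG"] * 5 + ["TACTACGTC"] * 3 + ["TACTATTTT"] * 2
    print(guides_for_coverage(toy, target_fraction=0.8))
```

Because each plasmid contributes exactly one variant, taking variants in order of frequency gives the smallest exact-match guide set for a given coverage. Allowing mismatches, or checking candidates for off-target sites, would require a more involved selection step.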
References
- Garcillán-Barcia MP, Francia MV, de La Cruz F. 2009. The diversity of conjugative relaxases and its application in plasmid classification. FEMS Microbiology Reviews 33:657–687.
- Shintani M, Sanchez ZK, Kimbara K. 2015. Genomics of microbial plasmids: Classification and identification based on replication and transfer systems and host taxonomy. Frontiers in Microbiology 6.
- Novick RP. 1987. Plasmid incompatibility. Microbiological Reviews 51:381–395.
- Pansegrau W, Balzer D, Kruft V, Lurz R, Lanka E. 1990. In vitro assembly of relaxosomes at the transfer origin of Plasmid RP4. Proceedings of the National Academy of Sciences 87:6555–6559.
- Pansegrau W, Schroder W, Lanka E. 1993. Relaxase (TraI) of IncP alpha plasmid RP4 catalyzes a site-specific cleaving-joining reaction of single-stranded DNA. Proceedings of the National Academy of Sciences 90:2925–2929.
- Shannon P. 2003. Cytoscape: A software environment for integrated models of Biomolecular Interaction Networks. Genome Research 13:2498–2504.
- Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.
- Sayers EW, Beck J, Bolton EE, Bourexis D, Brister JR, Canese K, Comeau DC, Funk K, Kim S, Klimke W, Marchler-Bauer A, Landrum M, Lathrop S, Lu Z, Madden TL, O’Leary N, Phan L, Rangwala SH, Schneider VA, Skripchenko Y, Wang J, Ye J, Trawick BW, Pruitt KD, Sherry ST. 2018. Database Resources of the National Center for Biotechnology Information. Nucleic Acids Research 49.
- Galata V, Fehlmann T, Backes C, Keller A. 2018. PLSDB: A resource of complete bacterial plasmids. Nucleic Acids Research 47.
- Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG. 2011. Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology 7.
- Kumar S, Stecher G, Li M, Knyaz C, Tamura K. 2018. Mega X: Molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution 35:1547–1549.
- Schröder G, Lanka E. 2005. The mating pair formation system of conjugative plasmids—a versatile secretion machinery for transfer of proteins and DNA. Plasmid 54:1–25.
- Grandoso G, Avila P, Cayón A, Hernando MA, Llosa M, de la Cruz F. 2000. Two active-site tyrosyl residues of protein TrwC act sequentially at the origin of transfer during plasmid R388 conjugation. Journal of Molecular Biology 295:1163–1172.