Figure 1. A predicted structural model of the SALSA protein. The unstructured SID linkers separate the SRCR domains, which bind to the tooth surface and are used by bacteria as anchoring sites. In contrast to the SRCR domains, the SID linkers are accessible to enzymes, making them a suitable target sequence for proteases.
After aligning and analyzing all SIDs, we were able to decide which proteases to consider based on their sequence specificity. The proteases' toxicity, availability, and cost were also considered (Supplementary alignment 1). Two candidate proteases, trypsin and prolyl peptidase, were identified as the most suitable enzymes based on these parameters. Trypsin cleaves peptide bonds on the C-terminal side of the positively charged residues lysine and arginine, whereas prolyl peptidase cleaves on the C-terminal side of proline residues. As SRCR-interspersed domains are highly enriched in proline residues (Supplementary alignment 1), they are an ideal substrate for prolyl peptidase. To set up the protease specificity assay, we also used the TEV protease as a model and control enzyme, as it is highly sequence-specific and thoroughly studied. While trypsin and prolyl peptidase have few determinants for the cleavage site, TEV protease specifically targets the sequence ENLYFQ(S/G), cutting between the glutamine and the following serine or glycine.
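The cleavage rules described above can be illustrated with a small in-silico digest. This is a minimal sketch, not the analysis we performed: the rules are deliberately simplified (for example, the reduced trypsin activity before proline is ignored), the names `CLEAVAGE_RULES`, `cleavage_sites`, and `digest` are hypothetical, and the toy peptide is an invented placeholder rather than a real SID sequence.

```python
import re

# Simplified cleavage rules (illustrative assumptions only).
# Each zero-width pattern matches the position at which the bond is cut.
CLEAVAGE_RULES = {
    "trypsin": re.compile(r"(?<=[KR])"),          # C-terminal of Lys/Arg
    "prolyl_peptidase": re.compile(r"(?<=P)"),    # C-terminal of Pro
    "tev": re.compile(r"(?<=ENLYFQ)(?=[SG])"),    # ENLYFQ | S or G
}


def cleavage_sites(sequence, protease):
    """Return the number of residues N-terminal to each cut.

    A cut after the last residue is not a real cleavage and is skipped.
    """
    pattern = CLEAVAGE_RULES[protease]
    return [m.start() for m in pattern.finditer(sequence)
            if m.start() < len(sequence)]


def digest(sequence, protease):
    """Split the sequence into the fragments produced by a complete digest."""
    bounds = [0] + cleavage_sites(sequence, protease) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Invented toy peptide containing a TEV site; NOT a real SID sequence.
    toy_linker = "STPAKENLYFQSGPR"
    for name in CLEAVAGE_RULES:
        print(name, digest(toy_linker, name))
```

Running the same function over each aligned SID would give a rough count of candidate sites per protease, which is the kind of comparison the alignment was used for. It says nothing about accessibility or kinetics, which is why an experimental assay is still needed.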
Yeast surface display to engineer protein interactions
Yeast surface display is a method for high-throughput screening of proteins based on their interactions. It has been used to engineer antibodies for distinct targets (Cherf & Cochran, 2015), and a similar approach could be used for the protease-SALSA interaction. One way to target SALSA would be to engineer a protease that binds strongly to SALSA, assuming that an increased local concentration of the protease near SALSA would result in its cleavage. However, our literature research identified a peptide that binds to SALSA SRCR domains with high affinity and could easily be fused to the protease to drive the protease-SALSA interaction (Kelly et al., 1999). Moreover, binding alone does not guarantee cleavage: the cleavage efficiency also depends on the specificity of the protease active site, so we needed a method that would enable direct testing of the cleavage efficiency.
Modified yeast two-hybrid system to detect protease activity
We modified the yeast two-hybrid system to test whether the candidate proteases cleave the desired sequence. Canonical yeast two-hybrid screening aims to study protein-protein interactions (Wong et al., 2017a). The system includes a transcription factor, a promoter, and a reporter gene. The transcription factor consists of separate DNA-binding and activation domains and can only induce transcription when the two domains are close together. In a two-hybrid system, each domain has a protein of interest fused to it. Therefore, when the two proteins interact, the two transcription factor domains are brought close to each other. This results in the activation of the promoter and transcription of the reporter gene, which is used as the readout of the assay. If the two proteins do not interact, there is no reporter gene transcription (Wong et al., 2017b). For our modified yeast two-hybrid system, we adopted the idea of a two-domain transcription factor, but here the domains are held in proximity by a linker that is to be cleaved by the protease (Figure 2). If the linker is cut, the domains dissociate from each other, and transcription does not occur.
Figure 2. A modified yeast two-hybrid system to detect the cleavage of a linker sequence by a protease. The linker brings the DNA binding and activation domains of the transcription factor together. If cleaved, the transcription factor is inactivated, resulting in loss of transcription of the reporter gene.
We used the Golden Gate assembly MoClo yeast toolkit (Lee et al., 2015) to make the necessary constructs for our protease targeting assay. We constructed a system with three cassette plasmids, termed transcriptional units (Figure 3).
Figure 3. Transcriptional units in the yeast protease activity assay. TU1 - Transcriptional Unit 1 consists of sic1ΔN under the control of a promoter with LexA binding sites. TU2 - Transcriptional Unit 2 includes the LexA binding domain (LexA BD) and the VP16 activation domain, separated by the SRCR-SID linker from the SALSA protein (under the control of the GAL1 promoter). TU3 - Transcriptional Unit 3 consists of the protease fused to a ligand that facilitates protease binding to SALSA (under the control of the GAL1 promoter).
The first transcriptional unit (TU1) includes the coding sequence for sic1∆N, which inhibits cell cycle progression and consequently stops cell growth. sic1∆N is expressed from a promoter with LexA binding sites, so its transcription depends on the transcription factor encoded in the second transcriptional unit (TU2).
The second transcriptional unit (TU2) contains the transcription factor, which consists of the LexA binding domain and the VP16 activation domain, just as in the conventional two-hybrid system. However, the two domains are separated by the SID linker from SALSA, and they are also fused with an SRCR domain. The SRCR domain serves as a separate binding partner for the engineered protease, increasing the affinity of the protease for the transcription factor (explained below). The SID linker between the DNA binding and activation domains is the target site for the protease. As a proof of principle, in the positive control construct we included the six amino acid long specific TEV protease cleavage site in the middle of the SID. If TEV fails to cleave this construct, the problem lies in the experimental setup itself rather than in the candidate protease. This control enables us to determine whether our test system is reliable and specific.
The final component, expressed as the third transcriptional unit (TU3), is the protease. To improve the specificity of our protease, we used a strategy to increase its affinity for SALSA (Figure 4). For this, we employed the mechanism that cariogenic bacteria use to bind SALSA. The S. mutans cell surface carries the streptococcal antigen I/II (SA I/II) adhesin, which binds to salivary agglutinin. Studies mapped residues in the C-terminus of SA I/II as the prominent region for binding to salivary receptors (Kelly et al., 1999). We used this sequence as a ligand to facilitate binding between the protease and the SRCR domain fused to the transcription factor.
We also added the SV40 nuclear localization sequence to the protease to guide the protease to the nucleus, where the transcription factor functions.
Figure 4. Inactivation of the transcription factor by proteolytic digestion. The protease is fused with a peptide from S. mutans that binds the SRCR domain. The DNA binding and activation domains of the transcription factor are separated by the SID linker and SRCR domain from SALSA. The protease docks to the SRCR domain via the peptide from S. mutans and cleaves the SID linker, which separates the DNA binding and activation domains and thereby inactivates the transcription factor.
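The expected behavior of the three transcriptional units can be summarized as idealized Boolean logic. The sketch below is only a conceptual model under strong assumptions: expression and cleavage are treated as all-or-nothing, growth is taken as the readout because sic1∆N stops cell growth, and the function and parameter names are hypothetical.

```python
from itertools import product


def sic1_expressed(tf_expressed, protease_expressed, linker_cleavable):
    """sic1∆N is transcribed only while the LexA-linker-VP16 factor is intact."""
    linker_cleaved = protease_expressed and linker_cleavable
    return tf_expressed and not linker_cleaved


def cells_grow(tf_expressed, protease_expressed, linker_cleavable):
    """Idealized readout: cells grow when sic1∆N is not expressed."""
    return not sic1_expressed(tf_expressed, protease_expressed, linker_cleavable)


if __name__ == "__main__":
    header = ("TF (TU2)", "protease (TU3)", "linker cleavable", "growth")
    print(" | ".join(header))
    for state in product([False, True], repeat=3):
        row = [*state, cells_grow(*state)]
        print(" | ".join(str(value) for value in row))
```

In this model, a strain expressing the transcription factor grows only if the protease is present and able to cut the linker, which is the outcome the TEV positive control is meant to confirm. Real cultures will show graded rather than binary growth.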
All three transcriptional units carry homology sites for integrating the coding sequences into S. cerevisiae chromosomes by homologous recombination: URA3 homology for TU1, LEU2 homology for TU2, and HO homology for TU3. Alternatively, we can assemble a multigene plasmid from TU1, TU2, and TU3 using Golden Gate assembly and the aforementioned MoClo yeast toolkit. In this case, the integration vector inserts our coding sequences into the ura3-1 locus. Our modified yeast two-hybrid system can therefore be integrated into the S. cerevisiae chromosomes in two ways: individual integration of each transcriptional unit into a separate locus, or combined integration of all three TUs into a single locus. Having both options makes the system versatile, and individual integration makes the transcriptional units independent of each other, so during laboratory experiments we do not need to wait until all transcriptional units are available before transforming yeast.
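The two integration routes can be written down as a small lookup, which makes the practical difference explicit: the separate-locus route accepts any subset of units, while the combined route needs all three. This is a hedged bookkeeping sketch with hypothetical names (`SEPARATE_LOCI`, `COMBINED_LOCUS`, `integration_plan`); it does not model the cloning itself.

```python
# Homology targets as described in the text.
SEPARATE_LOCI = {"TU1": "URA3", "TU2": "LEU2", "TU3": "HO"}
COMBINED_LOCUS = "ura3-1"


def integration_plan(units, strategy):
    """Map each transcriptional unit to the locus it would integrate into.

    strategy: "separate" (one locus per unit) or "combined" (multigene plasmid).
    """
    if strategy == "separate":
        return {unit: SEPARATE_LOCI[unit] for unit in units}
    if strategy == "combined":
        missing = sorted(set(SEPARATE_LOCI) - set(units))
        if missing:
            raise ValueError(f"multigene assembly still needs: {missing}")
        return {unit: COMBINED_LOCUS for unit in units}
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    # TU1 alone can already be integrated via the separate-locus route.
    print(integration_plan(["TU1"], "separate"))
    print(integration_plan(["TU1", "TU2", "TU3"], "combined"))
```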