Team:Estonia TUIT/Engineering



We developed a testing system that would allow us to engineer a specific protease for the cleavage of the SALSA protein. To target SALSA, we first carried out a structural analysis of the SALSA protein to estimate which sequences are on the surface of the protein and accessible for proteolytic cleavage. Having identified the regions to target, we were then tasked with creating an effective, specific, and affordable method of carrying out this cleavage. For this, we developed a technique that enables us to screen for proteases that target distinct sequences. The method allows for high-throughput screening where the protease cleavage efficiency is translated to yeast growth rate as a simple readout.

SALSA structure

The SALSA protein consists of 17 folded domains separated by unstructured linker regions, raising the question of which regions to choose as the target sequence for the protease. The scavenger receptor cysteine-rich (SRCR) domains are of crucial importance for this project, as they are involved in both SALSA binding to the tooth and bacterial binding to SALSA (Bikker et al., 2002). According to the predicted three-dimensional structure of SALSA (AlphaFold Protein Structure Database), the SRCR domains are globular and not easily accessible to proteases (Figure 1). The linkers, or SRCR-interspersed domains (SIDs), however, are unstructured and can potentially be cleaved easily. SIDs are roughly 20-amino-acid-long stretches rich in threonine, serine, and proline. They contain a number of glycosylation sites that are proposed to hold the linkers in an extended conformation spanning 7 nm (Reichhardt et al., 2020; Turenchalk & Xu, 2001).

Figure 1. A predicted structural model of the SALSA protein. The unstructured SID linkers separate the SRCR domains that bind to the teeth and are used by the bacteria to anchor on the tooth. In contrast to SRCR domains, the SID linkers are accessible for enzymes, making them a suitable targeting sequence for proteases.

Target sequences

After aligning all SIDs and analyzing the alignment, we were able to decide which proteases to consider based on their sequence specificity. The proteases' toxicity, availability, and cost were also considered (Supplementary alignment 1). Two candidate proteases, trypsin and prolyl peptidase, were identified as the most suitable enzymes based on these parameters. Trypsin cleaves peptide bonds on the C-terminal side of positively charged lysine and arginine residues, whereas prolyl peptidase cleaves on the C-terminal side of proline residues. As SRCR-interspersed domains are highly enriched in proline residues (Supplementary alignment 1), they are an ideal substrate for prolyl peptidase. To set up the protease specificity assay, we also used the TEV protease as the model and control enzyme, as it is highly sequence-specific and thoroughly studied. While trypsin and prolyl peptidase have few determinants for the cleavage site, TEV protease specifically targets the sequence ENLYFQS/G.
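The cleavage rules described above can be illustrated with a short sketch that lists where each protease would be expected to cut a peptide. This is a simplified illustration, not the analysis we ran: the rules ignore known exceptions (for example, trypsin cutting poorly when the next residue is proline), the TEV cut is assumed to fall between Q and S/G, and the example peptide is a made-up, SID-like sequence rather than a real SALSA linker.

```python
import re

# Simplified rules from the text. Each pattern matches the zero-width position
# directly after the residue(s) recognized by the protease.
CLEAVAGE_RULES = {
    "trypsin": re.compile(r"(?<=[KR])"),        # after lysine or arginine
    "prolyl peptidase": re.compile(r"(?<=P)"),  # after proline
    "TEV": re.compile(r"(?<=ENLYFQ)(?=[SG])"),  # assumed: between Q and S/G
}


def cleavage_positions(sequence, protease):
    """Return 1-based positions of the residues after which the peptide is cut."""
    pattern = CLEAVAGE_RULES[protease]
    # A "cut" after the last residue is not a peptide bond, so it is skipped.
    return [m.start() for m in pattern.finditer(sequence)
            if 0 < m.start() < len(sequence)]


if __name__ == "__main__":
    # Hypothetical peptide with an embedded TEV site, for illustration only.
    peptide = "TSPTPGKENLYFQSPTSR"
    for name in CLEAVAGE_RULES:
        print(name, cleavage_positions(peptide, name))
```

On a proline-rich stretch, the prolyl peptidase rule yields several candidate sites while the TEV rule yields exactly one, which is the reason TEV serves as the specific control.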

Yeast surface display to engineer protein interactions

Yeast surface display is a method for high-throughput screening of proteins based on their interactions. It has been used to engineer antibodies for distinct targets (Cherf & Cochran, 2015), and a similar approach could be used for the protease-SALSA interaction. One way to target SALSA would be to engineer a protease that binds strongly to SALSA, assuming that an increased local concentration of the protease near SALSA would result in its cleavage. However, our literature research identified a peptide that binds to SALSA SRCR domains with high affinity and could easily be fused to the protease to drive the protease-SALSA interaction (Kelly et al., 1999), so binding did not need to be engineered. Moreover, the protease cleavage efficiency also depends on the active-site specificity of the protease, and we therefore needed a method that would enable direct testing of the cleavage efficiency.

Modified yeast two-hybrid system to detect protease activity

We modified the yeast two-hybrid system to test whether the candidate proteases cleave the desired sequence. Canonical yeast two-hybrid screening aims to study protein-protein interactions (Wong et al., 2017a). The system includes a transcription factor, a promoter, and a reporter gene. The transcription factor consists of separate DNA binding and activation domains and can only induce transcription when the two domains are in close proximity. In a two-hybrid system, each domain has a protein of interest fused to it. Therefore, when the two proteins interact, the two transcription factor domains are brought together. This results in the activation of the promoter and transcription of the reporter gene, which is used as the readout of the assay. If the two proteins do not interact, there is no reporter gene transcription (Wong et al., 2017b). For our modified yeast two-hybrid system, we adopted the idea of a two-domain transcription factor, but kept the domains in proximity by joining them with a linker that is to be cleaved by the protease (Figure 2). If the linker is cut, the domains dissociate from each other, and transcription does not occur.

Figure 2. A modified yeast two-hybrid system to detect the cleavage of a linker sequence by a protease. The linker brings the DNA binding and activation domains of the transcription factor together. If cleaved, the transcription factor is inactivated, resulting in loss of transcription of the reporter gene.

Transcriptional units

We used the Golden Gate assembly MoClo yeast toolkit (Lee et al., 2015) to make the necessary constructs for our protease targeting assay. We constructed a system with three cassette plasmids, termed transcriptional units (Figure 3).

Figure 3. Transcriptional units in the yeast protease activity assay. TU1: Transcriptional Unit 1 consists of sic1∆N under the control of a LexA binding promoter. TU2: Transcriptional Unit 2 includes the LexA binding domain (LexA BD) and the VP16 activation domain separated by the SRCR-SID linker from the SALSA protein (under the control of the GAL1 promoter). TU3: Transcriptional Unit 3 consists of the protease fused to a ligand that facilitates protease binding to SALSA (under the control of the GAL1 promoter).

The first transcriptional unit (TU1) includes the coding sequence for sic1∆N, which inhibits cell cycle progression and consequently stops cell growth. sic1∆N is expressed from a promoter containing LexA binding sites. The transcription of sic1∆N depends on the transcription factor encoded in the second transcriptional unit (TU2).

The second transcriptional unit (TU2) contains the transcription factor, consisting of the LexA binding domain and the VP16 activation domain, just as in a conventional two-hybrid system. However, the two domains are separated by the SID linker from SALSA, and they are also fused with an SRCR domain. The SRCR domain can be used as a separate binding partner for the engineered protease to increase the affinity of the protease for the transcription factor (explained below). The SID linker between the DNA binding and activation domains is the target site for the protease. As a proof of principle, in the positive control construct, we included the six-amino-acid-long specific TEV protease cleavage site in the middle of the SID. If cleavage by TEV does not occur, this indicates that the problem lies in the experimental setup itself. This approach enables us to determine whether our test system is reliable and specific.

The final component, expressed as the third transcriptional unit (TU3), is the protease. To improve the specificity of our protease, we used a strategy to increase the affinity of the protease for SALSA (Figure 4). For this, we employed the mechanism that cariogenic bacteria use to bind SALSA. The S. mutans cell surface contains the streptococcal antigen I/II (SA I/II) adhesin, which binds to salivary agglutinin. Studies mapped residues in the C-terminus of SA I/II as the prominent region for binding to salivary receptors (Kelly et al., 1999). We used this sequence as a ligand to facilitate binding between the protease and the SRCR domain fused to the transcription factor. We also added the SV40 nuclear localization sequence to the protease to guide it to the nucleus, where the transcription factor functions.

Figure 4. Inactivation of the transcription factor by proteolytic digestion. The protease is fused with a peptide from S. mutans that binds the SRCR domain. The DNA binding and activation domains of the transcription factor are separated by the SID linker and the SRCR domain from SALSA. The protease docks to the SRCR domain via the peptide from S. mutans and cleaves the SID linker. This separates the DNA binding and activation domains and thereby inactivates the transcription factor.

Our modified yeast two-hybrid system is intended to be integrated into S. cerevisiae chromosomes, which we approached in two ways: individual integration of each transcriptional unit into a separate locus, or combined integration of all three TUs into a single locus. For individual integration, all three transcriptional units have sites for homologous recombination: URA3 homology for TU1, LEU2 homology for TU2, and HO homology for TU3. For combined integration, we can assemble a multigene plasmid from TU1, TU2, and TU3 using Golden Gate assembly and the aforementioned MoClo yeast toolkit. In this case, the integration vector inserts our coding sequences into the ura3-1 locus. Having two integration options makes our system versatile and makes the integration of the transcriptional units independent of each other. Therefore, during laboratory experiments, we do not necessarily need to wait until all transcriptional units are available before starting yeast transformation.
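The three-unit layout can be summarized as a small data structure, which makes it easy to check which units respond to galactose and whether the individual integration sites collide. This is only a bookkeeping sketch: the class and function names are hypothetical, the part lists for TU2 and TU3 name the components from the text without implying their order in the construct, and none of this is part of the MoClo toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionalUnit:
    name: str
    promoter: str
    product: str
    integration_homology: str  # locus used for individual integration


# Values taken from the description above; component order is not implied.
UNITS = [
    TranscriptionalUnit("TU1", "LexA-binding promoter", "sic1ΔN", "URA3"),
    TranscriptionalUnit("TU2", "GAL1",
                        "LexA BD + SID linker + SRCR + VP16", "LEU2"),
    TranscriptionalUnit("TU3", "GAL1",
                        "protease + SA I/II peptide + SV40 NLS", "HO"),
]


def galactose_induced(units):
    """Names of the units that are driven by the GAL1 promoter."""
    return [u.name for u in units if u.promoter == "GAL1"]


def loci_are_independent(units):
    """True if no two units share an integration locus."""
    loci = [u.integration_homology for u in units]
    return len(set(loci)) == len(loci)


if __name__ == "__main__":
    print(galactose_induced(UNITS))
    print(loci_are_independent(UNITS))
```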

Protease activity inactivates expression of a growth repressor

Both TU2 and TU3 contain GAL1 promoters. Therefore, the second and third transcriptional units are expressed simultaneously in the presence of galactose and are strongly repressed in glucose-containing media. When the protease cuts the SID linker in the transcription factor, it separates the LexA and VP16 domains, inactivating the transcription factor and preventing the expression of sic1∆N (Figure 5). In this case, the cells grow normally, indicating that the protease has efficiently cleaved the target sequence. If the protease fails to cut the SID linker, the transcription factor binds to the LexA binding promoter, leading to the expression of the sic1∆N inhibitor, cell cycle arrest, and suppression of cell division.

Figure 5. Schematic representation of the system used to assess the ability of a protease to cut the target sequence. Expression of sic1∆N (an inhibitor of cell cycle progression) depends on activation by a transcription factor that consists of two domains, the LexA binding domain and the VP16 activation domain, joined by the linker to be cut by the protease. If the protease is unable to cleave the linker, the transcription factor initiates sic1∆N expression, which leads to cell cycle arrest and stops yeast growth. In the case of efficient cleavage, the two domains of the transcription factor dissociate, and sic1∆N expression does not occur, allowing normal cell growth. In the case of inefficient cutting, sic1∆N expression is low, resulting in slow yeast growth.
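The chain from cleavage efficiency to growth rate can be sketched as a toy model. The functional forms here are assumptions chosen only to reproduce the qualitative behavior described above (no cleavage gives arrest, full cleavage gives normal growth, partial cleavage gives slow growth). The linear relationship between intact transcription factor and sic1∆N expression, the inhibition curve, and the `k_half` parameter are all hypothetical and have not been measured.

```python
def intact_tf_fraction(cleavage_efficiency):
    """Fraction of transcription factor whose linker is still uncut."""
    return 1.0 - cleavage_efficiency


def sic1_expression(intact_fraction, max_expression=1.0):
    """Assumed: sic1ΔN expression scales linearly with intact transcription factor."""
    return max_expression * intact_fraction


def relative_growth_rate(cleavage_efficiency, k_half=0.2):
    """Growth rate relative to uninhibited cells (1.0 = normal growth).

    Assumed inhibition curve: growth = 1 / (1 + sic1 / k_half), where k_half
    is the hypothetical sic1ΔN level that halves the growth rate.
    """
    if not 0.0 <= cleavage_efficiency <= 1.0:
        raise ValueError("cleavage_efficiency must be between 0 and 1")
    sic1 = sic1_expression(intact_tf_fraction(cleavage_efficiency))
    return 1.0 / (1.0 + sic1 / k_half)


if __name__ == "__main__":
    for efficiency in (0.0, 0.5, 0.9, 1.0):
        print(efficiency, round(relative_growth_rate(efficiency), 3))
```

Under these assumptions the growth rate rises monotonically with cleavage efficiency, which is the property that lets growth serve as a readout.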

High-throughput screening of proteases

After testing our candidate proteases and finding the ones that cleave the SID linkers most efficiently, we plan to use error-prone PCR to create protease libraries and to improve their specificity for the SID linkers (Figure 6). Error-prone PCR is a random mutagenesis technique for introducing mutations into genes. Mutations arise because a polymerase with decreased proofreading ability is used under special reaction conditions: increased concentrations of Mg2+, the addition of Mn2+ ions, and an unbalanced ratio of nucleotides. The frequency of incorporated mutations can be controlled by the number of cycles. The resulting PCR products are cloned into expression vectors, which generates the library for screening changes in protein activity and identifying beneficial mutations (Tao & Raz, 2015). As the screening in our case is based on positive selection using a simple readout, yeast growth, it is a powerful tool for isolating rare variants that cleave the SID linkers with increased efficiency. The assay also selects for specific proteases, as proteases with very broad specificity will target additional yeast proteins, causing off-target toxicity and their depletion in the competitive growth assay. This attribute is also beneficial for our project, as it helps us obtain a protease with minimal off-target effects when used for oral hygiene.
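To give a feel for how error rate and cycle number shape a mutant library, the following sketch simulates error-prone copying in silico. It is a deliberate simplification: each library member is treated as a single lineage copied once per cycle, every base is substituted with the same probability, and insertions and deletions are ignored. The template sequence, error rate, and function names are hypothetical.

```python
import random

BASES = "ACGT"


def copy_with_errors(dna, error_rate, rng):
    """Copy a DNA string, substituting each base with probability error_rate."""
    copied = []
    for base in dna:
        if rng.random() < error_rate:
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def error_prone_pcr(template, error_rate, cycles, rng):
    """Follow one lineage through the given number of error-prone copy cycles."""
    product = template
    for _ in range(cycles):
        product = copy_with_errors(product, error_rate, rng)
    return product


def make_library(template, error_rate, cycles, size, seed=0):
    """Generate `size` independently mutated variants of the template."""
    rng = random.Random(seed)
    return [error_prone_pcr(template, error_rate, cycles, rng)
            for _ in range(size)]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    template = "ATGGCTAGCAAAGGTGAAGAACTG"  # made-up sequence, not a protease gene
    library = make_library(template, error_rate=0.01, cycles=20, size=5)
    for variant in library:
        print(variant, hamming(template, variant))
```

Raising either `error_rate` or `cycles` increases the average number of substitutions per variant, mirroring the way cycle number is used to tune mutation frequency.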

Figure 6. Our assay allows engineering a protease for a desired sequence or, alternatively, mapping the specificity determinants of a protease. In the first case (left panel), a library of different proteases can be used, or a mutation library can be generated using the error-prone PCR approach, followed by selection of the enzymes based on their cleavage efficiency for a desired sequence. Alternatively, a library of target sequences in the transcription factor can be generated and screened in a strain expressing the protease of interest (right panel).
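The enrichment of rare, efficient variants in a competitive growth assay can be illustrated with a minimal calculation. It assumes each variant grows exponentially at its own constant rate with no competition for nutrients. The variant names, starting frequencies, and growth rates are invented for the example.

```python
import math


def enrich(initial_freqs, growth_rates, hours):
    """Return variant frequencies after exponential growth for `hours`.

    initial_freqs: dict of variant name -> starting frequency
    growth_rates:  dict of variant name -> growth rate per hour
    """
    weights = {name: freq * math.exp(growth_rates[name] * hours)
               for name, freq in initial_freqs.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical library: one rare efficient cutter among poor cutters.
    start = {"efficient": 0.001, "weak": 0.499, "inactive": 0.500}
    rates = {"efficient": 0.40, "weak": 0.15, "inactive": 0.05}
    for hours in (0, 24, 48):
        result = enrich(start, rates, hours)
        print(hours, {k: round(v, 4) for k, v in result.items()})
```

Because the fold-difference between variants grows exponentially with time, even a variant that starts at a very low frequency can dominate the culture if it relieves growth repression better than the rest of the library.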

AlphaFold Protein Structure Database. (n.d.). Retrieved October 16, 2021, from

Bikker, F. J., Ligtenberg, A. J. M., Nazmi, K., Veerman, E. C. I., van’t Hof, W., Bolscher, J. G. M., Poustka, A., Amerongen, A. V. N., & Mollenhauer, J. (2002). Identification of the Bacteria-binding Peptide Domain on Salivary Agglutinin (gp-340/DMBT1), a Member of the Scavenger Receptor Cysteine-rich Superfamily. Journal of Biological Chemistry, 277(35), 32109–32115.

Cherf, G. M., & Cochran, J. R. (2015). Applications of yeast surface display for protein engineering. Methods in Molecular Biology (Clifton, N.J.), 1319, 155.

Kelly, C. G., Younson, J. S., Hikmat, B. Y., Todryk, S. M., Czisch, M., Haris, P. I., Flindall, I. R., Newby, C., Mallet, A. I., Ma, J. K.-C., & Lehner, T. (1999). A synthetic peptide adhesion epitope as a novel antimicrobial agent. Nature Biotechnology, 17(1), 42–47.

Lee, M. E., DeLoache, W. C., Cervantes, B., & Dueber, J. E. (2015). A Highly Characterized Yeast Toolkit for Modular, Multipart Assembly. ACS Synthetic Biology, 4(9), 975–986.

Reichhardt, M. P., Loimaranta, V., Lea, S. M., & Johnson, S. (2020). Structures of SALSA/DMBT1 SRCR domains reveal the conserved ligand-binding mechanism of the ancient SRCR fold. Life Science Alliance, 3(4).

Tao, A., & Raz, E. (2015). Allergy Bioinformatics.

Turenchalk, G. S., & Xu, T. (2001). Lats in Cell-cycle Regulation and Tumorigenesis. Encyclopedic Reference of Cancer, 491–496.

Wong, J. H., Alfatah, M., Sin, M. F., Sim, H. M., Verma, C. S., Lane, D. P., & Arumugam, P. (2017a). A yeast two-hybrid system for the screening and characterization of small-molecule inhibitors of protein–protein interactions identifies a novel putative Mdm2-binding site in p53. BMC Biology, 15(1), 1–17.

Wong, J. H., Alfatah, M., Sin, M. F., Sim, H. M., Verma, C. S., Lane, D. P., & Arumugam, P. (2017b). A yeast two-hybrid system for the screening and characterization of small-molecule inhibitors of protein–protein interactions identifies a novel putative Mdm2-binding site in p53. BMC Biology, 15(1), 1–17.

Supplementary figure 1. Multiple sequence alignment of SALSA SID sequences. The residues in green are accessible to the solvent and proteases.
