Modeling

Overview

Prediction of the protein structure to understand protein-antibody interactions

To help us evaluate our design and the different steps of the DBTL cycle, we used several protein modeling tools. We first predicted key biochemical parameters (molecular weight and isoelectric point) to help us identify the protein during the purification experiments. We also built hydrophobicity plots to check whether the chosen epitopes would be accessible at the protein surface and hence available for interaction with antibodies. Finally, we built a de novo 3D structure model to better evaluate the epitopes displayed at the protein surface. Since we are using only linear B-cell epitopes, which can be recognized by antibodies even in denatured antigens [1], we did not at first plan docking simulations to check whether our proteins could be recognized by Dengue antibodies. However, considering that more than 90% of B-cell epitopes are conformational [1] and that, depending on how the multi-epitope protein folds, discontinuous epitopes may be formed, we decided to run docking simulations with an anti-Zika antibody, mainly to identify potential cross-reactive spots. The details of our modeling approach are given below.

Modeling protein properties

Isoelectric point and molecular weight

To compute the isoelectric point (pI) and molecular weight, we used the Expasy pI/Mw tool, available at: https://web.expasy.org/compute_pi/. The software calculates the molecular weight by adding the average isotopic masses of the amino acids and subtracting the average isotopic mass of the water molecule released for each peptide bond formed [2,3]. The pI is calculated from the pK values of the amino acids [4,5]. After the final design of both the DME-C and DME-BR proteins, we obtained the values shown in Table 1.

Table 1 - Molecular weight and pI of both Ammit Proteins: DME-C and DME-BR
Protein	MW (kDa)	pI
r-DME-C	16.83	6.35
r-DME-BR	16.80	6.35
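
The calculation described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the Expasy implementation: the residue masses are the standard average masses, but the pK values below are approximate textbook figures that we assume for the example (they are not the values from [5]), so the resulting pI will differ slightly from the Expasy output. The function names are ours.

```python
# Average masses (Da) of amino acid residues, i.e. each amino acid
# minus the water molecule released when the peptide bond is formed.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01524  # one water remains for the free N- and C-termini

# ASSUMPTION: approximate textbook pK values, used only for illustration.
PK_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PK_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PK_N_TERM = 9.0
PK_C_TERM = 3.0


def molecular_weight(sequence):
    """Average molecular weight (Da) of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def net_charge(sequence, ph):
    """Net charge at a given pH (Henderson-Hasselbalch for each group)."""
    positive = 1.0 / (1.0 + 10 ** (ph - PK_N_TERM))
    negative = 1.0 / (1.0 + 10 ** (PK_C_TERM - ph))
    for aa in sequence.upper():
        if aa in PK_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE[aa]))
        elif aa in PK_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PK_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(sequence, tolerance=1e-4):
    """pH at which the net charge is zero, found by bisection."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI is at a higher pH
        else:
            high = mid
    return (low + high) / 2.0
```

Calling `molecular_weight(seq) / 1000` gives the value in kDa, the unit used in Table 1.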

Hydrophobicity plot

As previously mentioned, we built hydrophobicity plots to check for accessible regions in both of the proteins we designed. Among the scales available for this analysis, the Kyte-Doolittle scale is widely used, especially for identifying surface-exposed regions. In this analysis, a window of residues is set, the hydrophobicity scores of all residues within the window are averaged, and the average is assigned to the first amino acid in the window. The program slides the window along the protein sequence and the hydrophobicity scores are plotted against residue position [4]. For evaluating surface-exposed regions, shorter window sizes are usually preferred. We therefore used a 9-residue window, described by Kyte and Doolittle as ideal for predicting surface regions in a globular protein [4].

The results for both DME-C and DME-BR proteins are shown below (Fig. 1).

Figure 1: Hydrophobicity plots for DME-C (A) and DME-BR (B).

In this scale, hydrophilic regions have a negative score. Therefore, by analyzing the score in Fig. 1 we can check for the presence of potentially exposed regions that might be accessible for the interaction with antibodies.
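
As an illustration of the sliding-window procedure described in this section, the sketch below computes a Kyte-Doolittle profile in Python. It is not the program we used to produce Fig. 1; the function names are ours, and the average is assigned to the first residue of each window, following the description above (other tools may assign it to a different position in the window).

```python
# Kyte-Doolittle hydropathy scale [4]: positive = hydrophobic,
# negative = hydrophilic.
KD_SCALE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def hydropathy_profile(sequence, window=9):
    """Return (position, mean score) pairs, one per window.

    Positions are 1-based and refer to the first residue of each window.
    """
    scores = [KD_SCALE[aa] for aa in sequence.upper()]
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + 1, mean))
    return profile


def hydrophilic_positions(profile, threshold=0.0):
    """Window positions whose mean score is below the threshold."""
    return [position for position, score in profile if score < threshold]
```

Plotting the second element of each pair against the first reproduces the kind of curve shown in Fig. 1, and `hydrophilic_positions` lists the windows with a negative score, which are the candidate exposed regions.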

Modeling protein structure

DMPfold tool available at PSIPRED

To better evaluate the exposed regions of the proteins, we decided to model their 3D structure. However, since the Ammit proteins are synthetic fusion polypeptides composed of many different fragments, we thought homology modeling would not be the best approach. We therefore searched for a reliable de novo tool for the 3D structure prediction. Among the different methods available, DMPfold, which builds on the DeepMetaPSICOV (DMP) contact predictor, caught our attention. The software is freely available on the PSIPRED server and uses a deep-learning algorithm to predict inter-atomic distance bounds, the hydrogen bond network, and torsion angles, which are then used as restraints for building the models [6]. It is reported to perform better than other de novo methods such as CONFOLD2 and Rosetta for some proteins [6]. In addition, the many iterations carried out during modeling help to refine the predictions. According to the developers, it can be used for proteins up to 600 residues in length; only proteins with a maximum chain length of 500 residues were used to train the algorithm. Since our proteins have 156 residues, we considered it appropriate for modeling them. The 3D models built for the designed DME-C and DME-BR are shown below.
3D models of the DME-C and DME-BR proteins. Envelope epitopes are shown in green, NS1 epitopes in red, NS3 epitopes in blue, and GPGPG linkers in yellow.

DME-C

DME-BR

Structure validation using Ramachandran plot

To validate our 3D protein models, we built their Ramachandran plots, which show the torsion angles phi and psi of the residues in a peptide. Due to steric hindrance, there are preferred and forbidden regions in this plot for the different amino acids [7]. To assess the quality of our models, we therefore calculated the percentage of residues in preferred regions. After uploading the DME-C and DME-BR models to the Ramachandran Plot Server, available at https://zlab.umassmed.edu/bu/rama/, the following plots were generated. As we can see, 87.2% of the DME-C residues and 84.7% of the DME-BR residues are in preferred or highly preferred regions. Depending on the resolution of a protein structure, 70% to 90% [8] of residues are in preferred regions in quality structures, so we conclude that our models are of acceptable quality.
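
For readers who want to see how the phi and psi angles behind a Ramachandran plot are obtained, the sketch below computes them from backbone atom coordinates. It is a minimal illustration in plain Python, not the code of the server we used; the input layout (one dictionary with `N`, `CA` and `C` coordinates per residue) and the function names are assumptions made for the example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(backbone):
    """Return (phi, psi) per residue; undefined terminal angles are None.

    phi(i) = C(i-1)-N(i)-CA(i)-C(i); psi(i) = N(i)-CA(i)-C(i)-N(i+1).
    """
    angles = []
    for i, residue in enumerate(backbone):
        phi = psi = None
        if i > 0:
            phi = dihedral(backbone[i - 1]["C"], residue["N"],
                           residue["CA"], residue["C"])
        if i < len(backbone) - 1:
            psi = dihedral(residue["N"], residue["CA"],
                           residue["C"], backbone[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

Each (phi, psi) pair is one point on the plot; the percentage of points falling in the preferred regions is the quality figure quoted above.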

Multi-epitope protein and antibody interaction

Explaining ClusPro

Since characterizing antibody-antigen interactions experimentally is cumbersome and time-consuming, computational alternatives such as molecular docking can help us better understand these complexes and guide the design of new biological molecules of interest. Among the different docking platforms, ClusPro is one of the few dedicated to protein-protein interactions. Our first contact with this software was through conversations with the Queen's iGEM team, who were also working with antibody-antigen interactions and showed us its relevance. It is a web-based server that performs the modeling in three steps: rigid-body docking by sampling billions of conformations; root-mean-square deviation (RMSD) based clustering of the 1000 lowest-energy structures, to find the largest clusters, which represent the most likely models of the complex; and refinement of the selected structures by energy minimization [9]. What drew us to this server was its antibody mode, which, unlike the usual docking algorithms, accounts for the asymmetry between epitope and paratope when calculating the total potential energy of the antigen-antibody complex [9]. In addition, it allows masks to be used to hide the regions outside the complementarity-determining regions (CDRs). CDRs are the antibody regions responsible for antigen recognition and specificity, so limiting the docking to those regions is crucial for improving the quality of the simulations.
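
To make the clustering step more concrete, the sketch below shows one simple way to group docked structures by RMSD: repeatedly pick the structure with the most neighbors within a given radius, make it a cluster center, and remove its members. This is a simplified illustration of the idea, not ClusPro's code; the pairwise RMSD matrix is assumed to be precomputed, and the `radius` value and function name are hypothetical.

```python
def greedy_rmsd_clusters(rmsd, radius):
    """Greedy clustering of structures from a pairwise RMSD matrix.

    rmsd   : square list of lists, rmsd[i][j] = RMSD between i and j
    radius : structures within this RMSD of a center join its cluster
    Returns a list of (center, members) tuples, largest clusters first.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        # The next center is the remaining structure with the most neighbors
        # (ties are resolved in favor of the lowest index).
        center = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if rmsd[i][j] <= radius),
        )
        members = sorted(j for j in remaining if rmsd[center][j] <= radius)
        clusters.append((center, members))
        remaining -= set(members)
    return clusters
```

With this scheme the first cluster returned is the most populated one, which matches the way the server ranks its models by cluster size.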

Docking multi-epitope protein and antibody

After searching the Protein Data Bank (PDB), we selected a structure associated with Zika and Dengue cross-reaction to run the docking analysis in ClusPro (Table 2). Since specific anti-dengue antibodies with an elucidated structure were hard to find, we decided to test an antibody already described as cross-reactive with other flaviviruses such as Zika, so that we could use the docking data to evaluate epitopes that might cross-react.

Table 2 - Antibody structure selected for the docking analysis
Antibody	Target	PDB entry code	Notes
anti-Zika Fab antibody Z004	Zika	5VIC	Structure complexed with DENV-1 Envelope protein DIII, so it is a cross-reacting antibody

It is important to establish a few restraints in order to limit the docking conformations found by the server. We talked to the Queen's team (check the Collaboration Page for details), who were also working with antibody-antigen docking and had more experience with the software than we did, and they suggested we define two important things for the simulation:

1- Firstly, to run the docking analysis in antibody mode. ClusPro is a protein-protein interaction server, but it has a dedicated mode that has been optimized for this type of docking. In protein-protein docking, the potential energy of the interaction is usually calculated under a symmetric pairwise assumption. However, studies have indicated that this is not a good approximation for antigen-antibody interactions, since the binding region is asymmetric (for example, residues such as phenylalanine, tryptophan, and tyrosine frequently populate the paratope of the antibody but not the epitope of the antigen) [1]. The antibody mode of ClusPro treats the interaction as asymmetric in those calculations, which has been reported to improve the accuracy of the analysis [1].

2- Secondly, to mask the non-CDR regions, in order to limit the docking results to the more relevant complex structures. We did not know the exact CDRs of the antibody we used, since the PDB structure was not annotated for this feature. However, since the PDB file was a complex structure with an antigen (Domain III of the Envelope protein), we analyzed it in PyMOL to identify the antibody regions that were not interacting with the antigen and used those regions as the mask for our simulation.
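
The masking idea in item 2 can be illustrated with a short distance-based sketch: antibody residues with no atom close to the antigen in the complex structure are candidates for the mask. We did this analysis visually in PyMOL, so the code below is only an approximation of that procedure; the 5 Å cutoff, the input layout, and the function name are assumptions made for the example.

```python
import math


def non_contact_residues(antibody_atoms, antigen_atoms, cutoff=5.0):
    """List antibody residues with no atom within `cutoff` of the antigen.

    Each atom is a tuple (chain, residue_number, (x, y, z)), with
    coordinates in angstroms. The 5.0 default is an assumed cutoff.
    Returns a sorted list of (chain, residue_number) pairs.
    """
    antigen_coords = [xyz for _, _, xyz in antigen_atoms]
    contacting = set()
    for chain, resnum, xyz in antibody_atoms:
        if any(math.dist(xyz, other) <= cutoff for other in antigen_coords):
            contacting.add((chain, resnum))
    every_residue = {(chain, resnum) for chain, resnum, _ in antibody_atoms}
    return sorted(every_residue - contacting)
```

The residues returned by this function correspond to the antibody regions that do not touch the antigen, which is the kind of selection we supplied to the server as a mask.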

Since we only considered running docking simulations after our protein had been synthesized, we used the ClusPro analysis to help us identify potential spots of cross-reactivity in case our protein showed this phenomenon during the experiments. After docking, the software provides the top 30 models of the complex, ranked by cluster size, as explained previously. From these models, we chose the one with the lowest energy score to evaluate the interactions, since a more negative complex energy can be correlated with higher stability. The chosen DME-C-Z004 model had a higher (less negative) energy score (-294.5) than the chosen DME-BR-Z004 model (-301.2). In addition, when we evaluated the interacting residues using the PyMOL polar contacts tool (which predicts hydrogen bonds between residues), we found that the DME-BR-Z004 complex had fewer interacting residues (8) than the DME-C-Z004 complex (11).

Thus, in DME-BR, the Envelope epitope ETLVTFKNPHAKKQDVVVLGS and the NS3 epitope ILEENVEVEIWTKEGERKKL might be associated with cross-reactivity. In DME-C, the candidates are the Envelope epitope ETLVTFKNPHAKKQDVVVLGS, the NS1 epitopes PENLEYTIVITPHSGEEH and EHKYSWKS, and the NS3 epitope ILEENMEVEIWTREGEKKKL.

Considering that our experiments showed cross-reactivity of these proteins with anti-NS1 from Zika (check the Results Page), the NS1 epitopes could be replaced in a new iteration of the DBTL cycle described on the Engineering Success page, in order to build an antigen that is more specific for recognition by anti-dengue antibodies.

References

1. Brenke R, Hall DR, Chuang G-Y, Comeau SR, Bohnuud T, Beglov D, Schueler-Furman O, Vajda S, Kozakov D. Application of asymmetric statistical potentials to antibody-protein docking. Bioinformatics. 2012;28(20):2608–2614.
2. Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A. Protein identification and analysis tools on the ExPASy server. In: Walker JM, editor. The Proteomics Protocols Handbook. Humana Press; 2005.
3. SIB Swiss Institute of Bioinformatics. Compute pI/Mw [Internet]. Protein Identification and Analysis Tools in the ExPASy Server. [cited 2021 Oct 18]. Available from: https://web.expasy.org/compute_pi/pi_tool-doc.html
4. Kyte J, Doolittle R. A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology. 1982;157:105–132.
5. Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez J-C, Frutiger S, Hochstrasser DF. The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis. 1993;14:1023–1031.
6. Greener JG, Kandathil SM, Jones DT. Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints. Nature Communications. 2019;10(1).
7. Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. Journal of Molecular Biology. 1963;7(1):95–9.
8. Elsliger M-A, Wilson IA. 1.8 Structure validation and analysis. Comprehensive Biophysics. 2012:116–35.
9. Kozakov D, Hall DR, Xia B, Porter KA, Padhorny D, Yueh C, et al. The ClusPro web server for protein–protein docking. Nature Protocols. 2017;12(2):255–78.

Team:Rio UFRJ Brazil/Model