Team:IISER Mohali/Model




Our modelling focus is highlighted as shown. So far, the analysis has been done only for MMP9, which we used to demonstrate proof of concept; the same analysis will be repeated for each biomarker in our panel.

We performed two crucial stages of modeling. This has helped us understand the following points:

  • Wet Lab (Experimental):
    1. Length of the substrate sequence corresponding to each biomarker
    2. Position at which substrate should be inserted
    3. MD simulation to see dynamics of cleavage

  • Hardware:
    1. Optical setup and arrangement

Wet Lab (Experimental):

Before we began the modeling for the wet lab work, we needed a crucial piece of information - the substrate sequence.

  • Determining substrate sequence-

    We used MEROPS to find the consensus substrate corresponding to our biomarker MMP9 (MEROPS ID: M10.004).
    The specificity matrix for MMP9 was -
    Protease recognition sites are conventionally written as P4-P3-P2-P1-P1’-P2’-P3’-P4’, where cleavage occurs between the amino acids at positions P1 and P1’. The numbers in each row indicate how often a particular amino acid occurs at each position along the octapeptide, as found in the research articles surveyed by MEROPS.
    From the table, it is clear that some positions have a consensus amino acid, while a few do not. Thus, the motif that we narrowed down to was GP_GL_G_, where each underscore marks a position without a clear consensus.
    The reason we chose a motif instead of a known octapeptide was that we wanted to incorporate structural data into our eventual choice.
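The position notation can be made concrete with a short sketch that labels each slot of the motif and reports which positions are still open. The function name `label_motif` and the use of `_` as the wildcard character are illustrative choices, not part of MEROPS.

```python
# Position labels for an octapeptide recognition site, N- to C-terminal.
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def label_motif(motif: str) -> dict:
    """Map each residue (or wildcard) of an 8-position motif to its label."""
    if len(motif) != len(POSITIONS):
        raise ValueError("motif must have exactly 8 positions")
    return dict(zip(POSITIONS, motif))


labels = label_motif("GP_GL_G_")

# Positions without a consensus residue are marked with "_".
open_positions = [pos for pos, aa in labels.items() if aa == "_"]

# Cleavage occurs between the residues at P1 and P1'.
scissile = (labels["P1"], labels["P1'"])

print(labels)
print("undetermined positions:", open_positions)
print("cleavage between:", scissile)
```

Read this way, the motif fixes five of the eight positions and leaves P2, P2’ and P4’ to be decided by docking.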

    Why MEROPS?
    MEROPS is updated monthly and contains comprehensive data about all protease families. Protease recognition sites are not rigidly defined, since promiscuity in sequence recognition is a common feature of proteases. Thus, empirical data is the best indicator of probable substrate recognition sites.

    DOCKING - Part I
    Our first round of docking was to fill the gaps in the substrate sequence motif we got via MEROPS.

    On consulting other teams and professors (Dr. Monika Sharma and Dr. Garima Jindal) well versed in this matter, our computational team decided to use CABS-dock to perform protein-peptide docking. Further, we decided not to dock the motif with positional wildcards, but to use explicit octapeptide sequences.
    We decided to dock MMP9 (PDB: 1L6J) with this potential list of candidate substrates.
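Turning a motif with wildcards into explicit octapeptides is a simple enumeration. The sketch below is illustrative only: the reduced alphabet `"GA"` is a hypothetical choice made to keep the output short, and it is not the candidate list we actually docked.

```python
from itertools import product


def expand_motif(motif: str, alphabet: str, wildcard: str = "_") -> list:
    """Replace every wildcard in the motif with each letter of the alphabet."""
    slots = [alphabet if ch == wildcard else ch for ch in motif]
    return ["".join(combo) for combo in product(*slots)]


# Hypothetical reduced alphabet (glycine and alanine only), for illustration.
candidates = expand_motif("GP_GL_G_", "GA")
print(len(candidates), candidates)

# With all 20 standard amino acids, the three open positions give 20**3 peptides.
all_twenty = "ACDEFGHIKLMNPQRSTVWY"
print(len(expand_motif("GP_GL_G_", all_twenty)))
```

The full enumeration is far too large to dock exhaustively, which is why a shortlist of explicit sequences is needed.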

    The results were interpreted using three factors -
    • Cluster Density
    • Average RMSD
    • Trajectory data

    Each substrate sequence that we ran through CABS returned 10 clusters. To compare across sequences, we took the top cluster (ranked by cluster density) as representative of that sequence.

    Cluster Density:
    Cluster density tells us how tightly the models (binding poses) within a single cluster are grouped. The higher the value, the more similar the sampled poses are within that cluster.

    Average RMSD:
    As a rule of thumb, we only considered clusters with an average RMSD of less than 3 Å. Exactly one sequence met this threshold.

    Trajectory Data:
    CABS also outputs ‘trajectory data’, indicating the probable path of binding between the protease and the substrate. Taking the above two parameters into account as well, we chose the sequence with the fewest probable trajectories. This gave us additional insight into how the protease would approach the substrate.
    Using the above approach, we concluded that the most suitable substrate sequence for MMP9 is GPGGLGGA.

    The corresponding graphs for trajectory data and probable binding mode for each sequence can be found here.
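The first two selection criteria (top cluster by density, then the average-RMSD cutoff) can be sketched as a small filter. All numbers and sequence names below are placeholders chosen for illustration, not our CABS output, and the `Cluster` class and `shortlist` function are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    density: float   # cluster density reported for the cluster
    avg_rmsd: float  # average RMSD within the cluster, in angstroms


def top_cluster(clusters):
    """Pick the representative cluster of a sequence: the densest one."""
    return max(clusters, key=lambda c: c.density)


def shortlist(results, rmsd_cutoff=3.0):
    """Keep sequences whose top cluster has an average RMSD below the cutoff."""
    picked = {}
    for sequence, clusters in results.items():
        best = top_cluster(clusters)
        if best.avg_rmsd < rmsd_cutoff:
            picked[sequence] = best
    return picked


# Placeholder values only; these are NOT measured results.
results = {
    "SEQ_A": [Cluster(40.0, 2.5), Cluster(12.0, 6.1)],
    "SEQ_B": [Cluster(25.0, 4.2), Cluster(20.0, 2.9)],
}
print(shortlist(results))
```

Note that `SEQ_B` is rejected even though one of its clusters passes the cutoff, because only the top cluster by density represents a sequence.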

  • Determining position at which substrate is to be inserted

    3D Structure Prediction
    Now that we had confirmed our substrate sequence, it was time to work on building our main construct.
    Image: PyMol visualization of Encapsulin (PDB: 3DKT)

    Image: Encapsulin with each of its monomeric chains coloured differently

    As a first step, we investigated the Encapsulin PDB structure (PDB ID: 3DKT). This helped us identify the domains that would be most amenable to substrate insertion. From a survey of the literature, we found that the E-loop is the most surface-accessible and flexible region. It is also the least conserved across bacterial species. Positions 138-139 on the A-loop were also exposed on the surface of Encapsulin.
    Image: The E-Loop of Encapsulin

    The E-loop spans residues 47-76 in the chain.
    We therefore chose positions 64 and 71 on the E-loop, as well as position 138 on the A-loop, as candidate insertion sites.

    Next, to keep the structure flexible, we flanked our substrate sequence with poly-G linkers. Lastly, we placed a 6x-His tag between the substrate and the C-terminal linker so that the engineered construct could be purified.
    Our resulting construct was the following: GGGGGGGPGGLGGAHHHHHHGGGGG
    Then we generated 3D structures of the fusion protein constructs using I-TASSER.
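The insert can be assembled piece by piece as a sanity check. In this sketch the linker lengths (six and five glycines) are read off the construct string above and are an assumption about how it was built; `insert_after` and the toy host sequence are hypothetical, and whether the insert goes after or in place of the chosen residue is also an assumption.

```python
N_LINKER = "G" * 6        # assumed N-terminal poly-G linker
SUBSTRATE = "GPGGLGGA"    # MMP9 substrate chosen by docking
HIS_TAG = "H" * 6         # 6x-His tag
C_LINKER = "G" * 5        # assumed C-terminal poly-G linker

insert = N_LINKER + SUBSTRATE + HIS_TAG + C_LINKER
print(insert, len(insert))


def insert_after(host: str, position: int, cassette: str) -> str:
    """Insert the cassette after a 1-based residue position of the host."""
    if not 1 <= position <= len(host):
        raise ValueError("position is outside the host sequence")
    return host[:position] + cassette + host[position:]


# Toy host sequence for illustration; NOT the Encapsulin sequence.
toy_host = "MKTAYIAKQR"
engineered = insert_after(toy_host, 4, insert)
print(engineered)
```

For the real construct, the host would be the Encapsulin monomer sequence and the position one of the chosen sites (for example 138).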

    Insertion site: K138

    Image: Engineered construct as predicted by I-TASSER (Inserted substrate sequence shown in red)

    Next, we docked the engineered protein with the catalytic domain of MMP9.

    DOCKING - Part II
    The first round of docking established the substrate sequence, but it was not fully accurate, as the isolated octapeptide was devoid of the additional interactions present in the full construct. Now that we had generated the engineered Encapsulin, it was time to see whether it interacted with the biomarker, MMP9.

    Image: Docked Structure from ClusPro (Orange - engineered Encapsulin, Cyan - MMP9 Catalytic Domain)

    This docking was performed using ClusPro.

    Finally, we used PyMol to explore the interactions between the docked molecules.
    The protocol to find the interactions in a docked pair of molecules can be found here.
    We were able to measure the distances between Encapsulin and MMP9 at the interface (about 2.7 Å), which indicates that the two molecules do indeed interact.
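The idea behind the distance measurement can be sketched without PyMol: take the atom coordinates of each molecule and look for the closest pairs. The coordinates below are placeholders arranged so that the nearest pair is 2.7 Å apart, and the 4.0 Å contact cutoff is a hypothetical choice; neither comes from our docked structure.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def contacts(atoms_a, atoms_b, cutoff=4.0):
    """Index pairs (i, j) of atoms that lie within the cutoff of each other."""
    return [
        (i, j)
        for i, a in enumerate(atoms_a)
        for j, b in enumerate(atoms_b)
        if math.dist(a, b) <= cutoff
    ]


# Placeholder coordinates in angstroms; NOT taken from the docked model.
enc_atoms = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
mmp9_atoms = [(0.0, 2.7, 0.0), (10.0, 10.0, 10.0)]

print(min_distance(enc_atoms, mmp9_atoms))
print(contacts(enc_atoms, mmp9_atoms))
```

In practice the coordinates would come from the docked PDB file, one list per chain.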

This completes the first part of our modelling.

We now proceed to the molecular dynamics (MD) simulation to obtain dynamic information about the interaction.

  • MD Simulation to understand dynamics of interaction

    Molecular dynamics simulations are used to study the strength and properties of protein complexes and their conformational changes at the atomic level. Parameters such as RMSD, RMSF, radius of gyration, and SASA were calculated over the simulation trajectory to give insight into the structure of the proteins. To examine the dynamics and conformational stability of the Enc-MMP9 complex, we ran an MD simulation for 4 ns. We were unable to run longer simulations due to a lack of computational power.

    Root Mean Square Deviation (RMSD)
    The Root Mean Square Deviation (RMSD) analysis is an important step towards measuring the stability of the Enc-MMP9 complex. A stable RMSD indicates that binding does not cause any significant change in the structure of the protein. We observed jumps in RMSD at around 2 ns and 3.7 ns, indicating that the protein had not yet fully stabilized.
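As a reminder of what the RMSD curve measures, here is a minimal sketch of the calculation for one frame against a reference structure. It assumes the frame has already been superimposed on the reference (no fitting step is included), and the two-atom coordinates are toy values.

```python
import math


def rmsd(frame, reference):
    """Root mean square deviation between two equally sized coordinate sets."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must have the same atom count")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(frame, reference))
    return math.sqrt(squared / len(frame))


# Toy coordinates in angstroms: every atom is shifted by 1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(rmsd(frame, reference))
```

Evaluating this for every frame of the trajectory gives the RMSD-versus-time curve; a plateau suggests the structure has settled.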
    Radius of Gyration (Rg)
    The radius of gyration is a key parameter of the Enc-MMP9 complex, used to study the folding properties and conformations of the bound complex. A comparatively high radius of gyration indicates that a protein molecule is loosely packed, while a lower value indicates a more compact structure. A more compact protein here indicates that MMP9 has not significantly interfered with the folding of the protein.
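The link between Rg and compactness can be seen in a small sketch. It uses the mass-weighted definition with equal masses by default, and the two four-atom arrangements are toy coordinates, not simulation output.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted RMS distance of the atoms from their centre of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # equal masses if none are given
    total = sum(masses)
    centre = tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )
    weighted = sum(m * math.dist(c, centre) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted / total)


# Toy arrangements in angstroms: same shape, different spread.
compact = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
extended = [(3.0, 0.0, 0.0), (-3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, -3.0, 0.0)]

print(radius_of_gyration(compact), radius_of_gyration(extended))
```

The more spread-out arrangement gives the larger Rg, which is why a rising Rg over a trajectory is read as loosening of the structure.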
    Root Mean Square Fluctuations (RMSF)
    Root Mean Square Fluctuation (RMSF) is a vital structural parameter used to quantify the flexibility and rigidity of protein-protein complexes. Since the RMSF measures how far each residue deviates from its average position over the trajectory, it is also highly useful for exploring the conformational flexibility of the complex. The plot below shows that the middle part of the chain is the most flexible.
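A per-atom RMSF can be sketched as follows. The fluctuation is taken about the time-averaged position, which is the common convention and an assumption here; the two-frame trajectory is a toy example in which only the second atom moves.

```python
import math


def rmsf(trajectory):
    """Per-atom RMS fluctuation about the mean position over all frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        mean = tuple(
            sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)
        )
        msf = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        values.append(math.sqrt(msf))
    return values


# Toy trajectory in angstroms: atom 0 is fixed, atom 1 moves along x.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]

print(rmsf(trajectory))
```

Plotting these values against residue number gives the flexibility profile of the chain.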
    Solvent Accessible Surface Area (SASA)
    To better understand the hydrophobic and hydrophilic behavior of the protein-protein complex in solvent, solvent accessible surface area (SASA) analysis was performed. The results indicated that the complex remains well solvated after binding.


  • Ray-tracing with COMSOL
    Tracing the path that a ray of light follows helped us design the optical setup. The simulation was performed with COMSOL, a finite-element analysis software package.

    The entire protocol with results can be found here.


