Difference between revisions of "Team:CPU CHINA/MolecularModeling"

Revision as of 19:23, 21 October 2021

MOLECULAR MODELING

MOLECULAR DYNAMICS

To obtain a sound analysis of our key function enzyme, manganese peroxidase (MnP, PDB code: 3M5Q), which has a high-precision 3D structure and the potential to degrade PE, we must obtain the details of the dynamics. To achieve this goal, molecular dynamic (MD) simulations were utilized to analyze the stability, changes of the protein secondary structure, and salt bridge number of MnP. Therefore, we went series of molecular dynamics simulations for a better understanding of the protein characteristics.

A. RMSD & RMSF

1. RMSD

Root-mean-square deviation (RMSD) is the measure of the average distance between the atoms (usually the backbone atoms) of superimposed proteins. The RMSD of certain atoms in a molecule concerning a reference structure, $r_{ref}$ . RMSD can be calculated through this formula below:
$\operatorname{RMSD}(t)=\left[\frac{1}{M} \sum_{i=1}^{N} m_{i}\left|\mathbf{r}_{i}(t)-\mathbf{r}_{i}^{\mathrm{ref}}\right|^{2}\right]^{1 / 2}$
where $M=\Sigma_{i} m_{i}$ and $r_i (t)$ is the position of atom $i$ at time $t$ after least square fitting the structure to the reference structure.

In our molecular dynamics simulation, the RMSD value was used for revealing the structure stability of wtMnP protein, and such calculation was undergone in Gromacs^[1]. Through the analysis of the track file of MD simulation, we acquired the RMSD curve (Fig. 1) and it revealed the change of protein backbone stability in four temperature gradients of 298.15 K, 313.15 K, 338.15 K, and 400 K. The meaning of a higher RMSD value is that the trajectory has a higher degree of deviation compared to the reference molecule with the protein molecule in the original state. This means that the protein may undergo a significant conformational change at this temperature.

Fig. 1 RMSD curve. Such curve is about the average RMSD of every residue changing with simulation time.

2. RMSF & B-factor

Besides RMSD, root mean square fluctuation (RMSF) was also utilized for molecular dynamics. RMSF is a measure of the deviation between the position of particle i and some reference position. RMSF value can be calculated as:
$\mathrm{RMSF}_{i}=\left[\frac{1}{T} \sum_{t_{j}=1}^{T}\left|\mathbf{r}_{i}\left(t_{j}\right)-\mathbf{r}_{i}^{\mathrm{ref}}\right|^{2}\right]^{1 / 2}$
where $T$ is the time over which one wants to average and $r_i^{ref}$ is the reference position of particle $i$ . This reference position will be the time-averaged position of the same particle $i$ .

Based on the RMSF curve acquired from classic MD simulations, we could measure the sensitivity of different residues towards temperatures.

Through the analysis of the track file of MD simulation, we obtained the RMSF plot (Fig. 2) under four reaction temperatures, and it revealed that mainly No. 228-233 residues were almost sensitive to temperature, thus facilitating the later construction of a single mutation library. Meanwhile, we compared the RMSF^[3] library curve with the B-factor plot (Fig. 3) from the R package Bio3d, and the filtrated range of residues was almost the same.This indicates that when protein crystals are heated by X-ray irradiation, our model is consistent with the actual thermal movement level of amino acid residues.

Fig. 2 RMSF curve. This curve interfaces with every residue at different temperatures.

Fig. 3 MnP B Factor plot. This curve reflects the thermal stability of each amino acid of the protein, wtMnP.

B. Secondary structure, salt bridge alterations

Ramachandran plot is a way to visualize energetically allowed regions for backbone dihedral angles ψ against φ of amino acid residues in protein structure, and thus reflects the secondary structure of the protein. The white area shown in the dynamic Ramachandran plot we obtained shrank with the increase of temperature (Fig. 4), which represented the disruption^[3]of MnP’s structure.

Fig. 4 Ramachandran plot. In our simulation, the shrinking area (white part) represents the disruption of the secondary structure.

Besides, secondary structure numbers decreased, β-sheet part was severely damaged (Fig. 5), and the number of salt bridges increased with the rise of temperature. The above three indices reflected the disruption of protein structural stability caused by the increase of temperature.

Fig. 5 MnP secondary structure changes with temperature.

C. Gyration Analysis

Gyrates show the variation of the radius of gyration of protein ( $R_g$ ) over time, the radius of gyration is a physical quantity that describes the compactness of the protein structure. The smaller the value, the greater the compactness of MnP, and the greater the expansion of the system. We could be found from the figure that as the temperature increased, the curve describing the radius of gyration (unit: Å) shifted upward and the amplitude increased, suggesting that at high temperatures, the protein structure will become looser and it is difficult to have a stable conformation.

Fig. 6 Radius of gyration ( $R_g$ ) of MnP protein.

D. Hydrogen Bond Analysis

Protein hydrogen bonding is a very important non-covalent structural force, and an important factor affecting protein stability. With the change of temperature, the structure of MnP and the number and existence of hydrogen bonds will change as well.

As can be seen from the picture below, the high temperature above 120 ℃ caused the structure of the protein to change, the number of hydrogen bonds decreased, and the stability of MnP was reduced.

Fig. 7 Hydrogen Bonds of MnP in the dynamic simulation

E. Solvent Accessibility

Solvent accessibility surface area (SASA) usually reflects the relevant sites where the enzyme can bind. Therefore as we simulated the temperature around the protein (Fig. 8), we found that the value and trend of SASA were more obvious as the temperature rised. If the value of SASA is larger, it means that the solvent accessible surface area is larger, which is more conducive to the related docking of the ligande. While the average value of SASA reached the highest value around 45℃-65℃ (seen in Parts http://parts.igem.org/Part:BBa_K3853014). It was possible that the enzyme activity would decrease in this temperature range, but the number of binding sites would increase. However as the subsequent temperature continued to rise, the area of SASA would decrease instead. So the enzyme would lose stability and damage the internal structure at higher temperatures.

Fig. 8 SASA trend of the wtMnP in the dynamic simulation.

SINGLE POINT MUTATION

A. Mutant Library Construct

We wrote PLMC module to explore the ectopic dominance of mutated MnP, and to plot the distance matrix of amino acid residues between proteins (Fig. 9). Based on the MnP sequence alignment with BLAST, we did 16 cores calculation in parallel, and the results were essentially proportional to the single point free energy produced by FoldX^[4].

Fig. 9 MnP PLMC module analysis and the distance matrix of amino acid residues.

B. Optimization & ML Analysis

1. DL & ML analysis

With Turi Create applied, we trained the data and got the predicted^[5] “scaled_effect” of MnP (Fig. 10). After aligning the single mutant sequences and adjusted parameters, we utilized support vector machines (SVM), K-Nearest Neighbour (KNN), and Random Forest^[6] to forecast “scaled_effect”, and the Spearman coefficients were accordingly 0.86, 0.88, and 0.92, which proved the result of “scaled_effect” after Turi Create was fine.

Fig. 10 Scaled_effect. The scatter points in the figure are composed of the first principal component of PCA and the second principal component of PCA. Variant effect’s color reflects the confidence of scaled_effect.

2. Rosetta

Because the data obtained by FoldX lacked the accuracy to a certain extent, we used Rosetta^[7,8] to correct the mutation free energy of mutation points (Table.1), so that the Spearman correlation coefficient of the data increased to 0.752, which was more accurate. Meanwhile, 16 cores calculation was used to establish a more comprehensive multi-function mutation library^[9] of enzymes in parallel.

Table.1 Calculated amino acid residue sets at available mutation sites under different ΔΔG values.

Residue No.	35	39	177
ΔΔG <0.5	E	E	A, H, L, M, R, T, Y
0.5<ΔΔG <2.0	E	E	A, H, L, M, R, T, Y, K, S
2.0<ΔΔG <2.5	E, D	E	A, H, L, M, R, T, Y, Q
2.5<ΔΔG <3.0	E	E, D	A, H, L, M, R, T, Y

MNP PCA ANALYSIS

A. Instruction

PCA method, as known as principle component analysis, was once again employed here for MnP after core identification and subsequent superposition, and it was used to examine the relationship between different structures based on their equivalent residues. We applied PCA to both experimental structures and molecular dynamics trajectories, and PCA provided us with considerable insight into the nature of conformational differences, which could be seen in our molecular dynamics trajectory analysis.

B. Method

In PCA, the resulting principal components (presented in the way of orthogonal eigenvectors) describe the axes of maximal variance of the distribution of structures. The projection of the distribution of the space defined by the largest principal components in the model onto results in a lower dimensional representation of the structural dataset. Meanwhile, the percentage of the total mean square displacement (or variance) of atom positional fluctuations captured in each dimension is characterized by their corresponding eigenvalue.

Previous experiences suggest that 3~5 dimensions are often sufficient to capture >70% of the total variance in a given family of structures. Thus, a handful of principal components are sufficient to provide a useful description while still retaining most of the variance in the original distribution.

C. Modeling & Results

In our model, we mainly selected MnP protein (PDB: 3M5Q) as the main analysis object and carried out the following series of analyses.

1. Sequence comparison

First, we obtained the structure file of wtMnP in R and performed our sequence comparison and BLAST search. We examined the alignment scores and their corresponding $E$ -values, which indicated a sensible normalized score ( $-\log{E}$ ) cutoff of 240 bits.

Fig. 11 Summary of BLAST results for query wtMnP against the PDB chain database.

The similar structures obtained by our search were mainly PDB: 1MN1, 1MNP, 2BOQ, 2VKA, 2W23, 3FJW, 3FKG, etc (ranked by BLAST scores). A detailed comparison of homologous protein structures could be acquired to infer pathways for evolutionary adaptation. Conventional structural superposition of proteins minimized the RMSD between their full set of equivalent residues. Fig. 12 below is the identification of core residues, and in Fig. 13 we presented the histogram of wtMnP RMSD values.

Fig. 12 Histogram for wtMnP core residues identification.

Fig. 13 Histogram of RMSD among collected structures.

Next, the conformation of a polypeptide or protein chain could be usefully described in terms of angles of internal rotation around its constituent bonds, as known as static Ramachandran plot (Fig. 14).

Fig. 14 Static Ramachandran plot of wtMnP protein.

The conformation of a polypeptide or nucleotide chain can be usefully described in terms of angles of internal rotation around its constituent bonds. Therefore we also calculated the difference distance matrix (Fig. 15) by subtracting one distance matrix from another, and the main target of this matrix was wtMnP.

Fig. 15 Difference of distance matrices between structures in 1MN1 and 1MNP nucleotide states utilizing wtMnP as the main target.

Then in the next step, we simulated different conformers of wtMnP and highlighted the positions which were responsible for the major differences between MnP structure and enable the interpretation and characterization of the relationships between multiple conformers.

All the work above gave us the foundation for our subsequent analysis. A long dynamic simulation was required in the traditional simulation method, and therefore a static conformational comparison was used here above. Then we carried out related research on the structure of MnP. The clustering structure in the PC space can usually cast people's attention to the relationship between the individual structures in terms of the displacement of the main structure and has a controllable dynamic level of details. By clustering along with PC 1 and 2 (Fig. 16B), we could study how the X-ray structure of MnP was related to major conformational changes covering more than 65% of structural variations. This was able to reveal functional relationships that were often difficult to find through traditional pairwise methods. For example, in the PC1-PC2 plane, the inactive structures were further divided into two subgroups. The lower right subgroup specifically corresponded to structures that had been compounded. We also examined the contribution of each residue to the first three principal components (Fig. 16C).

Fig. 16 Main PCA results of enzyme MnP. A. Overview of PCA results for MnP crystallographic structures. B. Results of setting pc.axes=1:2 in plot.pca() function. C. Contribution of each residue to the first three principal components

CONSTANT PH SIMULATION

A. $pK_a$ & Protonation

In order to perform constant pH simulation, we mainly selected four categories of amino acids for titration analysis in simulation. Intending to further explore the structural stability of MnP protein under different pH environments, we categorzied the pH environment used in the experiment into five different gradients of 3, 4, 5, 6, and 7 for investigation. Because we were only titrating the acidic residues, so the only re-namings we made in wtMnP protein were the ASP->AS4, GLU->GL4, and HIS->HIP substitutions in leaprc.conspH force field in Amber. Furthermore, wtMnP contains 4 disulfide bonds. Therefore, we must change all CYS residues into CYX so that tleap or xleap chooses the correct residue template for Cysteines linked via disulfide bonds.

After the simulation, we will get the result below in Fig. 17. This result revealed the $pK_a$ protonation states of the amino acids we chose in different pH scenarios.

Fig. 17 $pK_a$ & protonation states of each amino acid in our pH gradient. The amino acids we chose were Asp 5, Glu 26, Glu 32, Glu 39, Asp 84, Asp 85, Asp 138, Glu 143, Asp 146, & Glu 156.

B. RMSPH

RMSPH is the RMSD in different pH environments obtained after we used the leaprc.constpH force field in Amber for simulation. According to the obtained RMSPH diagram (Fig.18), the structural stability of the protein was highly similar under different pH environments, so we planned to increase the number of simulation steps in the subsequent process to further establish the RMSD values under each pH gradient. Our RMSPH plot revealed that different pH values (pH = 3~7) had relatively low impact on the whole protein structure. (Of course, due to the time scale of the simulation being small, there was still a certain degree of inaccuracy.）

Fig. 18 RMSPH curve about MnP under different pH gradients.

QUANTUM CHEMISTRY

A. Chelate Optimization & Transition State Conjecture

We employed quantum chemistry calculation in our subsequent model. Since the hexa-coordinating ligand of trivalent manganese ion played the main degrading role in the system, we predicted its structure (Fig. 19A) here, and we optimized its structure (Fig. 19B) through energy optimization curve (Fig. 20). The ligands we applied in the chelate were based on our conjecture: three oxalates ( $3 \times {C_2 O_4}^{-2}$ ).

Fig. 19 Structure of our chelate, initial (left) & optimized (right).

Fig. 20 Energy optimization curve of the predicted chelate.

Based on it, use the following parameters of basis set functional: opt freq b2plypd3 empiricaldispersion=gd3bj jun-cc-pvtz for structural optimization and frequency calculation which were covered in the final calculation. Later, we considered referring to the first principles and transition state search methods since the product wasn't further detected due to time constraints. Hence we would use the QST2 method in the future. We should consider guessing & calculating the transition state reflecting previous experiences to better our initial structure for achieving the purpose of product estimation.

B. Rosetta for Stability Improvement

Rosetta SuperCharge supports two strategies to design high static electricity and protein: Rosetta supercharge (Rsc) and AvNAPSA supercharge (Asc). Here we mainly employed Rsc algorithm, which didn't completely ignore the evaluation of electrostatic, hydrogen bonding, and hydrophobic interactions. However, this algorithm also has a disadvantage: more amino acids need to be replaced to achieve the target net charge effect in this method. Finally, we analyzed the following specific content based on the output data, the final net charge number of the protein, which amino acids had been changed in design, the total number of mutations, and the free energy distribution of point mutations for each amino acid

Next, we also used PyMOL to visualize the output protein structures with different numbers of electrostatic charges. Our results are shown in Fig. 21 (Cg = charge).

Fig. 21 Electrostatic charge plots of three output protein structures. Cg:-25, Cg:normal, and Cg:21 respectively.

From this, we could know that with the increase in the number of net charges on the surface of wtMnP, the structural stability of its protein tended to a better conformation. In the above process, it was necessary to design a combination of mutations to affect our protein. Besides, the net charge stability was modified. In the process of simulation, we also restricted the mutation of the key amino acid sites of the active center to prevent the enzyme activity from being affected.

C. Hydrophobic Core Stability Optimization

We applied Rosetta Holes to search for the hydrophobic voids in the interior of wtMnP protein and determined the compositions of amino acids around the void. Next, we applied single point mutations of these amino acids into hydrophobic amino acids. Some potential point mutations in this round were further subjected to Relax-optimized structure in Rosetta software, so the $\Delta \Delta$ G value of single point mutation was calculated and the best one was selected from these point mutations. Finally, the next round of iteration was carried out on the basis of this mutation. The above iterations continued until no gap was found or there was no better point mutation site. Our main purpose was to design the hydrophobic core (Fig. 22) in the protein so that it could achieve a stable structure to a certain extent and reduce the hydrophobic voids in it.

Fig. 22 Hydrophobic surface about wtMnP, PDB: 3M5Q.

RosettaVIP performed a total of multiple rounds of iterations on the MnP structure we input (wtMnP): each round listed the candidate mutation types and the ddEgoe value, and we could also use the data to roughly judge the impact of mutations on stability (Table.2).

Table.2 The ddEgoe calculation of every point mutation after first round literation.

After the final round of Relax, the program analyzed that 1, 19, 75, 91, 92, 141, 201, 226, 237, and 299 mutations had improved structural stability. The stability of the 18 $^{th}$ PRO mutation into LEU was more obvious, and this mutation was selected. In the second round of iterative screening, based on the mutations used in the first round, the point mutations that could reduce the hydrophobic gap were re-analyzed. Based on the new structure, the location of the mutations and the energy contribution had all changed. In this way, we established the combination of mutations that improve the hydrophobic core of the protein.

Comparison of the final results: after VIP design, the internal hydrophobic voids are significantly reduced (Fig. 23)

Fig. 23 Changes in the hydrophobic space of the MnP before & after point mutation. A. wtMnP, B. MnP after mutation.

Reference

[1] Berendsen, H. J. C., Vanderspoel, D. & Vandrunen, R. Gromacs - A Message-Passing Parallel Molecular-Dynamics Implementation. Computer Physics Communications 91, 43-56.

[2] Sutherland, G. R. & Aust, S. D. The effects of calcium on the thermal stability and activity of manganese peroxidase. Arch Biochem Biophys 332, 128-134 (1996).

[3] Humphrey, W., Dalke, A. & Schulten, K. VMD: visual molecular dynamics. J Mol Graph 14 (1996).

[4] Delgado, J., Radusky, L. G., Cianferoni, D. & Serrano, L. FoldX 5.0: working with RNA, small molecules and a new graphical interface. Bioinformatics 35, 4168-4169.

[5] Goldman, M. & Pruitt, L. Comparison of the effects of gamma radiation and low temperature hydrogen peroxide gas plasma sterilization on the molecular structure, fatigue resistance, and wear behavior of UHMWPE. J Biomed Mater Res 40, 378-384 (1998).

[6] Negi, S. S., Goldblum, R. M., Braun, W. & Midoro-Horiuti, T. Design of peptides with high affinity binding to a monoclonal antibody as a basis for immunotherapy. Peptides 145, 170628 (2021).

[7] Fleishman, S. J. et al. RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite. PLoS One 6, e20161 (2011).

[8] Rohl, C. A., Strauss, C. E. M., Misura, K. M. S. & Baker, D. Protein structure prediction using Rosetta. Methods Enzymol 383, 66-93 (2004).

[9] Ding, Y., Cui, K., Guo, Z., Cui, M. & Chen, Y. Manganese peroxidase mediated oxidation of sulfamethoxazole: Integrating the computational analysis to reveal the reaction kinetics, mechanistic insights, and oxidation pathway. J Hazard Mater 415, 125719 (2021).

@@ Line 958: / Line 958: @@
                          area (white part) represents the
                          disruption of the secondary structure.</em></p>
-                 <p>Besides, secondary structure numbers decreased, <strong><span class="katex"><span
+                 <p>Besides, secondary structure numbers decreased, <strong>β-sheet
-                                class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML">
-                                    <semantics>
-                                        <mrow>
-                                            <mi>&#x3B2;</mi>
-                                        </mrow>
-                                        <annotation encoding="application/x-tex">\beta</annotation>
-                                    </semantics>
-                                </math></span><span class="katex-html" aria-hidden="true"><span class="base"><span
-                                        class="strut"
-                                        style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span
-                                        class="mord mathnormal"
-                                        style="margin-right:0.05278em;">&#x3B2;</span></span></span></span>-sheet
                          part</strong> was severely
                      damaged (<strong>Fig. 5</strong>), and the number of salt bridges increased with the rise of
@@ Line 1,060: / Line 1,048: @@
                      it means that
                      the solvent accessible surface area is larger, which is more conducive to the related docking of the
-                     ligande. While the average value of SASA reached the highest value around 405℃-650℃ (seen in <em><strong>Parts</strong></em> http://parts.igem.org/Part:BBa_K3853014). It was possible that the enzyme activity would decrease  in this temperature range, but the number of binding sites would increase. However as the subsequent temperature continued to rise, the area of SASA would decrease instead. So the enzyme would <strong>lose
+                     ligande. While the average value of SASA reached the highest value around 45℃-65℃ (seen in <em><strong>Parts</strong></em> http://parts.igem.org/Part:BBa_K3853014). It was possible that the enzyme activity would decrease  in this temperature range, but the number of binding sites would increase. However as the subsequent temperature continued to rise, the area of SASA would decrease instead. So the enzyme would <strong>lose
                          stability and damage
                          the internal structure at higher temperatures.</strong></p>