Team:SCAU-China/Model

MESEG

Model

Molecular Dynamics Simulation


Description

To maximize the ability of Chlamydomonas absorbing heavy metal ions, We tried creating efficient molecular magnets, a fusion protein, to catch and gather heavy metal ions from the environment through metallothionein binding and the Autophagy pathway of the cell. The fusion protein consists of heavy metal ion binding protein, autophagy related peptide, fluorescent protein and linker. We focused on the construction of fusion proteins, simulating their molecular dynamics, and speculating their mechanism of binding cadmium ions.



First Try

Our initial design of the fusion protein was in the order of mCherry-CC-SpMTL-ARR. SpMTL is a metallothionein-like protein from Sedum plumbizincicola which has been proved that it has strong ability binding cadmium ions. CC and ARR are vital regions in Autophagy related protein. mCherry is an enhanced fluorescent protein which could help observing the localization of the fusion protein in the cell.

However, from the results of WetLab, SpMTL in the fusion protein lost its function, which implicated that the structure of the fusion protein may having defects. As a metallothionein-like protein, SpMTL has classic features of common metallothionein, which is Cysteine abundant. The sulfhydryl of Cysteine gives SpMTL chelating cadmium ions. However, two sulfhydryl from different cysteines could be oxidized by oxidants and form a disulfide bond, leading to the defection of cadmium ions chelation. It could be one possible reason to the lost of function. Therefore, it is necessary to analyze and optimize its structure by using Molecular stimulation.



Optimization

Linker selection and protein structure 3D modeling

There are mainly 2 ways modeling protein 3D structure, which are respectively homology modeling, based on existed experimental results, and initial modeling, using AI modeling protein folding from the amino acid sequence. SpMTL is a recent studied protein, there is no recorded X-ray, NMR analysis results or homologous structure data so far. Therefore, we decide to use initial modeling to acquire 3D structure of SpMTL and the whole fusion protein. Then further analyze the reason of function losing and optimize the design.

We used trRosetta to construct the 3D model of mCherry-CC-SpMTL-ARR, then imported the model into Molproity to observe and highlighted the disulfide bond(Fig.1.1). It implicates that in this design, SpMTL may be trapped by CC and ARR, and its functional hydrosulfuryls decrease for the disulfide bond formation, consequently reduces the coordination efficiency with cadmium ion.

Figure 1.1: 3D structure (top) and disulfide bonds (bottom) of mCherry-CC-SpMTL-ARR


To tackle this problem, we tried importing linker to adjust the 3D structure of fusion protein, divide SpMTL, CC and ARR, keep their structure natural and function normal. Two linkers were used in our design, one was a rigid linker, EAAAK, and the other one is a random peptide, LRRRF. We then used trRosetta to predict their 3D structure(Fig.1.2), and delivered the design to WetLab to verify the function. The result are shown in the Engineering Part of Wiki, which indicates that Design SpMTL- LRRRF-CC-mCherry-ARR has a better performance, according to the survivability in Cadmium ion stress.

Figure 1.2: 3D structure of SpMTL-EAAAK-CC-mCherry-ARR and SpMTL-LRRRF-CC-mCherry-ARR


Molecular Dynamic Simulation

The modeling result from trRosetta is static, which may has differences to the variable structure in the actual environment. Therefore, we need to process molecular dynamics simulation to analyze the structure variation and obtain its lowest-free-energy structure. We used Gromacs to stimulate SpMTL-LRRRF-CC-mCherry-ARR in Regular dodecahedron water box, OPLS forcefield. SPC/ENVT water molecule model, 17 Na ions and 17 ions. We first perform 100ps NVT balance and 100ps NPT balance. After completing the above preparatory work, we conduct a 150ns stimulation.

Video: Contrail of SpMTL-LRRRF-CC-mCherry-ARR in water solution box


To evaluate the simulated conformation stability, Root mean square deviation (RMSD), a measure of protein structural drift, is often used. It indicates the sum of all atomic deviations of the conformation at a certain time and the target conformation.The formula is:


In the formula, N is the number of atoms, mi is the mass of atom i, xi is the coordinate vector of the target atom i, yi is the coordinate vector of the reference atom i, and M is the total mass.

Besides, The radius of gyration (Rg) is used to compare how various structural shapes will behave under compression along an axis.The formula is:


In the formula, M is the total mass, mi is the mass of the atom i, xi is the coordinate vector of the target atom i, and xc is the center of all atoms.

From the RMSD and Rg of the simulation, we could know that the fusion protein SpMTL-LRRRF-CC-mCherry-ARR will gradually and dynamically relax when it be with water molecules, domains which interact with cadmium ions could be exposed. In this way, the steric hindrance of cadmium ions entering the target can be reduced and the cysteine can fully chelate cadmium ions.

Figure 1.3: RMSD and Rg of SpMTL-LRRRF-mCherry-ARR in dynamic simulation


In addition to RMSD and Rg, another index of conformation stability is the number of Hydrogen bonds. Through the simulation, the number of hydrogen bonds in the complex varies from 245 to 300, with an average number of 272. This results showed that hydrogen bonding kept the affinity of SpMTL-LRRRF-CC-mCherry-ARR to form various helices, folds and curls stable (Fig.1.4). The stable structure provides an important guarantee for the fusion protein to maintain the form of chelated cadmium ions.

Figure 1.4: Hydrogen bonds number of SpMTL-LRRRF-mCherry-ARR in dynamic simulation


Principal Component Analysis (PCA) is suitable for solving the problem that it is difficult to intuitively analyze and simulate the trajectory of large protein system and obtain high dimensional data. We constructed the covariance matrix of residual movement trajectory of fusion protein (Fig.1.5). The color bar represents the correlation between the trend and direction of the relative motion of the abscissa residue and the ordinate residue. Blue represents positive correlation and red represents negative correlation.

Figure 1.5: Residue Cross Correlation of SpMTL-LRRRF-mCherry-ARR


By dividing the sum of the largest eigenvalues by the covariance matrix, we can get the eigenvector with high eigenvalues, also known as Principal component(PC).We calculate a series of PC values, denoted as "PC1, PC2..., PCi ", in order from largest to smallest. In general, we take two PCS as the two axes. Next, we projected the original trajectory onto the two coordinate axes according to the corresponding eigenvectors to obtain the scattered points with uneven density. The more dense the scatter, the higher the probability of conformation.

PC1, PC2 and PC3 has higher score describing the trajectory(Fig.1.6D). Scatters of PC1 and PC2 are densely distributed, and there are two clusters approximating linear and cluster respectively (Fig.1.6A).The scatter distribution of PC1 and PC3 is similar to the cluster with two similar positions in the previous group, but its scatter distribution is more dispersed than that in the previous group (Fig.1.6C). PC2 and PC3 have the most dispersed scatter distribution, and it is difficult to have obvious clustering (Fig.1.6B).

Figure 1.6: PCA of Fusion Protein


Gibbs free energy landscape can show the probability density distribution of various conformations in the direction of PC1 and PC2. Calculated by Boltzmann relationship, The formula is:


In the formula, K is the Boltzmann constant, T is the simulated temperature, and C is any constant.

Base on the Gibbs free energy landscape results, we obtain the Lowest energy configuration of SpMTL-LRRRF-mCherry-ARR. Compare to the configuration before dynamic simulation and other design (SpMTL-EAAAK-mCherry-ARR), we could see that in actual environment, SpMTL is well separated from region CC and ARR, making the ion binding site well exposed, which is consistent to the experimental results from WetLab.

Figure 1.7A: Gibbs Energy Landscape of SpMTL-LRRRF-mCherry-ARR. B, 3D structure of dynamic simulated SpMTL-LRRRF-mCherry-ARR. MIBP is in violet, Linker is in cyan, CC is in green, mCherry is in magenta and ARR is in red. Color implication is the same in C and D. C, 3D structure of SpMTL-LRRRF-mCherry-ARR before dynamic simulation. D, 3D structure of SpMTL-EAAAK-mCherry-ARR.


We also process the dynamic simulation of single SpMTL in the same way as a control. Root Mean Square Fluctuation (RMSF) can represent the flexibility of amino acid residues in proteins, and it is the index of time average fluctuation of single residues. The higher the RMSF value, the higher the flexibility of the protein. The formula is:


In the formula, T is the time, xt(tj) is the position of the atom i at time t, and xref is the average position of the atom.

By comparing RMSF of the two proteins (Fig.1.8), we could see that SpMTL in LRRRF fusion protein (0.45nm) is slightly higher than that in single SpMTL (0.3nm). It showed that the relative flexibility of SpMTL was increased by the addition of LRRRF, CC, mCherry and ARR to form complex.

Figure 1.8: Comparison of RMS fluctuation between single SpMTL and SpMTL in SpMTL-LRRRF-CC-mCherry-ARR fusion protein.


When a cysteine pair in the Spmtl forms a disulfide bond, the configuration stability may increase and lead to a decrease in RMSF. We hypothesized that the RMSF increase of SpMTL in fusion protein might be correlated with the addition of LRRRF. LRRRF may weaken the formation of disulfide bonds by cysteine in SpMTL to a certain extent, leading to a significant increase in the ability of LRRRF fusion protein to bind cadmium ions. However, at present, we still cannot conclude that disulfide bonds play a key role in the above RMSF results, but this conjecture could propose feasible ideas for further simulation in the future.



Prospections


We predicted the 3D structure of our designed fusion protein and using molecular dynamic simulation to obtain its lowest energy configuration. We also speculated and analyzed factors influencing the binding efficiency of heavy metal ions. Results of molecular simulation provided theoretical basis and practical guidance for our project. We will further proceed molecular simulation to rationally design the protein, and cooperate with WetLab to construct efficient molecular magnet.




Stress Model


Description

In the actual working situation, our Chlamydomonas will be in the device launched into the polluted water. Under the stress of limited space in the device and the heavy metal ion in waters, there must be a working limit. Beyond the limit, working efficiency will significantly decline. This limitation could be the recovery moment of the device. In this part of modeling, we use cell population (OD600) as criteria, aim of modeling the cell population dynamics and the autophagy pathways to figure out the working limit (i.e., device withdrawing moment).



Background: Molecule Magnet & Autophagy Pathway

The selective autophagy pathway of Chlamydomonas is mediated by receptors. In non-autophagy stage, Atg13 will be phosphorylated by TORC1 kinase, leads to decreased activity of Atg1 kinase. Under Nitrogen starvation, the activity of TORC1 kinase will be inhibited, which will benefit the dephosphorylation of Atg13. Then the dephosphorylated Atg13 will activate Atg1, and further induce Atg8 into lipidated Atg8-PE, indicating that the cell turns into autophagy stage from normal stage. By importing Atg related peptide into our fusion protein (the Molecule Magnet), under heavy metal ion stress or nutritional deficiency stress, the SpMTL-Cd2+ complex could be gathered into the vacuole through Atg pathway.

Figure 2.1: Molecule Magnet and Autophagy pathway



Criteria of Working Limit

The most accurate way to evaluate the working status of recombined cell is molecular scope measurement. However, considering the cost and difficulties of measurement in real time with our hardware, molecular measurement may not be an ideal option. Therefore, we need to look for a reliable criteria and its easier measuring method. Since autophagy has deep relevance to single cell living status and further influences the cell population, we therefore chose cell population as the criteria of working status. The way measuring cell population is accurate and feasible with simple device. Through measuring the OD600 by spectrometer, we could easily know the working and living status of a cell group, determine the moment launching and withdrawing devices.



Growth Curves of Engineered Cell

In order to investigate the effect of our fusion protein to cells, we transformed plasmids containing genes of different designs of fusion protein into yeasts and measured their OD600 (Fig.2.2 and Fig.2.3).

Figure 2.2:Line chart of the growth (OD600) of engineered yeast



Figure 2.3: Fitting lines of the growth (OD600) of engineered Saccharomyces


On the other hand, above recombined genes (CC and ARR were adjusted specifically for Chlamydomonas) were also recombined into Chlamydomonas to investigate the influence of our fusion protein on Chlamydomonas’s growth. Through experiments, we obtained the fitted growth curves of recombined Chlamydomonas (Fig.2.4).

Figure 2.4: Fitting plot of engineered Chlamydomonas with ‘logistics’



Stress model and working limit prediction

Our engineered Chlamydomonas culture will go through 3 stages in device, which are respectively normal stage, autophagy stage and absorption saturation stage. Chlamydomonas will first breed in nutrient medium. After breeding to a certain degree in normal stage, it will be launched into polluted water to work. Under the stress of heavy metal ion and nutrition deficiency, Chlamydomonas will turn into autophagy stage. Appropriate level of autophagy is conducive to cell survival, but excessive autophagy could induce programmed cell death and decrease cell population, which called absorption saturation stage. We use following model to describe the whole process.

Figure 2.5: Stage Transition Model


From t0 to t1, engineered Chlamydomonas grow in nutrient medium in limited space of device, which is in normal stage. The regenerate rate of Chlamydomonas is only related to the light intensity, which could be set as a constant λ1. G represents the relative density of engineered Chlamydomonas and the death rate is δ1G. At t1 moment, the relative density of Chlamydomonas will reach the maximum GM.


From t1 to t2: The growth and death rate of engineered Chlamydomonas are equal, so its relative density will keep at GM. At t2 moment, engineered Chlamydomonas could be launched into polluted waters for cadmium ion absorption, which is also the starting point of conversion from normal stage to autophagy stage.


From t2 to t3: There are three states of engineered Chlamydomonas: normal stage, autophagy stage and absorption saturation stage. Appropriate level of autophagy is conducive to cell survival. However, excessive autophagy can induce programmed cell death. X represents the relative density in normal stage, λ2 is the growth rate, δ2X is death rate, k1X is the rate of autophagy-stage entering. Y is the relative density in autophagy phase, δ3Y is death rate, k2Y is the rate of entering the absorption saturation stage. Z is the relative density in the absorption saturation stage and δ4Z is the death rate in absorption saturation stage. Engineered Chlamydomonas in autophagy stage and absorption saturation stage could not live and breed normally. t3 is the moment that all of engineered Chlamydomonas step into absorption saturation point, and it is also the moment of device withdrawing.


Whereafter, we used neural network to learn and figure out the internal relationship between various growing conditions and parameters in ODE model. We set growing conditions such as Cd2+ concentration, temperature, illumination intensity and so on as input layer, the five parameters (k1,k2234) as output layer. Through training with data set produced by WetLab, we could ascertain the parameters and accomplish the ODE model, finally apply the model in working limit prediction, guiding us withdraw the device. This part of work is still processing. We also shared this idea to team SJTang and helped them pushing forward their project. Our solution yielded positive results. For details, please refer to the Model part of SJTang.

Figure 2.6: Concept model of Chlamydomonas work recovery time; X aixs: Cd2+ concentration; Y aixs: Time(hours); Z aixs: OD600 of Chlamydomonas sample



Conclusions

Difficulties, high cost of real time molecular measurement and environmental variety has always been challenges in large scale applications. Through rational design, we use cell density (OD600), a population-level criteria to assess autophagy intensity, which is in molecular scope, to determine the working limit and withdrawing moment. Combining ODE and neural network, our model will have robustness and high tolerance to environmental variety, which assure our design work stably and efficiently in actual environment.




Mix Model


Description

Living and working status of our Chlamydomonas has deep relevance to the quality of actual waters. To maximize the working efficiency, figure out the launching and withdrawing moment of the device, we have to model the cell population dynamics and ion concentration of actual waters.

United with the Hardware group, our hardware will not only be equipped with biological workspace, but also Electrochemical sensor, propulsion system and Navigation Satellite System (GPS+BDS, BeiDou navigation satellite System). The device could produce space and protection for our Chlamydomonas to live and work in waters, meanwhile, it could also accomplish the function of environment monitoring, positioning and cruise. Under the guidance of a model of ion concentration in Nernst equation and RBF neural network, we could obtain an accurate metal ion concentration in a specific position in real time. Further, combine with the navigation of GPS+BDS and propulsion by motors, we could achieve whole water area survey and map the pollution distribution(Fig.3.1).

Figure 3.1: Mix Model



Electrochemical sensor system embedded with RBF neural network

In laboratory, we have set gradient Cadmium ion concentration environment to model the working limit under stress. However, the quality of actual waters is far more various and complicated. To ensure that our stress model works properly in the real environment, we have to measure the ion concentration of actual waters in real time as an input of our prior stress model. Here, we specially model the real time metal ion concentration basing on Electrochemical sensor and RBF neural network.

RBF neural network is a feedforward neural network constructed on the basis of function approximation theory, containing input layer, hidden layer and output layer. Compared with other neural networks, RBF neural network greatly accelerates the learning speed while avoiding local optimal solutions.

Our device is equipped with electrochemical sensors which could measure the equilibrium potential difference and absolute temperature of actual waters (Fig.3.2). Based on these data and Nernst equation, a rough concentration of metal ions could be calculated.

Figure 3.2: Schematic diagram of electrochemical sensor and structural diagram of the device.


As the water velocity, temperature, pH of actual waters changes, the measured rough value will fluctuate, the Nernst equation under complicated environment will be:


E is the sensor potential. E012,...ϑi) is the potential under the influence of various environmental factors. R is the gas constant. T is the absolute temperature in sewage. n is the number of electrons gained and lost. F is Faraday's constant. C is the ion concentration in sewage.

Due to the complication of environment and limitation of device and sensor, various environmental factors could not be directly quantified as input. However, we could use supervise learning and RBF neural network to make the model tolerance the disturbance of various factors while producing ideal results.

Following the standard Nernst equation, we could obtain a rough concentration z.



The sensor will be linked to an embedded system with RBF neural network (Fig.3.3), and the data from the sensor will be mapped to the hidden layer as the input vector.

Figure 3.3: Schematic diagram of RBF neural network in our Mix Model.


In the hidden layer, we take the Gaussian function as the activation function, use the distance between the concentration z in the input space and the center vector z corresponding to the p-th hidden layer node as the independent variable, and take the corresponding p-th radial as the width of the basis function. The equation is:


Make it form a P×P matrix:


And the output could be:


Thereinto, Ω is the weight vector, C is the output vector.

In this conversion, through training in the way that applying the sensor in gradient concentration of Cadmium ion water in laboratory to get the training data set and subsequent supervised learning, we could obtain and test the weighting vector, which would be used in actual work. We will further apply our device and model in actual environment, using accurate measurement as the validation set to assess the performance of our device and Mix Model.



Distribution mapping of heavy metal ion in whole water area

In a broad or flowing water area, distribution of heavy metal ions may be unbalanced. The deploy of electrochemical sensor, BDS+GPS and propulsion system makes the device have the ability of whole-water-area scope measuring, mapping pollution distribution and located the pollution source, further guide itself moving to high-pollution area to deal with pollution.

The water area will be divided into several equal hexagon subarea. Through remote controlling, multiple devices will be launched to each subarea and performing measurement. Coordinates from BDS+GPS and measurement data through RBF neural network will be sent to central server and be integrated, output a map of distribution. Here, we use some simulation ways to draw a distribution map which is the ideal outcome of our device working concept (Fig.3.4).

Figure 3.4: Simulative distribution map of Cadmium ions


Based on the Cd2+ distribution figure we obtain, and the Stage Transition Model, we can eventually achieve the map of Chlamydomonas work limit distribution, which is the basis of guiding the 'itinerary' of our device(Fig.3.5).

Figure 3.5: Simulative distribution map of Working Limited



Disfusion and Decay Model

In addition to mapping, we also modeling a spreading model of heavy metal ion in actual water, allowing the server estimating diffusion coefficient and attenuation coefficient of heavy metal ions.

We assume that C(x,y,h) is the concentration of heavy metal pollutants in the coordinate (x,y,h) at time t, ε,η,ξ are diffusion coefficients along x, y and h directions. We can randomly take a closed curve R, and the area enclosed is ψ, during the period from t to Δt. During this period, the inflow mass of heavy metal ion through D is M1. The equation is:



The concentration of heavy metal ions may be attenuated due to the factors such as absorption by water organisms and attachment to water soil. We assume that the attenuation coefficient is v, which will cause their mass decreasing to Mi. The equation is:


So the decreasing mass ΔM is:


Since ΔM is related to Δt, the equation could be:


Finally, we could obtain the following equation:


When the combined action of diffusion and attenuation is 0, the concentration of heavy metal ions will be in equilibrium. The depth of water h could be ignored. Assuming that heavy metal ions only diffuse along the x and y directions in water, the equation could be:


At last, according to x, y and C, using the Gibbs sampling method in the Monte Carlo MCMC algorithm, Markov chain, we could estimate the parameters ε and η.

We use the Gibbs sampling method in MCMC algorithm to estimate the parameters ε and η. The process is as follows:

  1. Stationary distribution input:input distribution π(ε,η), set state transitions value k1, samples value k2
  2. Random initialization: initialize state value ε(0) and η(0)
  3. for n=0 to k1+k2-1:
    (a) Obtain sample η from conditional distribution P(η|εn)
    (b) Obtain sample ε from conditional distribution P(ε|η(n+1))
  4. Sample set correspond to stationary distribution:

The brief process:




Perspection

Cooperated with hardware, our mix model gives the device perception and intelligence to sense metal ion concentration and initiatively cruise toward high-pollution area to catch Cadmium ions, significantly increase the ability and efficiency of our design. As the ultimate part of our modeling work, it greatly enlarges the application scenarios from environmental monitoring to protecting. Integrating models, bioengineering, hardware to an organic whole, our MESEG achieves being a veritable Molecule magnEt Super Environment Guard.




References

  1. Yan L, Chen J, and Zhang X. (2008). Research Progress in the Linker of Fusion Protein. Biotechnology, 92-94.
  2. Yu J, Sarra S, and Xu H.  (2016). Selection and Application of Linker Peptide in the Design of Fusion Protein. Pharmaceutical Biotechnology 23, 260-263.
  3. Charlebois, D. A., & Balazsi, G. (2019). Modeling cell population dynamics. In Silico Biol, 13(1-2), 21-39. doi:10.3233/ISB-180470.
  4. Jin, H., & Lei, J. (2014). A mathematical model of cell population dynamics with autophagy response to starvation. Math Biosci, 258, 1-10. doi:10.1016/j.mbs.2014.08.014.
  5. Jin, H., & Lei, J. (2016). A hybrid model of molecular regulation and population dynamics for yeast autophagy. J Theor Biol, 402, 45-53. doi:10.1016/j.jtbi.2016.04.019.