Time and money are precious, so we try to save as much of both as possible, and modeling helps us do that. Thanks to models, we can estimate what is still realistic and what is already utopian and not worth putting into practice. Modeling tells us what might work and what might be a weak point or go wrong. This year we worked on modeling our gene A and GFP (green fluorescent protein), and you have to admit that it looks really interesting.
Figure 1: 1D structure of GFP protein
GFP is a green fluorescent protein that is attached to a gene whose expression we want to confirm or rule out in a given organism under certain conditions. In our case, we used two modifications of the protein. In the first one, we added a tag and a linker before the base sequence. In the second, we attached a degradation tag to the end of the sequence. A degradation tag is added to the end of a protein so that the protein is degraded quickly. The tag at the start is used, as we would say in IT, to address the beginning of the translation of a given protein. Since the tag sequence is not part of the protein itself, there is a legitimate concern that it could negatively affect the formation of the GFP tertiary structure. Therefore, our design includes a linker that keeps the tag at a distance from the rest of the sequence.
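The two modifications described above can be sketched in a few lines of Python. This is only an illustration: every sequence below is a hypothetical placeholder rather than one of our real parts, and the names `Construct`, `with_tag_and_linker`, `with_degradation_tag`, and `highlight` are made up for this sketch. It records which residues were added so that they can be highlighted later.

```python
from dataclasses import dataclass

# All sequences are hypothetical placeholders, not the team's real parts.
TAG = "MKTAYIAK"                   # placeholder tag placed at the start
LINKER = "GGGGSGGGGS"              # placeholder flexible glycine-serine linker
GFP_CORE = "MSKGEELFTGVVPILVELDG"  # short placeholder standing in for GFP
DEG_TAG = "AANDENYALAA"            # placeholder degradation tag placed at the end


@dataclass(frozen=True)
class Construct:
    sequence: str
    added: tuple  # (start, end) index ranges of residues that are not part of the core


def with_tag_and_linker(core: str, tag: str, linker: str) -> Construct:
    """Variant 1: tag and linker placed before the base sequence."""
    prefix = tag + linker
    return Construct(prefix + core, ((0, len(prefix)),))


def with_degradation_tag(core: str, deg_tag: str) -> Construct:
    """Variant 2: degradation tag attached to the end of the base sequence."""
    return Construct(core + deg_tag, ((len(core), len(core) + len(deg_tag)),))


def highlight(construct: Construct) -> str:
    """Write the added residues in lower case so they stand out from the core."""
    chars = list(construct.sequence)
    for start, end in construct.added:
        for i in range(start, end):
            chars[i] = chars[i].lower()
    return "".join(chars)
```

Calling `highlight(with_tag_and_linker(GFP_CORE, TAG, LINKER))` returns the full sequence with the tag and linker in lower case, which is a plain-text stand-in for the highlighting used in our figures.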
Simulating molecular processes is very demanding, and it is not worth, as we would say in the IT world, reinventing the wheel. That is why tools such as Amber, YASARA, and CHARMM were developed: to make our work with simulations easier. What these tools have in common is that they were developed by large teams of experienced subject matter experts, which makes them well tuned and very reliable.
But not everything is as rosy as it seems, and working with these tools brings some pitfalls. One of the complications of these programs is their configuration and subsequent launch, where we are forced to follow a complex procedure consisting of several lengthy steps. These steps include:
setting up the simulation environment
loading the necessary input files
equilibrating the simulation box
running the simulation itself
analyzing the results
Each step involves several parameters that can significantly affect the result, so the simulation is very sensitive to them. An inappropriate choice of parameters can cause the simulation to stop or produce bad results.
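Because a single bad parameter can ruin a run, it can help to check the settings before anything is launched. The sketch below is a minimal, tool-independent illustration: the field names, default values, and limits are assumptions chosen for the example and are not the settings of Amber or any other real package.

```python
from dataclasses import dataclass


@dataclass
class SimulationParams:
    # Field names, defaults, and limits are illustrative assumptions only.
    temperature_k: float = 300.0
    timestep_fs: float = 2.0
    equilibration_steps: int = 50_000
    production_steps: int = 5_000_000
    box_padding_angstrom: float = 10.0


def check_params(p: SimulationParams) -> list:
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    if p.temperature_k <= 0:
        problems.append("temperature must be positive (kelvin)")
    if not 0 < p.timestep_fs <= 4.0:
        problems.append("timestep is outside the assumed 0-4 fs range")
    if p.equilibration_steps <= 0:
        problems.append("equilibration of the box was skipped")
    if p.production_steps <= 0:
        problems.append("production run has no steps")
    if p.box_padding_angstrom <= 0:
        problems.append("simulation box has no padding around the protein")
    return problems


def simulated_time_ns(p: SimulationParams) -> float:
    """Length of the production run in nanoseconds (1 ns = 1,000,000 fs)."""
    return p.production_steps * p.timestep_fs / 1_000_000
```

A check like this cannot tell us whether the parameters are scientifically sensible, only that nothing is obviously missing or out of range before hours of computing time are spent.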
While we were researching these tools, a Jupyter notebook was created (fortunately for us :D ) that made deploying simulations much easier. It moved the whole overhead to the Google Colab environment, so it did not even burden our machines. Google Colab is a platform that Google set up from decommissioned servers, making its computing power available to the general public for scientific and educational purposes. When a user connects to the platform, they are assigned a single machine, which usually has a graphics card. This means that highly demanding calculations (which is what molecular simulations are) run much faster. You can read more about the notebook that allows running molecular dynamics here.
A simulation offers many insights into the system under study. These insights are very complex, and it is a superhuman task for a novice student in this field to fully master and understand them within the timeframe of an iGEM project. It is important to note that it is not easy to get all the parameters right, so we cannot blindly trust the results of these tools and should always keep a healthy doubt about them.
Design of experiments
Since the iGEM competition focuses on the creation of completely new genetic constructs, we, like probably every team, came across a completely new sequence that we wanted to explore in more detail. And what better way to do that than with the tools mentioned above? The first and only information we had was a 1D amino acid sequence designed by our BIO team. Since proteins adopt a 3D conformation that gives them their unique properties in the organism, it is more than useful to know it. To create a 3D structure from the 1D sequence, we used the AlphaFold tool, which can be run through the previously mentioned Jupyter notebook. AlphaFold was developed by the DeepMind team and is considered one of the top-rated tools for 3D structure prediction. In many cases its accuracy can rival even laboratory experiments.
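Since the 1D sequence is the only input, it is worth cleaning it before pasting it into a prediction notebook. The following is a small sketch under our own assumptions: the helper names `clean_sequence` and `to_fasta` are hypothetical, and the check only accepts the 20 standard one-letter amino acid codes.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove whitespace, convert to upper case, and reject unknown letters."""
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue letters: {unknown}")
    return seq


def to_fasta(name: str, raw: str, width: int = 60) -> str:
    """Format a cleaned sequence as a FASTA record with fixed-width lines."""
    seq = clean_sequence(raw)
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"
```

A stray character copied from a document, such as a digit or an asterisk, then raises an error immediately instead of silently producing a prediction for the wrong sequence.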
We wanted to apply the Amber tool to the resulting 3D structure (in the form of a PDB file). Based on the given parameters, Amber produces a trajectory of the events taking place in the system. Almost any analysis that is needed can then be applied to the resulting trajectory. In our case, we were thinking about investigating the transport of substances into our BMC. For this analysis, we would use the Caver and CaverDock tools from Loschmidt Laboratories. The Caver tool is used to calculate protein tunnels and channels through which ligands can be transported. The CaverDock program is used to calculate the trajectory of a given ligand through the tunnels or channels found.
At the beginning of our work, we focused on modeling the core part of the project, the BMC. We wanted to discuss the results of our experiments with the laboratory team, which, unfortunately, did not happen because the delivery of the proposed sequence was delayed.
Next, we investigated the structures of GFP. In the course of the laboratory preparations, four sequence variants related to this protein were generated (in the following sequences, the highlighted parts correspond to tags/linkers/degradation sequences).
GFP with routing tag and linker
GFP with degradation tag
mScarlet with degradation tag
cI repressor with degradation tag
The goal of the first experiment with the tag and linker sequence was to test whether adding a tag would affect the conformation of GFP, as mentioned above. We modeled the structure using the AlphaFold tool. In the first part of the experiment, we modeled the GFP without adding the linker and tag. In the second part, we repeated the experiment, but in this case with a sequence that already included the tag and linker.
When we compared the models of the resulting structures, it turned out that our fears were unfounded and that the addition of the tag should not affect the properties of the protein.
Figure 2: Comparison of the original structure of GFP (green) and GFP with routing tag and linker (blue). The tag and linker are highlighted in red.
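The comparison above was done by overlaying the two models and looking at them. If a number is wanted as well, one simple option is the distance RMSD (dRMSD) between the C-alpha atoms of the shared core. The sketch below is not the method we used for the figure; it is a stand-in that needs no superposition step and only the Python standard library. It assumes standard fixed-column PDB ATOM records and a single chain named "A", and the function names are hypothetical.

```python
import math


def ca_coordinates(pdb_text: str, chain: str = "A") -> dict:
    """Map residue number -> (x, y, z) of the C-alpha atom from ATOM records."""
    coords = {}
    for line in pdb_text.splitlines():
        is_ca = line.startswith("ATOM") and line[12:16].strip() == "CA"
        if is_ca and line[21:22] == chain:
            res_id = int(line[22:26])
            coords[res_id] = (
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )
    return coords


def drmsd(ref: dict, model: dict, offset: int = 0) -> float:
    """Distance RMSD over the residues present in both structures.

    offset is the number of residues added before the core in the model,
    so residue i of ref is matched with residue i + offset of the model.
    """
    shared = sorted(r for r in ref if r + offset in model)
    if len(shared) < 2:
        raise ValueError("need at least two shared residues")
    total, pairs = 0.0, 0
    for i, a in enumerate(shared):
        for b in shared[i + 1:]:
            d_ref = math.dist(ref[a], ref[b])
            d_model = math.dist(model[a + offset], model[b + offset])
            total += (d_ref - d_model) ** 2
            pairs += 1
    return math.sqrt(total / pairs)
```

For a construct with a tag and linker in front, `offset` would be the combined length of the tag and linker. A value close to zero means the internal distances of the core are preserved; what counts as "close" is a judgment call and should be treated with the same healthy doubt as the models themselves.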
In other experiments, we worked with sequences with a degradation tag, which could also potentially cause problems in the formation of the GFP structure. In this case, too, we aligned the modeled structures with and without the degradation tag.
Again, the addition of the tag was shown to have no effect on the resulting conformation.
Unexpectedly, the expression of the protein with the degradation tag turned out to be greatly attenuated in the labs, as described in the results.
Figure 3: Comparison of the original structure of GFP (green) and GFP with degradation tag (blue). The added tag is highlighted in red.
Result of experiment with mScarlet sequence with degradation tag:
Figure 4: Comparison of the original structure of mScarlet (green) and mScarlet with degradation tag (blue). The added tag is highlighted in red.
Result of experiment with cI repressor with degradation tag:
Figure 5: Comparison of the original structure of cI repressor (green) and cI repressor with degradation tag (blue). The added tag is highlighted in red.
A word at the end
As you may have noticed, molecular dynamics is very challenging and requires a great deal of time and knowledge of the subject matter. It can make our work much easier and offer a basic overview and graphical representation of our problem, but it cannot be relied upon to solve our entire project with almost no lab work.