While the general behaviour of heat shock proteins is understood, there is much we don’t know about the structure and physics of the interactions that enable them to dynamically associate and dissociate between oligomers and subunits as they bind to denatured proteins. For this reason, we performed some structural prediction for our heat shock protein of interest, HSP22E, in order to better determine what kind of oligomers it might form and hence provide some insight into its mechanism of action.
The structure of HSP22E, which comes from the thermotolerant microalga C. reinhardtii, has not yet been thoroughly characterised. Hence, our approach was to start from the gene sequence, use it to predict a monomeric or dimeric unit, investigate how these subunits might form a larger oligomer, and then model the interaction with denatured proteins.
Choices, Choices - selecting a sequence
In 2020, the Phase One Team identified the sequence for the small heat-shock proteins HSP22E and HSP22F from C. reinhardtii, our candidates for synthetically alleviating heat stress in coral-symbiont algae. These two HSPs are known to form a heterodimer (Rütgers et al., 2017), but preliminary experiments in the 2020 lab suggested that HSP22E was also able to form a homodimer. This is not uncommon in HSPs since they often have high sequence similarity and are predicted to be found in so many forms due to gene duplication events (de Jong et al., 1998). For example, the oligomeric form of HSP21 (a known chloroplastic HSP) has two distinct geometries, a “T” tetrahedral dodecamer constructed of six homodimers (Yu et al., 2021) and a “D3” dihedral dodecamer of six heterodimers (Rutsdottir et al., 2017), both of which are functional. With this in mind, our Phase Two team chose to work with only HSP22E for simplicity.
The First Hurdle - cellular localisation
The Phase One Team also attempted some structural modelling of the selected HSPs, but they ran into some significant clashes in their model - there were just too many atoms to fit in the space! In collaboration with them, the Phase Two team took a closer look and found that the protein sequences provided contained a proposed transit peptide motif. This is a “tag” sequence that labels the protein for transport to the chloroplast: the organelle that produces ROS when under heat stress, and where HSP22E acts. Now the question was, does this tag stay attached to the protein, or is it snipped off when it reaches the place it is meant to be? After some research we found that the process of transporting such proteins across an organelle membrane (that is, into a mitochondrion or chloroplast) involves cleaving off the transit peptide motif (Kunze and Berger, 2015). We performed a Clustal Omega Multiple Sequence Alignment (MSA) of the transit peptide motif from HSP22E with other known plastid localisation motifs (Holbrook et al., 2016), and obtained a good alignment. In addition, a BLASTp search of the full HSP22E sequence against the PDB produced no more hits than a BLASTp search with just the “mature” sequence, as no alignments were produced in the transit peptide motif region. Hence we decided to focus our attention on a mature form of the sequence that does not contain the transit peptide.
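As a minimal sketch of what "working with the mature sequence" means in practice, the snippet below removes an N-terminal transit peptide of a given length from a precursor sequence. The sequence and the cleavage position are hypothetical placeholders, not the real HSP22E data.

```python
def mature_sequence(precursor: str, transit_length: int) -> str:
    """Return the precursor with its N-terminal transit peptide removed."""
    if not 0 <= transit_length <= len(precursor):
        raise ValueError("transit_length must lie within the sequence")
    return precursor[transit_length:]


# Hypothetical 20-residue precursor with a hypothetical 8-residue transit peptide.
precursor = "MASLLRSAVGKDEPTWQYFH"
print(mature_sequence(precursor, 8))
```

In a real workflow the cleavage position would come from the alignment to known transit peptide motifs rather than from a fixed number.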
In Perspective - comparison to known sHSPs
As a quick sanity check, we performed a simple PSI-BLAST search against the PDB database to compare our proposed sequence against homologs with known structure. Below is a T-COFFEE MSA of HSP22E (last row) with the homologs found (preceding rows). As you can see, there are two red sections which are highly conserved - these correspond to the core alpha-crystallin domain that characterises sHSPs. The other parts of the sequence show more variation, and it has been hypothesised that these regions are involved in oligomerisation and substrate binding (Caspers et al., 1995).
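To illustrate how conserved regions stand out in an alignment, here is a rough sketch that scores each column of an MSA by the fraction of sequences sharing its most common residue. The toy alignment is invented for illustration, and this simple score is an assumption; it is not the colouring scheme that T-COFFEE itself uses.

```python
from collections import Counter


def column_conservation(alignment):
    """Score each column by the fraction of rows sharing its most common residue.

    Gaps ('-') never count as the most common residue, but they still count
    towards the number of rows, so gappy columns score lower.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Toy alignment (hypothetical sequences, not the real homologs).
toy = ["MKV-LT", "MRVALT", "MKVGLS"]
for position, score in enumerate(column_conservation(toy), start=1):
    print(position, round(score, 2))
```

Runs of columns with scores near 1.0 would correspond to the highly conserved blocks described above.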
We provided our mature sequence data to several modelling programs, which used either comparative modelling techniques or deep-learning methods to predict the structure. These included I-TASSER, Rosetta Comparative Modelling, RoseTTAFold and AlphaFold, which each produced 5-10 models from the input data.
All of these different programs predicted the alpha-crystallin domains with a high degree of confidence, but there was a lot of variation in the N- and C-terminal regions and the middle loop. These are the regions with most variation, as seen in the T-COFFEE MSA above. Since the terminal regions are likely to be involved in oligomerisation (Caspers et al., 1995), it seemed reasonable to conclude these regions would be dynamic, and a more confident prediction of their structure would have to wait until the subunits were placed together.
Nevertheless, to obtain one structure to move forward with we submitted each of them to MolProbity. The most favourable score was obtained from Rosetta Comparative Model 1 (scoring shown below). In addition, this model provided an explicit residue-by-residue score for the certainty of the predicted coordinates, which we felt would be valuable moving forward for the future remodelling of terminal regions once we had an oligomeric scaffold.
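The selection step can be sketched as picking the model with the lowest MolProbity score, since lower scores indicate better geometry. The model names and values below are hypothetical placeholders, not our actual results.

```python
def best_model(scores):
    """Return the name of the model with the lowest (most favourable) score."""
    if not scores:
        raise ValueError("no models to compare")
    return min(scores, key=scores.get)


# Hypothetical MolProbity scores for candidate models (placeholder values).
candidate_scores = {
    "model_a": 2.85,
    "model_b": 1.92,
    "model_c": 3.10,
}
print(best_model(candidate_scores))
```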
Variations on a Theme - diversity and dynamism of HSPs
Small heat shock proteins are known to form complex higher order structures, in the realm of 12-48mers (Basha et al., 2012). Many of these large structures are known to dissociate into dimers under thermal stress in order to interact with denatured proteins, though there is also evidence of non-dissociative interaction (Yu et al., 2021). The large complexes are also known to regularly perform subunit exchange - new HSP dimers swapping with old ones to ‘refresh’ the large complex. As such, HSP oligomers are highly dynamic, and it is predicted that this dynamic process of interactions is governed by the more variable terminal regions of the sequence (Caspers et al., 1995).
The table linked below shows the variety of oligomeric structures formed by homologs of HSP22E:
The Family Tree
For this reason, we generated a phylogenetic tree (Fig. 4) to compare the sequence similarity of HSPs of known structure with HSP22E. Each HSP is colour-coded by the type of oligomer it is known to form. It can be seen that while some clusters share the same oligomeric order, others with very high sequence similarity do not. This is most noticeable across 4ZJ-A/D/9, which are a 24mer, 18mer and 2mer respectively, but derived from the same sequence with only very minor mutations.
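Trees of this kind are typically built from pairwise distances between aligned sequences. The sketch below computes a simple p-distance (the proportion of differing positions) for every pair in a toy alignment. The sequences and names are hypothetical, and the distance measure is an assumption; the tree above may have been built with a different model.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing residues over columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("sequences share no ungapped columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return {(name_1, name_2): p-distance} for every ordered pair of sequences."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


# Toy aligned sequences (hypothetical, for illustration only).
aligned = {"query": "MKVLT", "template_1": "MKVLS", "template_2": "MRV-S"}
matrix = distance_matrix(aligned)
print(matrix[("query", "template_1")], matrix[("query", "template_2")])
```

A small distance between two templates with different oligomeric orders is exactly the situation described above, where sequence similarity alone does not predict the assembly.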
Diving into Detail
Given the ambiguity of the sequence information processed in this way, we decided to approach the problem using more structural information.
The T-COFFEE Expresso program incorporates structural information to align sequences and produce a Multiple Sequence Alignment. We divided our template HSPs into known 2mers, 4mers, 12mers and 24mers, and constructed an MSA for each of these with HSP22E. These particular oligomeric forms were picked because there was a reasonable number of sequences available. From these comparisons, it was clear that HSP22E aligned most closely to dimers and dodecamers. As such, we aimed to first produce a viable dimer for HSP22E, and then use the dimer structure to form the larger 12-mer.
The dodecamer alignment in particular provided an interesting insight. Most of the other alignments were only able to pick up the generalised alpha-crystallin domains as conserved regions, but the dodecamer alignment showed an extra piece of sequence that was well conserved, in between the two large alpha-crystallin domains. Inspection of this small conserved region revealed that it was the region of sequence corresponding to β6 for the known HSPs.
We pursued several methods to predict the dimeric structure of HSP22E:
- Manual alignment via ChimeraX to the template 1GME
- Modeller alignment via ChimeraX
- ROSIE’s Symmetric Docking Tool
Comparing each of the models, the second model produced by ROSIE was selected for further work. This model was energetically favourable and aligned well with the homologous 1GME dimer.
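One common way to quantify how well a model "aligns" with a template is the root-mean-square deviation (RMSD) over matched Cα atoms. The sketch below assumes the two coordinate sets are already superposed and listed in the same residue order; the coordinates are made up, and this is not necessarily the measure reported by ChimeraX or ROSIE.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points, in input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical, pre-superposed C-alpha coordinates in angstroms.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, template))
```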
We again used several methods to dock our dimeric subunits into a large oligomer, including ROSIE’s Symmetric Docking, GalaxyHomomer, and Bonvin Lab’s HADDOCK, as well as some manual alignment methods in Chimera.
Each of these programs allows input of different parameters for constraints, and so we explored multiple combinations of these to find a model that was biologically favourable as well as energetically favourable.
Each method produced 5-10 models per set of inputs. In the table below, only selected models are shown for each method to illustrate general trends. Any residues coloured in red represent steric clashes in the model.
Manual alignment via ChimeraX
This was performed as per the dimer method. However, with all 12 chains present, the model contained many unresolvable clashes involving the N-terminal tails, and so was deemed unfeasible. Since 1GME is a hetero-12-mer, in which one of the chains is shorter than the other, it was hypothesised that the full-length N-terminal tail of HSP22E on all chains was not compatible with the template.
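To make the idea of a steric clash concrete, the sketch below flags pairs of atoms on different chains whose van der Waals spheres overlap by more than a cutoff. The radii are the Bondi (1964) values; the 0.6 Å overlap cutoff and the toy atoms are assumptions for illustration, not the exact criterion applied by the modelling tools.

```python
import math
from itertools import combinations

# Bondi (1964) van der Waals radii in angstroms.
BONDI_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def find_clashes(atoms, overlap_cutoff=0.6):
    """Return index pairs of atoms on different chains that overlap too much.

    atoms: list of (chain_id, element, (x, y, z)) with coordinates in angstroms.
    overlap_cutoff: assumed minimum vdW overlap (angstroms) that counts as a clash.
    """
    clashes = []
    for (i, atom_a), (j, atom_b) in combinations(enumerate(atoms), 2):
        if atom_a[0] == atom_b[0]:
            continue  # only inter-chain contacts are checked in this sketch
        distance = math.dist(atom_a[2], atom_b[2])
        overlap = BONDI_RADII[atom_a[1]] + BONDI_RADII[atom_b[1]] - distance
        if overlap >= overlap_cutoff:
            clashes.append((i, j))
    return clashes


# Toy atoms (hypothetical coordinates).
atoms = [
    ("A", "C", (0.0, 0.0, 0.0)),
    ("B", "C", (2.0, 0.0, 0.0)),
    ("B", "O", (5.0, 0.0, 0.0)),
]
print(find_clashes(atoms))
```

A full check would also need to skip covalently bonded atoms within a chain, which is why this sketch is restricted to contacts between chains.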
Inputs
Haddock is by far the most complex docking tool out of those listed here. It provides a considerable array of modifiable parameters, but those used for this project include:
Modeller alignment via ChimeraX
From the results above, two models are proposed as the most likely potential high-order structures for HSP22E.
- The 24-mer produced by Homomer from alignment to the template 5ZS3. This model had no clashes, and displayed oligomeric domain exchange through β-sheet interactions.
- The refined 12-mer produced by Modeller with a 1GME template.
Since sHSPs are known to form dynamic complexes, it is quite possible that HSP22E is able to form both a 12-mer and a 24-mer from dimeric subunits. Further analysis of oligomeric order could be performed with laboratory access, by measuring the molecular weight of purified HSP22E complexes, for example on a native (non-denaturing) gel that keeps the oligomer intact.
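The arithmetic behind such an experiment is simple: the expected mass of an n-mer is n times the monomer mass, and an observed mass can be matched to the nearest candidate order. The sketch below assumes a hypothetical monomer mass of 20 kDa and a hypothetical observed mass; neither is a measured value for mature HSP22E.

```python
def oligomer_masses(monomer_kda, orders=(2, 12, 24)):
    """Expected mass (kDa) of each candidate oligomer, assuming identical subunits."""
    return {n: n * monomer_kda for n in orders}


def closest_order(observed_kda, monomer_kda, orders=(2, 12, 24)):
    """Return the candidate order whose expected mass is nearest the observed mass."""
    expected = oligomer_masses(monomer_kda, orders)
    return min(expected, key=lambda n: abs(expected[n] - observed_kda))


MONOMER_KDA = 20.0  # hypothetical placeholder, not a measured value
print(oligomer_masses(MONOMER_KDA))
print(closest_order(250.0, MONOMER_KDA))  # hypothetical observed mass
```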
The wheat heat shock protein 1GME, a homologous heat shock protein to HSP22E, is believed to form a large dodecamer under normal cell conditions, but dissociates into dimers at elevated temperatures (van Montfort et al., 2001). These dimers become the active units that interact with unfolded proteins. Hydrophobic regions of the dimers are buried in the oligomeric structure, but after dissociation, these regions are available on the surface to interact with the hydrophobic areas of denatured proteins. In this way the dimers bind to denatured proteins, preventing the formation of larger aggregates caused by the hydrophobic regions of denatured proteins binding to each other.
We decided to investigate if HSP22E might also function via this mechanism.
To view our modelled dimer in action, we simulated its interaction with a denatured form of citrate synthase (den-CS), commonly used as a model substrate for molecular chaperones. The structure for den-CS was provided to us by the Glover Lab. We first positioned our dimeric unit next to a den-CS unit in Chimera, before submitting this file for molecular dynamics simulation using GROMACS. This was submitted as a job on NCI’s Gadi, a high performance computing cluster in Australia.
All-atom simulations such as this measure the energy interactions between every single atom in the model, which can be quite time consuming. Molecular dynamics expert Brian Ee introduced us to the concept of coarse-grained simulations which define larger pseudo-atoms to represent the system being simulated. While coarse-grained simulations are a less accurate measure of the dynamics of the system, they significantly reduce the computational load required to run the experiment, and are valuable when simulating larger systems and for longer periods of time. We elected to perform a coarse-grained molecular dynamics experiment due to the relatively large size of our dimer and denatured protein system.
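The core idea of coarse-graining can be sketched as replacing groups of atoms with a single pseudo-atom placed at the centre of each group. The version below simply averages every four consecutive atoms, which roughly mirrors the four-to-one mapping that MARTINI uses on average. This is a simplifying assumption: real MARTINI mappings are chemistry-specific and are not defined by atom order.

```python
def coarse_grain(atoms, group_size=4):
    """Map consecutive groups of atoms onto pseudo-atoms at each group's centroid.

    atoms: list of (x, y, z) coordinates; the final group may be smaller.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    beads = []
    for start in range(0, len(atoms), group_size):
        group = atoms[start:start + group_size]
        count = len(group)
        beads.append(tuple(sum(atom[k] for atom in group) / count for k in range(3)))
    return beads


# Eight toy atoms (hypothetical coordinates) reduce to two pseudo-atoms.
atoms = [
    (0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0),
    (10, 0, 0), (12, 0, 0), (10, 2, 0), (12, 2, 0),
]
print(coarse_grain(atoms))
```

Because the cost of computing pairwise interactions grows quickly with the number of particles, reducing the particle count in this way is what makes longer simulations of larger systems affordable.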
We performed the simulation at a temperature of 313K, at which proteins have the potential to denature, and at which the wheat heat shock protein 1GME has been observed to dissociate from its larger oligomeric storage form into active dimers (van Montfort et al., 2001). Our procedure was based on a tutorial and sample script written by Wunna Kyaw and Stephanie Xu of the Lee Lab, Single Molecule Science, EMBL Australia (2018), which Brian provided. The main steps involved were:
- Convert the PDB structure (which we configured in Chimera) into a coarse-grained structure, defined using the MARTINI22 force field.
- Define a box around the system, the boundaries of which are at least 1.0 nm away from the molecule.
- Solvate the system with water. A coarse-grained water model was used as input for this.
- Add ions to the system to produce an overall neutral charge.
- Perform energy minimization on the system to fix poor rotamers and side-chain clashes. Minimization of our system took 7640 steps.
- Equilibrate the system to relax side chain rotamers. The equilibration was set at a reference temperature of 313K.
- Perform the molecular dynamics simulation, also set to a reference temperature of 313K. The Boltzmann distribution from which velocities were sampled was also defined using a temperature of 313K. We performed the simulation for 1.02 μs.
- Convert the trajectory to an alpha-carbon version and align the molecule so that the simulation can be viewed in VMD (Visual Molecular Dynamics).
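The steps above can be sketched as a sequence of command lines. The snippet below only builds and prints the commands; it does not run them, and it is not the Lee Lab script. All file names are hypothetical, the 20 fs time step is an assumed value (the time step we used is not stated here), and options such as secondary-structure input for martinize and position restraints during equilibration are omitted for brevity.

```python
def build_pipeline(pdb="dimer_dencs.pdb", margin_nm=1.0):
    """Return one argument list per stage of the workflow (nothing is executed)."""
    top = "topol.top"  # hypothetical topology file name

    def grompp(mdp, coords, out):
        # Preprocess an mdp parameter file plus coordinates into a run input file.
        return ["gmx", "grompp", "-f", mdp, "-c", coords, "-p", top, "-o", out]

    return [
        # 1. All-atom PDB -> coarse-grained structure and topology (MARTINI 2.2).
        ["python", "martinize.py", "-f", pdb, "-x", "cg.pdb", "-o", top, "-ff", "martini22"],
        # 2. Box whose edges are at least margin_nm from the molecule.
        ["gmx", "editconf", "-f", "cg.pdb", "-o", "boxed.gro", "-d", str(margin_nm)],
        # 3. Solvate with a coarse-grained water box (hypothetical water.gro).
        ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "water.gro", "-o", "solvated.gro", "-p", top],
        # 4. Add ions until the system is neutral.
        grompp("ions.mdp", "solvated.gro", "ions.tpr"),
        ["gmx", "genion", "-s", "ions.tpr", "-o", "ionised.gro", "-p", top, "-neutral"],
        # 5. Energy minimisation, 6. equilibration, 7. production run.
        grompp("em.mdp", "ionised.gro", "em.tpr"),
        ["gmx", "mdrun", "-deffnm", "em"],
        grompp("eq.mdp", "em.gro", "eq.tpr"),
        ["gmx", "mdrun", "-deffnm", "eq"],
        grompp("md.mdp", "eq.gro", "md.tpr"),
        ["gmx", "mdrun", "-deffnm", "md"],
        # 8. Fit the trajectory onto the reference structure for viewing in VMD.
        ["gmx", "trjconv", "-s", "md.tpr", "-f", "md.xtc", "-o", "md_fit.xtc", "-fit", "rot+trans"],
    ]


def md_steps(total_us, dt_fs):
    """Number of integration steps needed to cover total_us microseconds."""
    return round(total_us * 1e9 / dt_fs)


ASSUMED_DT_FS = 20  # assumption: a typical coarse-grained time step
md_settings = {
    "dt": ASSUMED_DT_FS / 1000,  # mdp time step is given in ps
    "nsteps": md_steps(1.02, ASSUMED_DT_FS),
    "ref_t": 313,      # reference temperature for the thermostat (K)
    "gen_vel": "yes",  # draw initial velocities from a Maxwell-Boltzmann distribution
    "gen_temp": 313,   # temperature of that distribution (K)
}

for command in build_pipeline():
    print(" ".join(command))
print(md_settings)
```

Under the assumed 20 fs time step, 1.02 μs corresponds to 51 million integration steps; a different time step changes that count proportionally.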
Download the scripts, mdp files, and trajectory files produced in the molecular dynamics simulation, along with our final monomer, dimeric, and large oligomeric structures here.
A next step for this project would be to perform a molecular dynamics simulation of the large oligomers at both regular temperatures and under heat stress. This would test whether the oligomers are able to dissociate into dimers when under heat stress, or if further stimulus is required to initiate dissociation. A simulation could also be performed with an HSP22E oligomer in proximity to denatured proteins under heat stress, to determine if the oligomer can associate with denatured proteins without dissociating.
The dimer MD model could also be extended. Firstly, the MD simulation could be tested with different starting positions and orientations between the HSP and denatured protein. Then, further modelling could be performed with multiple dimeric units and multiple denatured proteins in the one system. This could provide information on the relative binding affinity of HSP22E to denatured protein compared with the denatured proteins’ binding affinity for each other, and provide an indication of how effectively the HSP22E dimer prevents aggregation. This could also provide insight into the various purposes of different oligomeric forms. It has been observed that some heat shock proteins, including the homologous heat shock protein 1GME, have their large oligomeric form function as a storage box to keep HSPs contained until they are needed during heat stress (at which point they dissociate into dimers), while other heat shock proteins use their larger oligomeric form as a core to which unfolded proteins attach (Rütgers et al., 2017).
Basha, E., O’Neill, H., Vierling, E. (2012). Small heat shock proteins and α-crystallins: dynamic proteins with flexible functions. Trends in Biochemical Sciences. 37(3). doi:10.1016/j.tibs.2011.11.005.
Caspers, G.-J., Leunissen, J.A.M., De Jong, W.W. (1995). The expanding small heat-shock protein family, and structure predictions of the conserved ‘α-crystallin domain’. Journal of Molecular Evolution. 40(3). doi:10.1007/BF00163229.
De Jong, W.W., Caspers, G.-J., Leunissen, J.A.M. (1998). Genealogy of the α-crystallin—small heat-shock protein superfamily. International Journal of Biological Macromolecules. 22(3–4). doi:10.1016/S0141-8130(98)00013-0.
Holbrook, K., Subramanian, C., Chotewutmontri, P., Reddick, L.E., Wright, S., Zhang, H., Moncrief, L., Bruce, B.D.(2016). Functional Analysis of Semi-conserved Transit Peptide Motifs and Mechanistic Implications in Precursor Targeting and Recognition. Molecular Plant. 9(9). pg 1286-1301. doi:10.1016/j.molp.2016.06.004.
Kunze, M., Berger, J. (2015). The similarity between N-terminal targeting signals for protein import into different organelles and its evolutionary relevance. Frontiers in Physiology. 6. doi:10.3389/fphys.2015.00259.
Rütgers, M., Muranaka, L.S., Mühlhaus, T., Sommer, F., Thomas, S., Schurig, J., Willmund, F., Schulz-Raffelt, M., Schroda, M. (2017). Substrates of the chloroplast small heat shock proteins 22E/F point to thermolability as a regulative switch for heat acclimation in Chlamydomonas reinhardtii. Plant Mol Biol. 95(6). pg 579-591. doi:10.1007/s11103-017-0672-y
Rutsdottir, G., Hallmark, J., Weide, Y., Hebert, H. (2017). Structural model of dodecameric heat-shock protein Hsp21: Flexible N-terminal arms interact with client proteins while C-terminal tails maintain the dodecamer and chaperone activity. Journal of Biological Chemistry. 292(19). doi:10.1074/jbc.M116.766816.
Stamler, R. et al. (2005) “Wrapping the α-Crystallin Domain Fold in a Chaperone Assembly,” Journal of Molecular Biology, 353(1). doi:10.1016/j.jmb.2005.08.025.
Van Montfort, R., Basha, E., Friedrich, K., Slingsby, C., Vierling, E. (2001). Crystal structure and assembly of a eukaryotic small heat shock protein. Nat Struct Mol Biol. 8(12). pg 1025–1030. https://doi.org/10.1038/nsb722
Yu, C., Leung, S.K.P., Zhang, W., Lai, L.T.F., Chan, Y.K., Wong, M.C., Benlekbir, S., Cui, Y., Jiang, L., Lau, W.C.Y. (2021). Structural basis of substrate recognition and thermal protection by a small heat shock protein. Nature Communications. 12(1). doi:10.1038/s41467-021-23338-y.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden T.L. (2009). BLAST+: architecture and applications. BMC Bioinformatics. 10:421. DOI: 10.1186/1471-2105-10-421
Notredame, C., Higgins, D.G., Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology. 302(1). pg 205-217. DOI: 10.1006/jmbi.2000.4042
Sievers F., Wilm A., Dineen D., Gibson T.J., Karplus K., Li W., Lopez R., McWilliam H., Remmert M., Söding J., Thompson J.D., Higgins D.G. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7:539 DOI: 10.1038/msb.2011.75
Yang, J., Zhang, Y. (2015). I-TASSER server: new development for protein structure and function predictions. Nucleic Acids Research. 43:W174-W181.
Zhang, C., Freddolino, P.L., Zhang, Y. (2017). COFACTOR: improved protein function prediction by combining structure, sequence and protein–protein interaction information. Nucleic Acids Research. 45:W291-W299.
Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G.R., Wang, J., Cong, Q., Kinch, L.N., Schaeffer, R.D., Millán, C., Park, H., Adams, C., Glassman, C.R., DeGiovanni, A., Pereira, J.H., Rodrigues, A.V., van Dijk, A.A., Ebrecht, A.C., Opperman, D.J., Sagmeister, T., Buhlheller, C., Pavkov-Keller, T., Rathinaswamy, M.K., Dalwadi, U., Yip, C.K., Burke, J.E., Garcia, K.C., Grishin, N.V., Adams, P.D., Read, R.J., Baker, D. (2021). Accurate prediction of protein structures and interactions using a 3-track network. Science. doi:10.1126/science.abj8754.
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Zidek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S.A.A., Ballard, A.J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., Back, T., Petersen, S., Reiman, D., Clancy, E., Zielinski, M., Steinegger, M., Pacholska, M., Berghammer, T., Bodenstein, S., Silver, D., Vinyals, O., Senior, A.W., Kavukcuoglu, K., Kohli, P., Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature. 596. pg 583-589.
Williams, C.J., Headd, J.J., Moriarty, N.W., Prisant, M.G., Videau, L.L., Deis, L.N., Verma, V., Keedy, D.A., Hintze, B.J., Chen, V.B., Jain, S., Lewis, S.M., Arendall 3rd, B.W., Snoeyink, J., Adams, P.D., Lovell, S.C., Richardson, J.S., Richardson, D.C. (2018). MolProbity: More and better reference data for improved all-atom structure validation. Protein Science 27: 293-315.
Chen, V.B., Arendall III, W.B., Headd, J.J., Keedy, D.A., Immormino, R.M., Kapral, G.J., Murray, L.W., Richardson, J.S., Richardson, D.C. (2010). MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallographica D66: 12-21.
Davis, I.W., Leaver-Fay, A., Chen, V.B., Block, J.N., Kapral, G.J., Wang, X., Murray, L.W., Arendall III, W.B., Snoeyink, J., Richardson, J.S., Richardson, D.C. (2007) MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Research 35: Web Server issue, W375-W383.
Andre, I., Bradley, P., Wang, C., Baker, D. (2007). Prediction of the structure of symmetrical protein assemblies. Proc Natl Acad Sci USA. 104(45). pg 17656-61. http://www.pnas.org/content/104/45/17656.long
Lyskov, S., Chou, FC., Conchúir, S.Ó., Der, B.S., Drew, K., Kuroda, D., Xu, J., Weitzner, BD., Renfrew, P.D., Sripakdeevong, P., Borgo, B., Havranek, J.J., Kuhlman, B., Kortemme, T., Bonneau, R., Gray, J.J., Das, R. (2013). Serverification of Molecular Modeling Applications: The Rosetta Online Server That Includes Everyone (ROSIE). PLoS One. 8(5):e63906. doi: 10.1371/journal.pone.0063906. Print 2013.
Heo, L., Park, H., Seok, C. (2013). GalaxyRefine: Protein structure refinement driven by side-chain repacking. Nucleic Acids Res. 41. W384-8. doi: 10.1093/nar/gkt458.
Baek, M., Park, T., Heo, L., Park, C., Seok, C. (2017). GalaxyHomomer: A web server for protein homo-oligomer structure prediction from a monomer sequence or structure. Nucleic Acids Research. doi:10.1093/nar/gkx246.
Apol, E., Apostolov, R., Bauer, P., Berendsen, H.J.C., Bjelkmar, P., Blau, C., Bolnykh, V., Boyd, K., van Buuren, A., van Drunen, R., Feenstra, A., Groenhof, G., Hamuraru, A., Hindriksen, V., Irrgang, M.E., Lupinov, A., Junghans, C., Jordan, J., Karkoulis, D., Kasson, P., Kraus, J., Kutzner, C., Larsson, P., Lemkul, J.A., Lindahl, V., Lundborg, M., Marklund, E., Merz, P., Meulenhoff, P., Murtola, T., Pall, S., Pronk, S., Schulz, R., Shirts, M., Shvetsov, A., Sijbers, A., Tieleman, P., Virolainen, T., Wennberg, C., Wolf, M., Zhmurov, A. (2021). GROMACS Documentation. Royal Institute of Technology.
A. Bondi. van der Waals Volumes and Radii. J. Phys. Chem. 68 (1964) pp. 441-451
MARTINIZE, script version 2.4: de Jong et al., J. Chem. Theory Comput., 2013, DOI:10.1021/ct300646g
Humphrey, W., Dalke, A., Schulten, K., (1996). VMD - Visual Molecular Dynamics. J. Molec. Graphics. 14.1. pg 33-38.
Molecular graphics and analyses performed with UCSF ChimeraX, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from National Institutes of Health R01-GM129325 and the Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases.
Pettersen, E.F., Goddard, T.D., Huang, C.C., Meng, E.C., Couch, G.S., Croll, T.I., Morris, J.H., Ferrin, T.E. (2021). UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci. 30(1). pg 70-82. doi: 10.1002/pro.3943.
The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC.
Honorato, R.V., Koukos, P.I., Jimenez-Garcia, B., Tsaregorodtsev, A., Verlato, M., Giachetti, A., Rosato, A., Bonvin, A.M.J.J. (2021). Structural biology in the clouds: The WeNMR-EOSC Ecosystem. Frontiers Molecular Biosciences 8. fmolb.2021.729513.
Van Zundert, G.C.P., Rodrigues, J.P.G.L.M., Trellet, M., Schmitz, C., Kastritis, P.L., Karaca, E., Melquiond, A.S.J., Van Dijk, M., De Vries, S.J., Bonvin, A.M.J.J. (2016). The HADDOCK2.2 webserver: User-friendly integrative modeling of biomolecular complexes. Journal of Molecular Biology. 428. pg 720-725.
The FP7 WeNMR (project# 261572), H2020 West-Life (project# 675858), the EOSC-hub (project# 777536) and the EGI-ACE (project# 101017567) European e-Infrastructure projects are acknowledged for the use of their web portals, which make use of the EGI infrastructure with the dedicated support of CESNET-MCC, INFN-PADOVA-STACK, INFN-LNL-2, NCG-INGRID-PT, TW-NCHC, CESGA, IFCA-LCG2, UA-BITP, SURFsara and NIKHEF, and the additional support of the national GRID Initiatives of Belgium, France, Italy, Germany, the Netherlands, Poland, Portugal, Spain, UK, Taiwan and the US Open Science Grid.