Team:IISc-Bangalore/Predictions

Protein Structure Prediction


On this page, we describe our efforts to model the constructs relevant to our project: C1, C2, C2(2.0) and C2(3.0). To avoid a random trial-and-error approach to genetic engineering, we sought to make more informed decisions about the composite parts by using structural models built with AlphaFold.

Structural Prediction of Fusion Proteins using AlphaFold

Since we are working with fusion proteins, it is necessary to ensure that the synthetically designed proteins fold properly, and that each modular part does not interfere with the tertiary structure of the other parts. Moreover, one of our modules is an enzyme, OpdA, which needs to adopt its native fold in order to be soluble, stable and active.

As C1 and C2 are fusion proteins, we could not tell in advance how they might fold under physiological conditions, even though the crystal structures of the individual modules had already been solved. Thus, in order to gain structural insight, we decided to use AlphaFold to predict a tertiary structure for these proteins. We hoped this would help us understand how they might fold in solution, and also help us make informed decisions about any future design steps.

C1

The amino acid sequence of C1 was fed into the AlphaFold Google Colab notebook. AlphaFold found 15537 partial matches having sequence similarity in a genetic database, which it used for a multiple sequence alignment (MSA).

Fig 1: C1 Multiple Sequence Alignment Results

It then ran its multiple machine learning models to produce a predicted structure for C1, along with some predicted confidence metrics.

Fig 2: C1 Prediction metrics - pLDDT (predicted Local Distance Difference Test) and Predicted Aligned Error

The predicted structure of C1 is given below:

Fig 3: Predicted cartoon model of C1 [N- to C- direction is blue to red]

We interpreted the pLDDT score (a per-residue measure of model confidence) and the 3D structure according to the findings in the original AlphaFold paper. Regions with pLDDT > 70 were considered likely to have the tertiary structure predicted by the model. This was true for roughly 4 stretches of amino acids: ~5-20 corresponding to the PelB signal peptide, ~70-150 covering much of SpyCatcher002, and ~190-230 and ~250-290 corresponding to the two cellulose-binding domains of dCBD. Other regions had lower pLDDT scores and were less likely to have definite tertiary structures. The AlphaFold authors have previously shown that pLDDT < 50 is a reasonably strong predictor of disorder.
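The thresholds described above can be applied programmatically to a per-residue pLDDT profile. The following is a minimal sketch, not the procedure we actually used: the function names and the toy profile are made up for illustration, and the real values would have to be read from the AlphaFold output.

```python
from typing import List, Tuple


def confident_stretches(plddt: List[float], cutoff: float = 70.0) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) residue ranges where pLDDT > cutoff."""
    stretches = []
    start = None
    for pos, score in enumerate(plddt, start=1):
        if score > cutoff:
            if start is None:
                start = pos  # a confident stretch begins here
        elif start is not None:
            stretches.append((start, pos - 1))  # stretch ended at the previous residue
            start = None
    if start is not None:
        stretches.append((start, len(plddt)))  # stretch runs to the C terminus
    return stretches


def likely_disordered(plddt: List[float], cutoff: float = 50.0) -> List[int]:
    """Return 1-based positions with pLDDT < cutoff (candidate disordered residues)."""
    return [pos for pos, score in enumerate(plddt, start=1) if score < cutoff]


# Toy profile with invented values, NOT taken from our C1 prediction.
toy_profile = [40, 45, 80, 85, 90, 60, 30, 75, 78]
print(confident_stretches(toy_profile))  # [(3, 5), (8, 9)]
print(likely_disordered(toy_profile))    # [1, 2, 7]
```

Residues scoring between 50 and 70 fall into neither list, which reflects the intermediate confidence of those regions.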

In order to gain more information, we used Expasy's ProtParam tool to look at different parameters of C1 and its constituent modules which would relate to its stability and solubility, like isoelectric point (pI), estimated half-life (t1/2), instability index (II) and grand average of hydropathicity (GRAVY).

Protein/Module      | Amino Acid Positions | pI   | t1/2     | II    | GRAVY
C1                  | 1 to 286             | 5.91 | > 10 hrs | 37.14 | -0.491
PelB signal peptide | 1 to 22              | 8.34 | > 10 hrs | 41.42 | 1.191
SpyCatcher002       | 43 to 157            | 4.51 | > 10 hrs | 28.79 | -0.554
dCBD                | 165 to 286           | 7.88 | > 10 hrs | 43.88 | -0.500
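For reference, GRAVY is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the sequence length. The sketch below reproduces that calculation by hand. The example sequence is the commonly published 22-residue PelB leader, which we assume here (it is not listed on this page) to match the signal peptide in C1; the function name is our own.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# ASSUMPTION: commonly published PelB leader sequence, not copied from this page.
PELB_LEADER_ASSUMED = "MKYLLPTAAAGLLLLAAQPAMA"
print(round(gravy(PELB_LEADER_ASSUMED), 3))  # 1.191
```

Under that assumption the result agrees with the GRAVY value reported for the PelB signal peptide in the table. A positive value indicates a hydrophobic sequence and a negative value a hydrophilic one.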

Our interpretations of the above results were as follows:

  1. Overall, C1 was predicted to be stable (II = 37.14, below the threshold of 40) and hydrophilic (GRAVY = -0.491, below 0). Its pI and t1/2 values were satisfactory too.
  2. The PelB signal peptide was predicted to be marginally unstable (II = 41.42) and highly hydrophobic (GRAVY = 1.191).
  3. SpyCatcher002 showed no issues in terms of stability (II = 28.79) or hydrophilicity (GRAVY = -0.554).
  4. dCBD was hydrophilic (GRAVY = -0.500) but was predicted to be unstable (II = 43.88).
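The two rules used in these interpretations (II < 40 for stability, GRAVY < 0 for hydrophilicity) can be written down as a small classifier. This is only an illustrative sketch: the `Module` type and `classify` function are hypothetical names, while the numbers are copied from the ProtParam table for C1.

```python
from typing import NamedTuple, Tuple


class Module(NamedTuple):
    name: str
    instability_index: float
    gravy: float


def classify(module: Module) -> Tuple[str, str]:
    """Apply the thresholds from the text: II < 40 is stable, GRAVY < 0 is hydrophilic."""
    stability = "stable" if module.instability_index < 40 else "unstable"
    hydropathy = "hydrophilic" if module.gravy < 0 else "hydrophobic"
    return stability, hydropathy


# Values copied from the ProtParam table for C1 and its modules.
C1_MODULES = [
    Module("C1", 37.14, -0.491),
    Module("PelB signal peptide", 41.42, 1.191),
    Module("SpyCatcher002", 28.79, -0.554),
    Module("dCBD", 43.88, -0.500),
]

for module in C1_MODULES:
    stability, hydropathy = classify(module)
    print(f"{module.name}: {stability}, {hydropathy}")
```

A binary cutoff hides how far a value lies from the threshold, so borderline cases such as the PelB signal peptide (II = 41.42) are best read alongside the raw numbers.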

From the above results and the AlphaFold predicted structure, our conclusions are listed below:

  1. The functional parts, viz. the PelB signal peptide, SpyCatcher002 and dCBD, have high pLDDT scores across most of their length and are likely to fold in a similar fashion to their native states.
  2. The other modules, which are mostly just linkers, have very low pLDDT scores and are most likely intrinsically disordered regions (IDRs). This is favorable since they are flexible and are not expected to cause the functional modules to interfere with each other's folding.
  3. The lysine residue (Lys72) of SpyCatcher002, which forms an isopeptide bond with SpyTag002, lies on the surface, and since the linkers are IDRs, this bond formation is not expected to be hampered by the dCBD module. Thus, we expect the activity of SpyCatcher002 and dCBD to be unaffected by each other.

Based on the above interpretations, we were confident that C1 would be stable and soluble, which we indeed verified as detailed in the Experiments and Results section.

C2

We were not able to obtain a predicted structure for C2 from AlphaFold, because it is quite large (686 amino acids) and our computational resources were insufficient. However, in the design phase we had made provisions to remove the sfGFP module from C2, since it was just a reporter and had no functional significance for our project. We thus decided to obtain a predicted structure for C2(2.0), which is just C2 minus the sfGFP module.

C2(2.0)

The amino acid sequence of C2(2.0) was fed into the AlphaFold Google Colab notebook. This time AlphaFold found 7131 partial matches, which it used for a multiple sequence alignment (MSA).

Fig 4: C2(2.0) Multiple Sequence Alignment Results

The predicted metrics were as follows:

Fig 5: C2(2.0) Prediction metrics - pLDDT (predicted Local Distance Difference Test) and Predicted Aligned Error

The predicted structure of C2(2.0) is given below:

Fig 6: Predicted cartoon model of C2(2.0) [N- to C- direction is blue to red]

The only region with pLDDT > 70 was the stretch of amino acids ~30-350, which corresponds to the majority of the OpdA enzyme module, minus around 35 amino acids at its C terminus.
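The figure of around 35 residues follows from comparing the confident stretch with the module boundaries. The sketch below works through that arithmetic, taking the OpdA enzyme as residues 28 to 384 (from the ProtParam table for C2(2.0)) and the confident stretch as approximately 30 to 350; the helper name is hypothetical.

```python
from typing import Tuple

Range = Tuple[int, int]  # 1-based inclusive (start, end)


def overlap_length(a: Range, b: Range) -> int:
    """Number of residues shared by two inclusive ranges (0 if they do not overlap)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


opda_enzyme = (28, 384)   # module boundaries from the ProtParam table
confident = (30, 350)     # approximate stretch with pLDDT > 70

module_length = opda_enzyme[1] - opda_enzyme[0] + 1
covered = overlap_length(opda_enzyme, confident)
c_terminal_tail = max(0, opda_enzyme[1] - confident[1])

print(module_length)                       # 357
print(covered)                             # 321
print(c_terminal_tail)                     # 34
print(round(covered / module_length, 2))   # 0.9
```

Since the stretch boundaries are read off the pLDDT plot by eye, these numbers are approximate.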

Again, we used Expasy's ProtParam tool to look at the same parameters for C2(2.0) and its constituent modules.

Protein/Module            | Amino Acid Positions | pI   | t1/2     | II    | GRAVY
C2(2.0)                   | 1 to 441             | 9.15 | > 10 hrs | 34.03 | -0.118
OpdA signal peptide       | 1 to 27              | 9.50 | > 10 hrs | 34.05 | 0.537
OpdA enzyme               | 28 to 384            | 8.58 | > 10 hrs | 36.23 | -0.014
C2(2.0) C terminal region | 385 to 441           | 8.34 | > 10 hrs | 17.93 | -0.991

Our interpretations of the above results were as follows:

  1. pI values of C2(2.0) and all its subparts are well above 7, II values are less than 40, and t1/2 is more than 10 hours, all of which suggest that these components should be satisfactorily stable in vivo.

  2. GRAVY values are negative for C2(2.0) (-0.118), the OpdA enzyme (-0.014) and the C terminal region (-0.991), which indicates hydrophilicity. However, the values for the whole construct and the enzyme are quite close to 0, which implies that they are barely hydrophilic. The GRAVY value for the signal peptide is positive (0.537), which suggests that it is hydrophobic.

From the above results and the AlphaFold predicted structure, our conclusions are listed below:

  1. The OpdA enzyme has high pLDDT scores across most of its length (all but around 35 amino acids at its C terminus) and is likely to fold in a similar fashion to its native state. It is not very hydrophilic though.

  2. The OpdA signal peptide at the N terminus has a very low pLDDT score and is most likely an IDR. It is also quite hydrophobic, and we realized that this could lead to aggregation of C2 and C2(2.0) since the enzyme itself is not very hydrophilic.

  3. The C terminal region containing the linkers, SpyTag and the 10X His-tag is also predicted to be an IDR, but it is quite hydrophilic and as a result, is not expected to interfere with the overall folding or stability.

Based on our interpretations of the information we collected in regard to the structure and properties of C2(2.0), we decided to remove the OpdA signal peptide from the N terminus of C2(2.0) to yield C2(3.0) in hopes of improved stability and solubility.
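Removing the N-terminal signal peptide shifts the numbering of every downstream module. The sketch below illustrates this using the module boundaries from the ProtParam table for C2(2.0). It assumes that exactly residues 1 to 27 are removed and that nothing is added in their place (for example, a new start methionine would shift the numbers by one). The placeholder sequence and function names are hypothetical.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # 1-based inclusive (start, end)


def remove_n_terminal(sequence: str, n_remove: int) -> str:
    """Drop the first n_remove residues of a sequence."""
    return sequence[n_remove:]


def renumber(modules: Dict[str, Range], n_remove: int) -> Dict[str, Range]:
    """Shift module boundaries after an N-terminal deletion; drop fully removed modules."""
    return {
        name: (start - n_remove, end - n_remove)
        for name, (start, end) in modules.items()
        if start > n_remove
    }


# Boundaries copied from the ProtParam table for C2(2.0).
C2_2_MODULES = {
    "OpdA signal peptide": (1, 27),
    "OpdA enzyme": (28, 384),
    "C terminal region": (385, 441),
}

# Placeholder sequence of the right length, NOT the real C2(2.0) sequence.
c2_2_placeholder = "S" * 27 + "E" * 357 + "C" * 57

c2_3_placeholder = remove_n_terminal(c2_2_placeholder, 27)
c2_3_modules = renumber(C2_2_MODULES, 27)

print(len(c2_3_placeholder))  # 414
print(c2_3_modules)           # {'OpdA enzyme': (1, 357), 'C terminal region': (358, 414)}
```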


C2(3.0)

Even though we had a very good idea of how C2(3.0) might fold since we just removed the N-terminal OpdA signal peptide from C2(2.0), we decided to predict its structure using AlphaFold as a double check, and also for the sake of completeness. The predicted structure of C2(3.0) is given below:

Fig 7: Predicted cartoon model of C2(3.0) [N- to C- direction is blue to red]

As we had assumed, removal of the OpdA signal peptide did not appear to change the predicted folding pattern of the rest of the fusion protein. Thus, we are reasonably confident that we will be able to verify the solubility of C2(3.0) and eventually purify it.

References

Jumper J et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug;596(7873):583-589. doi: https://doi.org/10.1038/s41586-021-03819-2
