Team:SCU-China/Protein Structure

SCU-China


1. Abstract

The efficiency of CRISPRa can be influenced by how it is constructed, so different activation domains (ADs) and different fusion orientations of the AD and the Cas protein should be evaluated systematically.

Based on the principle of σ70-based CRISPRa and published data, four candidate proteins were analyzed with AlphaFold2, SWISS-MODEL and protein-protein docking.

In this part, we planned to select candidate ADs from the N-terminal domain (NTD) of the RNA polymerase α subunit of Escherichia coli, Vibrio natriegens and Pseudomonas aeruginosa, as well as the complete RNA polymerase α subunit and σ subunit of V. natriegens (abbreviated below as E.c NTD, V.n NTD, P.aru NTD, V.n α and V.n σ). We intended to connect the AD to dCas Φ-2 through a linker, so that the AD would recruit RNA polymerase by binding its β or β' subunit.

To select the most suitable AD from these candidates, we modeled their structures with the AlphaFold2 public Colab server, SWISS-MODEL and the I-TASSER server. We then simulated protein-protein docking between each AD and the RNA polymerase β subunit using ZDOCK in Discovery Studio, followed by RDOCK refinement. Finally, we used PyMOL to compare the docking results with experimentally determined structures and identify their differences. The results suggested that the α subunit of V. natriegens RNA polymerase would give the best gene activation among the candidates.

However, the original design based on this conclusion did not perform as expected: all four ADs impeded gene expression, whereas the classic AD SoxS activated it successfully. We suspected that the fusion orientation was responsible, and both docking analysis and functional tests later confirmed that Cas-linker-AD performs better than our first design, AD-linker-Cas.

Moreover, by comparing the modeling results from the I-TASSER server, we selected a relatively more appropriate linker from the six linkers collected.

Our modeling was integrated with our design and guided the direction of our project, as shown in Figure 1.

Fig 1. Integration of the protein structure model and our design.


2. Protein Structure Modeling

Because no structures were available for the three RNA polymerase α-subunit NTDs, and many Vibrio natriegens protein structures are absent from public databases, we modeled the structures ourselves. We obtained the NTD sequences from the literature mentioned earlier and modeled each AD with both the AlphaFold2 public Colab server and SWISS-MODEL. For AlphaFold2, we used its MMseqs2 search to generate multiple sequence alignments (MSAs) to guide modeling. We built four models for each AD, three with AlphaFold2 and one with SWISS-MODEL, and then used PyMOL to compare them with the corresponding RNA polymerase subunit structure and select the most similar one.

AlphaFold2 modeling
A. Scoring of AlphaFold2 modeling results.

	E.c NTD	P.aru NTD	V.n NTD	V.n α	V.n σ
Model 1	91.98	90.59	88.65	89.74	80.99
Model 2	91.14	89.62	88.29	89.11	79.81
Model 3	90.81	89.48	88.27	88.94	79.56

Table 1. Scores of the three AlphaFold2 models for each AD.

B. Per-residue pLDDT plots. The higher the pLDDT value, the more reliable the predicted position of that residue.

Fig 2. Per-residue pLDDT of the E.c NTD model predicted by AlphaFold2

Fig 3. Per-residue pLDDT of the P.aru NTD model predicted by AlphaFold2

Fig 4. Per-residue pLDDT of the V.n NTD model predicted by AlphaFold2

Fig 5. Per-residue pLDDT of the V.n α model predicted by AlphaFold2

Fig 6. Per-residue pLDDT of the V.n σ model predicted by AlphaFold2

C. Summary
The three AlphaFold2 models of each protein differed only slightly, so we took the highest-scoring one, Model 1, as the AlphaFold2 result and compared it with the SWISS-MODEL result. The V.n σ model, however, scored noticeably lower than the others (80.99, versus 88.65 to 91.98 for Model 1 of the other proteins), so we paid particular attention to its later evaluation.
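The model-selection step above can be sketched in a few lines of Python. This is a minimal illustration, assuming the Table 1 scores are copied by hand; the 85.0 cut-off used to flag a weak prediction is a hypothetical value chosen for illustration, not one used in our analysis.

```python
# Table 1 scores for AlphaFold2 Models 1-3 of each AD (copied from the table above).
TABLE_1 = {
    "E.c NTD": [91.98, 91.14, 90.81],
    "P.aru NTD": [90.59, 89.62, 89.48],
    "V.n NTD": [88.65, 88.29, 88.27],
    "V.n α": [89.74, 89.11, 88.94],
    "V.n σ": [80.99, 79.81, 79.56],
}


def best_model(scores):
    """Return the 1-based index of the highest-scoring model."""
    return scores.index(max(scores)) + 1


def score_spread(scores):
    """Difference between the best and worst of the three models."""
    return max(scores) - min(scores)


def flag_low_scores(table, cutoff=85.0):
    """List ADs whose best model still scores below a (hypothetical) cut-off."""
    return [ad for ad, scores in table.items() if max(scores) < cutoff]


if __name__ == "__main__":
    for ad, scores in TABLE_1.items():
        print(f"{ad}: best = Model {best_model(scores)}, spread = {score_spread(scores):.2f}")
    print("flagged:", flag_low_scores(TABLE_1))
```

Model 1 wins for every AD, the spread within each AD stays below about 1.5 points, and only V.n σ falls under the illustrative cut-off.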

SWISS-MODEL modeling
Because templates are available for all of our modeling targets, we also used SWISS-MODEL for homology modeling and obtained models with high template coverage. Homology models are generally considered accurate when template coverage exceeds 60% and unreliable when it falls below 30%, so we regarded the SWISS-MODEL results as credible.
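The coverage rule of thumb quoted above can be written as a small classifier. This sketch only encodes the 60% and 30% thresholds stated in the text; the text gives no verdict for the band in between, so the "intermediate" label is our own assumption.

```python
def template_reliability(coverage_pct):
    """Classify a homology model by template coverage (in percent),
    using the 60% / 30% rule of thumb quoted in the text."""
    if not 0 <= coverage_pct <= 100:
        raise ValueError("coverage must be a percentage between 0 and 100")
    if coverage_pct > 60:
        return "accurate"
    if coverage_pct < 30:
        return "unreliable"
    # The text does not classify 30-60%; this label is an assumption.
    return "intermediate"


if __name__ == "__main__":
    for coverage in (85, 45, 20):
        print(coverage, template_reliability(coverage))
```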

RMSD value comparison
To select the most suitable structure for subsequent modeling, we used PyMOL to align the top-ranked AlphaFold2 model and the SWISS-MODEL model of each AD with the corresponding RNA polymerase subunit structure and compared the resulting RMSD values.
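For reference, the RMSD reported after an alignment is the root of the mean squared distance between paired atoms. The sketch below computes it for coordinates that are assumed to be already superposed and paired one-to-one; PyMOL's `align` additionally performs the superposition and, by default, rejects outlier atom pairs over several refinement cycles, so its number is computed over the retained atoms only. The coordinates in the example are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same unit as the input, e.g. Å) between paired, already-superposed points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)  # paired atoms
        for a, b in zip(p, q)                # x, y, z components
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical Cα coordinates; the "model" is the template shifted 1 Å along x.
    template = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
    model = [(x + 1.0, y, z) for x, y, z in template]
    print(rmsd(template, model))  # 1.0
```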

SWISS-MODEL

E.c NTD	P.aru NTD	V.n NTD	V.n α	V.n σ
0.611	0.594	1.140	0.532	0.114

Table 2. RMSD values (Å) between each AD structure modeled by SWISS-MODEL and the original α-subunit template.

Alphafold2

E.c NTD	P.aru NTD	V.n NTD	V.n α	V.n σ
1.116	1.089	1.062	1.008	2.781

Table 3. RMSD values (Å) between each AD structure modeled by AlphaFold2 and the original α-subunit template.

Generally, an RMSD below 2 Å indicates that the two compared structures are very similar. By this criterion, every SWISS-MODEL model and every AlphaFold2 model except V.n σ (2.781 Å) closely matched its template. The SWISS-MODEL structures were closer to the template for four of the five proteins (all except V.n NTD), so we used the SWISS-MODEL results throughout.
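The comparison between Tables 2 and 3 can be checked mechanically. This sketch simply copies the two tables and applies the 2 Å similarity threshold stated above; it introduces no new data.

```python
# RMSD values (Å) copied from Table 2 (SWISS-MODEL) and Table 3 (AlphaFold2).
SWISS_MODEL = {"E.c NTD": 0.611, "P.aru NTD": 0.594, "V.n NTD": 1.140, "V.n α": 0.532, "V.n σ": 0.114}
ALPHAFOLD2 = {"E.c NTD": 1.116, "P.aru NTD": 1.089, "V.n NTD": 1.062, "V.n α": 1.008, "V.n σ": 2.781}
SIMILARITY_CUTOFF = 2.0  # Å, the threshold quoted in the text


def closer_to_template(first, second):
    """ADs for which the first method's model has the lower RMSD."""
    return [ad for ad in first if first[ad] < second[ad]]


def at_or_above_cutoff(table, cutoff=SIMILARITY_CUTOFF):
    """ADs whose model does not meet the 'very similar' threshold."""
    return [ad for ad, value in table.items() if value >= cutoff]


if __name__ == "__main__":
    print("SWISS-MODEL closer for:", closer_to_template(SWISS_MODEL, ALPHAFOLD2))
    print("AlphaFold2 above 2 Å:", at_or_above_cutoff(ALPHAFOLD2))
```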

Preview of the final model structure

Fig 7. Structural comparison of the proteins predicted by SWISS-MODEL (yellow) and AlphaFold2 (magenta). From top to bottom: E.c NTD, V.n α, V.n σ.


3. Protein Docking Modeling

With the protein structures in hand, we docked them with ZDOCK in Discovery Studio and refined the docked complexes with RDOCK. By docking each AD to RNA polymerase, we can compare the ability of the different ADs to recruit RNA polymerase and select the most suitable one.

Protein docking predicts how proteins interact with and recognize each other. Many programs have been developed for protein-protein docking, such as Hex, ZDOCK, RDOCK and Rosetta. ZDOCK, developed at the University of Massachusetts Medical School, performs rigid-body docking of two protein structures. Its fast Fourier transform (FFT) correlation technique searches the translational and rotational space of the protein-protein system, and the resulting poses and pose clusters are ranked by ZDOCK or ZRANK score.
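The FFT correlation idea can be illustrated with a toy one-dimensional example. This is a minimal sketch, not ZDOCK's implementation: it uses a simple surface/interior grid encoding in the style of Katchalski-Katzir et al., hypothetical grid values and a single ligand orientation, whereas ZDOCK works on 3D grids with a richer scoring function and repeats the correlation for every sampled ligand rotation. The point is that one FFT correlation scores every translation at once, which the brute-force function confirms.

```python
import cmath


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(values):
    """Inverse FFT via conjugation: ifft(x) = conj(fft(conj(x))) / n."""
    n = len(values)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in values])]


def correlation_fft(receptor, ligand):
    """S[t] = sum_x ligand[x] * receptor[(x + t) % n], for every shift t at once."""
    r_hat = fft(receptor)
    l_hat = fft(ligand)
    product = [r * l.conjugate() for r, l in zip(r_hat, l_hat)]
    return [round(v.real, 6) for v in ifft(product)]


def correlation_brute(receptor, ligand):
    """The same scores computed directly in O(n^2), as a cross-check."""
    n = len(receptor)
    return [sum(ligand[x] * receptor[(x + t) % n] for x in range(n)) for t in range(n)]


if __name__ == "__main__":
    # Hypothetical 1D receptor: surface cells = 1, interior = -15 (penalizes overlap), empty = 0.
    receptor = [0] * 16
    receptor[4] = receptor[9] = 1
    receptor[5:9] = [-15] * 4
    ligand = [1, 1] + [0] * 14  # two-cell ligand at the origin of its own grid
    scores = correlation_fft(receptor, ligand)
    best = max(scores)
    print([t for t, s in enumerate(scores) if s == best])  # [3, 9]
```

The best shifts place the ligand against either surface cell without penetrating the interior, which is the shape-complementarity behavior the encoding is designed to reward.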

Fig 8. The two proteins docked by ZDOCK: the receptor (left) is chain C of 4LK1 from the PDB, and the ligand (right) is the V.n α protein predicted by SWISS-MODEL.

We performed the docking in Discovery Studio. We took the E. coli RNA polymerase structure 4LK1 from the PDB; because the complete complex is very large, we kept only its β subunit for docking. The β subunit was defined as the receptor and the AD as the ligand. Ligand orientations were sampled with a 15° Euler-angle step to avoid excessive redundant computation. Because our ligand proteins are relatively small, we set the clustering RMSD cutoff to 6.0 Å and the interface cutoff to 9.0 Å to obtain better clustering.
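To show what an RMSD cutoff does during pose clustering, here is a simplified greedy scheme. It is an assumption for illustration only, not Discovery Studio's documented algorithm: each pose is represented by hypothetical interface-atom coordinates (in Discovery Studio, the interface cutoff decides which residues count as interface), the best-scoring unassigned pose seeds a cluster, and it absorbs every unassigned pose within the cutoff. The cluster size is what we assume corresponds roughly to the Density column in the tables below.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two paired lists of (x, y, z) points."""
    squared = sum((a - b) ** 2 for p, q in zip(coords_a, coords_b) for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))


def greedy_cluster(poses, cutoff=6.0):
    """Greedily cluster docking poses.

    poses: list of (name, score, coords) with coords = interface atom positions.
    Returns a list of (seed_name, member_names); len(member_names) is the cluster size.
    """
    remaining = sorted(poses, key=lambda pose: pose[1], reverse=True)
    clusters = []
    while remaining:
        seed = remaining[0]  # best-scoring pose not yet assigned
        members = [pose for pose in remaining if rmsd(seed[2], pose[2]) <= cutoff]
        member_names = [pose[0] for pose in members]
        clusters.append((seed[0], member_names))
        remaining = [pose for pose in remaining if pose[0] not in member_names]
    return clusters


if __name__ == "__main__":
    poses = [  # hypothetical poses
        ("PoseA", 20.0, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
        ("PoseB", 18.0, [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]),
        ("PoseC", 15.0, [(20.0, 0.0, 0.0), (21.0, 0.0, 0.0)]),
    ]
    print(greedy_cluster(poses))  # [('PoseA', ['PoseA', 'PoseB']), ('PoseC', ['PoseC'])]
```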

In the first set of docking runs, E.c α-NTD, V.n α-NTD, P.aru α-NTD and the complete V. natriegens α subunit (V.n α) were used as ligands, with the E. coli RNA polymerase β subunit as the receptor. In the second set, the V.n σ subunit was docked as the ligand against the same β-subunit receptor.

RDOCK refines and rescores docked complexes with a CHARMm-based program, so the poses predicted by ZDOCK can be optimized and re-ranked to identify near-native structures. RDOCK uses a two-stage energy minimization scheme in which CHARMm removes steric clashes and optimizes polar and charge interactions; it then evaluates the electrostatic and desolvation energies and ranks the refined poses by a weighted sum of the two.

The RDOCK results are summarized in Table 4.

AD	PoseName	ZDOCK Score	Density	ZRANK Score	E_vdw1	E_elec1	E_vdw2	E_elec2	E_sol	E_RDock
V.n α	Pose1815	15.78	5	133.998	-135.616	-0.207266	-136.887	21.6948	38	57.5254
E.c NTD	Pose1904	16.64	6	159.819	-29.2803	-5.83375	-173.468	15.624	42.5	56.5616
P.aru NTD	Pose591	21.92	6	8.127	-59.7483	3.9362	-126.705	18.8211	19.9	36.8389
V.n NTD	Pose1631	23.06	4	104.867	82.4443	-5.0172	-204.169	2.1624	47.3	49.2462

Table 4. Highest-ranked RDOCK pose for each AD docked to the RNA polymerase β subunit. Column definitions:
E_elec1 and E_elec2: electrostatic energy of the complex after the first and second rounds of CHARMm energy optimization
E_vdw1 and E_vdw2: van der Waals non-bonded interaction energy of the complex after the first and second rounds of CHARMm energy optimization
E_sol: desolvation energy of the complex calculated by the ACE method
E_RDock: the RDOCK score, defined as E_elec2 + β × E_sol
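The E_RDock definition in the legend is a simple weighted sum. The weighting constant β is not reported in the table, so this sketch keeps it as an explicit parameter rather than assuming a value; the numbers in the usage example are hypothetical.

```python
def rdock_score(e_elec2, e_sol, beta):
    """E_RDock = E_elec2 + beta * E_sol, as defined in the Table 4 legend.

    beta weights the desolvation term; its value is not given in the table,
    so the caller must supply it.
    """
    return e_elec2 + beta * e_sol


if __name__ == "__main__":
    # Hypothetical energies and weight, for illustration only.
    print(rdock_score(e_elec2=-10.0, e_sol=4.0, beta=0.5))  # -8.0
```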

Table 4 shows that the complete V.n α subunit and the E.c α-NTD gave the best docking results, and the V.n α subunit is relatively well suited to our application. The V.n α-NTD also performed well, whereas the P.aru α-NTD and the V.n σ subunit docked less favorably. Therefore, we chose the top-scoring candidate, the N-terminal domain of the V.n α subunit, as our AD.

Fig 9. Docking result after RDOCK. Green: chain C of 4LK1 from the PDB; blue: the V.n α protein predicted by SWISS-MODEL.

To further check the docking results, we used PyMOL to compare our ZDOCK pose with the native arrangement of the α and β subunits in E. coli RNA polymerase. The RMSD was 0.081 Å, indicating that the docked pose closely reproduces the native complex and that the ZDOCK results are credible.

Fig 10. Structural comparison of our docking result with the original protein structure. Blue: chains A and C of 4LK1 from the PDB; purple: our docking result. RMSD = 0.081 Å.


4. Linker Modeling

Having identified the most suitable AD through protein docking, we next modeled how the linker should join the fusion protein. This involved choosing the attachment site between the linker and the AD and selecting the most suitable linker. We used SWISS-MODEL and the I-TASSER server to model the linker-carrying proteins and compared the candidates as described below.

To decide whether to connect the N-terminus or the C-terminus of the AD to Cas Φ, we repeated the two steps above, structure modeling and protein docking, to examine how the linker attachment site affects RNA polymerase recruitment by the AD. SWISS-MODEL could not model the linker because no homologous templates exist for it, so we used the AlphaFold2 server to rebuild the V. natriegens α-subunit AD with the linker attached at its N-terminus or its C-terminus, and docked each construct with the RNA polymerase β subunit following the same procedure.
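Preparing the two linker orientations for modeling amounts to sequence concatenation. A minimal sketch, assuming plain one-letter amino-acid strings; the AD sequence below is a hypothetical placeholder, not the real V. natriegens sequence, and GGGGS is only an example linker.

```python
def attach_linker(ad_seq, linker_seq, terminus):
    """Return the AD sequence with a linker fused at its N- or C-terminus."""
    if terminus == "N":
        return linker_seq + ad_seq  # Linker-AD, mimics Cas Φ-linker-AD
    if terminus == "C":
        return ad_seq + linker_seq  # AD-Linker, mimics AD-linker-Cas Φ
    raise ValueError("terminus must be 'N' or 'C'")


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    ad = "MKVADPLQE"   # hypothetical placeholder sequence
    linker = "GGGGS"   # example linker
    print(to_fasta("Linker-AD", attach_linker(ad, linker, "N")))
    print(to_fasta("AD-Linker", attach_linker(ad, linker, "C")))
```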
Results of Linker modeling

A. Scores of modeling results

	V.n α-Linker	Linker-V.n α
Model 1	84.08	82.71
Model 2	83.40	81.44
Model 3	83.19	81.02

Table 5. Scores of the three AlphaFold2 models for each linker-carrying AD.

B. Per-residue pLDDT plots. The higher the value, the more reliable the predicted position.

Fig 11. Per-residue pLDDT of the V.n α-Linker model predicted by AlphaFold2

Fig 12. Per-residue pLDDT of the Linker-V.n α model predicted by AlphaFold2

C. Preview of the final model structure

Fig 13. Protein structures modeled by AlphaFold2. From top to bottom: E.c αNTD-Linker to 4LK1D and Linker-E.c αNTD to 4LK1D.

RDOCK results

AD and linker	PoseName	Zdock Score	Density	ZRank Score	E_vdw1	E_elec1	E_vdw2	E_elec2	E_sol	E_RDock
Linker-V.n α To 4LK1D	Pose1134	17.8	3	39.326	-148.982	0.914272	-152.111	13.715	40.4	52.7435
V.n α-Linker To 4LK1D	Pose789	23.6	3	14.574	679.918	-6.7492	-135.066	15.4786	8.5	22.4308

Table 6. Highest-ranked RDOCK pose after docking each linker-carrying AD with the RNA polymerase β subunit. Column definitions:
E_elec1 and E_elec2: electrostatic energy of the complex after the first and second rounds of CHARMm energy optimization
E_vdw1 and E_vdw2: van der Waals non-bonded interaction energy of the complex after the first and second rounds of CHARMm energy optimization
E_sol: desolvation energy of the complex calculated by the ACE method
E_RDock: the RDOCK score, defined as E_elec2 + β × E_sol

These results show that attaching a linker at either terminus of the V.n α AD affects docking, as expected. Placing the linker at the N-terminus of the AD (Linker-V.n α, where the linker extends from the C-terminus of Cas Φ) gave a markedly better docking result than placing it at the AD C-terminus (V.n α-Linker). In other words, Cas Φ-linker-AD is the more favorable configuration.

This result prompted us to look for its cause. We first compared the two linker-carrying structures with the original V.n α structure, but all three were highly similar, with RMSD values below 2 Å. Comparing the complete complexes after RDOCK, however, revealed the reason for the difference.

Fig 14. Docking structures of the AD with the RNA polymerase β subunit after RDOCK. From top to bottom, the AD is V.n α-Linker, Linker-V.n α and V.n α. Red: RNA polymerase β subunit; blue: AD.

V.n α binds the RNA polymerase β subunit most closely, followed by Linker-V.n α. In contrast, V.n α-Linker essentially fails to dock correctly and appears merely attached to the surface of the β subunit. We therefore chose the Cas Φ-linker-AD configuration for further research.

After determining the linker attachment site, we went on to screen for the most suitable linker.

Linker	RMSD vs Cas Φ (Å)	RMSD vs AD (Å)
Flexible linkers
GSEASGSGRA	0.436	3.126
GGGGSGGGGSGGGGS	0.486	1.094
GGGGGGGG	0.425	3.139
Rigid linkers
SEAAAREAAAREAAAREAAAR	0.433	2.282
AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA	0.421	2.293
EAAAKEAAAKEAAAK	32.276	3.025

Table 7. RMSD values (Å) of fusion proteins carrying different linkers, compared with the Cas Φ and AD template structures.

To screen for the most suitable linker, we built Cas Φ-linker-AD structures with different linkers, using the Cas Φ sequence from our experiments and, as the AD, the N-terminal domain of the E. coli RNA polymerase α subunit. Models from the AlphaFold2 Colab server were of poor quality, perhaps because no experimental Cas Φ structure had been deposited, so we used the I-TASSER server for this modeling instead.

We then examined whether the different linkers disturb the independent structures of the fused domains by comparing each fusion protein with Cas Φ and the AD alone (Table 7). We selected the flexible linker GGGGSGGGGSGGGGS, whose RMSD values were 0.486 Å against Cas Φ and 1.094 Å against the AD, in line with our expectations.
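The text does not state an explicit selection rule, but one simple criterion that reproduces the choice is to minimize the worse of the two RMSD values, so that neither domain is distorted much. The sketch below applies this assumed min-max criterion to the values in Table 7.

```python
# Table 7: linker -> (RMSD vs Cas Φ, RMSD vs AD), both in Å.
TABLE_7 = {
    "GSEASGSGRA": (0.436, 3.126),
    "GGGGSGGGGSGGGGS": (0.486, 1.094),
    "GGGGGGGG": (0.425, 3.139),
    "SEAAAREAAAREAAAREAAAR": (0.433, 2.282),
    "AEAAAKEAAAKEAAAKEAAAKALEAEAAAKEAAAKEAAAKEAAAKA": (0.421, 2.293),
    "EAAAKEAAAKEAAAK": (32.276, 3.025),
}


def pick_linker(table):
    """Return the linker whose larger RMSD (worst-preserved domain) is smallest."""
    return min(table, key=lambda linker: max(table[linker]))


if __name__ == "__main__":
    for linker, (cas_rmsd, ad_rmsd) in sorted(TABLE_7.items(), key=lambda kv: max(kv[1])):
        print(f"{linker}: worst RMSD = {max(cas_rmsd, ad_rmsd):.3f} Å")
    print("selected:", pick_linker(TABLE_7))
```

Ranking by the sum of the two RMSD values would select the same linker here; the min-max form is just a conservative way to say that both domains should keep their structure.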

Fig 15. Fusion protein structure modeled by I-TASSER. Blue: Cas Φ; red: linker; green: AD.

Reference

1. Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., & Zhang, Y. (2014). The I-TASSER Suite: protein structure and function prediction. Nature Methods, 12(1), 7-8.
2. Villegas Kcam, M. C., Tsong, A. J., & Chappell, J. (2021). Rational engineering of a modular bacterial CRISPR–Cas activation platform with expanded target range. Nucleic Acids Research, 49(8).