Team:USTC-Software/Proof Of Concept

Proof of Concept

The project is supported by a number of papers:

Prediction of Secondary Structure (JPred4):

In the blind test, the average secondary structure prediction Q3 score increased to 82.0% from 81.5% for JNet v.2.0. Solvent accessibility prediction accuracy rose to 90.0%, 83.6% and 78.1% (from 88.9%, 82.4% and 77.8% for JNet v.2.0) at relative solvent accessibility thresholds of >0%, >5% and >25%, respectively[1].
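For readers unfamiliar with the metric, Q3 is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. The following is a minimal sketch of that calculation, not code from JPred4. The function name `q3_score`, the letters H/E/C used for the three states, and the example strings are assumptions made for illustration.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both strings hold one label per residue. The labels H (helix),
    E (strand) and C (coil) are an assumed encoding for this sketch.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example, not data from the JPred4 paper.
observed = "CHHHHEEECC"
predicted = "CHHHCEEECC"  # one residue (position 5) is mislabelled
print(q3_score(predicted, observed))
```

Under this definition, the reported gain from 81.5% to 82.0% corresponds to roughly one additional correctly labelled residue per 200 residues.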

Prediction of Isoelectric Point (IPC 2.0):

IPC 2.0 was compared with three current pKa prediction methods: MCCE, H++ and pKa Rosetta. Among them, the algorithm used in IPC 2.0 achieved the lowest RMSE (root mean square error) and MAE (mean absolute error)[2].
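The two error measures named above can be sketched as follows. This is an illustrative calculation only, not the evaluation code used for IPC 2.0. The function names `rmse` and `mae` and the pKa values in the example are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean square error: square root of the mean squared difference."""
    errors = [p - o for p, o in zip(predicted, observed)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(predicted, observed):
    """Mean absolute error: mean of the absolute differences."""
    errors = [p - o for p, o in zip(predicted, observed)]
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical pKa values, not taken from the IPC 2.0 benchmark.
observed_pka = [4.0, 6.5, 10.5]
predicted_pka = [4.5, 6.0, 10.5]
print(rmse(predicted_pka, observed_pka))
print(mae(predicted_pka, observed_pka))
```

RMSE penalises large individual errors more heavily than MAE, so a method that is lowest on both measures is accurate on average and also avoids large outliers.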

Prediction of Transmembrane Topology (DeepTMHMM):

TMHMM predicts transmembrane helices from single sequences with a high level of accuracy. Only about 2.5% of the 696 helices in the data set of 160 proteins are missed, and about an equal number of false helices are predicted. About 77–78% of the topologies are predicted correctly, and an additional 7% are correct except that the topology is inverted, i.e. the cytoplasmic side is predicted as periplasmic and vice versa. On the testing data, TMHMM reaches 79% accuracy with cross-validation and 84% without cross-validation[3].

Subcellular Locations (Cell-PLoc 2.0):

After the algorithm was redeveloped, the overall prediction accuracy of Cell-PLoc 2.0 improved greatly, as shown in the above figure[4].
Comparison
1. Gram-negative bacteria
PSORTb v.2.0 is a predictor widely used by biologists for predicting the subcellular locations of Gram-negative bacterial proteins. A comparison of the prediction results of Gneg-mPLoc and PSORTb v.2.0 on the test dataset from Online Support Information B is shown below.
2. Plants
TargetP is widely used by biologists for predicting the subcellular locations of plant proteins. As reported, the overall success rate of Plant-mPLoc on a testing dataset of 1,775 plant proteins (1,500 from the chloroplast and 275 from the mitochondrion) was 86%, which is more than 40% higher than that of TargetP on the same testing dataset[5].

References

[1] Alexey Drozdetskiy, Christian Cole, James Procter, Geoffrey J. Barton, JPred4: a protein secondary structure prediction server, Nucleic Acids Research, Volume 43, Issue W1, 1 July 2015, Pages W389–W394, https://doi.org/10.1093/nar/gkv332
[2] Lukasz Pawel Kozlowski, IPC 2.0: prediction of isoelectric point and pKa dissociation constants, Nucleic Acids Research, Volume 49, Issue W1, 2 July 2021, Pages W285–W292, https://doi.org/10.1093/nar/gkab295
[3] Anders Krogh, Björn Larsson, Gunnar von Heijne, Erik L.L. Sonnhammer, Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes, Journal of Molecular Biology, Volume 305, Issue 3, 2001, Pages 567–580, ISSN 0022-2836, https://doi.org/10.1006/jmbi.2000.4315
[4] Chou, K. and Shen, H. (2010) Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms. Natural Science, 2, 1090-1103. doi: 10.4236/ns.2010.210136
[5] Chou, K.C., Shen, H.B. (2010) Plant-mPLoc: A top-down strategy to augment the power for predicting plant protein subcellular localization. PLoS ONE, 5, e11335
Contact Us

Mail:
USTC_Software2021@163.com

University of Science and Technology of China, No.96, JinZhai Road Baohe District, Hefei, Anhui, 230026, P.R.China