Proof of Concept
The project is supported by a number of papers:
On the blind test, the average secondary structure prediction Q3 score increased to 82.0% from 81.5% for JNet v.2.0, and solvent accessibility prediction accuracy rose to 90.0%, 83.6% and 78.1% from 88.9%, 82.4% and 77.8% for JNet v.2.0 at relative solvent accessibility thresholds of >0%, >5% and >25%, respectively.
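For reference, Q3 is the percentage of residues whose three-state secondary structure label (helix H, strand E or coil C) is predicted correctly. A minimal Python sketch of that calculation follows. The function name `q3_score` and the example strings are illustrative assumptions, not code or data from JPred4.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings over the alphabet H (helix), E (strand)
    and C (coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example (not taken from any benchmark).
example_q3 = q3_score("HHHHCCEEEC", "HHHCCCEEEE")
print(f"Q3 = {example_q3:.1f}%")
```

A benchmark-level Q3 such as the one quoted above is usually an average of this per-protein value over the whole test set.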
Compared with three current pKa prediction methods (MCCE, H++ and pKa Rosetta), the algorithm used in IPC 2.0 achieved the lowest RMSE (root mean square error) and MAE (mean absolute error).
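The two error measures named here have standard definitions: RMSE is the square root of the mean squared difference between predicted and reference values, and MAE is the mean of the absolute differences. A small Python sketch, assuming paired lists of predicted and experimental pKa values, is given below. The function names and the numbers are hypothetical and are not taken from the IPC 2.0 benchmark.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between paired predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute error between paired predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(p - o) for p, o in zip(predicted, observed)]
    return sum(absolute) / len(absolute)


# Hypothetical pKa values for three residues (illustration only).
predicted_pka = [4.0, 6.5, 10.0]
experimental_pka = [4.5, 6.5, 11.0]

example_rmse = rmse(predicted_pka, experimental_pka)
example_mae = mae(predicted_pka, experimental_pka)
print(f"RMSE = {example_rmse:.3f}, MAE = {example_mae:.3f}")
```

Because RMSE squares each difference, it penalizes large individual errors more heavily than MAE, which is why both are commonly reported together.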
TMHMM predicts transmembrane helices from single sequences with a high level of accuracy. Only about 2.5% of the 696 helices in the data set of 160 proteins are missed, and about an equal number of false helices are predicted. About 77–78% of the topologies are predicted correctly, and an additional 7% are correct except that the topology is inverted, i.e. the cytoplasmic side is predicted as periplasmic and vice versa. On the testing data, TMHMM’s cross-validated accuracy is 79%, while non-cross-validated TMHMM reaches 84% accuracy.
After the algorithm was redeveloped, the overall prediction accuracy of Cell-PLoc 2.0 improved greatly, as shown in the above figure.
1. Gram-negative bacteria: PSORTb v.2.0 is a predictor widely used by biologists for predicting the subcellular locations of Gram-negative bacterial proteins. A comparison of the prediction results between Gneg-mPLoc and PSORTb v.2.0 on the Online Support Information B test dataset is shown below.
2. Plant: TargetP is widely used by biologists for predicting the subcellular locations of plant proteins. As reported, the overall success rate of Plant-mPLoc on a testing dataset of 1,775 plant proteins, of which 1,500 are chloroplast proteins and 275 are mitochondrial proteins, was 86%, which is more than 40% higher than that of TargetP on the same testing dataset.
Drozdetskiy, A., Cole, C., Procter, J. and Barton, G.J. (2015) JPred4: a protein secondary structure prediction server. Nucleic Acids Research, 43(W1), W389-W394. https://doi.org/10.1093/nar/gkv332
Kozlowski, L.P. (2021) IPC 2.0: prediction of isoelectric point and pKa dissociation constants. Nucleic Acids Research, 49(W1), W285-W292. https://doi.org/10.1093/nar/gkab295
Chou, K.C. and Shen, H.B. (2010) Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms. Natural Science, 2, 1090-1103. https://doi.org/10.4236/ns.2010.210136
Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology, 305(3), 567-580. https://doi.org/10.1006/jmbi.2000.4315
Chou, K.C. and Shen, H.B. (2010) Plant-mPLoc: a top-down strategy to augment the power for predicting plant protein subcellular localization. PLoS ONE, 5, e11335.