A Protein Secondary Structure Prediction Server

About JPred

JPred is a protein secondary structure prediction server that has been in operation since approximately 1998. It incorporates the Jnet algorithm to make more accurate predictions. In addition to protein secondary structure, JPred also predicts solvent accessibility and coiled-coil regions (Lupas method).

The current version of JPred (v4) has the following improvements and updates incorporated:

  • Retrained Jnet (v2.3.1) on the latest UniRef90 and SCOPe/ASTRAL data - mean secondary structure prediction accuracy of >82%
  • Upgraded the web server to the latest technologies (Bootstrap framework, JavaScript) and updated the web pages, improving design and usability through responsive technologies.
  • Upgraded the results reporting, both on the website and through the optional email summary reports: improved batch submission, added a results summary preview through Jalview results visualization in SVG, and added full multiple sequence alignments to the reports.
  • Improved help pages, incorporating tool-tips and adding one-page step-by-step tutorials.

JPred v3 followed on from previous versions of JPred developed and maintained by James Cuff and Jonathan Barber (see References). That release added new functionality and fixed many bugs. The highlights were:

  • New, friendlier user interface
  • Retrained and optimised version of Jnet (v2) - mean secondary structure prediction accuracy of >81%
  • Batch submission of jobs
  • Better error checking of input sequences/alignments
  • Predictions now (optionally) returned via e-mail
  • Users may provide their own query names for each submission
  • JPred now makes a prediction even when there are no PSI-BLAST hits to the query
  • PS/PDF output now incorporates all the predictions

The static HTML pages of JPred 2 are still available for reference.

Data

DCPB1500 (JNet v.2.3.1 data and results)

The latest SCOPe/ASTRAL-derived data used to train Jnet v.2.3.1

All details (input sequences, DSSP, PSI-BLAST profiles, JPred/JNet results, and summary tables) are available through the following page: link.

CB513

The original training data used to train the Jnet neural networks as described in the Cuff & Barton (2000) paper.

513 non-redundant sequences that can be used to test new secondary structure prediction methods. 396 sequences are derived from the 3Dee database of protein domains, plus 117 proteins from the Rost and Sander set of 126 non-redundant proteins. All sequences in this set have been compared pairwise, and are non-redundant to a 5SD cut-off.

The file contains definitions from the DSSP, DEFINE and STRIDE definition methods. No 8 to 3 state reduction is carried out on the definition data. Each protein also has a multiple sequence alignment associated with the target sequence. This alignment was built using the automatic alignment method within JPred.

The format is a simple comma-separated values file, e.g.:

DSSP:-,-,-,G,G,G,-,-,-,E,E,E,E,E,-,-,-,H,H,H,H,H,-,

No predictions from any method are included in this file, only definitions.
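
As an illustration of the format above, the following minimal Python sketch splits one such record into a label and a list of per-residue states. The function name `parse_definition_line` is hypothetical, and the sketch assumes every record follows the `LABEL:value,value,...,` shape of the single example shown; the layout of other records in the file is not described here and may differ.

```python
def parse_definition_line(line):
    """Split a 'LABEL:s1,s2,...,' record into (label, list of states).

    Assumes the record shape of the one example shown in the text;
    other records in the real file may be laid out differently.
    """
    label, sep, body = line.strip().partition(":")
    if not sep:
        raise ValueError("expected 'LABEL:values' but found no ':'")
    states = body.split(",")
    # The example record ends with a trailing comma, which produces
    # one empty field at the end; drop it.
    if states and states[-1] == "":
        states.pop()
    return label, states


label, states = parse_definition_line(
    "DSSP:-,-,-,G,G,G,-,-,-,E,E,E,E,E,-,-,-,H,H,H,H,H,-,"
)
print(label, len(states), "".join(states))
```

Because no state reduction has been applied to the definitions, the returned list keeps the original symbols exactly as written in the file.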

CB406

The 'blind' data used to validate the CB513-trained neural networks. A final Q3 accuracy of 76.4% was achieved.
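
For readers unfamiliar with the measure, Q3 is the percentage of residues whose three-state class (helix, strand or coil) is predicted correctly. The sketch below computes it for a single pair of strings; the function name `q3` and the toy sequences are illustrative only, and whether the figure quoted above was pooled over all residues or averaged per protein is not stated here.

```python
def q3(predicted, observed):
    """Percentage of positions where the predicted state equals the observed one.

    Both arguments are equal-length sequences of three-state symbols.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(observed) == 0:
        raise ValueError("cannot compute Q3 for an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy example (not real data): 15 residues, two of them mispredicted.
observed = "---HHHHH--EEEE-"
predicted = "--HHHHHH--EEE--"
print(f"Q3 = {q3(predicted, observed):.1f}%")
```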

Both the CB513 and CB406 datasets are internally non-redundant and there is no detectable sequence redundancy between the two datasets either.

396_predictions_distribute.tar.gz

This set contains the 396 predictions as used in the Cuff J. A. and Barton G. J., Proteins. (1999) paper. The Q3 accuracies were generated by taking G and B as Helix and Strand respectively. This file is also in 'concise' comma separated format.
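
The reduction described above, with G counted as helix and B as strand, can be sketched as a simple lookup. This is an illustration rather than the code used for the paper: the name `reduce_to_three_states`, the use of `-` as the coil symbol, and the mapping of every remaining DSSP symbol (such as T, S or I) to coil are assumptions.

```python
# H and G are treated as helix, E and B as strand (as stated in the text).
# Assumption: every other symbol is mapped to coil, written here as '-'.
REDUCTION = {"H": "H", "G": "H", "E": "E", "B": "E"}


def reduce_to_three_states(states, coil="-"):
    """Map a sequence of DSSP-style symbols to helix (H), strand (E) or coil."""
    return [REDUCTION.get(symbol, coil) for symbol in states]


reduced = reduce_to_three_states("-GGGHHTTEEBS-")
print("".join(reduced))
```

Other reductions exist and give different Q3 values for the same predictions, so accuracies are only comparable when the same mapping is used.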

CASP predictions

Predictions for all the CASP targets that were not docking targets. These predictions were made during the CASP3 assessment, so all CASP3 targets are valid predictions, whereas predictions for targets from the other CASPs may be contaminated, as the prediction methods may now have those structures in their databases.

CASP profile predictions tar.gz file

BLOCK format file containing all the CASP predictions.

JPred predictions (legacy)

These are HTML-rendered predictions for the current methods that are implemented in the JPred server. The prediction accuracies for the 126 protein set will be artificially high, as all the methods had this data in their training set. Only PHD was run in cross-validated mode for this set.

The 396 domain proteins that are of the form 1edmc-1-XXXX were used to obtain the results for JPred. This subset contains no sequence homology to the 126 protein set, and achieved a Q3 accuracy of 72.9% for the consensus and 71.9% for PHD, the next best method. The 513 set shown here is composed of the 396 non-redundant set plus the 117 proteins of the 126 set whose similarity score to the proteins in the 396 set was lower than 5SD.

Jnet

The Jnet v2.3.1 C source code can be found here:
jnet_src_v.2.3.1.tar.gz

We are in the process of making JPred and Jnet available as a standalone installation (expected early 2015).

Usage Statistics

JPred historical and recent usage can be viewed here.

Primary citation: Drozdetskiy A, Cole C, Procter J & Barton GJ. Nucl. Acids Res.
(first published online April 16, 2015) doi: 10.1093/nar/gkv332 [link]
More citations: link.