
The Barton Group

A method for protein secondary structure prediction


Frequently Asked Questions

Q. What is Jpred?

A. Jpred is a web server that takes a protein sequence or a multiple alignment of protein sequences and predicts secondary structure from it using a neural network called Jnet. The prediction assigns each residue to one of three secondary structure states: alpha helix, beta sheet or random coil.
Q. How does it work?

A. The server runs in two modes: single sequence and multiple sequence.

  1. Multiple sequence: If you already have a set of aligned sequences, you may submit them in either MSF or BLC format, and the predictions will run. Your alignment will be modified so that it does not contain gaps in the first sequence. The first sequence should therefore always be your target sequence.
  2. Single sequence: For a single sequence, a multiple alignment is constructed using the PSI-BLAST algorithm with 3 iterations. Redundant sequences are removed, and any gaps that have appeared in the query sequence are removed along with the corresponding aligned positions in the other sequences. The prediction algorithms are then run. More detail can be found in the technical detail section.
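
The gap-removal step described in both modes can be illustrated with a short Python sketch. This is not the server's own code: the function name `strip_query_gaps` and the choice of '-' and '.' as gap characters are assumptions made for illustration. It drops every alignment column in which the first (target) sequence has a gap.

```python
def strip_query_gaps(alignment, gap_chars="-."):
    """Drop every column where the first (query) sequence has a gap.

    `alignment` is a list of equal-length strings, query first.
    The gap characters are an assumption, not taken from Jpred.
    """
    if not alignment:
        return []
    query = alignment[0]
    if any(len(seq) != len(query) for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Column indices where the query has a real residue.
    keep = [i for i, ch in enumerate(query) if ch not in gap_chars]
    return ["".join(seq[i] for i in keep) for seq in alignment]


aligned = [
    "MK-TAY",  # query (target) sequence with one gap
    "MKQTA-",
    "MR-SAY",
]
print(strip_query_gaps(aligned))  # ['MKTAY', 'MKTA-', 'MRSAY']
```

Note that gaps in the other sequences are kept; only columns that are gapped in the query are removed, so the result has exactly one column per query residue.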
Q. What do I need to run Jpred?

A. Jpred should work with any WWW browser that supports tables and automatic forwarding. However, if you want to use Jalview you will need a browser that supports Java 1.1; this includes most modern browsers.
Q. Are there any example files?

A. On the submission form, next to where you select the data format, there is a question mark. It is a hyperlink to examples of the formats that have been tested with Jpred.

Q. How long do I have to wait?

A. The length of time you have to wait for your prediction to finish depends on your position in the queue. Once your prediction has started, it has 1 hour to complete. The length and number of sequences in the alignment will affect the run time. If the server takes any longer than one hour to complete your request, the job will be killed so that other requests in the queue can run, and a message stating that your job has been terminated will be sent to you. If you have a very long sequence to analyse, check whether it can be split into domains, as this will probably allow the job to finish in time. Requests for analysis of very long sequences may be sent to us (www-jpred@compbio.dundee.ac.uk), and we may be able to run your request at a lower priority.

If a format error in your input is not caught by the server's checks, the job will fail. There are error checking routines, but they can be fooled, so it is always best to double check your input and then try again.

The best way to predict how long your job will take is to assume that each job in front of you in the queue will take one hour; that way, if you get your results earlier, you will be pleasantly surprised.
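
As a worked example of that rule of thumb, the following Python sketch computes the pessimistic estimate. It assumes that jobs run one at a time and that each uses the full one hour limit; the function name `worst_case_hours` is hypothetical and not part of the server.

```python
def worst_case_hours(jobs_ahead, limit_hours=1.0):
    """Return (hours until your job starts, hours until it must have ended).

    Assumes one job runs at a time and every job, including yours,
    takes the full time limit before finishing or being killed.
    """
    if jobs_ahead < 0:
        raise ValueError("jobs_ahead cannot be negative")
    start = jobs_ahead * limit_hours
    finish = start + limit_hours
    return start, finish


# With 3 jobs in front of you: start within 3 hours, done within 4.
print(worst_case_hours(3))  # (3.0, 4.0)
```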

Q. I would like a copy of the server to run locally, where can I download it?

A. Sorry. The source is in no state to be released due to general ickiness. However, we may get round to making the source less horrific and releasing it. Drop us a line, and if enough people request a release we will consider it.

Q. What are all those files in the RAW directory?

A. The files in the RAW directory are those that were used in the process of predicting your sequence. Some are very useful: for example, the *.concise files are easy-to-parse files that contain your alignment and predictions. The .pair file is an all-against-all pairwise comparison of the sequences in your prediction.

Q. That Java viewer is neat, can I have my own copy?

A. The viewer and editor, Jalview, was written by Michele Clamp. The source code and the program are available from here.

Q. Who sees the data I submit?

A. Nobody will look at your sequence data, unless there is a specific bug or problem. However, in order to monitor the server, we maintain statistics on the use of the server.

Q. What is the ID used for?

A. This is an identifier that allows some degree of security with the server. Unless you know the ID code, you can't look at any other requests on the server. It's akin to a very weak password mechanism that allows you to come back to your data at a later date without anyone else being able to see it.

Q. Are there any known bugs?

A. Sequences longer than 800 residues are not allowed. This is because most protein domains are less than 800 residues in length, and so there is no point in predicting longer sequences. If you really think that a domain is likely to be this long, chop the sequence into 800 residue chunks and submit them separately.
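
Chopping a sequence into 800 residue chunks can be sketched in a few lines of Python. The function name `split_into_chunks` is hypothetical, and the cuts are made purely by position, so a cut may fall in the middle of a structural element; splitting at known domain boundaries, where you have them, is likely to be a better choice.

```python
def split_into_chunks(sequence, max_len=800):
    """Split a sequence into consecutive pieces of at most max_len residues."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [sequence[i:i + max_len] for i in range(0, len(sequence), max_len)]


# A made-up 1900 residue sequence gives chunks of 800, 800 and 300.
chunks = split_into_chunks("A" * 1900)
print([len(chunk) for chunk in chunks])  # [800, 800, 300]
```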

A. PSI-BLAST seems to truncate some query sequences it gets, and so sometimes you may get a shorter protein back than you were expecting. This is a PSI-BLAST problem, and not something we can work around.

If you see any other strange behaviour don't hesitate to get in touch with the service, and we'll try our best to repair or explain the problem. Alternatively, the mailing list may contain information of use.

Q. Where did Jpred come from?

A. The server is the result of a large-scale comparative analysis of secondary structure prediction algorithms (Cuff & Barton, 1999). We developed flexible software to standardise the input and output requirements of the 6 prediction algorithms.

During the analysis, we also found that we needed a quick and effective way to build automatic alignments that would still give good predictions. This led to the development of the automatic method for building multiple sequence alignments and redundancy filters that are used in Jpred.

These methods were then replaced by a neural network program called Jnet (Cuff & Barton, 2000), which achieves the same accuracy as the combined prediction programs, and this is what is currently used.

Q. What do the characters 'E', 'H' and '-' represent in the prediction?

A. They represent extended, helical and other types of structure respectively.
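
To make the key concrete, here is a minimal Python sketch that turns a prediction string into a list of labelled segments. The example string is made up, and the names `STATE_NAMES` and `segments` are hypothetical; only the meaning of the three characters comes from the answer above.

```python
from itertools import groupby

# Meaning of each prediction character, as given in the FAQ.
STATE_NAMES = {"E": "extended", "H": "helical", "-": "other"}


def segments(prediction):
    """Return (state name, start, end) for each run of identical characters.

    Positions are 1-based and inclusive. Any character other than
    'E', 'H' or '-' raises KeyError.
    """
    result = []
    pos = 1
    for state, run in groupby(prediction):
        length = len(list(run))
        result.append((STATE_NAMES[state], pos, pos + length - 1))
        pos += length
    return result


for name, start, end in segments("--HHHH--EEE-"):
    print(f"{start:>3}-{end:<3} {name}")
```

For the made-up string above, this reports a helical segment at positions 3 to 6 and an extended segment at positions 9 to 11, with the remaining positions labelled as other.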

Q. What do the colours mean in the PostScript/PDF output from Jpred?

A. The PostScript is generated by Alscript. The code to carry out the colouring and format the alignment as Alscript input data was written by Rob Russell and Rich Copley. The columns are calculated by allowing up to 2 invariant positions within each column. The colours correspond to:

Colour          Meaning
Pale blue       hydrophobic
Pale green      conserved polar
Small letters   small residues
Red             fully conserved
Blue text       Proline
Red text        Histidine
Boxed           Aliphatic (L, I or V)
Yellow          Cysteine

The 'props' line indicates which residue or property is conserved, using the following key:

Character   Property
p           conserved polar
h           conserved hydrophobic
+           conserved positive charge
s           conserved small residues