Six different secondary structure prediction methods were run on the alignments; each is briefly described here.
PHD is a three-level artificial neural network. The levels consist of a sequence-to-structure network with a window of 13 amino acids, a structure-to-structure network with a window of 17 amino acids, and finally an arithmetic average over a number of independently trained networks. The structure-to-structure network improves prediction of the final length distributions of secondary structures. The arithmetic average smooths the random noise that is seen in all artificial neural networks. The method also uses balanced training, percentage amino acid composition and conservation, sequence length, and insertions and deletions (indels) to enhance prediction accuracy.
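The final averaging level can be illustrated with a minimal sketch. It is not the PHD implementation: the function names, the three-state ordering (helix, strand, coil) and the input layout (one list of per-residue output triples per network) are assumptions made for illustration only.

```python
from typing import List, Sequence

STATES = "HEC"  # assumed state order: helix, strand, coil


def jury_average(network_outputs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Arithmetic average of per-residue outputs over several networks.

    network_outputs[n][i] is the (helix, strand, coil) output of
    network n for residue i (hypothetical layout).
    """
    if not network_outputs:
        raise ValueError("at least one network output is required")
    n_res = len(network_outputs[0])
    if any(len(net) != n_res for net in network_outputs):
        raise ValueError("all networks must cover the same residues")
    n_nets = len(network_outputs)
    return [
        [sum(net[i][k] for net in network_outputs) / n_nets for k in range(3)]
        for i in range(n_res)
    ]


def call_states(averaged: Sequence[Sequence[float]]) -> str:
    """Assign each residue the state with the highest averaged output."""
    return "".join(STATES[max(range(3), key=lambda k: row[k])] for row in averaged)


# Two hypothetical networks that disagree on the first residue.
net_a = [(0.6, 0.3, 0.1), (0.2, 0.5, 0.3)]
net_b = [(0.4, 0.1, 0.5), (0.1, 0.2, 0.7)]
averaged = jury_average([net_a, net_b])
print(call_states(averaged))
```

Because independently trained networks make partly uncorrelated errors, averaging their outputs before assigning a state damps fluctuations that any single network would pass through to the prediction.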
DSC applies GOR residue attributes, with the addition of hydrophobicity and amino acid position, which are combined with information from the multiple sequence alignment (conservation and indels). Optimal weights are deduced by linear discrimination, with filtering applied to remove erroneous predictions. This method has an advantage in that the prediction method is both implicit and effective.
NNSSP is a scored nearest neighbour method. It is based upon the environmental scoring scheme proposed by Bowie. The NNSSP method extends the Bowie method by considering N- and C-terminal positions of α-helices and β-strands. The size of the database used for scanning is also altered to reflect similarity to the query sequence, reducing computation time, and improving the final accuracy.
PREDATOR is slightly different to other methods discussed here, in that it uses an internal pairwise alignment method, rather than reading a global multiple sequence alignment. The SIM software is applied to produce local alignments between sequence pairs. The original PREDATOR algorithm is then used to predict the secondary structure segments. This algorithm also includes propensities for hydrogen bonding characteristics of β-sheets. Seven different secondary structure propensities are generated for the query sequence, with a nearest neighbour implementation applied to calculate propensities for α-helix, β-strand and coil.
ZPRED is also based on the GOR method, but with the addition of weights from calculated conservation values. The conservation value is calculated from amino acid properties as proposed by Taylor. The ZPRED method improved the accuracy of the GOR method by noting that insertions and high sequence variability tend to occur in loop regions.
MULPRED (Barton, unpublished) combines several single sequence methods to give a prediction profile, from which a consensus is taken. The methods within MULPRED are Lim, GOR, Chou-Fasman, Rose and Wilmot & Thornton turn prediction methods.
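The idea of taking a consensus from a prediction profile can be sketched as follows. MULPRED itself is unpublished, so this is only an assumed majority-vote scheme: the function names, the three-letter state alphabet (H, E, C) and the rule of resolving ties as coil are all illustrative choices, not the MULPRED algorithm.

```python
from collections import Counter
from typing import List, Sequence


def prediction_profile(predictions: Sequence[str]) -> List[Counter]:
    """Count, for each residue, how many methods predict each state."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need one or more predictions of equal length")
    return [Counter(column) for column in zip(*predictions)]


def consensus(profile: Sequence[Counter], tie_state: str = "C") -> str:
    """Majority vote per residue; ties fall back to tie_state (an assumption)."""
    states = []
    for counts in profile:
        ranked = counts.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            states.append(tie_state)  # no single winner at this position
        else:
            states.append(ranked[0][0])
    return "".join(states)


# Hypothetical per-residue predictions from four single sequence methods.
predictions = ["HHHEC", "HHEEC", "HCEEC", "CHECC"]
profile = prediction_profile(predictions)
print(consensus(profile))
```

Keeping the profile as an explicit intermediate, rather than voting directly, leaves the per-position vote counts available as a rough indication of how strongly the methods agree.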