For each of the 480 training-set sequences, a multiple sequence alignment was constructed. For comparison, both BLAST and PSIBLAST were used to search the SWALL [43] non-redundant protein sequence database, with a p-value cutoff of 0.0001. For PSIBLAST, three iterations were applied. The method described previously [34] was then applied to the sequences found to generate multiple sequence alignments. To compare the effect of different multiple sequence alignment methods, both AMPS [42] and CLUSTALW [44] were used. CLUSTALW [44] was executed with default parameters, while AMPS [42] was run with a BLOSUM62 matrix and a gap penalty of 10.
The alignments were represented as profiles for input to the neural network and the profiles were scored in three ways:
Figure 2 summarises an attempt to improve the alignments obtained from PSIBLAST by post-processing the results of the PSIBLAST search. As shown in Figure 2, full-length sequences were taken from the PSIBLAST search, and the alignment was then constructed by successive global alignments to the profile, adding sequences in the order determined by their p-values from the initial PSIBLAST search. At each iteration, the ends of the alignment were trimmed to force the global alignment method to represent the query sequence.
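The ordering and trimming steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names `order_hits_by_pvalue` and `trim_to_query`, the representation of hits as (identifier, p-value) pairs, the inclusive treatment of the cutoff, and the use of `-` as the gap character are all assumptions made for illustration. The global alignment step itself is not shown.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap character


def order_hits_by_pvalue(
    hits: List[Tuple[str, float]], cutoff: float = 1e-4
) -> List[Tuple[str, float]]:
    """Keep hits with p-value <= cutoff, most significant first.

    Treating the cutoff as inclusive is an assumption of this sketch.
    """
    kept = [hit for hit in hits if hit[1] <= cutoff]
    return sorted(kept, key=lambda hit: hit[1])


def trim_to_query(alignment: List[str], query_index: int = 0) -> List[str]:
    """Remove leading and trailing columns where the query row is gapped.

    Internal gaps in the query are left untouched; only the ends of the
    alignment are trimmed so that it spans the query sequence.
    """
    query = alignment[query_index]
    residue_columns = [i for i, char in enumerate(query) if char != GAP]
    if not residue_columns:
        raise ValueError("query row contains no residues")
    start, end = residue_columns[0], residue_columns[-1] + 1
    return [row[start:end] for row in alignment]
```

In an iterative scheme of the kind shown in Figure 2, `trim_to_query` would be called after each sequence is added, so that overhanging ends of longer database sequences do not extend the alignment beyond the query.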
In addition to the method summarised in Figure 2, each of the PSIBLAST alignments was also represented by the profiles in the PSIBLAST report file. Two profiles were extracted: the simple frequency counts (denoted in the PSIBLAST report as position characters, multiplied by 10 and rounded) and the profile denoted as the position-based scoring matrix.
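As an illustration of a frequency-count profile scaled by 10 and rounded, the following Python sketch computes such a profile directly from an alignment. It does not parse a PSIBLAST report and is not the code used here. The function name `frequency_profile`, the 20-letter residue ordering, the choice to compute frequencies over non-gap residues only, and the round-half-up rule are assumptions.

```python
import math
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed residue ordering


def frequency_profile(alignment: List[str], scale: int = 10) -> List[List[int]]:
    """Return one row of scaled, rounded residue frequencies per column.

    Frequencies are fractions of the standard residues in each column
    (gaps and non-standard characters are ignored), multiplied by
    `scale` and rounded half up.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for column in zip(*alignment):
        counts = Counter(char for char in column if char in AMINO_ACIDS)
        total = sum(counts.values())
        if total == 0:
            # Column with no standard residues: all frequencies are zero.
            profile.append([0] * len(AMINO_ACIDS))
            continue
        profile.append(
            [math.floor(scale * counts[aa] / total + 0.5) for aa in AMINO_ACIDS]
        )
    return profile
```

Each row of the returned profile corresponds to one alignment position and could serve as a 20-value input vector for that position.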