
Comparing all pairs of sequences

This feature is not fully developed, but it is usable (and useful!). For pairwise comparisons, the .seq file MUST NOT contain any non-amino-acid characters or spaces in the sequence part of the file.
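One way to check a sequence before running scanps is sketched below in Python. It assumes you have already pulled the sequence part out as a string, because this page does not describe the .seq layout. It also assumes the permitted alphabet is the 20 standard one-letter amino acid codes. SCANPS may accept other letters (for example X), so adjust `ALLOWED` to match your installation. The function name `invalid_residues` is hypothetical.

```python
# Hypothetical pre-check before running scanps in pairwise mode.
# ASSUMPTION: only the 20 standard one-letter amino acid codes are allowed
# (compared case-insensitively); SCANPS may accept extra letters such as X,
# so extend ALLOWED if your installation does.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, character) for every disallowed character.

    Spaces, digits and gap symbols such as '-' are all reported.
    """
    return [
        (pos, ch)
        for pos, ch in enumerate(sequence, start=1)
        if ch.upper() not in ALLOWED
    ]


print(invalid_residues("MKT AYIAK-QR"))  # prints [(4, ' '), (10, '-')]
```

An empty list means the sequence contains only characters from `ALLOWED`.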

Having checked this, first make a copy of the .seq file with the extension .sec (for example, copy test.seq to test.sec). The .sec file can contain secondary structure definitions for the proteins, or any other characters that you want aligned with the sequences. Next, check that the value of MAX_NSEQ in your SCANPS defaults file is greater than the number of sequences in your sequence file. Then, for the example file test.seq, type:



scanps -stest.seq -ttest.sec -T

This writes the score for each pairwise comparison to standard output, which you can redirect to a file. For example:



553 HAJUA HAHOD
543 HAJUA HAHOK
475 HAJUA HAKOAW
481 HAJUA HAJSA
461 HAJUA HAFEDR
261 HAJUA HBOTE
646 HAHOD HAHOK
490 HAHOD HAKOAW
502 HAHOD HAJSA
471 HAHOD HAFEDR
306 HAHOD HBOTE
484 HAHOK HAKOAW
490 HAHOK HAJSA
461 HAHOK HAFEDR
292 HAHOK HBOTE
587 HAKOAW HAJSA
433 HAKOAW HAFEDR
269 HAKOAW HBOTE
439 HAJSA HAFEDR
274 HAJSA HBOTE
307 HAFEDR HBOTE

Each line of the output shows the score followed by the corresponding pair of ID codes.
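Each line has the fixed form "score ID ID", so the output is easy to post-process. The Python sketch below reads such lines into a dictionary keyed by the unordered pair of IDs. It also lists any pair that has no score. The function names `parse_pair_scores` and `missing_pairs` are hypothetical. Each pair is reported once, so N sequences should give N(N-1)/2 lines. The seven sequences in the example above give 21 lines, each ID listed against every ID after it.

```python
from itertools import combinations


def parse_pair_scores(lines):
    """Map frozenset({id1, id2}) -> score for lines of the form 'score id1 id2'.

    Scores are read as float, so the same parser handles raw scores (553)
    and probability values (7.2765e-88). Blank lines are skipped.
    """
    scores = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        score, id1, id2 = fields
        scores[frozenset((id1, id2))] = float(score)
    return scores


def missing_pairs(scores, ids):
    """Return every unordered pair of `ids` that has no score (ideally none)."""
    return [pair for pair in combinations(ids, 2) if frozenset(pair) not in scores]


sample = ["553 HAJUA HAHOD", "543 HAJUA HAHOK", "646 HAHOD HAHOK"]
scores = parse_pair_scores(sample)
print(scores[frozenset(("HAHOK", "HAHOD"))])  # prints 646.0
```

A frozenset key makes the lookup independent of which ID was printed first.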

Pairwise comparisons may also be performed using the NALL method. Currently, this only works if you also request probability scores. For example:



scanps -stest.seq -ttest.sec -T -a1 -F1

gives:



7.2765e-88 HAJUA HAHOD
1.0265e-85 HAJUA HAHOK
2.5983e-71 HAJUA HAKOAW
1.4448e-72 HAJUA HAJSA
2.1362e-68 HAJUA HAFEDR
2.0553e-29 HAJUA HBOTE
3.4899e-108 HAHOD HAHOK
1.868e-74 HAHOD HAKOAW
5.5253e-77 HAHOD HAJSA
1.7759e-70 HAHOD HAFEDR
1.2143e-37 HAHOD HBOTE
3.3974e-73 HAHOK HAKOAW
1.868e-74 HAHOK HAJSA
2.1362e-68 HAHOK HAFEDR
4.8684e-35 HAHOK HBOTE
3.1609e-95 HAKOAW HAJSA
1.2637e-62 HAKOAW HAFEDR
7.5967e-31 HAKOAW HBOTE
7.4395e-64 HAJSA HAFEDR
9.5138e-32 HAJSA HBOTE
7.8891e-38 HAFEDR HBOTE
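Unlike raw scores, smaller probability values indicate more significant similarity. In the output above, the HAHOD/HAHOK pair has the highest raw score (646) and also the smallest probability (3.4899e-108). The pairs involving HBOTE have the largest probabilities. The Python sketch below sorts such lines from most to least significant. It also converts each value to -log10(p), which is easier to read. The helper name `rank_by_probability` is hypothetical.

```python
import math


def rank_by_probability(lines):
    """Return (p, -log10(p), id1, id2) tuples, smallest p (most significant) first.

    Blank lines are skipped; a probability of exactly 0 is mapped to an
    infinite -log10 value instead of raising a math domain error.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        p = float(fields[0])
        neg_log = -math.log10(p) if p > 0 else math.inf
        rows.append((p, neg_log, fields[1], fields[2]))
    rows.sort(key=lambda row: row[0])
    return rows


output = [
    "7.2765e-88 HAJUA HAHOD",
    "2.0553e-29 HAJUA HBOTE",
    "3.4899e-108 HAHOD HAHOK",
]
for p, neg_log, a, b in rank_by_probability(output):
    print(f"{a:8} {b:8} p={p:.3e}  -log10(p)={neg_log:.1f}")
```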

You can also get the alignments corresponding to these pair comparisons by adding the -v command line argument.



scanps -stest.seq -ttest.sec -T -a1 -F1 -v

The output of this comparison will include the characters from the .sec file aligned along with the sequences.

The final option in pairwise mode is to output the scores in a form that can be analysed by the cluster analysis program "oc". To produce suitable output, simply add -X to the command line.

For example:



scanps -stest.seq -ttest.sec -T -X

for raw scores or:

scanps -stest.seq -ttest.sec -T -a1 -E -F1 -X

for probabilities.

The -E option is needed to stop scanps from writing the scores of all local alignments; for cluster analysis you need only the top-scoring alignment for each pair.
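Conceptually, cluster analysis works on the full set of pairwise values, arranged as a symmetric matrix. The Python sketch below illustrates that idea. It is not the oc input format; use -X for that. It expands one-line-per-pair output, like the listings above, into a full symmetric matrix. The function name `to_matrix` is hypothetical. Leaving the diagonal as None is an assumption, made because scanps reports no self-comparisons here.

```python
def to_matrix(lines):
    """Expand 'score id1 id2' lines (one per unordered pair) into a symmetric matrix.

    Returns (ids, matrix): ids are in first-seen order, matrix[i][j] is the
    score for ids[i] vs ids[j], and diagonal entries stay None.
    """
    ids, pairs = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        score, a, b = float(fields[0]), fields[1], fields[2]
        for name in (a, b):
            if name not in ids:
                ids.append(name)
        pairs.append((a, b, score))
    index = {name: i for i, name in enumerate(ids)}
    matrix = [[None] * len(ids) for _ in ids]
    for a, b, score in pairs:
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = score  # fill both triangles
    return ids, matrix


ids, matrix = to_matrix(["553 HAJUA HAHOD", "543 HAJUA HAHOK", "646 HAHOD HAHOK"])
for name, row in zip(ids, matrix):
    print(name, row)
```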


gjb@bioch.ox.ac.uk