
Pairwise Alignment

An example command file for this mode, globin_pairs.com, is shown below:


Commands                                      Explanation number
--------                                      ------------------

output_file=globin_pairs.out                  1.
mode=pairwise                                 2.
matrix_file=ampsdir:md.mat                    3.
pairwise_random=100,100,1                     4.
gap_penalty=8.0                               5.
constant = 8                                  6.
seq_file=globin.seq                           7.

  1. Command to amult: set the file for output of results to 'globin_pairs.out'. This command must ALWAYS come first. Note: the output file should NOT be given the same name as the log file.

  2. Specify that pairwise mode is to be used. If this command is not included, the program defaults to mode=multiple.

  3. Define the matrix file to be used in the comparisons. Here this is the Dayhoff mutation data matrix; any similar file containing a pair-score value for each amino acid pair may be used.

  4. Specify that 100 randomizations of each sequence pair are to be performed in order to estimate the statistical significance of the alignment obtained.

  5. Define a gap penalty of 8.0.

  6. Define a constant of 8 to be added to each value in the matrix.

  7. Specify the file containing the sequences to be aligned pairwise.
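
The command file is a list of keyword=value lines, and notes 1 and 2 above impose two rules on it. The Python sketch below reads such a file and checks those two rules. It is only an illustration: parse_command_file is a hypothetical helper, not part of AMPS, and amult's own parser may accept or reject syntax that this sketch does not.

```python
def parse_command_file(text):
    """Parse keyword=value lines into a dict (hypothetical helper, not part of AMPS)."""
    commands = {}
    keys_in_order = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected keyword=value, got {line!r}")
        key = key.strip()
        # Strip spaces around '=', since the example file uses 'constant = 8'.
        commands[key] = value.strip()
        keys_in_order.append(key)
    # Note 1: output_file must always be the first command.
    if not keys_in_order or keys_in_order[0] != "output_file":
        raise ValueError("output_file must be the first command")
    # Note 2: without a mode command, the program defaults to mode=multiple.
    commands.setdefault("mode", "multiple")
    return commands


example = """\
output_file=globin_pairs.out
mode=pairwise
matrix_file=ampsdir:md.mat
pairwise_random=100,100,1
gap_penalty=8.0
constant = 8
seq_file=globin.seq
"""

cmds = parse_command_file(example)
print(cmds["mode"], cmds["constant"], cmds["pairwise_random"])  # prints: pairwise 8 100,100,1
```

Values are kept as strings because the meaning of some fields (for example the second and third numbers of pairwise_random) is not described here.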

This command file causes all pairwise comparisons to be performed on the sequences in the file 'globin.seq'. For the 7 sequences in this file, sequence 1 is aligned with 2, 1 with 3, and so on, up to 6 with 7. In general, N sequences require N*(N-1)/2 comparisons (21 for the 7 globin sequences).
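
The comparison count can be checked by enumerating the pairs directly. This Python sketch assumes the pairs are taken in the order 1-2, 1-3, ..., (N-1)-N, as the text suggests; all_pairs is a hypothetical helper name.

```python
from itertools import combinations


def all_pairs(n):
    """Return every (i, j) with 1 <= i < j <= n, in the order 1-2, 1-3, ..., (n-1)-n."""
    return list(combinations(range(1, n + 1), 2))


pairs = all_pairs(7)
print(len(pairs))   # prints 21, i.e. 7*6/2
print(pairs[:3])    # prints [(1, 2), (1, 3), (1, 4)]
```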

For each sequence pair (e.g. 1 and 2), a full Needleman and Wunsch sequence comparison is performed. The sequences are then shuffled and recompared (in this case 100 times) to find the distribution of scores that would be expected if the sequences were unrelated but had the same lengths and compositions as 1 and 2. Various statistics on the comparisons are then written to the specified output_file (globin_pairs.out).
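
The compare-then-shuffle procedure can be sketched in Python as follows. This is not amult's code. It uses the common textbook form of Needleman-Wunsch with a linear (per-residue) gap penalty and penalised end gaps, and a hypothetical toy_score function in place of a real pair-score matrix such as md.mat. amult's own gap and end-overhang treatment may differ. Shuffling a sequence keeps its length and composition, which is the point of the randomization.

```python
import random
import statistics


def toy_score(x, y):
    """Hypothetical stand-in for a pair-score matrix such as md.mat."""
    return 2 if x == y else -1


def nw_score(a, b, score, gap):
    """Needleman-Wunsch global alignment score with a linear (per-residue) gap penalty.

    Only the score is computed (no traceback), keeping one DP row at a time.
    """
    m = len(b)
    prev = [-gap * j for j in range(m + 1)]  # row 0: b aligned against leading gaps
    for i in range(1, len(a) + 1):
        cur = [-gap * i] + [0] * m
        for j in range(1, m + 1):
            cur[j] = max(prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                         prev[j] - gap,                              # gap in b
                         cur[j - 1] - gap)                           # gap in a
        prev = cur
    return prev[m]


def shuffle_significance(a, b, score, gap, n_random=100, seed=0):
    """Compare the real score with scores of shuffled copies of both sequences."""
    rng = random.Random(seed)
    real = nw_score(a, b, score, gap)
    randoms = []
    for _ in range(n_random):
        sa, sb = list(a), list(b)
        rng.shuffle(sa)  # same length and composition, order destroyed
        rng.shuffle(sb)
        randoms.append(nw_score(sa, sb, score, gap))
    mean = statistics.mean(randoms)
    sd = statistics.stdev(randoms)  # requires n_random >= 2
    z = (real - mean) / sd if sd > 0 else float("nan")  # nan if all random scores are equal
    return real, mean, sd, z


seq = "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMF"  # arbitrary test string
real, mean, sd, z = shuffle_significance(seq, seq, toy_score, gap=2)
print(real, round(mean, 1), round(sd, 1), round(z, 1))
```

Aligning a sequence with itself gives the maximum possible score, so the shuffled scores fall well below it and z is large and positive.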

The output file from this run contains the following information: a banner giving the program name, source and limitations of use; the maximum sequence length and maximum number of sequences allowed; the commands given to the program (files, etc.); a list of the sequences defined in the sequence file; and, finally, the numerical results of the run, in a series of fields as follows.



Field               Description
-----               -----------
  1 and 2           Numbers identifying the two sequences aligned on this row.
  3 and 4           The lengths of the sequences given in fields 1 and 2.
  5                 The match score obtained for the comparison of the two
                    sequences.
  6                 The number of internal gaps in the alignment (overhangs
                    at the ends are not counted).
  7                 The number of positions at which two amino acids are
                    aligned.
  8                 The number of positions at which identical amino acids
                    are aligned.
  9                 Percentage identity: (8/7)*100.
 10                 Normalised alignment score: the match score divided by
                    the number of aligned positions, times 100: (5/7)*100.
 11                 Alternative normalised alignment score: the match score
                    minus the number of gaps times the gap penalty, all
                    divided by the number of aligned positions, times 100:
                    ((5 - 6*gap_penalty)/7)*100.
 12                 Number of randomizations performed.
 13                 Mean score of the randomizations.
 14                 Standard deviation of the random scores.
 15                 Significance score for the alignment: the match score
                    minus the mean random score, all divided by the
                    standard deviation: (5-13)/14.
 16                 Comparison number.
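
The arithmetic behind the derived fields 9, 10, 11 and 15 can be written out as a small Python sketch. The field numbers come from the table above. The class and function names are hypothetical, and the numbers in the usage line are invented for illustration, not taken from a real run. The sketch assumes that field 11 is ((5 - 6*gap_penalty)/7)*100, the reading consistent with field 10, and that field 15 is the usual z-score, (match score - mean random score)/standard deviation, so larger values indicate a more significant alignment.

```python
from dataclasses import dataclass


@dataclass
class PairResult:
    score: float        # field 5: match score
    gaps: int           # field 6: internal gaps
    aligned: int        # field 7: aligned positions
    identical: int      # field 8: identical aligned positions
    random_mean: float  # field 13: mean of the randomized scores
    random_sd: float    # field 14: standard deviation of the randomized scores


def derived_fields(r, gap_penalty):
    """Return fields 9, 10, 11 and 15 (assumed formulas, see lead-in)."""
    percent_identity = 100.0 * r.identical / r.aligned                    # field 9
    norm_score = 100.0 * r.score / r.aligned                              # field 10
    alt_norm_score = 100.0 * (r.score - r.gaps * gap_penalty) / r.aligned  # field 11
    significance = (r.score - r.random_mean) / r.random_sd                # field 15
    return percent_identity, norm_score, alt_norm_score, significance


# Invented example values, with gap_penalty=8.0 as in globin_pairs.com.
r = PairResult(score=500, gaps=2, aligned=100, identical=40, random_mean=300, random_sd=20)
print(derived_fields(r, gap_penalty=8.0))  # prints (40.0, 500.0, 484.0, 10.0)
```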

This output file is in the correct format for input to the program ORDER. However, the alignments themselves are not written out. If the alignments are required, you must include the command line(s)

print_horizontal= (for horizontal format)

print_vertical= (for vertical format)

Note that a file containing alignments cannot be read directly by ORDER; the alignments would first have to be deleted using a text editor.





gjb@bioch.ox.ac.uk