This file defines the order in which a tree-based multiple alignment is performed. The format is FIXED and very simple. At each stage, tree-based alignment performs a pairwise alignment of two clusters, each of which is either a set of sequences that have already been aligned or a single sequence. The tree_file defines which sequences belong to each cluster at each stage of the alignment.
For example, suppose we have 5 sequences numbered 1, 2, 3, 4, 5. At each stage we align two clusters of sequences, A and B. The tree file might look like this:
The tree     (my comments - not in the file)

1            number of seqs in cluster A
3            seq 3
1            number of seqs in cluster B
4            seq 4 (A and B are now aligned)
1            number of seqs in next cluster A
1            seq 1
1            number of seqs in next cluster B
5            seq 5 (new A and B are now aligned)
2            number of seqs in next cluster A
3 4          seqs 3 and 4
2
1 5          seqs 1 and 5 (now aligned to 3 and 4)
1            .
2            .
4            .
3 4 1 5

Finally seq 2 is aligned to 3,4,1,5.
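The record structure of the example (a count, then that many sequence numbers, with clusters taken in pairs A, B) can be read with a short script. The Python sketch below is not part of MULTALIGN: the function name parse_tree is hypothetical, and it assumes the numbers are separated by whitespace, which holds for the example but would fail if adjacent fixed-width fields ever ran together.

```python
def parse_tree(text):
    """Parse tree-file text into a list of (cluster_a, cluster_b) steps.

    Assumption: every number is separated by whitespace, so the file can be
    treated as a flat stream of integers (count, members, count, members...).
    """
    tokens = [int(t) for t in text.split()]
    clusters = []
    i = 0
    while i < len(tokens):
        count = tokens[i]
        members = tokens[i + 1 : i + 1 + count]
        if len(members) != count:
            raise ValueError("tree file ends in the middle of a cluster")
        clusters.append(members)
        i += 1 + count
    if len(clusters) % 2 != 0:
        raise ValueError("clusters must come in A, B pairs")
    # Pair consecutive clusters: (A, B), (next A, next B), ...
    return list(zip(clusters[0::2], clusters[1::2]))


# The five-sequence example, without the comments.
EXAMPLE = """\
    1
    3
    1
    4
    1
    1
    1
    5
    2
    3    4
    2
    1    5
    1
    2
    4
    3    4    1    5
"""

if __name__ == "__main__":
    for cluster_a, cluster_b in parse_tree(EXAMPLE):
        print(f"align {cluster_a} with {cluster_b}")
```

Reading the file as a flat stream of integers also copes with a cluster whose members spill over several lines.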
This tree can be generated by the program ORDER, or might be input from a more sophisticated clustering program.
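Format: (1x,20i5). In Fortran format notation this means that each line holds one leading blank followed by up to 20 integers, each right-justified in a field of width 5.
Example: globin_pairs.tree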
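For anyone producing a tree file from another clustering program, the following Python sketch shows one way to lay out lines in the (1x,20i5) format. It is illustrative only: the names format_record and write_tree are hypothetical, and wrapping a cluster of more than 20 members onto continuation lines is an assumption based on how a Fortran format is reused when it runs out, not something confirmed for MULTALIGN.

```python
def format_record(values):
    """Lay out integers as (1x,20i5): one blank, then width-5 fields.

    Assumption: more than 20 values continue on further lines, each
    starting again with the single leading blank.
    """
    lines = []
    for start in range(0, len(values), 20):
        chunk = values[start : start + 20]
        lines.append(" " + "".join(f"{v:5d}" for v in chunk))
    return "\n".join(lines)


def write_tree(steps):
    """Return tree-file text for a list of (cluster_a, cluster_b) steps.

    Each cluster is written as a count line followed by its member line(s),
    mirroring the layout of the five-sequence example.
    """
    out = []
    for cluster_a, cluster_b in steps:
        for cluster in (cluster_a, cluster_b):
            out.append(format_record([len(cluster)]))
            out.append(format_record(cluster))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # The alignment order of the five-sequence example.
    steps = [([3], [4]), ([1], [5]), ([3, 4], [1, 5]), ([2], [3, 4, 1, 5])]
    print(write_tree(steps), end="")
```

The sequence numbers passed in must already refer to MULTALIGN's internal sequence order.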
Note. The sequence numbers in the tree_file refer to the sequences in the order in which they are stored internally by MULTALIGN. Normally an order_file would be used in conjunction with the tree_file so that similar sequences are clustered together on output. See the documentation on program ORDER for details of producing compatible tree and order files.