This defines the order in which a tree-based multiple alignment is performed. The format is FIXED and very simple. At each stage, a tree-based alignment performs a pairwise alignment of two clusters, each of which is either an individual sequence or a set of sequences that have already been aligned. The tree_file defines which sequences belong to each cluster at each stage of the alignment.
For example, we may have 5 sequences 1,2,3,4,5. At each stage we are aligning two clusters of sequences A and B. The tree file might look like this.
The tree (my comments - not in the file)
1 number of seqs in cluster A
3 seq 3
1 number of seqs in cluster B
4 seq 4 (A and B are now aligned)
1 Number of seqs in next cluster A
1 seq 1
1 Number of seqs in next cluster B
5 seq 5 (new A and B are now aligned)
2 Number of seqs in next cluster A
3 4 seqs 3 and 4
2 Number of seqs in next cluster B
1 5 seqs 1 and 5 (now aligned to 3 and 4)
1 Number of seqs in next cluster A
2 seq 2
4 Number of seqs in next cluster B
3 4 1 5 Finally seq 2 is aligned to 3,4,1,5.
This tree can be generated by program ORDER, or might be input from a more sophisticated clustering program.
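The example above can be read back into a list of alignment steps with a few lines of code. The following is only a sketch, not part of MULTALIGN: the function name parse_tree is hypothetical, and it assumes the file contains nothing but whitespace-separated integers (the comments shown in the listing are not in the file).

```python
from typing import List, Tuple

# One alignment step: (sequence numbers in cluster A, sequence numbers in cluster B)
Step = Tuple[List[int], List[int]]


def parse_tree(text: str) -> List[Step]:
    """Parse tree_file contents into a list of (cluster_A, cluster_B) steps.

    Assumption: the file holds only whitespace-separated integers, laid out
    as count, members, count, members for each step.
    """
    numbers = [int(tok) for tok in text.split()]
    steps: List[Step] = []
    pos = 0
    while pos < len(numbers):
        clusters: List[List[int]] = []
        for _ in range(2):  # cluster A, then cluster B
            if pos >= len(numbers):
                raise ValueError("tree ends in the middle of a step")
            count = numbers[pos]
            members = numbers[pos + 1 : pos + 1 + count]
            if len(members) != count:
                raise ValueError("cluster has fewer members than its count")
            clusters.append(members)
            pos += 1 + count
        steps.append((clusters[0], clusters[1]))
    return steps


# The 5-sequence example from the text, without the comments.
EXAMPLE = """\
    1
    3
    1
    4
    1
    1
    1
    5
    2
    3    4
    2
    1    5
    1
    2
    4
    3    4    1    5
"""

steps = parse_tree(EXAMPLE)
for cluster_a, cluster_b in steps:
    print("align", cluster_a, "with", cluster_b)
```

Reading the numbers as a single stream rather than line by line keeps the sketch independent of how many integers appear on each line.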
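Format (1x,20i5): each line begins with one blank, followed by up to 20 integers, each right-justified in a field 5 characters wide.
Example: globin_pairs.tree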
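A tree produced by another clustering program has to be written in this layout. The sketch below mimics the (1x,20i5) format from outside Fortran. The function names format_record and write_step are hypothetical. It assumes that each count and each member list is written as its own record, as in the example above, and that lists of more than 20 numbers continue on a new line that again starts with a blank (standard Fortran format reversion).

```python
from typing import List


def format_record(values: List[int], per_line: int = 20, width: int = 5) -> List[str]:
    """Format integers as (1x,20i5): one blank, then up to 20 fields of width 5.

    Assumption: longer lists wrap onto further lines with the same layout.
    """
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start : start + per_line]
        lines.append(" " + "".join(f"{v:{width}d}" for v in chunk))
    return lines


def write_step(cluster_a: List[int], cluster_b: List[int]) -> List[str]:
    """Return the lines for one alignment step: count and members of A, then of B."""
    lines: List[str] = []
    for cluster in (cluster_a, cluster_b):
        lines += format_record([len(cluster)])
        lines += format_record(cluster)
    return lines


# Third step of the example in the text: clusters (3, 4) and (1, 5).
step_lines = write_step([3, 4], [1, 5])
print("\n".join(step_lines))
```

Fortran i5 fields are fixed-width, so the padding matters if the file is read back with the same format. List-directed or whitespace-based readers would be more forgiving.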
Note. The sequence numbers in the tree_file refer to the sequences in the order in which MULTALIGN stores them internally. Normally an order_file would be used in conjunction with the tree_file so that similar sequences are clustered together on output. See the documentation on program ORDER for details of producing compatible tree and order files.