We now have a pattern and a file containing the sequences that score well against it. To see the alignment of the pattern with each sequence, we must re-run the MULTALIGN program with the print_horizontal (or print_vertical) command. This produces very verbose output.
For example, the command file bash_ge_4_scan1_top20.com:
    output_file=bash_ge_4_scan1_top20.alig
    mode = scan
    block_file/pattern=bash_ge_4.bloc,1
    matrix_file=md.mat
    database=bash_ge_4_scan1.top20seq
    print_horizontal=
produces the output shown in file bash_ge_4_scan1_top20.alig.
A far more compact form of output may be obtained by using the PRINT_HORIZONTAL/PATTERN=fname command. This produces a file in a special format for the program PATT.
    output_file=bash_ge_4_scan1_top20.out
    mode = scan
    block_file/pattern=bash_ge_4.bloc,1
    matrix_file=md.mat
    database=bash_ge_4_scan1.top20seq
    print_horizontal/pattern=bash_ge_4_scan1_top20.patt
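The verbose and compact runs differ only in the output file name and the final print_horizontal line. The following Python fragment is a minimal sketch of that difference, not part of the package: the function name build_command_lines and the one-command-per-line layout are assumptions made for illustration, and the script only builds the text of a command file without running MULTALIGN.

```python
# Settings shared by both runs, copied from the command files shown above.
COMMON = [
    "mode = scan",
    "block_file/pattern=bash_ge_4.bloc,1",
    "matrix_file=md.mat",
    "database=bash_ge_4_scan1.top20seq",
]


def build_command_lines(output_file, patt_file=None):
    """Return the lines of a command file (hypothetical helper).

    patt_file=None selects the verbose form (print_horizontal=);
    otherwise the compact print_horizontal/pattern=<file> form is used.
    One command per line is an assumption of this sketch.
    """
    if patt_file is None:
        last = "print_horizontal="
    else:
        last = f"print_horizontal/pattern={patt_file}"
    return [f"output_file={output_file}", *COMMON, last]


if __name__ == "__main__":
    verbose = build_command_lines("bash_ge_4_scan1_top20.alig")
    compact = build_command_lines(
        "bash_ge_4_scan1_top20.out", "bash_ge_4_scan1_top20.patt"
    )
    print("\n".join(verbose))
    print()
    print("\n".join(compact))
```

Keeping the shared settings in one place makes it harder for the two runs to drift apart, for example by scanning different databases.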
We can now run the program PATT on the resulting file, bash_ge_4_scan1_top20.patt:
    ------------------------ Program P A T T E R N ------------------------
    Processes FSCAN print_horizontal/pattern output
    Author: Geoff Barton
    Maximum Pattern Length: 2000
    Maximum Pattern Hits + Pattern to Display: 2000
    Enter Pattern file: bash_ge_4_scan1_top20.patt
    Enter Output file: bash_ge_4_scan1_top20.pattout
    Enter page width for horizontal output (>50, Def:132):
    Output width: 132
    Reading Pattern Description
    ---Initializing
    ---Done
    Reading Pattern Alignment
    153.14 >HZPG    141  1 1 Hemoglobin zeta chain - Pig
    152.57 >HZCZ    142  2 1 Hemoglobin zeta-1 chain - Chimpanzee
    152.57 >HZHU    141  3 1 Hemoglobin zeta chain - Human
    151.43 >HACHPE  141  4 1 Hemoglobin pi' chain - Chicken
    149.57 >HADKP   141  5 1 Hemoglobin pi' chain - Muscovy duck
    149.00 >HEMSY2  146  6 1 Hemoglobin epsilon-y2 chain - Mouse
    148.00 >HBRB3   147  7 1 Hemoglobin gamma (beta-3) chain - Rabbit
    147.29 >HBPY    146  8 1 Hemoglobin beta chain - Pigeon
    147.14 >HBFG3T  146  9 1 Hemoglobin beta chain - Bullfrog tadpole
    147.00 >HBMSH0  147 10 1 Hemoglobin beta-h0 chain - Mouse
    146.86 >HBTG    146 11 1 Hemoglobin beta chain - Australian echidna
    146.86 >HBTTP   146 12 1 Hemoglobin beta chain - Western painted turt
    146.71 >HGMQJ   146 13 1 Hemoglobin gamma chain - Japanese macaque
    146.71 >HGMQR   146 14 1 Hemoglobin gamma chain - Rhesus macaque
    146.71 >HGBAY   146 15 1 Hemoglobin gamma chain - Yellow baboon
    146.71 >HGMQP   146 16 1 Hemoglobin gamma chain - Pig-tailed macaque
    146.71 >HGHUA   146 17 1 Hemoglobin gamma chains - Human and chimpanz
    146.71 >HGMKS   146 18 1 Hemoglobin gamma chain - Spider monkey
    146.57 >HBOR    146 19 1 Hemoglobin beta chain - Duckbill platypus
    146.57 >HEGT1   147 20 1 Hemoglobin epsilon-I chain - Goat
    Sort the scores? [Y]
    Formatting Alignments
    Adding Flexible Gap Details
    Writing Vertical Format Alignment
    Writing Horizontal Format Alignment
The program has only two options: the output width for the results and whether to sort the displayed scores. Sorting is necessary if multiple patterns per sequence are calculated using the PATTERN_LEVEL=N option described below.
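To make the effect of sorting concrete, the Python fragment below reads hit lines of the kind PATT echoes to the screen and orders them by score. It is only a sketch and does not reproduce PATT's own code. The field names are guesses: the third column is assumed to be the sequence length, the fourth the position in the list, and the fifth the pattern number for that sequence. Descending order is assumed because that is how the scores appear in the listing above.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    score: float
    ident: str
    length: int       # assumed meaning: sequence length
    position: int     # assumed meaning: position in the list as read
    pattern_no: int   # assumed meaning: pattern number for this sequence
    description: str


# score, ">identifier", three integers, then free-text description
HIT_RE = re.compile(
    r"^\s*(-?\d+\.\d+)\s+>(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*\S)\s*$"
)


def parse_hit(line):
    """Return a Hit for a hit line, or None for any other line."""
    m = HIT_RE.match(line)
    if m is None:
        return None
    score, ident, length, position, pattern_no, desc = m.groups()
    return Hit(float(score), ident, int(length), int(position),
               int(pattern_no), desc)


def sorted_hits(lines):
    """Parse hit lines and return them with the highest score first."""
    hits = [h for h in map(parse_hit, lines) if h is not None]
    # sorted() is stable, so hits with equal scores keep their input order.
    return sorted(hits, key=lambda h: h.score, reverse=True)


# Three lines from the listing above, deliberately out of order,
# plus one prompt line that is not a hit and is skipped.
SAMPLE = [
    "146.71 >HGMQJ 146 13 1 Hemoglobin gamma chain - Japanese macaque",
    "153.14 >HZPG 141 1 1 Hemoglobin zeta chain - Pig",
    "152.57 >HZCZ 142 2 1 Hemoglobin zeta-1 chain - Chimpanzee",
    "Sort the scores? [Y]",
]

if __name__ == "__main__":
    for h in sorted_hits(SAMPLE):
        print(f"{h.score:7.2f} {h.ident:<8} {h.description}")
```

When several patterns are matched per sequence, hits for one sequence no longer arrive in score order, and a sort of this kind restores a single ranked list.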
The output file from the program PATT contains vertical and horizontal format multiple alignments of the pattern with all the sequences in the list. This is considerably more compact than the print_horizontal= format (file bash_ge_4_scan1_top20.alig). The vertical format output may be used to define a further pattern for database scanning.