Iterative Searching

Iterative searching will normally find more remote similarities to the query sequence than a single-sequence search, as the example below illustrates.

Iterative searching is enabled by adding the -niter option on the command line. For example:

scanps -s hahu.fa -mode 200 -niter 5 -bdb sprot > niter_test.log

will run SCANPS with 5 iterations. Examples of scans with 5 iterations that use the human alpha haemoglobin sequence as a query are shown in the files with "niter5" in their name.

An iterative search starts in just the same way as a non-iterative search: the query sequence is compared to the database, and the score list, pairwise and multiple alignment outputs are reported. The multiple alignment is then used to create a query "profile" that contains information about the types of amino acid seen at each position in the alignment. This profile is then searched against the database; a score list, pairwise and multiple alignments are output; and the process is repeated. The iterations stop either when the requested number of iterations has been reached, or when two successive iterations find exactly the same sequences.
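The control flow described above can be sketched in a few lines of Python. This is only an illustration of the two stopping rules, not SCANPS code: the `search` callable, the `iterate_search` function and the toy data are all hypothetical stand-ins, and the sketch assumes that iterations are counted from 0 so that `niter` searches are run in total.

```python
from typing import Callable, FrozenSet, List


def iterate_search(
    search: Callable[[FrozenSet[str]], FrozenSet[str]],
    niter: int,
) -> List[FrozenSet[str]]:
    """Run up to `niter` searches, stopping early when the hit set repeats.

    `search` is a hypothetical stand-in for one database scan: it receives
    the hits that make up the current profile (empty for the plain query)
    and returns the hits that pass the inclusion threshold.
    """
    history: List[FrozenSet[str]] = []
    profile_members: FrozenSet[str] = frozenset()  # iteration 0: query only
    for _ in range(niter):
        hits = search(profile_members)
        history.append(hits)
        # Stop if two successive iterations find exactly the same sequences.
        if len(history) > 1 and history[-1] == history[-2]:
            break
        profile_members = hits  # the next profile is built from these hits
    return history


# Toy data (not real search results): each round reaches one step further.
TOY_RESULTS = {
    frozenset(): frozenset({"A", "B"}),
    frozenset({"A", "B"}): frozenset({"A", "B", "C"}),
    frozenset({"A", "B", "C"}): frozenset({"A", "B", "C"}),
}


def toy_search(members: FrozenSet[str]) -> FrozenSet[str]:
    return TOY_RESULTS[members]


history = iterate_search(toy_search, niter=5)
print([len(hits) for hits in history])  # prints [2, 3, 3]
```

With the toy data the loop stops after the third search, because the second and third searches return the same set, even though five iterations were allowed.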

The key parameter that controls iterative searching is probcut2. This controls which sequences from a search will be included in the profile used for the next search.

The file hahu_202_probcut30_niter5.log shows an iterative search with human alpha haemoglobin. This file includes pairwise output, but normally one would switch this off with -aptt 0 -max_aout 0 on the command line in order to keep the output file small. In this scan, probcut2 was left at its default of 0.1, and in Iteration 0 there are 695 sequences that pass the probcut2 threshold (the fourth-column value is 0.0617 for entry 695 and 0.132 for entry 696):

 deleted lines

 674  144 22.25  2.4e-05  MYG_PHOSI        (P30562) Myoglobin            
 675  143 22.05  2.94e-05 MYG_MOUSE        (P04247) Myoglobin            
 676  142 21.85  3.6e-05  MYG_ELEMA        (P02186) Myoglobin            
 677  141 21.76  3.94e-05 GLB3_MYXGL       (P02209) Globin III           
 678  140 21.53  4.93e-05 GLBA_SCAIN       (P14821) Globin II, A chain (H
 679  135 20.56  0.00013  GLP2_GLYDI       (P21659) Globin, polymeric com
 680  136 20.53  0.000135 GLBC_CAUAR       (P80018) Globin C, coelomic   
 681  117 19.79  0.000281 HBE_MACEU        (P81042) Hemoglobin epsilon ch
 682  129 19.35  0.00044  GLB_NASMU        (P31331) Globin (Myoglobin)   
 683  128 19.06  0.000586 GLB_CERRH        (P02215) Globin (Myoglobin)   
 684  124 18.33  0.00121  GLP1_GLYDI       (P23216) Globin, major polymer
 685  121 17.72  0.00223  GLB_BUSCA        (P02214) Globin (Myoglobin)   
 686  120 17.69  0.00231  Y211_AQUAE       (O66586) Hypothetical globin-l
 687  121 17.66  0.00237  GLBA_ANATR       (P14395) Globin I alpha chain 
 688  100 16.52  0.00742  HBB_PAPAN        (Q9TSP1) Hemoglobin beta chain
 689  100 16.52  0.00742  HBB_COLGU        (Q9TT33) Hemoglobin beta chain
 690   98 15.70  0.0168   HBO_MACEU        (P81041) Hemoglobin omega chai
 691  111 15.29  0.0254   GLB3_LUMTE       (P11069) Globin III, extracell
 692  107 14.79  0.0419   GLBP_CHITH       (P11582) Globin CTT-E/E' precu
 693  106 14.73  0.0443   GLBB_RIFPA       (P80592) Giant hemoglobins B c
 694   90 14.47  0.0578   HBB_PONPY        (Q9TT34) Hemoglobin beta chain
 695  105 14.40  0.0617   GLBB_SCAIN       (P14822) Globin II, B chain (H
 696  102 13.64  0.132    GLBY_CHITP       (P18968) Globin CTT-Y precurso
 697   99 13.31  0.184    GLB_APLJU        (P14393) Globin (Myoglobin)   

 more deleted lines
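As a rough illustration of how the cutoff divides this list, the sketch below parses four of the score lines shown above and keeps those whose fourth column is below probcut2. The column interpretation, the strict "less than" comparison and the names `Hit`, `parse_score_line` and `select_for_profile` are assumptions made for the example; they are not taken from the SCANPS source.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    rank: int
    prob: float
    name: str


def parse_score_line(line: str) -> Hit:
    """Split one score-list line on whitespace.

    Assumed layout: rank, raw score, normalised score, probability,
    sequence identifier, then free-text description.
    """
    fields = line.split()
    return Hit(rank=int(fields[0]), prob=float(fields[3]), name=fields[4])


def select_for_profile(hits: List[Hit], probcut2: float = 0.1) -> List[Hit]:
    """Keep hits whose probability is below the cutoff (assumed strict)."""
    return [hit for hit in hits if hit.prob < probcut2]


# Lines copied from the Iteration 0 score list above.
SCORE_LINES = """\
 694   90 14.47  0.0578   HBB_PONPY        (Q9TT34) Hemoglobin beta chain
 695  105 14.40  0.0617   GLBB_SCAIN       (P14822) Globin II, B chain (H
 696  102 13.64  0.132    GLBY_CHITP       (P18968) Globin CTT-Y precurso
 697   99 13.31  0.184    GLB_APLJU        (P14393) Globin (Myoglobin)
"""

hits = [parse_score_line(line) for line in SCORE_LINES.splitlines() if line.strip()]
kept = select_for_profile(hits, probcut2=0.1)
print([hit.name for hit in kept])  # prints ['HBB_PONPY', 'GLBB_SCAIN']
```

Entries 696 and 697 fall outside the cutoff, which matches the count of 695 sequences quoted for Iteration 0.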

The next iteration (Iteration 1) reports 777 sequences above the probcut2 threshold. At the end of the score list is a report of which new sequences have been found and which sequences (if any) now fall below the threshold, as shown here:

# End_Scores: ------------------------------------------------------
#
# Reported in iteration 0 but below the threshold in this iteration (1)
#
# Reported in this iteration (1) but not in the previous iteration (0)
#
 689  133 30.75  4.9e-09  GLBB_ANATR (P04251) Globin I beta chain                      
 690  128 29.35  1.99e-08 GLB1_SCAIN (P02213) Globin I (Dimeric hemoglobin) (HBI)      
 692  127 29.02  2.75e-08 GLB4_LUMTE (P13579) Globin IV, extracellular (Erythrocruori  
 693  124 28.21  6.18e-08 GLB_APLKU (P02211) Globin (Myoglobin)                       
 694  124 28.20  6.28e-08 GLB1_ANABR (P02212) Globin I                                 
 695  122 27.62  1.12e-07 GLB2_ANATR (P14394) Globin IIB                               
 696  122 27.62  1.12e-07 GLB1_ARTSX (P19363) Globin E1, extracellular                 
 698  120 27.01  2.06e-07 GLBM_ANATR (P25165) Globin, minor                            
 699  119 26.78  2.6e-07  GLB_APLJU (P14393) Globin (Myoglobin)                       
 700  119 26.75  2.69e-07 GLB3_TYLHE (P13578) Globin IIB, extracellular (Erythrocruor  
 701  119 26.64  2.97e-07 GLBH_CHITP (P29242) Globin CTT-VIIB-7 precursor              
 702  129 26.51  3.41e-07 HMPA_ALCEU (P39662) Flavohemoprotein (Hemoglobin-like prote  
 703  118 26.44  3.63e-07 GLB2_LUCPE (P41261) Hemoglobin II (Hb II)                    
 704  118 26.36  3.96e-07 GLBH_CHITH (P12550) Globin CTT-VIIB-7 precursor              
 706  117 26.19  4.7e-07  GLB_DOLAU (P09965) Globin (Myoglobin)                       
 707  116 25.77  7.09e-07 GLBV_CHITP (P29243) Globin CTT-V precursor (HBV)             
 708  114 25.32  1.12e-06 GLP3_GLYDI (P21660) Globin, polymeric component P3           
 711  111 24.46  2.64e-06 GLB_APLLI (P02210) Globin (Myoglobin)                       
 712  111 24.34  2.99e-06 GLBZ_CHITH (Q23761) Globin CTT-Z precursor (HBZ)             
 714  109 23.76  5.31e-06 GLBZ_CHITP (P29245) Globin CTT-Z precursor (HBZ)             
 715  118 23.64  6.02e-06 HMPA_VIBCH (Q9KMY3) Flavohemoprotein (Hemoglobin-like prote  
 716  118 23.60  6.27e-06 HMPA_BACSU (P49852) Flavohemoprotein (Hemoglobin-like prote  

 lines deleted...

The next iteration (2) finds 801 sequences above the probcut2 threshold and the third iteration pushes this up to 802, but Iteration 4 does not change the output.
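The two lists in the End_Scores report amount to a pair of set differences between successive iterations, and an iteration that leaves both lists empty is one that "does not change the output". The sketch below shows the idea with a handful of identifiers taken from the listings in this section; the function name `compare_iterations` is hypothetical, and the sets are a tiny subset chosen for illustration rather than the full results.

```python
from typing import Set, Tuple


def compare_iterations(
    previous: Set[str], current: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (dropped, gained) between two successive iterations.

    dropped: reported previously but below the threshold now.
    gained:  reported now but not in the previous iteration.
    """
    dropped = previous - current
    gained = current - previous
    return dropped, gained


# Small subsets of the Iteration 0 and Iteration 1 hits, for illustration.
iteration0 = {"MYG_MOUSE", "GLBB_SCAIN"}
iteration1 = {"MYG_MOUSE", "GLBB_SCAIN", "GLB_APLJU", "GLBB_ANATR"}

dropped, gained = compare_iterations(iteration0, iteration1)
print(sorted(dropped))  # prints []
print(sorted(gained))   # prints ['GLBB_ANATR', 'GLB_APLJU']

# Convergence test: nothing dropped and nothing gained.
converged = not dropped and not gained
print(converged)  # prints False
```

Comparing an iteration with an identical successor gives two empty sets, which is the condition under which the iterations stop early.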

Geoff Barton (GJB) 2002-07-23