Generating the local alignments

We now have the 15 top-scoring sequences in a file called sh2.top15.seq. We can re-run scanps on this file to generate the alignments.

Since scanps is able to find many local alignments between the query sequence and the database, it is necessary to set a cutoff score. Otherwise the output will contain thousands of insignificant alignments in addition to those that are useful. A suitable value for the cutoff score will depend on the search you have completed, but values of 80-100 make a good starting point.

To illustrate the NALL alignment feature of scanps, I have replaced the 15th sequence (TVFVS1) with the sequence "S01966 GTPase-activating protein - bovine".

For example, we can now type:



scanps -ssh2.seq -a1 -c90 -d < sh2.top15.seq > sh2.top15.alig

We are using the sh2.seq sequence to scan the sh2.top15.seq file. The -a1 option means "generate alignments", -c90 sets the cutoff score to 90, and -d means read the database from standard input, which in this example is redirected from the file sh2.top15.seq. The alignments are written to the file sh2.top15.alig.

The output of this command looks like this:


---------------------------
Comparison with: TVHUSC protein-tyrosine kinase (EC 2.7.1.112) src - human    538 Residues
Raw Score: 497.0 TVHUSC Allen: 97 Score/Allen: 5.123711
      **************************************************
    1 WYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGL    50
  151 WYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGL   200

      ***********************************************
   51 NVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTV    97
  201 NVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTV   247
---------------------------


         13 BORING ALIGNMENTS DELETED


---------------------------
Comparison with: S01966 GTPase-activating protein - bovine   1046 Residues
Raw Score: 171.0 S01966 Allen: 90 Score/Allen: 1.900000
      ** **. *  .*  * .* .. *..*.***.   *.. . .* * .   .
    1 WYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGL    50
  178 WYHGKLDRTIAEERLRQAGKS GSYLIRESDRRPGSF V LS FLSQTNV   223

       *.*..*  .  *..** .* .*.** .*..*** *   *
   51 NVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGL    90
  224  VNHFRIIAM CGDYYIGGR RFSSLSDLIGYYS HVSCL   259

Raw Score: 130.0 S01966 Allen: 86 Score/Allen: 1.511628
      *. ***...*.  **.   ..  .**** *. * * * *    *  .   
    1 WYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGL    50
  348 WFHGKISKQEAYNLLMTVGQA CSFLVRPSDNTPGDYSL Y  F RTSE    391

      *....**    .  * . .*  .**. ...  * *.
   51 NVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKH    86
  392 NIQRFKICPTPNNQFMMGGRY YNSIGDIIDHYRKE   426

Each local alignment is shown with the Raw Score for the alignment, the length of the alignment (Allen) and the score divided by the length (Score/Allen). This last value is not actually very useful and will be removed in future versions of the program.
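
As a rough illustration of how the three values on each score line relate, the following Python sketch parses the "Raw Score" lines from the output above and recomputes score/length. It is not part of scanps: the names SCORE_LINE and parse_score_lines are hypothetical, and the pattern assumes the line layout is exactly as in the example output.

```python
import re

# Assumed layout of a score line, taken from the example output, e.g.
# "Raw Score: 497.0 TVHUSC Allen: 97 Score/Allen: 5.123711"
SCORE_LINE = re.compile(
    r"Raw Score:\s*(?P<score>[\d.]+)\s+(?P<seq_id>\S+)\s+"
    r"Allen:\s*(?P<length>\d+)\s+Score/Allen:\s*(?P<ratio>[\d.]+)"
)


def parse_score_lines(text):
    """Return one dict per 'Raw Score' line found in text (hypothetical helper)."""
    hits = []
    for match in SCORE_LINE.finditer(text):
        hits.append({
            "seq_id": match.group("seq_id"),
            "score": float(match.group("score")),
            "length": int(match.group("length")),
            "ratio": float(match.group("ratio")),
        })
    return hits


# The three score lines from the example output.
sample = """\
Raw Score: 497.0 TVHUSC Allen: 97 Score/Allen: 5.123711
Raw Score: 171.0 S01966 Allen: 90 Score/Allen: 1.900000
Raw Score: 130.0 S01966 Allen: 86 Score/Allen: 1.511628
"""

hits = parse_score_lines(sample)
for hit in hits:
    recomputed = hit["score"] / hit["length"]
    # The reported ratio should agree with score / length to six decimals.
    assert abs(recomputed - hit["ratio"]) < 1e-6
    print(hit["seq_id"], hit["score"], hit["length"], round(recomputed, 6))
```

Because the sequence identifier is kept with each hit, the same list can be used to see which database sequences produced more than one local alignment.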

Stars highlight identities, and dots show positions that give positive scores in the pair score matrix that is being used. The match with S01966 illustrates the ability of scanps to find multiple hits to the same sequence: two separate local alignments are reported, with raw scores of 171.0 and 130.0. Lowering the cutoff score would find more alignments, but they would be unlikely to be significant.
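
The marker line above each block of an alignment can be summarised by counting its stars and dots. The sketch below is only an illustration; count_markers is a hypothetical helper, not something scanps provides, and it assumes the marker line contains nothing but stars, dots and spaces.

```python
def count_markers(marker_line):
    """Count stars (identities) and dots (positive-scoring positions)."""
    return {
        "stars": marker_line.count("*"),
        "dots": marker_line.count("."),
    }


# Second block of the TVHUSC alignment (residues 51-97): the two sequences
# are identical over all 47 positions, so the marker line is 47 stars.
tvhusc_block = "*" * 47
print(count_markers(tvhusc_block))

# First eight marker characters of the first S01966 alignment.
s01966_prefix = "** **. *"
print(count_markers(s01966_prefix))
```

Positions marked with neither symbol are those where the pair score is not positive, including positions opposite a gap.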


gjb@bioch.ox.ac.uk