Processing the results of the NALL scan


Once sorted, the result of the NALL scan looks like the listing below. The output has been truncated to 80 characters per line and most of the file has been omitted for brevity. See the file sh2.all.scan.sorted for the full output.



 497   97 1.1e-82  0 1    1   97  151  247 TVHUSC   protein-tyrosine kinase 
 497   97 1.1e-82  0 1    1   97  156  252 A43610   protein-tyrosine kinase 
 492   97 1.4e-81  0 1    1   97  148  244 TVFVS2   protein-tyrosine kinase 
 492   97 1.4e-81  0 1    1   97  148  244 TVFVPR   protein-tyrosine kinase 
 492   97 1.4e-81  0 1    1   97  148  244 TVFV60   protein-tyrosine kinase 
                                .
                                .
                                .
**  171   90 8.6e-18  0 2    1   90  178  259 S01966   GTPase-activating pro
                                .
                                .
                                .

 144   90 2.2e-13  0 2    1   87  110  198 A42031   hematopoietic cell phosp
 144   94 4.2e-13  0 1    1   94  127  213 TVHUA    protein-tyrosine kinase 
 140   93 1.7e-12  1 2    2   91   11   96 A40802   protein-tyrosine kinase 
 138   90 1.9e-12  0 2    1   87  110  198 A38189   tyrosine phosphatase=hSH
 138   90 1.9e-12  0 2    1   87  112  200 S17234   Protein-tyrosine-phospha
 138   90 1.9e-12  0 2    1   87  112  200 S20837   Protein-tyrosine-phospha
 139   91 2.5e-12  0 2    1   87  112  201 S27398   protein-tyrosine phospha
 139   91 2.5e-12  0 2    1   87  112  201 A46209   SH2-containing phosphoty
 139   91 2.5e-12  0 2    1   87  112  201 S31767   protein-tyrosine phospha
 139   91 2.5e-12  0 2    1   87  112  201 A47244   SH-PTP2=SH2-containing p
 139   91 2.5e-12  0 2    1   87  112  201 A46210   phosphotyrosine phosphat
 136   89 3.9e-12  0 1    1   89  271  352 TVFFA    protein-tyrosine kinase 
 139   99 4.5e-12  0 1    1   97  603  693 TVHUVV   transforming protein (va
 136   92 7.1e-12  0 2    1   91  111  195 A43254   protein tyrosine phospha
 135   91 1.0e-11  1 2    1   90    6   88 S27398   protein-tyrosine phospha
 135   91 1.0e-11  1 2    1   90    6   88 A47244   SH-PTP2=SH2-containing p
 135   91 1.0e-11  1 2    1   90    6   88 A46209   SH2-containing phosphoty
 135   91 1.0e-11  1 2    1   90    6   88 A46210   phosphotyrosine phosphat
 135   91 1.0e-11  1 2    1   90    6   88 S31767   protein-tyrosine phospha
 116   43 1.3e-11  0 1    1   43   13   53 B45022   CRK-I - human
 116   43 1.3e-11  0 1    1   43   13   53 A45022   CRK-II - human
 134   96 2.5e-11  0 1    1   91  434  524 C46243   GRB-7=epidermal growth f

**  130   86 3.1e-11  1 2    1   86  348  426 S01966   GTPase-activating pro

 113   43 3.6e-11  0 1    1   43   44   84 A46243   GRB-3=epidermal growth f
 129   86 4.4e-11  1 2    1   86  174  252 B40121   GTPase-activating protei
 129   86 4.4e-11  1 2    1   86  351  429 A40121   GTPase-activating protei
 130   98 9.9e-11  1 2    1   97    4   93 A42031   hematopoietic cell phosp
 128   98 1.9e-10  1 2    1   97    6   95 S20837   Protein-tyrosine-phospha
 128   98 1.9e-10  1 2    1   97    6   95 S17234   Protein-tyrosine-phospha
 124   93 4.2e-10  1 2    2   92   11   97 A44266   ZAP-70=70 kda protein-ty
 122   89 4.7e-10  1 2    1   88    6   86 A43254   protein tyrosine phospha
 124   98 7.3e-10  1 2    1   97    4   93 A38189   tyrosine phosphatase=hSH
                                .
                                .
                                .

Two lines are marked with ``**'' at the start. These markers do not appear in the output file; they are added here to draw attention to the lines discussed below.

Each line of this file contains 11 columns of information.

Column 1:

This is the raw score for the local alignment, i.e. the sum of the pairscore matrix values for the alignment, less the gap penalty times the number of gaps.

Column 2:

This is the length of the local alignment, including gaps.

Column 3:

The probability calculated using the length-dependent statistics. The output is sorted in order of increasing probability, so the most significant alignments appear first.

Column 4:

The rank of the alignment in the comparison with this database sequence. This number is 0 if this is the highest scoring alignment with the database sequence, 1 if the second highest, 2 if the third and so on.

Column 5:

This shows how many local alignments are found with this database sequence. For example, if Columns 4 and 5 show values of ``0 7'', then this line gives statistics on the highest ranked alignment out of 7 found. ``2 7'' would be the third ranked alignment with the database sequence.

Columns 6 and 7:

These indicate the starting and ending residues from the query sequence of the fragment that is aligned.

Columns 8 and 9:

These show the starting and ending residues of the section of database sequence that is aligned to the query.

Column 10:

The identifier code for the database sequence.

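Column 11:

The title line for the database sequence. This is not truncated in the output file; it appears cut short in the listing above only because that listing was trimmed to 80 characters.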
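
The column layout above can be read programmatically. The following is a minimal sketch in Python, not part of the scanning package itself. It assumes that the columns are separated by whitespace, as they are in the listing above, and that only the title in Column 11 may contain spaces. The names Hit and parse_hit are hypothetical and chosen for illustration.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One line of the sorted scan output (field names are illustrative)."""
    score: int         # Column 1: raw score of the local alignment
    length: int        # Column 2: alignment length, including gaps
    prob: float        # Column 3: probability
    rank: int          # Column 4: 0 for the best alignment with this sequence
    n_alignments: int  # Column 5: number of alignments with this sequence
    q_start: int       # Columns 6 and 7: aligned region of the query
    q_end: int
    db_start: int      # Columns 8 and 9: aligned region of the database sequence
    db_end: int
    ident: str         # Column 10: database identifier
    title: str         # Column 11: title line (may contain spaces)


def parse_hit(line: str) -> Optional[Hit]:
    """Parse one output line; return None for blank or non-data lines."""
    # Assumption: whitespace-delimited columns. Split at most 10 times so
    # that the title keeps its internal spaces.
    fields = line.split(None, 10)
    if len(fields) < 10:
        return None
    title = fields[10].strip() if len(fields) == 11 else ""
    try:
        return Hit(
            int(fields[0]), int(fields[1]), float(fields[2]),
            int(fields[3]), int(fields[4]),
            int(fields[5]), int(fields[6]),
            int(fields[7]), int(fields[8]),
            fields[9], title,
        )
    except ValueError:
        # A numeric column did not parse, so this is not a data line.
        return None


hit = parse_hit(" 171   90 8.6e-18  0 2    1   90  178  259 S01966   GTPase-activating pro")
print(hit.ident, hit.score, hit.rank, hit.n_alignments)
```

Returning None for lines that do not fit the layout lets the same function be applied to every line of the file without special handling for blank lines.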

The first line highlighted by ``**'' shows an alignment between the query and the database sequence S01966 with a score of 171 over a length of 90 residues. The probability is 8.6e-18, and this is the highest scoring of the two alignments found with this database protein. The alignment runs from residue 1 to 90 of the query and from residue 178 to 259 of the database sequence.

If we look further down the file, we can see the second match to S01966. This scores 130 over a length of 86, with a probability of 3.1e-11. It aligns residues 1 to 86 of the query with residues 348 to 426 of the database sequence.
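
Because the file is sorted by probability, the alignments belonging to one database sequence can be far apart, as the two S01966 lines show. The sketch below, again in Python and again assuming whitespace-separated columns, gathers the lines for each identifier and orders them by the rank in Column 4. The function name alignments_by_sequence and the dictionary keys are hypothetical.

```python
from collections import defaultdict


def alignments_by_sequence(lines):
    """Map each database identifier (Column 10) to its alignments, by rank."""
    groups = defaultdict(list)
    for line in lines:
        # Assumption: whitespace-delimited columns, title last.
        f = line.split(None, 10)
        if len(f) < 10:
            continue  # blank line or other non-data line
        try:
            record = {
                "score": int(f[0]),
                "length": int(f[1]),
                "prob": float(f[2]),
                "rank": int(f[3]),
                "query": (int(f[5]), int(f[6])),
                "db": (int(f[7]), int(f[8])),
            }
        except ValueError:
            continue  # a numeric column did not parse
        groups[f[9]].append(record)
    for records in groups.values():
        records.sort(key=lambda r: r["rank"])
    return dict(groups)


# Three lines copied from the listing above.
sample = [
    " 171   90 8.6e-18  0 2    1   90  178  259 S01966   GTPase-activating pro",
    " 144   90 2.2e-13  0 2    1   87  110  198 A42031   hematopoietic cell phosp",
    " 130   86 3.1e-11  1 2    1   86  348  426 S01966   GTPase-activating pro",
]

for rec in alignments_by_sequence(sample)["S01966"]:
    print(rec["rank"], rec["score"], rec["prob"], rec["db"])
```

With these three sample lines, the S01966 entry holds the two alignments discussed above, covering database residues 178 to 259 and 348 to 426. To process the whole file, the list could be replaced by the lines read from sh2.all.scan.sorted.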


gjb@bioch.ox.ac.uk