The top of the sorted output file shows the highest scoring hits to the query sequence. You must now inspect this file and decide how many of the top scoring sequences you would like to examine by alignment. If you use the default parameters (PAM250 and penalty of 8) then scores below 90 are often uninteresting. However, this is not an absolute rule, and each scan requires careful scrutiny of the score list. It is usually better to include a lot of sequences at this stage, since ``interesting'' matches may emerge even at low scores.
There are no programs supplied with scanps to help you look at the ``.sorted'' file. You must use the Unix tools ``more'' or ``head'' to inspect and extract the interesting parts of the file, or you could use your favourite text editor (vi, emacs, jot, pico etc.).
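As a complement to ``more'' and ``head'', a score cutoff can be applied with the standard Unix tool ``awk''. The following is only a sketch: it assumes that each line of the sorted file holds a score followed by an identifier, with the score in the first column, and ``hits_above'' is a hypothetical helper name, not a program supplied with scanps.

```shell
# Print the lines of a sorted score file whose score is at least a cutoff.
# Assumption: each line is "score identifier", with the score in column 1.
# hits_above is a hypothetical helper, not part of scanps.
# Usage: hits_above FILE CUTOFF
hits_above() {
    # NF >= 2 skips blank or incomplete lines; "+ 0" forces numeric comparison.
    awk -v cutoff="$2" 'NF >= 2 && $1 + 0 >= cutoff + 0' "$1"
}
```

With this defined, typing hits_above sh2.sorted 90 | wc -l would report how many hits score 90 or more, which may help when deciding how far down the list to read. The cutoff of 90 is only the rule of thumb mentioned above and should be adjusted for each scan.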
In order to keep this guide to a manageable length, I will illustrate the following sections using only the top 15 scoring sequences. In practice, the top 150 or so in this scan would be worth looking at. To get the top 15 sequence scores into a file you could type:
head -15 < sh2.sorted > sh2.top15
This saves the top 15 score/ID pairs in a file called sh2.top15. Here it is:
497 A43610
497 TVHUSC
492 TVCHS
492 TVFV60
492 TVFVPR
492 TVFVS2
490 TVFVS1
488 TVFVMT
474 B34104
473 A34104
458 OKFVYR
458 S15582
458 S20808
456 TVFVR
443 S20676
Unless you know the identifier codes, this is pretty unhelpful. If you have built an indexed database, then it is easy to get the titles of these sequences back using the program ``sortsco'' (see Section 6.3.3). For now, we can extract the sequences that correspond to these protein identifiers using the program ``select''.