seq_file

This file contains one or more protein sequences in one-letter code. The format follows that of the NBRF (PIR) database .SEQ files, as shown below.


>P1;IDENT
this is the title line
a s dhjAALLDKHGDK(D,K,L).P
L
WWPGS*
>P1;IDENT2
this is another title line
a R G S DF SDSDDDSSDAKKKFG
*
etc...

Each sequence entry starts with a '>', followed by an identification code (max 10 characters). The next record is a title line (max 500 characters). The following record(s) give the sequence in one-letter code, with a maximum of 500 characters per line. The end of the sequence is marked by a '*' character. Lowercase letters are converted to uppercase, and only alphabetic characters are read on the sequence lines, so spaces, punctuation and brackets are ignored. Any alphabetic character that is not a standard one-letter code is translated to 'X' (unknown).
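The reading rules above can be sketched in Python. This is only an illustration of the rules as described, not the MULTALIGN reader itself. The function name parse_seq_file and the STANDARD set (the 20 common amino acid codes) are assumptions; the program's own table of standard codes may differ, for example by accepting B and Z. The text after '>' is returned unchanged, since the handling of the "P1;" prefix is not specified here, and the length limits are not enforced.

```python
# Assumption: "standard" means the 20 common amino acid one-letter codes.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def parse_seq_file(text):
    """Return a list of (ident, title, sequence) tuples from seq_file text."""
    entries = []
    lines = iter(text.splitlines())
    for line in lines:
        if not line.startswith(">"):
            continue  # skip anything that is not the start of an entry
        ident = line[1:].strip()          # text after '>', kept as written
        title = next(lines, "").strip()   # the record after '>' is the title
        residues = []
        for seq_line in lines:
            # Only the part of the line before '*' belongs to the sequence.
            head, star, _ = seq_line.partition("*")
            for ch in head:
                if ch.isalpha():          # non-alphabetic characters are skipped
                    up = ch.upper()       # lowercase becomes uppercase
                    residues.append(up if up in STANDARD else "X")
            if star:
                break                     # '*' ends this sequence
        # An entry with no '*' simply runs to the end of the input.
        entries.append((ident, title, "".join(residues)))
    return entries


example = """>P1;IDENT
this is the title line
a s dhjAALLDKHGDK(D,K,L).P
L
WWPGS*
"""
entries = parse_seq_file(example)
```

With the first entry shown earlier as input, the letter 'j' is read as 'X', and the brackets, commas, full stop and spaces are dropped.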

The minimum requirements for an entry are a record starting with '>', a title line, and one residue followed by a '*'.
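As a rough check of these minimum requirements, the sketch below tests a single entry given as text. The function name meets_minimum is hypothetical and is not part of MULTALIGN. The check only looks for a leading '>', a title line, and at least one alphabetic character before the first '*'.

```python
def meets_minimum(entry_text):
    """Return True if one entry has '>', a title line, and a residue before '*'."""
    lines = entry_text.splitlines()
    # Need the '>' record, the title line, and at least one sequence line.
    if len(lines) < 3 or not lines[0].startswith(">"):
        return False
    # Join the sequence lines and look only at the part before the first '*'.
    head, star, _ = "".join(lines[2:]).partition("*")
    return bool(star) and any(ch.isalpha() for ch in head)


smallest = ">P1;IDENT\na title\nA*"
ok = meets_minimum(smallest)
```

An entry whose sequence lines contain only '*' fails the check, because no residue precedes the terminator.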

Examples: globin.seq, myoglobins.seq. Source: PSQ 'COPY' command, or typed in manually.

gjb@bioch.ox.ac.uk