[Jalview-discuss] Alignment Annotation File - SEQUENCE_REF format

Jim Procter jprocter at compbio.dundee.ac.uk
Wed Jan 15 11:33:33 GMT 2014


Hi Steffen - thanks for your mail!

Steffen Schmidt wrote:
> In your manual about annotation files you describe:
> http://www.jalview.org/help/oldhelp/html/features/annotationsFormat.html
>
> ...
> You can associate an annotation with a sequence by preceding its 
> definition with the line:
>
> SEQUENCE_REF	seq_name	[startIndex]
> ...
>
> I wonder what the exact format of seq_name is:
>
> Imagine I get a FASTA file like this:
> >db|183474|my_pet_protein
>
> Do I have to put in the full id or are other variations ok?
>
> SEQUENCE_REF	db|183474|my_pet_protein	1
> SEQUENCE_REF	183474	1
> SEQUENCE_REF	my_pet_protein	1
>
> Background: Since most often accession numbers don’t tell you the 
> species name, I would like to add the species info to the sequence 
> name to quickly spot the organism. e.g. 
> my_pet_protein|Escherichia_coli. But then, I would need to change the 
> annotation file seq_name if I can’t use a shorthand…

Jalview's annotation file format relies on exact string matches to 
associate tracks with a sequence. We made that decision because the 
format was designed as a way for other programs to generate data for 
import into Jalview.
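
To make the exact-match rule concrete, here is a minimal Python sketch using the IDs from your example. It is only an illustration, not Jalview's actual code: the function name find_exact and the second sequence ID are made up.

```python
def find_exact(seq_ref, alignment_ids):
    """Return the index of the sequence whose ID equals seq_ref exactly, or None."""
    for i, seq_id in enumerate(alignment_ids):
        if seq_id == seq_ref:
            return i
    return None


# Hypothetical alignment: the first ID is from the example above,
# the second is invented for illustration.
ids = ["db|183474|my_pet_protein", "db|183475|other_protein"]

full_id_hit = find_exact("db|183474|my_pet_protein", ids)  # the full ID is found
accession_hit = find_exact("183474", ids)                  # shorthand is not
name_hit = find_exact("my_pet_protein", ids)               # shorthand is not
```

With exact matching, only the full ID resolves to a sequence; both shorthand forms return None, so an annotation track that uses them would not be attached to anything.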

It is reasonably straightforward to allow substring-based matching like 
you suggest - Jalview already does that for Newick tree import, so the 
function is available - and I can create a patch right away, if you like. 
I've created a new feature request for this at 
http://issues.jalview.org/browse/JAL-1427

However, there might be some backwards compatibility problems in the 
case where an alignment includes different sequences where one 
sequence's ID is wholly contained in another, so I don't think I can 
make substring matching the default behaviour when parsing the 
SEQUENCE_REF tag in annotation files. Any thoughts?
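
The compatibility problem can be sketched in a few lines of Python. Again this is an assumption-laden illustration rather than Jalview's implementation: find_by_substring is a hypothetical helper, and the two IDs are chosen so that one is wholly contained in the other.

```python
def find_by_substring(seq_ref, alignment_ids):
    """Return every alignment ID that contains seq_ref as a substring."""
    return [seq_id for seq_id in alignment_ids if seq_ref in seq_id]


# Hypothetical alignment where one ID is wholly contained in another.
ids = ["my_pet_protein", "my_pet_protein|Escherichia_coli"]

# Exact matching picks out exactly one sequence.
exact = [seq_id for seq_id in ids if seq_id == "my_pet_protein"]

# Substring matching picks out both, so the reference becomes ambiguous.
loose = find_by_substring("my_pet_protein", ids)
```

An existing annotation file that refers to my_pet_protein resolves to one sequence under exact matching, but to two under substring matching, which is why the looser rule would probably have to be opt-in.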

Jim.


