[Jalview-discuss] Alignment Annotation File - SEQUENCE_REF format

Jim Procter jprocter at compbio.dundee.ac.uk
Wed Jan 15 11:33:33 GMT 2014

Hi Steffen - thanks for your mail!

Steffen Schmidt wrote:
> In your manual about annotation files you describe:
> http://www.jalview.org/help/oldhelp/html/features/annotationsFormat.html
> You can associate an annotation with a sequence by preceding its 
> definition with the line:
> SEQUENCE_REF	seq_name	[startIndex]
> I wonder what the exact format of seq_name is:
> Imagine I get a fasta file like this:
> >db|183474|my_pet_protein
> Do I have to put in the full id or are other variations ok?
> SEQUENCE_REF	db|183474|my_pet_protein	1
> SEQUENCE_REF	my_pet_protein	1
> Background: Since most often accession numbers don’t tell you the 
> species name, I would like to add the species info to the sequence 
> name to quickly spot the organism. e.g. 
> my_pet_protein|Escherichia_coli. But then, I would need to change the 
> annotation file seq_name if I can’t use a shorthand…
Jalview's annotation file format works on exact string matches to 
associate tracks with a sequence. We made that decision because the 
format was designed to be a way for other programs to generate data for 
import into Jalview.
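
To illustrate what exact matching means for your example, here is a 
minimal sketch in Python. It is not Jalview's actual code: the function 
name find_exact is hypothetical, and the sketch assumes the SEQUENCE_REF 
line is tab-separated.

```python
def find_exact(seq_ref_line, alignment_ids):
    """Return the alignment ID named by a SEQUENCE_REF line, or None.

    Hypothetical helper for illustration only; assumes tab-separated fields.
    """
    fields = seq_ref_line.rstrip("\n").split("\t")
    if len(fields) < 2 or fields[0] != "SEQUENCE_REF":
        return None
    seq_name = fields[1]
    # Exact string comparison: the name must equal the full sequence ID.
    return seq_name if seq_name in alignment_ids else None


ids = ["db|183474|my_pet_protein"]
full = find_exact("SEQUENCE_REF\tdb|183474|my_pet_protein\t1", ids)
short = find_exact("SEQUENCE_REF\tmy_pet_protein\t1", ids)
```

Under these assumptions the full ID is found, while the shorthand 
my_pet_protein matches nothing.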

It is reasonably straightforward to allow substring-based matching like 
you suggest - Jalview does that for Newick tree import already, so the 
function is available - so I can create a patch right away, if you like. 
I've created a new feature request for this at 

However, there might be some backwards compatibility problems in the 
case where an alignment includes different sequences where one 
sequence's ID is wholly contained in another, so I don't think I can 
make substring matching the default behaviour when parsing the 
SEQUENCE_REF tag in annotation files. Any thoughts?
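
The ambiguity can be sketched roughly as follows. This is an 
illustration in Python, not Jalview's implementation; the function name 
find_by_substring and the two sequence IDs are made up for the example.

```python
def find_by_substring(seq_name, alignment_ids):
    """Return every alignment ID that contains seq_name as a substring.

    Hypothetical helper for illustration only.
    """
    return [seq_id for seq_id in alignment_ids if seq_name in seq_id]


# Two sequences where one ID is wholly contained in the other.
ids = ["my_pet_protein", "my_pet_protein|Escherichia_coli"]

# Exact matching picks out a single sequence.
exact_hits = [seq_id for seq_id in ids if seq_id == "my_pet_protein"]

# Substring matching now hits both, so the annotation target is ambiguous.
substring_hits = find_by_substring("my_pet_protein", ids)
```

With exact matching the reference resolves to one sequence; with 
substring matching the same annotation file would match two, which is 
why existing files could change meaning if this became the default.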


