[Jalview-discuss] Alignment Annotation File - SEQUENCE_REF format
jprocter at compbio.dundee.ac.uk
Wed Jan 15 11:33:33 GMT 2014
Hi Steffen - thanks for your mail!
Steffen Schmidt wrote:
> In your manual about annotation files you describe:
> You can associate an annotation with a sequence by preceding its
> definition with the line:
> I wonder what the exact format of seq_name is:
> Imagine I get a FASTA file like this:
> Do I have to put in the full id or are other variations ok?
> Background: Since most often accession numbers don’t tell you the
> species name, I would like to add the species info to the sequence
> name to quickly spot the organism. e.g.
> my_pet_protein|Escherichia_coli. But then, I would need to change the
> annotation file seq_name if I can’t use a shorthand…
Jalview's annotation file format works on exact string matches to
associate tracks with a sequence. We made that decision because the
format was designed to be a way for other programs to generate data for
import into Jalview.
It is reasonably straightforward to allow substring based matching like
you suggest - Jalview does that for Newick tree import already, so the
function is available - so I can create a patch right away, if you like.
I've created a new feature request for this at
However, there might be some backwards compatibility problems in the
case where an alignment includes different sequences where one
sequence's ID is wholly contained in another, so I don't think I can
make substring matching the default behaviour when parsing the
SEQUENCE_REF tag in annotation files. Any thoughts?
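
To make the compatibility concern concrete, here is a minimal Python sketch
of the two matching strategies. It is not Jalview's code: the function names
and the sample sequence IDs are hypothetical, chosen only to show how an ID
that matches exactly one sequence today could match two once substring
matching is allowed.

```python
def match_exact(seq_ref, alignment_ids):
    """Return the IDs identical to seq_ref (exact string matching)."""
    return [sid for sid in alignment_ids if sid == seq_ref]


def match_substring(seq_ref, alignment_ids):
    """Return the IDs that contain seq_ref anywhere (relaxed matching)."""
    return [sid for sid in alignment_ids if seq_ref in sid]


# Hypothetical alignment in which one ID is wholly contained in another.
ids = ["P12345", "P12345|Escherichia_coli"]

exact_hits = match_exact("P12345", ids)
substring_hits = match_substring("P12345", ids)

print("exact:    ", exact_hits)
print("substring:", substring_hits)

if len(substring_hits) > 1:
    print("ambiguous: the annotation could belong to more than one sequence")
```

With exact matching the reference resolves to a single sequence; with
substring matching the same reference hits both IDs, so an existing
annotation file could end up attached to the wrong sequence. That is why
relaxed matching would probably need to be opt-in, or fall back to it only
when no exact match exists.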