[Jalview-discuss] Calculating the percent identity between two sequences

Joel Guenther guenthej at gmail.com
Sun Feb 27 22:16:54 GMT 2011

Hi, Jim.

Thanks for the reply. Following your advice, I was able to calculate a
percent identity between two sequences (with empty columns removed) using:
Calculate -> Calculate Tree -> Neighbor joining using % Identity
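The figure depends heavily on which alignment columns are counted, and it can be checked outside Jalview. Below is a minimal Python sketch. It assumes the two sequences are already aligned as equal-length strings, with '-' or '.' as gap characters. `percent_identity`, its `denominator` options and the `x_is_wildcard` flag are hypothetical names chosen for illustration; they are not Jalview code. Under the "columns" option, gap-gap columns are scored as identical. That is an assumption meant to mimic the effect Jim describes below, where leaving the empty columns in raises the apparent identity. The tree branch labels are percent differences, i.e. 100 minus these values.

```python
GAPS = frozenset("-.")


def _residues_match(x: str, y: str, x_is_wildcard: bool) -> bool:
    """Case-insensitive residue comparison; optionally let 'X' match anything."""
    x, y = x.upper(), y.upper()
    if x_is_wildcard and "X" in (x, y):
        return True
    return x == y


def percent_identity(a: str, b: str, denominator: str = "non_empty_columns",
                     x_is_wildcard: bool = False) -> float:
    """Percent identity of two pre-aligned sequences (hypothetical helper).

    denominator:
      "columns"           every column; gap-gap columns scored as identical (assumption)
      "non_empty_columns" drop columns where both sequences are gapped
      "aligned_pairs"     only columns where neither sequence is gapped
      "shorter_sequence"  ungapped length of the shorter sequence
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")

    columns = list(zip(a, b))
    both_gapped = sum(1 for x, y in columns if x in GAPS and y in GAPS)
    residue_pairs = [(x, y) for x, y in columns if x not in GAPS and y not in GAPS]
    identical = sum(1 for x, y in residue_pairs if _residues_match(x, y, x_is_wildcard))

    if denominator == "columns":
        numerator, total = identical + both_gapped, len(columns)
    elif denominator == "non_empty_columns":
        numerator, total = identical, len(columns) - both_gapped
    elif denominator == "aligned_pairs":
        numerator, total = identical, len(residue_pairs)
    elif denominator == "shorter_sequence":
        ungapped = [sum(c not in GAPS for c in s) for s in (a, b)]
        numerator, total = identical, min(ungapped)
    else:
        raise ValueError(f"unknown denominator: {denominator!r}")
    return 100.0 * numerator / total if total else 0.0


# Toy pair: three all-gap columns, one residue aligned to a gap, and an X.
a = "ACD---EXGK"
b = "ACE---EWG-"
for d in ("columns", "non_empty_columns", "aligned_pairs", "shorter_sequence"):
    print(f"{d:18s} {percent_identity(a, b, d):5.1f}")
```

On this toy pair, the four definitions give 70.0, 57.1, 66.7 and 66.7. The same pair of sequences yields noticeably different numbers depending only on the denominator. Passing `x_is_wildcard=True` shows how much the unmatched X symbols drag the value down.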

If you have time, adding a percentage identity matrix output to Jalview would
be nice, but not essential.
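Until such a feature exists, a pairwise matrix is straightforward to sketch. The Python below is a hypothetical illustration, not Jalview's planned implementation. It assumes all sequences are already aligned to the same length. For each pair, it drops columns where both sequences are gapped, as a per-pair analogue of Edit->Remove empty columns. It then prints the result as a tab-separated table. The function names and the toy sequences are made up for the example.

```python
from itertools import combinations

GAPS = frozenset("-.")


def pairwise_identity(a: str, b: str) -> float:
    """% identity over columns where at least one of the two sequences has a residue."""
    columns = [(x, y) for x, y in zip(a, b) if not (x in GAPS and y in GAPS)]
    if not columns:
        return 0.0
    # A column counts as identical only if both entries are the same residue.
    identical = sum(1 for x, y in columns if x not in GAPS and x.upper() == y.upper())
    return 100.0 * identical / len(columns)


def identity_matrix(aligned: dict[str, str]) -> dict[str, dict[str, float]]:
    """Symmetric matrix of pairwise % identities for pre-aligned sequences."""
    if len({len(s) for s in aligned.values()}) > 1:
        raise ValueError("all sequences must be aligned to the same length")
    # Diagonal: a sequence is taken to be 100% identical to itself.
    matrix = {name: {name: 100.0} for name in aligned}
    for p, q in combinations(aligned, 2):
        matrix[p][q] = matrix[q][p] = pairwise_identity(aligned[p], aligned[q])
    return matrix


def format_matrix(matrix: dict[str, dict[str, float]]) -> str:
    """Render the matrix as a tab-separated table, one row per sequence."""
    names = list(matrix)
    lines = ["\t" + "\t".join(names)]
    for p in names:
        lines.append(p + "\t" + "\t".join(f"{matrix[p][q]:.1f}" for q in names))
    return "\n".join(lines)


seqs = {"s1": "ACDE-", "s2": "ACDF-", "s3": "A-DEK"}
print(format_matrix(identity_matrix(seqs)))
```

A "flat report" could be produced from the same `identity_matrix` result by printing one line per pair instead of a grid.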

Thanks again!


On Sun, Feb 27, 2011 at 6:32 AM, Jim Procter
<jprocter at compbio.dundee.ac.uk> wrote:

> Hello Joel
> On 25/02/2011 21:08, Joel Guenther wrote:
> > I'd like to be able to calculate the percent identity for two
> > sequences in an alignment. The attached alignment (with several empty
> > columns) contains two sequences that were pulled from a larger
> > structure-based alignment generated by Dali. In Jalview, when I select
> > the two sequences and perform a pairwise alignment calculation
> > (Calculate -> Pairwise Alignments...), the output (attached)
> > contains an alignment of only 7 columns, even though the two
> > sequences are 204 and 224 aa long and the structures are highly
> > conserved throughout.
> Confirmed.
> > Why isn't Jalview comparing the sequences along their full length, and
> > can I force it to do so?
> I suspect you may not realise that the 'Pairwise alignment' option
> actually computes a Needleman and Wunsch pairwise alignment for each
> pair of sequences in the selected set, using a BLOSUM 62 matrix and
> nominal gap parameters (120 for opening, 20 for extending). Whilst these
> parameters give a reasonable alignment for sequences with high sequence
> homology, they can fail for less homologous pairs. In your case,
> you're trying to align a pair of structurally homologous protein
> sequences which have quite a low sequence identity, and the algorithm
> just returns a stretch of 7 aa that align well, without any of the other
> regions of the two sequences, because the gaps needed to include those
> regions would make the alignment score far worse.
> >
> > If Jalview won't compare full length sequences, is there another
> > program that will?
> There are plenty out there (check out EMBOSS, for instance:
> http://emboss.sourceforge.net/servers/#pise), but I get the impression
> that what you actually want is the percentage identity of the pair of
> sequences as aligned by DALI. Apart from looking in the DALI report
> (where, if I remember correctly, you will always find a percent identity
> score in addition to Dali's own Z-score), the quickest way to do this
> in the current version of Jalview is to copy one or both of the sequences
> into the same alignment and then calculate a percent identity tree.
> The branches will be labelled with the %age difference between the
> sequences, *under the current alignment length*. I stress this
> because if I do this with your DALI alignment as you sent it, I get a
> value of 9.3, i.e. the sequences are 90.7% identical; however, if I
> exclude the gapped columns in the alignment (using Edit->Remove empty
> columns), I get 37.5, i.e. 62.5% identical. This number is probably still
> not reliable, because there are a fair few 'X' symbols in both sequences
> that do not align to other Xes, and Jalview will count these as
> mismatches rather than matches (also now reported as a bug).
> I will schedule for implementation a new function allowing a pairwise
> %age identity matrix (or flat report) to be generated, enabling you to
> do these calculations more easily.
> Hope this clears things up - thanks for the email!
> Jim.
> PS. If you find the last comment about gaps/non-gaps confusing, you
> might want to check out Geoff Barton's paper about percentage identity,
> and this wiki page:
> http://openwetware.org/wiki/Wikiomics:Percentage_identity
> --
> -------------------------------------------------------------------
> J. B. Procter  (JALVIEW/ENFIN)  Barton Bioinformatics Research Group
> Phone/Fax:+44(0)1382 388734/345764  http://www.compbio.dundee.ac.uk
> The University of Dundee is a Scottish Registered Charity, No. SC015096.
> _______________________________________________
> Jalview-discuss mailing list
> Jalview-discuss at jalview.org
> http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-discuss
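Jim's reply describes how the gap penalties in the 'Pairwise alignment' option shape a Needleman and Wunsch alignment. Here is a minimal Python sketch of global alignment with separate opening and extension penalties, using Gotoh's affine-gap formulation. It is an illustration under stated assumptions, not Jalview's code:

- it uses a toy match/mismatch score instead of BLOSUM 62;
- it charges a gap of length k as `gap_open + (k - 1) * gap_extend`;
- its penalty values are illustrative, not Jalview's 120/20;
- it returns only the optimal score (no traceback).

```python
NEG_INF = float("-inf")


def global_affine_score(a: str, b: str, match: int = 5, mismatch: int = -4,
                        gap_open: int = 12, gap_extend: int = 2) -> float:
    """Optimal global (Needleman-Wunsch style) score with affine gaps (Gotoh).

    Toy scoring assumptions: identical residues score `match`, others `mismatch`;
    a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    # Leading gaps: global alignment must account for every residue, including the ends.
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a new gap costs gap_open; continuing the same gap costs gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(global_affine_score("ACGT", "AGT"))
print(global_affine_score("ACGT", "AGT", gap_open=30))
```

For example, `global_affine_score("ACGT", "AGT")` is 3: three matches (+15) minus one gap opening (-12). Raising `gap_open` makes every gap more expensive relative to the matches it allows the alignment to recover. That is the trade-off that penalises gap-rich, low-identity pairs.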
