[Jalview-discuss] Calculating the percent identity between two sequences

Engin Özkan eozkan at stanford.edu
Mon Feb 28 07:01:20 GMT 2011

Dear Jim,

Let me second the request for a percent identity matrix output,
especially one computed straight from the multiple sequence alignment.
MSAs can be better than pairwise alignments, especially for distant pairs,
and reporting pairwise identities from a new N-W alignment when the
better alignment (hopefully) already exists within the MSA seems
undesirable. Come to think of it, the option to calculate "new" pairwise
alignments should still exist, in case the MSA is of lower quality.
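
To make the request concrete, here is a minimal Python sketch of the kind
of output meant; it is not Jalview code. It reads pairwise identities
directly off the rows of an existing MSA instead of realigning each pair.
The function names (pairwise_identity, identity_matrix), the gap
characters, the toy alignment and the choice of denominator (columns where
at least one of the two sequences has a residue) are all assumptions made
for illustration; other denominators are in common use and give different
numbers.

```python
GAPS = set("-.")  # assumed gap characters


def pairwise_identity(a, b):
    """Percent identity of two rows taken from the same alignment.

    Columns that are gaps in both rows are skipped; a residue facing a
    gap counts as a compared, non-matching column.
    """
    if len(a) != len(b):
        raise ValueError("rows must have the same aligned length")
    matches = 0
    compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAPS and y in GAPS:
            continue  # column is empty for this pair
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(rows):
    """Symmetric matrix of pairwise percent identities for MSA rows."""
    n = len(rows)
    matrix = [[100.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = pairwise_identity(rows[i], rows[j])
    return matrix


# Toy alignment, invented for this example.
msa = ["MKT-AILV", "MRTQAI-V", "MKT-GILV"]
for row in identity_matrix(msa):
    print(" ".join(f"{value:6.1f}" for value in row))
```

Because every pair is scored on the columns the MSA already provides, no
new N-W alignment is computed; a separate option could still realign
pairs when the MSA itself is in doubt.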

Thanks for the great work,


On 2/27/11 2:16 PM, Joel Guenther wrote:
> Hi, Jim.
> Thanks for the reply. Following your advice, I was able to calculate a 
> percent identity between two sequences (with empty columns removed) using:
> Calculate -> Calculate Tree -> Neighbor joining using % Identity
> If you have time, adding a percentage identity matrix output to Jalview
> would be nice, but not essential.
> Thanks again!
> -Joel
> On Sun, Feb 27, 2011 at 6:32 AM, Jim Procter
> <jprocter at compbio.dundee.ac.uk> wrote:
>     Hello Joel
>     On 25/02/2011 21:08, Joel Guenther wrote:
>     > I'd like to be able to calculate the percent identity for two
>     > sequences in an alignment. The attached alignment (with several
>     > empty columns) contains two sequences that were pulled from a larger
>     > structure-based alignment generated by Dali. In Jalview, when I
>     > select the two sequences and perform a pairwise alignment calculation
>     > (Calculate -> Pairwise Alignments...) the output (attached) only
>     > includes an alignment that contains only 7 columns, but the two
>     > sequences are 204 and 224 aa in length and the structures are highly
>     > conserved throughout.
>     Confirmed.
>     > Why isn't Jalview comparing the sequences along their full length,
>     > and can I force it to do so?
>     I suspect you may not realise that the 'Pairwise alignment' option
>     actually computes a Needleman and Wunsch pairwise alignment for each
>     pair of sequences in the selected set, using a BLOSUM 62 matrix and
>     nominal gap parameters (120 for opening, 20 for widening). Whilst
>     these parameters give a reasonable alignment for sequences with high
>     sequence homology, they can fail for less homologous pairs. In your
>     case, you're trying to align a pair of structurally homologous protein
>     sequences which have quite a low sequence identity - and the algorithm
>     just returns a stretch of 7 aa that align well, without any of the
>     other regions of the two sequences, because the gaps introduced into
>     the alignment make them far less optimal.
>     >
>     > If Jalview won't compare full length sequences, is there another
>     > program that will?
>     There are plenty out there (check out EMBOSS, for instance:
>     http://emboss.sourceforge.net/servers/#pise), but I get the impression
>     that what you actually want is the percentage identity of the pair of
>     sequences as aligned by DALI. Apart from looking in the DALI report
>     (where, if I remember correctly, you will always find a percent
>     identity score in addition to Dali's own Z-score), the quickest way to
>     do this in the current version of Jalview is to copy one or both of
>     the sequences into the same alignment, and then calculate a percent
>     identity tree.
>     The branches will be labelled with the %age difference between the
>     sequences, *under current alignment length*. The reason I stress this
>     is because if I do this with your DALI alignment as you sent it, I get a
>     value of 9.3 - ie the sequences are 90.7% identical - however, if I
>     exclude the gapped columns in the alignment (using Edit->Remove empty
>     columns), I get 37.5 - ie 62.5% identical. This number is probably still
>     not reliable, because there are a fair few 'X' symbols in both
>     sequences that do not align to other Xes, and Jalview will count these
>     as a mismatch, rather than a match (also now reported as a bug).
>     I will schedule for implementation a new function allowing a pairwise
>     %age identity matrix (or flat report) to be generated, enabling you to
>     do these calculations more easily.
>     Hope this clears things up - thanks for the email!
>     Jim.
>     ps. if you find the last comment about gaps/non gaps confusing, you
>     might want to check out Geoff Barton's paper about percentage identity,
>     and this wiki page:
>     http://openwetware.org/wiki/Wikiomics:Percentage_identity
>     --
>     -------------------------------------------------------------------
>     J. B. Procter  (JALVIEW/ENFIN)  Barton Bioinformatics Research Group
>     Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
>     The University of Dundee is a Scottish Registered Charity, No. SC015096.
>     _______________________________________________
>     Jalview-discuss mailing list
>     Jalview-discuss at jalview.org
>     http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-discuss
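
A toy illustration of the alignment-length caveat in the quoted reply: the
Python sketch below shows how columns that are gaps in both sequences can
inflate a percent identity when every column is counted, and how the
figure drops once empty columns are removed. This is not Jalview's actual
calculation. The function names, the toy sequences and the assumption
that gap-gap columns are scored as matches are chosen only to mimic the
effect described.

```python
def naive_percent_identity(a, b):
    """Identity over ALL columns; identical symbols match, gaps included."""
    if len(a) != len(b):
        raise ValueError("rows must have the same aligned length")
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / len(a)


def remove_empty_columns(a, b, gap="-"):
    """Drop columns in which both rows hold a gap."""
    kept = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)


# Two rows as they might look after being pulled from a larger alignment
# (invented sequences, not the DALI data discussed in the thread).
s1 = "MK--T----AIL--V"
s2 = "MR--T----GIL--V"

print(f"with empty columns:    {naive_percent_identity(s1, s2):.1f}%")
t1, t2 = remove_empty_columns(s1, s2)
print(f"without empty columns: {naive_percent_identity(t1, t2):.1f}%")
```

The same two sequences give two different percentages depending only on
how many shared gap columns are left in the alignment, which is why the
denominator has to be stated alongside any reported identity.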

Engin Özkan
Post-doctoral Scholar
Laboratory of K. Christopher Garcia
Howard Hughes Medical Institute
Dept of Molecular and Cellular Physiology
279 Campus Drive, Beckman Center B173
Stanford School of Medicine
Stanford, CA 94305
ph: (650)-498-7111
