
Correlation methods

Several experimental and semi-empirical properties have been associated with the amino acid types, for example hydrophobicity (e.g. [21]) and the propensity to form an α-helix (e.g. [22]). Correlation methods for comparing protein sequences exploit this large number of amino acid properties as an alternative to comparing the sequences with a pair scoring scheme.
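As a minimal sketch of the basic idea, assuming the Kyte-Doolittle hydropathy scale as the single property (it is not necessarily the scale cited as [21]), two aligned segments of equal length can be converted into numerical profiles and compared with Pearson's correlation coefficient instead of summing pair scores. The function names are hypothetical and chosen for illustration only.

```python
import math

# Kyte-Doolittle hydropathy scale, used here only as an example property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined for a flat profile; treat it as no correlation.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def segment_correlation(seg_a, seg_b, scale=KYTE_DOOLITTLE):
    """Correlate the property profiles of two aligned, equal-length segments."""
    profile_a = [scale[aa] for aa in seg_a]
    profile_b = [scale[aa] for aa in seg_b]
    return pearson(profile_a, profile_b)


print(round(segment_correlation("LIVKDE", "LLVRDE"), 3))
```

Segments such as LIVKDE and LLVRDE correlate strongly because their hydrophobic and hydrophilic positions coincide, even though a substitution such as K to R would be scored only indirectly by a pair scoring scheme.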

Kubota et al. [23] gathered 32 property scales from the literature and, by factor analysis, selected 6 properties that, for carp parvalbumin, gave good correlation when comparing the structurally similar CD- and EF-hand binding sites and poor correlation in other regions. They expressed their sequence comparisons as a comparison matrix similar to that of McLachlan [6] and showed that their method could identify an alignment of α-lytic protease and Streptomyces griseus protease A that agrees with the alignment determined by comparing the available crystal structures.
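A comparison matrix of this kind can be sketched as follows, assuming that each cell (i, j) holds the correlation between the property profiles of the segment of length L starting at position i in one sequence and at position j in the other, averaged over the chosen property scales. The six scales selected by Kubota et al. are not listed here, so the Kyte-Doolittle scale stands in as a single hypothetical property; `comparison_matrix` is an illustrative name, not the authors' code.

```python
import math

# Stand-in property scale (Kyte-Doolittle); not one of Kubota et al.'s six scales.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is flat."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def comparison_matrix(seq_a, seq_b, scales, length):
    """Cell [i][j]: mean over scales of the correlation between the length-L
    segments of seq_a starting at i and of seq_b starting at j."""
    rows = len(seq_a) - length + 1
    cols = len(seq_b) - length + 1
    if length < 2 or rows < 1 or cols < 1:
        raise ValueError("segment length must be >= 2 and fit both sequences")
    # Convert each sequence to one numerical profile per property scale.
    prof_a = [[scale[aa] for aa in seq_a] for scale in scales]
    prof_b = [[scale[aa] for aa in seq_b] for scale in scales]
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            rs = [pearson(pa[i:i + length], pb[j:j + length])
                  for pa, pb in zip(prof_a, prof_b)]
            row.append(sum(rs) / len(rs))
        matrix.append(row)
    return matrix


m = comparison_matrix("LIVKDEAG", "LIVKDEAG", [KYTE_DOOLITTLE], 4)
for row in m:
    print(" ".join(f"{v:5.2f}" for v in row))
```

As in a McLachlan-style matrix, similar regions show up as runs of high values along a diagonal; comparing a sequence with itself, as in the usage above, puts a correlation of 1 on the main diagonal.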

Argos [24] selected the most discriminating properties from a set of 55 by calculating correlation coefficients for all pairs of sequences within 30 protein families that had been aligned on the basis of their three-dimensional structures. The correlation coefficients for each property were then averaged over all the families, leaving 5 representative properties. Unlike Kubota et al. [23], Argos used the correlation coefficients from the five properties in addition to a more conventional segment comparison method based on the Dayhoff matrix scoring scheme. He also combined the results obtained with more than one segment length in a single diagram, such that the most significant scores for a particular length always prevail.
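One way to sketch the combination of several segment lengths in a single diagram is shown below. The significance measure is an assumption: each length's matrix is converted to z-scores against its own mean and standard deviation, and each cell keeps the highest z-score over all lengths, recording which length produced it. Argos's actual significance calculation may differ. Cells are indexed by segment start position, and all matrices are assumed to come from the same pair of sequences; `combine_lengths` is a hypothetical name.

```python
import statistics


def standardize(matrix):
    """Convert a score matrix to z-scores relative to its own mean and SD."""
    values = [v for row in matrix for v in row]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - mean) / sd for v in row] for row in matrix]


def combine_lengths(matrices):
    """matrices maps segment length -> score matrix indexed by segment starts.

    Returns (combined, best_length): for each cell, the largest z-score over
    all lengths that define that cell, and the length that supplied it.
    """
    z = {length: standardize(m) for length, m in matrices.items()}
    rows = max(len(m) for m in z.values())
    cols = max(len(m[0]) for m in z.values())
    combined = [[None] * cols for _ in range(rows)]
    best_length = [[None] * cols for _ in range(rows)]
    for length, m in z.items():
        for i, row in enumerate(m):
            for j, value in enumerate(row):
                # The most significant score seen so far for this cell prevails.
                if combined[i][j] is None or value > combined[i][j]:
                    combined[i][j] = value
                    best_length[i][j] = length
    return combined, best_length


scores = {
    2: [[0.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 0.0]],
    4: [[0.0, 1.0], [1.0, 0.0]],
}
combined, best = combine_lengths(scores)
print(best)
```

Standardizing before taking the maximum keeps one segment length from dominating simply because its raw scores run on a larger numerical range; the same combination could be applied to correlation matrices, Dayhoff segment scores, or both.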


geoff.barton@ox.ac.uk