Additional Details


The MSV (minimum split value) is used to decide whether two segments are distinct or correlated. If the split value found is less than the MSV, the two segments are correlated; otherwise they are distinct.
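The MSV test can be written as a one-line comparison. The sketch below is illustrative only: the function name is hypothetical, and the split value is taken as an input computed elsewhere.

```python
def classify_pair(split_value: float, msv: float) -> str:
    """Classify two segments from their split value and the MSV threshold.

    A split value strictly below the MSV means 'correlated';
    anything else (including equality) means 'distinct'.
    """
    return "correlated" if split_value < msv else "distinct"
```

Note that a split value exactly equal to the MSV falls on the "distinct" side, following the "less than ... otherwise" wording of the text.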

A segment can consist of any number of residues, but the residues must form a continuous sequence along the chain. There are three types of constraint on the number of residues in a segment (Table 6): minimum domain size (MDS), minimum no contact cut-off (MNCC) and minimum segment size (MSS). They are chosen such that MDS > MNCC > MSS.

A segment that has size ≥ MDS and is distinct from the rest of the parent domain is considered to form a child domain. This constraint provides control over the minimum size of a domain and prevents the protein being split into small pieces.

A segment with size < MDS, but ≥ MNCC, that is found to be distinct from the rest of the parent domain is not large enough to form a child domain. Instead it is classed as a `chopped segment'. Chopped segments allow the algorithm to remove from a domain small segments that are not strongly correlated to it, and later reassign them to other domains or back to the original one. This allows domains to consist of more than two non-contiguous segments. The treatment of chopped segments is discussed below.

Segments with size < MNCC, but ≥ MSS, are used by the two-segment scans (both Methods 2 and 3). In these scans, two segments can come together to form a single domain, and one of the segments may be small. To allow for this, segments with a size in this range are only allowed if they are correlated with another segment, such that the total size of the two segments is ≥ MDS. Segments with size < MSS are not allowed, which prevents very small segments from occurring.

When domains are inspected one often finds small segments at the N or C termini that cross domains. Segments in the middle of the chain as small as this do not cross domains. To allow for this difference, MNCC and MSS are each divided into two categories: segments that lie in the middle of the chain and those that have one end connected to the end of the chain, giving MNCCm, MNCCe, MSSm and MSSe.
The values of these constraints are given in Table 6.
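The size rules can be summarised as a small classifier. This is a minimal sketch under stated assumptions: each size range is taken to be closed at its lower bound (size ≥ MDS, ≥ MNCC, ≥ MSS), a small segment pair is assumed to need a combined size of at least MDS, and the numbers in `EXAMPLE_LIMITS` are hypothetical placeholders, not the values of Table 6. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeLimits:
    mds: int     # minimum domain size
    mncc_m: int  # MNCCm: no-contact cut-off, segment in the middle of the chain
    mncc_e: int  # MNCCe: no-contact cut-off, segment touching a chain end
    mss_m: int   # MSSm: minimum segment size, middle of the chain
    mss_e: int   # MSSe: minimum segment size, chain end


# Hypothetical placeholder values; they only respect MDS > MNCC > MSS.
EXAMPLE_LIMITS = SizeLimits(mds=40, mncc_m=30, mncc_e=10, mss_m=15, mss_e=5)


def classify_segment(size: int, at_chain_end: bool, limits: SizeLimits) -> str:
    """Return the size class of a segment that was found to be distinct."""
    mncc = limits.mncc_e if at_chain_end else limits.mncc_m
    mss = limits.mss_e if at_chain_end else limits.mss_m
    if size >= limits.mds:
        return "child_domain"      # large enough to form a domain on its own
    if size >= mncc:
        return "chopped_segment"   # removed now, reassigned later
    if size >= mss:
        return "pair_only"         # usable only in a two-segment scan
    return "disallowed"            # too small to be considered at all


def pair_allowed(size_a: int, size_b: int, limits: SizeLimits) -> bool:
    """Assumed rule: two correlated segments must together reach MDS."""
    return size_a + size_b >= limits.mds
```

With these placeholder limits, a 12-residue segment at a chain end is classed as a chopped segment, while the same segment in the middle of the chain is disallowed.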

Helices form a relatively large number of contacts per residue (contact density) when compared to coil and sheet. The average contact density was measured in 2446 coil regions, 1324 helices and 1563 strands. Accordingly, helical regions have a tendency not to be split, but more importantly they raise the number of internal contacts in the segment that contains them. This can lead to segments containing helices being split incorrectly. To compensate for this, the number of internal contacts in a helix-containing segment is reduced to the average level for coil regions. The value to which it is reduced is termed Helix Coil Density (HCD).
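One possible reading of the helix correction is sketched below. It assumes that contacts are tallied per residue and that each helical residue is capped at HCD contacts; the text does not specify the exact bookkeeping, so both the per-residue cap and the function name are assumptions.

```python
def adjusted_internal_contacts(contacts_per_residue, is_helix, hcd):
    """Sum internal contacts, capping helical residues at `hcd` per residue.

    contacts_per_residue: internal contact count for each residue
    is_helix: parallel sequence of booleans, True for helical residues
    hcd: Helix Coil Density, the coil-level contact density used as a cap
    """
    total = 0.0
    for contacts, helical in zip(contacts_per_residue, is_helix):
        # Non-helical residues keep their observed contact count.
        total += min(contacts, hcd) if helical else contacts
    return total
```

For example, with `hcd=3.0` a segment with per-residue contacts `[2.0, 6.0, 8.0]`, of which the last two residues are helical, contributes 2 + 3 + 3 = 8 internal contacts instead of 16.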

β-sheets may sometimes be split across domains. A constant BW (standing for β-sheet Weighting) is used to reduce the likelihood of this occurring. The number of external contacts between two regions is increased by BW percent for every hydrogen bond (as defined by DSSP [Kabsch & Sander, 1983]) between strands that spans the two regions. Therefore, the greater the number of strand-forming hydrogen bonds that bridge two regions, the less likely they are to be distinct.
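The BW adjustment can be illustrated as follows. The sketch assumes the percentage increase is additive per hydrogen bond (rather than compounded), which the text does not state explicitly; the function and parameter names are hypothetical.

```python
def weighted_external_contacts(external_contacts, bridging_hbonds, bw_percent):
    """Increase the external contact count by BW percent per bridging H-bond.

    external_contacts: raw number of contacts between the two regions
    bridging_hbonds: strand-strand hydrogen bonds spanning the two regions
    bw_percent: the BW constant, expressed as a percentage
    """
    # Additive weighting: each hydrogen bond adds bw_percent of the raw count.
    return external_contacts * (1.0 + bw_percent / 100.0 * bridging_hbonds)
```

With 20 external contacts, three bridging hydrogen bonds and a BW of 10, the weighted count is 20 × 1.3 = 26, so the two regions look more strongly connected than the raw count suggests.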

Once all the domains have been found their compactness is checked. If a domain is found to be non-compact it is combined with the domain with which it has the lowest split value. The process is repeated until either all the domains are compact or all the domains have been combined together. A domain is defined as non-compact if its radius of gyration deviates from a theoretical curve (of radius of gyration against size of the domain) by more than the constraint maximum allowed compactness (MAC) [Russell, 1993].
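The compactness check and merging loop might look like the sketch below. Several points are assumptions: the theoretical curve is passed in as a caller-supplied function `expected_rg` because its form is not given here, only an excess over the curve is treated as non-compact, and domains are represented as sets of residue indices. The helper names are illustrative.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid (unit masses)."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / n)


def is_compact(coords, expected_rg, mac):
    """Assumed test: observed Rg may exceed the curve by at most `mac`."""
    return radius_of_gyration(coords) - expected_rg(len(coords)) <= mac


def merge_non_compact(domains, compact, split_value):
    """Merge each non-compact domain with its lowest-split-value partner.

    domains: iterable of sets of residue indices
    compact: callable(domain) -> bool
    split_value: callable(domain_a, domain_b) -> float
    Stops when every domain is compact or only one domain remains.
    """
    domains = [frozenset(d) for d in domains]
    while len(domains) > 1:
        bad_idx = next((i for i, d in enumerate(domains) if not compact(d)), None)
        if bad_idx is None:
            break  # all domains are compact
        bad = domains.pop(bad_idx)
        partner_idx = min(range(len(domains)),
                          key=lambda i: split_value(bad, domains[i]))
        partner = domains.pop(partner_idx)
        domains.append(bad | partner)
    return domains
```

Passing `compact` and `split_value` as callables keeps the loop independent of how contacts and the theoretical curve are actually computed.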


as@bioch.ox.ac.uk