The MSV (minimum split value) is used to decide whether two segments are distinct or correlated: if the split value found is less than the MSV, the two segments are correlated; otherwise they are distinct.
A segment can consist of any number of residues, but the
residues must form a continuous sequence along the chain. There are
three constraints on the number of residues in a segment
(Table 6): minimum domain size (MDS), minimum no contact cut-off
(MNCC) and minimum segment size (MSS). They are chosen such that
MDS > MNCC > MSS. A segment that has size ≥ MDS and is
distinct from the rest of the parent domain is considered to form
a child domain. This constraint provides control over the minimum size
of a domain and prevents the protein from being split into small pieces.
A segment with size < MDS, but ≥ MNCC, that is found to be
distinct from the rest of the parent domain is not large enough to
form a child domain. Instead it is classed as a "chopped
segment". Chopped segments allow the algorithm to remove small
segments that are not strongly correlated with a domain and
later reassign them to other domains, or back to the original
one. This allows domains to consist of more than two non-contiguous
segments. The treatment of chopped segments is discussed below.
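The size rules above, combined with the MSV test, can be sketched as a single classification step, including the smaller size classes described in the next paragraph. This is a minimal sketch, assuming the split value between the segment and the rest of its parent domain has already been computed (that calculation is not shown). `classify_segment` and `SegmentClass` are hypothetical names, and the threshold values in the usage line are illustrative, not those of Table 6.

```python
from enum import Enum


class SegmentClass(Enum):
    """Hypothetical labels for the outcomes described in the text."""
    TOO_SMALL = "size < MSS: not allowed"
    PAIR_ONLY = "MSS <= size < MNCC: usable only in a two-segment scan"
    CORRELATED = "split value < MSV: stays in the parent domain"
    CHOPPED = "MNCC <= size < MDS and distinct: chopped segment"
    CHILD_DOMAIN = "size >= MDS and distinct: child domain"


def classify_segment(size: int, split_value: float, msv: float,
                     mds: int, mncc: int, mss: int) -> SegmentClass:
    """Classify one continuous segment against the rest of its parent domain.

    Assumes split_value has already been computed for this segment.
    """
    if not mds > mncc > mss:
        raise ValueError("constraints must satisfy MDS > MNCC > MSS")
    if size < mss:
        return SegmentClass.TOO_SMALL
    if size < mncc:
        return SegmentClass.PAIR_ONLY
    if split_value < msv:
        # Below the minimum split value the two parts are correlated.
        return SegmentClass.CORRELATED
    # At or above the MSV the segment is distinct from the rest of the domain.
    if size >= mds:
        return SegmentClass.CHILD_DOMAIN
    return SegmentClass.CHOPPED


# Illustrative thresholds only (not the values in Table 6).
print(classify_segment(size=35, split_value=2.0, msv=1.0, mds=40, mncc=30, mss=10))
# prints SegmentClass.CHOPPED
```

Checking the size bounds before the split value mirrors the text: segments below MNCC are never turned into child domains or chopped segments on their own, whatever their split value.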
Segments with size < MNCC, but ≥ MSS, are used by the two-segment
scans (Methods 2 and 3). In these scans, two segments can come
together to form a single domain, and one of the
segments may be small. To allow for this, segments with a size in
this range are only allowed if they are correlated with another
segment such that the total size of the two segments is
≥ MDS.
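The pairing rule for small segments can be expressed as a short admissibility check. This is a sketch under the assumption that the combined-size threshold is MDS, so that the pair is large enough to form a domain; `pair_allowed` is a hypothetical helper, and whether the two segments are correlated is taken as an input rather than computed.

```python
def pair_allowed(size_a: int, size_b: int, correlated: bool,
                 mds: int, mncc: int, mss: int) -> bool:
    """Hypothetical admissibility check for two segments in a two-segment scan.

    Assumes the combined-size threshold is MDS, so the pair is big enough
    to form a domain on its own.
    """
    sizes = (size_a, size_b)
    if any(s < mss for s in sizes):
        return False  # segments below MSS are never allowed
    if any(s < mncc for s in sizes):
        # A small segment needs a correlated partner and enough combined size.
        return correlated and size_a + size_b >= mds
    return True  # neither segment is small, so this rule does not restrict them


print(pair_allowed(15, 30, correlated=True, mds=40, mncc=30, mss=10))  # prints True
```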
Segments with size < MSS are not allowed, which prevents very small
segments from occurring. When domains are inspected, one often finds
small segments at the N or C termini that cross domains, whereas
segments this small in the middle of the chain do not cross domains.
To allow for this difference, MNCC and MSS
are divided into two categories, for segments in the
middle of the chain and for segments with one end at a terminus
of the chain, giving MNCCm, MNCCe, MSSm and MSSe. The values
of these constraints are given in Table 6.
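Choosing between the middle-of-chain and end-of-chain constraints depends only on whether the segment touches a chain terminus. The sketch below assumes residues are indexed from 0 and that a segment is given by its first and last residue (inclusive); `PositionalConstraints` and `constraints_for` are hypothetical names. The usage values are invented and, as one reading of the text, make the terminal limits smaller so that short terminal segments can survive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionalConstraints:
    """MNCC and MSS split by segment position (values supplied by the caller)."""
    mncc_m: int  # MNCCm: segments in the middle of the chain
    mncc_e: int  # MNCCe: segments with one end at a chain terminus
    mss_m: int   # MSSm
    mss_e: int   # MSSe


def constraints_for(start: int, end: int, chain_length: int,
                    c: PositionalConstraints) -> tuple[int, int]:
    """Return (MNCC, MSS) for residues start..end (0-based, inclusive)."""
    at_terminus = start == 0 or end == chain_length - 1
    if at_terminus:
        return c.mncc_e, c.mss_e
    return c.mncc_m, c.mss_m


# Hypothetical values, not those in Table 6.
limits = PositionalConstraints(mncc_m=30, mncc_e=20, mss_m=10, mss_e=5)
print(constraints_for(0, 7, 120, limits))    # N-terminal segment: prints (20, 5)
print(constraints_for(40, 60, 120, limits))  # middle segment: prints (30, 10)
```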
Helices form a relatively large number of contacts per residue
(contact density) compared with coil and strands. The
average contact density, measured over 2446 coil regions, 1324 helices
and 1563 strands, was … for helices, … for coil and … for strands.
Accordingly, helical regions tend not to be split but, more importantly,
they raise the number of internal contacts in the segment that contains
them. Because a high internal contact count makes a segment appear more
distinct from the rest of the domain, segments containing helices can be
split incorrectly. To compensate, the number of internal contacts
in a helix-containing segment is reduced to the average level for coil
regions. The value to which it is reduced is termed the Helix Coil Density
(HCD).
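The text does not spell out exactly how the reduction to HCD is applied. One plausible reading, assumed here, is a per-residue cap: each helical residue contributes at most HCD internal contacts, while non-helical residues are counted unchanged. `adjusted_internal_contacts` is a hypothetical helper, and the per-residue contact counts and secondary-structure string (DSSP-style, with 'H' for helix) are assumed inputs.

```python
from typing import Sequence


def adjusted_internal_contacts(contacts: Sequence[float], ss: str, hcd: float) -> float:
    """Sum per-residue internal contacts, capping helical residues at HCD.

    contacts[i]: internal contacts of residue i within the segment (assumed input)
    ss[i]: secondary-structure code for residue i, with 'H' meaning helix
    """
    if len(contacts) != len(ss):
        raise ValueError("contacts and ss must have the same length")
    total = 0.0
    for count, code in zip(contacts, ss):
        # Helical residues are reduced to at most the coil-like density HCD.
        total += min(count, hcd) if code == "H" else count
    return total


print(adjusted_internal_contacts([9, 8, 4, 3], "HHCC", hcd=5))  # prints 17.0
```

Using `min` rather than replacing the count outright means a helix that happens to be less dense than HCD is never inflated.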
β-sheets may sometimes be split across domains. A
constant BW (standing for β-sheet Weighting) is used to reduce
the likelihood of this occurring. The number of external contacts
between two regions is increased by BW percent for every hydrogen
bond (as defined by DSSP [Kabsch & Sander, 1983]) between strands that spans the
two regions. Therefore, the more strand-forming hydrogen bonds
bridge two regions, the less likely the regions are to be
distinct.
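The β-sheet weighting can be sketched in two steps: count the strand-strand hydrogen bonds that bridge the two regions, then scale the external contact count. This assumes the hydrogen bonds have already been extracted (for example from DSSP output) and filtered to bonds between strands, and it reads "BW percent for every hydrogen bond" as a linear, non-compounding increase. Both function names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def count_bridging_hbonds(hbonds: Iterable[Tuple[int, int]],
                          region_a: Set[int], region_b: Set[int]) -> int:
    """Count strand-strand hydrogen bonds with one residue in each region.

    hbonds is assumed to hold (donor, acceptor) residue indices already
    restricted to bonds between strands.
    """
    return sum(1 for i, j in hbonds
               if (i in region_a and j in region_b) or (i in region_b and j in region_a))


def weighted_external_contacts(external: float, n_bridging: int, bw_percent: float) -> float:
    """Increase external contacts by BW percent per bridging bond (linear, assumed)."""
    return external * (1.0 + n_bridging * bw_percent / 100.0)


a, b = {1, 2, 3, 4}, {20, 21, 22, 23}
bonds = [(2, 21), (22, 3), (1, 4)]  # the last bond lies within region a
n = count_bridging_hbonds(bonds, a, b)
print(n, weighted_external_contacts(50.0, n, bw_percent=10.0))
```

Raising the external count lowers the split value of the pair, which is what makes bridged regions less likely to be declared distinct.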
Once all the domains have been found, their compactness is checked. If a domain is found to be non-compact, it is combined with the domain with which it has the lowest split value. The process is repeated until either all the domains are compact or all the domains have been combined into one. A domain is defined as non-compact if its radius of gyration deviates from a theoretical curve (of radius of gyration against domain size) by more than the maximum allowed compactness (MAC) constraint [Russell, 1993].
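The compactness check and merging loop can be sketched as follows. The theoretical curve of [Russell, 1993] is not reproduced here, so `expected_rg` is a caller-supplied function, and the split value calculation is likewise passed in as `split_value`. Two further assumptions: "deviates" is read one-sided (only a radius of gyration larger than the curve by more than MAC counts as non-compact), and the first non-compact domain found is merged first. All function names are hypothetical. The toy usage at the end uses centroid distance purely as a stand-in for the split value.

```python
import math
from typing import Callable, FrozenSet, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Domain = FrozenSet[int]


def radius_of_gyration(domain: Domain, coords: Sequence[Coord]) -> float:
    """Root-mean-square distance of the domain's residues from their centroid."""
    pts = [coords[i] for i in domain]
    n = len(pts)
    centroid = [sum(p[k] for p in pts) / n for k in range(3)]
    return math.sqrt(sum(sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in pts) / n)


def is_compact(domain: Domain, coords: Sequence[Coord],
               expected_rg: Callable[[int], float], mac: float) -> bool:
    """One-sided test (assumed): Rg may exceed the curve by at most MAC."""
    return radius_of_gyration(domain, coords) - expected_rg(len(domain)) <= mac


def merge_non_compact(domains: List[Domain], coords: Sequence[Coord],
                      expected_rg: Callable[[int], float], mac: float,
                      split_value: Callable[[Domain, Domain], float]) -> List[Domain]:
    """Merge non-compact domains with their lowest-split-value partner until
    every domain is compact or only one domain remains."""
    domains = list(domains)
    while len(domains) > 1:
        bad = next((k for k, d in enumerate(domains)
                    if not is_compact(d, coords, expected_rg, mac)), None)
        if bad is None:
            break  # all domains are compact
        partner = min((k for k in range(len(domains)) if k != bad),
                      key=lambda k: split_value(domains[bad], domains[k]))
        merged = domains[bad] | domains[partner]
        domains = [d for k, d in enumerate(domains) if k not in (bad, partner)] + [merged]
    return domains


# Toy example: residues on a line; expected_rg and split_value are stand-ins.
coords = [(0.0, 0, 0), (1.0, 0, 0), (10.0, 0, 0), (11.0, 0, 0), (20.0, 0, 0), (40.0, 0, 0)]


def centroid_gap(a: Domain, b: Domain) -> float:
    ca = sum(coords[i][0] for i in a) / len(a)
    cb = sum(coords[i][0] for i in b) / len(b)
    return abs(ca - cb)


doms = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]
print(merge_non_compact(doms, coords, expected_rg=lambda n: 1.0, mac=0.5,
                        split_value=centroid_gap))
```

In the toy run the spread-out domain {4, 5} is merged with its nearest neighbour, the result is still non-compact, and the loop continues until a single domain remains, matching the stopping rule in the text.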