DOMAK User Guide


All the usual disclaimers

1. DOMain MAKer - Introduction
------------------------------

This guide describes how to run domak from the command line and explains the
meaning of the DOMAK parameters. For full details of the algorithm see the
article in Protein Science.

usage : domak -c<PDB code> (-d<chain> -p<parameters file> -o<outputfile>)

The options shown in parentheses are optional.

PDB code is the four-character code identifying the PDB structure.

chain is a single letter telling DOMAK which chain to scan if more than
one is present (the case must match the case of the chain letter in the
file).

parameters file is a file containing user-defined parameter values, for use
when the default parameters are not wanted.

output file is the file to which the final domain definition is written.

Error messages go to standard error.

The RasMol file is written to <PDB code>.rasmol.

All other output goes to standard output and is described below.
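
As an illustration of the usage line above, the following Python sketch
assembles a domak command line. The function name build_domak_command is
hypothetical and is not part of DOMAK. It assumes, as in the usage line, that
each option value follows its flag with no space in between.

```python
def build_domak_command(pdb_code, chain=None, params_file=None, output_file=None):
    """Return the argument list for a domak run (hypothetical helper)."""
    if len(pdb_code) != 4:
        raise ValueError("PDB code must be four characters long")
    args = ["domak", "-c" + pdb_code]
    if chain is not None:
        if len(chain) != 1:
            raise ValueError("chain must be a single letter")
        # The case of the chain letter is passed through unchanged,
        # because it has to match the case used in the PDB file.
        args.append("-d" + chain)
    if params_file is not None:
        args.append("-p" + params_file)
    if output_file is not None:
        args.append("-o" + output_file)
    return args


print(" ".join(build_domak_command("1abb", chain="A", output_file="1abb.dom")))
```

The returned list can be handed to subprocess.run if domak is on the search
path and DOMAKDIR is set.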

2. DOMAKDIR
-----------

The environment variable DOMAKDIR must be set to point to a directory
containing three files:

dssp_files
contacts_files
pdb_files

These three files tell the program where to search for the DSSP, contacts
and PDB files respectively.

The format of each of these files is as follows

<directory> <prefix> <suffix>

Each file may contain several lines, one for each place in which the files
may be found. The program builds a file name from each line in turn and
looks for it, in the order in which the lines are given.

e.g.

/usr/local/data/dssp/ _ .all
/usr/local/data/dssp/ _ .ALL
/usr/local/data/dssp/extra extra .all

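With these lines the program first looks for the file
/usr/local/data/dssp/<PDB code>.all
then
/usr/local/data/dssp/<PDB code>.ALL
and finally
/usr/local/data/dssp/extra/extra<PDB code>.all

A prefix of _ means that no prefix is used. If the file is not found in any
of these places an error results.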
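
The search order described above can be sketched in Python as follows. This
is only an illustration of the file format, not the code DOMAK uses. The
function names are hypothetical, and the treatment of _ as an empty prefix is
an assumption taken from the example lines.

```python
import os


def candidate_paths(search_lines, pdb_code):
    """Expand '<directory> <prefix> <suffix>' lines into file names, in order."""
    paths = []
    for line in search_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        directory, prefix, suffix = fields
        # Assumption: "_" stands for an empty prefix, as in the example.
        if prefix == "_":
            prefix = ""
        paths.append(directory.rstrip("/") + "/" + prefix + pdb_code + suffix)
    return paths


def find_first_existing(search_lines, pdb_code):
    """Return the first candidate that exists, or raise if none does."""
    for path in candidate_paths(search_lines, pdb_code):
        if os.path.isfile(path):
            return path
    raise FileNotFoundError("no file found for " + pdb_code)


dssp_lines = [
    "/usr/local/data/dssp/ _ .all",
    "/usr/local/data/dssp/ _ .ALL",
    "/usr/local/data/dssp/extra extra .all",
]
for path in candidate_paths(dssp_lines, "1bia"):
    print(path)
```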

3. PARAMETERS FILE
------------------

The following parameters are used by DOMAK. Any or all of them can be changed
by specifying a parameters file, the format of which is

<parameter> <value>

Most of the parameters are described in the paper. Each parameter is listed
below with its default value.
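
A parameters file in this format can be read and checked with a few lines of
Python. The sketch below is not part of DOMAK and all of its names are
hypothetical. It assumes that lines without exactly two fields can be
skipped, and it checks the groups of MIN_PEAK parameters that, according to
the descriptions in this section, should always share one value. The default
values are copied from this guide.

```python
# Parameter groups that the guide says should always have the same value.
SAME_VALUE_GROUPS = [
    ("MIN_PEAK_C", "MIN_PEAK_DC", "MIN_PEAK_MC"),
    ("MIN_PEAK_SS_ONLY_C", "MIN_PEAK_SS_ONLY_DC", "MIN_PEAK_SS_ONLY_MC"),
    ("MIN_PEAK_BLO_C", "MIN_PEAK_SS_ONLY_BLO_C",
     "MIN_PEAK_BLO_DC", "MIN_PEAK_SS_ONLY_BLO_DC"),
]

# Default values of those parameters, as listed in this guide.
PEAK_DEFAULTS = {
    "MIN_PEAK_C": 9.5, "MIN_PEAK_DC": 9.5, "MIN_PEAK_MC": 9.5,
    "MIN_PEAK_SS_ONLY_C": 17.049999, "MIN_PEAK_SS_ONLY_DC": 17.049999,
    "MIN_PEAK_SS_ONLY_MC": 17.049999,
    "MIN_PEAK_BLO_C": 60.0, "MIN_PEAK_SS_ONLY_BLO_C": 60.0,
    "MIN_PEAK_BLO_DC": 60.0, "MIN_PEAK_SS_ONLY_BLO_DC": 60.0,
}


def parse_parameters(text):
    """Parse '<parameter> <value>' lines into a dict of floats."""
    params = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: anything else is ignored
        name, value = fields
        params[name] = float(value)
    return params


def inconsistent_groups(overrides):
    """Return the groups whose members would end up with different values."""
    effective = dict(PEAK_DEFAULTS)
    effective.update(overrides)
    return [group for group in SAME_VALUE_GROUPS
            if len({effective[name] for name in group}) > 1]


example = """MIN_DOMAIN_SIZE 50
MIN_PEAK_C 12.0
"""
overrides = parse_parameters(example)
print(overrides)
print(inconsistent_groups(overrides))
```

In this example only MIN_PEAK_C is changed, so the check reports the first
group because MIN_PEAK_DC and MIN_PEAK_MC are still at their default value.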

MIN_DOMAIN_SIZE 40

The minimum number of residues to make up a domain

MIN_SEGMENT_SIZE_END 5

The minimum size that a segment may be that is at either the N or C terminus
of a chain

MIN_SEGMENT_SIZE_MID 25

The minimum size that a segment may be that is in the middle of a chain

MIN_DOUBLE_SPLIT 120

The minimum size that a domain must be before it can be considered to
contain a two segment domain

MIN_NO_CONTACT_CUTOFF_MID 30

The minimum size that a segment in the middle of the chain can be before it
can be considered to form a distinct unit (but not a domain) - termed a
chopped segment

MIN_NO_CONTACT_CUTOFF_END 10

Same as above except for segments that are at either the C or N terminus of
the chain

E_WEIGHT 0.100000

Weighting to prevent beta sheets from being split

MAX_ALLOWABLE_GLOB 2.850000

The maximum deviation of globularity of a domain from a theoretical curve
(see paper). If a domain exceeds this threshold it is combined with other
domains.

MIN_PEAK_C 9.500000
MIN_PEAK_DC 9.500000
MIN_PEAK_MC 9.500000

These three parameters should always have the same value. They determine the
split value when using all contacts.

MIN_PEAK_SS_ONLY_C 17.049999
MIN_PEAK_SS_ONLY_DC 17.049999
MIN_PEAK_SS_ONLY_MC 17.049999

These three parameters should always have the same value. They determine the
split value when using secondary structure contacts only.

MIN_PEAK_BLO_C 60.000000
MIN_PEAK_SS_ONLY_BLO_C 60.000000
MIN_PEAK_BLO_DC 60.000000
MIN_PEAK_SS_ONLY_BLO_DC 60.000000

These four parameters should always have the same value. They determine the
split value for deciding whether chopped segments should be re-assigned to
protein domains.

MIN_SS_PER 0.570000

This is the minimum fraction of secondary structure content allowed before
the program decides to use secondary structure contacts only on the segment.

MIN_HELIX_LENGTH 5

This is the minimum length that a helix must be before its internal contacts
are reduced.

HELIX_RAMP 4

This is the maximum number of residues at the start and end of the helix that
form a ramp function (see below for detail)

HELIX_REDUCE_C_DENS 10.320000

This is the level to which internal contacts in a helix are reduced (see below
for detail)

INCREMENT_DIVIDER 250

If the number of residues being considered (i.e. secondary structure residues
only, if they are the only ones being used) is greater than 250, the program
skips over every other residue. If it is greater than 500 it skips 2 residues
in a row, and so on. As in the case where only secondary structure residues
are used, when the split value occurs over a range of residues the algorithm
goes back and focuses in on the residue at which to make the split.
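
The skipping rule for INCREMENT_DIVIDER can be written as a small Python
sketch. The function names are hypothetical, and the behaviour at exactly 250
or 500 residues is an assumption based on the words "greater than" in the
description; the program itself may treat the boundaries differently.

```python
def scan_step(n_residues, increment_divider=250):
    """Distance between examined residues: 1 = every residue, 2 = every other."""
    if n_residues < 1:
        return 1
    # Up to 250 residues -> 1, 251 to 500 -> 2, 501 to 750 -> 3, and so on.
    return (n_residues - 1) // increment_divider + 1


def coarse_positions(n_residues, increment_divider=250):
    """Zero-based indices of the residues examined in the coarse scan."""
    step = scan_step(n_residues, increment_divider)
    return list(range(0, n_residues, step))


for n in (250, 251, 500, 501):
    print(n, scan_step(n))
```

A step of 3 means that two residues in a row are skipped between each pair of
examined residues.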

4. What the Output Means
------------------------

First a list of the parameters and their values is output

# DOMAK running....
# MIN_DOMAIN_SIZE 40
# MIN_SEGMENT_SIZE_END 5
# MIN_SEGMENT_SIZE_MID 25
# MIN_DOUBLE_SPLIT 120
# MIN_NO_CONTACT_CUTOFF_MID 30
# MIN_NO_CONTACT_CUTOFF_END 10
# E_WEIGHT 0.100000
# MAX_ALLOWABLE_GLOB 2.850000
# MIN_PEAK_SS_ONLY_C 17.049999
# MIN_PEAK_C 9.500000
# MIN_PEAK_SS_ONLY_DC 17.049999
# MIN_PEAK_DC 9.500000
# MIN_PEAK_SS_ONLY_MC 17.049999
# MIN_PEAK_MC 9.500000
# MIN_PEAK_BLO_C 60.000000
# MIN_PEAK_SS_ONLY_BLO_C 60.000000
# MIN_PEAK_BLO_DC 60.000000
# MIN_PEAK_SS_ONLY_BLO_DC 60.000000
# MIN_SS_PER 1.000000
# MIN_HELIX_LENGTH 5
# HELIX_RAMP 4
# HELIX_REDUCE_C_DENS 10.320000
# INCREMENT_DIVIDER 250
# Reading in contacts file....
# Analysing residues in chain A of 1abb

Then the domain being analysed is printed in STAMP format

/usr/local/data/pdb//pdb1abb.ent 1abb-1 { A 10 _ to A 837 _ }
# 1 Domain currently being analysed
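
Domain lines of the form shown above can be picked apart with a regular
expression. The Python sketch below is an illustration only. It handles a
single "start to end" segment as in the example, treats _ as a placeholder
for a blank chain identifier or insertion code (an assumption), and does not
try to cover the rest of the STAMP domain syntax. The names are hypothetical.

```python
import re

# <file> <id> { <chain> <number> <ins> to <chain> <number> <ins> }
STAMP_LINE = re.compile(
    r"^(?P<file>\S+)\s+(?P<id>\S+)\s+\{\s*"
    r"(?P<c1>\S)\s+(?P<n1>-?\d+)\s+(?P<i1>\S)\s+to\s+"
    r"(?P<c2>\S)\s+(?P<n2>-?\d+)\s+(?P<i2>\S)\s*\}\s*$"
)


def parse_stamp_domain(line):
    """Split one single-segment domain line into its fields."""
    match = STAMP_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a single-segment domain line: " + line)
    return {
        "file": match.group("file"),
        "id": match.group("id"),
        # (chain, residue number, insertion code)
        "start": (match.group("c1"), int(match.group("n1")), match.group("i1")),
        "end": (match.group("c2"), int(match.group("n2")), match.group("i2")),
    }


domain = parse_stamp_domain(
    "/usr/local/data/pdb//pdb1abb.ent 1abb-1 { A 10 _ to A 837 _ }"
)
print(domain["id"], domain["start"], domain["end"])
```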

The numbering system for domains and chopped segments is as follows

if a domain is split into n parts, the id codes of those n parts are
the id code of the original domain with .1, .2, .3, ... .n appended

e.g.

if domain 1.2 is split into 3 parts, they become 1.2.1, 1.2.2 and 1.2.3
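
The numbering rule is simple enough to state in a few lines of Python. This
is a sketch of the rule as described above, with hypothetical function names,
and is not taken from the DOMAK source.

```python
def child_ids(parent, n_parts):
    """Id codes of the n parts that a domain with id code `parent` is split into."""
    return [parent + "." + str(i) for i in range(1, n_parts + 1)]


def parent_of(domain_id):
    """Id code of the domain that was split, or None for the top-level domain."""
    head, dot, _ = domain_id.rpartition(".")
    return head if dot else None


print(child_ids("1.2", 3))
print(parent_of("1.2.1"))
```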

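Each domain line in the output is followed by comment lines (starting with #)
that give the id code of the domain and say what is being done with it.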

5. The Three Screens
--------------------

The three screens are implemented in the programs small_ins, small_dom and
n_res_per_seg.
The awk script domak_format can be used to run them (see example).
Results of the three screens are given in the same order as they are
described in the paper.

6. Example
----------

This is an example for the PDB file 1bia.

Given that pdb_files, contacts_files and dssp_files are properly formatted
and exist in the current directory:
prompt>setenv DOMAKDIR ./

prompt>contacts -c 1bia
Using all default distances
VDW radii to be read in from /data/contacts/VDW_FILE
Van der-Waals radii:
   C  2.500
   N  2.500
   O  2.500
   S  2.500
   P  2.500
   I  2.500
parameters:
 Ca - Ca 40.000
 Atom - Atom contact <= VDW radii +  0.000
 C - C "hydrophobic" contact <=  5.000
 Hydrogen bond distance <=  3.000
 Disulphide bond distance <=  2.200
 Too close for contacts except H-bonds and S-S <=  2.000
 Too close for H-bonds and S-S bonds <=  1.200
 Waters are to be ignored
 Acetylation/Formylation atoms are to be ignored

input:
PDBfile: /usr/local/data/pdb/pdb1bia.ent
DSSP file: /usr/local/data/dssp/defn//1bia.all
Output:
 complete: 1bia.all; summary: 1bia.sum
reading PDB file....
               done.
reading DSSP file...
               done.
calculating contacts...
                 done.
elapsed cpu time   14.75000 seconds

prompt>domak -c1bia
# DOMAK running....
# Using default parameters
# MIN_DOMAIN_SIZE 40
# MIN_SEGMENT_SIZE_END 5
# MIN_SEGMENT_SIZE_MID 25
# MIN_DOUBLE_SPLIT 120
# MIN_NO_CONTACT_CUTOFF_MID 30
# MIN_NO_CONTACT_CUTOFF_END 10
# E_WEIGHT 0.100000
# MAX_ALLOWABLE_GLOB 2.850000
# MIN_PEAK_SS_ONLY_C 17.049999
# MIN_PEAK_C 9.500000
# MIN_PEAK_SS_ONLY_DC 17.049999
# MIN_PEAK_DC 9.500000
# MIN_PEAK_SS_ONLY_MC 17.049999
# MIN_PEAK_MC 9.500000
# MIN_PEAK_BLO_C 60.000000
# MIN_PEAK_SS_ONLY_BLO_C 60.000000
# MIN_PEAK_BLO_DC 60.000000
# MIN_PEAK_SS_ONLY_BLO_DC 60.000000
# MIN_SS_PER 0.570000
# MIN_HELIX_LENGTH 5
# HELIX_RAMP 4
# HELIX_REDUCE_C_DENS 10.320000
# INCREMENT_DIVIDER 250
# pdb file : /usr/local/data/pdb//pdb1bia.ent
# dssp file : /data/newdssp//1bia.all
# contacts file : .//1bia.all
# Reading in contacts file....
# Analysing all residues in 1bia

/usr/local/data/pdb//pdb1bia.ent 1bia-1 { _ 1 _ to _ 317 _ }
# 1 Domain currently being analysed

/usr/local/data/pdb//pdb1bia.ent 1bia-1.1 { _ 1 _ to _ 64 _ }
# 1.1 The above domain has been extracted
# 1.1 with split value 1550177.875000

/usr/local/data/pdb//pdb1bia.ent 1bia-1.1 { _ 1 _ to _ 64 _ }
# 1.1 Domain currently being analysed

/usr/local/data/pdb//pdb1bia.ent 1bia-1.1 { _ 1 _ to _ 64 _ }
# 1.1 This domain is too small to be subdivided further

/usr/local/data/pdb//pdb1bia.ent 1bia-1.2 { _ 65 _ to _ 317 _ }
# 1.2 Domain currently being analysed

/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1 { _ 65 _ to _ 270 _ }
# 1.2.1 The above domain has been extracted
# 1.2.1 with split value 815.492126

/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1 { _ 65 _ to _ 270 _ }
# 1.2.1 Domain currently being analysed

/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1 { _ 65 _ to _ 270 _ }
# 1.2.1 Split value too small to allow futher splitting

/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1.1 { _ 145 _ to _ 185 _ }
# 1.2.1.1 This is the domain that would have been extracted
# 1.2.1.1 had the previous domain been split
# 1.2.1.1 with split value of 4.283037

/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.2 { _ 271 _ to _ 317 _ }
# 1.2.2 Domain currently being analysed

/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.2 { _ 271 _ to _ 317 _ }
# 1.2.2 This domain is too small to be subdivided further

/usr/local/data/pdb//pdb1bia.ent 1bia-1 { _ 1 _ to _ 317 _ }
# 1 Analysing compactness of sub-domains of above domain
/usr/local/data/pdb//pdb1bia.ent 1bia-1.1 { _ 1 _ to _ 64 _ }
# 1.1 Compactness of domain 0.492947
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1 { _ 65 _ to _ 270 _ }
# 1.2.1 Compactness of domain 0.064336
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.2 { _ 271 _ to _ 317 _ }
# 1.2.2 Compactness of domain 0.208968

DOMAK DOMAINS
/usr/local/data/pdb//pdb1bia.ent 1bia-1.1 { _ 1 _ to _ 64 _ }
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1 { _ 65 _ to _ 270 _ }
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.2 { _ 271 _ to _ 317 _ }


prompt>mv output.res 1bia.dom
prompt>domak_format 1bia
DOMAK 1bia PASS PASS PASS
STAMP 1bia 1bia-1 { _ 1 _ to _ 64 _ }
STAMP 1bia 1bia-2 { _ 65 _ to _ 270 _ }
STAMP 1bia 1bia-3 { _ 271 _ to _ 317 _ }
DOMAK END
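
The domak_format output above is easy to read back into a program. The Python
sketch below is an illustration with hypothetical names. It assumes that the
three fields after the PDB code on the DOMAK line are the results of the three
screens, in the order given in section 5. Only PASS appears in this example,
so the sketch keeps the fields as plain strings and does not guess at other
values.

```python
def parse_domak_format(text):
    """Collect the PDB code, screen results and domains from domak_format output."""
    result = {"code": None, "screens": [], "domains": []}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "DOMAK":
            if fields[1:2] == ["END"]:
                break  # end of the record
            if len(fields) > 1:
                result["code"] = fields[1]
                result["screens"] = fields[2:]
        elif fields[0] == "STAMP" and len(fields) > 3:
            # STAMP <PDB code> <domain id> { ... }
            result["domains"].append((fields[2], " ".join(fields[3:])))
    return result


example_output = """DOMAK 1bia PASS PASS PASS
STAMP 1bia 1bia-1 { _ 1 _ to _ 64 _ }
STAMP 1bia 1bia-2 { _ 65 _ to _ 270 _ }
STAMP 1bia 1bia-3 { _ 271 _ to _ 317 _ }
DOMAK END
"""
record = parse_domak_format(example_output)
print(record["code"], record["screens"])
for domain_id, residues in record["domains"]:
    print(domain_id, residues)
```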