DOMAK User Guide
All the usual disclaimers
1. DOMain MAKer - Introduction
------------------------------
This guide describes how to run domak from the command line and the meaning
of the DOMAK parameters. For full details on the algorithm see the article in
Protein Science.
usage : domak -c<PDB code> (-d<chain> -p<parameters file> -o<outputfile>)
pdb code is the four letter code identifying the PDB structure
chain is a single letter telling DOMAK which chain to scan if more than
one is present (note the case must be the same as the case of the letter in
the file)
parameters file is a file containing the values of parameters as defined by
the user if default parameters are not used.
output file is the file to which the final definition should be printed.
Error messages go to standard error
Rasmol file is directed to <PDB code>.rasmol
all other output goes to standard output. This is described below.
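
The usage line above can be wrapped in a small script. The following is a
minimal Python sketch, not part of DOMAK: the function names are hypothetical,
and it assumes that each option letter is joined directly to its value, as in
the usage line.

```python
import subprocess


def build_domak_command(pdb_code, chain=None, params_file=None, output_file=None):
    """Build the argument list for domak (hypothetical helper, not part of DOMAK)."""
    if len(pdb_code) != 4:
        raise ValueError("the PDB code must be four characters long")
    command = ["domak", "-c" + pdb_code]
    if chain is not None:
        if len(chain) != 1:
            raise ValueError("the chain must be a single letter")
        # The chain letter is case sensitive, so it is passed through unchanged.
        command.append("-d" + chain)
    if params_file is not None:
        command.append("-p" + params_file)
    if output_file is not None:
        command.append("-o" + output_file)
    return command


def run_domak(pdb_code, **options):
    """Run domak and return (standard output, standard error) as text."""
    result = subprocess.run(
        build_domak_command(pdb_code, **options),
        capture_output=True,
        text=True,
    )
    return result.stdout, result.stderr
```

For example, build_domak_command("1abb", chain="A") gives the arguments for
"domak -c1abb -dA".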
2. DOMAKDIR
-----------
the environment variable DOMAKDIR must be set to point to a directory
containing three files
dssp_files
contacts_files
pdb_files
these three files tell the program where to search for the dssp, contacts
and pdb files respectively.
the format of these files is as follows
<directory> <prefix> <suffix>
each file may contain several lines indicating that the files may be found
in all of those places. The program will look for a matching file type in
a format corresponding to each of the lines in order.
e.g.
/usr/local/data/dssp/ _ .all
/usr/local/data/dssp/ _ .ALL
/usr/local/data/dssp/extra extra .all
with these lines (where a prefix of _ stands for no prefix) the program will
first look for the file
/usr/local/data/dssp/<PDB code>.all
then
/usr/local/data/dssp/<PDB code>.ALL
finally
/usr/local/data/dssp/extra/extra<PDB code>.all
if the file is not found an error will result.
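
The search order described above can be sketched in Python as follows. This is
an illustration only, not the DOMAK source: the function names are
hypothetical, and it assumes that a prefix of _ means "no prefix" and that the
directory and file name are joined with a single /, as in the paths listed in
the example.

```python
import os


def candidate_paths(location_lines, pdb_code):
    """Return the paths to try, in order, for one location file.

    Each line has the form: <directory> <prefix> <suffix>
    """
    paths = []
    for line in location_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines (assumed behaviour)
        directory, prefix, suffix = fields
        if prefix == "_":
            prefix = ""  # assumption: _ stands for an empty prefix
        paths.append(directory.rstrip("/") + "/" + prefix + pdb_code + suffix)
    return paths


def find_file(location_lines, pdb_code):
    """Return the first candidate path that exists, or raise an error."""
    for path in candidate_paths(location_lines, pdb_code):
        if os.path.isfile(path):
            return path
    raise FileNotFoundError("no file found for " + pdb_code)
```

Called with the three example lines and the code 1abb, candidate_paths returns
the three paths in the order given above.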
3. PARAMETERS FILE
------------------
The following parameters are used by DOMAK. Any or all can be changed by
specifying a parameters file, the format of which is
<parameter> <value>
Most of the parameters are described in the paper.
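
A parameters file in this format can be checked before it is passed to domak.
The Python sketch below is a hypothetical helper, not part of DOMAK. It reads
the <parameter> <value> lines and reports any group of split-value parameters
that would no longer share one value, which the descriptions in this section
say they always should. The defaults it uses are the ones listed in this
section; treating every value as a floating point number is an assumption.

```python
# Default split values as listed in this guide.
PEAK_DEFAULTS = {
    "MIN_PEAK_C": 9.5, "MIN_PEAK_DC": 9.5, "MIN_PEAK_MC": 9.5,
    "MIN_PEAK_SS_ONLY_C": 17.049999, "MIN_PEAK_SS_ONLY_DC": 17.049999,
    "MIN_PEAK_SS_ONLY_MC": 17.049999,
    "MIN_PEAK_BLO_C": 60.0, "MIN_PEAK_SS_ONLY_BLO_C": 60.0,
    "MIN_PEAK_BLO_DC": 60.0, "MIN_PEAK_SS_ONLY_BLO_DC": 60.0,
}

# Groups of parameters that should always have the same value.
EQUAL_GROUPS = [
    ("MIN_PEAK_C", "MIN_PEAK_DC", "MIN_PEAK_MC"),
    ("MIN_PEAK_SS_ONLY_C", "MIN_PEAK_SS_ONLY_DC", "MIN_PEAK_SS_ONLY_MC"),
    ("MIN_PEAK_BLO_C", "MIN_PEAK_SS_ONLY_BLO_C",
     "MIN_PEAK_BLO_DC", "MIN_PEAK_SS_ONLY_BLO_DC"),
]


def parse_parameters(text):
    """Parse '<parameter> <value>' lines into a dictionary of floats."""
    params = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError("expected '<parameter> <value>': " + line)
        name, value = fields
        params[name] = float(value)
    return params


def inconsistent_groups(overrides):
    """Return the groups whose members differ once overrides are applied."""
    merged = dict(PEAK_DEFAULTS)
    merged.update(overrides)
    return [group for group in EQUAL_GROUPS
            if len({merged[name] for name in group}) > 1]
```

For instance, a file that changes only MIN_PEAK_C leaves MIN_PEAK_DC and
MIN_PEAK_MC at their defaults, so that group is reported.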
MIN_DOMAIN_SIZE 40
The minimum number of residues to make up a domain
MIN_SEGMENT_SIZE_END 5
The minimum size that a segment may be that is at either the N or C terminus
of a chain
MIN_SEGMENT_SIZE_MID 25
The minimum size that a segment may be that is in the middle of a chain
MIN_DOUBLE_SPLIT 120
The minimum size that a domain must be before it can be considered to
contain a two segment domain
MIN_NO_CONTACT_CUTOFF_MID 30
The minimum size that a segment in the middle of the chain can be before it
can be considered to form a distinct unit (but not a domain) - termed a
chopped segment
MIN_NO_CONTACT_CUTOFF_END 10
Same as above except for segments that are at either the C or N terminus of
the chain
E_WEIGHT 0.100000
Weighting to prevent beta sheets from being split
MAX_ALLOWABLE_GLOB 2.850000
The maximum deviation of globularity of a domain from a theoretical curve
(see paper). If a domain exceeds this threshold it is combined with other
domains.
MIN_PEAK_C 9.500000
MIN_PEAK_DC 9.500000
MIN_PEAK_MC 9.500000
These three parameters should always have the same value. They determine the
split value when using all contacts.
MIN_PEAK_SS_ONLY_C 17.049999
MIN_PEAK_SS_ONLY_DC 17.049999
MIN_PEAK_SS_ONLY_MC 17.049999
These three parameters should always have the same value. They determine the
split value when using secondary structure contacts only.
MIN_PEAK_BLO_C 60.000000
MIN_PEAK_SS_ONLY_BLO_C 60.000000
MIN_PEAK_BLO_DC 60.000000
MIN_PEAK_SS_ONLY_BLO_DC 60.000000
These four parameters should always have the same value. They determine the
split value for deciding whether chopped segments should be re-assigned to
protein domains.
MIN_SS_PER 0.570000
This is the minimum fraction of secondary structure content allowed before
the program decides to use secondary structure contacts only on the segment.
MIN_HELIX_LENGTH 5
This is the minimum length that a helix must be before its internal contacts are
reduced.
HELIX_RAMP 4
This is the maximum number of residues at the start and end of the helix that
form a ramp function (see below for detail)
HELIX_REDUCE_C_DENS 10.320000
This is the level to which internal contacts in a helix are reduced (see below
for detail)
INCREMENT_DIVIDER 250
If the number of residues being considered (i.e. secondary structure residues
only, if they are the only ones being used) is greater than 250, the program
skips over every other residue. If it is greater than 500, it skips 2 residues
in a row, and so on. As in the case where only secondary structure residues
are used, when the split value occurs over a range of residues the algorithm
goes back and focuses in on the residue at which to make the split.
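
The skipping rule for INCREMENT_DIVIDER can be illustrated with a short Python
sketch. This is one reading of the description above, not the DOMAK source:
the function name is hypothetical, and the behaviour at exactly 250, 500, ...
residues is an assumption based on the words "greater than".

```python
def residue_step(n_residues, increment_divider=250):
    """Return the step between the residues that are examined.

    A step of 1 examines every residue, 2 skips every other residue,
    3 skips two residues in a row, and so on.
    """
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    return 1 + (n_residues - 1) // increment_divider


def examined_positions(n_residues, increment_divider=250):
    """Return the 0-based positions examined in the coarse pass."""
    step = residue_step(n_residues, increment_divider)
    return list(range(0, n_residues, step))
```

With the default of 250, the 317 residues of 1bia give a step of 2, so every
other residue is examined in the coarse pass before the algorithm goes back to
focus in on the exact split position.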
4. What the Output Means
------------------------
First a list of the parameters and their values is output
# DOMAK running....
# MIN_DOMAIN_SIZE 40
# MIN_SEGMENT_SIZE_END 5
# MIN_SEGMENT_SIZE_MID 25
# MIN_DOUBLE_SPLIT 120
# MIN_NO_CONTACT_CUTOFF_MID 30
# MIN_NO_CONTACT_CUTOFF_END 10
# E_WEIGHT 0.100000
# MAX_ALLOWABLE_GLOB 2.850000
# MIN_PEAK_SS_ONLY_C 17.049999
# MIN_PEAK_C 9.500000
# MIN_PEAK_SS_ONLY_DC 17.049999
# MIN_PEAK_DC 9.500000
# MIN_PEAK_SS_ONLY_MC 17.049999
# MIN_PEAK_MC 9.500000
# MIN_PEAK_BLO_C 60.000000
# MIN_PEAK_SS_ONLY_BLO_C 60.000000
# MIN_PEAK_BLO_DC 60.000000
# MIN_PEAK_SS_ONLY_BLO_DC 60.000000
# MIN_SS_PER 1.000000
# MIN_HELIX_LENGTH 5
# HELIX_RAMP 4
# HELIX_REDUCE_C_DENS 10.320000
# INCREMENT_DIVIDER 250
# Reading in contacts file....
# Analysing residues in chain A of 1abb
Then the domain being analysed is printed in STAMP format
/usr/local/data/pdb//pdb1abb.ent 1abb-1 { A 10 _ to A 837 _ }
# 1 Domain currently being analysed
The numbering system for domains and chopped segments is as follows:
if a domain is split into n parts, the id codes of those n parts are
the id code of the original domain with .1, .2, .3, ... .n appended
e.g.
if domain 1.2 is split into 3 parts, they become 1.2.1, 1.2.2 and 1.2.3
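The rest of the output is intended to be self-explanatory: each comment line
about a domain starts with # followed by the id code of that domain.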
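
The id-code numbering described above can be expressed in a few lines of
Python. This is only an illustration of the naming rule; the function names
are hypothetical and not part of DOMAK.

```python
def child_ids(domain_id, n_parts):
    """Return the id codes given to the n parts of a split domain."""
    return [f"{domain_id}.{i}" for i in range(1, n_parts + 1)]


def parent_of(domain_id):
    """Return the id code of the domain this one was split from, or None."""
    head, separator, _ = domain_id.rpartition(".")
    return head if separator else None
```

Here child_ids("1.2", 3) gives 1.2.1, 1.2.2 and 1.2.3, and parent_of("1.2.1")
gives 1.2.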
5. The Three Screens
--------------------
The three screens are implemented in the programs small_ins, small_dom and
n_res_per_seg.
The awk script domak_format can be used to run them (see example).
Results of the three screens are given in the same order as they are
described in the paper
6. Example
----------
This is an example for the PDB file 1bia,
given that pdb_files, contacts_files and dssp_files are properly formatted
and exist in the current directory
prompt>setenv DOMAKDIR ./
prompt>contacts -c 1bia
Using all default distances
VDW radii to be read in from /data/contacts/VDW_FILE
Van der-Waals radii:
C 2.500
N 2.500
O 2.500
S 2.500
P 2.500
I 2.500
parameters:
Ca - Ca 40.000
Atom - Atom contact <= VDW radii + 0.000
C - C "hydrophobic" contact <= 5.000
Hydrogen bond distance <= 3.000
Disulphide bond distance <= 2.200
Too close for contacts except H-bonds and S-S <= 2.000
Too close for H-bonds and S-S bonds <= 1.200
Waters are to be ignored
Acetylation/Formylation atoms are to be ignored
input:
PDBfile: /usr/local/data/pdb/pdb1bia.ent
DSSP file: /usr/local/data/dssp/defn//1bia.all
Output:
complete: 1bia.all; summary: 1bia.sum
reading PDB file....
done.
reading DSSP file...
done.
calculating contacts...
done.
elapsed cpu time 14.75000 seconds
prompt>domak -c1bia
# DOMAK running....
# Using default parameters
# MIN_DOMAIN_SIZE 40
# MIN_SEGMENT_SIZE_END 5
# MIN_SEGMENT_SIZE_MID 25
# MIN_DOUBLE_SPLIT 120
# MIN_NO_CONTACT_CUTOFF_MID 30
# MIN_NO_CONTACT_CUTOFF_END 10
# E_WEIGHT 0.100000
# MAX_ALLOWABLE_GLOB 2.850000
# MIN_PEAK_SS_ONLY_C 17.049999
# MIN_PEAK_C 9.500000
# MIN_PEAK_SS_ONLY_DC 17.049999
# MIN_PEAK_DC 9.500000
# MIN_PEAK_SS_ONLY_MC 17.049999
# MIN_PEAK_MC 9.500000
# MIN_PEAK_BLO_C 60.000000
# MIN_PEAK_SS_ONLY_BLO_C 60.000000
# MIN_PEAK_BLO_DC 60.000000
# MIN_PEAK_SS_ONLY_BLO_DC 60.000000
# MIN_SS_PER 0.570000
# MIN_HELIX_LENGTH 5
# HELIX_RAMP 4
# HELIX_REDUCE_C_DENS 10.320000
# INCREMENT_DIVIDER 250
# pdb file : /usr/local/data/pdb//pdb1bia.ent
# dssp file : /data/newdssp//1bia.all
# contacts file : .//1bia.all
# Reading in contacts file....
# Analysing all residues in 1bia
/usr/local/data/pdb//pdb1bia.ent 1bia-1 { _ 1 _ to _ 317 _ }
# 1 Domain currently being analysed
/usr/local/data/pdb//pdb1bia.ent 1bia-1.1 { _ 1 _ to _ 64 _ }
# 1.1 The above domain has been extracted
# 1.1 with split value 1550177.875000
/usr/local/data/pdb//pdb1bia.ent 1bia-1.1 { _ 1 _ to _ 64 _ }
# 1.1 Domain currently being analysed
/usr/local/data/pdb//pdb1bia.ent 1bia-1.1 { _ 1 _ to _ 64 _ }
# 1.1 This domain is too small to be subdivided further
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2 { _ 65 _ to _ 317 _ }
# 1.2 Domain currently being analysed
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1 { _ 65 _ to _ 270 _ }
# 1.2.1 The above domain has been extracted
# 1.2.1 with split value 815.492126
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1 { _ 65 _ to _ 270 _ }
# 1.2.1 Domain currently being analysed
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1 { _ 65 _ to _ 270 _ }
# 1.2.1 Split value too small to allow futher splitting
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1.1 { _ 145 _ to _ 185 _ }
# 1.2.1.1 This is the domain that would have been extracted
# 1.2.1.1 had the previous domain been split
# 1.2.1.1 with split value of 4.283037
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.2 { _ 271 _ to _ 317 _ }
# 1.2.2 Domain currently being analysed
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.2 { _ 271 _ to _ 317 _ }
# 1.2.2 This domain is too small to be subdivided further
/usr/local/data/pdb//pdb1bia.ent 1bia-1 { _ 1 _ to _ 317 _ }
# 1 Analysing compactness of sub-domains of above domain
/usr/local/data/pdb//pdb1bia.ent 1bia-1.1 { _ 1 _ to _ 64 _ }
# 1.1 Compactness of domain 0.492947
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1 { _ 65 _ to _ 270 _ }
# 1.2.1 Compactness of domain 0.064336
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.2 { _ 271 _ to _ 317 _ }
# 1.2.2 Compactness of domain 0.208968
DOMAK DOMAINS
/usr/local/data/pdb//pdb1bia.ent 1bia-1.1 { _ 1 _ to _ 64 _ }
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.1 { _ 65 _ to _ 270 _ }
/usr/local/data/pdb//pdb1bia.ent 1bia-1.2.2 { _ 271 _ to _ 317 _ }
prompt>mv output.res 1bia.dom
prompt>domak_format 1bia
DOMAK 1bia PASS PASS PASS
STAMP 1bia 1bia-1 { _ 1 _ to _ 64 _ }
STAMP 1bia 1bia-2 { _ 65 _ to _ 270 _ }
STAMP 1bia 1bia-3 { _ 271 _ to _ 317 _ }
DOMAK END
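
The final definition in the domak output above can be pulled out with a short
script. The Python sketch below is a hypothetical helper, not part of DOMAK.
It assumes that the final domains are the STAMP format lines that follow the
"DOMAK DOMAINS" line, and that each range inside the braces has the form
"chain number insertion-code to chain number insertion-code", with _ standing
for "none".

```python
import re

# One residue range inside the braces of a STAMP format line.
RANGE = re.compile(r"(\S+)\s+(-?\d+)\s+(\S+)\s+to\s+(\S+)\s+(-?\d+)\s+(\S+)")
# A whole line: <file> <domain id> { <ranges> }
STAMP_LINE = re.compile(r"^\s*(\S+)\s+(\S+)\s+\{(.*)\}\s*$")


def parse_stamp_line(line):
    """Return a dictionary for a STAMP format line, or None for other lines."""
    match = STAMP_LINE.match(line)
    if match is None:
        return None
    path, domain_id, body = match.groups()
    ranges = [
        (start_chain, int(start), end_chain, int(end))
        for start_chain, start, _ins1, end_chain, end, _ins2 in RANGE.findall(body)
    ]
    if not ranges:
        return None
    return {"file": path, "id": domain_id, "ranges": ranges}


def final_domains(domak_output):
    """Return the domains listed after the 'DOMAK DOMAINS' line."""
    domains = []
    in_final_section = False
    for line in domak_output.splitlines():
        if line.strip() == "DOMAK DOMAINS":
            in_final_section = True
            continue
        if in_final_section:
            domain = parse_stamp_line(line)
            if domain is not None:
                domains.append(domain)
    return domains
```

Applied to the 1bia output above, this should return the three domains
1bia-1.1, 1bia-1.2.1 and 1bia-1.2.2 with the ranges 1 to 64, 65 to 270 and
271 to 317.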