
Protein domain databases

The program PDBC may be used to output a set of STAMP-readable domain descriptions, given a list of four-letter Brookhaven (PDB) codes and an optional set of chains. This will only work if you have a suitable `pdb.directories' file; see the chapter on installation for details on how to set this up. For example, the commands

pdbc -d 2hhba >! globin_fold.domains
pdbc -d 2hhbb >> globin_fold.domains
pdbc -d 4mbn  >> globin_fold.domains
pdbc -d 1lh1  >> globin_fold.domains
pdbc -d 1cola >> globin_fold.domains
pdbc -d 1cpca >> globin_fold.domains

will produce the following output (ignoring comments, which are specified by a `%` in column 0):

/(PDB PATH)/pdb2hhb.ent 2hhba { CHAIN A }
/(PDB PATH)/pdb2hhb.ent 2hhbb { CHAIN B }
/(PDB PATH)/pdb4mbn.ent 4mbn { ALL }
/(PDB PATH)/pdb1lh1.ent 1lh1 { ALL }
/(PDB PATH)/pdb1col.ent 1cola { CHAIN A }
/(PDB PATH)/pdb1cpc.ent 1cpca { CHAIN A }

Here (PDB PATH) denotes the location of the relevant PDB file on your system. Your PDB files may be called (code).pdb instead, or may follow some other convention. This is fine; see Chapter 5 (installation) for details on setting this up.
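The repeated commands above can be wrapped in a small loop. What follows is only a sketch in Bourne-style shell (the commands above use csh redirection), and it assumes nothing about PDBC beyond the `-d' usage shown above. The function name `make_domains` and the `PDBC` override variable are hypothetical, introduced here for illustration only.

```shell
# Sketch: build a domain file by calling pdbc once per code.
# make_domains and the PDBC variable are hypothetical names, not part of STAMP.
# PDBC lets you point at a different pdbc binary (or a stand-in for testing).
make_domains() {
    out=$1
    shift
    : > "$out"    # start from an empty file (the csh commands above use >! for this)
    for code in "$@"; do
        # Append the description of this code; stop at the first failure.
        "${PDBC:-pdbc}" -d "$code" >> "$out" || return 1
    done
}
```

With this defined, `make_domains globin_fold.domains 2hhba 2hhbb 4mbn 1lh1 1cola 1cpca` should give the same file as the six separate commands.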

Note that the domain file does not need to contain a filename. One can simply put `Unknown` or some other non-blank string in its place, and the programs will try to find the file corresponding to the four-letter code on your system. In other words, the domain files given in this distribution should work on your system, provided that you have all the PDB files.
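If you want to pass a domain file to someone whose PDB files live elsewhere, the filename field can be replaced by `Unknown` mechanically. The sketch below assumes one domain description per line, with the filename as the first whitespace-separated field, as in the output shown above. The function name `unknown_paths` is hypothetical.

```shell
# Sketch: replace the filename (first field) of each domain line by "Unknown".
# Assumes one domain per line; unknown_paths is a hypothetical helper name.
unknown_paths() {
    awk '/^%/ || NF < 2 { print; next }   # keep comments and blank lines as they are
         { $1 = "Unknown"; print }' "$1"
}
```

For example, `unknown_paths globin_fold.domains > portable.domains` writes a copy with the paths removed. Note that awk rejoins the fields with single spaces, so any extra spacing on a rewritten line is lost.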

PDBC can also be used to probe information about a PDB entry by using the `-q' option. Try it and see; this is a good test of whether STAMP has been set up properly on your system. If you just want to test where STAMP is looking for PDB and DSSP files, use the `-m' (minimal) option, which just reports the PDB/DSSP files if found and exits.

STAMP database comparisons are computationally intensive, so it is prudent to avoid redundant comparisons (e.g. multiple mutants or binding studies of the same protein, such as T4 lysozyme). The STAMP distribution contains a series of non-redundant databases derived by parsing the SCOP database. The STAMPDIR/defs directory contains several databases:

Domain database   N      Description
scop.dom          17891  All PDB entries classified in SCOP
scop_domain.dom   10741  As above, but ignoring multiple copies of the same chain
scop_species.dom  3495   One representative of each protein from every species
scop_prot.dom     2420   One representative from each protein
scop_fam.dom      1031   One representative from each protein family
scop_supf.dom     716    One representative from each protein superfamily
scop_fold.dom     506    One representative per fold

The first two databases are probably too big to be used sensibly with STAMP (and contain too much redundancy); they have only been included for completeness. I tend to use ``scop_species.dom'' or ``scop_prot.dom'', but one could probably get away with using ``scop_fam.dom''. It depends entirely on your patience and CPU resources.
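Before committing to a scan, it can help to check how many domains a database actually contains on your installation. The sketch below counts the non-comment, non-blank lines of each file, assuming one domain per line. The function name `count_domains` is hypothetical.

```shell
# Sketch: print "file count" for each domain database given as an argument.
# Lines starting with % (comments) and blank lines are not counted.
# count_domains is a hypothetical helper name, not part of STAMP.
count_domains() {
    for db in "$@"; do
        n=$(awk '!/^%/ && NF > 0 { n++ } END { print n + 0 }' "$db")
        printf '%s %s\n' "$db" "$n"
    done
}
```

Assuming STAMPDIR is set as an environment variable on your system, `count_domains $STAMPDIR/defs/scop_*.dom` would list the size of each database.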


Geoff Barton
1999-04-16