The program PDBC can be used to output a set of STAMP-readable domain descriptions, given a list of four-letter Brookhaven (PDB) codes and, optionally, chain identifiers. This works only if you have a suitable `pdb.directories' file; see the chapter on installation for details on how to set this up. For example, the commands (written for csh/tcsh, where `>!` overwrites an existing file even if noclobber is set)
pdbc -d 2hhba >! globin_fold.domains
pdbc -d 2hhbb >> globin_fold.domains
pdbc -d 4mbn >> globin_fold.domains
pdbc -d 1lh1 >> globin_fold.domains
pdbc -d 1cola >> globin_fold.domains
pdbc -d 1cpca >> globin_fold.domains
will produce the following output (ignoring comments, which are specified by a `%` in column 0):
/(PDB PATH)/pdb2hhb.ent 2hhba { CHAIN A }
/(PDB PATH)/pdb2hhb.ent 2hhbb { CHAIN B }
/(PDB PATH)/pdb4mbn.ent 4mbn { ALL }
/(PDB PATH)/pdb1lh1.ent 1lh1 { ALL }
/(PDB PATH)/pdb1col.ent 1cola { CHAIN A }
Here (PDB PATH) denotes the location of the relevant PDB file on your
system. Your PDB files may instead be called (code).pdb, or may follow
some other convention. This is fine; see Chapter 5 (installation) for
details on setting this up.
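As a minimal sketch of the domain format, assuming each description is a filename, an identifier and a brace-delimited descriptor exactly as in the output above, the following Python reads such a file. The names `Domain` and `parse_domains` are hypothetical and not part of STAMP; descriptors other than the `CHAIN` and `ALL` forms shown here are simply returned as uninterpreted text.

```python
import re
from typing import NamedTuple


class Domain(NamedTuple):
    """One domain description: file, identifier, descriptor (hypothetical type)."""
    filename: str
    domain_id: str
    descriptor: str


# filename, identifier, then a brace-delimited descriptor (may span lines)
_DOMAIN_RE = re.compile(r"(\S+)\s+(\S+)\s*\{([^}]*)\}")


def parse_domains(text: str) -> list[Domain]:
    """Parse STAMP-style domain descriptions from a string.

    Lines with '%' in column 0 are treated as comments and skipped.
    The text between the braces is whitespace-normalised but otherwise
    left uninterpreted.
    """
    body = "\n".join(
        line for line in text.splitlines() if not line.startswith("%")
    )
    return [
        Domain(fname, did, " ".join(desc.split()))
        for fname, did, desc in _DOMAIN_RE.findall(body)
    ]
```

For instance, `parse_domains(open("globin_fold.domains").read())` would return one `Domain` per entry. Treating the descriptor as opaque keeps the sketch small and avoids guessing at descriptor forms not shown in this section.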
A filename is not actually required in the domain file. You can simply
put `Unknown` or some other placeholder string (i.e. not blank
spaces), and the programs will try to find the file corresponding
to the four-letter code on your system. In other words, the domain files
given in this distribution should work on your system, provided that
you have all the PDB files.
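The lookup for a placeholder filename can be pictured with the sketch below. It is an illustration only, assuming the search tries a list of directories and a few naming conventions such as `pdb2hhb.ent` and `2hhb.pdb`; the function `find_pdb_file` and the `DEFAULT_CONVENTIONS` list are hypothetical, and the real directories and conventions come from your `pdb.directories' file.

```python
import os
from typing import Iterable, Optional

# Hypothetical (prefix, suffix) naming conventions; the set STAMP actually
# checks is configured via pdb.directories and may differ.
DEFAULT_CONVENTIONS = [("pdb", ".ent"), ("", ".pdb")]


def find_pdb_file(
    code: str,
    directories: Iterable[str],
    conventions: Iterable[tuple[str, str]] = DEFAULT_CONVENTIONS,
) -> Optional[str]:
    """Return the first existing file for a PDB code, or None if none is found."""
    base = code[:4].lower()  # e.g. '2hhba' -> '2hhb' (chain letter dropped)
    conventions = list(conventions)  # reused for every directory
    for directory in directories:
        for prefix, suffix in conventions:
            path = os.path.join(directory, f"{prefix}{base}{suffix}")
            if os.path.isfile(path):
                return path
    return None
```

Directories are searched in order, so an earlier directory wins when the same code exists in several places.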
PDBC can also be used to probe information about a PDB entry by
using the `-q' option. Try it and see; this is a good test of whether
STAMP has been set up properly on your system. If you just want to check
where STAMP is looking for PDB and DSSP files, use the `-m' (minimal)
option, which reports the PDB and DSSP files it finds, if any, and exits.
STAMP database comparisons are computationally intensive, so it is prudent
to avoid redundant comparisons (e.g. against multiple mutants or binding studies
of the same protein, such as T4 lysozyme). The STAMP distribution
contains a series of non-redundant databases derived by parsing the
SCOP database. The STAMPDIR/defs directory contains the following databases:
Domain database | No. of domains | Description |
scop.dom | 17891 | All PDB entries classified in SCOP |
scop_domain.dom | 10741 | As above, but ignoring multiple copies of the same chain |
scop_species.dom | 3495 | One representative of each protein from each species |
scop_prot.dom | 2420 | One representative from each protein |
scop_fam.dom | 1031 | One representative from each protein family |
scop_supf.dom | 716 | One representative from each protein superfamily |
scop_fold.dom | 506 | One representative per fold |
The first two databases are probably too big to be used sensibly with STAMP (and contain too much redundancy); they have been included only for completeness. I tend to use ``scop_species.dom'' or ``scop_prot.dom'', but one could probably get away with using ``scop_fam.dom''. It entirely depends on your patience and CPU resources.