The program PDBC can be used to output a set of STAMP-readable domain descriptions, given a list of four-letter Brookhaven codes and an optional set of chains. This will only work if you have a suitable `pdb.directories` file; see the chapter on installation for details on how to set this up. For example, the commands
    pdbc -d 2hhba >! globin_fold.domains
    pdbc -d 2hhbb >> globin_fold.domains
    pdbc -d 4mbn >> globin_fold.domains
    pdbc -d 1lh1 >> globin_fold.domains
    pdbc -d 1cola >> globin_fold.domains
    pdbc -d 1cpca >> globin_fold.domains
will produce the following output (ignoring comments, which are specified by a `%` in column 0):
    /(PDB PATH)/pdb2hhb.ent 2hhba { CHAIN A }
    /(PDB PATH)/pdb2hhb.ent 2hhbb { CHAIN B }
    /(PDB PATH)/pdb4mbn.ent 4mbn { ALL }
    /(PDB PATH)/pdb1lh1.ent 1lh1 { ALL }
    /(PDB PATH)/pdb1col.ent 1cola { CHAIN A }
Here (PDB PATH) denotes the location of the relevant PDB file on your
system. Note that your PDB files may be named (code).pdb instead, or
may follow some other convention. This is fine; see Chapter 5 (installation)
for details on setting this up.
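The repeated commands above can also be written as a loop. The following is a minimal sketch, not part of the STAMP distribution. It is written in POSIX sh rather than csh, so it truncates the output file with `: >` instead of using `>!`, and it uses only the `-d` option shown above. The variable names `out` and `ids` are illustrative.

```shell
#!/bin/sh
# Sketch (POSIX sh, not csh): build a domain file from a list of identifiers.
# Only the -d option described in the text is used; "out" and "ids" are
# illustrative names, not part of STAMP.
out=globin_fold.domains
ids="2hhba 2hhbb 4mbn 1lh1 1cola 1cpca"

if command -v pdbc >/dev/null 2>&1; then
    : > "$out"                      # start with an empty file
    for id in $ids; do
        pdbc -d "$id" >> "$out"     # append one description per identifier
    done
else
    echo "pdbc not found on PATH; nothing written" >&2
fi
```

As with the individual commands, this relies on a working `pdb.directories` file.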
Note that the domain file does not need to contain a real filename. One
can simply put `Unknown` or some other non-blank string in its place,
and the programs will try to find the file corresponding
to the four-letter code on your system. In other words, the files
given in this distribution should work on your system, provided that
you have all the PDB files.
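To make the layout of a domain file concrete, here is a small sketch that lists the identifiers in one. It assumes, based only on the examples above, that each description is a single line of the form "file identifier { descriptor }", that comment lines begin with `%`, and that the four-letter code is the first four characters of the identifier. The file name `sample.domains` is hypothetical.

```shell
#!/bin/sh
# Sample domain file; "Unknown" stands in for the file name, as described above.
# The name sample.domains is hypothetical.
cat > sample.domains <<'EOF'
% comment lines start with % in the first column
Unknown 2hhba { CHAIN A }
Unknown 4mbn { ALL }
EOF

# For every non-comment line with at least two fields, print the identifier
# followed by its first four characters (assumed to be the four-letter code).
ids=$(awk '!/^%/ && NF >= 2 { print $2, substr($2, 1, 4) }' sample.domains)
echo "$ids"
```

A listing like this can be compared against the PDB files you actually have before starting a long run.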
Sensitive STAMP database comparisons can take a long time. For this
reason we have compared the current PDB database to itself based on
sequence, and clustered the data such that only one member of each
sequence family is in our domain database. We have also split
these structures into domains using author definitions. Even
when a high degree of sequence similarity is required for
clustering, this reduces the size of the Brookhaven database
drastically (from over 4000 independent chains down to just under
600 protein domains). It is probably sensible to scan this
database, which contains just one representative of each sequence
family, first; if something interesting is found, one can then scan other
structures related to the representative by sequence.
A copy of this representative database is in the file
brookhaven_subset.domains.
Note that PDBC can also be used to report information about a PDB entry by
using the `-q` option. Try it and see. This is a good test of whether
STAMP has been set up properly on your system.
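A quick setup check along these lines might look like the sketch below. It assumes that `-q` takes an identifier in the same way as `-d` does, and that pdbc returns a non-zero exit status on failure; neither is stated in this chapter, so treat both as assumptions. The `code` and `status` variables are illustrative.

```shell
#!/bin/sh
# Sketch: check that pdbc is installed and can locate a PDB entry.
# Assumptions: -q takes an identifier in the same way as -d, and pdbc
# returns a non-zero exit status when it fails.
code=4mbn
if command -v pdbc >/dev/null 2>&1; then
    pdbc -q "$code" || echo "pdbc -q failed for $code; check pdb.directories" >&2
    status=ran
else
    echo "pdbc not found on PATH" >&2
    status=missing
fi
```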