Newsletter contents...
NEWS FROM THE UPPSALA SOFTWARE FACTORY - 9
Déjà-vu all over again
Gerard J. Kleywegt
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden
While building a protein model into electron density, one often comes
across features of the model that make one wonder: "(where) have I seen
this before?". At the level of the overall fold, plenty of software is
available nowadays that can help answer this question (DALI, DEJAVU, TOP,
etc.). But when it comes to recognising smaller "motifs" (e.g., a set of
residues involved in binding a ligand or metal ion, or residues with
seemingly "unusual" side chain-side chain interactions), answering the
question "has this been observed in any other protein structure?" is not
as simple.
At the 1995 CCP4 meeting, Peter Artymiuk described a program called ASSAM
[1] [2] that could recognise spatial arrangements of side chains by
comparing them to a database of protein structures. This provided the
inspiration for the SPASM package [3] [4] [5], which contains programs for
the recognition of arbitrary patterns or motifs in protein structures,
interfaced with O [6] and other programs.
SPASM
SPASM is a program that can be used to recognise user-defined motifs in a
database of protein structures (derived from the PDB). The user merely has
to carve out the residues that (s)he is interested in (e.g., catalytic
residues, a strange loop, ligand-binding residues, a weird Met-Trp
interaction, a helix-turn-helix motif, etc.; whatever is selected will be
referred to as a "motif" from now on) and put them into a small PDB file.
The program reads this file as well as its database, prompts for the
values of a few parameters (the defaults will do in most cases), and then
finds all instances of the motif in the proteins that are in the database.
(The nitty-gritty and some of the bells and whistles are discussed in [5]
and [6].)
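The "carving out" step can be done in any editor or in O, but as a rough
illustration it can also be sketched in a few lines of Python. This is a
minimal sketch and not part of the SPASM package: the function names
carve_motif and format_atom are hypothetical, and the only thing relied
upon is the fixed-column layout of PDB ATOM/HETATM records (chain
identifier in column 22, residue number in columns 23-26).

```python
def format_atom(serial, name, res_name, chain, res_num, x, y, z):
    """Build a fixed-column PDB ATOM record (hypothetical demo helper)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_num, x, y, z)


def carve_motif(pdb_lines, residues):
    """Return the ATOM/HETATM records of the selected residues.

    residues is a set of (chain_id, residue_number) tuples.
    """
    selected = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue                      # skip REMARK, CRYST1, etc.
        chain_id = line[21]               # column 22
        res_num = int(line[22:26])        # columns 23-26
        if (chain_id, res_num) in residues:
            selected.append(line.rstrip("\n"))
    return selected


# Made-up coordinates, purely for illustration.
model = [
    format_atom(1, " N", "HIS", "A", 94, 10.0, 20.0, 30.0),
    format_atom(2, " CA", "HIS", "A", 94, 11.0, 20.5, 30.2),
    format_atom(3, " N", "GLY", "A", 95, 12.0, 21.0, 30.4),
    format_atom(4, " N", "HIS", "B", 94, 40.0, 50.0, 60.0),
]
motif = carve_motif(model, {("A", 94)})   # keeps the two atoms of His A94
```

Writing the returned records to a file gives the small PDB file that
serves as the motif.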
Besides simply listing the "hits", SPASM can also generate a macro file
for use with O which, when executed, will automatically read the hits,
apply the rotation-translation operator that superimposes each hit on the
user's motif, and draw the hits. Thus, within five to ten minutes one
obtains a visual answer to the original question: "(where) has this motif
been observed previously?".
If you find hits whose similarity to your own protein extends beyond the
matched motif (e.g., a similar fold or domain), global superpositioning of
the hits and your own model can be carried out by LSQMAN. An input file
for LSQMAN that does this can be generated by SPASM as well, making this a
very rapid process. Finally, an interface exists to the SBIN package of
programs [4] [7], which can be used to analyse superimposed structures to
find similarities in their sequences. These, in turn, can be used to
attempt "database mining" in sequence databases such as SWISS-PROT [8], in
the hope of identifying other proteins that might have the same fold, or
share a common domain.
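For readers unfamiliar with the term, a rotation-translation operator maps
every coordinate x of a hit to x' = R x + t, after which the fit to the
motif can be judged by the RMSD between equivalent atoms. The sketch below
illustrates only this arithmetic. The function names are hypothetical, and
the row-by-row layout of R is an assumption that need not match the order
in which O or LSQMAN store their operators.

```python
import math


def apply_operator(coords, rotation, translation):
    """Apply x' = R x + t to every (x, y, z) coordinate.

    rotation is a 3x3 matrix given as three rows; translation has 3 values.
    """
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)))
    return moved


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally long coordinate lists."""
    total = sum((p - q) ** 2
                for a, b in zip(coords_a, coords_b)
                for p, q in zip(a, b))
    return math.sqrt(total / len(coords_a))


# Made-up example: a 90-degree rotation about z followed by a shift along x.
rot_z90 = [[0.0, -1.0, 0.0],
           [1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]
shift = (1.0, 0.0, 0.0)

hit = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
motif_atoms = [(1.0, 1.0, 0.0), (0.0, 0.0, 0.0)]

superimposed = apply_operator(hit, rot_z90, shift)
fit = rmsd(superimposed, motif_atoms)     # 0.0 for this constructed case
```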
RIGOR
RIGOR is another program in the SPASM package that does, in essence, the
opposite of SPASM. Where SPASM compares a user-defined motif to a database
of protein structures, RIGOR looks for instances of a large number of
predefined motifs in the user's model. Of course, the utility of this
approach depends critically on the quality of the motif database. At
present, this database contains a few hand-crafted motifs, but the
overwhelming majority has been generated automatically. These
automatically generated motifs were extracted from proteins in the SPASM
database, and consist mostly of sets of residues whose side chains cluster
in space, or are all in close proximity to a hetero-entity.
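To give an idea of what "side chains that cluster in space" could mean in
practice, here is a minimal sketch that groups residues whose side-chain
centroids lie within a distance cutoff of one another. It is an
illustration only and not the procedure used to build the RIGOR database:
the function name, the single-linkage grouping, the 4.5 A cutoff and the
minimum cluster size are all assumptions.

```python
import math
from itertools import combinations


def cluster_residues(centroids, cutoff=4.5, min_size=3):
    """Group residues whose side-chain centroids are linked by short distances.

    centroids maps a residue label to an (x, y, z) tuple. Two residues are
    linked if their centroids are at most `cutoff` apart; each cluster is a
    connected set of linked residues with at least `min_size` members.
    """
    neighbours = {res: set() for res in centroids}
    for a, b in combinations(centroids, 2):
        if math.dist(centroids[a], centroids[b]) <= cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    clusters = []
    for start in centroids:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:                      # depth-first walk over the links
            res = stack.pop()
            if res in component:
                continue
            component.add(res)
            stack.extend(neighbours[res] - component)
        seen |= component
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters


# Made-up centroids: three histidines close together, one distant leucine.
centroids = {
    "HIS94": (0.0, 0.0, 0.0),
    "HIS96": (3.0, 0.0, 0.0),
    "HIS119": (0.0, 3.0, 0.0),
    "LEU10": (20.0, 20.0, 20.0),
}
clusters = cluster_residues(centroids)
```

The same idea carries over to residues near a hetero-entity by measuring
distances to the hetero atoms instead of between side chains.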
Just like SPASM, RIGOR is interfaced to O, allowing for rapid
visualisation of the results. Users are welcome to submit additional
motifs for inclusion in future releases of the RIGOR database. Eventually,
I hope to develop software that takes a more intelligent approach to
detecting motifs that recur in several or many structures.
APPLICATIONS
Obviously, the SPASM package can be tremendously useful in the analysis of
newly determined protein structures. The programs help crystallographers
to make the most of their models, prior to publication and deposition.
After all, nobody likes to see papers in which professional database
scrutinisers (for want of a better word) announce that they have found an
unexpected similarity between one's own protein (the structure
determination of which may have taken years) and some other protein that
had long been in the database.
In addition, SPASM can be used in comparative structural analysis, where
one will typically be interested in finding all proteins that contain a
certain arrangement of helices, strands, turns, and loops, or all proteins
that contain a certain constellation of residues or side chains. Other
potential applications lie in the areas of protein design and engineering,
and prediction of structure and function.
AVAILABILITY
The SPASM package contains the programs SPASM and RIGOR, as well as two
programs to generate private databases for use with these programs (e.g.,
with in-house structures that have not yet been released by the PDB).
SPASM and friends (including databases and manuals) are available free of
charge to academic users from ftp://alpha2.bmc.uu.se/pub/gerard/spasm/.
Commercial users may contact GJK for more information
(gerard@xray.bmc.uu.se). For more information about O, contact Alwyn
Jones (alwyn@xray.bmc.uu.se). The O WWW site is at
http://imsb.au.dk/~mok/o/, and the Uppsala Software Factory can be found
at http://alpha2.bmc.uu.se/usf/.
REFERENCES
[1] Artymiuk, P.J., Poirrette, A.R., Grindley, H.M., Rice, D.W. and
Willett, P. (1994). A graph-theoretic approach to the identification of
three-dimensional patterns of amino acid side-chains in protein
structures. J. Mol. Biol. 243, 327-344.
[2] Artymiuk, P.J., Poirrette, A.R., Rice, D.W. and Willett, P. (1995).
Comparison of protein folds and sidechain clusters using algorithms from
graph theory. In "From First Map to Final Model" (Bailey, S., Hubbard, R.
and Waller, D.A., Eds.), pp. 71-81, SERC Daresbury Laboratory, Daresbury,
U.K.
[3] Kleywegt, G.J. and Jones, T.A. (1998). Databases in protein
crystallography. Acta Cryst. D54, in press. (A preprint of this paper is
available at URL: http://alpha2.bmc.uu.se/~gerard/papers/databases.html)
[4] Kleywegt, G.J. (1998). Recognition of spatial motifs in protein
structures. Submitted.
[5] The manuals for the SPASM programs are available at URL:
http://alpha2.bmc.uu.se/usf/spasm.html
[6] Jones, T.A., Zou, J.Y., Cowan, S.W. and Kjeldgaard, M. (1991).
Improved methods for building protein models in electron density maps and
the location of errors in these models. Acta Cryst. A47, 110-119.
[7] The manuals for the SBIN programs are available at URL:
http://alpha2.bmc.uu.se/usf/sbin.html
[8] Bairoch, A. and Apweiler, R. (1997). The SWISS-PROT protein sequence
data bank and its supplement TrEMBL. Nucl. Acids Res. 25, 31-36.