Newsletter contents...
mmCIF in the CCP4 Suite
Martyn Winn
Daresbury Laboratory,
Daresbury,
Warrington
WA4 4AD, U.K.
m.d.winn@dl.ac.uk
Introduction
The macromolecular Crystallographic Information File (mmCIF) format
was developed by a working group of the IUCr formed in 1990. It
represents an extension of the CIF format used by small molecule
crystallographers, and is the IUCr-recommended medium for electronic
transfer of macromolecular crystallographic data. Consequently, mmCIF
is likely to be encountered more and more by practising protein
crystallographers, and already occurs in a number of contexts in the
CCP4 suite.
The aim of this article is to outline the mmCIF resources
currently available in CCP4, as well as plans for some future ones.
Some of these resources represent extra functionality of the suite,
while others are resources for the program developer. Although the uses
of mmCIF described below vary widely in purpose, they are connected by
their use of the mmCIF format and share the semantics implied by the
mmCIF dictionary, so it is useful to consider them together.
Full details of the mmCIF format can be found on the
mmCIF Home Page or one of its mirrors. Links to the various
CCP4 resources are included below.
mmCIF dictionary
mmCIF files are text files with a flexible format
based around either <data_name> <data_value> pairs or a
loop structure, which works like a table: a list of data names
followed by rows of values. A wide variety of
data items are supported, and these are defined in the associated
mmCIF dictionary. As well as listing the data items
that may be included in an mmCIF file, the dictionary details attributes
of each data item, such as type, allowed range, whether or not compulsory,
and to which other data items it is related. Data items are grouped
into categories.
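The two layouts described above can be illustrated with a minimal Python sketch. This is not part of CCP4 and is not how libccif works internally; the function name parse_simple_mmcif and the sample values are hypothetical, chosen only for illustration. The sketch assumes a single data block and does not handle multi-line text fields delimited by ';', values containing unpaired quote characters, or quoted values that begin with an underscore.

```python
import shlex

# Illustrative fragment: two name/value pairs, then a loop (a small table).
EXAMPLE = """\
data_example
# illustrative values only
_cell.length_a 73.58
_cell.length_b 38.73
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_asym_id
1 N  A
2 CA A
"""


def parse_simple_mmcif(text):
    """Parse a simplified mmCIF fragment (hypothetical helper, not a CCP4 API).

    Returns (items, tables): items maps data names to values for plain
    name/value pairs; tables holds one (column_names, rows) tuple per loop_.
    """
    # Tokenise line by line; shlex handles simple quoting and '#' comments.
    tokens = []
    for line in text.splitlines():
        tokens.extend(shlex.split(line, comments=True))

    def starts_new_unit(tok):
        return tok == "loop_" or tok.startswith("_") or tok.startswith("data_")

    items, tables = {}, []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("data_"):
            i += 1  # data block header; the block name is ignored here
        elif tok == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while i < len(tokens) and not starts_new_unit(tokens[i]):
                values.append(tokens[i])
                i += 1
            if not names or len(values) % len(names) != 0:
                raise ValueError("loop_ values do not fill whole rows")
            # Cut the flat value list into rows, one value per column name.
            rows = [values[k:k + len(names)]
                    for k in range(0, len(values), len(names))]
            tables.append((names, rows))
        elif tok.startswith("_"):
            if i + 1 >= len(tokens):
                raise ValueError("data name without a value: " + tok)
            items[tok] = tokens[i + 1]
            i += 2
        else:
            raise ValueError("unexpected token: " + tok)
    return items, tables
```

Calling parse_simple_mmcif(EXAMPLE) returns the two cell lengths as name/value items and the atom_site loop as one table with three columns and two rows. All values are kept as strings; interpreting them by type is the job of the dictionary.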
The standard mmCIF dictionary is maintained by the IUCr. However, the
dictionary is designed to be extensible, and local extensions are possible;
these may then be submitted for inclusion in the main dictionary.
An mmCIF dictionary is distributed with the CCP4 suite as
$CCP4/lib/data/cif_mm.dic, consisting
of version 1.0.00 of the mmCIF dictionary together with extensions
required for data harvesting. A binary symbol table
representation ($CCP4/lib/data/cif_mmdic.lib) of the dictionary
is built during compilation of the suite, and it is this binary form,
rather than the text file, that the libccif library routines use.
libccif: the core mmCIF library routines
Peter Keller's C language library of routines for reading and writing
mmCIF files was included in release 4.0 of the CCP4 suite. The source files
are held in $CCP4/lib/ccif, and when compiled give the separate
archive file libccif.a or the shared library libccif.so.
These routines are used by the CCP4 library routines harvlib.f
(used in data harvesting) and cciflib.f (see
below). For those wishing to use libccif to read/write
mmCIF files, there is a Fortran-callable
interface, which is described in $CCP4/doc/ccifdoc.ps.
Data Harvesting
Data Harvesting was introduced in CCP4 4.0, and the technique has been
described in Newsletter 37. The implementation of Data Harvesting uses mmCIF files
to store information from the programs SCALA, TRUNCATE, MLPHARE, REFMAC
and RESTRAIN for future transfer to the deposition site. These files
(which should not be edited!) can be found in directory
$HARVESTHOME/DepositFiles where $HARVESTHOME
defaults to the user's home directory.
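As a small illustration of the default described above, the following Python sketch resolves the deposit directory and lists its files without modifying them. The function names deposit_dir and list_deposit_files are hypothetical and not part of CCP4; the layout of files below DepositFiles is not assumed, so the listing simply walks the whole tree.

```python
import os
from pathlib import Path


def deposit_dir(environ=None):
    """Return $HARVESTHOME/DepositFiles, falling back to the home directory.

    Hypothetical helper for illustration; 'environ' can be passed in
    explicitly (for example a plain dict) instead of using os.environ.
    """
    environ = os.environ if environ is None else environ
    harvest_home = environ.get("HARVESTHOME") or str(Path.home())
    return Path(harvest_home) / "DepositFiles"


def list_deposit_files(environ=None):
    """List harvest files read-only; returns [] if the directory is absent."""
    directory = deposit_dir(environ)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.rglob("*") if p.is_file())
```

The listing is deliberately read-only, in keeping with the advice that these files should not be edited.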
mmCIF reflection files
MTZ reflection files can be converted to mmCIF format by the CCP4
program MTZ2VARIOUS. The output file may then be used for the deposition
of structure factors to the PDB.
Rasmol 2.7
Version 2.7 of the popular molecular viewer Rasmol, which is included
in the CCP4 distribution, will display molecules input from CIF or mmCIF
format files (other formats are also supported). Some restrictions are imposed,
for example the chain identifier (_atom_site.label_asym_id) is restricted
to one character whereas there is no such restriction in the full mmCIF format.
See the program documentation for more details.
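A quick way to see whether a file would run into the chain identifier restriction is to check the lengths of the _atom_site.label_asym_id values before loading it. The Python sketch below assumes the atom_site loop has already been read into a list of column names and a list of rows; the function name long_chain_ids and the sample rows are hypothetical, and this check is not taken from the Rasmol source.

```python
def long_chain_ids(column_names, rows, max_len=1):
    """Return the sorted set of chain identifiers longer than max_len.

    column_names: data names of the atom_site loop, in column order.
    rows: list of rows, each a list of string values in the same order.
    """
    col = column_names.index("_atom_site.label_asym_id")
    return sorted({row[col] for row in rows if len(row[col]) > max_len})


# Illustrative data: chain "A" is acceptable, chain "AA" is too long.
SAMPLE_NAMES = ["_atom_site.id", "_atom_site.label_atom_id",
                "_atom_site.label_asym_id"]
SAMPLE_ROWS = [["1", "N", "A"], ["2", "CA", "A"], ["3", "N", "AA"]]
```

With the sample data, long_chain_ids(SAMPLE_NAMES, SAMPLE_ROWS) reports the single offending identifier "AA".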
mmCIF major mode for Emacs
The editor Emacs can be run in various so-called "major modes" which allow
one to set colour schemes, key bindings, etc. appropriately for a particular
file type. The standard Emacs distribution provides major modes for HTML,
Fortran and many others. In CCP4 4.0, a file cif.el is provided which
defines a major mode for mmCIF (see the top of the file for how to load it).
A simple colour scheme makes mmCIF files easier to read,
while a "CIF" menu provides some extra functionality, for example
finding the dictionary entry for a particular data item. I hope to extend
the functionality of cif.el in future.
cciflib.f: application interface for coordinate handling
It has been proposed that mmCIF should be used by CCP4 programs as a working
format for coordinate files, to replace the current use of PDB files. Just
as CCP4 currently uses only a subset of the full PDB format, so only a subset
of the full mmCIF format would be used, nicknamed "ccif". As a step towards
this, a set of Fortran routines has been written (which in turn call
libccif routines) to provide an application interface
for CCP4 programs. This effectively replaces the rwbrook routines
currently used for PDB files.
These routines were included in CCP4 4.0; see the accompanying
documentation.
A number of CCP4 programs have been converted to use these routines, and some
additional utilities are in development. Details can be found on the
developers' web pages.
REFMAC
Version 5.0 of REFMAC (at the time of writing, not yet released) will include
several new features, one of which is a completely new mechanism for handling
geometric restraints. Restraint information for residues, cofactors, etc. is
held in dictionary files mon_lib_*.cif, which are designed to be
easily extensible to include new chemical species. REFMAC will also be able
to read coordinate files in either PDB or mmCIF format.