


mmCIF in the CCP4 Suite

Martyn Winn

Daresbury Laboratory,
Daresbury,
Warrington
WA4 4AD, U.K.
m.d.winn@dl.ac.uk


Introduction

The macromolecular Crystallographic Information File (mmCIF) format was developed by a working group of the IUCr formed in 1990. It represents an extension of the CIF format used by small molecule crystallographers, and is the IUCr-recommended medium for electronic transfer of macromolecular crystallographic data. Consequently, mmCIF is likely to be encountered more and more by practising protein crystallographers, and already occurs in a number of contexts in the CCP4 suite.

The aim of this article is to outline the mmCIF resources currently available in CCP4, as well as plans for some future ones. Some of these resources provide extra functionality in the suite, while others are aimed at the program developer. Although the instances of mmCIF described below vary widely in purpose, they are connected by their use of the mmCIF format and share the semantics implied by the mmCIF dictionary, so it is useful to consider them together.

Full details of the mmCIF format can be found on the mmCIF Home Page or one of its mirrors. Links to the various CCP4 resources are included below.

mmCIF dictionary

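mmCIF files are text files with a flexible format built from either data name/data value pairs or a loop structure (introduced by the keyword loop_), which works like a table: a list of data names acts as the column headings and is followed by the values, row by row. A wide variety of data items are supported, and these are defined in the associated mmCIF dictionary. As well as listing the data items that may be included in an mmCIF file, the dictionary details the attributes of each data item, such as its type, its allowed range, whether or not it is mandatory, and which other data items it is related to. Data items are grouped into categories, and the category forms the first part of each data name: _cell.length_a, for example, belongs to the cell category.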
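To make the two constructs concrete, the following is a minimal sketch in Python of reading a simple mmCIF data block into single items and loop tables. It is not the libccif parser: it assumes there are no multi-line text fields (delimited by semicolons), and it relies on Python's shlex module to handle quoted values, which only approximates the CIF quoting rules. The function name is hypothetical.

```python
import shlex


def parse_mmcif(text):
    """Split a simple mmCIF data block into single items and loop tables.

    Simplifying assumptions (not a full CIF parser): no multi-line text
    fields delimited by ';', and quoting handled by shlex, which only
    approximates CIF's quoting rules.
    """
    tokens = []
    for line in text.splitlines():
        # comments=True drops everything after an unquoted '#'
        tokens.extend(shlex.split(line, comments=True))

    items = {}   # '_category.item' -> value
    loops = []   # list of (tags, rows) pairs, one per loop_
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("data_"):
            i += 1  # data block header: ignored in this sketch
        elif tok == "loop_":
            i += 1
            tags = []  # the column headings of the table
            while i < len(tokens) and tokens[i].startswith("_"):
                tags.append(tokens[i])
                i += 1
            values = []  # the values, read row by row until the next name
            while (i < len(tokens) and not tokens[i].startswith("_")
                   and tokens[i] != "loop_"
                   and not tokens[i].startswith("data_")):
                values.append(tokens[i])
                i += 1
            if not tags or len(values) % len(tags):
                raise ValueError("loop_ values do not fill whole rows")
            n = len(tags)
            rows = [values[j:j + n] for j in range(0, len(values), n)]
            loops.append((tags, rows))
        elif tok.startswith("_"):
            # a single data name followed by its data value
            if i + 1 >= len(tokens):
                raise ValueError(f"no value for {tok}")
            items[tok] = tokens[i + 1]
            i += 2
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return items, loops
```

The category of any item can then be recovered from its name as the part before the dot; for example, tag[1:].split('.')[0] gives 'cell' for _cell.length_a.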

The standard mmCIF dictionary is maintained by the IUCr. However, the dictionary is designed to be extensible, and local extensions are possible which may then be submitted for inclusion in the main dictionary.

An mmCIF dictionary is distributed with the CCP4 suite as $CCP4/lib/data/cif_mm.dic, consisting of version 1.0.00 of the mmCIF dictionary together with extensions required for data harvesting. A binary symbol table representation ($CCP4/lib/data/cif_mmdic.lib) of the dictionary is built during compilation of the suite, and it is in fact this that is used by the libccif library routines.
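As an illustration of what the dictionary attributes make possible, here is a sketch of checking parsed items against a few dictionary entries for type, allowed range and mandatory status. The ItemDef class, its fields and the validate function are hypothetical stand-ins: the real dictionary expresses these attributes in its own definition language, and libccif works from the binary symbol table rather than anything like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemDef:
    """Hypothetical, simplified stand-in for one dictionary entry."""
    type: str                        # 'float', 'int' or 'text' in this sketch
    mandatory: bool = False
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def validate(items, dictionary):
    """Return a list of problems found in items (data name -> string value)."""
    problems = []
    # Mandatory items that are absent from the file
    for tag, d in dictionary.items():
        if d.mandatory and tag not in items:
            problems.append(f"{tag}: mandatory item missing")
    # Type and range checks for the items that are present
    for tag, value in items.items():
        d = dictionary.get(tag)
        if d is None:
            problems.append(f"{tag}: not defined in dictionary")
            continue
        if value in ("?", "."):  # CIF markers for unknown / inapplicable
            continue
        if d.type in ("float", "int"):
            try:
                num = float(value) if d.type == "float" else int(value)
            except ValueError:
                problems.append(f"{tag}: {value!r} is not of type {d.type}")
                continue
            if d.minimum is not None and num < d.minimum:
                problems.append(f"{tag}: {num} below minimum {d.minimum}")
            if d.maximum is not None and num > d.maximum:
                problems.append(f"{tag}: {num} above maximum {d.maximum}")
    return problems
```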

libccif: the core mmCIF library routines

Peter Keller's C language library of routines for reading and writing mmCIF files was included in release 4.0 of the CCP4 suite. The source files are held in $CCP4/lib/ccif and, when compiled, produce either the separate archive file libccif.a or the shared library libccif.so. These routines are used by the CCP4 library routines harvlib.f (used in data harvesting) and cciflib.f (see below). For those wishing to use libccif to read/write mmCIF files, there is a Fortran-callable interface, which is described in $CCP4/doc/ccifdoc.ps.

Data Harvesting

Data Harvesting was introduced in CCP4 4.0, and the technique has been described in Newsletter 37. The implementation of Data Harvesting uses mmCIF files to store information from the programs SCALA, TRUNCATE, MLPHARE, REFMAC and RESTRAIN for future transfer to the deposition site. These files (which should not be edited!) can be found in directory $HARVESTHOME/DepositFiles where $HARVESTHOME defaults to the user's home directory.
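For scripts that need to locate the harvest files, the directory rule above can be expressed as a short Python sketch. The function names are hypothetical, the listing is deliberately read-only since the files should not be edited, and nothing is assumed about how files are named or arranged inside DepositFiles (hence the recursive search). An empty HARVESTHOME is treated here as unset, which is a choice made for the sketch.

```python
import os
from pathlib import Path


def harvest_deposit_dir(environ=None):
    """Return $HARVESTHOME/DepositFiles, with $HARVESTHOME defaulting to the home directory."""
    env = os.environ if environ is None else environ
    home = env.get("HARVESTHOME") or str(Path.home())
    return Path(home) / "DepositFiles"


def list_harvest_files(environ=None):
    """List every file below the deposit directory without opening or modifying any of them."""
    root = harvest_deposit_dir(environ)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())
```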

mmCIF reflection files

MTZ reflection files can be converted to mmCIF format by the CCP4 program MTZ2VARIOUS. The output file may then be used for the deposition of structure factors to the PDB.
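To show the loop structure as applied to reflection data, here is a sketch that writes (h, k, l, F, sigma(F)) tuples as an mmCIF _refln loop using standard items from the dictionary. It is not the output format of MTZ2VARIOUS, whose files carry further information (cell, symmetry, free-R flags and so on); the function name and column widths are arbitrary choices made for this sketch.

```python
def refln_loop(reflections):
    """Format (h, k, l, F, sigF) tuples as an mmCIF _refln loop."""
    lines = [
        "loop_",
        "_refln.index_h",
        "_refln.index_k",
        "_refln.index_l",
        "_refln.F_meas_au",
        "_refln.F_meas_sigma_au",
    ]
    # One whitespace-separated row of values per reflection
    for h, k, l, f, sig in reflections:
        lines.append(f"{h:4d} {k:4d} {l:4d} {f:10.2f} {sig:8.2f}")
    return "\n".join(lines) + "\n"
```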

Rasmol 2.7

Version 2.7 of the popular molecular viewer Rasmol, which is included in the CCP4 distribution, can display molecules read from CIF or mmCIF files (other formats are also supported). Some restrictions are imposed: for example, the chain identifier (_atom_site.label_asym_id) is limited to one character, whereas the full mmCIF format has no such restriction. See the program documentation for more details.
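A file can be checked for this particular restriction before it is loaded into Rasmol. The sketch below assumes the _atom_site loop has already been read into a list of data names and a list of rows of string values; the function name is hypothetical.

```python
def rasmol_chain_problems(tags, rows):
    """Return the distinct _atom_site.label_asym_id values longer than one
    character, i.e. chain identifiers outside the one-character limit."""
    col = tags.index("_atom_site.label_asym_id")
    bad = {row[col] for row in rows if len(row[col]) > 1}
    return sorted(bad)
```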

mmCIF major mode for Emacs

The editor Emacs can be run in various so-called "major modes", which set colour schemes, key bindings, etc. appropriately for a particular file type. The standard Emacs distribution provides major modes for HTML, Fortran and many others. In CCP4 4.0, a file cif.el is provided which defines a major mode for mmCIF (see the top of the file for how to load it). A simple colour scheme makes mmCIF files easier to read, while a "CIF" menu provides some extra functionality, for example finding the dictionary entry for a particular data item. I hope to extend the functionality of cif.el in future.

cciflib.f: application interface for coordinate handling

It has been proposed that mmCIF should be used by CCP4 programs as a working format for coordinate files, replacing the current use of PDB files. Just as CCP4 currently uses only a subset of the full PDB format, so only a subset of the full mmCIF format would be used, nicknamed "ccif". As a step towards this, a set of Fortran routines has been written (which in turn call libccif routines) to provide an application interface for CCP4 programs. This set effectively replaces the rwbrook routines currently used for PDB files.
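To illustrate the correspondence between the two formats, the following sketch maps the fixed columns of a PDB ATOM or HETATM record onto mmCIF _atom_site items. It is not the cciflib interface (which is Fortran) and does not define the ccif subset; mapping the PDB names onto the auth_* items, rather than the label_* items that mmCIF assigns separately, is an assumption of the sketch, as are the names PDB_COLUMNS and pdb_atom_to_mmcif.

```python
# PDB ATOM/HETATM fixed columns (1-based, inclusive) mapped to mmCIF
# _atom_site items. Using the auth_* items for the author-assigned names
# is an assumption of this sketch.
PDB_COLUMNS = [
    ("_atom_site.group_PDB",       1,  6),
    ("_atom_site.id",              7, 11),
    ("_atom_site.auth_atom_id",   13, 16),
    ("_atom_site.auth_comp_id",   18, 20),
    ("_atom_site.auth_asym_id",   22, 22),
    ("_atom_site.auth_seq_id",    23, 26),
    ("_atom_site.Cartn_x",        31, 38),
    ("_atom_site.Cartn_y",        39, 46),
    ("_atom_site.Cartn_z",        47, 54),
    ("_atom_site.occupancy",      55, 60),
    ("_atom_site.B_iso_or_equiv", 61, 66),
    ("_atom_site.type_symbol",    77, 78),
]


def pdb_atom_to_mmcif(line):
    """Map one PDB ATOM/HETATM record to a dict of _atom_site values."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    row = {}
    for tag, start, end in PDB_COLUMNS:
        value = line[start - 1:end].strip()
        row[tag] = value if value else "?"  # '?' marks an unknown value in CIF
    return row
```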

These routines were included in CCP4 4.0 and are described in the accompanying documentation. A number of CCP4 programs have been converted to use these routines, and some additional utilities are in development. Details can be found on the developers' web pages.

REFMAC

Version 5.0 of REFMAC (not yet released at the time of writing) will include several new features, one of which is a completely new mechanism for handling geometric restraints. Restraint information for residues, cofactors, etc. is held in the dictionary files mon_lib_*.cif, which are designed to be easily extended to include new chemical species. REFMAC will also be able to read coordinate files in either PDB or mmCIF format.

