
Development of the CCP4 software library II

Martyn Winn, Charles Ballard, Peter Briggs and Eugene Krissinel

November 2002

What it's all about

Behind the scenes, the CCP4 software suite is undergoing a major overhaul, with the strategic aim of bringing the suite into line with the new high-throughput era. As part of this, the CCP4 software library, upon which most of the CCP4 programs depend, is being substantially re-written, and new libraries are being added. These new libraries will support new applications written in a modern object-orientated fashion. They will also act as computational modules for use in a scripting environment - particularly pertinent to automation efforts.

But Don't Panic! All your favourite programs, and others, will continue to be supported. However, with the new framework in place, you will increasingly find new tools, more automation and better data management.

A preliminary account of the new software library was given in Newsletter 40. Therefore, this article will concentrate on recent progress and plans for the next CCP4 release.

What's happening with the core library

MMDB

The MMDB C++ library is designed to assist CCP4 developers in working with coordinate data, as obtained from PDB or mmCIF files. MMDB provides various high-level tools for working with coordinate files, which include not only reading and writing, but also orthogonal-fractional coordinate transformations, generation of symmetry mates, editing the molecular structure and others. Recently added functionality includes:

A Fortran-callable interface, built on MMDB, replicates the functionality of the old rwbrook.f library file. MMDB also provides model handling for the CCP4 Molecular Graphics project (E. Potterton, S. McNicholas, E. Krissinel, K. Cowtan and M. Noble, Acta Cryst. D58, 1955-1957 (2002)).

More information can be found on the MMDB project pages (http://msd.ebi.ac.uk/~keb/cldoc/).
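As an illustration of the orthogonal-fractional transformation mentioned above, here is a minimal Python sketch. It is not MMDB's API: the function names are hypothetical, and it assumes the common PDB convention in which the orthogonal x axis lies along a and y lies in the a-b plane.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Matrix taking fractional to orthogonal coordinates (angles in degrees).

    Assumed convention: x along a, y in the a-b plane.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_orth(m, frac):
    """Multiply the orthogonalization matrix by a fractional position."""
    return [sum(m[i][j] * frac[j] for j in range(3)) for i in range(3)]


def orth_to_frac(m, orth):
    """Invert the upper-triangular matrix by back substitution."""
    z = orth[2] / m[2][2]
    y = (orth[1] - m[1][2] * z) / m[1][1]
    x = (orth[0] - m[0][1] * y - m[0][2] * z) / m[0][0]
    return [x, y, z]
```

Because the matrix is upper triangular under this convention, the inverse transformation needs no general matrix inversion.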

CMTZ

The CMTZ library implements the hierarchical view of reflection data:

        File -> Crystal -> Dataset -> Column -> Reflection data

A `Crystal' is essentially a single crystal form, while a `Dataset' is a set of observations on a crystal. Note that the `Project' used in Data Harvesting (Newsletter #37) is now simply an attribute of the crystal.
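The hierarchy can be pictured with a small Python sketch. The class and field names here are hypothetical and chosen only to mirror the data model described above; they are not the CMTZ data structures themselves.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Column:
    label: str
    values: List[float] = field(default_factory=list)  # one entry per reflection


@dataclass
class Dataset:
    name: str
    columns: List[Column] = field(default_factory=list)


@dataclass
class Crystal:
    name: str
    project: str  # the Project is just an attribute of the crystal
    cell: Tuple[float, float, float, float, float, float]
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class MtzFile:
    crystals: List[Crystal] = field(default_factory=list)

    def column_paths(self):
        """List every column by its position in the hierarchy."""
        return [
            f"/{xtal.name}/{dset.name}/{col.label}"
            for xtal in self.crystals
            for dset in xtal.datasets
            for col in dset.columns
        ]
```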

The MTZ file format has been extended slightly to record this hierarchy. CMTZ is a C function library to read/write these extended MTZ files, and to manipulate a data structure representing the above data model. A Fortran-callable interface to CMTZ replicates the functionality of the old mtzlib.f library file.

Older MTZ files will lack the Crystal level of the hierarchy. The new library will assume that each project consists of a single crystal, unless different cell dimensions (recorded for each dataset) indicate the presence of different crystals. As with the earlier introduction of dataset information, it is important that the user establish the correct data model at an early stage, for example by the correct labelling of datasets in MOSFLM. Given a correct data model, software downstream can infer appropriate relationships and thus work in a more automated manner.
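The rule for older files can be sketched as follows: datasets of one project whose cell dimensions agree are assigned to the same crystal, and a differing cell starts a new one. This Python sketch is an illustration only; the function names and the comparison tolerance are assumptions, not the library's actual behaviour.

```python
def same_cell(cell_a, cell_b, tol=0.01):
    """True if all six cell parameters agree within tol (assumed tolerance)."""
    return all(abs(p - q) <= tol for p, q in zip(cell_a, cell_b))


def infer_crystals(datasets, tol=0.01):
    """Group (name, cell) pairs of one project into crystals by cell dimensions.

    Returns a list of (cell, [dataset names]) in order of first appearance.
    """
    crystals = []
    for name, cell in datasets:
        for ref_cell, members in crystals:
            if same_cell(ref_cell, cell, tol):
                members.append(name)
                break
        else:
            # No existing crystal has this cell, so start a new one.
            crystals.append((cell, [name]))
    return crystals
```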

The CMTZ library can now work in two modes, one which holds all reflection data in memory, and one which leaves the reflection data on disk for sequential processing. The latter is the traditional method used by Fortran programs, but the former is likely to be more useful for newer applications. The mode can be selected by an environment variable CMTZ_IN_MEMORY.
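The difference between the two modes can be sketched in Python. Only the variable name CMTZ_IN_MEMORY comes from the library; the values that switch the mode on, the function names and the toy one-reflection-per-line text format are assumptions made for illustration.

```python
import os


def in_memory_mode(environ=os.environ):
    """Assumption: any value other than empty or "0" selects in-memory mode."""
    return environ.get("CMTZ_IN_MEMORY", "0") not in ("", "0")


def iter_reflections(path):
    """Sequential mode: yield one reflection at a time from disk."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                h, k, l, f = line.split()
                yield int(h), int(k), int(l), float(f)


def open_reflections(path, environ=os.environ):
    """Return a list (in memory) or a lazy iterator (left on disk)."""
    rows = iter_reflections(path)
    return list(rows) if in_memory_mode(environ) else rows
```

In-memory mode allows random access to any reflection, whereas the sequential mode keeps memory use small but permits only a single pass through the data.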

Recent work has concentrated on testing the Fortran interface and ensuring robust support for existing programs. When this work is completed, our attention will turn to providing new and improved tools. An early target is MTZ file handling in ccp4i, which is currently done by interpreting MTZDUMP output. The tcl interface to the new library enables direct access to the MTZ data structure, which is more robust and allows more advanced graphical manipulation of MTZ files.

CMAP

Charles Ballard has written a C language library for the reading and writing of CCP4 format map files. A Fortran API mimics the existing maplib.f. This work is essentially complete.

CSYM

The old implementation of symmetry held tabulated information in a manually-produced file symop.lib, together with other information distributed amongst routines in symlib.f (e.g. real space asymmetric unit limits in subroutine SETLIM). This set-up worked in most cases, but was error-prone and difficult to maintain.

In the new formulation, symop.lib is replaced by another data file, syminfo.lib, which is automatically generated. This is currently done using a short program which uses functions from sgtbx (part of the Computational Crystallography Toolbox, http://cctbx.sourceforge.net). The new data file is more likely to be error-free, and is also more complete, in that many non-standard settings can be included easily. The new data file contains most quantities of interest, and only a few pieces of tabulated data are retained in the code (e.g. specifications of centric and epsilon zones).
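To give a flavour of the kind of symmetry information involved, the following Python sketch parses a symmetry operator written in the usual "x,y,z" notation into a rotation matrix and a translation vector, and applies it to a fractional position. The function names are hypothetical, the sketch does not read the syminfo.lib format, and it handles only fractional translations such as 1/2 (not decimals).

```python
import re
from fractions import Fraction


def parse_symop(text):
    """Parse e.g. '-x,y+1/2,-z' into (rotation rows, translation vector)."""
    rotation, translation = [], []
    for component in text.lower().replace(" ", "").split(","):
        row = [0, 0, 0]
        shift = Fraction(0)
        # Each term is an optional sign followed by an axis letter or a fraction.
        for sign, term in re.findall(r"([+-]?)([xyz]|\d+(?:/\d+)?)", component):
            factor = -1 if sign == "-" else 1
            if term in "xyz":
                row["xyz".index(term)] = factor
            else:
                shift += factor * Fraction(term)
        rotation.append(row)
        translation.append(shift)
    return rotation, translation


def apply_symop(op, frac):
    """Apply a parsed operator to a fractional coordinate triple."""
    rotation, translation = op
    return [
        sum(rotation[i][j] * frac[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
```

For example, `parse_symop("-x,y+1/2,-z")` describes the two-fold screw axis along b found in P 1 21 1; applying it to every atom of a model generates the corresponding symmetry mate.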

The new CCP4 library contains C functions to manipulate this symmetry information. When a spacegroup is identified by its name, number or operators, all the information connected with that spacegroup is loaded into memory, where it can be accessed easily. Wrapper functions mimic the old symlib.f routines. Recently added functionality includes:

Other library functions

The new CCP4 library also contains a number of other functions which give the traditional look-and-feel of CCP4 programs, for example for parsing CCP4-style keyworded input and for writing the CCP4 banner at the top of the log file. There are also various utility functions which return date, time, program name, user name, etc. This functionality is now available to C-level programs as well as Fortran programs.
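A much simplified Python sketch of keyworded input parsing is given below. It is not the library's parser: the function name is hypothetical, and the rules it applies (case-insensitive keywords, only the first four characters significant, "#" or "!" starting a comment, END terminating input) are stated here as assumptions about CCP4-style input.

```python
def parse_keywords(lines):
    """Turn keyworded input lines into a list of (keyword, arguments) pairs."""
    commands = []
    for raw in lines:
        # Assumption: '#' and '!' start a comment that runs to end of line.
        for marker in "#!":
            raw = raw.split(marker, 1)[0]
        tokens = raw.split()
        if not tokens:
            continue
        # Assumption: keywords are case-insensitive, first four characters count.
        key = tokens[0].upper()[:4]
        if key == "END":
            break
        commands.append((key, tokens[1:]))
    return commands
```

A program would then dispatch on the four-character key, so that "RESO", "resolution" and "Resol" all select the same option.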

The library also retains various Fortran subroutine libraries where conversion is not appropriate or has not yet been attempted, e.g. certain routines in ccplib.f and all of plot84driver.f.

Plans for CCP4 5.0

The next major release of CCP4 will include the new libraries as an integral part of the suite. In addition, it is hoped to include Kevin Cowtan's Clipper library and the FFTw Fourier transform library. For the average user, there should be few visible changes. There will be a few additional applications based on the new libraries, for example some coordinate manipulation programs based on the MMDB library. MTZ files will gain the CRYSTAL level of the reflection data model. There will be a graphical viewer for MTZ files which highlights the hierarchical nature of the data.

On the other hand, for developers CCP4 5.0 will provide a more powerful environment for writing applications and complex tasks. The new libraries, together with Clipper and FFTw, provide functionality for writing applications in C++, C or Fortran. In addition, much of this functionality is available to scripts written in python, tcl or perl using the SWIG-generated programming interfaces. Makefiles to be distributed with CCP4 allow the generation of shared libraries which then form loadable modules for a scripting environment.

At the moment, ccp4i executes a job as a separate process running a wish script. From CCP4 5.0, ccp4i will also be able to execute python scripts, with job parameters being saved in the database as usual. With the object-orientated capabilities of python, this allows the creation of more sophisticated, data-orientated tasks within the familiar user environment of ccp4i.

Acknowledgements

The formulation of this library has benefited from many discussions with Kevin Cowtan, York (who also provided some core functions). Alun Ashton (Daresbury) has helped with the Windows port of the library. Nick Sauter (Lawrence Berkeley NL) has given useful feedback on CMTZ. Phil Evans and Eleanor Dodson have tested the Fortran interface.