Development of the CCP4 software library
Martyn Winn, Charles Ballard, Peter Briggs and Eugene Krissinel
Status
This development effort is now over, and the new C/C++ libraries
are integrated into the CCP4 suite. This document is largely historical!
Aims
The CCP4 software suite is based around a library of routines which
cover common tasks, such as file opening, parsing keyworded input,
reading and writing of standard data formats, applying symmetry
operations, etc. Programs in the suite call these routines which, as
well as saving the programmer some effort, ensure that the varied
programs in the suite have a similar look-and-feel.
Around the period 2001-2003, there was a major effort to re-write
much of the CCP4 library. The aims were:
- To implement a better representation of the underlying data model.
For example, Eugene Krissinel's mmdb library acts on a data structure
which represents the various levels of structure of a protein model.
The new MTZ library encapsulates the crystal/dataset hierarchy that
is increasingly being used by programs.
- To maintain support for existing programs. In particular, the
existing Fortran APIs will be maintained, although they will now often
be only wrappers to functions in the new library. It is hoped that many
existing programs will be migrated to using the new library directly.
- To provide support for scripting. It is possible to generate APIs
for Python, Tcl and Perl automatically from the core C code. Thus, much
of the standard CCP4 functionality will be available to scripts used,
e.g., in ccp4i or the molecular graphics project.
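As an illustration of the first aim, the crystal/dataset hierarchy can be
pictured as nested C structures: a crystal owns datasets, and each dataset
owns columns. The following is a minimal sketch only. All type and function
names (DemoCrystal, DemoDataset, DemoColumn, demo_add_dataset, and so on) are
hypothetical and are not the actual cmtz structures or API; the fixed array
sizes are likewise assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits and types for illustration; NOT the cmtz definitions. */
#define DEMO_MAX_DATASETS 8
#define DEMO_MAX_COLUMNS  16

typedef struct {
    char label[31];               /* column label, e.g. "FP" */
    char type;                    /* one-character column type, e.g. 'F' */
} DemoColumn;

typedef struct {
    char name[65];                /* dataset name */
    float wavelength;             /* wavelength associated with the dataset */
    int ncolumns;
    DemoColumn columns[DEMO_MAX_COLUMNS];
} DemoDataset;

typedef struct {
    char name[65];                /* crystal name */
    float cell[6];                /* a, b, c, alpha, beta, gamma */
    int ndatasets;
    DemoDataset datasets[DEMO_MAX_DATASETS];
} DemoCrystal;

/* Copy a name into a fixed-size buffer, always NUL-terminating it. */
static void demo_copy_name(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* Append a dataset to a crystal; returns NULL if the crystal is full. */
DemoDataset *demo_add_dataset(DemoCrystal *xtal, const char *name,
                              float wavelength)
{
    assert(xtal != NULL && name != NULL);
    if (xtal->ndatasets >= DEMO_MAX_DATASETS)
        return NULL;
    DemoDataset *set = &xtal->datasets[xtal->ndatasets++];
    demo_copy_name(set->name, sizeof set->name, name);
    set->wavelength = wavelength;
    set->ncolumns = 0;
    return set;
}

/* Append a column to a dataset; returns NULL if the dataset is full. */
DemoColumn *demo_add_column(DemoDataset *set, const char *label, char type)
{
    assert(set != NULL && label != NULL);
    if (set->ncolumns >= DEMO_MAX_COLUMNS)
        return NULL;
    DemoColumn *col = &set->columns[set->ncolumns++];
    demo_copy_name(col->label, sizeof col->label, label);
    col->type = type;
    return col;
}

/* Find a column by label in any dataset of the crystal. If owner is not
   NULL, it receives the dataset that holds the column. */
const DemoColumn *demo_find_column(const DemoCrystal *xtal, const char *label,
                                   const DemoDataset **owner)
{
    assert(xtal != NULL && label != NULL);
    for (int i = 0; i < xtal->ndatasets; ++i) {
        const DemoDataset *set = &xtal->datasets[i];
        for (int j = 0; j < set->ncolumns; ++j) {
            if (strcmp(set->columns[j].label, label) == 0) {
                if (owner != NULL)
                    *owner = set;
                return &set->columns[j];
            }
        }
    }
    return NULL;
}
```

The point of the nesting is that a program which finds a column can also
recover the dataset, and hence the crystal, it belongs to, rather than
treating the file as a flat list of columns.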
This incremental approach, maintaining the existing suite while
improving the underlying code, puts constraints on what is possible, but
is considered more appropriate for a collaborative project like CCP4.
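The wrapper idea can be sketched as follows: a core routine written in C does
the work, and a thin Fortran-callable function forwards to it. This is an
illustrative sketch, not code from the CCP4 library. The names
demo_resolution_range and demo_resrange_ are hypothetical, and the calling
convention shown (lower-case symbol with a trailing underscore, all arguments
passed by reference) is an assumption that holds for many Fortran compilers
but is not universal.

```c
#include <assert.h>
#include <stddef.h>

/* Core routine in the style of a new C library (hypothetical name).
   Finds the smallest and largest value in an array of d-spacings.
   Returns 0 on success, -1 on invalid arguments. */
int demo_resolution_range(const float *d_spacings, size_t n,
                          float *d_min, float *d_max)
{
    if (d_spacings == NULL || d_min == NULL || d_max == NULL || n == 0)
        return -1;
    *d_min = *d_max = d_spacings[0];
    for (size_t i = 1; i < n; ++i) {
        if (d_spacings[i] < *d_min) *d_min = d_spacings[i];
        if (d_spacings[i] > *d_max) *d_max = d_spacings[i];
    }
    return 0;
}

/* Fortran-callable wrapper (hypothetical name). Assumed convention:
   the Fortran statement  CALL DEMO_RESRANGE(D, N, DMIN, DMAX, IFAIL)
   resolves to this symbol, with every argument passed by reference.
   IFAIL is set to 0 on success and 1 on failure. */
void demo_resrange_(const float *d_spacings, const int *n,
                    float *d_min, float *d_max, int *ifail)
{
    assert(n != NULL && ifail != NULL);
    if (*n <= 0) {
        *ifail = 1;
        return;
    }
    *ifail = (demo_resolution_range(d_spacings, (size_t)*n,
                                    d_min, d_max) == 0) ? 0 : 1;
}
```

With this layout an existing Fortran program keeps its subroutine call
unchanged, while new C or C++ programs can call the core routine directly.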
Major components
For a lengthier description, see the article in
Issue 40 of the CCP4 Newsletter.
Here are some useful links:
- Strategy and Design documents for the new library.
- Extensive documentation of the library generated from the source code
using Doxygen. This is now integrated into the main CCP4 documentation.
- Some simple examples of using cmtz are here.
- Some simple examples of using csym are here.
- A description of the CMTZ library from June 2001
- Another description of the CMTZ library from August 2001
- Eugene Krissinel's pages on the MMDB coordinate library
- My brief summary of SWIG, used to generate interfaces to the Python,
Perl and Tcl scripting languages.
Data Model
A data model for high-throughput protein structure determination is
being developed by staff at the EBI.
Most effort has so far gone into protein production and crystallisation,
but X-ray data collection and structure solution are now being considered.
The new library will aim to be consistent with this data model.
Acknowledgements
cmtzlib was originally based on Jan-Pieter Abrahams'
functions in the program solomon. These have been
substantially altered and added to, and so all problems are my
creations, but I am grateful for the excellent starting point.
The formulation of this library has benefited from many discussions with Kevin
Cowtan (who also provided some core functions), Eugene Krissinel, Airlie McCoy,
and all members of the Daresbury team (Alun Ashton, Peter Briggs, Charles Ballard).