CCP4 General News

Peter Briggs, Charles Ballard, Martyn Winn, Daniel Rolfe, Graeme Winter, Ronan Keegan, Norman Stein, Francois Remacle, Paul Emsley*

CCP4, Daresbury Laboratory, Warrington WA4 4AD, UK
*Structural Biology Department, York University, York, UK

Future Release 6.0 of CCP4

General information

The Collaborative Computational Project Number 4 in Protein Crystallography (CCP4) was set up in 1979 to support collaboration between researchers working on protein crystallography software in the UK, and to assemble a comprehensive collection of software to satisfy the computational requirements of the relevant UK groups. CCP4 was originally supported by the UK Science and Engineering Research Council (SERC), and is now supported by the Biotechnology and Biological Sciences Research Council (BBSRC). The project is coordinated at CCLRC Daresbury Laboratory. This effort gave rise to the CCP4 program suite, which is now distributed to academic and commercial users world-wide.

Over its history the suite has passed through a series of releases, each adding new programs from the developer community and offering new tools and techniques to make the suite more complete and more powerful for its users.

Version 6.0 is now being developed and tested. Below we outline its new features and the improvements since the last release.

What's new

In future releases, the CCP4 Suite will be separated into a number of packages, to make the programs easier to download and install and to facilitate subsequent updates. In release 6.0, the following packages will be available:

CCP4 program suite: Contains the usual programs, libraries, tutorials, examples and CCP4i, together with the new tools described below.
CCP4 Molecular Graphics: From authors Liz Potterton and Stuart McNicholas, CCP4MG displays molecules with simple, flexible selection tools and a variety of display styles and colouring schemes through a simple interface. It also provides a range of structure analysis tools.
COOT: From author Paul Emsley, Coot displays maps and models and supports a range of model manipulations: idealization, real space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers, Ramachandran plots, model validation and others.
Phaser 1.3 and CCTBX: Developed at the University of Cambridge, Phaser is a program for phasing macromolecular crystal structures with maximum likelihood methods. It currently has methods for brute force and fast likelihood-based rotation and translation functions for molecular replacement. Methods for experimental phasing are under development.
The Computational Crystallography Toolbox (cctbx) is being developed as the open source component of the PHENIX system. It contains modules serving a range of purposes in macromolecular crystallography.
CHOOCH: From author Gwyndaf Evans, CHOOCH determines values of anomalous scattering factors from raw fluorescence spectra.

The basic CCP4 program suite package will provide a series of new tools:

Bp3*: From author Navraj Pannu, Bp3 is a program for obtaining phase information from an S/MIR(AS) and/or S/MAD experiment(s) by multivariate likelihood estimation. Bp3 is one of the programs used by Crank.
Crank*: From author Steven Ness, Crank is a new suite of programs for automated macromolecular structure solution. It uses an XML-based framework to join many different crystallography programs into a unified whole. Crank is intimately linked to the CCP4 package, using CCP4i for job setup and control.
Superpose and SSM: From author Eugene Krissinel, Secondary Structure Matching (SSM) is a tool for protein structure comparison in 3D. Superpose is a program that superposes structures by secondary structure matching, using the functions provided by the SSM library.
Pirate: From author Kevin Cowtan, Pirate performs statistical phase improvement by classifying the electron density map by sparseness/denseness and order/disorder, with the aim of obtaining superior results to conventional solvent mask based methods without requiring knowledge of the solvent content.
Clipper Utilities: From author Kevin Cowtan, these utilities provide useful functionality from the Clipper libraries.
Chainsaw: From author Norman Stein, Chainsaw is a utility for molecular replacement, which mutates a template PDB file using a sequence alignment between the target and template.
* For more information about Crank and Bp3, see the article on them in this newsletter.
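
To make the Chainsaw idea more concrete, the sketch below shows how a sequence alignment could drive per-residue edits of a template. It is a simplified illustration rather than Chainsaw's actual algorithm: the function name `plan_template_edits`, the three action labels, and the rule that non-identical residues are truncated while unaligned ones are deleted are all assumptions made for the example.

```python
def plan_template_edits(target_aln, template_aln, gap="-"):
    """Decide what to do with each template residue, given two aligned sequences.

    Both strings must have the same length (one character per alignment column).
    Returns a list of (template_residue, action) pairs, where action is
    "keep", "truncate" or "delete". These rules are illustrative assumptions.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")

    edits = []
    for target_res, template_res in zip(target_aln, template_aln):
        if template_res == gap:
            # Insertion in the target: there is no template residue to edit.
            continue
        if target_res == gap:
            # Template residue has no counterpart in the target.
            edits.append((template_res, "delete"))
        elif target_res == template_res:
            # Conserved residue: leave the side chain as it is.
            edits.append((template_res, "keep"))
        else:
            # Non-conserved residue: prune the side chain.
            edits.append((template_res, "truncate"))
    return edits


if __name__ == "__main__":
    for residue, action in plan_template_edits("AC-DE", "AGTD-"):
        print(residue, action)
```

A real implementation would then apply these actions to the atom records of the template PDB file.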

Updates from version 5 of CCP4

In addition to the new programs and packages, the CCP4 program suite will also include up-to-date versions of the CCIF, Clipper, MMDB and CCP4 libraries, the latest versions of Pdb-Extract, Molrep, Mosflm, Refmac, Sfcheck and Scala, and an updated version of CCP4i (see the article on the new version of CCP4i).

Ongoing Projects and Pre-releases

In addition to CCP4 v6.0, other projects are ongoing around CCP4, including new delivery systems and new programs. Some of these will be available as pre-releases together with the release of CCP4. Currently the following items will be pre-released:

Linux Install Wizard: Based on InstallShield technology, this project aims to create an installer as straightforward and robust as its bigger brother on Windows.
Pointless: From author Phil Evans, Pointless is a program that determines the Laue group using the symmetry functionality of CCTBX.



CCP4 and BIOXHIT

BIOXHIT started in January 2004 and is an "integrated project" funded for four years within the 6th Framework Programme of the European Commission. BIOXHIT is coordinating scientists at all European synchrotrons and leading software developers in a joint effort to develop, assemble and provide a highly effective technology platform for Structural Genomics. CCP4 is involved in workpackages which aim to implement data management and project tracking in structure solution, and in work which complements the CCP4 Automation Project. The project currently funds one full-time programmer.

As a key part of this work the CCP4i database is currently being expanded and standardised as a Project database for non-CCP4(i) applications operating in a multi-user computing environment. The scope of the data stored in the database - both the raw data and the history record information - will also be extended as part of the project, and visualisation tools will be developed to help users make sense of the data.

The aim is to provide a system which is useful for ongoing "work-in-progress" structure determination projects, whether performed manually or through automated systems. We are working with a variety of partners both within and outside the BIOXHIT project to ensure that the system will be compatible with and useful to other software projects. Currently prototypes exist for the database and the "broker" application which mediates access to it. Work is also ongoing on visualisation tools.

CCP4 is also contributing to work within the BIOXHIT framework on data models for information exchange between programs for the purposes of automation. In February 2005 CCP4 co-organised a workshop which brought together the developers of a number of automated systems to discuss the issues, and the final report from the meeting along with the supporting documents can be found at http://www.ebi.ac.uk/msd-srv/docs/bioxhit05_1.html.

The main BIOXHIT website is at http://www.bioxhit.org. Information about CCP4 and BIOXHIT can be found at http://www.ccp4.ac.uk/projects/bioxhit.html. Please contact Peter Briggs (p.j.briggs@ccp4.ac.uk) for more information about the CCP4 contribution to the BIOXHIT project.



CCP4 and e-HTPX

e-HTPX is a BBSRC-funded e-science pilot project which aims to link the various stages of protein crystallography into one single all-encompassing interface from which users can initiate, plan, direct and document their experiment either locally or remotely from a desktop computer. The e-HTPX project covers the stages from crystallisation, through data collection to structure solution. The latter is of particular interest to CCP4, and complements efforts in the CCP4 Automation project. Here we describe those aspects of e-HTPX relevant to CCP4 - for more information on e-HTPX itself, see www.e-htpx.ac.uk.

Early work looked at running CCP4 programs on clusters, and parallelisation of the underlying code. Parallelised versions of BEAST and SCALA were written using the MPI library for message passing on distributed memory systems, such as Beowulf clusters. In the former case, the aim is to make it feasible to run a slow program in a reasonable timescale. In the latter case, the aim is to turn a relatively quick program into one that is fast enough to provide real-time feedback during data collection.

Later work has looked at using clusters to do parameter space screening. As an example, a Python script has been written that will perform molecular replacement using a variety of template structures, trial model generation methods, and choices of molecular replacement program. This is a very general framework within which a number of different approaches can be tested in parallel. A faster, cut-down version suitable for a desktop will be included in a later version of CCP4.
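
The screening idea can be sketched as follows. This is a minimal illustration, not the e-HTPX script itself: `screen_mr_trials`, `dummy_trial`, the file names and the model generation labels are all hypothetical, and a real `run_trial` would build a search model and launch the chosen molecular replacement program, for example as a job on a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def screen_mr_trials(templates, model_methods, mr_programs, run_trial, max_workers=4):
    """Run every (template, method, program) combination and rank by score.

    run_trial is a caller-supplied function taking the three choices and
    returning a number where larger means better.
    """
    trials = list(product(templates, model_methods, mr_programs))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so scores line up with trials.
        scores = list(pool.map(lambda trial: run_trial(*trial), trials))
    return sorted(zip(trials, scores), key=lambda item: item[1], reverse=True)


def dummy_trial(template, model_method, mr_program):
    """Stand-in scorer (NOT a real MR run): returns an arbitrary number."""
    return float(len(template) + len(model_method) + len(mr_program))


if __name__ == "__main__":
    ranked = screen_mr_trials(
        templates=["homologue_a.pdb", "homologue_b.pdb"],
        model_methods=["unmodified", "polyalanine"],
        mr_programs=["molrep", "phaser"],
        run_trial=dummy_trial,
    )
    for (template, method, program), score in ranked:
        print(template, method, program, score)
```

Threads are sufficient here on the assumption that each trial spends its time waiting on an external program. The number of trials is the product of the three list lengths, which is what makes a cluster attractive for larger screens.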

e-HTPX has also contributed effort to the DNA project, which automates data collection and processing at synchrotron beamlines (home sources may be covered later). Finally, e-HTPX is also developing tools for doing protein crystallography in a Grid environment, which will be relevant for new facilities such as Diamond.



CCP4 automation projects

Autoamore

Autoamore is a project on automated molecular replacement methods, based on the program Amore distributed in the CCP4 suite.

Solving a structure by molecular replacement using the Amore program involves running the program many times. For example, the rotation and translation functions have to be solved separately, and if there is more than one molecule in the asymmetric unit, an additional Amore run is required to find each extra molecule. Autoamore is a Python script which automates the whole procedure, allowing Amore to be run with no more user intervention than would be required to run other molecular replacement programs such as Molrep and Phaser. One advantage of using Amore is that it is fast, and therefore particularly attractive to users with less powerful machines.

The Autoamore script calls various other CCP4 programs in addition to Amore. Matthews is used to estimate the number of molecules in the asymmetric unit. Wilson and Baverage are used to determine the difference in B factor between the model and target data, and this difference is then input to Amore using the BADD variable. The rotations and translations found by Amore are applied to the atom coordinates using Pdbset, and a single output PDB file is generated, suitable for subsequent input into model building/refinement programs. A check for clashing is made using Distang. Autoamore generates its own summary file, listing important parameters concisely.
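
As an illustration of the first of these steps, the number of molecules in the asymmetric unit can be estimated from the Matthews coefficient, V_M = V_cell / (Z x n x M), where Z is the number of symmetry operators, n the number of copies and M the molecular weight. The sketch below is not the procedure used by the Matthews program or by Autoamore: it assumes an average residue mass of about 110 Da and simply picks the n whose V_M is closest to an assumed typical value of 2.5 cubic angstroms per dalton. All function names are hypothetical.

```python
import math

AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rough mean mass per amino acid residue


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic angstroms; lengths in angstroms, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(angle)) for angle in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def matthews_coefficient(cell_volume, n_symops, n_copies, n_residues):
    """V_M in cubic angstroms per dalton for n_copies molecules per asymmetric unit."""
    mol_weight = n_residues * AVERAGE_RESIDUE_MASS_DA
    return cell_volume / (n_symops * n_copies * mol_weight)


def solvent_fraction(v_m):
    """Approximate solvent fraction from V_M, using the usual 1 - 1.23 / V_M relation."""
    return 1.0 - 1.23 / v_m


def estimate_copies(cell_volume, n_symops, n_residues, target_v_m=2.5, max_copies=20):
    """Pick the copy number whose V_M lies closest to target_v_m (a crude heuristic)."""
    return min(
        range(1, max_copies + 1),
        key=lambda n: abs(
            matthews_coefficient(cell_volume, n_symops, n, n_residues) - target_v_m
        ),
    )


if __name__ == "__main__":
    volume = unit_cell_volume(50.0, 60.0, 70.0, 90.0, 90.0, 90.0)
    copies = estimate_copies(volume, n_symops=4, n_residues=200)
    v_m = matthews_coefficient(volume, 4, copies, 200)
    print(copies, round(v_m, 2), round(solvent_fraction(v_m), 2))
```

The number of residues is used here because it is one of the values the user supplies in the Autoamore input file. The cell dimensions and symmetry in the usage example are invented for illustration.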

Autoamore also uses the Peakmax program to check whether pairs of molecules are potentially related by translational NCS. If this is the case, the translation vector is supplied to Amore, which then positions molecules on a pairwise basis.

To use Autoamore, the user must create a simple input file, listing the names of the model PDB file and the target MTZ file, the column name for the structure factor in the MTZ file, the resolution limits desired and the number of residues. Once Autoamore is given the name of this file, no further user input is required. Autoamore forms part of CCP4 Automation and will also be incorporated as a module in the BMP molecular replacement pipeline.

HAPPy

We are working on a new automated experimental phasing system called HAPPy (Heavy Atom Phasing in Python). This project (previously known as PyChart) will replace and expand on the capabilities of Paul Emsley's Chart package [1]. The goal is to take processed (i.e. post-TRUNCATE) experimental data, determine the heavy atom substructure and phase probability distributions, and then use these to optimise the map and potentially build the structure. The first release will handle SAD data only, with MAD, MIR and MIRAS modes added later.

As with several other automation projects, HAPPy is being written in Python, and will employ existing packages for the various stages of the structure solution. Where possible, CCP4 [2] programs will be used, but non-CCP4 programs will also be used where appropriate. SHELXD [3] is used for heavy atom substructure determination, followed by Phaser [4] for SAD phasing and Pirate for phase improvement. Buccaneer [5] will be used for model building in the future.

HAPPy will be designed to cooperate with other automation packages, for example using the output from automated data processing software DNA/XIA-DP [6]. Well-defined APIs and data formats will be used wherever data exchange is necessary.
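
The staged design described above can be sketched as a chain of steps that pass a shared record from one to the next. This only illustrates the pattern and is not HAPPy's code: the stage functions, the `history` key and the file name are hypothetical placeholders, and each stage merely records its name where a real system would run a program such as SHELXD, Phaser or Pirate.

```python
def find_substructure(state):
    """Placeholder for heavy atom substructure determination."""
    return {**state, "history": state["history"] + ["substructure"]}


def phase(state):
    """Placeholder for SAD phasing."""
    return {**state, "history": state["history"] + ["phasing"]}


def improve_phases(state):
    """Placeholder for phase improvement."""
    return {**state, "history": state["history"] + ["phase_improvement"]}


def run_pipeline(state, stages):
    """Apply each stage in order; every stage returns a new state dict."""
    for stage in stages:
        state = stage(state)
    return state


if __name__ == "__main__":
    start = {"data": "processed_sad_data.mtz", "history": []}  # hypothetical file name
    final = run_pipeline(start, [find_substructure, phase, improve_phases])
    print(final["history"])
```

Because each stage takes and returns the same kind of record, a stage can be swapped for another program, or a model building step appended, without changing the driver.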

References

[1] Chart: www.chem.gla.ac.uk/~paule/chart
[2] CCP4: www.ccp4.ac.uk
[3] SHELXD: shelx.uni-ac.gwdg.de/SHELX/
[4] Phaser: www-structmed.cimr.cam.ac.uk/phaser/
[5] Buccaneer: www.ysbl.york.ac.uk/~cowtan/buccaneer/buccaneer.html
[6] XIA: Graeme Winter, in preparation