Recent improvements to the `dm' package

Kevin Cowtan

The CCP4 `dm' package is widely used for phase improvement calculations, largely because of its automation and ease-of-use. Over the past year a number of improvements have been made, in terms of the underlying techniques, as well as the user interface. The most significant of these improvements are described here.

1. Phase error estimation

In the past, most of the attention in density modification methods has been devoted to the modifications which are applied to the electron density map. However, this is only one part, and possibly not the most critical part, of the process. Once the map has been modified it is back-transformed to give a set of modified magnitudes and phases. These phases are subsequently combined with the initial phase probability distribution in order to produce an updated phase probability distribution.

For this phase combination step to take place, the modified phases must be replaced by a probability distribution. The two distributions may then be multiplied, to provide a new (hopefully sharper) distribution. To convert the modified phase to a phase probability distribution, the error in the modified phase must first be estimated. This is a problem which has already been tackled for phases from incomplete models by the sigma-a method (Read, 1986), by which the error in the phases is estimated from the difference between the observed and modified magnitudes.
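As an illustration of the combination step, the following Python sketch multiplies two phase probability distributions sampled on a grid and takes the centroid of the result. It assumes, purely for illustration, that both distributions have the unimodal form exp(kappa cos(phi - phi_best)); the function names, the 5 degree grid and the values of kappa are hypothetical and are not taken from `dm'.

```python
import numpy as np

def phase_pdf(phi_grid, phi_best, kappa):
    """Sampled unimodal phase distribution, normalised to sum to one.

    kappa is a hypothetical concentration parameter standing in for the
    phase error estimate: a larger kappa means a smaller estimated error.
    """
    p = np.exp(kappa * np.cos(phi_grid - phi_best))
    return p / p.sum()

def combine(p_a, p_b):
    """Multiply two sampled distributions and renormalise."""
    prod = p_a * p_b
    return prod / prod.sum()

def centroid(phi_grid, p):
    """Return (centroid phase in radians, figure of merit)."""
    m = np.sum(p * np.exp(1j * phi_grid))
    return np.angle(m), np.abs(m)

phi_grid = np.deg2rad(np.arange(0, 360, 5))
p_exp = phase_pdf(phi_grid, np.deg2rad(40.0), kappa=1.0)  # experimental phasing
p_mod = phase_pdf(phi_grid, np.deg2rad(70.0), kappa=2.0)  # modified phase
p_new = combine(p_exp, p_mod)
phi_new, fom_new = centroid(phi_grid, p_new)
```

In this example the combined distribution is sharper than either input, so its figure of merit is higher than both. That gain is only justified when the two inputs are independent, which is the issue taken up next.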

For the combination of phase probability distributions to be valid, they must contain information from independent sources. In density modification calculations, however, phase information from the experimental phasing is combined with phase information from the modified map, which is itself calculated from the experimental phasing. Clearly the two sources of information are not independent; the existing phases are therefore reinforced and the phase errors underestimated. The result is that the figures of merit (FOMs) from density modification calculations are seriously overestimated. The resulting bias cripples subsequent density modification procedures, and can lead to actual errors in the model if used for phased refinement.

The solution to this problem is to obtain the density modified phases from a source which is independent from the initial phases.

1.1. Reflection Omit

The problem of phase bias was initially tackled in `dm' using the reflection omit method (Cowtan and Main, 1996). Since experimental phasing is calculated on a reflection-by-reflection basis, the phase estimates from individual reflections are independent. Therefore, if the density modified phase for a particular reflection depends only on the experimental phases from other reflections, then the density modified phase will be independent of the initial phase from that reflection.

The reflection omit scheme is therefore constructed in the following manner. A set of reflections is set to zero. A map is then calculated. Density modification is applied to the map, and it is back-transformed. The modified phases are then stored only for those reflections which were not used in the initial map. This process is repeated, omitting a different set of reflections each time, until modified phases are available for all the reflections.

This method is successful in reducing the phase bias, but it has two disadvantages:

Each density modification step must be performed many times, making the calculation very slow, and often impractical for averaging calculations.
Omitting large groups of reflections introduces noise into the map.
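The reflection omit scheme described above can be sketched on a one-dimensional toy map. This is a minimal illustration under stated assumptions, not the `dm' implementation: the density modification is reduced to a crude solvent flattening, the omit sets are assigned by a simple modulo rule, the mean-density term F(0) is never omitted, and all names are hypothetical.

```python
import numpy as np

def flatten_solvent(rho, solvent_mask):
    """Toy density modification: set the solvent region to its mean value."""
    out = rho.copy()
    out[solvent_mask] = rho[solvent_mask].mean()
    return out

def reflection_omit(coeffs, solvent_mask, n_sets, n_grid):
    """Modified coefficients in which each reflection is taken only from
    the cycle in which that reflection was omitted from the starting map."""
    modified = coeffs.copy()
    set_index = np.arange(coeffs.size) % n_sets  # hypothetical set assignment
    set_index[0] = -1                            # F(0) is never omitted here
    for k in range(n_sets):
        omit = set_index == k
        start = np.where(omit, 0.0, coeffs)      # omitted reflections set to zero
        rho = np.fft.irfft(start, n=n_grid)      # map without the omitted set
        back = np.fft.rfft(flatten_solvent(rho, solvent_mask))
        modified[omit] = back[omit]              # keep only the omitted reflections
    return modified

n_grid = 64
x = np.arange(n_grid)
true_rho = (np.exp(-0.5 * ((x - 20) / 3.0) ** 2)
            + 0.6 * np.exp(-0.5 * ((x - 28) / 2.0) ** 2))
solvent_mask = x >= 40

# Start from the true magnitudes with random phase errors added.
rng = np.random.default_rng(0)
f_true = np.fft.rfft(true_rho)
coeffs = np.abs(f_true) * np.exp(
    1j * (np.angle(f_true) + rng.normal(0.0, 0.5, f_true.size)))

omit_coeffs = reflection_omit(coeffs, solvent_mask, n_sets=8, n_grid=n_grid)
```

The loop makes the first disadvantage explicit: one full modification is needed per omit set, so eight sets cost eight times as much as a single cycle.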

1.2. The perturbation-gamma

Abrahams (1997) suggested an alternative approach to the problem. If the dependence of the modified map on the initial map is linear, that dependence may be removed by subtracting the initial map, scaled by some factor gamma, from the modified map. The dependence between the initial and modified maps may be calculated by a simple theoretical argument for solvent flattening and averaging calculations.

Histogram matching and other density modifications present more difficulties. Firstly, histogram matching is a non-linear modification, and so it is not necessarily clear that the approach will work at all. Secondly, attempts to calculate a theoretical value for gamma fail consistently.

An alternative approach has been developed in `dm' version 2.0. In this approach, two initial maps are calculated, one with the current data, and one with the current data plus a small `noise' signal. Density modification is applied to both of the maps, and the resulting map coefficients are compared. The level of the noise signal in the modified map as a fraction of the initial noise signal provides an accurate estimate of the linear dependence between the initial and modified maps, and thus of gamma. The noise-free map may then be corrected using this value of gamma. An additional benefit of this approach is that gamma may be estimated for subsets of reflections, or as a function of resolution.
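The perturbation estimate of gamma can be illustrated with a one-dimensional toy calculation. The sketch below is written under several assumptions: the density modification is a crude solvent flattening, gamma is taken as the projection of the change in the modified coefficients onto the noise coefficients, and the function and variable names are hypothetical rather than those used in `dm'.

```python
import numpy as np

def flatten_solvent(rho, solvent_mask):
    """Toy density modification: set the solvent region to its mean value."""
    out = rho.copy()
    out[solvent_mask] = rho[solvent_mask].mean()
    return out

def estimate_gamma(modify, rho_init, noise, select=None):
    """Fraction of the added noise signal that survives the modification.

    select is an optional boolean array choosing a subset of reflections,
    for example a resolution shell.
    """
    f_noise = np.fft.fft(noise)
    f_diff = np.fft.fft(modify(rho_init + noise) - modify(rho_init))
    if select is not None:
        f_noise, f_diff = f_noise[select], f_diff[select]
    return np.vdot(f_noise, f_diff).real / np.vdot(f_noise, f_noise).real

n_grid = 4096
x = np.arange(n_grid)
solvent_mask = x >= 2560                       # 37.5% of the cell is solvent
rng = np.random.default_rng(1)
rho_init = np.where(solvent_mask, 0.0, 1.0) + rng.normal(0.0, 0.3, n_grid)
noise = 0.01 * rng.normal(0.0, 1.0, n_grid)    # small perturbation

def modify(rho):
    return flatten_solvent(rho, solvent_mask)

gamma = estimate_gamma(modify, rho_init, noise)

# Remove the estimated linear dependence on the initial map.
f_init = np.fft.fft(rho_init)
f_mod = np.fft.fft(modify(rho_init))
f_corrected = f_mod - gamma * f_init

# The same estimate restricted to the low-resolution reflections.
low_res = np.abs(np.fft.fftfreq(n_grid)) < 0.1
gamma_low = estimate_gamma(modify, rho_init, noise, select=low_res)
```

For this toy flattening operator the estimate comes out close to the fraction of the cell that is left unflattened (0.625 here). The modification is passed in as a function, so the same estimator could in principle be applied to a non-linear modification for which no theoretical value of gamma is available.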

This method has been applied to histogram matching. In practice the linear approximation is sufficient and the resulting phases are virtually unbiased. Before bias correction it was thought that histogram matching was a less powerful approach than (but complementary to) solvent flattening. Once proper bias correction is introduced, it becomes obvious that histogram matching is actually a far more powerful phase constraint than solvent flattening, but had previously appeared weaker because it was more subject to bias.

The perturbation-gamma correction is much faster than the reflection omit approach, and introduces no noise into the resulting maps. As a result it is now the default method for all density modification calculations.

After a single cycle of density modification, the phase error estimates are found to be as good as the initial data. However, each cycle of density modification introduces correlation between phases, so after further cycles the phase error estimates deteriorate. Nevertheless, phases from 3, 5 or 7 cycles of density modification may still be used for phased refinement in `refmac' by careful use of the `refmac' phase blurring parameter.

2. Multi-resolution modification

Multi-resolution modification was first employed in version 1.8 of the `dm' package to exploit the fact that electron density histograms have been predicted over a wide range of resolutions. Solvent flattening and histogram matching constraints are therefore applied at two different resolutions. A low resolution map is first calculated from a set of reflections truncated to the lower resolution. This map is modified by solvent flattening and histogram matching, using the electron density histogram at that resolution. The resulting map coefficients (which extend to higher resolution) are averaged with the initial map coefficients. The new map coefficients are then used to calculate a higher resolution map, which is modified in turn to produce a map that is more consistent with the density constraints at both resolutions. This technique has provided a small but significant additional improvement over existing methods across a wide range of test cases, and is now widely used for real problems as well.
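The sequence of steps in one multi-resolution cycle can be sketched as follows on a one-dimensional toy map. Several things here are assumptions made for illustration only: the histogram matching is a simple rank-based substitution, the target histograms are taken from a made-up reference density rather than from predicted histograms, the two sets of coefficients are averaged with equal weights, and none of the names correspond to `dm' internals.

```python
import numpy as np

def truncate(coeffs, h_max):
    """Zero all coefficients beyond index h_max (the lower resolution limit)."""
    out = coeffs.copy()
    out[h_max + 1:] = 0.0
    return out

def match_histogram(rho, target_values):
    """Rank-based matching: the i-th smallest density value is replaced
    by the i-th smallest target value."""
    out = np.empty_like(rho)
    out[np.argsort(rho)] = np.sort(target_values)
    return out

def modify(rho, solvent_mask, target_values):
    """Flatten the solvent, then histogram match the protein region."""
    out = rho.copy()
    out[solvent_mask] = rho[solvent_mask].mean()
    out[~solvent_mask] = match_histogram(out[~solvent_mask], target_values)
    return out

def multi_resolution_cycle(coeffs, n_grid, h_low, solvent_mask,
                           target_low, target_high):
    rho_low = np.fft.irfft(truncate(coeffs, h_low), n=n_grid)
    coeffs_low = np.fft.rfft(modify(rho_low, solvent_mask, target_low))
    averaged = 0.5 * (coeffs + coeffs_low)   # equal weights are an assumption
    rho_high = np.fft.irfft(averaged, n=n_grid)
    return modify(rho_high, solvent_mask, target_high)

n_grid, h_low = 128, 10
x = np.arange(n_grid)
solvent_mask = x >= 80

def gaussians(centres):
    return sum(np.exp(-0.5 * ((x - c) / 2.0) ** 2) for c in centres)

# A made-up reference density stands in for the predicted histograms.
reference = gaussians([10, 25, 50, 66])
ref_low = np.fft.irfft(truncate(np.fft.rfft(reference), h_low), n=n_grid)
target_low, target_high = ref_low[~solvent_mask], reference[~solvent_mask]

# Starting coefficients: true magnitudes with random phase errors.
rng = np.random.default_rng(2)
f_true = np.fft.rfft(gaussians([15, 30, 42, 60]))
coeffs = np.abs(f_true) * np.exp(
    1j * (np.angle(f_true) + rng.normal(0.0, 0.6, f_true.size)))

rho_out = multi_resolution_cycle(coeffs, n_grid, h_low, solvent_mask,
                                 target_low, target_high)
```

The low resolution pass contributes through the averaged coefficients, so the final map is constrained by the histograms at both resolutions rather than by the high resolution histogram alone.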

3. Mask refinement

`dm' contains facilities for the automatic calculation of both solvent and averaging masks. Earlier versions allowed the user either to provide their own mask, which was fixed for the whole calculation, or to let the program calculate its own mask. In the latter case, the solvent mask would be refined at each stage, but the averaging mask would remain fixed.

In version 2.0, complete control is provided over mask calculation. Input masks may be provided for the initial stages, and then recalculated as the calculation proceeds. Alternatively, auto-masks may be calculated as often or as rarely as required.

4. HTML log files

From version 2.0 the `dm' log-file is output in HTML format, to be read in a standard web browser. A contents section at the top of the log-file provides links to important parts of the calculation. The command file is linked directly back into the documentation. Extensive commentary is provided on the user command input to help detect errors and improper use of the program. A section from the log-file is shown in Figure 1.

dm version 2.0.0

dm reference:

K. Cowtan (1994), dm: An automated procedure for phase improvement by density modification. Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography, 31, p34-38.

Command input
Comments
MTZ input
Data Checking
Data Scaling
Solvent Mask
First Cycle
Output

Command Input

SOLCONT  0.45
MODE     SOLV HIST
COMBINE  PERTURBATION
NCYCLE   3
LABIN    FP=FP SIGFP=SIGFP PHIO=PHIB FOMO=FOM
LABOUT   PHIDM=PHIDM2 FOMDM=FOMDM2

Comments

Density modifications selected:

Solvent flattening
Histogram matching

Number of cycles

You have specified a fixed number of cycles. This is a good choice if you have strong data or averaging, but beware that after many cycles the FOMs will be overestimated.

Figure 1

One innovation is the use of a Java applet, JLogGraph, to display log-graphs in-line in the HTML document. This saves starting up the Loggraph utility, although this may still be used for advanced formatting or printing. An (inactive) image of a JLogGraph is shown in Figure 2.

Figure 2

The HTML formatting has been implemented through a set of library routines, so that similar features may easily be added to other programs.

References

Abrahams J. P. (1997) Bias reduction in phase refinement by modified interference functions: Introducing the gamma correction. Acta Cryst., D53, 371-376.
Cowtan K. D., Main P. (1996) Phase combination and cross validation in iterated density modification calculations. Acta Cryst., D52, 43-48.
Read R. J. (1986) Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Cryst., A42, 140-149.
