------------ CCP4 Newsletter - June 1996 ------------
MRC Laboratory of Molecular Biology,
Hills Road,
Cambridge,
CB2 2QH, U.K.
E-mail: jpa@mrc-lmb.cam.ac.uk
The structure determination of F1 ATPase prompted the development of several new procedures for electron density modification [1]. This resulted in a program, called "Solomon", which has allowed the solution of at least 20 different structures over the last 18 months. The main purpose of this short notice is to draw attention to its inclusion in the CCP4 suite, and to give a brief overview of some of the concepts which, until recently, were unique to Solomon.
However, usually more information is available. Rather than being information about individual structure factors, as measured from experimental differences in intensity, this information pertains to the way the structure factors interact with one another, and can be formulated as a set of constraints in real space. Solomon imposes constraints on solvent flatness, non-crystallographic symmetry, and the density distribution within the ordered parts of the crystal.
In practice, one first calculates a map using the "best" phases and figure-of-merit weighted structure factor amplitudes. Then one modifies the resulting map according to real-space constraints, and from this map calculates new, modified structure factor amplitudes and phases. The extra information contained within these modified structure factors is then combined with the original phase probability distribution to produce better estimates. The process can be iterated. To summarise, the advised procedure in conjunction with Solomon is:
One should realise that the solvent will have a mean electron density which is very similar to that of the protein, and that solvent and protein are mainly distinguished by the relative featurelessness of the former, and the undulating landscape of the latter. Solomon is unique in its way of locating the solvent by determining a new map, in which at every grid point the local standard deviation of the original map is stored. The user will have to specify the radius within which the local standard deviation is determined, and it was found that a radius slightly larger than the maximum resolution of the map is optimal in most cases. After calculating such a map, Solomon suggests a contour level which will show the protein mask, given a certain solvent content, allowing inspection on a graphics workstation. Solomon uses this map to construct a solvent mask by excluding small islands of protein. If requested, this mask can be stored as an old-style "O" [4] or CCP4 mask and manipulated as any other mask. It is also possible to use solvent masks which were generated in other ways, or were edited by the user.
The solvent masks determined from the local standard deviation of the map have a higher resolution than masks determined by the method suggested by Wang [5]. The higher accuracy of the masks was found to be beneficial.
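The idea of locating the solvent through the local standard deviation can be illustrated with a small sketch. It is not Solomon's implementation: the function names, the nested-list map representation, the radius in grid units, and the periodic wrapping at the cell edges are all assumptions made for illustration, and the removal of small protein islands is left out.

```python
import math
from itertools import product

def local_std_map(rho, radius):
    """Local standard deviation of a periodic 3D map (nested lists).

    `radius` is given in grid units (an assumption of this sketch).
    """
    nx, ny, nz = len(rho), len(rho[0]), len(rho[0][0])
    r = int(math.floor(radius))
    # Offsets of all grid points that fall inside the sphere.
    offsets = [(i, j, k) for i, j, k in product(range(-r, r + 1), repeat=3)
               if i * i + j * j + k * k <= radius * radius]
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for x, y, z in product(range(nx), range(ny), range(nz)):
        vals = [rho[(x + i) % nx][(y + j) % ny][(z + k) % nz]
                for i, j, k in offsets]
        mean = sum(vals) / len(vals)
        out[x][y][z] = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return out

def solvent_mask(std_map, solvent_fraction):
    """Flag the flattest `solvent_fraction` of grid points as solvent (True)."""
    flat = sorted(v for plane in std_map for row in plane for v in row)
    n_solvent = int(round(solvent_fraction * len(flat)))
    if n_solvent == 0:
        return [[[False for _ in row] for row in plane] for plane in std_map]
    # Contour level below which a point counts as solvent; ties at the
    # cutoff are all included.
    cutoff = flat[n_solvent - 1]
    return [[[v <= cutoff for v in row] for row in plane] for plane in std_map]
```

The cutoff chosen in `solvent_mask` plays the role of the contour level that Solomon suggests for a given solvent content.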
After locating the solvent, it can be modified. In conventional density modification, the density at every grid point of the solvent is replaced by the mean density of the solvent, but Solomon also allows different types of modification. The density within the solvent can be scaled as follows:
It is evident that setting the solvent multiplier kflip to zero is equivalent to flattening the solvent. Setting it to a negative value will "flip" features within the solvent, and it turns out that doing so is desirable in many cases. The constant sadd can be used to reconstruct low resolution features of the electron density, and by setting it to a negative value, the density of the protein can be "lifted" slightly above that of the solvent. This feature is used in conjunction with protein density truncation and structure factor reconstruction (see below).
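A minimal sketch of this solvent modification follows. It assumes the scaling takes the form rho' = <rho_solv> + kflip * (rho - <rho_solv>) + sadd, which is consistent with the behaviour described above (kflip = 0 flattens, negative kflip flips, negative sadd lowers the solvent relative to the protein) but should be read as an assumption rather than a quotation of equation (1). The function name and the flat-list representation of the map and mask are likewise illustrative.

```python
def modify_solvent(rho, is_solvent, kflip, sadd=0.0):
    """Return a new map in which solvent density is rescaled about its mean.

    Assumed form: rho' = mean_solvent + kflip * (rho - mean_solvent) + sadd.
    Grid points outside the solvent mask are returned unchanged.
    """
    solvent = [v for v, s in zip(rho, is_solvent) if s]
    mean = sum(solvent) / len(solvent)
    return [mean + kflip * (v - mean) + sadd if s else v
            for v, s in zip(rho, is_solvent)]
```

With `kflip=0.0` every solvent point takes the mean solvent density (conventional flattening); with `kflip=-1.0` each solvent feature is mirrored about that mean.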
The main benefit of flipping solvent features becomes apparent upon combining the modified structure factors with the experimental phase probability distribution. The intricacies of the recombination are beyond the scope of this short report, but an attempt will be made to give the reader a flavour of the sort of difficulties associated with this computation. It can be shown that, provided the sources of information are independent, the optimal way of combining the information from model structure factor amplitudes with experimentally determined ones is through sigmaA-weighting [6]. In the case of density modification, the information carried by the modified structure factors is not strictly independent of the experimental data, but with current methodology it is not possible to calculate how dependent the information actually is. In fact, the degree of independence will vary from structure factor to structure factor and will also depend crucially on the restraints imposed in real space. As a result of treating the sources of information as independent, the recombined data will be biased. By flipping the solvent instead of flattening it, the modified structure factor amplitudes are made more different from the original ones, and therefore they will appear to be more independent. Any (accidental) improvement of the phases will result in a more featureless solvent, and in the next iteration there will be less density to flip. As a result, the solvent does get flatter as the flipping procedure is iterated, not so much because the solvent is biased to be flat, but rather because of other phase improvements. Solvent flattening cannot be iterated in a similar fashion, but it is entirely possible (and desirable) to flatten the solvent on the very last cycle of the solvent flipping procedure: the bias introduced at this point will not be propagated.
It was found that the reduction in the R-factor between the experimental structure factor amplitudes and the modified structure factor amplitudes on the very last flattening cycle is an accurate indicator of the overall phase improvement.
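For monitoring purposes, such an R-factor can be computed in a few lines. The sketch below assumes the conventional definition R = sum|Fexp - k * Fmod| / sum Fexp with a single overall scale k = sum Fexp / sum Fmod. The function name, the choice of scale, and the use of plain lists are illustrative assumptions, not a description of what Solomon computes internally.

```python
def r_factor(f_exp, f_mod):
    """Conventional R-factor between two lists of amplitudes.

    A single overall scale factor is applied to f_mod (assumed here;
    resolution-dependent scaling is not attempted).
    """
    scale = sum(f_exp) / sum(f_mod)
    return sum(abs(e - scale * m) for e, m in zip(f_exp, f_mod)) / sum(f_exp)
```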
There is a relationship between the solvent content of the crystal and the optimal value for kflip. The higher the solvent content, the less negative kflip should be. If the solvent content is about 30%, kflip should be set to between -1.8 and -1.6; if the solvent content is about 50%, the optimum value is about -1; and if the solvent content is 70-75%, a kflip of zero seems to be optimal. The amount of averaging also influences the optimal value of kflip: with two- or threefold averaging, one should set it to 60% to 80% of the value one would choose in the absence of non-crystallographic averaging. Sixfold or higher non-crystallographic symmetry averaging is incompatible with solvent flipping and requires a kflip of zero.
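These guidelines can be turned into a rough starting value. The helper below is hypothetical: the linear interpolation between the quoted solvent contents, the use of -1.7 as the midpoint of the 30% range, the default factor of 0.7 for two- or threefold averaging, and the refusal to guess for four- or fivefold averaging (which the guidelines do not cover) are all assumptions of this sketch.

```python
def suggest_kflip(solvent_fraction, ncs_fold=1, ncs_factor=0.7):
    """Rough starting value for kflip from solvent content and NCS order."""
    # (solvent fraction, kflip) pairs taken from the guidelines above.
    anchors = [(0.30, -1.7), (0.50, -1.0), (0.70, 0.0)]
    if ncs_fold >= 6:
        return 0.0  # flipping is incompatible with high-order averaging
    if ncs_fold in (4, 5):
        raise ValueError("no guideline is given for 4- or 5-fold averaging")
    # Clamp to the range covered by the guidelines, then interpolate
    # linearly (the interpolation itself is an assumption).
    s = min(max(solvent_fraction, anchors[0][0]), anchors[-1][0])
    for (s0, k0), (s1, k1) in zip(anchors, anchors[1:]):
        if s <= s1:
            k = k0 + (k1 - k0) * (s - s0) / (s1 - s0)
            break
    return k * ncs_factor if ncs_fold in (2, 3) else k
```

A value obtained this way is only a first guess; the optimum still has to be found by trial.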
In density truncation, grid points within the protein region which have a density below a certain specified threshold are assigned a density equal to this threshold [7]. The result of density truncation of the protein region is that features of high density become sharpened relative to features of lower density. As such it is comparable to histogram matching techniques. However, the sharpening resulting from truncation seems to be beneficial even at resolutions at which the modification resulting from histogram matching is virtually non-existent. Because of this, density truncation was preferred over histogram matching. Other protocols for sharpening protein features were explored, but none of them was very successful, with one exception: in some cases it is better to set the density of truncated grid points to the mean density of all truncated grid points, rather than to the threshold density.
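Both truncation variants described above can be sketched as follows. The function name, the `use_mean` switch, and the flat-list representation of the map and protein mask are assumptions made for illustration and do not correspond to Solomon keywords.

```python
def truncate_protein(rho, is_protein, threshold, use_mean=False):
    """Truncate low density inside the protein region.

    Protein grid points below `threshold` are set to the threshold, or,
    if `use_mean` is True, to the mean density of all truncated points.
    Points outside the protein mask are returned unchanged.
    """
    low = [v for v, p in zip(rho, is_protein) if p and v < threshold]
    fill = sum(low) / len(low) if (use_mean and low) else threshold
    return [fill if (p and v < threshold) else v
            for v, p in zip(rho, is_protein)]
```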
As a result of truncating the density within the protein region, the overall variance of the electron density of the protein region decreases relative to that of the solvent region. This means that the density of the solvent region has to be scaled down, since it was found desirable to maintain a constant ratio between the two. Solomon does this automatically if requested. Another result of truncation is that the mean density within the protein region will increase relative to the mean density of the solvent region. Since this was found to be undesirable when reconstructing missing structure factors, it can be corrected for by assigning a value to sadd in equation (1).