


Recent ccp4bb discussions

Martyn Winn

Daresbury Laboratory,
Daresbury,
Warrington
WA4 4AD, U.K.
m.d.winn@dl.ac.uk


This article is an attempt to summarise some of the discussions recently held on the CCP4 Bulletin Board. Many of these discussions reflect common concerns of protein crystallographers, and should be of general interest. For each subject below, the original question is given in italics, followed by my summary of the responses sent to ccp4bb (together with some additional material). For the sake of clarity and brevity, I have paraphrased the responses, and all inaccuracies are therefore mine. To avoid misrepresenting people's opinions or causing embarrassment, I have not identified anyone involved: those who are interested in the full discussion can view the original messages (see the CCP4 web pages for details of how to do this).

These summaries are not complete, since many responses go directly to the person asking the question. While we understand the reasons for this, we would encourage people to share their knowledge on ccp4bb, and would also be happy to see summaries produced by the original questioner. While ccp4bb is obviously alive and well, we think there is still some way to go before the level of traffic becomes inconvenient.


Bulk solvent correction in REFMAC and CNS

(May and July 1999)
We started our initial refinement using CNS (ver. 0.5). Bulk solvent correction was applied for the resolution range 15.0 to 1.9 A. The values of R and R-free were very reasonable (R=21.0%; R-free=24.5%).

[With REFMAC] ... the 'bulk solvent correction' results were not as good as with CNS. The R-free value is also higher (26.4%), and the difference between R and R-free is also larger when we use REFMAC.

May I know why this difference occurs with REFMAC? Would it be a good idea to use SCALE TYPE BULK, taking FPART and PHIPART from the CNS results?

CNS makes a bulk solvent correction by masking the protein, filling the remaining bulk solvent region with a constant electron density, and then making an FFT (with a B-factor applied) to generate FPART and PHIPART for the solvent. In contrast, REFMAC uses the exponential scaling model, based on Babinet's principle, which states that the Fourier transform of the solvent mask is related to the Fourier transform of the protein mask by a 180 deg phase shift. In the limit of low resolution, this implies that the FT of the solvent region is directly proportional to the (known) FT of the protein region, apart from a 180 deg phase shift. However, the bulk solvent often makes a significant contribution up to 4 Angstrom or so, and Babinet's principle is not applicable at this resolution (an extra B-factor is used to down-weight the solvent contribution in this region).
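Schematically, such a Babinet-based model scales the calculated structure factors as

  |Ftotal| = |Fprot| * (1 - ksol * exp(-Bsol * s^2 / 4)),   s = 2 sin(theta)/lambda,

where ksol and Bsol are refinable solvent parameters. This is the generic form of the model, and the exact parameterisation inside REFMAC may differ; the point is that the solvent term cancels part of the protein contribution at low resolution and decays away towards higher resolution.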

Hence, the bulk solvent correction in REFMAC is not always very accurate. Users can check the graph of <Fobs> against <Fc> to see if there is a problem. If there is, then it may be better to include the CNS bulk solvent correction in REFMAC. This is done by assigning FPART and PHIPART on the LABIN line, in which case SCALE TYPE SIMPLE should be specified. However, FPART and PHIPART will need to be recalculated whenever there is a significant change in the model.
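As a concrete sketch, assuming the CNS-derived solvent contribution has been imported into the input mtz file as columns FPART and PHIPART (the file and column names here are illustrative, and the LABIN assignments should be checked against the documentation for your version of REFMAC), the run might look like:


# Sketch only: supply the CNS bulk solvent terms as a fixed partial
# contribution, and use simple scaling rather than Babinet scaling.
refmac hklin in.mtz xyzin in.pdb \
    hklout out.mtz xyzout out.pdb <<eof
LABIN FP=FP SIGFP=SIGFP FREE=FreeR_flag FPART=FPART PHIPART=PHIPART
SCALE TYPE SIMPLE
END
eof

Remember that these columns then represent a static solvent model: as noted above, they should be regenerated in CNS after any significant change to the model.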

See, for example, Dirk Kostrewa, CCP4 Newsletter 34, September 1997.

N.B. An improved bulk solvent correction will be included in a future version of REFMAC.


Cheshire cell in AMORE

(May 1999)
It is mentioned in the AMoRe manual that the asymmetric unit that needs to be searched for the translation function (1-body) is the Cheshire cell, while for n-body translation it is the whole cell.

Can someone please enlighten me as to what constitutes the Cheshire cell?

In terms of the translation function for molecular replacement, the Cheshire cell is the minimum volume which will allow a unique solution. For the first molecule, it is the cell which covers the volume from one possible origin to the next, e.g. for P212121 the Cheshire cell is 0-0.5, 0-0.5, 0-0.5. For P1, the Cheshire cell consists of a single point (i.e. all choices of origin are equivalent), which means that in P1 the translation function for 1 mol./a.u. is already solved before you start! If you are searching for the NMOLth molecule of a set, the Cheshire cell becomes the whole primitive volume: you have fixed the origin by choosing the position of the first molecule, and the remaining molecules have to be positioned relative to that choice.

A table of the Cheshire cells for all 230 space groups is given in Hirshfeld (1968), Acta Cryst. A24, 301-311.


How to choose same test set for related data sets

(May 1999)
... our situation is that we have several data sets with different ligands bound to the same protein, all in the same spacegroup and with essentially the same unit cell. After solving one of these structures by molecular replacement, we intend to use that model as the starting model for the rest of the datasets (and then, of course, look for the ligand in each). My understanding is that we should choose the same reflections to be the test set for all datasets, in order to maintain true cross-validation in the later refinements.

My question is: how do we go about ensuring that the same reflections are chosen for the test set in all cases? We are processing the data with Mosflm/Scala, and doing refinement with X-PLOR, so a strategy that works either with mtz files or X-PLOR cv files would be fine.

CAD can be used to transfer a single test set between datasets, e.g.


cad hklin1 in.mtz hklin2 old.mtz \
    hklout out.mtz <<eof
LABI FILE 1 ALL
LABI FILE 2 E1=FreeR_flag
END
eof

transfers the reference free-R column from the file old.mtz to the data for the current data set in.mtz, writing the combined data to out.mtz. uniqueify with the -f option can be used to complete the test set to higher resolution if necessary. One suggestion was to create a dummy data set to really high resolution, assign free-R flags, and then use that as a reference for all subsequent data. This way you don't have to worry about what to do if you collect a higher-resolution data set than your chosen reference set.
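For example, to complete an existing test set (a sketch; the exact option syntax should be checked against the uniqueify documentation in your CCP4 version):


# Complete the existing FreeR_flag column of out.mtz, assigning
# flags at random to any reflections that do not yet have one:
uniqueify -f FreeR_flag out.mtz out_free.mtz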

One word of caution: when there is a significant change in cell dimensions, and/or your molecules roll around in the various forms (as you often get when co-crystallising, changing from room temperature to cryo, etc.), then your free-R reflections won't really be free any more.


Applying NCS restraints to B factors

(June 1999)
I am refining a structure with six molecules in the asymmetric unit. The data set is only to 2.8 A resolution. I thought I was done with it, having R-free at about 23% and R-work at about 18%. But when I checked average B-factors, here is what I saw:

Chain name     Atoms       Bave       Bsdv       Bmin       Bmax
     M1A        2894      33.57      18.77       2.00     100.00
     M1B        2894      42.50      19.15       2.00     100.00
     M2A        2894   ** 58.96 **   24.67       4.87     100.00
     M2B        2894   ** 56.48 **   25.61       2.00     100.00
     M3A        2894      34.52      21.01       2.00     100.00
     M3B        2894      40.98      21.16       2.00     100.00
     WAT          80      31.76      12.33       5.44      69.18

Two copies of my molecule have alarmingly high average B-factors.

Refinement has been done primarily in X-PLOR, with one round of refinement in CNS. Strong NCS restraints have been used throughout.

How should I check whether these high B-factors indicate a problem with chains M2A and M2B, or whether they simply have high B-factors? Would rigid-body refinement be appropriate? Or annealed omit maps? Or should I remove the NCS restraints on those two copies?

Respondents supported the use of strong NCS restraints on B factors in general, but accepted that there were some genuine cases where NCS didn't apply. Some reported similar cases where strict NCS wasn't appropriate, whereas one person felt that people were too ready to give up on NCS and that it usually gave better results. It clearly depends on the specific case in question.

Many felt that having uniformly higher B factors for a chain did indicate a problem, probably inappropriate NCS, although possibly something more severe such as mistracing. However, if the higher B values were confined to loop regions, then strict NCS may still be valid, with only local deviations. Other suggestions of things to check included inspecting the quality of the fit to the electron density, and comparing the refinement to one without NCS restraints.


Superpose and rotation angle calculation

(July 1999)
1. What is the easiest way to superimpose two structures and get the rotation and translation vectors?

2. Is there a program that can calculate the rotation angle between two domains in a protein?

Suggestions for the first question included LSQKAB from the CCP4 suite, LSQMAN (part of the DEJAVU package) run via the O macro align2.omac, the pdbfit routine in XtalView, and the programs TOP and MAPS from Guoguang Lu (the former is now in the CCP4 suite as the program TOPP). In O, two structures can be superimposed using lsq_explicit with 3-4 equivalent residues in each molecule, after which lsq_improve is used to improve the fit over all CA atoms. You can then retrieve the rotation-translation matrix with


  write .LSQ_RT_foo ; ;

if you named your alignment "foo".
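
For the LSQKAB route, a minimal script might look like the following (the residue range and chain identifier are purely illustrative; the rotation matrix and translation vector are reported in the log file):


# Superimpose moving.pdb onto fixed.pdb using the CA atoms of
# residues 1-100 of chain A (illustrative selection only):
lsqkab xyzinf fixed.pdb xyzinm moving.pdb xyzout fitted.pdb <<eof
FIT RESIDUE CA 1 TO 100 CHAIN A
MATCH 1 TO 100 CHAIN A
OUTPUT XYZ
END
eof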

Regarding the second question, if the two domains are structurally very similar, then the situation is the same as for question one. If they are not, then the answer depends on how the angles are defined, e.g. as the angles between the principal axes of inertia.

