Recent CCP4BB Discussions

Maria Turkenburg (mgwt@yorvic.york.ac.uk)
March 2000

Back by popular demand

In the October 1999 Newsletter, Martyn Winn started what will, hopefully, become a trend in keeping track of interesting discussions on the CCP4BB. To make things much easier for both the users of the bulletin board and the track-keeper, members who ask questions or instigate discussions on the board are now asked (urged!) to post a summary of all the reactions received, whether on or off the board.

The introduction to the October 1999 version also goes for this article:

For each subject below, the original question is given in italics, followed by a summary of the responses sent to CCP4BB (together with some additional material). For the sake of clarity and brevity, I have paraphrased the responses, and all inaccuracies are therefore mine. To avoid misrepresenting people's opinions or causing embarrassment, I have not identified anyone involved: those that are interested in the full discussion can view the original messages (see the CCP4BB web pages on how to do this).

These summaries are not complete, since many responses go directly to the person asking the question. While we understand the reasons for this, we would encourage people to share their knowledge on CCP4BB, and also would be happy to see summaries produced by the original questioner. While CCP4BB is obviously alive and well, we think there is still some way to go before the level of traffic becomes inconvenient.

We, or at least I, would like to thank Sasha Urzhumtsev and Xavier Gomis Rueth for their various attempts to get the summarising off the ground. It seems to have paid off. And thanks to all the users who are now dutifully posting summaries. Finally I would like to thank Eleanor Dodson for her corrections and additions.

Subjects covered in this newsletter's offering are:

'dm'

Averaging

Masking

Twinning

Playing with Symmetry

Local Scaling in P1

CAD in R32

Unexpected and artificial values

Temperature Factors and the Wilson Plot

Anisotropic Scaling to get at 'true' Bfactors

Artificial Bfactors for NMR Structures

What to Expect of Correlation Coefficients in AMoRe

Predicting the Number of Reflections

FOMs in MLPHARE for MAD

REFMAC for Partial Poly-Ala

Various

AMoRe and CCP4 Asymmetric Unit

Distinguish Calcium from Magnesium

Predicted Rfree-R Difference

Packing Density

f' and f''

'dm'

(October and November 1999)

Averaging

For a case of complicated NCS-symmetry, namely six copies of a protomer grouped as three dimers A (consisting of A1 and A2), B and C, the following symmetry operators are known:
inter-dimer: A -> B, B -> C
intra-dimer: A1 -> A2 (from subunit 1 of dimer A to subunit 2 of dimer A), B1 -> B2, C1 -> C2
where the operators consist of a 3x3 matrix plus a vector. The question is: is this a representative set of operators for successful averaging?

The following fundamental principle should be applied:

YOU ONLY NEED ONE MASK FOR 'dm'.
IF YOU HAVE MORE THAN ONE MASK YOU ARE DOING SOMETHING WRONG (unless you are an expert and are solving a very specific problem, in which case you know all about it). What you need to do is supply one mask, which covers A1, and the operators which map A1-A1 (i.e. the identity), A1-A2, A1-B1, A1-B2, A1-C1, A1-C2.

If your dimer rotations are self-inverse, then your mask could equally well cover A instead of A1. If all six NCS operators form a closed group, then you could use a hexamer mask.

: How to generate the symmetry operator A1-B2 if you just know the operators A-B and B1-B2?

(A1-B2) = (B1-B2)x(A-B), i.e. multiply the matrices, and take care with the order: the operator that is applied first stands on the right. As all symmetry operators X consist of a 3x3 matrix M AND a vector t, for the multiplication you have to multiply the augmented 4x4 matrices:

             (m11 m12 m13  t1)
X= M + t =>  (m21 m22 m23  t2)
             (m31 m32 m33  t3)
             ( 0   0   0    1)

                                       (m(a1-b1)11 m(a1-b1)12 m(a1-b1)13  t1)
X(b1)= {M(a1-b1) + t(a1-b1)} X(a1) =>  (m(a1-b1)21 m(a1-b1)22 m(a1-b1)23  t2) X(a1)
                                       (m(a1-b1)31 m(a1-b1)32 m(a1-b1)33  t3)
                                       (    0          0          0        1)

                                       (m(b1-b2)11 m(b1-b2)12 m(b1-b2)13  t1)
X(b2)= {M(b1-b2) + t(b1-b2)} X(b1) =>  (m(b1-b2)21 m(b1-b2)22 m(b1-b2)23  t2) X(b1)
                                       (m(b1-b2)31 m(b1-b2)32 m(b1-b2)33  t3)
                                       (    0          0          0        1)

so 

X(b2)= {M(b1-b2) + t(b1-b2)} {M(a1-b1) + t(a1-b1)} X(a1)

But it is much easier to use LSQKAB (please use your local version of the CCP4 Program Documentation to view this) to fit the coordinates of A1 onto those of B2, etc.
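
The composition of operators described above can also be checked numerically. The following is a minimal sketch in plain Python, not part of any CCP4 program; the two operators are invented purely for illustration, and the helper names `augment`, `compose` and `apply_op` are hypothetical.

```python
def augment(m, t):
    """Build the augmented 4x4 matrix from a 3x3 matrix m and a vector t."""
    return [list(m[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def compose(second, first):
    """Return the 4x4 operator that applies `first` and then `second`."""
    return [[sum(second[i][k] * first[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply_op(op, xyz):
    """Apply an augmented 4x4 operator to a point (x, y, z)."""
    x = list(xyz) + [1.0]
    return tuple(sum(op[i][k] * x[k] for k in range(4)) for i in range(3))


# Invented operators, for illustration only (not from a real structure).
# A -> B: 90 degree rotation about z, then a shift of (10, 0, 0)
a_to_b = augment([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                 [10.0, 0.0, 0.0])
# B1 -> B2: two-fold rotation about x, then a shift of (0, 5, 0)
b1_to_b2 = augment([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]],
                   [0.0, 5.0, 0.0])

# (A1-B2) = (B1-B2) x (A-B): the operator applied first stands on the right
a1_to_b2 = compose(b1_to_b2, a_to_b)

point_a1 = (1.0, 2.0, 3.0)
step_by_step = apply_op(b1_to_b2, apply_op(a_to_b, point_a1))
in_one_go = apply_op(a1_to_b2, point_a1)
print(step_by_step, in_one_go)
```

Calling `compose(a_to_b, b1_to_b2)` instead gives a different operator, which is why the order of multiplication matters.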

Masking

: Is there a way to harness the averaging to generate a better map from which to derive the solvent mask; effectively a single run of maskless averaging?

A dimer mask (proper NCS) made using MAPROT and MAPMASK turned out dismal. On the other hand, the one from 'dm', assumed to be calculated on the unaveraged map, is much better. An NCS mask can be simply made by putting an atom of 40Å radius at the centroid of the heavy atom position. In tests this is about as good as a monomer mask.

This stimulated a lively discussion on the use of the phrase "maskless averaging", after which the question was rephrased as: How to improve the original 'dm' solvent mask?

There are several ways of using averaging and the answers need to be summarised differently for each application. It is assumed you know the NCS operators which map the "master" molecule onto all other copies in the asymmetric unit.

To average an existing map using known NCS operators - this is Eleanor's definition of maskless averaging.

If you are only interested in averaging a MAP there is no reason, except for speed, to use a mask. Those parts of the map which obey the NCS symmetry will be improved by averaging, whilst those parts which do not, will deteriorate.

If the NCS operators are "proper" or "form a closed group" such that there is an NCS-defined pure rotation, the whole complex will be improved. The density for molecule 1 will be overlapped and summed with that of molecule 2, that of molecule 2 overlapped with that of molecule 3, until the complete rotation is done.

If the NCS operators are "improper", only the master molecule will be improved.

Hopefully this will also generate a better molecular boundary and thus a better initial solvent flattening mask.
The NCS operators can be refined to improve the correlation between different copies of the electron density. This refinement will be done using density within some masked region but this need not cover the whole molecule. This restricted mask can be obtained in various ways. The simplest is to put a single "atom" at the centre of the molecule, then generate a mask around that "atom" assigning it a large VDW radius.

The centre of mass can be decided either by inspecting the map, or possibly, if the NCS was determined by fitting heavy atoms within each molecule, it could be the centre of the heavy atom cluster. If the map is very noisy, it may be sensible to only use the strong density to refine the NCS operators.
However, once the NCS operators have been refined, it is sensible to use density modification to improve the phases. This is done in the following way:
1. The maps are averaged over the volume of a single molecule; i.e. a mask is needed. If no mask is supplied the program 'dm' endeavours to determine one, but this procedure may not be as effective as can be determined from visual inspection of an averaged map.
2. The asymmetric unit of the crystal is reconstructed with the improved density in place (and flattened density everywhere outside the mask).
3. Structure factors are calculated by transforming the density to give modified phases.
4. The modified phases are combined in some way with the starting set (this step will be seriously hampered if the low resolution data are missing).
And the whole process cycles round, with the option of phase extension as well as phase refinement.

N.B.

New versions are available of the map manipulation utilities MAPMASK and MAPROT. The changes dramatically simplify the process of cutting out a region of density for use as a molecular replacement search model, e.g. for locating NCS or multi-crystal averaging operators.

As a result, map cutting may be performed by supplying MAPROT with a work map and mask (WRKIN+MSKIN), leaving the unit-cell map (MAPIN) blank, and giving a single operator (which may be unitary) along with GRID and CELL for the cutout map. The cut density is written to the CUTOUT file.

The new maputils are available by ftp as follows:

> ftp ftp.yorvic.york.ac.uk
login: anonymous
password: type_your_full_email_address_here
ftp > cd pub/ccp4
ftp > get maputils.tar.gz
ftp > quit
> gunzip maputils.tar.gz
> tar xvf maputils.tar
> cd maputils/maprot_
> makemaprot
> cd ../mapmask__
> makemapmask

Twinning

(January 2000)

What conditions have people successfully used to overcome twinning in crystallization?

What MIR structures have been solved using (de)twinned data (besides cephalosporin synthase)?

Have any structures been solved by MAD using detwinned data?

Has anybody experience in detwinning pseudomerohedral twinned data?

Thanks to the original enquirer for the summary:

Conditions to prevent twinning
Many people mentioned the use of organics in small quantities (0.1-2.0%), particularly dioxane (0.5 +/- 0.4% ; 0.1-0.5%).
Also ran: acetone, DMSO, PEG200, ethanol, n-butanol, glycerol, DMF or b-octylglucoside (5-15mM), MPD (1-3%).

Otherwise change anything:
- pH (~0.5 unit, or less; cf. Wim Hol's subtilisin: 0.1 pH unit)
- temperature (Frazao, Acta Cryst. D55 (1999), 1465)
- additives/detergents
- ligands
- crystallization method, etc.

And improve the protein!

Twinned MIR structures
Best answer: search for low twin fraction. However, some structures were solved with proper detwinning (still, no higher than 25-30%?):
- Goldman et al., muconate lactonising enzyme, JMB (1985)
- Hillig, R.C., Renault, L., Vetter, I.R., Drell, T., Wittinghofer, A., and Becker, J. (1999). The crystal structure of rna1p: A new fold for a GTPase-activating protein. Molecular Cell. 3, 781-791
  Using a program specifically written for tetragonal spacegroups.
- Forst, D., Welte, W., Wacker, T., and Diederichs, K. (1998). Structure of the sucrose-specific porin ScrY from Salmonella typhimurium and its complex with sucrose. Nature Structural Biology 5, 37-46
- Wang et al., in progress (MIT), using DETWIN and SOLVE

Detwinning for MAD has not yet been implemented in most software and is not exactly trivial. MAD has sometimes been successful when the twinning was simply ignored, provided the twin was only a minor fraction (3-10%). The non-triviality was posed as follows:

Theoretically this can be done. However, when the data is detwinned
you must take precautions that the twin operator deconvolutes I+ to I+
and I- to I-. Otherwise, the detwinning loses your anomalous signal.
As I understand, the CCP4 DETWIN program does not support detwinning of
anomalous data.  

That brings up the next point. For the most part a SeMet MAD experiment
is a low signal experiment (I am assuming you are talking about a
SeMet MAD experiment). Detwinning, while helpful, is not perfect and will
remove some if not all of the signal. Also, with the presence of a twin
operator, you will have two solutions related to each other by the twin
operator. So, finding the consistent solution with its cross peaks
becomes a major undertaking.

In my opinion, to solve a MAD twin problem you will need to have a very
high signal MAD experiment with few sites.

Nice but very complicated example from Eleanor Dodson, in P21 with a* = nc*.

Finally lots of referrals to general things:

$CHTML/ pages on twinning in CCP4 (but please use your local version of the CCP4 Program Documentation to view this)
Crystal Twinning Server
SHELXL refinement against twinned data
DETWIN (please use your local version of the CCP4 Program Documentation to view this).
New (CCP4 Suite version 4.0) features: DETWIN now prints twinning tests for a range of twinning fractions.
- The Yeates plot of <H> = <|Itw1 - Itw2|/(Itw1 + Itw2)>.
  The estimate of the twinning fraction is given by 1/2-<H>.
- The Britton plot: essentially the number of negative intensities generated by twinning fractions ranging from 0 to 0.5.
- Moments of <E>**2; these have characteristically different values for twinned and proper data (the same plot is given in TRUNCATE; please use your local version of the CCP4 Program Documentation to view this).
- Correlation coefficients between I1 and I2 after detwinning. Ideally these should be zero, but the correlation coefficient can be distorted when there is NCS aligned with a possible twinning axis.
A specific twinning fraction only needs to be given if you want to write detwinned data to HKLOUT.
Detwinning program of Rams (S. Ramaswamy) in Uppsala, referred to in cephalosporin synthase paper, Nature 394 (1998), 805-809
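
To illustrate the Yeates estimate listed above, here is a small sketch in plain Python. It assumes that H is taken as the absolute difference over the sum of two twin-related acentric intensities, and it uses simulated, error-free intensities drawn from an exponential (acentric Wilson) distribution. The function names are hypothetical and this is not how DETWIN itself is implemented.

```python
import random


def yeates_twin_fraction(pairs):
    """Estimate the twin fraction as 1/2 - <H> from twin-related intensity pairs."""
    h_values = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    mean_h = sum(h_values) / len(h_values)
    return 0.5 - mean_h


def simulate_twinned_pairs(alpha, n, seed=0):
    """Mix pairs of untwinned intensities (J1, J2) with twin fraction alpha."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        j1 = rng.expovariate(1.0)  # assumed acentric Wilson statistics
        j2 = rng.expovariate(1.0)
        i1 = (1.0 - alpha) * j1 + alpha * j2
        i2 = alpha * j1 + (1.0 - alpha) * j2
        pairs.append((i1, i2))
    return pairs


# Simulated data with a known twin fraction of 0.2
estimate = yeates_twin_fraction(simulate_twinned_pairs(alpha=0.2, n=20000))
print(round(estimate, 3))
```

With real data, measurement errors and any NCS close to the twin operator will bias such an estimate, so it is best treated as a rough guide.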

Playing with symmetry

Local Scaling in P1

(January 2000)

In a case where 4 complete 360 degree rotations of data were collected for a high symmetry (spacegroup 96) crystal, scaling all the data together resulted in high Rvalues, presumably due to absorption effects. Since the object was to measure a weak anomalous signal at a relatively long wavelength, an idea would be to take the scaled, unmerged data in P1 and apply a sliding box type of local scaling, averaging all the Bijvoet pairs down to the asymmetric unit in the process. The Matthews & Czerwinski protocol (Local Scaling: A Method to Reduce Systematic Errors in Isomorphous Replacement and Anomalous Scattering Measurements (1975), Acta Cryst A31, 480-487) can not be used, since their derivation assumes a comparison of only two quantities (e.g. F+ and F- or Fnati and Fderi). Here, it would be desirable to average over all 16 of the symmetry-related reflections.
Questions:

Does this sound nuts? It seems to me that if one chooses a sufficiently large box, the simultaneous estimation of even a large number of local scale factors might remain robust.
Can anyone recommend a program to do this? Or toss some snippets of code my way?

Some thoughts on the first, and a firm answer to the second question:

I don't think going to P1 will help you - the improvement in R factors is purely cosmetic. Did you look at Rmerge statistics [Nat Struct Biol 4, 269 (1997)]? These should reveal that the high redundancy actually helps the quality.
I don't quite understand why absorption effects should be so detrimental - I twice collected data at the Fe edge and they scaled beautifully. Could there be a problem with mis-indexing due to origin offset?
If you use XDS/XSCALE you have the option to only use Friedels that are a given max number of frames apart for calculating the anomalous signal.
SCALA (please use your local version of the CCP4 Program Documentation to view this) has a number of useful options to do this. The latest version (ftp://ftp.mrc-lmb.cam.ac.uk/pub/pre/scala_2.7.1.tar.gz or CCP4 release 4.0) has a spherical harmonic scale parameterisation which should work well in this case. It will probably only work with data integrated with MOSFLM, since e.g. SCALEPACK declines to reveal the essential geometrical information in its output.

Then two references to sliding box scaling, the 'original' way:

J Appl Cryst 30: 176 (1997)

Methods Enzymol Vol 276, pp. 461-472.

CAD in R32

(September 1999)

: I am running CAD on a data set in the spacegroup R32 and am wondering why cad does what it does. Input reflections (4 -2 6) and (4 -1 -7) are transformed to (2 2 6) and (3 1 -7), respectively. How are these reflections equivalent? The matrix that seems to perform this transformation, is not a symmetry operator for R32. Nor is the transpose, which would presumably cater for real space. Can anyone tell me what asymmetric unit CCP4 uses for R32? And why?

CAD is correct, and those pairs of reflections are indeed equivalent.

Remember that if you generate a symmetry equivalent position in real space using this expression:

[Xj]     (Sj11 Sj12 Sj13 Tj1)[X1]
[Yj]  =  (Sj21 Sj22 Sj23 Tj2)[Y1]
[Zj]     (Sj31 Sj32 Sj33 Tj3)[Z1]
[1 ]     ( 0    0    0    1 )[1 ]

an equivalent reflection in reciprocal space is generated by:

                        (Sj11 Sj12 Sj13)
[Hj Kj Lj]=  [H1 K1 L1] (Sj21 Sj22 Sj23)
                        (Sj31 Sj32 Sj33)

For R32 a symmetry matrix: -Y,X-Y,Z can be represented as:

[X2]         ( 0 -1  0 0)[X1]
[Y2]      =  ( 1 -1  0 0)[Y1]
[Z2]         ( 0  0  1 0)[Z1]
[1 ]         ( 0  0  0 1)[1 ]

The equivalent reflection is

                        ( 0 -1  0)
[H2 K2 L2]=  [H1 K1 L1] ( 1 -1  0)  = [ K1,-H1-K1,L1]
                        ( 0  0  1)

For R32 the complete set of reciprocal space equivalents are:

[h,k,l] [k,-h-k,l] [-h-k,h,l] [k,h,-l] [-h-k,k,-l] [h,-h-k,-l]
and the equivalent [-h,-k,-l] sets

Symmetry operator 5 gives (4,-2,6) equivalent to (-2,-2,-6) and negating all signs gives (2,2,6).
Symmetry operator 5 gives (4,-1,-7) equivalent to (-3,-1, 7) and negating all signs gives (3,1,-7).
(-h-k) is sometimes written as i, and people refer to the four-index Miller-Bravais symbols (h k (-h-k) l):
(4 -2 6) becomes (4 -2 -2 6); (4 -1 -7) becomes (4 -1 -3 -7).
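
The equivalences quoted above are easy to verify with a few lines of plain Python. This is only a sketch: the list of equivalents is copied from the text (hexagonal axes), and the function names are hypothetical, not part of CAD.

```python
def row_times_matrix(hkl, m):
    """Multiply the row vector (h, k, l) by a 3x3 symmetry matrix."""
    return tuple(sum(hkl[i] * m[i][j] for i in range(3)) for j in range(3))


# Rotation part of the R32 operator -Y, X-Y, Z
MINUS_Y_X_MINUS_Y_Z = ((0, -1, 0), (1, -1, 0), (0, 0, 1))


def r32_equivalents(h, k, l):
    """Return the 12 reciprocal-space equivalents of (h, k, l) in R32."""
    i = -h - k
    rotated = [(h, k, l), (k, i, l), (i, h, l),
               (k, h, -l), (i, k, -l), (h, i, -l)]
    # Add the Friedel mates (-h, -k, -l) of each of the six
    return rotated + [(-a, -b, -c) for a, b, c in rotated]


def are_equivalent(hkl_a, hkl_b):
    """True if hkl_b is among the R32 equivalents of hkl_a."""
    return tuple(hkl_b) in r32_equivalents(*hkl_a)


print(row_times_matrix((4, -2, 6), MINUS_Y_X_MINUS_Y_Z))
print(are_equivalent((4, -2, 6), (2, 2, 6)))
print(are_equivalent((4, -1, -7), (3, 1, -7)))
```

The same pattern works for any spacegroup once its list of reciprocal-space equivalents is written down.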

Unexpected and artificial values

Temperature Factors and the Wilson Plot

(January 2000)

: In a straightforward REFMAC refinement of a structure with MAD phasing, with data good to 2.4Å, TRUNCATEd and UNIQUEified as should be, unexpected Bfactors ranging from 40 to 130 (average 65) appeared. The Wilson B value was 56.
I guess I have no real specific questions except to wonder if this is telling me something about the quality of either my data, my model, or my crystal. I have been unable to find a reference for the acceptable values for a Wilson B factor. Would anyone suspect that I ran TRUNCATE improperly? Any thoughts would be appreciated.

Thanks to the original enquirer for the summary:

: The general consensus was that I have nothing to worry about. Several people pointed out that most temperature factors for protein structures are underestimated leading to the bias that "good" structures should have average temperature factors in the 20-30 range.
A couple of people asked about the solvent content of the crystal. As some of you guessed, this is high. I have a Vm of 4.0 yielding an approximate solvent content of 69 %. This explains some of the thermal motion. It was also suggested to check the native as well as the SeMet data. The Wilson B values are high for all data sets that I have, however the native is the lowest.
I was also advised to examine the Wilson plot to look for typical features of the dip around 5.5Å and the peak around 4Å. In this respect, my Wilson plot looks fine. One last suggestion was to compare the results to refinement with another program. I will give this a try.
A selection of a few comments I received:
As far as I can tell from your post your data seem to be alright. A B of 56 is quite high but your data extend to 2.4Å resolution only, so that is consistent. A model B being slightly higher than that is also ok, since your model is still incomplete. The only thing you need to watch is whether your model B keeps increasing from one round of REFMAC refinement to the next. If it does you should set the overall value back to the 56 before a new round.
I believe that average B factors given for structures are more often too low than too high. If the data peter out at 3Å it is very difficult to get any sensible estimate of a Wilson plot gradient, but it is most likely that a true Av_B should be in the range 80-100, rather than 10 as is sometimes given!
REFMAC values should reflect the gradient of the Wilson plot, and they do seem to do so quite well in your case.
Seems fine to me, the low B factors (~20) were normal when people only worked on crystals that diffracted strongly on weak x-ray sources. Synchrotrons and mirror optics allow the measurement of weaker diffracting crystals (with higher B-factors).
I found that different crystals within one batch might yield quite different B-factors (45-75 avg). But they didn't really make any difference to the biological interpretation.
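
Several of the comments above refer to the gradient of the Wilson plot. As a rough illustration, the sketch below fits a straight line to ln(<I>/Σf²) against (sin θ/λ)² = 1/(4d²), whose slope is -2B. The shell values are synthetic, generated from an assumed B of 56 and an arbitrary scale, and the function name is hypothetical; this is not the procedure used by TRUNCATE.

```python
import math


def wilson_b(shells):
    """Least-squares Wilson B from (d_spacing, mean_I_over_sum_f2) pairs.

    Fits ln(<I>/sum f^2) = ln K - 2 B (sin(theta)/lambda)^2,
    with (sin(theta)/lambda)^2 = 1 / (4 d^2).
    """
    xs = [1.0 / (4.0 * d * d) for d, _ in shells]
    ys = [math.log(ratio) for _, ratio in shells]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return -slope / 2.0


# Synthetic resolution shells (assumed values, for illustration only)
TRUE_B = 56.0
SCALE = 0.01
shells = [(d, SCALE * math.exp(-2.0 * TRUE_B / (4.0 * d * d)))
          for d in (4.0, 3.5, 3.0, 2.7, 2.4)]

fitted_b = wilson_b(shells)
print(round(fitted_b, 2))
```

A real Wilson plot is not a perfect straight line (it shows the dip and peak mentioned above), so in practice the fit is often restricted to the higher-resolution shells.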

Anisotropic Scaling to get at 'true' Bfactors

(September 1999)

: For a 2.3Å structure, using strict NCS and group Bfactors (2 per residue) in XPLOR, with anisotropic scaling, improves the Rfactors but the Bfactors start to behave strangely. They increase, and some main chain atoms have higher Bfactors than the side chain atoms in the same residue. This is found in the Protein Data Bank, too. Am I the only one worried about it?

The general consensus is that striving for low Bfactors is a thing of the past. High Bfactors can be very 'true' Bfactors. Relatively low side chain Bfactors would make sense if the side chains are buried, but connected to an exposed piece of main chain which would then have higher Bfactors. The PDB is not a reliable source of information about Bfactors, since the scaling algorithms in XPLOR inevitably mean that most low resolution structures have totally unrealistic Bfactors. XPLOR scales Fobs rather than Fcalc, which, especially at lower resolution and at the beginning of a refinement procedure, could cause problems. It would be sensible to start refinement not with an average B of 20 (as output by many model building programs), but with an average B that represents the Wilson plot as closely as possible. It would also be sensible to apply Bfactor corrections before each round of refinement.
There may be valid reasons for adjusting the Bfactor of the Fobs, for making maps or Patterson or rotation function searches, sharpening the data to accentuate the high resolution details, but the final atomic Bfactors should be refined against an uncorrected dataset.
With regards to the use of NCS, most people now accept that relaxing it, with care, is good crystallographic practice, and could have a beneficial effect on Bfactors in refinement.

Artificial Bfactors for NMR Structures

(October 1999)

: Can anyone tell me what program I can use to compute artificial temperature factors for a NMR structure based on rmsd of the average structure?

Have a look at: Wilmanns & Nilges, Acta Cryst (1996) D52:973-982. Molecular Replacement with NMR Models Using Distance-Derived Pseudo B Factors. Script available from Jovine Luca.
and also at: Molecular Replacement and NMR models - what gives? and the corresponding script through FTP.

What to Expect of Correlation Coefficients in AMoRe

(September 1999)

: The correlation coefficient in AMoRe's FITFUN behaves unexpectedly in an otherwise fairly straightforward molecular replacement case. The starting cc in FITFUN is much lower than the one that came out of the rotation and translation steps, while there is also an unexpected Bfactor correction. Interestingly, this behaviour is not seen when using E instead of F.
Following a request to do so, more specific information was supplied: the data extend to 4Å only, are very strong to 7Å and trail off quite steeply. Estimating the Bfactor from the Wilson plot is therefore not straightforward, but it was put at approximately 50Å². The average Bfactor of the search model is 44Å².

If you calculate the correlation between Fobs and Fcalc you will always get a positive outcome because they are correlated in the sense that both decrease with resolution. If you use E values you do not see this since E values do not decrease with resolution, as you noted. In your case the effects may have been more severe, perhaps due to too large B-values for the search model?

Predicting the Number of Reflections

(January 2000)

: What is the easiest way to calculate the theoretical number of reflections in a particular resolution shell given that all cell dimensions are known?

Quick and dirty: Get volume of reciprocal lattice unit, and divide volume of spherical shell by this.
More precise: Run UNIQUE to generate a reflection list within the resolution range required and see how many sets are generated.
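
The "quick and dirty" estimate can be written out as follows. This is a sketch in plain Python with hypothetical function names. It relies on the reciprocal unit cell having volume 1/V, and on the assumption that dividing by the number of symmetry-equivalent reflections (Friedel mates included) and by the lattice centring multiplicity gives a fair approximation to the unique count. Reflections on special positions and systematic absences are ignored, so UNIQUE remains the precise route.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic Angstrom; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg
                                 + 2.0 * ca * cb * cg)


def estimate_reflections(cell, d_max, d_min, n_equiv=1, centring=1):
    """Approximate number of unique reflections between d_max and d_min.

    n_equiv:  number of equivalent reflections for a general hkl,
              including Friedel mates (e.g. 2 for P1)
    centring: lattice multiplicity (1 for P, 2 for C or I, 4 for F)
    """
    volume = cell_volume(*cell)
    # Volume of the spherical shell in reciprocal space (radius = 1/d)
    shell = 4.0 / 3.0 * math.pi * (d_min ** -3 - d_max ** -3)
    # Each reciprocal lattice point occupies a volume of 1/V
    return shell * volume / (n_equiv * centring)


# Hypothetical 100 A cubic cell, all data to 2 A, P1 with Friedel mates merged
n_unique = estimate_reflections((100, 100, 100, 90, 90, 90),
                                float("inf"), 2.0, n_equiv=2)
print(round(n_unique))
```

Passing a finite `d_max` restricts the count to a single resolution shell.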

FOMs in MLPHARE for MAD

(September 1999)

: Does anyone have a feel for whether MLPHARE will over estimate the FOM from its MAD analysis? Secondly, if you include the native and three wavelengths, given the sites are all the same XYZ and differ only slightly in AOCC and OCC, how do we judge the FOM?

It most definitely will if you refine the xyz and occupancy for all wavelengths at once. There is an implicit assumption in the algorithm that each "derivative" is independent, which is obviously not true when a "derivative" is taken as another wavelength, with all differences arising from the same atom sites. The more data sets at different wavelengths with the same sites, the fatter the FOMs get.

The best way to minimise this problem is to refine the coordinates and temperature factors against the largest set of dispersive differences for the centric reflections only (assuming you have such reflections). After that refinement fix the xyz and B for all wavelengths, and the ISOE and occupancies for that pair of dispersive differences. Then estimate the ISOE and occupancies for all other pairs of dispersive differences.

Only then, with all XYZ, B, OCC and ISOEs fixed, start refining the anomalous occupancies for all wavelengths. There is a good deal of anecdotal evidence that many wavelengths actually give worse maps than 2 or 3. This is probably not because the phases are worse, but because the FOMs are overestimated.

Be careful about including the native in the MAD phasing. I wouldn't put it in as the native, but as a derivative (with negative OCC and 0 AOCC). If you put it in as the native, the error estimates in the differences with all the other data sets may be bigger than your actual signal, unless the latter is truly gigantic. In that case the FOMs do reflect the quality of the phases (both would be kind of bad).

REFMAC for Partial Poly-Ala

(January 2000)

: The problem we encountered is with a partial poly-Ala trace and poor MIR phases to 3Å. We employed example 9 in the REFMAC documentation, described as "very bad model, rms error 2Å". The resultant model is drastically distorted with some Ca-Ca distances over 6Å long. The question is which protocol should be followed with an initial Ca trace at 3Å?

Thanks to the original enquirer for the summary:

A straight answer to the question about the distortion, was found in the input to PROTIN: the start of the polypeptide chain needed to be dealt with slightly differently (NTERM 1 rather than NTERM 195).

The question also sparked a discussion about the use of all data in refinement, and the use of torsion angle refinement to decrease the number of parameters to refine. In this case, the data really only extended to 3Å, and careful testing of REFMAC and CNS produced very similar and equally interpretable maps.

Maximum-Likelihood refinement using experimental phases (as in REFMAC) can deal with poor/partial models correctly, if the phases have reliable FOMs. Including all data works best providing you are estimating errors from the Rfree set. Initially the weights assigned to the outer resolution bins are very small, but they still contribute significantly to the important overall scaling parameters. The amount of bias in the map is often a function of the amount of missing data. By default REFMAC substitutes missing reflections with DFc instead of the 2mFo-DFc used for observed data, which usually gives better maps than omitting them altogether (that is tantamount to setting them to zero), but any such term will introduce bias. If you have many missing reflections you may have a problem.

Do not expect miracles, and keep an eye on the theory: ML is a minimisation CRITERION, and NOT a minimisation method. One can compare it with the least-square criterion. It works in the same way except that instead of fitting calculated magnitudes to the observed ones, it suggests to fit them to some MODIFIED VALUES, and these modifications are estimated through the model quality and completeness (see, for example, Maximal Likelihood refinement. It works, but why? in the October 1999 CCP4 Newsletter). If the model is incomplete, it is wrong to fit the FC only to experimental amplitudes (as is the case in the LS refinement) and ML attempts to introduce corrections.
These corrections are estimated in resolution shells. If your model does not fit at all your data at a given resolution shell, the ML puts the "corrected experimental magnitudes" in this shell near to zero. As the model improves, the weighting will increase. This is similar to the "good old scheme" of slowly increasing the resolution to include in the refinement, but less subjective.

Various

AMoRe and CCP4 Asymmetric Unit

(January 2000)

: Is there a program to transfer AMoRe solutions into the CCP4 asymmetric unit (i.e. just redefining rotation and translation from the AMoRe output)?

Any MR program is simply going to find a CRYSTALLOGRAPHIC solution, and not care about building a sensible model. The following procedure would take care of this:

Always shift the first molecule to have translations between -1/2 and 1/2.
Assuming you apply the solution to the coordinate file OUTPUT from the TABFUN run, which has its centre of mass at (0,0,0), proceed with:
```
pdbset xyzin tabfun_out.pdb xyzout soln1.pdb
cell A B C alpha beta gamma (new cell)
rota euler ALPHA BETA GAMMA
shift frac Tx Ty Tz
CHAIN A
end
```
Then for further molecules you may well want a symmetry equivalent of the AMoRe solution to build up a tetramer or something.
Easiest way:
- Generate all molecules from the AMoRe solutions with different CHAIN IDs and make one coordinate file (all_solns.pdb).
- Run:
```
distang xyzin all_solns.pdb 
SYMM whatever
RADI CA 4
dist VDW
END
```
- That will give you contacts between molecules, complete with symmetry codes: e.g. 2 -1 0 1.
  There are various choices, but once you see that maybe there are better contacts between molecule 1 and molecule 2 using symmetry 2 -1 0 1, you would generate a new version of molecule 2:
```
pdbset xyzin soln2.pdb xyzout soln2a.pdb 
SYMGEN -X-1, Y+1/2,-z+1  (symmetry 2 -1 0 1 in spacegroup P21)
```
- Then rebuild the all_solns.pdb with the new soln2a.pdb and run distang again.

You are aiming to get a nice compact molecule with good contacts produced by symmetry X,Y,Z generated by particular symmetry operators. Can be messy but it saves a lot of time on the graphics later!
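
As a sketch of what the symmetry code does, the following plain Python applies "symmetry 2 -1 0 1" in P21 to a fractional coordinate, reading the code as operator number 2 followed by the cell translations (-1, 0, 1), as in the SYMGEN line above. It also shows one way of bringing a fractional translation into the range -1/2 to 1/2. The function names and the test coordinate are invented for illustration; PDBSET does this work for you on a whole coordinate file.

```python
import math

# Symmetry operators of P21 (b unique): rotation part, fractional translation
P21_OPERATORS = {
    1: (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),    # X, Y, Z
    2: (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0)),  # -X, Y+1/2, -Z
}


def apply_symmetry_code(frac, sym_number, cell_shift, operators=P21_OPERATORS):
    """Apply operator `sym_number` plus whole-cell shifts to fractional coords."""
    rot, trans = operators[sym_number]
    return tuple(
        sum(rot[i][j] * frac[j] for j in range(3)) + trans[i] + cell_shift[i]
        for i in range(3)
    )


def wrap_translation(t):
    """Bring a fractional translation into the range [-1/2, 1/2)."""
    return t - math.floor(t + 0.5)


# Symmetry 2 -1 0 1, i.e. -X-1, Y+1/2, -Z+1, on an invented coordinate
moved = apply_symmetry_code((0.1, 0.2, 0.3), 2, (-1, 0, 1))
print(moved)
print([wrap_translation(t) for t in (0.75, -0.6, 0.2)])
```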

Distinguish Calcium from Magnesium

(December 1999)

: I am having troubles in identifying two metal ions in a structure. I used Mg²⁺ in sample preparation and Ca²⁺ in crystallization. The coordination geometry, refinement statistics (both Rfactors and Bfactors) and maps do not resolve the ambiguity.
I would like to know the experts' opinions. The question in my mind is whether the bonding distance with the coordinating oxygens is a discriminator (~2.3Å in my case)? Also, how much can the B-factor tell? (in my case, Mg²⁺ ~12, Ca²⁺ ~29, average of the molecule ~37).

There are a few things you can do to distinguish between Mg²⁺ and Ca²⁺:

Crystallographically:
- Do you still have the raw data? If yes, even a little anomalous data will decide this question. Don't merge the Friedel pairs; calculate an anomalous difference Fourier (CCP4 FFT with DANO=D_nat PHI=PHIC, i.e. coefficients (F+ - F-)exp[i(phimodel - pi/2)]). You should see very clear peaks for Ca and none for Mg. This is because Ca has an f'' of 1.286 electrons at CuKa radiation, whereas magnesium has only an f''=0.177.
  Everyone collects some anomalous data - but if you run SCALEPACK with ANOM NO, you can lose it. If you always set ANOM YES, you still get a merged <I> for all hkl and -h-k-l pairs but the output also preserves the anomalous differences where observed. This is the default for SCALA.
- You should be including the low resolution data if you have it (i.e., 20 - 30Å) and this will allow an accurate bulk solvent correction (this is really just general advice).
- Ca²⁺ has 8 more electrons than Mg²⁺ which should give you a much higher electron density (contour at, say, ~4 rmsd(rho): do you see the well ordered sulfurs and your metal, only?). But with this you can only identify Ca²⁺ if its occupancy is close to unity. If the occupancies are close to unity, a falsely placed Mg²⁺ instead of a real Ca²⁺ would have a very low B-factor and vice versa.
  Another way of looking at this is:
  The change in B-factor is mopping up the difference. You should check the B-factor of the protein atoms that are the ligands. If the atoms have B-factors around 12, then the Mg²⁺ appears to be appropriate. If they are around 29, then the Ca²⁺ is likely the answer.
  Yet another contributor was surprised that difference maps weren't good indicators at 2.2Å: "At 1.9Å we observed respective peaks or troughs in the Fo-Fc maps when too light or too heavy a cation was used in the model, even though the B-factors were soaking up a lot of the error."
- If you still have crystals, soak one with Mn²⁺ instead of Mg²⁺: Mn²⁺ is a very good substitute for Mg²⁺ but has 13 more electrons. If you calculate an (Fo(Mn²⁺)-Fo(unknown)) electron density map, you should see a clear signal if unknown=Mg²⁺ and a weak signal (if any) if unknown=Ca²⁺.

Metal-Ligand Geometry: Looking at bond distances only is rather risky; the refinement programs often have "hidden" restraints which can distort your geometry. For instance most programs apply a VDW repulsion unless you specifically request that it be turned off. However, if you have been careful, some or all of the following should help:

Predicted Rfree-R Difference

(October 1999)

: Can someone post the reference to the paper dealing with the predicted Rfree-R difference as a function of data resolution again?

Tickle, I.J., Laskowski, R.A. & Moss, D.S. (1998) Acta Cryst D54, 547-557.

Packing Density

(January 2000)

: Can anybody tell me how to calculate the packing density of residues in protein? Is there any standard programme for that?

See Applications for Volume and Packing Calculations (Yale). This will be very helpful, because so many related programmes are available on this site. Furthermore, try CCP4's AREAIMOL (please use your local version of the CCP4 Program Documentation to view this), and Columbia's GRASP.

f' and f''

(October 1999)

: I am looking for a tool for calculating f-prime and f-doubleprime from X-ray fluorescence spectra for MAD-datasets. Does anyone know a good one?

CCP4's CROSSEC (please use your local version of the CCP4 Program Documentation to view this)
CHOOCH
Crystallographic Computing Services - Scattering factors
Anomalous Scattering Coefficients
