In the October 1999 Newsletter, Martyn Winn started what will, hopefully, become a trend in keeping track of interesting discussions on the CCP4BB. To make things much easier for both the users of the bulletin board and the track-keeper, members who ask questions or instigate discussions on the board are now asked (urged!) to post a summary of all the reactions received, whether on or off the board.
The introduction to the October 1999 version also goes for this article:
For each subject below, the original question is given in italics, followed by a summary of the responses sent to CCP4BB (together with some additional material). For the sake of clarity and brevity, I have paraphrased the responses, and all inaccuracies are therefore mine. To avoid misrepresenting people's opinions or causing embarrassment, I have not identified anyone involved: those that are interested in the full discussion can view the original messages (see the CCP4BB web pages on how to do this). These summaries are not complete, since many responses go directly to the person asking the question. While we understand the reasons for this, we would encourage people to share their knowledge on CCP4BB, and also would be happy to see summaries produced by the original questioner. While CCP4BB is obviously alive and well, we think there is still some way to go before the level of traffic becomes inconvenient.
We, or at least I, would like to thank Sasha Urzhumtsev and Xavier Gomis Rueth for their various attempts to get the summarising off the ground. It seems to have paid off. And thanks to all the users who are now dutifully posting summaries. Finally I would like to thank Eleanor Dodson for her corrections and additions.
Subjects covered in this newsletter's offering are:
- 'dm'
- Averaging
- Masking
- Twinning
- Playing with Symmetry
- Local Scaling in P1
- CAD in R32
- Unexpected and artificial values
- Temperature Factors and the Wilson Plot
- Anisotropic Scaling to get at 'true' Bfactors
- Artificial Bfactors for NMR Structures
- What to Expect of Correlation Coefficients in AMoRe
- Predicting the Number of Reflections
- FOMs in MLPHARE for MAD
- REFMAC for Partial Poly-Ala
- Various
- AMoRe and CCP4 Asymmetric Unit
- Distinguish Calcium from Magnesium
- Predicted Rfree-R Difference
- Packing Density
- f' and f''
(October and November 1999)
inter-dimer:
    A -> B
    B -> C
intra-dimer:
    A1 -> A2 (from subunit 1 of dimer A to subunit 2 of dimer A)
    B1 -> B2
    C1 -> C2
where the operators consist of a 3x3 matrix plus a vector. The question is: is this a representative set of operators for successful averaging?
The following fundamental principle should be applied:
If your dimer rotations are self-inverse, then your mask could equally well cover A instead of A1. If all six NCS operators form a closed group, then you could use a hexamer mask.
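(A1-B2) = (B1-B2)x(A-B), i.e. multiply the matrices, and take the right order! As all symmetry operators X consist of a 3x3 matrix M AND a vector t, for the multiplication you have to multiply the augmented 4x4 matrices:

$$
X = M + t \;\Rightarrow\;
\begin{pmatrix}
m_{11} & m_{12} & m_{13} & t_1 \\
m_{21} & m_{22} & m_{23} & t_2 \\
m_{31} & m_{32} & m_{33} & t_3 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$

$$
X(b1) = \{M^{(a1-b1)} + t^{(a1-b1)}\}\, X(a1) \;\Rightarrow\;
\begin{pmatrix}
m^{(a1-b1)}_{11} & m^{(a1-b1)}_{12} & m^{(a1-b1)}_{13} & t^{(a1-b1)}_1 \\
m^{(a1-b1)}_{21} & m^{(a1-b1)}_{22} & m^{(a1-b1)}_{23} & t^{(a1-b1)}_2 \\
m^{(a1-b1)}_{31} & m^{(a1-b1)}_{32} & m^{(a1-b1)}_{33} & t^{(a1-b1)}_3 \\
0 & 0 & 0 & 1
\end{pmatrix} X(a1)
$$

$$
X(b2) = \{M^{(b1-b2)} + t^{(b1-b2)}\}\, X(b1) \;\Rightarrow\;
\begin{pmatrix}
m^{(b1-b2)}_{11} & m^{(b1-b2)}_{12} & m^{(b1-b2)}_{13} & t^{(b1-b2)}_1 \\
m^{(b1-b2)}_{21} & m^{(b1-b2)}_{22} & m^{(b1-b2)}_{23} & t^{(b1-b2)}_2 \\
m^{(b1-b2)}_{31} & m^{(b1-b2)}_{32} & m^{(b1-b2)}_{33} & t^{(b1-b2)}_3 \\
0 & 0 & 0 & 1
\end{pmatrix} X(b1)
$$

so

$$
X(b2) = \{M^{(b1-b2)} + t^{(b1-b2)}\}\, \{M^{(a1-b1)} + t^{(a1-b1)}\}\, X(a1)
$$

But it is much easier to use LSQKAB (please use your local version of the CCP4 Program Documentation to view this) to fit the coordinates of A1 onto those of B2, etc.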
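The multiplication of augmented matrices can be sketched in a few lines of Python. This is only an illustration: the rotation angles and translation vectors are invented, and the helper names (`augment`, `matmul`, `apply_op`, `rot_z`) are hypothetical, not part of any CCP4 program. It shows that the operator applied first must stand on the right of the product.

```python
# Sketch: compose NCS operators as augmented 4x4 matrices (pure Python).
# All numerical values below are made up for illustration.
import math


def augment(m, t):
    """Build the 4x4 matrix for x' = M x + t from a 3x3 M and a 3-vector t."""
    return [list(m[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """4x4 matrix product a.b (b is applied first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply_op(op, xyz):
    """Apply an augmented operator to a point given as [x, y, z]."""
    v = list(xyz) + [1.0]
    return [sum(op[i][k] * v[k] for k in range(4)) for i in range(3)]


def rot_z(deg):
    """3x3 rotation about the z axis, used here only to get example matrices."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Hypothetical operators: A1 -> B1 (inter-dimer) and B1 -> B2 (intra-dimer).
op_a1_b1 = augment(rot_z(120.0), [10.0, 0.0, 5.0])
op_b1_b2 = augment(rot_z(180.0), [0.0, 20.0, 0.0])

# A1 -> B2: the operator applied first stands on the right.
op_a1_b2 = matmul(op_b1_b2, op_a1_b1)

atom_a1 = [1.0, 2.0, 3.0]
step_by_step = apply_op(op_b1_b2, apply_op(op_a1_b1, atom_a1))
composed = apply_op(op_a1_b2, atom_a1)
wrong_order = apply_op(matmul(op_a1_b1, op_b1_b2), atom_a1)
```

Here `composed` agrees with `step_by_step`, whereas `wrong_order` lands somewhere else, because the translation parts do not commute even when the rotations do.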
A dimer mask (proper NCS) made using MAPROT and MAPMASK turned out dismal. On the other hand, the one from 'dm', assumed to be calculated on the unaveraged map, is much better. An NCS mask can be simply made by putting an atom of 40Å radius at the centroid of the heavy atom position. In tests this is about as good as a monomer mask.
This stimulated a lively discussion on the use of the phrase "maskless averaging", after which the question was rephrased as: How to improve the original 'dm' solvent mask?
There are several ways of using averaging and the answers need to be summarised differently for each application. It is assumed you know the NCS operators which map the "master" molecule onto all other copies in the asymmetric unit.
To average an existing map using known NCS operators - this is Eleanor's definition of maskless averaging.
If you are only interested in averaging a MAP there is no reason, except for speed, to use a mask. Those parts of the map which obey the NCS symmetry will be improved by averaging, whilst those parts which do not, will deteriorate.
If the NCS operators are "proper" or "form a closed group" such that there is an NCS-defined pure rotation, the whole complex will be improved. The density for molecule 1 will be overlapped and summed with that of molecule 2, that of molecule 2 overlapped with that of molecule 3, until the complete rotation is done.
If the NCS operators are "improper", only the master molecule will be improved.
Hopefully this will also generate a better molecular boundary and thus a better initial solvent flattening mask.
The NCS operators can be refined to improve the correlation between different copies of the electron density. This refinement will be done using density within some masked region but this need not cover the whole molecule. This restricted mask can be obtained in various ways. The simplest is to put a single "atom" at the centre of the molecule, then generate a mask around that "atom" assigning it a large VDW radius.
The centre of mass can be decided either by inspecting the map, or possibly, if the NCS was determined by fitting heavy atoms within each molecule, it could be the centre of the heavy atom cluster. If the map is very noisy, it may be sensible to only use the strong density to refine the NCS operators.
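The single-"atom" mask described above amounts to flagging every grid point within a chosen radius of the centre. A minimal sketch follows, assuming an orthogonal cell for simplicity; the cell, grid, centre and radius are invented values, and `sphere_mask` is a hypothetical helper rather than a CCP4 routine.

```python
# Sketch: flag grid points within a given radius of a chosen centre.
# Assumes an orthogonal cell (alpha = beta = gamma = 90 degrees);
# cell, grid and centre values are illustrative only.

def sphere_mask(cell, grid, centre_frac, radius):
    """Return the set of grid indices (i, j, k) inside the sphere.

    cell        -- (a, b, c) in Angstrom, orthogonal axes assumed
    grid        -- (nx, ny, nz) sampling along a, b, c
    centre_frac -- fractional coordinates of the pseudo-atom
    radius      -- sphere radius in Angstrom
    """
    inside = set()
    for i in range(grid[0]):
        for j in range(grid[1]):
            for k in range(grid[2]):
                d2 = 0.0
                for idx, n, length, c in zip((i, j, k), grid, cell, centre_frac):
                    df = idx / n - c
                    df -= round(df)            # use the nearest periodic image
                    d2 += (df * length) ** 2
                if d2 <= radius ** 2:
                    inside.add((i, j, k))
    return inside


mask = sphere_mask(cell=(100.0, 100.0, 100.0), grid=(20, 20, 20),
                   centre_frac=(0.5, 0.5, 0.5), radius=40.0)
fraction = len(mask) / (20 * 20 * 20)   # share of the cell covered by the mask
```

A non-orthogonal cell would need the fractional differences converted with the proper orthogonalisation matrix before the distance is taken.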
However once the NCS operators have been refined, it is sensible to use density modification to improve the phases. This is done in the following way
As a result, map cutting may be performed by supplying MAPROT with a work map and mask (WRKIN+MSKIN), leaving the unit-cell map (MAPIN) blank, and giving a single operator (which may be unitary) along with GRID and CELL for the cutout map. The cut density is written to the CUTOUT file.
The new maputils are available by ftp as follows:
    > ftp ftp.yorvic.york.ac.uk
    login: anonymous
    password: type_your_full_email_address_here
    ftp > cd pub/ccp4
    ftp > get maputils.tar.gz
    ftp > quit
    > gunzip maputils.tar.gz
    > tar xvf maputils.tar
    > cd maputils/maprot_
    > makemaprot
    > cd ../mapmask__
    > makemapmask
(January 2000)
Thanks to the original enquirer for the summary:
Otherwise change anything:
- pH (~0.5 unit, or less; cf. Wim Hol's subtilisin: 0.1 pH unit)
- temp (Frazao, Acta Cryst. D55 (1999), 1465)
- additives/detergents
- ligands
- crystallization method etc.
Theoretically this can be done. However, when the data is detwinned you must take precautions that the twin operator deconvolutes I+ to I+ and I- to I-. Otherwise, the detwinning loses your anomalous signal. As I understand it, the CCP4 DETWIN program does not support detwinning of anomalous data. That brings up the next point. For the most part a SeMet MAD experiment is a low signal experiment (I am assuming you are talking about a SeMet MAD experiment). Detwinning, while helpful, is not perfect and will remove some if not all of the signal. Also, with the presence of a twin operator, you will have two solutions related to each other by the twin operator. So, finding the consistent solution with its cross peaks becomes a major undertaking. In my opinion, to solve a MAD twin problem you will need to have a very high signal MAD experiment with few sites.
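To see why detwinning eats into a weak signal, here is a minimal sketch of the usual algebraic detwinning of one twin-related pair. It assumes the simple hemihedral model J1 = (1-alpha)I1 + alpha I2, J2 = alpha I1 + (1-alpha)I2. The function names and numbers are invented, and this is not a description of what DETWIN does internally. For anomalous data the pair would have to be formed so that I+ is combined only with the twin-related I+.

```python
# Sketch: algebraic detwinning of a twin-related intensity pair.
# Assumes the simple hemihedral model
#   J1 = (1 - alpha) * I1 + alpha * I2
#   J2 = alpha * I1 + (1 - alpha) * I2
# with twin fraction alpha < 0.5. Numbers are invented for illustration.

def detwin(j1, j2, alpha):
    """Recover the untwinned intensities (I1, I2) from observed (J1, J2)."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    scale = 1.0 - 2.0 * alpha
    i1 = ((1.0 - alpha) * j1 - alpha * j2) / scale
    i2 = ((1.0 - alpha) * j2 - alpha * j1) / scale
    return i1, i2


def error_amplification(alpha):
    """Factor by which an uncorrelated error on J1 and J2 grows in I1."""
    return ((1.0 - alpha) ** 2 + alpha ** 2) ** 0.5 / (1.0 - 2.0 * alpha)


# Round trip with made-up true intensities and alpha = 0.3.
true_i1, true_i2, alpha = 1000.0, 400.0, 0.3
j1 = (1 - alpha) * true_i1 + alpha * true_i2
j2 = alpha * true_i1 + (1 - alpha) * true_i2
rec_i1, rec_i2 = detwin(j1, j2, alpha)
```

Under this model the measurement errors are amplified by roughly a factor of two at alpha = 0.3 and by about seven at alpha = 0.45, which is comparable to or larger than a typical anomalous difference.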
Finally lots of referrals to general things:
(January 2000)
Some thoughts on the first, and a firm answer to the second question:
Then two references to sliding box scaling, the 'original' way:
(September 1999)
CAD is correct, and those pairs of reflections are indeed equivalent.
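Remember that if you generate a symmetry equivalent position in real space using this expression:

$$
\begin{pmatrix} X_j \\ Y_j \\ Z_j \\ 1 \end{pmatrix} =
\begin{pmatrix}
S_{j11} & S_{j12} & S_{j13} & S_{t1} \\
S_{j21} & S_{j22} & S_{j23} & S_{t2} \\
S_{j31} & S_{j32} & S_{j33} & S_{t3} \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} X_1 \\ Y_1 \\ Z_1 \\ 1 \end{pmatrix}
$$

an equivalent reflection in reciprocal space is generated by:

$$
\begin{pmatrix} H_k & K_k & L_k \end{pmatrix} =
\begin{pmatrix} H_1 & K_1 & L_1 \end{pmatrix}
\begin{pmatrix}
S_{k11} & S_{k12} & S_{k13} \\
S_{k21} & S_{k22} & S_{k23} \\
S_{k31} & S_{k32} & S_{k33}
\end{pmatrix}
$$

For R32 a symmetry matrix: -Y,X-Y,Z can be represented as:

$$
\begin{pmatrix} X_2 \\ Y_2 \\ Z_2 \\ 1 \end{pmatrix} =
\begin{pmatrix}
0 & -1 & 0 & 0 \\
1 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} X_1 \\ Y_1 \\ Z_1 \\ 1 \end{pmatrix}
$$

The equivalent reflection is

$$
\begin{pmatrix} H_k & K_k & L_k \end{pmatrix} =
\begin{pmatrix} H_1 & K_1 & L_1 \end{pmatrix}
\begin{pmatrix}
0 & -1 & 0 \\
1 & -1 & 0 \\
0 & 0 & 1
\end{pmatrix}
= \begin{pmatrix} K_1 & -H_1-K_1 & L_1 \end{pmatrix}
$$

For R32 the complete set of reciprocal space equivalents is: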
[h,k,l] [k,-h-k,l] [-h-k,h,l] [k,h,-l] [-h-k,k,-l] [h,-h-k,-l] and the equivalent [-h,-k,-l] sets
Symmetry operator 5 gives (4,-2,6) equivalent to (-2,-2,-6) and negating all signs gives (2,2,6).
Symmetry operator 5 gives (4,-1,-7) equivalent to (-3,-1, 7) and negating all signs gives (3,1,-7).
(-h-k) is sometimes written as i, and people refer to the four-index Miller-Bravais symbols
(h k (-h-k) l):
(4 -2 6) becomes (4 -2 -2 6);
(4 -1 -7) becomes (4 -1 -3 -7).
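The bookkeeping above is easy to check with a short script. The sketch below multiplies the index triple, as a row vector, by the rotation parts of the six R32 operators (hexagonal setting) and adds the Friedel mates. The function names are hypothetical, and this is not how CAD itself is implemented.

```python
# Sketch: reciprocal-space equivalents of a reflection in R32 (hexagonal axes).
# The index triple is treated as a row vector and multiplied by the rotation
# part of each real-space operator; centring translations do not change indices.

R32_ROTATIONS = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],      # X, Y, Z
    [[0, -1, 0], [1, -1, 0], [0, 0, 1]],    # -Y, X-Y, Z
    [[-1, 1, 0], [-1, 0, 0], [0, 0, 1]],    # -X+Y, -X, Z
    [[0, 1, 0], [1, 0, 0], [0, 0, -1]],     # Y, X, -Z
    [[-1, 0, 0], [-1, 1, 0], [0, 0, -1]],   # -X, -X+Y, -Z
    [[1, -1, 0], [0, -1, 0], [0, 0, -1]],   # X-Y, -Y, -Z
]


def transform(hkl, rot):
    """Row vector times matrix: [h k l] . S"""
    return tuple(sum(hkl[i] * rot[i][j] for i in range(3)) for j in range(3))


def equivalents(hkl, friedel=True):
    """Return the set of indices equivalent to hkl (optionally with Friedel mates)."""
    out = {transform(hkl, rot) for rot in R32_ROTATIONS}
    if friedel:
        out |= {tuple(-x for x in e) for e in out}
    return out


def miller_bravais(hkl):
    """Four-index symbol (h k i l) with i = -h-k."""
    h, k, l = hkl
    return (h, k, -h - k, l)


print((2, 2, 6) in equivalents((4, -2, 6)))     # True
print((3, 1, -7) in equivalents((4, -1, -7)))   # True
print(miller_bravais((4, -2, 6)))               # (4, -2, -2, 6)
```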
(January 2000)
Thanks to the original enquirer for the summary:
(September 1999)
The general consensus is that striving for low Bfactors is a thing of the
past. High Bfactors can be very 'true' Bfactors. Relatively low side chain Bfactors
would make sense if they are buried, but connected to an exposed piece of main
chain which would then have higher Bfactors. The PDB is not a reliable source of
information about Bfactors, since the scaling algorithms in XPLOR inevitably mean
that most low resolution structures have totally unrealistic Bfactors. XPLOR scales
Fobs rather than Fcalc, which, especially at lower resolution and at the beginning of
a refinement procedure, could cause problems. It would be sensible to start
refinement not with an average B of 20 (as output by many model building programs),
but with an average B that represents the Wilson plot as closely as possible. It
would also be sensible to apply Bfactor corrections before each round of refinement.
There may be valid reasons for adjusting the Bfactor of the
Fobs, for making maps or Patterson or rotation function searches,
sharpening the data to accentuate the high resolution details, but the final
atomic Bfactors should be refined against an uncorrected dataset.
With regard to the use of NCS, most people now accept that relaxing
it, with care, is good crystallographic practice, and could have a beneficial
effect on Bfactors in refinement.
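For the suggestion of starting from a Wilson-plot B, a minimal sketch of the estimate is given below. It assumes the standard Wilson relation ln(<I>/Sum f^2) = ln K - 2B (sin theta/lambda)^2 fitted by least squares over resolution shells. The shell data are synthetic, and `wilson_b` is a hypothetical helper, not the routine used by any CCP4 program.

```python
# Sketch: overall B from a straight-line fit to a Wilson plot.
# The shell data are synthetic (generated with B = 35, K = 2) and the
# sum of squared scattering factors is held constant for simplicity.
import math


def wilson_b(shells):
    """Estimate (B, ln K) from (s2, mean_I, sum_f2) per resolution shell.

    s2     -- mean (sin(theta)/lambda)^2 of the shell, in 1/A^2
    mean_I -- mean observed intensity in the shell
    sum_f2 -- sum of squared scattering factors for the cell contents
    Fits ln(mean_I / sum_f2) = ln K - 2 * B * s2 by least squares.
    """
    xs = [s2 for s2, _, _ in shells]
    ys = [math.log(mean_i / sum_f2) for _, mean_i, sum_f2 in shells]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope / 2.0, my - slope * mx


# Synthetic shells between 4.0 and 2.0 A resolution.
TRUE_B, TRUE_K, SUM_F2 = 35.0, 2.0, 1000.0
shells = []
for d in (4.0, 3.5, 3.0, 2.7, 2.4, 2.2, 2.0):
    s2 = 1.0 / (4.0 * d * d)
    shells.append((s2, TRUE_K * SUM_F2 * math.exp(-2.0 * TRUE_B * s2), SUM_F2))

b_est, ln_k_est = wilson_b(shells)
```

With real data the low-resolution shells deviate from a straight line, so the fit is normally restricted to the higher-resolution part of the plot.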
(October 1999)
(September 1999)
If you calculate the correlation between Fobs and Fcalc you will always get a positive outcome because they are correlated in the sense that both decrease with resolution. If you use E values you do not see this since E values do not decrease with resolution, as you noted. In your case the effects may have been more severe, perhaps due to too large B-values for the search model?
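The effect is easy to reproduce with random numbers. In the sketch below two completely unrelated sets of E-like values are given the same resolution fall-off exp(-B sin^2(theta)/lambda^2). The resulting "F" values correlate clearly, while the underlying "E" values do not. The B value, the resolution range and the amplitude distribution are arbitrary assumptions chosen for illustration.

```python
# Sketch: a shared resolution fall-off alone produces a positive correlation.
# Two independent sets of "E" values are damped by the same exp(-B * s2);
# all parameters are arbitrary illustrative choices.
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
B = 40.0            # overall temperature factor (A^2)
N_REFL = 5000

# (sin(theta)/lambda)^2 sampled out to 2 A resolution.
s2 = [rng.uniform(0.0, 0.0625) for _ in range(N_REFL)]

# Independent "normalised" amplitudes for the observed and calculated sets.
e_obs = [abs(rng.gauss(0.0, 1.0)) for _ in range(N_REFL)]
e_calc = [abs(rng.gauss(0.0, 1.0)) for _ in range(N_REFL)]

# The same fall-off is applied to both sets.
f_obs = [e * math.exp(-B * s) for e, s in zip(e_obs, s2)]
f_calc = [e * math.exp(-B * s) for e, s in zip(e_calc, s2)]

cc_f = pearson(f_obs, f_calc)   # clearly positive
cc_e = pearson(e_obs, e_calc)   # close to zero
```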
(January 2000)
(September 1999)
It most definitely will if you refine the xyz and occupancy for all wavelengths at once. There is an implicit assumption in the algorithm that each "derivative" is independent, which is obviously not true when a "derivative" is taken as another wavelength, with all differences arising from the same atom sites. The more data sets at different wavelengths with the same sites, the fatter the FOMs get.
The best way to minimise this problem is to refine the coordinates and temperature factors against the largest set of dispersive differences for the centric reflections only (assuming you have such reflections). After that refinement fix the xyz and B for all wavelengths, and the ISOE and occupancies for that pair of dispersive differences. Then estimate the ISOE and occupancies for all other pairs of dispersive differences.
Only then, with all XYZ, B, OCC and ISOEs fixed, start refining the anomalous occupancies for all wavelengths. There is a good deal of anecdotal evidence that using many wavelengths actually gives worse maps than using 2 or 3. This is probably not because the phases are worse, but because the FOMs are overestimated.
Be careful about including the native in the MAD phasing. I wouldn't put it in as the native, but as a derivative (with negative OCC and 0 AOCC). If you put it in as the native, the error estimates in the differences with all the other data sets may be bigger than your actual signal, unless the latter is truly gigantic. In that case the FOMs do reflect the quality of the phases (both would be kind of bad).
(January 2000)
Thanks to the original enquirer for the summary:
A straight answer to the question about the distortion was found in the input to PROTIN: the start of the polypeptide chain needed to be dealt with slightly differently (NTERM 1 rather than NTERM 195).
The question also sparked a discussion about the use of all data in refinement, and the use of torsion angle refinement to decrease the number of parameters to refine. In this case, the data really only extended to 3Å, and careful testing of REFMAC and CNS produced very similar and equally interpretable maps.
Maximum-Likelihood refinement using experimental phases (as in REFMAC) can deal with poor/partial models correctly, if the phases have reliable FOMs. Including all data works best providing you are estimating errors from the Rfree set. Initially the weights assigned to the outer resolution bins are very small, but they still contribute significantly to the important overall scaling parameters. The amount of bias is often a function of the amount of missing data. By default REFMAC substitutes these reflections as Dfc instead of the 2mFo-dFc used for observed data, which usually gives better maps than omitting them altogether (that is tantamount to setting them to zero), but any such term will introduce bias. If you have many missing reflections you may have a problem.
Do not expect miracles, and keep an eye on the theory:
ML is a minimisation CRITERION, and NOT a minimisation method. One can
compare it with the least-square criterion. It works in the same way except
that instead of fitting calculated magnitudes to the observed ones, it
suggests fitting them to some MODIFIED VALUES, and these modifications are
estimated from the model quality and completeness (see, for example,
"Maximal Likelihood refinement. It works, but why?" in the October 1999 CCP4
Newsletter). If the model is incomplete, it is wrong to fit the FC only to
experimental amplitudes (as is the case in LS refinement), and ML attempts to
introduce corrections.
These corrections are estimated in resolution shells. If your model does not fit
your data at all in a given resolution shell, ML puts the "corrected
experimental magnitudes" in this shell near to zero. As the model improves, the
weighting will increase. This is similar to the "good old scheme" of slowly
increasing the resolution to include in the refinement, but less subjective.
(January 2000)
Any MR program is simply going to find a CRYSTALLOGRAPHIC solution, and not care about building a sensible model. The following procedure would take care of this:
    pdbset xyzin tabfun_out.pdb xyzout soln1.pdb
    cell A B C alpha beta gamma (new cell)
    rota euler ALPHA BETA GAMMA
    shift frac Tx Ty Tz
    CHAIN A
    end

    distang xyzin all_solns.pdb
    SYMM whatever
    RADI CA 4
    dist VDW
    END

    pdbset xyzin soln2.pdb xyzout soln2a.pdb
    SYMGEN -X-1, Y+1/2,-z+1 (symmetry 2 -1 0 1 in spacegroup P21)
You are aiming to get a nice compact molecule with good contacts produced by symmetry X,Y,Z generated by particular symmetry operators. This can be messy, but it saves a lot of time on the graphics later!
(December 1999)
I am having trouble identifying two metal ions in a structure.
I used Mg2+ in sample preparation and Ca2+ in crystallization. The
coordination geometry, refinement statistics (both Rfactors and Bfactors) and maps do not
resolve the ambiguity.
I would like to know the experts' opinions. The question in my mind is whether the bonding
distance to the coordinating oxygens is a discriminator (~2.3Å in my case)? Also,
how much can the B-factor tell? (in my case, Mg2+ ~12, Ca2+ ~29,
average of the molecule ~37).
There are a few things you can do to distinguish between Mg2+ and Ca2+:
(October 1999)
Tickle, I.J., Laskowski, R.A. & Moss, D.S. (1998) Acta Cryst D54, 547-557.
(January 2000)
See Applications for Volume and Packing Calculations (Yale). This will be very helpful, because so many related programs are available at this site. Furthermore, try CCP4's AREAIMOL (please use your local version of the CCP4 Program Documentation to view this), and Columbia's GRASP.
(October 1999)