Correlation of B-factors and resolution

Well, I had a quick look at the data stored in QDB (GJK, Acta Cryst D52, 842-857), which show that for 435 structures the correlation coefficient between resolution and average B is only 0.06, i.e. insignificant.

The only non-trivial correlate (using a 0.2 cut-off) is the percentage of secondary structure (which makes sort of sense), with cc = 0.20.
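
For illustration only, here is a minimal sketch (in Python/numpy) of how such correlation coefficients could be computed from a table of per-structure statistics; the file name and column layout are assumptions on my part, not the actual QDB format:

  # Sketch: Pearson correlations between per-structure statistics.
  # "qdb_stats.txt" and its column order are hypothetical.
  import numpy as np

  # Assumed columns: resolution (A), average B (A^2), % secondary structure
  resolution, avg_b, pct_ss = np.loadtxt("qdb_stats.txt", usecols=(0, 1, 2)).T

  def pearson_cc(x, y):
      # Pearson correlation coefficient between two 1-D arrays
      return np.corrcoef(x, y)[0, 1]

  print("cc(resolution, <B>)     = %.2f" % pearson_cc(resolution, avg_b))
  print("cc(%% sec. struct., <B>) = %.2f" % pearson_cc(pct_ss, avg_b))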

In my other large-scale test (see below), mentioned a couple of weeks ago, I found that essentially all temperature-factor-related statistics are "incorrectly" correlated with measures of model accuracy (e.g., higher average B tends to be accompanied by higher accuracy!). Average B, on the other hand, is very strongly correlated with completeness. I suspect that problems with data and/or restraints (rather than physics) are a major determinant of the temperature factors we calculate for our models ...

Correlation of B-factors with other factors

I did an experiment some two years ago involving hundreds of refinements of the same structure using systematically perturbed data and different starting models. It's a very long story (and, yes, I will write it up properly one millennium), but the conclusion was:

"if you have measured it, use it!"

This was done at both 2.0 and 2.5 Å with CNS 0.5, using the MLF target in most cases (I did do an "old-fashioned" run with target=resi for comparison). I correlated all measures of data quality and quantity with accuracy. The strongest correlation by far was for "completeness" (which, because I used fixed resolution limits, really means "number of reflections used in the refinement"), with correlation coefficients of ~0.7. Read's measure of information content also correlated well (~0.5-0.6), but all other measures (Rmerge, average I/sigma(I), average multiplicity, Rmeas and PCV) correlated very weakly with the accuracy of the final model (correlation coefficients ~0.1-0.2). This was really a bit of a shock - but at least it gives my audiences something to disagree with me about :-)
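
In the same spirit, a sketch (again Python/numpy, with made-up file and column names, not the actual output of the CNS runs above) of the kind of correlation analysis described here, running each data-quality measure against a single accuracy measure across many refinement runs:

  # Sketch: correlate data-quality measures with final model accuracy
  # across many refinement runs. "refinement_runs.csv" and its column
  # names are placeholders.
  import numpy as np

  runs = np.genfromtxt("refinement_runs.csv", delimiter=",", names=True)
  accuracy = runs["accuracy"]    # e.g. some coordinate-error measure

  for name in ("completeness", "Rmerge", "I_over_sigma",
               "multiplicity", "Rmeas", "PCV"):
      cc = np.corrcoef(runs[name], accuracy)[0, 1]
      print("%-14s cc = %+.2f" % (name, cc))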

Most of the other findings were surprisingly well in line with the gospel-according-to-alwyn-and-gerard, i.e. the validation criteria that correlate best with model accuracy are those that are "orthogonal" to the information used in the refinement. In other words: RMS deviations of bond lengths and angles from ideal values are completely uninformative, but Rfree, the Ramachandran plot, the DACA score, etc. are very good.

As for what to call the resolution, I wouldn't worry too much. Eventually, we'll be quoting the ratio of the number of bits of information in our experimental data to the effective number of degrees of freedom (or something similar).