From P.J.Briggs@dl.ac.uk Wed Aug 17 15:58:21 2005
Date: Mon, 15 Aug 2005 11:00:32 +0100
From: P.J.Briggs
To: Peter Briggs
Subject: Notes: MTZ to CIF for deposition
Dear all
Following some very useful discussions with people here at the RCSB,
I've made some progress with the various issues involved in turning MTZ
into CIF for deposition.
Briefly, the issues seem to be:
1. Standardisation of CIF tokens output by CCP4
2. Treatment of anomalous data
3. Output of multiple datasets to a single CIF file
To expand on (and propose some solutions for) these issues:
1. Standardisation of CIF tokens:
Use the pdbx_* tokens from the PDB exchange dictionary instead of the
ccp4_SAD* tokens currently used in MTZ2VARIOUS.
(I appreciate that Kim has raised some reservations regarding these
tokens, so I hope that we can talk about it face-to-face at the IUCr
and iron this out.)
2. Anomalous data:
I'm proposing to use the extended token set (e.g.
_refln.pdbx_F_plus, _refln.pdbx_F_minus, etc.) to write out anomalous
data, writing only a single reflection record for each hkl (since
outputting a value for ..._F_minus with reflection hkl implies that this
is the measured value for -h-k-l).
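As an illustration only, here is a minimal Python sketch of that
single-record layout. It assumes hypothetical in-memory tuples of
(h, k, l, F(+), F(-)) rather than any real MTZ reader, writes "?" (the
CIF "unknown" value) where one Friedel mate was not measured, and leaves
out the sigma columns for brevity.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical record: (h, k, l, F(+), F(-)); None marks an unmeasured mate.
Reflection = tuple[int, int, int, Optional[float], Optional[float]]


def cif_value(x: Optional[float]) -> str:
    """Format a number for CIF, using '?' for a missing (unknown) value."""
    return "?" if x is None else f"{x:.2f}"


def write_anomalous_loop(refls: list[Reflection],
                         crystal_id: int = 1,
                         wavelength_id: int = 1) -> str:
    """Write one _refln row per hkl, carrying both F(+) and F(-)."""
    lines = [
        "loop_",
        "_refln.crystal_id",
        "_refln.wavelength_id",
        "_refln.index_h",
        "_refln.index_k",
        "_refln.index_l",
        "_refln.pdbx_F_plus",
        "_refln.pdbx_F_minus",
    ]
    for h, k, l, f_plus, f_minus in refls:
        # pdbx_F_minus on this row is understood as the value measured at -h-k-l.
        lines.append(f"{crystal_id} {wavelength_id} {h} {k} {l} "
                     f"{cif_value(f_plus)} {cif_value(f_minus)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(write_anomalous_loop([(1, 2, 3, 105.2, 98.7), (0, 0, 4, 250.0, None)]))
```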
Although this differs from how CIF structure factor files are
supplied from the PDB archive (there the hkl and -h-k-l pairs are
explicitly written) it is my understanding that both the EBI and RCSB
can handle these tokens and translate the data correctly into their
internal formats. Does this sound reasonable?
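For comparison, the expansion into the archive-style layout (explicit
hkl and -h-k-l rows) could look roughly like the Python sketch below. It
assumes the same hypothetical (h, k, l, F(+), F(-)) tuples and is not a
description of the actual EBI or RCSB software; centric reflections,
where F(+) and F(-) are equivalent, would need extra care and are not
handled here.

```python
from __future__ import annotations

from typing import Optional


def expand_friedel_pairs(
    refls: list[tuple[int, int, int, Optional[float], Optional[float]]],
) -> list[tuple[int, int, int, float]]:
    """Split single (h, k, l, F(+), F(-)) records into separate hkl and -h-k-l rows."""
    out: list[tuple[int, int, int, float]] = []
    for h, k, l, f_plus, f_minus in refls:
        if f_plus is not None:
            out.append((h, k, l, f_plus))      # F(+) stays at hkl
        if f_minus is not None:
            out.append((-h, -k, -l, f_minus))  # F(-) moves to its own -h-k-l row
    return out
```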
One unresolved point is that the "F" column in MTZ stores an
average of F(+) and F(-), and that without the anomalous differences
it is not possible to recover estimates of F(+) and F(-). So there is
also a question of whether it is ambiguous to write average Fs as
"_refln.F_meas_au" in the absence of the differences.
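To make the ambiguity concrete, here is a small Python sketch. It
assumes that the MTZ F is the plain mean (F(+) + F(-))/2 and that the
anomalous difference is available as D = F(+) - F(-) (the usual sense
of a DANO column); a weighted mean would change the arithmetic.

```python
def recover_friedel(f_mean: float, dano: float) -> tuple[float, float]:
    """Return (F(+), F(-)) from a mean F and an anomalous difference.

    Assumes f_mean = (F(+) + F(-)) / 2 and dano = F(+) - F(-), so
    F(+) = f_mean + dano / 2 and F(-) = f_mean - dano / 2.
    """
    return f_mean + dano / 2, f_mean - dano / 2
```

Without D, every choice of D gives back the same mean F, so F(+) and
F(-) are underdetermined from the averaged column alone.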
It would be
3. Output of multiple datasets
The existing MTZ2VARIOUS program has a number of problems and so I
have written a new MTZ2CIF converter, which can output several MTZ
datasets to a single CIF.
Reflections from different datasets are indexed with unique
crystal_id/wavelength_id number pairs in the _refln.* block; however,
it seems that some additional tokens are required in other blocks
(_cell.*, _reflns.* and _diffrn_radiation_wavelength.*) in order to
relate quantities such as cell parameters to the correct
crystal_id:
_cell.CCP4_crystal_id
_reflns.CCP4_wavelength_id
_reflns.CCP4_crystal_id
_diffrn_radiation_wavelength.CCP4_crystal_id
These would allow a mapping onto
_refln.wavelength_id
_refln.crystal_id
It would therefore be useful also to discuss some standardisation of
these additional tokens as written from MTZ2CIF.
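A rough Python sketch of how such a mapping might be generated is
below. The Dataset class is a hypothetical stand-in for an MTZ dataset
(not a real CCP4 API), each dataset is assumed to get its own
wavelength_id, datasets sharing a crystal name share a crystal_id, and
only the first dataset's cell is written per crystal (a simplification).
The _reflns.* tokens are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical stand-in for one MTZ dataset (not a real CCP4 class)."""
    crystal_name: str
    wavelength: float
    cell: tuple[float, float, float, float, float, float]


def assign_ids(datasets: list[Dataset]) -> list[tuple[int, int]]:
    """Return a (crystal_id, wavelength_id) pair for each dataset, in order."""
    crystal_ids: dict[str, int] = {}
    pairs = []
    for wavelength_id, ds in enumerate(datasets, start=1):
        # New crystal names get the next free id; repeated names reuse theirs.
        crystal_id = crystal_ids.setdefault(ds.crystal_name, len(crystal_ids) + 1)
        pairs.append((crystal_id, wavelength_id))
    return pairs


def write_id_loops(datasets: list[Dataset]) -> str:
    """Emit _cell and _diffrn_radiation_wavelength loops carrying CCP4_crystal_id."""
    pairs = assign_ids(datasets)
    lines = ["loop_", "_cell.CCP4_crystal_id"]
    lines += [f"_cell.{tag}" for tag in ("length_a", "length_b", "length_c",
                                          "angle_alpha", "angle_beta", "angle_gamma")]
    written: set[int] = set()
    for (crystal_id, _), ds in zip(pairs, datasets):
        if crystal_id not in written:  # first dataset's cell stands for the crystal
            written.add(crystal_id)
            lines.append(f"{crystal_id} " + " ".join(f"{x:.3f}" for x in ds.cell))
    lines += ["loop_",
              "_diffrn_radiation_wavelength.id",
              "_diffrn_radiation_wavelength.wavelength",
              "_diffrn_radiation_wavelength.CCP4_crystal_id"]
    for (crystal_id, wavelength_id), ds in zip(pairs, datasets):
        lines.append(f"{wavelength_id} {ds.wavelength:.5f} {crystal_id}")
    return "\n".join(lines)
```

With the ids assigned this way, each _refln row's crystal_id and
wavelength_id can be traced back to a single cell and wavelength entry.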
--
_____________________________________________________
Peter J Briggs, pjx@ccp4.ac.uk Tel: +44 1925 603826
CCP4, ccp4@ccp4.ac.uk Fax: +44 1925 603825
http://www.ccp4.ac.uk/
Daresbury Laboratory, Daresbury, Warrington WA4 4AD