From P.J.Briggs@dl.ac.uk Wed Aug 17 15:58:21 2005
Date: Mon, 15 Aug 2005 11:00:32 +0100
From: P.J.Briggs
To: Peter Briggs
Subject: Notes: MTZ to CIF for deposition

Dear all

Following some very useful discussions with people here at the RCSB,
I've made some progress with the various issues involved in turning MTZ
into CIF for deposition. Briefly, the issues seem to be:

1. Standardisation of CIF tokens output by CCP4
2. Treatment of anomalous data
3. Output of multiple datasets to a single CIF file

To expand on (and propose some solutions for) these issues:

1. Standardisation of CIF tokens

Use the pdbx_* tokens from the PDB exchange dictionary instead of the
ccp4_SAD* tokens currently used in MTZ2VARIOUS. (I appreciate that Kim
has raised some reservations regarding these tokens, so I hope that we
can talk about it face-to-face at the IUCr and iron this out.)

2. Anomalous data

I'm proposing to use the extended token set (i.e. _refln.pdbx_F_plus,
_refln.pdbx_F_minus etc.) to write out anomalous data, and to write only
a single reflection record for hkl (since outputting a value for
..._F_minus with reflection hkl implies that this is the measured value
for -h-k-l). Although this differs from how CIF structure factor files
are supplied from the PDB archive (there the hkl and -h-k-l pairs are
written explicitly), it is my understanding that both the EBI and RCSB
can handle these tokens and translate the data correctly into their
internal formats. Does this sound reasonable?

One unresolved point is that the "F" column in MTZ stores an average of
F(+) and F(-), and that without the anomalous differences it is not
possible to recover estimates of F(+) and F(-). So there is also a
question of whether it is ambiguous to write average Fs as
"_refln.F_meas_au" in the absence of the differences. It would be

3. Output of multiple datasets

The existing MTZ2VARIOUS program has a number of problems, so I have
written a new MTZ2CIF converter, which can output several MTZ datasets
to a single CIF. Reflections from different datasets are indexed with
unique crystal_id/wavelength_id number pairs in the _refln.* block.
However, it seems that some additional tokens are required in other
blocks (_cell.*, _reflns.* and _diffrn_radiation_wavelength.*) in order
to relate quantities such as cell parameters to the correct crystal_id:

    _cell.CCP4_crystal_id
    _reflns.CCP4_wavelength_id
    _reflns.CCP4_crystal_id
    _diffrn_radiation_wavelength.CCP4_crystal_id

These would allow a mapping onto

    _refln.wavelength_id
    _refln.crystal_id

It would therefore also be useful to discuss some standardisation of
these additional tokens as written from MTZ2CIF.

-- 
_____________________________________________________
Peter J Briggs, pjx@ccp4.ac.uk   Tel: +44 1925 603826
CCP4,           ccp4@ccp4.ac.uk  Fax: +44 1925 603825
http://www.ccp4.ac.uk/
Daresbury Laboratory, Daresbury, Warrington WA4 4AD
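
For illustration only, the single-record convention from point 2, combined
with the crystal_id/wavelength_id indexing from point 3, might be sketched in
Python roughly as below. This is not the MTZ2CIF code. The function names
split_mean_f and format_refln_loop are hypothetical, the sigma columns are
left out for brevity, and the sketch assumes that the MTZ "F" is the simple
mean of F(+) and F(-) and that the anomalous difference is D = F(+) - F(-).
Where no difference is available, the CIF unknown-value marker "?" is written
rather than guessing F(+) and F(-) from the average.

```python
def split_mean_f(f_mean, d_ano):
    """Recover (F+, F-) from a mean amplitude and an anomalous difference.

    Assumption: f_mean = (F+ + F-) / 2 and d_ano = F+ - F-.
    """
    return f_mean + d_ano / 2.0, f_mean - d_ano / 2.0


def format_refln_loop(reflections):
    """Format one CIF loop with a single record per hkl.

    reflections: iterable of
        (crystal_id, wavelength_id, h, k, l, f_mean, d_ano)
    where d_ano is None if no anomalous difference was stored.
    """
    tokens = [
        "_refln.crystal_id",
        "_refln.wavelength_id",
        "_refln.index_h",
        "_refln.index_k",
        "_refln.index_l",
        "_refln.pdbx_F_plus",
        "_refln.pdbx_F_minus",
    ]
    lines = ["loop_"] + tokens
    for xtal, wave, h, k, l, f_mean, d_ano in reflections:
        if d_ano is None:
            # Without the difference, F(+) and F(-) cannot be recovered.
            f_plus = f_minus = "?"
        else:
            fp, fm = split_mean_f(f_mean, d_ano)
            f_plus, f_minus = f"{fp:.2f}", f"{fm:.2f}"
        # The F_minus value on the hkl record stands for the -h-k-l measurement.
        lines.append(f"{xtal} {wave} {h} {k} {l} {f_plus} {f_minus}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        (1, 1, 1, 2, 3, 100.0, 4.0),   # crystal 1, wavelength 1, with D
        (1, 2, 0, 0, 4, 55.5, None),   # crystal 1, wavelength 2, no D
    ]
    print(format_refln_loop(demo))
```

Each reflection appears once, so a reader of the file takes the F_minus
column on the hkl record as the -h-k-l measurement, as described in point 2.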