Converting MTZ to mmCIF for Deposition

Background: how MTZ represents anomalous data

In the diffraction experiment, the presence of an anomalous scattering signal means that the structure factor amplitude measured for a reflection hkl may differ from that measured for its Friedel mate, -h-k-l.

Rather than storing the data as two independent reflections, however, MTZ stores the data for both in just one of the pair (say hkl). If the amplitude measured for hkl is F(+) and that for -h-k-l is F(-), then:

  1. The data can be stored as a mean amplitude F and an anomalous difference D:

    F = 0.5*( F(+) + F(-) )

    D = F(+) - F(-)

    (There are also equations for calculating the sigmas for F and D, not shown here).

  2. The data can be stored as the raw F(+) and F(-) measurements.
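As a sketch of how representation 1 is derived from representation 2, the function below converts one F(+)/F(-) pair into F and D. The sigma formulas are an assumption, not taken from the text above: they use simple error propagation for independent errors, sigF = 0.5*sqrt(sigF(+)^2 + sigF(-)^2) and sigD = sqrt(sigF(+)^2 + sigF(-)^2), which may differ from what a particular CCP4 program computes. The function name is hypothetical.

```python
import math


def mean_and_difference(f_plus, sig_plus, f_minus, sig_minus):
    """Convert F(+)/F(-) into (F, sigF, D, sigD) for one reflection hkl.

    Hypothetical helper. The sigma expressions assume independent errors
    on F(+) and F(-) (simple error propagation), which is an assumption.
    """
    f_mean = 0.5 * (f_plus + f_minus)       # F = 0.5*(F(+) + F(-))
    d_anom = f_plus - f_minus               # D = F(+) - F(-)
    var_sum = sig_plus ** 2 + sig_minus ** 2
    sig_f = 0.5 * math.sqrt(var_sum)
    sig_d = math.sqrt(var_sum)
    return f_mean, sig_f, d_anom, sig_d


# F(+) = 105.0 (sigma 3.0), F(-) = 95.0 (sigma 4.0)
print(mean_and_difference(105.0, 3.0, 95.0, 4.0))  # (100.0, 2.5, 10.0, 5.0)
```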

Originally only mean values were stored, and the differences were often simply thrown away. As anomalous scattering techniques became more widespread, the differences were also stored. Nowadays it is common to see the data stored both as F and D and as F(+)/F(-), all associated with a single reflection, e.g.:

h k l   F sigF   D sigD   F(+) sigF(+)   F(-) sigF(-)

(Note that having F sigF D sigD is essentially the same information as having F(+) sigF(+) F(-) sigF(-), only expressed in a different form. In principle you only need one of these two sets of columns in order to have all the data.)
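To illustrate why the two column sets carry the same amplitude information, the sketch below inverts the definitions of F and D and checks a round trip. It covers the amplitudes only; recovering the individual sigmas from sigF and sigD would need extra assumptions and is not attempted. Both function names are hypothetical.

```python
def mean_and_difference(f_plus, f_minus):
    """Forward transform: F = 0.5*(F(+) + F(-)), D = F(+) - F(-)."""
    return 0.5 * (f_plus + f_minus), f_plus - f_minus


def plus_minus_from_mean(f_mean, d_anom):
    """Inverse transform: F(+) = F + D/2, F(-) = F - D/2."""
    f_plus = f_mean + 0.5 * d_anom
    f_minus = f_mean - 0.5 * d_anom
    return f_plus, f_minus


# Round trip: F(+)/F(-) -> F/D -> F(+)/F(-)
f, d = mean_and_difference(105.0, 95.0)   # (100.0, 10.0)
print(plus_minus_from_mean(f, d))          # (105.0, 95.0)
```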

This is different from formats such as XPLOR or SHELX, where the data is stored as two separate reflections, e.g.:

 h  k  l   F(+) sigF(+)
-h -k -l   F(-) sigF(-)
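A minimal sketch of the reshaping involved in going from the MTZ layout to the XPLOR/SHELX layout: one stored reflection becomes up to two explicit rows, with F(-) moved onto -h-k-l. The helper name is hypothetical, and it deliberately ignores space-group symmetry (it always writes the mate as -h-k-l); a missing measurement, given as None, produces no row.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int, int, float, float]  # h, k, l, F, sigF


def expand_friedel_pair(h: int, k: int, l: int,
                        f_plus: Optional[float], sig_plus: Optional[float],
                        f_minus: Optional[float], sig_minus: Optional[float]) -> List[Row]:
    """Split one MTZ-style reflection into explicit hkl and -h-k-l rows."""
    rows: List[Row] = []
    if f_plus is not None:
        rows.append((h, k, l, f_plus, sig_plus))          # F(+) stays on hkl
    if f_minus is not None:
        rows.append((-h, -k, -l, f_minus, sig_minus))     # F(-) goes to the mate
    return rows


for row in expand_friedel_pair(1, 2, 3, 105.0, 3.0, 95.0, 4.0):
    print("%4d %4d %4d %8.2f %6.2f" % row)
```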

Issues when converting to mmCIF for deposition

The CCP4 program MTZ2VARIOUS can convert an MTZ file to mmCIF but there are some issues when dealing with anomalous data.

Older versions of MTZ2VARIOUS (pre-23rd May 2003, or revision 1.101) wrote out anomalous data as two explicit reflections, using the following tokens corresponding to columns in the MTZ file:

 _refln.F_meas_au         FP (or F(+)/F(-))
 _refln.F_meas_sigma_au   SIGFP (or sigF(+)/sigF(-))
 _refln.intensity_meas    I (or I(+)/I(-))
 _refln.intensity_sigma   SIGI (or sigI(+)/sigI(-))

This is different to the way that MTZ files store the data, as described in the previous section.
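The older layout can be sketched as follows: each anomalous pair yields two rows of a _refln loop, with F(+) on hkl and F(-) on -h-k-l, both under the standard _refln.F_meas_au / _refln.F_meas_sigma_au tokens. This is an illustration, not MTZ2VARIOUS's actual output: the function name is hypothetical, the number formatting is an assumption, and the intensity tokens are omitted for brevity.

```python
from typing import Iterable, Optional, Tuple

# (h, k, l, F(+), sigF(+), F(-), sigF(-)); None marks a missing measurement
Anom = Tuple[int, int, int, Optional[float], Optional[float],
             Optional[float], Optional[float]]


def old_style_refln_loop(reflections: Iterable[Anom]) -> str:
    """Write one mmCIF row per Friedel mate under the standard F tokens."""
    lines = [
        "loop_",
        "_refln.index_h",
        "_refln.index_k",
        "_refln.index_l",
        "_refln.F_meas_au",
        "_refln.F_meas_sigma_au",
    ]
    for h, k, l, fp, sp, fm, sm in reflections:
        if fp is not None:
            lines.append(f"{h} {k} {l} {fp:.2f} {sp:.2f}")
        if fm is not None:
            lines.append(f"{-h} {-k} {-l} {fm:.2f} {sm:.2f}")
    return "\n".join(lines) + "\n"


print(old_style_refln_loop([(1, 2, 3, 105.0, 3.0, 95.0, 4.0)]))
```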

Now MTZ2VARIOUS treats the anomalous data differently, writing only a single reflection for each anomalous pair and using the following tokens to correspond to the columns in the MTZ file:

 _refln.F_meas_au         FP
 _refln.F_meas_sigma_au   SIGFP
 _refln.intensity_meas    I
 _refln.intensity_sigma   SIGI
 _refln.ccp4_SAD_F_meas_plus_au         F(+)
 _refln.ccp4_SAD_F_meas_plus_sigma_au   SIGF(+)
 _refln.ccp4_SAD_F_meas_minus_au        F(-)
 _refln.ccp4_SAD_F_meas_minus_sigma_au  SIGF(-)
 _refln.ccp4_SAD_phase_anom             DP
 _refln.ccp4_SAD_phase_anom_sigma       SIGDP
 _refln.ccp4_I_plus                     I(+)
 _refln.ccp4_I_plus_sigma               SIGI(+)
 _refln.ccp4_I_minus                    I(-)
 _refln.ccp4_I_minus_sigma              SIGI(-) 

This maps more closely onto the way that MTZ files store the same information.
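For comparison with the older layout, the sketch below writes one _refln row per reflection, holding the mean F in the standard tokens and F(+)/F(-) in the _refln.ccp4_SAD_... tokens listed above. It is a hypothetical illustration, not the program's code: the function names and the number formatting are assumptions, and missing values are written as the CIF "unknown" marker "?".

```python
from typing import Optional, Sequence


def fmt(value: Optional[float]) -> str:
    """Format a number for CIF, using '?' (unknown) for a missing value."""
    return "?" if value is None else f"{value:.2f}"


def new_style_refln_loop(reflections: Sequence[tuple]) -> str:
    """One row per reflection: (h, k, l, F, sigF, F(+), sigF(+), F(-), sigF(-))."""
    tokens = [
        "_refln.index_h", "_refln.index_k", "_refln.index_l",
        "_refln.F_meas_au", "_refln.F_meas_sigma_au",
        "_refln.ccp4_SAD_F_meas_plus_au", "_refln.ccp4_SAD_F_meas_plus_sigma_au",
        "_refln.ccp4_SAD_F_meas_minus_au", "_refln.ccp4_SAD_F_meas_minus_sigma_au",
    ]
    lines = ["loop_"] + tokens
    for h, k, l, *values in reflections:
        lines.append(" ".join([str(h), str(k), str(l)] + [fmt(v) for v in values]))
    return "\n".join(lines) + "\n"


print(new_style_refln_loop([(1, 2, 3, 100.0, 2.5, 105.0, 3.0, 95.0, 4.0),
                            (2, 0, 1, 80.0, 1.5, 80.0, 1.5, None, None)]))
```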

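The two ways of representing the anomalous data differ in how the standard _refln.F_meas_au etc. tokens are used: in the older form they hold F(+) or F(-) (or I(+)/I(-)) on each of two explicit reflections, whereas in the newer form they hold only the mean values, and the anomalous data are carried by the separate _refln.ccp4_... tokens.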

Note that the EBI have these tokens as part of the CIF exchange dictionary mmcif_ccp4.dic: this can be found at http://mmcif.pdb.org/dictionaries/mmcif_ccp4.dic/Index/index.html.

The RCSB convert these to equivalent tokens in the PDB exchange dictionary mmcif_pdbx.dic found at http://mmcif.pdb.org/dictionaries/mmcif_pdbx.dic/Index/index.html. The mappings are:

 _refln.ccp4_SAD_F_meas_plus_au        -> _refln.pdbx_F_plus
 _refln.ccp4_SAD_F_meas_plus_sigma_au  -> _refln.pdbx_F_plus_sigma
 _refln.ccp4_SAD_F_meas_minus_au       -> _refln.pdbx_F_minus
 _refln.ccp4_SAD_F_meas_minus_sigma_au -> _refln.pdbx_F_minus_sigma

 _refln.ccp4_I_plus        -> _refln.pdbx_I_plus
 _refln.ccp4_I_plus_sigma  -> _refln.pdbx_I_plus_sigma
 _refln.ccp4_I_minus       -> _refln.pdbx_I_minus
 _refln.ccp4_I_minus_sigma -> _refln.pdbx_I_minus_sigma
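The RCSB mapping is a straight rename of data names, which can be sketched as below. This is not the RCSB's conversion tool: the function is hypothetical and assumes every data name starts its own line (as in a loop_ header), so only the first word of each line is examined; a real CIF parser would be more robust. Spacing between a renamed name and a value on the same line is collapsed to one space.

```python
from typing import Dict

# Mapping copied from the table above.
CCP4_TO_PDBX: Dict[str, str] = {
    "_refln.ccp4_SAD_F_meas_plus_au": "_refln.pdbx_F_plus",
    "_refln.ccp4_SAD_F_meas_plus_sigma_au": "_refln.pdbx_F_plus_sigma",
    "_refln.ccp4_SAD_F_meas_minus_au": "_refln.pdbx_F_minus",
    "_refln.ccp4_SAD_F_meas_minus_sigma_au": "_refln.pdbx_F_minus_sigma",
    "_refln.ccp4_I_plus": "_refln.pdbx_I_plus",
    "_refln.ccp4_I_plus_sigma": "_refln.pdbx_I_plus_sigma",
    "_refln.ccp4_I_minus": "_refln.pdbx_I_minus",
    "_refln.ccp4_I_minus_sigma": "_refln.pdbx_I_minus_sigma",
}


def rename_tokens(cif_text: str, mapping: Dict[str, str]) -> str:
    """Rename data names that appear as the first word on a line."""
    out = []
    for line in cif_text.splitlines():
        parts = line.split(None, 1)
        # Exact match on the whole first word, so _refln.ccp4_I_plus does not
        # accidentally rewrite the start of _refln.ccp4_I_plus_sigma.
        if parts and parts[0] in mapping:
            indent = line[: len(line) - len(line.lstrip())]
            rest = " " + parts[1] if len(parts) > 1 else ""
            line = indent + mapping[parts[0]] + rest
        out.append(line)
    result = "\n".join(out)
    return result + "\n" if cif_text.endswith("\n") else result


text = "loop_\n_refln.index_h\n_refln.ccp4_I_plus\n_refln.ccp4_I_plus_sigma\n1 120.5 3.2\n"
print(rename_tokens(text, CCP4_TO_PDBX))
```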

Other Issues

1. In June 2003 the EBI requested that CCP4 change the tokens used explicitly for anomalous data, to make them more generic:

 _refln.ccp4_SAD_F_meas_plus_au        -> _refln.F_meas_plus
 _refln.ccp4_SAD_F_meas_plus_sigma_au  -> _refln.F_meas_plus_sigma
 _refln.ccp4_SAD_F_meas_minus_au       -> _refln.F_meas_minus
 _refln.ccp4_SAD_F_meas_minus_sigma_au -> _refln.F_meas_minus_sigma

 _refln.ccp4_I_plus        -> _refln.intensity_meas_plus
 _refln.ccp4_I_plus_sigma  -> _refln.intensity_meas_plus_sigma
 _refln.ccp4_I_minus       -> _refln.intensity_meas_minus
 _refln.ccp4_I_minus_sigma -> _refln.intensity_meas_minus_sigma

2. The EBI recognise the CCP4 tokens for Hendrickson-Lattman coefficients:

 _refln.ccp4_SAD_HL_A_iso               HLA
 _refln.ccp4_SAD_HL_B_iso               HLB
 _refln.ccp4_SAD_HL_C_iso               HLC
 _refln.ccp4_SAD_HL_D_iso               HLD

The RCSB also have equivalents for these in the PDB exchange dictionary:

 _refln.ccp4_SAD_HL_A_iso -> _refln.pdbx_HL_A_iso
 _refln.ccp4_SAD_HL_B_iso -> _refln.pdbx_HL_B_iso
 _refln.ccp4_SAD_HL_C_iso -> _refln.pdbx_HL_C_iso
 _refln.ccp4_SAD_HL_D_iso -> _refln.pdbx_HL_D_iso

3. mmCIF files can contain multiple datasets (indexed by crystal and wavelength). Although this maps well onto the MTZ data model, MTZ2VARIOUS does not currently support it.

4. SFCHECK doesn't recognise the _refln.ccp4_SAD_F_meas_plus_au etc tokens.
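Since the same F(+)/F(-) or I(+)/I(-) data can appear under CCP4, PDBx or the generic EBI names, and not every program understands all of them, it can help to check which set a file uses before passing it on. The helper below is a hypothetical sketch; its token lists are copied from the tables in this section and are not an exhaustive list of either dictionary.

```python
from typing import Dict, Iterable, Set

# Explicit anomalous F/I tokens, copied from the tables in this section.
ANOMALOUS_TOKENS: Dict[str, Set[str]] = {
    "ccp4": {
        "_refln.ccp4_SAD_F_meas_plus_au", "_refln.ccp4_SAD_F_meas_plus_sigma_au",
        "_refln.ccp4_SAD_F_meas_minus_au", "_refln.ccp4_SAD_F_meas_minus_sigma_au",
        "_refln.ccp4_I_plus", "_refln.ccp4_I_plus_sigma",
        "_refln.ccp4_I_minus", "_refln.ccp4_I_minus_sigma",
    },
    "pdbx": {
        "_refln.pdbx_F_plus", "_refln.pdbx_F_plus_sigma",
        "_refln.pdbx_F_minus", "_refln.pdbx_F_minus_sigma",
        "_refln.pdbx_I_plus", "_refln.pdbx_I_plus_sigma",
        "_refln.pdbx_I_minus", "_refln.pdbx_I_minus_sigma",
    },
    "generic": {
        "_refln.F_meas_plus", "_refln.F_meas_plus_sigma",
        "_refln.F_meas_minus", "_refln.F_meas_minus_sigma",
        "_refln.intensity_meas_plus", "_refln.intensity_meas_plus_sigma",
        "_refln.intensity_meas_minus", "_refln.intensity_meas_minus_sigma",
    },
}


def anomalous_conventions(data_names: Iterable[str]) -> Set[str]:
    """Return which token conventions ("ccp4", "pdbx", "generic") are present.

    An empty result means no explicit anomalous tokens were found: the loop
    may hold mean values only, or old-style explicit Friedel mates under
    _refln.F_meas_au.
    """
    names = set(data_names)
    return {label for label, tokens in ANOMALOUS_TOKENS.items() if names & tokens}


print(anomalous_conventions(["_refln.index_h", "_refln.F_meas_au", "_refln.ccp4_I_plus"]))
```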

Issues when converting from mmCIF to MTZ

The CCP4 program CIF2MTZ can be used to convert from mmCIF to MTZ. It was originally intended to convert mmCIF files from the PDB into MTZ files. There are two issues affecting the treatment of anomalous data:

1. If the mmCIF file contains the anomalous data in the RCSB representation (i.e. Friedel mates are explicitly given as separate reflections), then the CIF2MTZ program needs to be given the ANOMALOUS keyword in order to correctly convert the pairs of reflections back to the MTZ format.

2. If the mmCIF file contains the anomalous data in the non-standard token format (i.e. the _refln.ccp4_... tokens), then correct back-conversion is not possible because CIF2MTZ does not recognise these tokens.

This is an issue for CCP4 to resolve.
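Conceptually, converting explicit Friedel mates back to the MTZ layout means pairing each reflection with its mate and folding the two rows into one record. The sketch below is not CIF2MTZ's code and the function name is hypothetical. It makes a loud simplification: no space-group symmetry is applied, so only hkl and -h-k-l are paired, and the member that compares greater as a Python tuple is arbitrarily treated as the "+" reflection, which is not the MTZ asymmetric-unit convention.

```python
from typing import Dict, Iterable, Optional, Tuple

Index = Tuple[int, int, int]


def merge_friedel_mates(rows: Iterable[Tuple[int, int, int, float, float]]
                        ) -> Dict[Index, Dict[str, Optional[float]]]:
    """Fold explicit hkl / -h-k-l rows into one F(+)/F(-) record per pair."""
    merged: Dict[Index, Dict[str, Optional[float]]] = {}
    for h, k, l, f, sig in rows:
        hkl = (h, k, l)
        key = max(hkl, (-h, -k, -l))  # arbitrary choice of the '+' member
        rec = merged.setdefault(key, {"F(+)": None, "SIGF(+)": None,
                                      "F(-)": None, "SIGF(-)": None})
        if hkl == key:
            rec["F(+)"], rec["SIGF(+)"] = f, sig
        else:
            rec["F(-)"], rec["SIGF(-)"] = f, sig
    return merged


rows = [(1, 2, 3, 105.0, 3.0), (-1, -2, -3, 95.0, 4.0), (2, 0, 1, 80.0, 1.5)]
for index, record in merge_friedel_mates(rows).items():
    print(index, record)
```

An unpaired reflection, such as (2, 0, 1) in the usage above, keeps None for the missing mate, mirroring an absent F(-) column value in MTZ.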

Useful Links

mmCIF Resources (including the data dictionaries):

PDB_Extract: