June 2001
This arrangement has worked well, but ignores the fact that certain columns of data go naturally together. A calculated phase belongs with the corresponding calculated structure factor; the structure factors for different wavelengths of a MAD experiment are used together; the structure factors of a heavy-atom derivative are used to estimate phases for a native structure factor. In short, a set of columns does not describe the relationships implicit in the file.
Thus, the concepts of "project" and "dataset" were invented. A project contains all data used to create a model of the target structure. A dataset is the result of a single experiment contributing to the determination of the model (and may be represented by several columns of data). All columns in an MTZ file are assigned to a particular dataset in a particular project. This information was initially used for Data Harvesting, but has since been used in other contexts where inference of relationships between columns is useful (see my Newsletter article).
However, it has since been appreciated that this is not a complete data model of the reflection data. Such a model is described in the next section, and is based on ideas of Kevin Cowtan (see Kevin's page), Airlie McCoy and others.
File -> Crystal -> Dataset -> Datalist -> Column
A `Crystal' is essentially a single crystal form: usually there will be one crystal per derivative, unless a single derivative can crystallise in several cells (e.g. RT and frozen). A `Dataset' is a set of observations on a crystal. If data are collected at several wavelengths, each of these becomes a separate dataset. A `Datalist' is a grouping of associated columns. Thus a single list will hold both F and SigF. Another list holds all four Hendrickson-Lattman coefficients.
Each data list is linked to one of the datasets and each dataset is linked to one of the crystals. There may be several data lists per dataset and several datasets per crystal. Some data lists may be linked to the base dataset (which is just a placeholder) and thus to the base crystal. This will be the case for synthetic data and data types such as the FreeRflag.
The project is now simply an attribute of the crystal. It will still be used for the purposes of Data Harvesting, but does not form part of the hierarchy.
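The linkage described above (column to data list, data list to dataset, dataset to crystal, with the project held as a crystal attribute) can be sketched in C as follows. This is only an illustration: the type names, field names and the path format are assumptions made for this example, and are not the definitions found in mtzdata.h or the behaviour of any function in mtzlib.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not taken from mtzdata.h. */
typedef struct {
    const char *name;
    const char *project;   /* the project is just an attribute of the crystal */
} Crystal;

typedef struct {
    const char *name;
    const Crystal *xtal;   /* each dataset is linked to one crystal */
} Dataset;

typedef struct {
    const char *name;
    const Dataset *set;    /* each data list is linked to one dataset */
} Datalist;

typedef struct {
    const char *label;
    const Datalist *list;  /* each column belongs to one data list */
} Column;

/* Follow the links upwards from a column to its crystal. */
const Crystal *column_crystal(const Column *col)
{
    assert(col && col->list && col->list->set);
    return col->list->set->xtal;
}

/* Write "/crystal/dataset/datalist/column" into buf (assumed path format).
   Returns the value of snprintf, i.e. the untruncated length. */
int column_path(const Column *col, char *buf, size_t n)
{
    const Crystal *xtal = column_crystal(col);
    assert(xtal != NULL);
    return snprintf(buf, n, "/%s/%s/%s/%s",
                    xtal->name, col->list->set->name,
                    col->list->name, col->label);
}
```

In this sketch, a column F placed in a data list "amplitudes" of a dataset "peak" on a crystal "native" would be reported with the path /native/peak/amplitudes/F, and its project would be found by following the links up to the crystal. Synthetic data and flags such as FreeRflag would be attached in the same way to a placeholder base dataset and base crystal.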
The relevant records are:
This is clearly not a perfect representation of the data model. Some information is duplicated: for example, all datasets belonging to the same crystal must have the same DCELL parameters recorded. However, it represents a simple extension of the existing MTZ format, and thus can be implemented without breaking existing software or rendering existing data obsolete. All keywords except CRYSTAL are already implemented in CCP4 4.1.
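Because the cell is stored once per dataset rather than once per crystal, a reader may want to confirm that the duplicated values agree. The following is a minimal sketch of such a check. The DatasetRecord structure, the function name and the tolerance argument are all hypothetical, chosen for this example, and do not correspond to anything in the CCP4 libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-dataset header information, for illustration only. */
typedef struct {
    const char *crystal;   /* name of the crystal the dataset belongs to */
    const char *dataset;   /* dataset name */
    float cell[6];         /* a, b, c, alpha, beta, gamma */
} DatasetRecord;

/* Return the index of the first dataset whose cell differs by more than
   tol from an earlier dataset of the same crystal, or -1 if all agree. */
int first_cell_mismatch(const DatasetRecord *rec, size_t n, float tol)
{
    assert(rec != NULL || n == 0);
    for (size_t i = 1; i < n; ++i) {
        for (size_t j = 0; j < i; ++j) {
            if (strcmp(rec[i].crystal, rec[j].crystal) != 0)
                continue;                      /* different crystals may differ */
            for (int k = 0; k < 6; ++k) {
                float d = rec[i].cell[k] - rec[j].cell[k];
                if (d < 0.0f)
                    d = -d;
                if (d > tol)
                    return (int)i;
            }
        }
    }
    return -1;
}
```

Datasets on different crystals are deliberately not compared, since distinct crystal forms are expected to have distinct cells.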
Available files are:
mtzdata.h   defines basic MTZ data structure
mtzlib.c    C library for MTZ i/o and manipulating MTZ data structure
library.c   library.c for C programs (made robust by Charles Ballard)
symlib.c    C versions of symfr3 and symtr3
ccplib.c    C versions of ccperr, ccpfyp, ccprcs (mainly from Pete Briggs)
cparser.c   C version of parser (from Pete Briggs)

In more detail, mtzlib.c contains the following functions:
MTZ *MtzGet
void CmtzRrefl
void MtzPut
void CmtzWhdrLine
void CmtzWrefl

MTZ *MtzMalloc
void MtzFree
MTZCOL *MtzMallocCol
void MtzFreeCol
MTZBAT *MtzMallocBatch
void MtzFreeBatch
char *MtzCallocHist
void MtzFreeHist

int MtzNbat

MTZXTAL *MtzAddXtal
char *MtzXtalPath
MTZXTAL *MtzXtalLookup

MTZSET *MtzAddDataset
int MtzNset
MTZXTAL *MtzSetXtal
char *MtzSetPath
MTZSET *MtzSetLookup

MTZCOL *MtzAddColumn
MTZSET *MtzColSet
int MtzNcol
char *MtzColPath
void MtzRJustPath
int MtzPathMatch
MTZCOL *MtzColLookup
int MtzListColumn

float ind2reso
void hklcoeffs
void MtzArrayToBatch
void MtzBatchToArray

void clrtitl
int clrhist
void clrinfo
void clrsort
void clrbats
void clrcell
void clrrsol
void clrsymi
void clrsymm
int MtzParseLabin
MTZCOL *clrassn
void clridx
int clrrefl
int clrreff
int ccp4_ismnf
void clhprt
void clhprt_adv
void clrbat
void clwtitl
void clwsort
int clwhist
void clwassn
void clwidx
void clwidas
void clwrefl
void clwbat
void clwbsetid