List of CCP4 variables

Martyn Winn March 2001.

This page is a dynamic document! Send me complaints about existing items, and suggestions for additions. With enough feedback, we might converge on an agreed list.

Preamble

To progress through structure solution, one needs to keep track of the quantities that have been measured or calculated, and of how one reached the current point in the process. This information is stored via:
  1. Standard file formats: MTZ, map and PDB.
  2. ccp4i: .def files
  3. Your notebook or your memory
The last method is clearly unsatisfactory and we wish to bring some order to the chaos. This is particularly important in view of moves to increased automation.

A first step is to list all the quantities that are included in the last category. This will enable us to set up proper methods of handling such data. These methods will probably be extended to the variables held in ccp4i .def files, so we will include those too. But for the time being, we will not consider variables held in the standard files. The aim is to be able to run a program automatically using these variables and the input files.

The following is a list of these variables. The names reflect a possible hierarchy, with levels separated by "::". The ending _i implies there may be several such items. I have no particular implementation in mind (full database, XML, mmCIF, ASCII in log file, etc.). This list should be made into a dictionary, with explicit definitions.
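
As an illustration of the naming convention only, the following Python sketch splits a variable name into its levels and marks the levels ending in _i as repeatable. The function names and the concrete numbered form (crystal_2, dataset_1) are assumptions made for this example; the list itself does not prescribe how individual items are numbered.

```python
def parse_variable(name):
    """Split a '::'-separated name into (level, repeatable) pairs.

    A level ending in '_i' is treated as one that may occur several times.
    """
    parts = name.split("::")
    return [(p[:-2], True) if p.endswith("_i") else (p, False) for p in parts]


def instantiate(name, indices):
    """Replace each '_i' level, left to right, with a concrete index.

    The 'level_N' spelling is hypothetical, chosen only for this sketch.
    """
    remaining = iter(indices)
    out = []
    for level, repeatable in parse_variable(name):
        out.append(f"{level}_{next(remaining)}" if repeatable else level)
    return "::".join(out)


if __name__ == "__main__":
    print(parse_variable("data::crystal_i::dataset_i::wavelength"))
    print(instantiate("data::crystal_i::dataset_i::wavelength", [2, 1]))
```

Note that a name such as wilson_B_iso is not mistaken for a repeatable level, because only a trailing "_i" is tested.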


Project administration

These are based on database.def and give book-keeping information for individual jobs.

ccp4i::njobs
ccp4i::job_i::status
ccp4i::job_i::date
ccp4i::job_i::logfile
ccp4i::job_i::scratch
ccp4i::job_i::taskname
ccp4i::job_i::title
ccp4i::job_i::input_files
ccp4i::job_i::input_files_dir
ccp4i::job_i::output_files
ccp4i::job_i::output_files_dir

The list of jobs actually forms a web, with perhaps many jobs providing information for a particular job, and that job perhaps providing information for several more jobs. There will also be many dead-ends. It may be difficult to re-create this web automatically, but we would at least need something like:

ccp4i::job_i::previous_jobs
ccp4i::job_i::next_jobs
ccp4i::job_i::success_or_fail
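
One way to picture the web of jobs: if each job records only its previous_jobs, the next_jobs entries can be derived from them, and jobs that feed nothing further show up as candidates for dead-ends. The sketch below assumes jobs are identified by integers and that previous_jobs is available as a Python dictionary; neither is specified by ccp4i, and the job numbers are invented for the example.

```python
from collections import defaultdict


def link_jobs(previous_jobs):
    """Derive next_jobs from previous_jobs.

    previous_jobs maps each job id to the list of jobs it took input from.
    Returns (next_jobs, terminal_jobs), where terminal_jobs are the jobs
    that no later job depends on.
    """
    next_jobs = defaultdict(list)
    for job, parents in previous_jobs.items():
        for parent in parents:
            next_jobs[parent].append(job)
    terminal_jobs = sorted(j for j in previous_jobs if j not in next_jobs)
    return dict(next_jobs), terminal_jobs


if __name__ == "__main__":
    # Hypothetical project: job 1 feeds jobs 2, 3 and 5; jobs 2 and 3 feed job 4.
    previous = {1: [], 2: [1], 3: [1], 4: [2, 3], 5: [1]}
    forward, terminal = link_jobs(previous)
    print(forward)
    print(terminal)
```

A terminal job is either a dead-end or the current end point of the project; telling the two apart would need further information, such as success_or_fail.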

We need some form of error handling. Each job should produce:

ccp4i::job_i::error_level_number           # e.g. 0,1,2,3,4
ccp4i::job_i::error_level_severity         # e.g. OK, fatal, warning, info, library
ccp4i::job_i::error_program_name
ccp4i::job_i::error_message

Each program should produce some standard information whether or not it is run through ccp4i. When it is run through ccp4i, these would be used to populate the job database.

program::name
program::ccp4_version
program::date
program::status
program::keywords

Data processing

These are from Harry. Many of these go into the batch header of the multi-record MTZ file.

process::crystal::size
process::crystal::colour
process::crystal::data_collection_temperature
process::crystal::oscillation_start
process::crystal::oscillation_end
process::crystal::oscillation_range
process::crystal::oscillation_direction
process::crystal::goniostat_orientation

process::radiation::wavelength
process::radiation::divergence_vertical
process::radiation::divergence_horizontal
process::radiation::polarization
process::radiation::synchrotron_or_lab
process::radiation::lab_site

process::detector::position::crystal_detector_distance
process::detector::position::swing_angle
process::detector::position::beam_position_x
process::detector::position::beam_position_y

process::detector::measurement::num_pixels_x
process::detector::measurement::num_pixels_y
process::detector::measurement::size_pixels_x
process::detector::measurement::size_pixels_y
process::detector::measurement::orientation
process::detector::measurement::overload_value
process::detector::measurement::gain

process::detector::mis_setting::twist
process::detector::mis_setting::tilt
process::detector::mis_setting::bulge
process::detector::mis_setting::radial_offset
process::detector::mis_setting::tangential_offset

Experimental data

This needs to tie in closely with the MTZ++ hierarchy. Much of this information will be held in the data file, but unless the environment (ccp4i, intelligent user, etc.) can access the data structure directly, it will need to be translated to these variables.

See example below.


data::mtzfile::filename
data::mtzfile::title
data::mtzfile::num_reflections
data::mtzfile::missing_number_flag
data::mtzfile::spacegroup_num             # symops obtained from ERF i.e. symop.lib
data::mtzfile::sort_order                 # in particular, whether we need sortmtz

data::crystal_i::project_name
data::crystal_i::crystal_name
data::crystal_i::cell
data::crystal_i::dataset_i::dataset_name
data::crystal_i::dataset_i::wavelength
data::crystal_i::dataset_i::column_i::label
data::crystal_i::dataset_i::column_i::type

data::crystal_i::dataset_i::labels::fobs            # These are file labels which
data::crystal_i::dataset_i::labels::sigfobs         # have been identified, and can
data::crystal_i::dataset_i::labels::fobs_plus       # be passed to future jobs.
data::crystal_i::dataset_i::labels::sigfobs_plus    # I.e. these are known columns
data::crystal_i::dataset_i::labels::fobs_minus      # rather than arbitrary columns
data::crystal_i::dataset_i::labels::sigfobs_minus   # from a file.
data::crystal_i::dataset_i::labels::phase
data::crystal_i::dataset_i::labels::fom

data::crystal_i::dataset_i::num_centric
data::crystal_i::dataset_i::num_acentric

data::crystal_i::dataset_i::f_prime
data::crystal_i::dataset_i::f_double_prime

data::crystal_i::dataset_i::wilson_B_iso        # from Wilson plot

data::crystal_i::dataset_i::pseudo_trans_u      # from Patterson
data::crystal_i::dataset_i::pseudo_trans_v
data::crystal_i::dataset_i::pseudo_trans_w

data::crystal_i::dataset_i::twinning::operator
data::crystal_i::dataset_i::twinning::fraction
data::crystal_i::dataset_i::twinning::acentric_moment_2
data::crystal_i::dataset_i::twinning::cumul_I_flag         # Just a Boolean flag

data::crystal_i::dataset_i::anisotropy_ratio
data::crystal_i::dataset_i::res_for_anisotropy_ratio

Experimental phasing


mir::rmsdiff_iso::dataset1
mir::rmsdiff_iso::dataset2
mir::rmsdiff_iso::num_reflections
mir::rmsdiff_iso::value
mir::rmsdiff_iso::Rfactor
mir::rmsdiff_iso::norm_prob::centric
mir::rmsdiff_iso::norm_prob::acentric
mir::rmsdiff_ano::dataset
mir::rmsdiff_ano::value

Molecular Replacement

Here, one might have several search models (which may only differ in having one loop chopped off, for example). For each search model, one might have several solutions, with reference to particular data.

mr::model_i::filename                  # coordinate file
mr::model_i::name
mr::model_i::details                   # e.g. of side-chain trimming

mr::model_i::solution_i::goodness      # possibly several versions of this!
mr::model_i::solution_i::space_group   # can be unresolved at time of mr
mr::model_i::solution_i::euler_alpha
mr::model_i::solution_i::euler_beta
mr::model_i::solution_i::euler_gamma
mr::model_i::solution_i::fract_x
mr::model_i::solution_i::fract_y
mr::model_i::solution_i::fract_z
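
The nesting of solutions within search models can be sketched as a nested Python structure that is flattened into names of the above form. This is only an illustration: the flatten function, the numbered spelling (model_1, solution_2) and all the values below are invented for the example.

```python
def flatten(prefix, node):
    """Yield (variable_name, value) pairs from nested dicts and lists.

    Dict keys become '::'-separated levels; list items are numbered from 1,
    standing in for the '_i' levels of the scheme.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(f"{prefix}::{key}", child)
    elif isinstance(node, list):
        for i, child in enumerate(node, start=1):
            yield from flatten(f"{prefix}_{i}", child)
    else:
        yield prefix, node


# Placeholder values only: one search model with two candidate solutions.
mr = {
    "model": [
        {
            "filename": "search_model.pdb",
            "solution": [
                {"euler_alpha": 10.0, "fract_x": 0.10},
                {"euler_alpha": 95.0, "fract_x": 0.25},
            ],
        }
    ]
}

if __name__ == "__main__":
    for name, value in flatten("mr", mr):
        print(name, value)
```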

Model

Again, some of this could be held in a new coordinate data structure.

model::procheck_resolution
model::matthews_coefficient
model::fraction_solvent_content

model::entity::number_residues
model::entity::molecular_weight

model::asymmetric::number_molecules

model::ncs_i::euler_alpha
model::ncs_i::euler_beta
model::ncs_i::euler_gamma
model::ncs_i::polar_omega
model::ncs_i::polar_phi
model::ncs_i::polar_kappa
model::ncs_i::translate_x
model::ncs_i::translate_y
model::ncs_i::translate_z
model::ncs_i::mask

model::substructure_i::atom_i::fract_x
model::substructure_i::atom_i::fract_y
model::substructure_i::atom_i::fract_z
model::substructure_i::atom_i::occ
model::substructure_i::atom_i::B_iso
model::substructure_i::atom_i::B_aniso
model::substructure_i::atom_i::aocc
model::substructure_i::atom_i::aB_iso

Example 1: marked-up mtzdump output

A routine in the new mtzlib C libraries will write out mtzdump-style information marked up using the above scheme. You get something like:

 * File information :

data::mtzfile::title       GerE native and MAD.                                                   
data::mtzfile::spacegroup_num       5
data::mtzfile::num_reflections       25739
data::mtzfile::missing_number_flag       NaN
data::mtzfile::sort_order       (not implemented)

 * Crystals, datasets :

data::crystal::crystal_name       native
data::crystal::project_name       Gere
data::crystal::cell         108.8420   61.7790   71.7520   90.0000   97.2510   90.0000

    data::crystal::dataset::dataset_name       native
    data::crystal::dataset::wavelength          1.40000

        data::crystal_i::dataset_i::column_i::label data::crystal_i::dataset_i::column_i::type
                     H                               H
                     K                               H
                     L                               H
                     FP                              F
                     SIGFP                           Q
                     FreeR_flag                      I

data::crystal::crystal_name       Se_met_deriv
data::crystal::project_name       Gere
data::crystal::cell         108.7420   61.6790   71.6520   90.0000   97.1510   90.0000

    data::crystal::dataset::dataset_name       SEinfl
    data::crystal::dataset::wavelength          0.98100

        data::crystal_i::dataset_i::column_i::label data::crystal_i::dataset_i::column_i::type
                     FSEinfl                         F
                     SIGFSEinfl                      Q
                     DSEinfl                         D
                     SIGDSEinfl                      Q
                     F(+)SEinfl                      F
                     SIGF(+)SEinfl                   Q
                     F(-)SEinfl                      F
                     SIGF(-)SEinfl                   Q

    data::crystal::dataset::dataset_name       SEpeak
    data::crystal::dataset::wavelength          0.98000

        data::crystal_i::dataset_i::column_i::label data::crystal_i::dataset_i::column_i::type
                     FSEpeak                         F
                     SIGFSEpeak                      Q
                     DSEpeak                         D
                     SIGDSEpeak                      Q
                     F(+)SEpeak                      F
                     SIGF(+)SEpeak                   Q
                     F(-)SEpeak                      F
                     SIGF(-)SEpeak                   Q


We would then change the tcl procedure ExtractMTZData to search for particular tags rather than assuming a particular layout.
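
To show what searching for tags rather than assuming a layout might look like, here is a minimal sketch in Python (not the tcl of ExtractMTZData itself, whose internals are not described here). The function name extract_tag is hypothetical, and the sample text is a shortened copy of the output above.

```python
def extract_tag(text, tag):
    """Return every value that follows `tag` at the start of a line.

    Leading indentation and the amount of space between tag and value are
    ignored, so the result does not depend on column positions.
    """
    values = []
    for line in text.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == tag:
            values.append(fields[1].strip())
    return values


sample = """
 * File information :

data::mtzfile::title       GerE native and MAD.
data::mtzfile::spacegroup_num       5
data::mtzfile::num_reflections       25739

 * Crystals, datasets :

data::crystal::crystal_name       native
data::crystal::cell         108.8420   61.7790   71.7520   90.0000   97.2510   90.0000
"""

if __name__ == "__main__":
    print(extract_tag(sample, "data::mtzfile::spacegroup_num"))
    cell = [float(x) for x in extract_tag(sample, "data::crystal::cell")[0].split()]
    print(cell)
```

Because a tag can occur more than once (one crystal_name per crystal, for instance), the function returns a list rather than a single value.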

Some information is still dependent on position, for instance datasets belong to the last crystal specified. It depends on how complex you want these tags to get.
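
The position dependence can be handled by a reader that remembers the most recent crystal, as in this Python sketch. The function name and the returned structure are assumptions for illustration; the tags and names are taken from the example output above.

```python
def group_datasets(text):
    """Attach each dataset name to the most recently named crystal."""
    crystals = []
    for line in text.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        tag, value = fields[0], fields[1].strip()
        if tag == "data::crystal::crystal_name":
            # A new crystal starts; later datasets belong to it.
            crystals.append({"name": value, "datasets": []})
        elif tag == "data::crystal::dataset::dataset_name" and crystals:
            crystals[-1]["datasets"].append(value)
    return crystals


listing = """
data::crystal::crystal_name       native
    data::crystal::dataset::dataset_name       native
data::crystal::crystal_name       Se_met_deriv
    data::crystal::dataset::dataset_name       SEinfl
    data::crystal::dataset::dataset_name       SEpeak
"""

if __name__ == "__main__":
    for crystal in group_datasets(listing):
        print(crystal["name"], crystal["datasets"])
```

Making the tags themselves carry the crystal and dataset identity would remove the need for this bookkeeping, at the cost of longer tags.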


m.d.winn@ccp4.ac.uk
Last modified: Tue Mar 6 17:19:30 GMT 2001