Contents
Downloading Data
Reading the Coordinate File
Monomer library files
PDB File Remediation - Summer 2007
Atom energy type and properties
Residue name synonyms
Residue types
Bonds
Saving additional data
See also Coordinate Model Interface
CCP4 Molecular Graphics Documentation
Reading Coordinate Files and Atom Typing
CCP4mg can download and display coordinate files from the coordinate deposition sites (RCSB, EBI or PDBj). Access this via the Download data option on the File pull-down menu. Enter the four-character code for the required file and click the Download button. By default the downloaded file is written to your current project directory; alternatively, you can enter a name for the new file.
The choice of browser and URLs for web pages and downloads can be changed in the Preferences window: choose Download options from the Tools folder.
CCP4mg reads model coordinates from PDB or mmCIF format files. The program does some basic analysis of the structure and reports any problems. The report can later be accessed from the Structure definition sub-menu of the model icon menu. Most of the causes of warnings will not prevent the molecule from being displayed.
Coordinate files should follow the file format guide from www.wwpdb.org/documentation/format30/index.html. CCP4mg also expects that each atom in the file will have a unique identifier where the identifier is composed of: chain id, residue sequence number, residue insertion code, atom name and alternate location indicator. CCP4mg does not use the atom serial number or the segment identifier. If the element symbol is not present then CCP4mg attempts to deduce the element type from the atom name.
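The uniqueness requirement above can be checked with a short script. The following is a minimal sketch, not CCP4mg's own code: it assumes fixed-column PDB ATOM/HETATM records, and the function names atom_identifier and duplicate_identifiers are hypothetical.

```python
from collections import Counter


def atom_identifier(line):
    """Build the identifier described above from one fixed-column PDB
    ATOM/HETATM record. The serial number and segment id are ignored."""
    return (
        line[21:22].strip(),   # chain id, column 22
        int(line[22:26]),      # residue sequence number, columns 23-26
        line[26:27].strip(),   # residue insertion code, column 27
        line[12:16].strip(),   # atom name, columns 13-16
        line[16:17].strip(),   # alternate location indicator, column 17
    )


def duplicate_identifiers(pdb_lines):
    """Return every identifier that occurs more than once."""
    ids = [
        atom_identifier(line)
        for line in pdb_lines
        if line.startswith(("ATOM", "HETATM"))
    ]
    return [ident for ident, count in Counter(ids).items() if count > 1]
```

Passing the lines of a PDB file to duplicate_identifiers gives the identifiers that would clash; an empty list means every atom is uniquely identified under this scheme.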
CCP4mg attempts to cross-reference the residues and atoms in the structure to a library of monomers. The monomer library provides:
The protocol for matching a residue in the structure to a monomer library residue is to first
try to match the residue name and then to match the atoms.
To match the residue name..
To match the atoms..
If no match is found then a warning is added to the file loading warning messages and the atoms are assigned a generic type based on their element type. This is generally not a problem; the most likely consequence is misinterpreting the hydrogen bonding capability of oxygen or nitrogen atoms.
The ideal geometry information for a monomer can be taken from one of two sources:
1) The REFMAC5 monomer library (Vagin et al.), which contains the structure definitions used in REFMAC5
refinement. The data are in the directory ccp4mg/data/monomer_library. Each monomer in the library is recorded in a separate file called MON.cif (where MON is the name of the monomer). These files are organised by the initial letter of their name into sub-directories named a to z. You can view the content of a monomer library file using the List monomer definition option from the Tools pull-down menu.
2) The user can provide a monomer library file containing definitions of novel structures which are not
in the database. The file can be generated using LibCheck via the CCP4i Sketcher interface
(see $CCP4/ccp4i/help/modules/sketcher.html and $CCP4/html/intro_mon_lib.html)
Novel monomer library files or modifications to the standard files should be placed in the
directory your_home_directory/.CCP4MG/data/monomer_library. CCP4mg will
use monomer definitions in this directory in preference to those in the standard distribution
library.
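The lookup order described above can be sketched as follows. This is an illustration only: the function name monomer_file is hypothetical, the two library roots are passed in by the caller, and it is an assumption that the user directory uses the same a to z sub-directory layout as the standard library.

```python
from pathlib import Path


def monomer_file(name, user_lib, standard_lib):
    """Return the path of MON.cif for a monomer, or None if it is missing.

    The user library is searched before the standard distribution library.
    Assumption: both roots hold sub-directories named after the lower-case
    initial letter of the monomer name.
    """
    relative = Path(name[0].lower()) / f"{name}.cif"
    for root in (Path(user_lib), Path(standard_lib)):
        candidate = root / relative
        if candidate.is_file():
            return candidate
    return None
```

For example, monomer_file("ALA", user_lib, standard_lib) would look for a/ALA.cif under the user directory first and fall back to the standard library only if that file does not exist.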
The wwPDB has completed an exercise of cleaning up and standardising residue and atom names to IUPAC conventions. This creates a problem for programs such as CCP4mg, which must support data files that follow two distinct conventions. For monomers whose atom names changed in the remediation, the REFMAC5 monomer library has been modified to include an alt_atom_id attribute for each atom: alt_atom_id holds the old atom name and atom_id holds the remediated name. Note that these changes mean that monomer libraries from before CCP4mg 1.1 are incompatible with CCP4mg 1.1 and subsequent program versions. The program will attempt to match the atom names in a residue using either the atom_id or the alt_atom_id names, but will not consider a mix of the two.
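The "one convention or the other, never a mix" rule can be illustrated with a small sketch. It is not the program's actual matching code: the function name naming_convention is hypothetical, and it assumes that a residue "matches" when all of its atom names are found in the monomer definition under a single convention.

```python
def naming_convention(residue_atom_names, monomer_atoms):
    """Decide which naming convention a residue follows.

    monomer_atoms is a list of (atom_id, alt_atom_id) pairs taken from a
    monomer definition. Returns "atom_id", "alt_atom_id", or None when
    neither convention alone accounts for every atom name in the residue.
    """
    observed = set(residue_atom_names)
    for label, index in (("atom_id", 0), ("alt_atom_id", 1)):
        known = {pair[index] for pair in monomer_atoms}
        if observed <= known:
            return label
    return None
```

With illustrative pairs such as ("O2'", "O2*") and ("C1'", "C1*"), a residue containing O2' and C1' follows atom_id, one containing O2* and C1* follows alt_atom_id, and one containing O2' together with C1* is rejected.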
Note that for old style files with residue names 'A','C','G' the program checks for the presence of 'O2*' to decide whether a residue is DNA or RNA and then uses the monomer library files named 'AD','CD','GD' or 'AR','CR','GR' as appropriate.
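The DNA/RNA decision described above amounts to a single test on the atom names. A minimal sketch, with a hypothetical function name and with residue names outside 'A', 'C', 'G' passed through unchanged as an assumption:

```python
def nucleotide_monomer_name(residue_name, atom_names):
    """Map an old-style one-letter nucleotide name to a monomer library name.

    The presence of the O2* atom marks the residue as RNA, so the 'R'
    variant is chosen; otherwise the 'D' (DNA) variant is used.
    """
    if residue_name not in ("A", "C", "G"):
        return residue_name  # assumption: other names are left untouched
    suffix = "R" if "O2*" in atom_names else "D"
    return residue_name + suffix
```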
The List monomers option on the Tools menu can be used to view the monomer library files.
CCP4mg uses the same atom energy types as REFMAC5. These have code names such as
'CT' or 'NR56' and are used to look up properties such as the atom hydrogen bonding capability or charge for surface potential calculations. The data is in the file
ccp4mg/data/ener_lib.cif
If you wish to modify this file you should make a copy to
your_home_directory/.CCP4MG/data/ener_lib.cif
If this file exists then CCP4mg will use it in preference to the standard distribution file. Beware that the ener_lib.cif used with CCP4mg may not be identical to that used by REFMAC5 - particularly, at the time of writing, it has additional information on charges for surface potential calculations.
Sometimes the name of a residue used in a coordinate file does not match the name used in the REFMAC5 monomer library. To help handle this the monomer library has a list of commonly used synonyms (i.e. alternative names) for residues in the file ccp4mg/data/mon_lib_list (look for 'synonym' to find the list). If you have a coordinate file with an unrecognised residue name you can either edit the mon_lib_list file to include the alternative name or you can use the Residue type assignment interface (on the Structure definition sub-menu of the model icon menu) to enter a synonym for a residue name. Additions to the mon_lib_list file will apply to all models loaded into CCP4mg but synonyms entered in the Residue type assignment interface will apply to just the one model.
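The difference in scope between the two kinds of synonym can be sketched as two lookup tables. This is an illustration under assumptions: the function name is hypothetical, the synonym tables are plain dictionaries rather than parsed mon_lib_list data, and checking the per-model table first is an assumed precedence that the text above does not state.

```python
def canonical_residue_name(name, library_synonyms, model_synonyms=None):
    """Resolve a residue name through per-model and library-wide synonyms.

    library_synonyms applies to every model; model_synonyms holds entries
    made for one model only. Assumption: the per-model table wins.
    """
    for table in (model_synonyms or {}, library_synonyms):
        if name in table:
            return table[name]
    return name  # no synonym known, keep the name as given
```

For instance, with an illustrative library table {"WAT": "HOH"}, the name WAT resolves to HOH for every model, while a per-model entry affects only the model it was entered for.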
Other data taken from files in the monomer library are tabulated below.
Data | CIF category | File |
Monomer name synonyms | _chem_comp_synonym | mon_lib_list |
Inter-residue links | _chem_link | mon_lib_list |
Properties of atoms | _lib_atom | ener_lib.cif |
The type of a residue (e.g. nucleic acid, solvent etc.) is necessary information for several program features, for example
The residue type is a function of the residue name (e.g. residue name 'ALA' implies type amino acid and residue name 'H2O' implies type solvent). The residue type is included in the REFMAC5 monomer library definition (see above). CCP4mg recognises the types:
CCP4mg residue type | REFMAC5 monomer library type(s) |
amino acid | L-peptide,D-peptide,peptide |
nucleic acid | DNA,RNA,DNA/RNA |
saccharide | saccharide,pyranose,D-saccharid,L-saccharid |
solvent | solvent |
solute | non-polymer |
monomer | non-polymer |
If an imported coordinate file contains a residue which is not included in the REFMAC5 monomer library then it will not be properly recognised for atom selection. Another potential problem is that when protein or nucleic acid backbone is drawn as a ribbon there will be a gap in the ribbon at the site of an unknown residue type. One solution to this problem is to generate a monomer library file but a quicker option is to use the Residue type assignment option from the Structure definition sub-menu on the model icon menu. In the residue type assignment window you should type in the name of the residue and choose a residue type from the menu. The definitions entered here will override any definitions taken from the monomer library and will be saved in the .ccp4mg file to be restored in subsequent program runs.
The Residue type assignment interface can also be used to enter a residue name synonym.
CCP4mg deems atoms to be bonded if:
The residue and atoms are matched to a monomer library file and the atoms are listed as bonded in that file.
The residue does not match any in the monomer library and the inter-atomic distance is less than the sum of the bonding radius for the two atoms multiplied by a safety factor of 1.2. The bonding radius is dependent on the atom element type and is taken from the file ccp4mg/data/elements.cif.
The atoms in different residues are closer than 2.4 Å and are listed as possible inter-residue links in the file ccp4mg/data/mon_lib_list.
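The two distance criteria above can be written out as a short sketch. The function names are hypothetical, the bonding radii are supplied by the caller (the real values live in elements.cif and are not reproduced here), and listed_links stands in for the link list in mon_lib_list.

```python
import math

SAFETY_FACTOR = 1.2   # multiplier applied to the sum of the bonding radii
LINK_CUTOFF = 2.4     # distance limit for inter-residue links


def within_bonding_distance(pos_a, pos_b, radius_a, radius_b):
    """Distance test used when a residue matches no monomer library entry."""
    return math.dist(pos_a, pos_b) < (radius_a + radius_b) * SAFETY_FACTOR


def is_inter_residue_link(pos_a, pos_b, atom_pair, listed_links):
    """Atoms in different residues are linked only if they are close enough
    and the pair is listed as a possible link."""
    return math.dist(pos_a, pos_b) < LINK_CUTOFF and atom_pair in listed_links
```

As a worked case with an illustrative radius of 0.77 for both atoms, the threshold is (0.77 + 0.77) x 1.2 = 1.848, so two atoms 1.5 apart count as bonded and two atoms 2.0 apart do not.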
Bonds can be added or deleted using the Add/delete bonds option on the Structure definition sub-menu of the model icon menu. Note also that if you want to show bonds or other inter-atomic interactions the Vectors graphical object may be useful.
By default inter-residue bonds are drawn if atoms in different residues are closer than 2.4 Å and are listed as possible inter-residue links in the file ccp4mg/data/mon_lib_list (in the section LIST OF LINKS). Your structure may well have inter-residue bonds that are not listed in this file, particularly if it contains saccharides. If only a small number of bonds are missing, they can be added with the Add/delete bonds tool described above. If many links are missing, use the Add inter-residue links tool on the Structure definition sub-menu of the model icon menu.
CCP4mg needs to save data associated with a particular coordinate file, for example user specified bonds, secondary structure and atom selection aliases. These are saved to a file in the same directory as the coordinate file and with the same name except that the file extension 'pdb' or 'cif' is replaced by 'ccp4mg'. Beware that moving or renaming the coordinate file without treating the ccp4mg file similarly may lead to loss of useful data.
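The naming rule above makes it easy to find the companion file for a given coordinate file, for example when moving both together. A minimal sketch with a hypothetical function name; treating the extension case-insensitively and returning None for other extensions are assumptions.

```python
from pathlib import Path


def ccp4mg_data_file(coordinate_file):
    """Return the path of the companion .ccp4mg file for a coordinate file.

    Only the 'pdb' and 'cif' extensions are handled, as described above.
    Assumption: the extension is compared case-insensitively.
    """
    path = Path(coordinate_file)
    if path.suffix.lower() in (".pdb", ".cif"):
        return path.with_suffix(".ccp4mg")
    return None
```

Under these assumptions, ccp4mg_data_file("/data/1abc.pdb") yields /data/1abc.ccp4mg, which is the file to move or rename alongside the coordinate file.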
A. A. Vagin, R. A. Steiner, A. A. Lebedev, L. Potterton, S. McNicholas, F. Long and G. N. Murshudov (2004). REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use. Acta Cryst. D60, 2184-2195.