MRBUMP (CCP4: Supported Program)
NAME
MRBUMP
- automated search model generation and automated molecular replacement
SYNOPSIS
mrbump hklin foo_in.mtz seqin foo.seq hklout foo_out.mtz xyzout foo.pdb
[Key-worded input file]
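The synopsis can also be driven from a script. The following is a minimal Python 3 sketch, assuming that `mrbump` is on the PATH and that, as with other CCP4 programs, the keyworded input is read from standard input. The helper names `build_command` and `run_mrbump` are hypothetical and are not part of mrbump.

```python
import subprocess


def build_command(hklin, seqin, hklout, xyzout):
    """Return the mrbump argument list in the order shown in the synopsis."""
    return ["mrbump",
            "hklin", hklin,
            "seqin", seqin,
            "hklout", hklout,
            "xyzout", xyzout]


def run_mrbump(hklin, seqin, hklout, xyzout, keywords):
    """Run mrbump and return its exit code.

    Assumption: the keyworded input is passed on standard input,
    as is usual for CCP4 programs.
    """
    result = subprocess.run(build_command(hklin, seqin, hklout, xyzout),
                            input=keywords, text=True)
    return result.returncode
```

A call such as `run_mrbump("foo_in.mtz", "foo.seq", "foo_out.mtz", "foo.pdb", keywords)` would then correspond to the synopsis, with `keywords` holding the text of the keyworded input.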
DESCRIPTION
mrbump has three main parts:
- Automated search for molecular replacement search models for a target
structure.
- Preparation of these template models for molecular replacement using
several different methods.
- Running molecular replacement using these search models and
testing whether the resulting solutions will refine.
Note that mrbump relies heavily on calls to web-based applications. If your sequence
information is in any way sensitive, it is recommended that you use the option to run the fasta
search locally rather than via the OCA web application. This requires fasta34 to be installed
on the user's local machine. The software can be downloaded from the EBI website.
DEPENDENCIES
Before mrbump can be used, the following dependencies should be installed on the local system.
- Mandatory:
- CCP4 6.0 or later,
- Python 2.3 or later,
- Mafft or Clustalw,
- Gnuplot.
- Optional:
- Fasta34,
- Perl + SOAP-Lite module (for SSM search).
mrbump also requires that the local machine has a connection to the internet (directly or via a
proxy).
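Before starting a job, it may be useful to confirm that the external programs listed above can be found. The sketch below is one possible check in Python 3. The executable names are assumptions and may differ on a given installation, and the check does not cover the CCP4 or Python versions, the Perl SOAP-Lite module, or the internet connection.

```python
import shutil

# Assumed executable names for each dependency; finding any one name
# per entry is enough. Adjust these to match the local installation.
MANDATORY = {
    "Mafft or Clustalw": ["mafft", "clustalw"],
    "Gnuplot": ["gnuplot"],
}
OPTIONAL = {
    "Fasta34": ["fasta34"],
    "Perl": ["perl"],
}


def find_missing(requirements, which=shutil.which):
    """Return the names of requirements with no executable found on PATH."""
    missing = []
    for name, candidates in requirements.items():
        if not any(which(exe) for exe in candidates):
            missing.append(name)
    return missing
```

Calling `find_missing(MANDATORY)` returns an empty list when every mandatory entry has at least one executable on the PATH. The `which` argument exists only so that the lookup can be replaced in tests.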
INPUT AND OUTPUT FILES
HKLIN
Input structure factor file for target structure. Must include a FreeR_flag column.
SEQIN
Input sequence file for the target structure. Can be in PIR or Fasta format or
it can just contain the amino acid sequence.
HKLOUT
MTZ file from Refmac5 refinement of the top MR solution.
XYZOUT
PDB coordinate file from Refmac5 refinement of the top MR solution.
KEYWORDED INPUT
LABIN <program label>=<file label>...
Mandatory. This keyword tells the program which columns in the MTZ file
should be used as native structure factors, sigmas, and FreeR flag.
Available program labels are F, SIGF and FreeR_flag.
JOBID <job name>
Mandatory. A name for the job. A directory called "search_JOBID" will be
created in the directory from which mrbump is started. This directory will contain all of the
downloaded files and results.
ROOTDIR <directory>
The root directory where the search folder will be created.
[Default Current working directory]
NMASU <number>
The number of molecules in the asymmetric unit. Leave this blank for automatic calculation.
[Default Automatic]
MDLNUM <number>
The number of template models to be prepared for molecular replacement.
[Default 50]
MRNUM <number>
The number of prepared models to be used in molecular replacement.
[Default 20]
ENSEMNUM <number>
The number of prepared models to be used in a Phaser Ensemble.
[Default 5]
IGNORE <pdb id 1> <pdb id 2>...
A list of PDB ID codes to be ignored in the homologue search. Used for development purposes.
MRPROGRAM [ Amore | Molrep | Phaser ]
Name of the molecular replacement program to be used first.
[Default Molrep]
MAPROGRAM [ Mafft | Clustalw ]
Name of the sequence alignment program to be used to do multiple alignment of the template structure
sequences and the target structure sequence.
[Default Mafft]
MDLDPDBCLP [ True | False ]
If true, models will be prepared for MR using the PDBclip method. With this method, the waters and hydrogens
are removed from the coordinate file and the most probable side-chain conformations are selected. If chain IDs
are missing, they are added.
[Default True]
MDLPLYALA [ True | False ]
If true, polyalanine models will be prepared for the MR step. All side-chains are removed from the PDB files.
[Default True]
MDLMOLREP [ True | False ]
If true, models will be prepared using Molrep. Molrep does a sequence alignment of the target sequence and
the template sequence and prunes the template structure file accordingly.
[Default True]
MDLCHAINSAW [ True | False ]
If true, models will be prepared using Chainsaw. Chainsaw takes in a sequence alignment of the target sequence and
the template sequence and prunes the template structure file accordingly.
[Default True]
DEBUG [ True | False ]
If true, mrbump will give more verbose output. Also, temporary directories will not be
deleted at the end of the job.
[Default False]
SSMSEARCH [ True | False ]
If true, mrbump will use the top match from the sequence-based search in a secondary structure-based
search to find more potential homologues. Requires Perl and the Perl SOAP-Lite
module to be installed.
[Default False]
SCOPSEARCH [ True | False ]
If true, mrbump will use the SCOP database to look for individual domains in the template structures
found in the sequence-based and secondary structure-based searches.
[Default True]
PQSSEARCH [ True | False ]
If true, mrbump will use the PQS service at the EBI to find more multimers based on the template
structures found in the sequence-based and secondary structure-based searches.
[Default True]
FASTALOCAL [ True | False ]
If true, the fasta sequence-based search will be carried out locally rather than via the OCA web interface.
This requires that the user has fasta34 installed on their system. This can be downloaded from the EBI
website.
[Default False]
PACK <number>
The number of clashes that Phaser will tolerate.
[Default 5]
NCYC <number>
The number of cycles of restrained refinement to use in Refmac.
[Default 30]
UPDATE [ True | False ]
If true, the search database files will be tested at the start of the job to see if they are out of date
with respect to those available from the EBI website. If they are found to be out of date, the latest version
will be downloaded.
[Default True]
ONLYMODELS [ True | False ]
If true, only the search models will be generated. The program will exit before any Molecular Replacement
is carried out.
[Default False]
CLUSTER [ True | False ]
If true, the model preparation and molecular replacement jobs will be farmed out to a cluster. Currently
only works for Sun Grid Engine enabled clusters.
[Default False]
END
End keyworded input.
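The rules above (two mandatory keywords, True or False for the switch keywords, and END closing the input) can be checked before a job is submitted. The following Python 3 sketch is illustrative only. The function `check_keywords` is hypothetical, and treating keywords and values as case-insensitive is an assumption; it does not reproduce mrbump's own parser.

```python
MANDATORY_KEYWORDS = ("LABIN", "JOBID")
BOOLEAN_KEYWORDS = ("MDLDPDBCLP", "MDLPLYALA", "MDLMOLREP", "MDLCHAINSAW",
                    "DEBUG", "SSMSEARCH", "SCOPSEARCH", "PQSSEARCH",
                    "FASTALOCAL", "UPDATE", "ONLYMODELS", "CLUSTER")


def check_keywords(text):
    """Return a list of problems found in keyworded input text."""
    seen = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        keyword = parts[0].upper()
        if keyword == "END":
            break  # END closes the keyworded input
        seen[keyword] = parts[1:]

    problems = []
    for keyword in MANDATORY_KEYWORDS:
        if not seen.get(keyword):
            problems.append("missing mandatory keyword " + keyword)
    for keyword in BOOLEAN_KEYWORDS:
        values = seen.get(keyword)
        if values is not None and (
                len(values) != 1 or values[0].lower() not in ("true", "false")):
            problems.append(keyword + " expects True or False")
    return problems
```

An empty list means that no problems were detected by these particular checks. Numeric keywords and program names are not validated here.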
EXAMPLE KEYWORD INPUT FILES
Simple example with minimal input using default values:
LABIN F=F SIGF=SIGF FreeR_flag=FreeR_flag
JOBID MY_JOB_1
A more elaborate example:
LABIN F=FP SIGF=SIGFP FreeR_flag=FREE
JOBID MY_JOB_2
MDLNUM 20
MRNUM 10
ENSEMNUM 5
IGNORE 1smw 1smm 1smu
MRPROGRAM molrep
MAPROGRAM mafft
DEBUG true
CLUSTER false
SCOPSEARCH true
SSMSEARCH true
PQSSEARCH true
END
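Keyword files like the two examples above can also be generated from a script. The sketch below, in Python 3, assembles the text from Python values. The function `keyword_lines` is a hypothetical helper, and writing booleans in lower case follows the second example rather than any stated requirement.

```python
def keyword_lines(labin, jobid, **options):
    """Build keyworded input text for mrbump.

    labin:   dict mapping program labels (F, SIGF, FreeR_flag)
             to the column names in the MTZ file.
    jobid:   the job name.
    options: any further keywords, given as keyword arguments.
    """
    lines = [
        "LABIN " + " ".join(f"{prog}={col}" for prog, col in labin.items()),
        "JOBID " + jobid,
    ]
    for keyword, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # For example, the list of PDB ID codes given to IGNORE
            value = " ".join(str(v) for v in value)
        lines.append(f"{keyword.upper()} {value}")
    lines.append("END")
    return "\n".join(lines) + "\n"
```

For instance, `keyword_lines({"F": "FP", "SIGF": "SIGFP", "FreeR_flag": "FREE"}, "MY_JOB_2", mdlnum=20, debug=True)` produces a LABIN line, a JOBID line, `MDLNUM 20`, `DEBUG true` and a closing `END`.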
PROGRAM OUTPUT
Once a job has been started a user may view the current status of the job via the output log file
or via the results.html web page which is created in the directory
<ROOTDIR>/search_<JOBID>/results and is updated after each stage in the process. A set of search
models is first generated and these are fed to the MR/refinement stage in sequence where the ordering
depends on the alignment score of the template sequence against the target sequence. If a suitable
solution is found, i.e. a model that refines well, the job will terminate and the final results
will be displayed. The resulting refined PDB model and MTZ output from Refmac are made available to the user for
further model building.
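The location of the results page and the order in which models are tried can be illustrated with a short Python 3 sketch. Both functions are hypothetical. Trying the highest alignment score first is an assumption, as the text above states only that the ordering depends on the score, and `refines_well` stands in for whatever refinement test mrbump applies.

```python
import os


def results_page(rootdir, jobid):
    """Return the path <ROOTDIR>/search_<JOBID>/results/results.html."""
    return os.path.join(rootdir, "search_" + jobid, "results", "results.html")


def first_refining_model(models, refines_well):
    """Try models in order of alignment score and stop at the first success.

    models:       list of (name, alignment_score) pairs.
    refines_well: callable taking a model name and returning True or False.
    Returns the name of the first model that refines well, or None.
    Assumption: models with higher alignment scores are tried first.
    """
    for name, _score in sorted(models, key=lambda m: m[1], reverse=True):
        if refines_well(name):
            return name
    return None
```

With `rootdir` set to the ROOTDIR value and `jobid` to the JOBID value, `results_page` gives the file to open in a browser while the job is running.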
AUTHORS
Ronan Keegan, Daresbury Laboratory, UK,
Martyn Winn, Daresbury Laboratory, UK
ACKNOWLEDGEMENTS
Norman Stein, Pryank Patel.
MrBUMP Program References
Any publication arising from use of MrBUMP should
include the following reference:
R.M.Keegan and M.D.Winn (2006) in preparation
In addition, authors of specific programs should be referenced where
applicable:
- CCP4
- Collaborative Computational Project, Number 4. (1994), "The CCP4 Suite: Programs
for Protein Crystallography". Acta Cryst. D50, 760-763
- FASTA
- W. R. Pearson and D. J. Lipman (1988), "Improved Tools
for Biological Sequence Analysis", PNAS 85, 2444-2448
- SSM
- E.Krissinel and K.Henrick (2004), "Secondary-structure matching (SSM), a new tool
for fast protein structure alignment in three dimensions"
Acta Cryst. D60, 2256-2268
- SCOP
- A.G.Murzin, S.E.Brenner, T.Hubbard & C.Chothia (1995), J.Mol.Biol.,
247, 536-540
- MAFFT
- K. Katoh, K. Kuma, H. Toh and T. Miyata (2005)
"MAFFT version 5: improvement in accuracy of multiple sequence alignment"
Nucleic Acids Res. 33, 511-518
- CLUSTALW
- R. Chenna, H. Sugawara, T. Koike, R. Lopez, T.J. Gibson, D.G. Higgins and J.D. Thompson (2003)
"Multiple sequence alignment with the Clustal series of programs"
Nucleic Acids Res. 31, 3497-3500
- CHAINSAW
- N.D.Stein (2006) in preparation
- MOLREP
- A.A.Vagin & A.Teplyakov (1997) J. Appl. Cryst. 30, 1022-1025
- PHASER
- McCoy, A.J., Grosse-Kunstleve, R.W., Storoni, L.C. & Read, R.J. (2005).
"Likelihood-enhanced fast translation functions" Acta Cryst D61, 458-464
- REFMAC
- G.N. Murshudov, A.A.Vagin and E.J.Dodson, (1997) "Refinement of Macromolecular
Structures by the Maximum-Likelihood Method" Acta Cryst.
D53, 240-255