1. Introduction to CCP4 programs for MR
The target is an acylphosphatase-like domain of the hydrogenase maturation factor HypF from E. coli; see Rosano et al., JMB, 321, 785 (2002). HypF-ACP sulphate and phosphate complexes have been deposited in the PDB as 1gxt and 1gxu respectively.
We will solve the hypF structure by molecular replacement, using several programs and approaches, with the native 1gxu dataset to 1.3 Å resolution in space group H32. The target has 91 residues and a Matthews calculation strongly suggests only one molecule in the asymmetric unit.
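The Matthews calculation mentioned above can be sketched in a few lines. This is only an illustration: the average residue mass of 110 Da is a common rule of thumb, and the cell volume below is a hypothetical placeholder, not the real 1gxu cell, so substitute the value from your own MTZ header.

```python
# Minimal sketch of a Matthews coefficient estimate.
# Assumptions (not from the tutorial): ~110 Da per residue and a
# placeholder unit cell volume; replace both with real values.
AVERAGE_RESIDUE_MASS_DA = 110.0
N_SYMOPS_H32 = 18  # H32, hexagonal setting: 6 operators x 3 centring translations
PLACEHOLDER_CELL_VOLUME_A3 = 360_000.0  # hypothetical value, for illustration only


def matthews_coefficient(cell_volume_a3, n_residues, n_molecules,
                         n_symops=N_SYMOPS_H32):
    """Return V_M in cubic Angstrom per Dalton."""
    mass_in_asu = n_residues * AVERAGE_RESIDUE_MASS_DA * n_molecules
    return cell_volume_a3 / (n_symops * mass_in_asu)


def solvent_fraction(vm):
    """Approximate solvent fraction from V_M (standard 1.23 / V_M estimate)."""
    return 1.0 - 1.23 / vm


for n_mol in (1, 2):
    vm = matthews_coefficient(PLACEHOLDER_CELL_VOLUME_A3, 91, n_mol)
    print(n_mol, round(vm, 2), round(solvent_fraction(vm), 2))
```

With these placeholder numbers, two molecules would give a negative solvent fraction, which is the kind of result that rules out a second copy in the asymmetric unit.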
N.B. The file hypF-1gxu-1gxt-HG_scaleit1.mtz includes the data from 1gxu, 1gxt and the Hg derivative, plus some experimental phases based on the Hg sites. Do not forget to select the correct MTZ columns (F FP1gxu, SIGF SIGFP1gxu) each time you define the input MTZ file.
1.1. Checking the data
We first use Sfcheck to check a few things about the data:
1. Select Data Reduction and Analysis > Check Data Quality > Analysis with sfcheck to open the sfcheck task window.
2. Enter a title.
3. Make sure that Run Rampage to analyse structure geometry and Run Procheck to analyse structure geometry are unselected (we do not yet have any coordinates) and that Run Sfcheck to analyse experimental data only is selected.
4. In the line MTZ in, select the file hypF-1gxu-1gxt-HG_scaleit1.mtz
5. Select the labels F FP1gxu, SIGF SIGFP1gxu and Free FREE
6. Check that a suitable filename has been generated for Sfcheck Output PS
7. Keep all defaults, and click Run -> Run Now.
Sfcheck produces a PostScript file with some useful information (see under View Files from Job):
• Anisotropy of the data (it is not very anisotropic)
• Overall B from the Wilson plot of 21.9 Å^2
• Pseudo-translation not detected (from analysis of the native Patterson map)
• Also check the log file: View Files from Job, then View Job Results (new style), then click the Log File tab.
• This includes the results of a twinning test: Perfect twinning test <I^2> / <I>^2 : 2.0573
• A value of 2.0 indicates untwinned data, whereas perfectly twinned data would have a second moment of 1.5
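To see where the values 2.0 and 1.5 come from, the second moment can be computed on simulated intensities. This sketch assumes ideal acentric Wilson statistics (exponentially distributed intensities) and models a perfect twin as the average of two independent intensities; it does not reproduce how Sfcheck itself bins or filters the data.

```python
import random


def second_moment(intensities):
    """Return <I^2> / <I>^2 for a list of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_refl = 200_000

# Assumption: acentric intensities follow an exponential distribution.
untwinned = [rng.expovariate(1.0) for _ in range(n_refl)]
twin_mate = [rng.expovariate(1.0) for _ in range(n_refl)]

# Perfect twinning: each measured intensity is the mean of two twin-related ones.
twinned = [0.5 * (a + b) for a, b in zip(untwinned, twin_mate)]

print(round(second_moment(untwinned), 2))  # close to 2.0
print(round(second_moment(twinned), 2))    # close to 1.5
```

The value of 2.0573 reported for this dataset is therefore on the untwinned side.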
1.2. Choice of search models
The target is an acylphosphatase-like
domain. A search of the PDB reveals two acylphosphatases
with a sequence identity to the target of about 31%, viz. 1v3z and 1w2i. Each has
two chains in the asymmetric unit, either of which could be used as the basis
of a search model.
Normally you would use something like Chainsaw at this
point to prepare a search model from the template. As an exercise, we are going
to try MR straightaway. We will return to Chainsaw later before running Phaser.
Notes on Sequence Alignment
There are many ways of approaching this, and the
different tools will give slightly different assessments. The sequence identity
depends on the definitions used (i.e. treatment of gaps and alignment length),
the specific alignment technique, and whether bits have been chopped out of the
model.
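As a small illustration of why quoted identities differ, the sketch below computes three common definitions from one pairwise alignment. The two aligned strings are made-up toy sequences, not the hypF or 1v3z sequences, and the function name is hypothetical.

```python
def identity_stats(aligned_a, aligned_b):
    """Return sequence identity under three different denominators."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_a, aligned_b))
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    aligned_pairs = sum(1 for a, b in pairs if a != "-" and b != "-")
    len_a = sum(1 for a in aligned_a if a != "-")
    len_b = sum(1 for b in aligned_b if b != "-")
    return {
        "over_aligned_pairs": matches / aligned_pairs,       # gaps ignored
        "over_alignment_length": matches / len(pairs),       # gaps counted
        "over_shorter_sequence": matches / min(len_a, len_b),
    }


# Toy alignment (hypothetical sequences, "-" marks a gap).
stats = identity_stats("MKT-AYIAKQR", "MRTGAY--KQK")
for name, value in stats.items():
    print(name, round(value, 3))
```

The same six matching positions give 0.75, about 0.55 or about 0.67 depending on the denominator, which is the kind of spread seen between different tools.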
1.3. Molrep Run 1
We will use chain B of 1v3z as the search model.
1. Select the Molecular Replacement module and open the Run Molrep - auto MR task window.
2. Enter a title.
3. Do molecular replacement should already be selected.
4. For Data, select the file hypF-1gxu-1gxt-HG_scaleit1.mtz
5. Select the labels F FP1gxu and SIGF SIGFP1gxu
6. For Model, select the file 1v3z_B.pdb
7. (Optional) You can use an upper resolution cut-off of 3 Å to speed up the calculation; see the folder Experimental Data.
8. Keep all defaults, and click Run -> Run Now.
When the job has finished, look at the log file (View Files from Job -> View Job Results (new style) -> Log File tab). Note the following:
• Molrep automatically estimates:
  INFO: expected number of models : 1
  INFO: V_model: 61.6% (of asymm. part of u.c.)
  which is correct. The estimate may be unreliable when there are many monomers in the asymmetric unit, in which case it can be set explicitly with the keyword NMON (see folder Search Options in the Molrep GUI).
• Molrep checks whether or not an anisotropy correction is necessary:
  INFO: Anisotropicy will not be used
• The first table is a list of peaks of the Cross Rotation Function (CRF), sorted according to their heights. This is followed by a plot showing which peaks are related.
• The second table shows the best Translation Function (TF) for each of the CRF peaks (scored according to the correlation coefficient * PKmax). Other TF solutions can be viewed in the file View Files from Job -> Output Files ... <proj_dir>_<job_no>_molrep.doc
• The final table gives a list of solutions, sorted according to the score.
• Molrep reports a contrast higher than 3.0. This contrast value suggests a correct solution.
1.4. Molrep Run 2
In fact, we can make use of our knowledge of the
target, and this will often improve the solution. The search model has a
moderately low sequence identity with the target and therefore the majority of
the side chains are incorrect. Molrep can make use of the target sequence to
improve the search model.
1. Select the previous job, and click ReRun Job
2. Most of the parameters should be set correctly, but you should change the title and the name of the Solution file, so that they differ from the first job.
3. This time, input the target sequence file hypF_Ndom.seq in the Sequence box.
4. Click Run -> Run Now
Look at the log file of this job.
• After a section about the input MTZ file, there are details of the sequence alignment between the target sequence you have supplied and the sequence of the search model (i.e. the PDB file).
• Molrep reports a sequence identity of about 30%. This is lower than other estimates because Molrep is more conservative in introducing gaps into the alignment.
• Molrep outputs tables for the CRF and TF as before.
• At this point it may not be apparent that the MR solution with the search model modifications has improved. The benefits of model preparation will become clearer when we refine the solutions.
1.5. Checking the solution
The positioned model can be submitted for a few cycles
of automated refinement, then checked manually against 2mFo-DFc and mFo-DFc maps, using a graphics program such as Coot. Since
we have a good resolution dataset, the model can also be passed to ARP/wARP for
rebuilding. Refinement, validation and model re-building are covered in other
tutorials.
Here we will give a brief demonstration of how to refine the solution models using Refmac.
When the job has completed, double-click on the job name in the job list window to open the results page. For this example, we are only interested in making a quick assessment of whether or not MR has worked. To do this we will look at the R/R-free values before and after the 10 cycles of refinement. These are listed in the Result table.
Repeat the above steps using the output PDB file from the second Molrep job. Note, do not overwrite the MTZ and PDB output files from the first refinement job! Compare the R/R-free values for both jobs. You can clearly see that modifying the search model has greatly improved the results. Nevertheless, the best way to judge whether a solution is correct is to look at the electron density map. From the Refmac results page, you can launch Coot with the refined map and model loaded by clicking on the Coot button under Output Files.
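For readers unfamiliar with the R values being compared, here is a minimal sketch of the conventional R-factor. R-free is the same quantity evaluated only on the reflections set aside by the Free flag. The single overall scale factor used here is a simplifying assumption (Refmac's scaling is more elaborate), and the amplitudes are made-up numbers.

```python
def r_factor(f_obs, f_calc):
    """Return sum(| |Fo| - k|Fc| |) / sum(|Fo|) with one overall scale k."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    # Assumption: a single linear scale puts Fc on the scale of Fo.
    scale = sum(f_obs) / sum(f_calc)
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


# Toy amplitudes (hypothetical values, for illustration only).
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [12.0, 18.0, 33.0, 37.0]
print(round(r_factor(f_obs, f_calc), 3))
```

A random, unrelated model typically gives R values well above 0.5, which is why a clear drop in R-free during refinement is taken as a sign that the MR solution is correct.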
The Molrep solution is related to the
deposited structure 1gxu by the symmetry operation -Y+2/3, X-Y+1/3, Z+1/3.
Comparison of the structures in CCP4mg or Coot shows that the beta sheet and one of the
two helices are well matched, but there are significant differences elsewhere.
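The symmetry operation quoted above acts on fractional coordinates. The sketch below applies it to an arbitrary test point using exact fractions; the point itself is hypothetical. PDB files store Cartesian coordinates, so in practice a conversion to fractional coordinates with the cell parameters is needed first, which is not shown here.

```python
from fractions import Fraction as F


def apply_symop(xyz):
    """Apply -Y+2/3, X-Y+1/3, Z+1/3 to fractional coordinates."""
    x, y, z = xyz
    return (-y + F(2, 3), x - y + F(1, 3), z + F(1, 3))


def wrap_to_cell(xyz):
    """Bring fractional coordinates back into the range [0, 1)."""
    return tuple(c % 1 for c in xyz)


point = (F(1, 10), F(2, 10), F(3, 10))  # arbitrary test point
once = apply_symop(point)
print(once)

# The operator is a three-fold rotation combined with a centring translation,
# so three applications return the starting point (modulo lattice translations).
thrice = apply_symop(apply_symop(once))
print(wrap_to_cell(thrice) == wrap_to_cell(point))
```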
In general, if we want to compare an MR solution to
the deposited structure, then we need to take into account possible symmetry
operations and possible changes of origin. Two solutions may be identical, even
if it is not obvious from a quick look in a graphics program. This can be
checked with the csymmatch utility:
1. Select the Symmetry match models task in module Coordinate Utilities.
2. Enter the MR solution PDB file as the Work PDB in, and the deposited structure (1gxu) as Reference PDB in.
3. Select Apply origin shift and hand correction and run.
The log file reports the symmetry operator and change of origin that give the best match, together with a normalised score for the match. The output PDB file has this transformation applied, and can be compared to the reference PDB file. Of course, we usually don't have a deposited structure to compare with, but the same process is useful for comparing different MR solutions.
1.6. Chainsaw
Search models can also be prepared using Chainsaw.
Chainsaw takes an external sequence alignment, which can be generated by many
bioinformatics tools and/or manually adjusted. In this job, we will create a
model based on chain B of 1v3z, using a previously prepared alignment to the
target.
1. Select the Molecular Replacement module and open the Create Search Model task window in the Model Generation folder.
2. Enter a title.
3. Leave Create search model using Chainsaw unchanged.
4. Leave Prune non-conserved residues to gamma atom unchanged.
5. For PDB in, select the file 1v3z_B.pdb
6. Use the sequence alignment format PIR and for Alignment in select the file 1v3z_B_to_target.pir
7. Click Run -> Run Now
Chainsaw produces a coordinate file 1v3z_B_chainsaw1.pdb which is an edited version of
the input PDB file. 6 residues that do not align to the target sequence have
been deleted. Of the rest, 34 have been left unchanged and 50 have had their
side chains cut back to the gamma atom. The output PDB file uses the naming and
numbering of the target sequence.
Have a look at the log file:
• At the top, the alignment used is confirmed.
• Then there is a listing of all the model residues, with the action applied (deleted, conserved, mutated).
• Finally, there is a summary of the changes made. This includes the estimated sequence identity. Note that this is not unique, but depends on the particular sequence alignment used.
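The three actions listed in the log (deleted, conserved, mutated) follow directly from the alignment. The sketch below shows that bookkeeping on a toy alignment; it is an illustration of the logic described here, not Chainsaw's actual implementation, and the sequences are made up.

```python
from collections import Counter


def classify_model_residues(target_aln, model_aln):
    """Label each model residue according to its aligned target residue."""
    if len(target_aln) != len(model_aln):
        raise ValueError("aligned sequences must have the same length")
    actions = []
    for target_res, model_res in zip(target_aln, model_aln):
        if model_res == "-":
            continue  # target residue with no model counterpart: nothing to edit
        if target_res == "-":
            actions.append((model_res, "deleted"))    # no aligned target residue
        elif target_res == model_res:
            actions.append((model_res, "conserved"))  # side chain kept
        else:
            actions.append((model_res, "mutated"))    # pruned to the gamma atom
    return actions


# Toy alignment (hypothetical sequences, "-" marks a gap).
actions = classify_model_residues("MK-TAYG", "MRQTA-G")
counts = Counter(action for _, action in actions)
print(dict(counts))
```

For the real run described above, the corresponding counts are 6 deleted, 34 conserved and 50 pruned residues.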
Now repeat this exercise using the other search model,
based on chain A of 1w2i. We can overlap the two models and use the ensemble as
input to Phaser (in place of individual search models).
• For PDB in, select the file 1w2i_A.pdb
• Use the sequence alignment format PIR and for Alignment in select the file 1w2i_A_to_target.pir
1.7. Aligning the models
These models can be aligned and the overlapped
structures used as input to Phaser.
1. Select the Coordinate Utilities module and open the Superpose Molecules task window.
2. Enter a title.
3. Change mode to Superpose using gesamt.
4. Enter Moving 1w2i_A_chainsaw1.pdb
5. Enter Fixed 1v3z_B_chainsaw1.pdb
6. Enter PDB out 1w2i_A_to_1v3z_B_chainsaw1.pdb
7. Click Run -> Run Now
The model 1w2i_A_chainsaw1.pdb has been moved to overlap 1v3z_B_chainsaw1.pdb. The log file shows the transformation used, and gives an RMSD of 0.305 Å between 84 C-alpha atoms of the superposed structures.
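The RMSD figure is simply the root-mean-square distance between paired atoms after superposition. A minimal sketch follows, assuming the two coordinate lists are already superposed and already paired atom by atom (the superposition and pairing are what gesamt does for you); the coordinates are toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy C-alpha positions in Angstrom (hypothetical values).
fixed = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
moving = [(1.0, 0.0, 0.0), (1.0, 2.0, 1.0)]
print(rmsd(fixed, moving))
```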
UPDATE: Ensemble models can also be generated using the new program Ensembler (Molecular Replacement->Model Generation->Ensembler).
1.8. Phaser
Using the superposed search models generated by
Chainsaw, we will now use Phaser to solve hypF.
Phaser is designed to use ensembles of models to improve the signal.
1. Select the Molecular Replacement module and open the Phaser MR task window.
2. Enter a title.
3. Leave Mode for molecular replacement automated search unchanged.
4. For MTZ in, select the file hypF-1gxu-1gxt-HG_scaleit1.mtz, and select the labels F FP1gxu and SIGF SIGFP1gxu
5. In the folder Define ensembles ..., enter the PDB #1 1v3z_B_chainsaw1.pdb. Set the similarity to be sequence identity 0.36
6. To add another model click Add superimposed PDB file to the ensemble, enter the PDB #2 1w2i_A_to_1v3z_B_chainsaw1.pdb. Set the similarity to be sequence identity 0.38
7. In the folder Define composition of the asymmetric unit, select Total scattering determined by components in asymmetric unit, and for the SEQ file select the file hypF_Ndom.seq, and leave Number in asymmetric unit 1 unchanged.
8. In the folder Search parameters, select Perform search using ensemble1
9. Click Run -> Run Now
Have a look at the log file. The description below relates to an older version of Phaser, so take it as a general explanation. The current Phaser will make two attempts at structure solution, with resolution limits of about 3 Å and 1.3 Å, so the log file will contain two sets of tables.
• After details about the input parameters, there is information on the anisotropy correction used (compare to the output of Sfcheck above). This is followed by a Matthews coefficient calculation.
• Phaser then calculates a Fast Rotation Function (FRF). It finds 9 solutions greater than 75% of the top peak (this threshold can be changed with the option Rotation search peak selection in the folder Additional parameters).
• These peaks are passed to the Fast Translation Function (FTF). Detailed results for each rotation peak are given, followed by a summary table. Beware: these numbers may differ slightly for different versions of Phaser.
Translation Function Table
--------------------------
SET ROT*deep    Top   (Z) Second   (Z)  Third   (Z)  Ensemble  SpaceGroup
  1 1          19.9  5.46      -     -      -     -  ensemble  H 3 2
  1 2          -5.9  5.28   -8.7  4.29  -11.3  4.43  ensemble  H 3 2
  1 3         -11.6  5.40  -19.6  4.74  -19.6  4.75  ensemble  H 3 2
  1 4             -     -      -     -      -     -  ensemble  H 3 2
  1 5         -15.4  4.63  -24.2  4.59  -31.0  4.64  ensemble  H 3 2
  1 6         -15.6  4.45  -18.6  4.61  -26.9  4.54  ensemble  H 3 2
  1 7             -     -      -     -      -     -  ensemble  H 3 2
  1 8             -     -      -     -      -     -  ensemble  H 3 2
  1 9         -19.8  4.64  -27.2  4.69      -     -  ensemble  H 3 2
  1 10*       -10.6  5.30      -     -      -     -  ensemble  H 3 2
  1 11*       -46.6  4.63  -48.8  4.79      -     -  ensemble  H 3 2
  1 12*       -19.3  4.82      -     -      -     -  ensemble  H 3 2
  1 13*           -     -      -     -      -     -  ensemble  H 3 2
  1 14*       -25.4  4.86  -35.8  4.79      -     -  ensemble  H 3 2
  1 15*           -     -      -     -      -     -  ensemble  H 3 2
  1 16*           -     -      -     -      -     -  ensemble  H 3 2
  1 17*           -     -      -     -      -     -  ensemble  H 3 2
  1 18*           -     -      -     -      -     -  ensemble  H 3 2
  1 19*       -25.3  4.87  -31.3  4.92      -     -  ensemble  H 3 2
  1 20*       -16.9  4.97      -     -      -     -  ensemble  H 3 2
  1 21*       -20.7  5.60      -     -      -     -  ensemble  H 3 2
  1 22*           -     -      -     -      -     -  ensemble  H 3 2
  1 23*       -16.9  5.17      -     -      -     -  ensemble  H 3 2
  1 24*           -     -      -     -      -     -  ensemble  H 3 2
  1 25*       -21.8  4.91      -     -      -     -  ensemble  H 3 2
  1 26*           -     -      -     -      -     -  ensemble  H 3 2
  1 27*           -     -      -     -      -     -  ensemble  H 3 2
The first trial (based on the 1st peak of the FRF) gives a clear solution, with a good Z-score, and a single significant peak of the FTF.
• Next is a check on packing for this good solution. Phaser finds 2 clashes between a C-alpha and a C-alpha of a symmetry-related molecule. Because the threshold is set to 4 clashes in total (5% of trace atoms), this solution is accepted.
• Finally, Phaser refines the MR solution, and displays the improvement in the log-likelihood gain (LLG).
• Phaser outputs a .sol file containing the MR solution, a .pdb file containing the correctly positioned model, and a .mtz file containing the original data plus calculated structure factors from the model and columns of map coefficients.
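The packing criterion described in the list above amounts to a simple threshold test. The sketch below assumes 84 trace (C-alpha) atoms, taken from the 84 residues kept by Chainsaw, and assumes that the 5% allowance is rounded down and that a solution passes when the clash count does not exceed it; the exact rule in any given Phaser version may differ.

```python
def allowed_clashes(n_trace_atoms, percent=5):
    """Number of clashes tolerated: a percentage of trace atoms, rounded down."""
    return n_trace_atoms * percent // 100


def passes_packing(n_clashes, n_trace_atoms, percent=5):
    """True when the clash count is within the allowance (assumed inclusive)."""
    return n_clashes <= allowed_clashes(n_trace_atoms, percent)


N_TRACE_ATOMS = 84  # assumption: one C-alpha per residue kept by Chainsaw
print(allowed_clashes(N_TRACE_ATOMS))    # threshold of 4, as quoted above
print(passes_packing(2, N_TRACE_ATOMS))  # the solution with 2 clashes is accepted
```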
Checking the solution:
• Direct comparison of the Phaser solution and the deposited structure 1gxu using Coot may or may not be possible. This is because the space group H32 has two possible origins (see $CHTML/alternate_origins.html). If both structures are on the same origin, the comparison will show that the beta sheet and one of the two helices are well matched, but there are significant differences elsewhere.
• The solution .pdb and .mtz files can be loaded into Coot (use the Coot button in the Qt result page) to inspect the model against the 2Fo-Fc map. This shows good agreement in most places, but also highlights problem areas.
• Do 20 cycles of restrained refinement in REFMAC (Run Refmac5 task in module Refinement) and check the model and maps.
• Optionally, run ACORN, which removes phase bias (Acorn task in module Program List).
• Optionally, rebuild in ARP/wARP using the ACORN phases as restraints.
1.9. MrBUMP
You have now prepared three search models based on
1v3z, and used Molrep and Phaser to do the molecular replacement. These steps,
and the initial discovery of 1v3z and other related proteins, are automated in
the program MrBUMP.
1. Depending on what you want to do, MrBUMP can make use of web-based services. The following tutorial deliberately does not make use of the web, so that it can be run anywhere. At the end of the tutorial, there are suggestions for web-based options. The use of a few local PDB template files also means that the tutorial is fairly quick. Beware that a full run of MrBUMP might take longer than is reasonable for a tutorial.
2. Select the Molecular Replacement module and open the Run MrBUMP task window.
3. Enter a title.
4. Leave Program Mode Model search and Molecular Replacement unchanged.
5. For SEQ in, select the file hypF_Ndom.seq
6. For MTZ in, select the file hypF-1gxu-1gxt-HG_scaleit1.mtz, and select the labels F FP1gxu, SIGF SIGFP1gxu and Free FREE
7. Leave the rest of the files folder unchanged, and move to the Template Search Options folder.
8. Un-check Do a FASTA search for possible template models. Instead we are going to use some known local templates.
9. Un-check Update local copies of search databases
10. Select Multiple alignment program Mafft if available
11. Un-check all Additional search methods, i.e. SCOP, PQS and SSM
12. The folder User specified search models will have opened. Because we have switched off all search options, we are required to use local files. Click on Add PDB file 3 times to add 3 local PDB files. The first file is 1w2i_A.pdb and Chain identifier A. The second file is 1v3z_B.pdb and Chain identifier B. The third file is 2acy.pdb and Chain identifier A.
13. In the folder Search Model Preparation Options, keep the default which is to use Molrep, Chainsaw and Sculptor. This means there will be 9 search models in total. Turn one or two off to make the job quicker.
14. In the folder Molecular Replacement and Refinement Options, keep Molrep and switch off Phaser. If you want, you can use Phaser instead of Molrep, or both.
15. In the folder Model Building and Phase Improvement, select the model building programs to try after MR and refinement. By default Buccaneer is set but depending on your installation you may be able to try ARP/wARP and c-alpha tracing with SHELXE as well. Model building can help determine if MR has been successful.
16. Click Run -> Run Now
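The figure of 9 search models in step 13 is just every template combined with every preparation method. A short sketch of that enumeration follows; the label format is hypothetical and is not MrBUMP's internal naming.

```python
from itertools import product

templates = ["1w2i_A", "1v3z_B", "2acy_A"]
preparation_methods = ["Molrep", "Chainsaw", "Sculptor"]


def search_models(templates, methods):
    """Return one label per (template, preparation method) combination."""
    # The label text is illustrative only, not MrBUMP's own naming scheme.
    return [f"{t} prepared with {m}" for t, m in product(templates, methods)]


all_models = search_models(templates, preparation_methods)
print(len(all_models))  # 3 templates x 3 methods

# Turning one preparation method off reduces the number of MR jobs.
print(len(search_models(templates, ["Molrep", "Chainsaw"])))
```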
After a few minutes, have a look at the MrBUMP log
file (do not wait for the job to finish).
• At the top, it echoes the options selected.
• Under Target Information, it estimates that there is 1 molecule in the target asymmetric unit.
• Under Template Model Search Results, it lists the three local files entered. They are named "loc0", "loc1", "loc2" for internal use.
• Under Search Model Preparation Results, details of the Molrep, Chainsaw and Sculptor methods are given.
• Finally, the section Molecular Replacement and Refinement gives details for every MR job tried.
By default, it will finish when it finds a solution.
For example, it may finish with model loc1_B_MOLREP, which corresponds to
template 1v3z_B.pdb with a search model created with the Molrep editing features.
The Rfree drops from 0.549 to 0.436 (precise numbers
may vary!) indicating that the MR solution is refinable,
and likely to be correct. If you want to try all search models in MR (a good
idea unless you are in a rush), select Finish when
all of the search models have been tried in MR
in the folder Molecular Replacement and Refinement
Options.
If there are no problems accessing web-based services,
then you can search for templates rather than use local PDB files. Run as
above, with the following differences:
1. In the folder Template Search Options, check Do a FASTA search for possible template models.
2. Check Run the FASTA search locally. This refers just to the search step - the PDB files are still downloaded from the web.
3. Check all of the Additional search methods, i.e. SCOP, PQS and SSM
4. Do not enter anything into the folder User specified search models.
For comparison, here are some example results from
MrBUMP (you may not get exactly the same):
PDB chain | sequence identity | source / release date                    | Rfree from MrBUMP            |
1w2i_B    | 0.310             | OCA - released Apr 2005                  | chainsaw 0.447 molrep 0.442  |
1w2i_A    | 0.310             | OCA                                      | chainsaw 0.471 molrep 0.527  |
1v3z_B    | 0.310             | OCA - released Mar 2005                  | chainsaw 0.430 molrep 0.453  |
1v3z_A    | 0.310             | OCA                                      | chainsaw 0.474 molrep 0.470  |
2bje_G    | 0.287             | OCA - released Nov 2005                  | chainsaw 0.458 molrep 0.442  |
2bje_E    | 0.287             | OCA                                      | chainsaw 0.468 molrep 0.486  |
2bje_C    | 0.287             | OCA                                      | chainsaw 0.491 molrep 0.481  |
2bje_A    | 0.287             | OCA                                      | chainsaw 0.448 molrep 0.443  |
2bjd_B    | 0.287             | OCA - released Nov 2005                  | chainsaw 0.468 molrep 0.529  |
2bjd_A    | 0.287             | OCA                                      | chainsaw 0.544 molrep 0.466  |
1y9o_A    | 0.275             | OCA - released Jan 2006 (NMR)            | (not tried)                  |
1ulr_A    | 0.286             | OCA - released Nov 2004                  | chainsaw 0.476 molrep 0.471  |
2acy_A    | 0.264             | SSM - released Nov 1997 (authors tried)  | chainsaw 0.539 molrep 0.564  |
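When many search models have been tried, it helps to rank them by final Rfree. The sketch below does this for the example values in the table above (the untried NMR entry 1y9o_A is omitted); your own numbers will differ, and the function name is hypothetical.

```python
# Example Rfree values copied from the table above: (chainsaw, molrep).
rfree_by_chain = {
    "1w2i_B": (0.447, 0.442), "1w2i_A": (0.471, 0.527),
    "1v3z_B": (0.430, 0.453), "1v3z_A": (0.474, 0.470),
    "2bje_G": (0.458, 0.442), "2bje_E": (0.468, 0.486),
    "2bje_C": (0.491, 0.481), "2bje_A": (0.448, 0.443),
    "2bjd_B": (0.468, 0.529), "2bjd_A": (0.544, 0.466),
    "1ulr_A": (0.476, 0.471), "2acy_A": (0.539, 0.564),
}


def rank_by_rfree(results):
    """Return (chain, method, rfree) tuples sorted from lowest to highest Rfree."""
    flat = []
    for chain, (chainsaw_rfree, molrep_rfree) in results.items():
        flat.append((chain, "chainsaw", chainsaw_rfree))
        flat.append((chain, "molrep", molrep_rfree))
    return sorted(flat, key=lambda entry: entry[2])


ranking = rank_by_rfree(rfree_by_chain)
best = ranking[0]
print(best)
```

In this example the lowest Rfree comes from the Chainsaw model of 1v3z_B, the template used throughout the earlier sections.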
1.10. Other search models for hypF
Another possible search model is chain A of 1w2i. This
is a different structure of the same protein as 1v3z. You may try repeating the
above steps using 1w2i_A.pdb as the search
model.
You should find that this is more difficult! Modifying
the search model using the target sequence is now necessary. Adjusting the
resolution limits also helps.
Check your solutions against those produced from
1v3z_B.