We will solve the hypF structure by molecular replacement, using several programs and approaches. Other MR examples can be found at the end of this tutorial, and at:
When this tutorial is obtained as part of the CCP4 distribution, $MR_TUTORIAL corresponds to $CCP4/examples/mr_tutorial_2006
The target is the acylphosphatase-like domain of hydrogenase maturation factor HypF from E. coli; see Rosano et al., JMB, 321, 785 (2002). The HypF-ACP sulphate and phosphate complexes are deposited as 1gxt and 1gxu, respectively.
This protein has a Hg derivative. You have processed this data. We have prepared a reflection file for you including the data from 1gxu, 1gxt, the Hg derivative, and some experimental phases based on the Hg sites.
There is native data in H32 to 1.3 A resolution. The target has 91 residues and a Matthews calculation strongly suggests only one molecule in the asymmetric unit.
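The Matthews calculation mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the average residue mass of 110 Da and the 1.23 constant in the solvent estimate are the usual rules of thumb, H32 in the hexagonal setting is taken to have 18 symmetry operators, and the unit cell in the usage section is a placeholder, because the hypF cell dimensions are not quoted in this tutorial. The function names are made up for this example.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic Angstroms; edges in Angstroms, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def matthews(volume, n_symops, n_copies, n_residues, da_per_residue=110.0):
    """Return (Matthews coefficient in A^3/Da, estimated solvent fraction).

    da_per_residue is an ASSUMED average residue mass, not a value from the tutorial.
    """
    mass_in_asu = n_copies * n_residues * da_per_residue
    vm = volume / (n_symops * mass_in_asu)
    solvent_fraction = 1.0 - 1.23 / vm
    return vm, solvent_fraction


if __name__ == "__main__":
    # PLACEHOLDER cell, NOT the hypF cell: replace with the values in your MTZ header.
    volume = cell_volume(90.0, 90.0, 60.0, 90.0, 90.0, 120.0)
    for n_copies in (1, 2, 3):
        vm, solvent = matthews(volume, n_symops=18, n_copies=n_copies, n_residues=91)
        print(f"{n_copies} copies: Vm = {vm:.2f} A^3/Da, solvent = {100 * solvent:.1f}%")
```

A copy number is usually considered plausible when the solvent fraction is positive and falls in the range commonly seen for protein crystals; a negative value means that number of copies cannot fit in the cell.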
We first use Sfcheck to check a few things about the data:
Sfcheck produces a postscript file with some useful things:
Also check the log file (View Files from Job, then View Log File):
Perfect twinning test <I^2> / <I>^2 : 2.0573
A value of 2.0 indicates untwinned data, whereas perfectly twinned data would have a second moment of 1.5.
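To see where the values 2.0 and 1.5 come from, the following Python sketch simulates the second moment for idealised data. It assumes acentric reflections obeying Wilson statistics (exponentially distributed intensities) and models a perfect twin as the average of two independent intensities. It is not a description of how Sfcheck computes the statistic, and the function names are invented for this example.

```python
import random


def second_moment(intensities):
    """Return <I^2> / <I>^2 for a list of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i_sq = sum(i * i for i in intensities) / n
    return mean_i_sq / (mean_i * mean_i)


def simulate(n_reflections=200_000, seed=0):
    """Simulate untwinned and perfectly twinned acentric intensities."""
    rng = random.Random(seed)
    # Acentric Wilson statistics: intensities follow an exponential distribution.
    untwinned = [rng.expovariate(1.0) for _ in range(n_reflections)]
    partner = [rng.expovariate(1.0) for _ in range(n_reflections)]
    # Perfect twin (twin fraction 0.5): each observation is the mean of two
    # independent twin-related intensities.
    twinned = [(a + b) / 2.0 for a, b in zip(untwinned, partner)]
    return second_moment(untwinned), second_moment(twinned)


if __name__ == "__main__":
    m_untwinned, m_twinned = simulate()
    print(f"untwinned: {m_untwinned:.3f}  perfectly twinned: {m_twinned:.3f}")
```

With enough simulated reflections the two ratios approach 2.0 and 1.5 respectively, so the observed 2.0573 sits on the untwinned side.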
There are many ways of approaching this, and the different tools will give slightly different assessments. The sequence identity depends on the definitions used (i.e. treatment of gaps and alignment length), the specific alignment technique, and whether bits have been chopped out of the model.
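As a small illustration of how the definition changes the number, the Python sketch below computes three common variants of sequence identity from one gapped pairwise alignment. The function name and the toy alignment are made up for this example; they are not the hypF alignment.

```python
def identity_measures(aln_a, aln_b, gap="-"):
    """Return sequence identity under three different denominators.

    aln_a and aln_b are the two rows of a pairwise alignment (equal length,
    gaps marked with `gap`).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aln_a, aln_b))
    matches = sum(1 for x, y in pairs if x == y and x != gap)
    aligned_pairs = sum(1 for x, y in pairs if x != gap and y != gap)
    len_a = sum(1 for x in aln_a if x != gap)
    len_b = sum(1 for y in aln_b if y != gap)
    return {
        "over_aligned_pairs": matches / aligned_pairs,
        "over_alignment_length": matches / len(pairs),
        "over_shorter_sequence": matches / min(len_a, len_b),
    }


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    for name, value in identity_measures("ACDE-FGH", "AC-EYFGK").items():
        print(f"{name}: {value:.3f}")
```

The same five matching positions give three different percentages here, which is why identities quoted by different tools for the same pair of proteins rarely agree exactly.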
The target is an acylphosphatase-like domain. A search of the PDB reveals two acylphosphatases with a sequence identity to the target of about 31%: 1v3z and 1w2i. Each has two chains in the asymmetric unit, either of which could be used as the basis of a search model.
Normally you would use something like Chainsaw at this point to prepare a search model from the template. As an exercise, we are going to try MR straightaway. We will return to Chainsaw later before running Phaser.
We will use chain B of 1v3z as the search model (file $MR_TUTORIAL/data/hypF/1v3z_B.pdb).
INFO: expected number of monomers : 1 Vmol: 61.4%
This is correct. The estimate may be unreliable when there are many monomers in the asymmetric unit, in which case it can be set explicitly with the keyword NMON (see folder Search Parameters in the Molrep GUI).
INFO: Anisotropicy will not be used
INFO: contrast is good enough. Stop this run
This is based on a contrast of 3.29 (the precise value you get will depend on Molrep version, resolution limits used, etc.).
In fact, we are able to improve on this solution. The search model has a moderately low sequence identity with the target, so the majority of the side chains are incorrect. Molrep can make use of the target sequence to improve the search model.
Look at the log file of this job.
INFO: contrast is good enough. Stop this run
The top MR solution is applied to the input coordinates, and the positioned PDB file is written out as 1v3z_B_molrep2.pdb. The contrast indicates that this is probably a correct solution, but this should now be checked! (In fact, the Molrep solution is related to the deposited structure 1gxu by the symmetry operation -Y+2/3, X-Y+1/3, Z+1/3. Comparison of the structures in CCP4mg shows that the beta sheet and one of the two helices are well matched, but there are significant differences elsewhere.)
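The symmetry operation quoted above acts on fractional coordinates. The Python sketch below writes it as a rotation matrix plus a translation and applies it to a point; the function name and the test point are invented for illustration. Coordinates in a PDB file are orthogonal (in Angstroms), so they would first have to be converted to fractional coordinates using the unit cell, which is not shown here.

```python
from fractions import Fraction as F

# -Y+2/3, X-Y+1/3, Z+1/3 written as x' = R x + t
ROT = (
    (0, -1, 0),   # x' = -y
    (1, -1, 0),   # y' = x - y
    (0, 0, 1),    # z' = z
)
TRANS = (F(2, 3), F(1, 3), F(1, 3))


def apply_symop(xyz, rot=ROT, trans=TRANS, wrap=True):
    """Apply the symmetry operation to fractional coordinates (x, y, z).

    With wrap=True the result is moved back into the unit cell [0, 1).
    """
    out = []
    for row, t in zip(rot, trans):
        value = sum(r * c for r, c in zip(row, xyz)) + t
        out.append(value % 1 if wrap else value)
    return tuple(out)


if __name__ == "__main__":
    # Arbitrary fractional test point, for illustration only.
    point = (F(1, 10), F(2, 10), F(3, 10))
    print(apply_symop(point))
```

Exact fractions are used so that the thirds in the translation do not pick up rounding error; plain floats also work as input.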
This model should be submitted to 20 cycles of automated refinement, then checked manually against 2mFo-DFc and mFo-DFc maps, using a graphics program such as Coot. Since we have a good resolution dataset, the model can also be passed to ARP/wARP for rebuilding.
Refinement, validation and model re-building will be covered in more detail in a later tutorial.
Search models can also be prepared using Chainsaw. Chainsaw takes an external sequence alignment, which can be generated by many bioinformatics tools and/or manually adjusted. In this job, we will create a model based on chain B of 1v3z, using a previously prepared alignment to the target.
Chainsaw produces a coordinate file 1v3z_B_chainsaw1.pdb which is an edited version of the input PDB file. 6 residues that do not align to the target sequence have been deleted. Of the rest, 34 have been left unchanged and 50 have had their side chains cut back to the gamma atom. The output PDB file uses the naming and numbering of the target sequence.
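The bookkeeping described above (residues deleted, left unchanged, or cut back) can be mimicked with a short Python sketch that walks through a pairwise alignment column by column. This is a simplified illustration of the idea, not Chainsaw's actual code, and the function name and toy alignment are made up.

```python
from collections import Counter


def classify_columns(template_aln, target_aln, gap="-"):
    """Decide what happens to each template residue in a pairwise alignment.

    Returns a list of (template_residue, action), where action is one of
    'delete' (no aligned target residue), 'keep' (identical residue) or
    'truncate' (different residue, side chain would be cut back).
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")
    actions = []
    for tmpl, targ in zip(template_aln, target_aln):
        if tmpl == gap:
            continue  # insertion in the target: no template atoms to edit
        if targ == gap:
            actions.append((tmpl, "delete"))
        elif tmpl == targ:
            actions.append((tmpl, "keep"))
        else:
            actions.append((tmpl, "truncate"))
    return actions


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    result = classify_columns("MKV-LAG", "MRVQL-G")
    print(Counter(action for _, action in result))
```

Run on the real alignment, the three counts would be expected to correspond to the deleted, unchanged and truncated totals that Chainsaw reports in its log.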
Have a look at the log file:
Now repeat this exercise using the other search model, chain A of 1w2i.
These models can be aligned and the overlapped structures used as input to Phaser.
The 1w2i_A_chainsaw1.pdb has been moved to overlap 1v3z_B_chainsaw1.pdb.
Using the search models generated by Chainsaw, we will now use Phaser to solve hypF. Phaser is designed to use ensembles of models to improve the signal.
Have a look at the log file:
Fast Translation Function Table: Space Group R 3 2

#SET | #TRIAL | Top (Z) | Second (Z) | Third (Z) | Ensemble |
---|---|---|---|---|---|
1 | 1 | 25.09 ( 4.62) | 24.74 ( 4.58) | 23.07 ( 4.37) | ensemble1 |
1 | 2 | 45.14 ( 7.13) | - | - | ensemble1 |
1 | 3 | 23.91 ( 5.00) | 21.21 ( 4.63) | 20.14 ( 4.49) | ensemble1 |
1 | 4 | 23.41 ( 4.53) | 23.25 ( 4.51) | 19.77 ( 4.08) | ensemble1 |
1 | 5 | 21.86 ( 4.75) | 21.46 ( 4.70) | 20.37 ( 4.56) | ensemble1 |
1 | 6 | 21.01 ( 4.78) | 19.57 ( 4.61) | 19.52 ( 4.60) | ensemble1 |

The second trial (based on the 2nd peak of the FRF) gives a clear solution, with a good Z-score, and a single significant peak of the FTF.
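The reasoning behind "a clear solution" can be made explicit with a few lines of Python that rank the trials by the Z-score of their top peak and report the gap to the runner-up. The numbers are copied from the log extract above; the function name is invented, and this is only a way of reading the table, not something Phaser itself outputs.

```python
# (trial number, top score, top Z-score), copied from the log extract above
FTF_TOP_PEAKS = [
    (1, 25.09, 4.62),
    (2, 45.14, 7.13),
    (3, 23.91, 5.00),
    (4, 23.41, 4.53),
    (5, 21.86, 4.75),
    (6, 21.01, 4.78),
]


def best_trial(peaks):
    """Return the trial with the highest top Z-score and its margin over the next best."""
    ranked = sorted(peaks, key=lambda p: p[2], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, best[2] - runner_up[2]


if __name__ == "__main__":
    best, margin = best_trial(FTF_TOP_PEAKS)
    print(f"best trial: {best[0]} (Z = {best[2]:.2f}), margin over next: {margin:.2f}")
```

A top peak that stands well clear of every other trial, as trial 2 does here, is the pattern to look for in your own log.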
Checking the solution:
You have now prepared three search models based on 1v3z, and used Molrep and Phaser to do the molecular replacement. These steps, and the initial discovery of 1v3z and other related proteins, are automated in the program MrBUMP.
Depending on what you want to do, MrBUMP can make use of web-based services. The following tutorial deliberately does not make use of the web, so that it can be run anywhere. At the end of the tutorial, there are suggestions for web-based options. The use of a few local PDB template files also means that the tutorial is fairly quick. Beware that a full run of MrBUMP might take longer than is reasonable for a tutorial.
After a few minutes, have a look at the MrBUMP log file (do not wait for the job to finish).
If there are no problems accessing web-based services, then you can search for templates rather than use local PDB files. Run as above, with the following differences:
For comparison, here are some example results from MrBUMP (you may not get exactly the same):
PDB chain | sequence identity | source / release date | Rfree from MrBUMP |
---|---|---|---|
1w2i_B | 0.310 | OCA - released Apr 2005 | chainsaw 0.447 molrep 0.442 |
1w2i_A | 0.310 | OCA | chainsaw 0.471 molrep 0.527 |
1v3z_B | 0.310 | OCA - released Mar 2005 | chainsaw 0.430 molrep 0.453 |
1v3z_A | 0.310 | OCA | chainsaw 0.474 molrep 0.470 |
2bje_G | 0.287 | OCA - released Nov 2005 | chainsaw 0.458 molrep 0.442 |
2bje_E | 0.287 | OCA | chainsaw 0.468 molrep 0.486 |
2bje_C | 0.287 | OCA | chainsaw 0.491 molrep 0.481 |
2bje_A | 0.287 | OCA | chainsaw 0.448 molrep 0.443 |
2bjd_B | 0.287 | OCA - released Nov 2005 | chainsaw 0.468 molrep 0.529 |
2bjd_A | 0.287 | OCA | chainsaw 0.544 molrep 0.466 |
1y9o_A | 0.275 | OCA - released Jan 2006 (NMR) | (not tried) |
1ulr_A | 0.286 | OCA - released Nov 2004 | chainsaw 0.476 molrep 0.471 |
2acy_A | 0.264 | SSM - released Nov 1997 (authors tried) | chainsaw 0.539 molrep 0.564 |
Another possible search model is chain A of 1w2i. This is a different structure of the same protein as 1v3z. Try repeating the above steps using $MR_TUTORIAL/data/hypF/1w2i_A.pdb as the search model.
You should find that this is more difficult! Modifying the search model using the target sequence is now necessary. Adjusting the resolution limits also helps.
Check your solutions against those produced from 1v3z_B.
As an exercise, create an ensemble of 1v3z and 1w2i (use Superpose Molecules in Coordinate Utilities) and use Phaser to position the ensemble.
As an exercise, use Molrep or Amore with the Hg phases to solve the structure using the phased translation function. For Amore, you will need to modify the column assignments read from the mtz file to FP=FP1gxu PHI=PHIB_mlphare1 W=FOM_mlphare1
See separate document for 3 more example MR problems.