CCP4 Tutorial 2: More Molecular Replacement

Three Other Test Cases

  1. s100: The structure of human S100A12. Spacegroup H3; resolution 2.5 A.
  2. 1tj3: A sucrose-phosphatase (spp) from Synechocystis sp. pcc6803 in a closed conformation. Spacegroup P6522; resolution 2.8 A.
  3. pst: An immunoglobulin variable domain, which has not been deposited. An extremely difficult example. Spacegroup P21; resolution 3.5 A.

You should create a new ccp4i project directory for each test case.

When this tutorial is obtained as part of the CCP4 distribution, $MR_TUTORIAL corresponds to $CCP4/examples/mr_tutorial_2006

s100 - two molecules in asymmetric unit

s100: Checking the data

The files you will need are in the directory $MR_tutorial/data/s100/

The Cell Content Analysis task (see the Analysis folder in the Molecular Replacement module) indicates that there are likely to be 2 molecules in the asymmetric unit. The molecular weight is about 9700, with 91 residues in the molecule.
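As a rough illustration of the kind of estimate behind a cell content analysis, the sketch below computes the Matthews coefficient Vm (cell volume per dalton of protein) and the solvent fraction from the usual approximation solvent = 1 - 1.23/Vm. The cell dimensions are hypothetical placeholders, not the s100 cell; only the molecular weight (about 9700) and the 9 symmetry operators of H3 in the hexagonal setting correspond to this example. This is not the code used by the CCP4 task.

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic Angstroms; cell angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)

def matthews(volume, n_symops, mol_weight, n_copies):
    """Return (Vm, solvent fraction) for n_copies molecules per asymmetric unit."""
    vm = volume / (n_symops * n_copies * mol_weight)
    solvent = 1.0 - 1.23 / vm  # standard approximation for proteins
    return vm, solvent

# Hypothetical hexagonal cell, for illustration only (NOT the s100 cell).
volume = cell_volume(100.0, 100.0, 60.0, 90.0, 90.0, 120.0)
for n in range(1, 5):
    vm, solvent = matthews(volume, n_symops=9, mol_weight=9700.0, n_copies=n)
    print(f"{n} copies: Vm = {vm:.2f} A^3/Da, solvent = {100 * solvent:.0f}%")
```

Copy numbers that give a Vm far outside the range usually seen for protein crystals (very roughly 1.7 to 3.5 A^3/Da) can be regarded as unlikely.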

Run the Sfcheck task to check the data. This shows no translational NCS peak in the Patterson and minimal twinning.

You can run a self rotation function from the Molecular Replacement module, using the Self RF in polars task in the Analysis folder, or as an option of MOLREP or Amore. The Self RF in polars task outputs all kappa sections in the .plt file; view it with the built-in xplot84driver viewer. MOLREP outputs the kappa = 60, 90, 120 and 180 sections in the .ps output file (you may have to enter a suitable PostScript viewing program for your PC under your System Administration folder). The self rotation function suggests that there is a non-crystallographic two-fold axis at omega ~ 90, phi ~ 20 and kappa ~ 180, approximately perpendicular to the crystallographic 3-fold axis (which lies along omega = 0).
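To make the polar angles concrete, the following sketch converts (omega, phi, kappa) into an axis direction and a rotation matrix and checks the two statements made above. It assumes the common convention that omega is measured from the z axis and phi from the x axis in the xy plane, in an orthogonal frame with the crystallographic 3-fold along z; the function names are illustrative and are not taken from any CCP4 program.

```python
import math

def polar_axis(omega_deg, phi_deg):
    """Unit vector of the rotation axis (omega from z, phi from x in the xy plane)."""
    w, p = math.radians(omega_deg), math.radians(phi_deg)
    return (math.sin(w) * math.cos(p), math.sin(w) * math.sin(p), math.cos(w))

def rotation_matrix(axis, kappa_deg):
    """Rotation by kappa about a unit axis (Rodrigues formula)."""
    x, y, z = axis
    k = math.radians(kappa_deg)
    c, s, t = math.cos(k), math.sin(k), 1.0 - math.cos(k)
    return [
        [t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

ncs_axis = polar_axis(90.0, 20.0)       # the NCS two-fold reported above
threefold = polar_axis(0.0, 0.0)        # crystallographic 3-fold, along z
cos_angle = sum(u * v for u, v in zip(ncs_axis, threefold))
print("angle between NCS axis and 3-fold (degrees):", math.degrees(math.acos(cos_angle)))

twofold = rotation_matrix(ncs_axis, 180.0)
twice = matmul(twofold, twofold)        # a two-fold applied twice is the identity
print("two-fold applied twice:", [[round(v, 6) for v in row] for row in twice])
```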

Notes on self rotation functions

The self-rotation function can give independent information about the contents and organisation of the asymmetric unit.

The radius of integration should be approximately the diameter of the search model. For MOLREP this can be reset in the "Parameters for self-rotation Function" folder. In the s100 example the radius that MOLREP derives automatically from the unit cell parameters is reasonable, and there is no need to reset it manually.

The self-rotation function may also provide some knowledge of the oligomeric state of the unknown structure and suggest which model oligomer could be used as a search model.

If it is likely that the new structure has point group symmetry, the NCS operators from the self-rotation function can be used in the Locked Rotation Function.

However, self-rotation results can be very confusing or misleading when there is high crystallographic symmetry as well as NCS.

s100: Selecting the model

There are many examples of s100 structures.

For this tutorial we have chosen 1irj.pdb, which has 46% identity. (See $MR_tutorial/data/s100/s100-1irj.oca for alignment - note 1irj has more residues than s100, so we will use CHAINSAW to prune it.)

The EBI PISA site (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html) suggests that chains A and B of this structure form a tight dimer. You can download the "assembly" of chains A and B from the site.

We will use both the monomer $MR_tutorial/data/s100/1irjA.pdb and the dimer $MR_tutorial/data/s100/1irjAB.pdb as possible search models.

Another choice for a model is 1mho.pdb with identity 38% (see $MR_tutorial/data/s100/s100-1mho.oca for alignment).

This structure is in spacegroup C2221, and forms a dimer between the deposited molecule and its crystallographic symmetry partner generated by the operator x,-y,-z. You can once again download the assembly from the EBI PISA site.

Alternatively, use the task "Edit PDB file" under the module "Coordinate utilities" to generate this dimer. Select Use "pdbset" to "Generate chains via symmetry operators". The input pdb is $MR_tutorial/data/1mho_nohoh_monomer.pdb. Enter the symmetry operators x,y,z and x,-y,-z, and rename the chains A and B. Call the dimer model 1mho_nohoh_dimer.pdb.

Use the Create Search Model task (see the Model Generation folder in the Molecular Replacement module) to edit these models using Chainsaw, ready for molecular replacement.
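Conceptually, generating the second chain means applying the operator x,-y,-z to every atom of the monomer. The sketch below parses an operator string and applies it to one position in fractional coordinates. It is an illustration only, not the pdbset implementation; the function names are hypothetical, and a real PDB file holds orthogonal coordinates in Angstroms, which must be converted to fractional coordinates using the cell before an operator like this is applied.

```python
import re
from fractions import Fraction

def parse_symop(op):
    """Parse an operator such as 'x,-y,-z' or 'x+1/2,-y,z' into (rows, shifts)."""
    rows, shifts = [], []
    for part in op.lower().replace(" ", "").split(","):
        row, shift = [0, 0, 0], Fraction(0)
        for sign, term in re.findall(r"([+-]?)([xyz]|\d+(?:/\d+)?)", part):
            factor = -1 if sign == "-" else 1
            if term in ("x", "y", "z"):
                row["xyz".index(term)] += factor
            else:
                shift += factor * Fraction(term)   # fractional translation
        rows.append(row)
        shifts.append(shift)
    return rows, shifts

def apply_symop(op, frac_xyz):
    """Apply the operator to one position given in fractional coordinates."""
    rows, shifts = parse_symop(op)
    return tuple(
        float(sum(r * c for r, c in zip(row, frac_xyz)) + shift)
        for row, shift in zip(rows, shifts)
    )

# Chain A keeps its coordinates (x,y,z); chain B is generated by x,-y,-z.
atom_a = (0.10, 0.20, 0.30)             # hypothetical fractional position
atom_b = apply_symop("x,-y,-z", atom_a)
print("chain A:", atom_a)
print("chain B:", atom_b)
```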

Notes on the molecular replacement searching technique for two copies of the model using MOLREP and PHASER

By default, the programs check whether the data are anisotropic and, if so, perform an anisotropic correction (in this example the anisotropy is not significant).

By default MOLREP will use the molecular weight of the model to estimate how many copies of the model are likely to be in the asymmetric unit, assuming a solvent content of about 50% (this can be overridden by setting NMOL).

The programs then carry out the rotation function and use the conventional translation function to position one monomer.

Phaser then recalculates the rotation function, modifying the data to take account of the monomer already found, and does a translation function to find the second copy (then the third, and so on). It tests many solutions for the first monomer, and can be slow.

MOLREP does not recalculate a rotation function, and by default, only fixes the best of the first solutions. This means it is faster but can sometimes miss a solution.

s100: Finding the solution

There is no spacegroup ambiguity (spacegroup H3). We can search with

  1. the dimer of 1irjAB, or 1mho_nohoh_dimer
  2. or search for first one molecule, then the second, using models 1irjA.pdb or 1mho_nohoh_monomer
  3. or use MOLREP to fit one molecule, with a second positioned to obey the NCS 2-fold found in the self-rotation. This is called a Locked Rotation Function. To run this, select Locked Rotation Function under folder Search Parameters in the Molrep interface, enter a self rotation function file from an earlier job, and request to use 1 peak from the self rotation function.

Notes on the Locked Rotation Function

The locked rotation function is applicable when the NCS operators form a point group (possibly together with a subset of the operators of the crystal point group).

In many cases with several copies of the model in the asymmetric unit, the correct rotation function peaks are very low in the list and may not be tested by the translation function. These are the cases where the locked rotation function can help.

It applies the selected NCS operators output by the self rotation function to all symmetry equivalents of the cross rotation function peaks, and averages any sets which are consistent with the selected NCS operator. This should enhance the correct peaks and reduce false ones. For s100, the use of the locked rotation function clearly improves the contrast in the orientation search: the correct peaks move to the first and second positions.

Note that there are no significant changes in the translation function scores (as expected).
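The consistency test at the heart of this idea can be sketched geometrically: two orientations R1 and R2 are consistent with an NCS rotation N if R2 is close to N applied to R1. The code below shows only that test, under strong simplifications that are assumptions of this sketch: it ignores crystallographic symmetry equivalents, does no averaging of rotation function values, and uses made-up peak orientations. It is not MOLREP's algorithm.

```python
import math

def rotation(axis, angle_deg):
    """Rotation matrix for angle_deg about axis (Rodrigues formula)."""
    norm = math.sqrt(sum(v * v for v in axis))
    x, y, z = (v / norm for v in axis)
    a = math.radians(angle_deg)
    c, s, t = math.cos(a), math.sin(a), 1.0 - math.cos(a)
    return [
        [t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def misorientation_deg(r1, r2):
    """Rotation angle (degrees) needed to bring orientation r1 onto r2."""
    d = matmul(transpose(r1), r2)
    cos_angle = (d[0][0] + d[1][1] + d[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def ncs_related_pairs(peaks, ncs_op, tol_deg=5.0):
    """Ordered pairs (i, j) for which peak j is close to ncs_op applied to peak i."""
    pairs = []
    for i, ri in enumerate(peaks):
        predicted = matmul(ncs_op, ri)
        for j, rj in enumerate(peaks):
            if i != j and misorientation_deg(predicted, rj) <= tol_deg:
                pairs.append((i, j))
    return pairs

# NCS two-fold at omega = 90, phi = 20 (from the self rotation function above).
phi = math.radians(20.0)
ncs = rotation((math.cos(phi), math.sin(phi), 0.0), 180.0)

# Hypothetical cross rotation peaks: 0 and 2 are NCS mates, 1 is unrelated.
first = rotation((0.0, 0.0, 1.0), 30.0)
peaks = [first, rotation((1.0, 0.0, 0.0), 70.0), matmul(ncs, first)]
print(ncs_related_pairs(peaks, ncs))
```

Peaks that find a partner in this way support each other, which is why the correct orientations can rise in the list even when each is individually weak.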

Notes on generating the output model (MOLREP)

If more than one monomer is found, MOLREP tests all their symmetry equivalents to detect multimers. If a dimer is detected, the coordinates of the dimer are written to a separate PDB file. If there is more than one dimer in the asymmetric unit, the dimer that was found can be used in subsequent runs. In general, using an oligomer increases the probability of finding a solution, provided that the search oligomer is similar to that in the unknown structure.

Notes on Rigid Body refinement

Phaser does a rigid body refinement by default using its maximum likelihood weighted scoring function.

For MOLREP the gain in CC and R can seem small, but these are not good criteria when the model is still far from the real structure.

Amore does Rigid Body fitting by default and is extremely effective.

In all cases the starting model for the subsequent restrained refinement is improved, and this improvement is sometimes crucial for the interpretation of the map after the restrained refinement.

Notes on refinement of intermediate solutions

When searching for multiple copies of a model, or searching for the same model in two different crystal forms, it often helps to do a preliminary refinement of the partial solution. That refined model is then used as the "fixed" molecule when looking for further domains, and if there is more than one copy of it, as the search model for the next stage.

The multi-domain example. 1tj3 Spacegroup P6522 - Data resolution 2.8A

1tj3: Checking the data

The files you will need are in the directory $MR_tutorial/data/1tj3/

Matthews suggests one molecule in the asymmetric unit.

There is a spacegroup ambiguity; it could be either P6522 or P6122.

Sfcheck shows no particular problems with data.

1tj3: Selecting the model

This is a sucrose-phosphatase (spp) from Synechocystis sp. pcc6803 in a closed conformation.

There are several structures available in both the open and closed conformations.

As an exercise we will choose an open-conformation model with 100% sequence identity, 1so2.pdb. There are two obvious domains (check this with some graphics), so we can split the model into two parts: see $MR_tutorial/data/1tj3/1so2-domain1 and $MR_tutorial/data/1tj3/1so2-domain2.

1tj3: Finding the solution

The spacegroup cannot be determined from the systematic absences alone: P65 2 2 and P61 2 2 both require that, along the c* axis, only the reflections 0 0 l with l = 6n are observed. The rotation function requires knowledge of the point group only, since it is Patterson based, but the translation search uses all the listed symmetry operators. You should therefore check both spacegroups and see which gives the better result in the translation search. (If you are a very cautious person, you may wish to check all spacegroups consistent with the point group: in this case P6 2 2, P61 2 2, P65 2 2, P62 2 2, P64 2 2, and P63 2 2.)

Automatic MR using the whole 1so2 model finds the solution, but the subsequent refinement does not improve the R-factors much, and the maps show there is no density for domain 2. This is because of the flexibility of the molecule. It is worth noting that a domain in a wrong position is a double error in terms of structure factors, and hence in terms of density interpretation: it is not present where it should be, and is present where it should not be.
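The ambiguity can be seen by tabulating the reflection conditions along 0 0 l for each type of six-fold axis: 6_1 and 6_5 both require l = 6n, 6_2 and 6_4 both require l = 3n, 6_3 requires l = 2n, and a pure six-fold imposes no condition. The short sketch below encodes these rules; the function and dictionary names are illustrative.

```python
# Reflection condition along 0 0 l for each six-fold (screw) axis:
# a reflection 0 0 l is allowed only if l is a multiple of the value given.
SCREW_CONDITION = {"6": 1, "61": 6, "65": 6, "62": 3, "64": 3, "63": 2}

def allowed_00l(screw, l):
    """True if the reflection 0 0 l is not systematically absent."""
    return l % SCREW_CONDITION[screw] == 0

def distinguishable(screw_a, screw_b, l_max=24):
    """True if the two axes predict different absences for some 0 0 l."""
    return any(
        allowed_00l(screw_a, l) != allowed_00l(screw_b, l)
        for l in range(1, l_max + 1)
    )

for a, b in [("61", "65"), ("62", "64"), ("61", "63"), ("6", "61")]:
    verdict = "can" if distinguishable(a, b) else "cannot"
    print(f"6_{a[1:] or '0'} and 6_{b[1:] or '0'} {verdict} be told apart from 0 0 l absences")
```

Pairs such as 6_1 and 6_5 are enantiomorphic, so their absences are identical and only the translation search (or later refinement) can separate them.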

It is better to solve the structure by parts, searching for one domain then the second. This can be done with MOLREP, Phaser, or Amore, searching for first domain 1 (the larger one), then fixing it to search for domain 2. At this stage we can use the phases of Domain 1 to help the translation search. (This is the default for Phaser and Amore.)

Thus, in this example, there are three steps of structure solution.

  1. Find domain 1.
  2. Refine domain 1.
  3. Fix the refined domain 1, and find domain 2.

For PHASER it is a very straightforward run; name two ensembles and the program will search first for ensemble1 then ensemble2.

For MOLREP, first find the larger domain. (The expected number of monomers must be entered explicitly under "Search Parameters", because MOLREP estimates the number of monomers in the asymmetric unit of the unknown crystal structure from the molecular weight of the model, assuming a solvent content of about 50%. Here each domain model is approximately half of the total molecule, so the estimate will be wrong.)

Run a second job to find the second domain. In the GUI, click "input fixed model", set "Model in" to the model for domain 2, and set "Fixed in" to the output of the first MOLREP run. It may be sensible to improve the signal from the first domain by running some refinement cycles first.

For this example, we will also describe a relatively new technique, MOLREP with the Spherically Averaged Phased Translation Function (SAPTF). To run this, first find the larger domain and run some refinement cycles. These will improve the model, and give an output mtz file with ML weighted coefficients FWT/PHWT, which generate a 2mFo-DFc map. These are used as input for the SAPTF. The GUI details are:
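The effect of searching with a half-size model can be seen with a back-of-envelope version of such an estimate. This is a sketch under stated assumptions, not MOLREP's actual code: it uses the standard relation solvent = 1 - 1.23/Vm, so that 50% solvent corresponds to Vm = 2.46 A^3/Da, and the cell volume and molecular weights are hypothetical values chosen so that one whole molecule fills the asymmetric unit. The 12 symmetry operators are those of P6522.

```python
def estimated_copies(cell_volume, n_symops, model_weight, solvent=0.5):
    """Estimate copies per asymmetric unit from the model weight and an assumed solvent fraction."""
    vm = 1.23 / (1.0 - solvent)                       # A^3 per dalton implied by the solvent fraction
    protein_mass_per_asu = cell_volume / (n_symops * vm)
    return max(1, round(protein_mass_per_asu / model_weight))

# Hypothetical numbers for illustration only.
volume = 826560.0          # A^3, hypothetical cell volume
whole_molecule = 28000.0   # Da, hypothetical weight of the complete molecule
one_domain = 14000.0       # Da, a search model that is half of the molecule

print("whole molecule as model:", estimated_copies(volume, 12, whole_molecule))
print("single domain as model: ", estimated_copies(volume, 12, one_domain))
```

With the half-size model the estimate doubles, even though only one copy of that domain is actually present, which is why the expected number has to be set by hand.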

  1. Select the option SAPTF + Phased RF + Phased TF.
  2. Select use experimental phases.
  3. Select as input the mtz file output by REFMAC.
  4. Assign FP to FWT and PH to PHWT.
  5. Assign Model in as the smaller domain still to be found.
  6. Assign Fixed in as the refined coordinates for Domain 1. This is used to mask the map generated with FWT/PHWT.
  7. Set Search Parameters again to search for 1 copy of domain 2.

Notes on Spherically Averaged Phased Translation Function

Given some phase estimates and a model of a homologous protein, molecular replacement techniques can be used to position the model in the density. The standard approach follows this route:

  1. the conventional Patterson-based rotation function, which does not use phase information,
  2. the phased translation function.

In molecular replacement with the SAPTF, as implemented in MOLREP, the phase information is used at every step, and the order of the steps is changed:

  1. SAPTF (phase information is used to find a position in the asymmetric unit where there is sufficient density to fit the model),
  2. phased rotation function about this point,
  3. phased translation function to refine the position of the molecule found in step 1.

A third way is a 6-dimensional search, which is somewhat slower (hours instead of minutes).

Finally, this example also shows the fitting of two homologous structures as implemented in MOLREP. A specific feature of this fitting is that it does not need a preliminary sequence alignment: it fits the largest fragments among those that are similar in 3D and ignores the rest of the structure. These features can be used to define domains in cases where their definition is not obvious.

The fiendish example. pst

pst: Checking the data

The files you will need are in the directory $MR_tutorial/data/pst/

Matthews indicates there are likely to be 4 molecules in the asymmetric unit.

Sfcheck finds a non-crystallographic translation vector.

MOLREP will check for translational NCS and use this information when it performs the translation function. In effect, the search model contains two monomers related by the non-crystallographic translation, although technically everything is done in reciprocal space.
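Why a pure translation can be handled in reciprocal space is easy to show for an idealised case. If two identical copies in the same orientation are related by a translation t, the structure factor of the pair is F(h) = F0(h) (1 + exp(2 pi i h.t)), so each intensity is scaled by |1 + exp(2 pi i h.t)|^2 = 2 + 2 cos(2 pi h.t). The sketch below evaluates this factor; the translation vector is a hypothetical example (the pst vector is not given here), and real translational NCS is only approximate, so this is not how MOLREP or Sfcheck implement it.

```python
import cmath
import math

def tncs_intensity_factor(hkl, t):
    """Intensity scale factor for two identical copies related by the fractional translation t."""
    phase = 2.0 * math.pi * sum(h * x for h, x in zip(hkl, t))
    return abs(1.0 + cmath.exp(1j * phase)) ** 2

t = (0.5, 0.0, 0.0)   # hypothetical translation of half a cell edge along a
for hkl in [(0, 1, 2), (1, 1, 2), (2, 1, 2), (3, 1, 2)]:
    print(hkl, round(tncs_intensity_factor(hkl, t), 6))
```

For this choice of t, reflections with h even are strengthened and those with h odd are extinguished, which is the kind of systematic intensity modulation that also produces a strong non-origin peak in the Patterson.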

The self-rotation function shows there is a non-crystallographic two-fold axis.

pst: Selecting the model

There is a partial model 1moe with 64% identity over 228 of 286 residues. The last 50 residues are absent from this molecule. (See $MR_tutorial/data/pst/pst-1moe.oca for the alignment.)

Inspection on the graphics shows there is a flexible hinge at residue 113. The model 1moe forms a dimer with close contacts between domain 1 of chain A and domain 2 of chain B.

Various search models were tried. All were modified using CHAINSAW.

  1. The dimer 1moeAB.pdb.
  2. The fragments moleA_domain1.pdb and 1moeA_domain2. This means we need to fit 8 parts: four copies of domain 1 and four of domain 2.
  3. The partial dimer moleA_domain1-moleB_domain2.pdb.

pst: Finding the solution

The search with the whole dimer fails. This is not surprising since the orientation of the two domains proves to be completely different in this crystal form.

The search for eight fragments is partially successful in PHASER.

MOLREP is successful using domain 1 of A and domain 2 of B as the search model, after both applying the translational NCS and using the locked rotation function.

In the end the best model is made up of domain 1 of molecule A and domain 2 of molecule B.