The world according to wARP:
improvement and extension of crystallographic phases

Anastassis Perrakis1, Titia K. Sixma1, Keith S. Wilson2 and Victor S. Lamzin3
1. Netherlands Cancer Institute (NKI), Department of Molecular Carcinogenesis, Plesmanlaan 121,
1066 CX Amsterdam, The Netherlands
2. Protein Structure Group, Dept. of Chemistry, University of York,
Heslington, York YO1 5DD, UK
3. European Molecular Biology Laboratory (EMBL) Hamburg, c/o DESY, Notkestrasse 85,
22603 Hamburg, Germany


Abstract

We have developed procedures for the improvement of crystallographic phases resulting either from the position of a heavy atom within the native molecule, or from a multiple isomorphous replacement experiment.

In the first case, the position of a heavy atom located from a native Patterson map is used as the starting model for least-squares or maximum likelihood refinement and iterative model updating in an ARP procedure. Automatic updating and completion of the model by ARP results in maps of excellent quality. Furthermore, the atomic positions of the final ARP model are very accurate and can be used to initiate automatic model building techniques, which are currently under development.

In the second case, the best initial map is used to construct a number of dummy free-atom models, which are subjected to ARP refinement. Averaging the phase sets calculated from the refined models, and weighting the structure factors by their similarity to an average vector, results in a phase set that improves and extends the initial phases, provided the native data set has sufficiently high resolution (beyond ~ 2.4 Å). This procedure shortens the time-consuming model building step in many crystallographic structure solutions.

NOTE: The ARP program is freely available as part of the CCP4 package. C-shell scripts and the averaging program needed to run wARP are also available. They perform the dummy model building, the ARP refinements and the final averaging in an automated manner. They can also split jobs in a `parallel' manner over different processors, which may be located in different computers on a network, thus reducing the required run time to that of a single ARP job, provided that enough processors are available. The scripts have been tested on several Irix 5.3 based clusters, but should be straightforward to adapt to any Unix based system. A WWW ARP/wARP home page is now available at http://den.nki.nl/~perrakis/arp.html, from where the complete ARP/wARP package can be obtained. A mailing list is also open for questions and discussion on ARP/wARP usage. To subscribe, use the WWW page or send a mail with the single line `subscribe arp-users' to majordomo@linde.nki.nl.


Outline of the (w)ARP method

ARP from single heavy atom model

One or a few heavy atoms located from the native Patterson synthesis are used as the initial model. Starting from these atoms, a model consisting only of oxygen atoms is slowly created by ARP. This model consists of free atoms that are not subjected to any kind of restraint. One ARP refinement cycle has two parts: (1) unrestrained least-squares minimization or maximum likelihood refinement in reciprocal space, to match calculated to observed structure factor amplitudes, and (2) substantial modification of the current dummy atomic model in real space, using ARP [1,2]. For the unrestrained refinement step, C-shell scripts have been written so that most currently available refinement programs can be used in the procedure. Standard protocols include PROLSQ [3] and REFMAC [4] from the CCP4 [5] suite. After each reciprocal space refinement cycle, ARP updates the model, mimicking human intervention between refinement cycles. It removes atoms based on the density in the 3Fo-2Fc Fourier synthesis and adds atoms in significant density in the Fo-Fc Fourier synthesis, provided that they are within bonding distance of existing atoms. After several such cycles, the added atoms gradually constitute a model that resembles the protein to a great extent.
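The real-space update step can be illustrated with a toy sketch in Python. It is not the ARP program: the function name `update_model`, the density thresholds and the bonding-distance window are all assumptions chosen for illustration, and the density values are passed in directly instead of being interpolated from 3Fo-2Fc and Fo-Fc maps.

```python
import math

def update_model(atoms, atom_density, peaks, peak_density,
                 remove_below=1.0, add_above=3.0,
                 min_bond=1.0, max_bond=2.2):
    """Toy version of one real-space model update (not the ARP code).

    atoms, peaks : lists of (x, y, z) tuples in Angstrom.
    atom_density : density at each atom (stand-in for the 3Fo-2Fc map).
    peak_density : height of each candidate peak (stand-in for the Fo-Fc map).
    All thresholds are hypothetical values, not ARP defaults.
    """
    # 1. Remove atoms that sit in weak density.
    kept = [a for a, rho in zip(atoms, atom_density) if rho >= remove_below]

    # 2. Add significant difference-map peaks whose nearest kept atom
    #    lies within the assumed bonding-distance window.
    added = []
    for peak, rho in zip(peaks, peak_density):
        if rho < add_above or not kept:
            continue
        nearest = min(math.dist(peak, a) for a in kept)
        if min_bond <= nearest <= max_bond:
            added.append(peak)
    return kept + added
```

In this sketch a peak is rejected both when it is isolated and when it lies closer than `min_bond` to an existing atom; the actual selection criteria used by ARP are described in [1,2].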

From ARP to wARP

The procedure described above requires data to very high resolution and a heavy atom in the native protein. In most crystallographic projects, however, this is not the case. Since it is very hard, if possible at all, to provide an ab initio solution to the phase problem in such cases, our effort has concentrated on improving phases that are available from experimental techniques. Such phase information can be very inaccurate, and means of improving it will increase both the speed and the quality of model building. ARP needs high resolution data to converge to the global minimum during refinement. If such data are not available, the refinement will most likely not converge and inaccuracies are introduced into the `final' model. With wARP we try to overcome this problem by weighted averaging of structure factors from individual models.

ARP from MIR maps: wARP

The first step in the wARP procedure is the creation of moderately different free-atom models in the best available map. The procedure for building a `dummy model' is invoked as described in the ARP manual. Briefly, starting from a small set of dummy atoms placed anywhere in the protein region, the model is slowly expanded by the stepwise addition of atoms that are within bonding distance of existing atoms and lie in significant density in the electron density map. Six such models are created, using slightly different ARP building protocols, and are used for all subsequent steps. Next, these models are subjected to ARP refinement. Because of the limited amount of diffraction data, the refined models will presumably contain different errors, which are expected to cancel out in the averaging procedure.
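The stepwise expansion of a dummy model from a few seed atoms can be sketched as follows. This is only an illustration under stated assumptions: the function `grow_dummy_model`, the density threshold and the bonding-distance limits are hypothetical, and candidate positions with their density values are supplied as plain lists instead of being taken from a map.

```python
import math

def grow_dummy_model(seeds, candidates, density, threshold=1.5,
                     min_bond=1.1, max_bond=1.8):
    """Grow a free-atom model outwards from a non-empty list of seeds.

    candidates : possible atom positions, (x, y, z) tuples in Angstrom.
    density    : map density at each candidate position.
    threshold, min_bond, max_bond are assumed values for illustration.
    """
    model = list(seeds)
    # Only positions in significant density are eligible.
    remaining = [c for c, rho in zip(candidates, density) if rho >= threshold]
    grew = True
    while grew:                      # repeat until no atom can be added
        grew = False
        for cand in list(remaining):
            nearest = min(math.dist(cand, atom) for atom in model)
            if min_bond <= nearest <= max_bond:
                model.append(cand)   # bonded to the current model: accept
                remaining.remove(cand)
                grew = True
    return model
```

Varying `threshold` or the distance limits between runs is one simple way to obtain moderately different models from the same map; the protocols actually used for the six wARP models are those of the ARP manual.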

Structure factors are calculated for all models after refinement and scaled to the observed amplitudes. A vector average of the calculated structure factors from the different refined models is then calculated. The phases of the vector average are remarkably better than those calculated from any of the individual runs. Subsequently, a weighting scheme is applied to enhance the overall quality of the phases, depending on the variance of the individual structure factors around the average.
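A minimal sketch of the averaging step is given below. The vector average follows the description above; the weight, however, is an assumption, since the exact weighting formula is not given here. As a stand-in, the sketch uses the ratio of the modulus of the vector average to the mean modulus, which is 1 when all models agree in phase and falls towards 0 as the individual structure factors scatter around the average. The function name `warp_average` is hypothetical.

```python
import cmath

def warp_average(model_sfs):
    """Combine calculated structure factors from several refined models.

    model_sfs : one list per model of complex structure factors, all in
                the same reflection order and already scaled to Fobs.
    Returns one (amplitude, phase_in_radians, weight) tuple per reflection.
    The weight is an assumed agreement measure, not the published scheme.
    """
    n_models = len(model_sfs)
    combined = []
    for per_reflection in zip(*model_sfs):
        mean_f = sum(per_reflection) / n_models            # vector average
        mean_amp = sum(abs(f) for f in per_reflection) / n_models
        weight = abs(mean_f) / mean_amp if mean_amp > 0 else 0.0
        combined.append((abs(mean_f), cmath.phase(mean_f), weight))
    return combined
```

For two models that give 10 and 10i for the same reflection, the averaged phase is 45° and the assumed weight is about 0.71; for 10 and -10 the vectors cancel and the weight is 0.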


Examples

ARP from single heavy atom model

Rubredoxin

Rubredoxin is a small protein of 51 residues and was the first protein to be refined to atomic resolution [6]. It contains an Fe atom coordinated by four Cys residues. The positions of the Fe atom and of the four sulfur atoms of the cysteine side chains can be located from the native Patterson map if data to better than 1.5 Å resolution are available. A high resolution data set of rubredoxin (0.92 Å) was used. Wherever a lower resolution is quoted, the data were simply truncated at that resolution limit.

The starting model for the ARP refinement procedure was initially the Fe atom alone, with coordinates calculated from the native Patterson map. After 80 cycles of least-squares refinement, or 30 cycles of maximum likelihood refinement, a complete model was available. The map correlation coefficient [8] improved from only 26 % to more than 90 % in both cases. The lowest resolution at which the method works, starting from the Fe atom alone, is 1.1 Å. However, if the positions of the four sulfur atoms are included, the method works with 1.4 Å data, in other words with less than one third of the reflections observed to 0.92 Å. In that case, the correlation coefficient with the final map is 96 %, because the correct atom types are used for the four sulfur atoms. With data truncated to 1.5 Å resolution, the correlation coefficient rises slightly to 45 %, but no further improvement could be achieved. Protocols involving the use of E-maps and the wARP averaging described below are being tested to extend the method to much lower maximal resolution. It is of interest to note that the positions of the atoms placed by ARP are those of the atoms in the final model, with slight variation (Figure 1). It would thus be feasible to use them to initiate automatic model building techniques, minimizing both the time spent in traditional model building and the errors introduced by that procedure. Characteristic parts of the maps before and after the ARP procedure are shown in Figures 2 and 3.
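For readers unfamiliar with the map correlation coefficient quoted above, the sketch below computes it in real space, assuming the usual definition as the linear (Pearson) correlation between the density values of two maps sampled on the same grid. The function name `map_correlation` and the flat-list representation of a map are illustrative assumptions.

```python
import math

def map_correlation(map_a, map_b):
    """Linear correlation between two maps given as equal-length lists
    of density values sampled at the same grid points."""
    n = len(map_a)
    if n == 0 or n != len(map_b):
        raise ValueError("maps must be non-empty and of equal size")
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a flat map")
    return cov / math.sqrt(var_a * var_b)
```

A value of 1 means that the two maps are identical up to scale and offset, so the improvement from 26 % to more than 90 % corresponds to a map that has become nearly indistinguishable in shape from the final one.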


Figure 1
Positions of the ARP atoms (left) and of the atoms in the final model (right), for a representative region of the protein. You are welcome to try the `join the correct dots' game in the left panel of the figure ...



Figure 2
Stereo figures of the area of the map around the starting Fe atom. In the initial map (top), resulting from phases calculated from the Fe atom position alone, a large blob of density represents the ion. Although there is density for the four sulfurs of the cysteines coordinating the Fe ion, it is hardly interpretable. After ARP the atomic positions are clearly visible and the map is of excellent quality.



Figure 3
Stereo figures of one area of the map far from the starting Fe atom. In the initial map (top), although density for some of the Tyr atoms is present, the Tyr residue is in practice not recognizable. After ARP refinement the Tyr main and side chains are clearly recognizable.


ARP from MIR maps: wARP

Leishmanolysin


The structure of the Leishmania surface protease (Leishmanolysin, PSP) was solved with a complicated protocol involving the use of SIRAS phases for two different crystal forms, averaging between them, solvent flattening and density skeletonization (unpublished data were kindly provided by Dr. Peter Metcalf). For the wARP test, one set of SIRAS phases extending to a resolution of 3.0 Å was used. These phases were determined for the first crystal form, for which native data extending to 2.5 Å were used for solvent flattening and phase extension with the DM program [9] from CCP4. This density modified map was used to build the initial models with ARP. The ARP unrestrained refinement was performed against a higher resolution native data set from a frozen crystal (2.0 Å). REFMAC maximum likelihood minimization was used with ARP. All of these models gave maps of dramatically better quality than the solvent flattened map. Here the power of the ARP procedure itself is large because the resolution of the native data is good. The wARP procedure resulted in a small but significant additional improvement. Statistics on phase improvement are given in Figure 4 and a representative part of the map in Figure 5.

Figure 5
Representative regions of the solvent flattened (a,c) and equivalent wARP averaged maps (b,d) for Leishmanolysin, shown in stereo.



Chitinase A


The chitinase A structure from Serratia marcescens (ChiA) was initially solved by MIRAS [10], with only one derivative contributing to resolution better than 5.0 Å [ref]. The MIRAS map (2.5 Å) was solvent flattened with the procedures implemented in the PHASES package [11]. Model building was not straightforward and much time was spent in tracing the protein chain. In the wARP procedure the solvent flattened map was used to initiate building of the dummy models. PROLSQ least-squares minimization against the native 2.3 Å data was used with ARP. Refinement of the models resulted in crystallographic R factors ranging between 20.1 % and 22.4 %. All of the ARP refined models gave phases that were the same as or worse than the phases already available from solvent flattening, owing to the limited resolution of the native data (Figure 1a). In this case, where the limited resolution of the data prevents convergence of the refinement, the wARP averaging procedure results in a much improved map, comparable to the improvement achievable with higher resolution data. The phase improvement in resolution shells, for all phase sets, is shown in detail in Figure 6. A characteristic region of the map is shown in Figure 7.

Figure 7
Representative regions of the solvent flattened (a,c) and equivalent wARP averaged maps (b,d) for Chitinase A, shown in stereo.

Applicability and requirements

ARP for ab initio structure solution: Capabilities and limitations

Ab initio methods have only recently been applied successfully in protein crystallography, the most characteristic examples being the structure solution of crambin [12] by direct methods and of cytochrome c6 [13] by Patterson expansion methods. The limitation for successful application of ab initio methods is the resolution of the diffraction data. Although our current example, rubredoxin, can be solved easily by any relevant procedure if atomic resolution data are available, these methods fail if the data extend to worse than ~ 1.2 Å. With ARP we managed to produce an excellent map and an atomic model with only 1.4 Å data, i.e. with essentially ~ 60 % of the reflections available to 1.2 Å. Many more proteins diffract to a resolution around 1.5 Å than to 1.2 Å, according to data on projects recently collected at the EMBL Hamburg synchrotron X-ray facilities. Furthermore, we believe that we will be able to extend that limit in the near future, possibly with the application of wARP averaging.
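The reflection fractions quoted in this article can be checked with a short calculation. The number of reciprocal lattice points inside the resolution sphere scales with its volume, i.e. with 1/d^3, so truncating complete data from a limit d_full to a poorer limit d_cut keeps roughly (d_full/d_cut)^3 of the reflections. The function name is a hypothetical helper, and the estimate assumes complete data and ignores edge effects.

```python
def reflection_fraction(d_cut, d_full):
    """Approximate fraction of the reflections to d_full (in Angstrom)
    that remain after truncating the data at d_cut >= d_full."""
    if d_cut < d_full:
        raise ValueError("d_cut must not be a higher resolution than d_full")
    return (d_full / d_cut) ** 3

print(round(reflection_fraction(1.4, 1.2), 2))   # 0.63
print(round(reflection_fraction(1.4, 0.92), 2))  # 0.28
```

The first value corresponds to the ~ 60 % quoted relative to 1.2 Å data, and the second to the statement that 1.4 Å data contain less than one third of the reflections observed to 0.92 Å.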

Resolution requirements and use of different refinement methods for wARP

In contrast to most density modification methods, the wARP procedure is extremely sensitive to the resolution of the observed data in the native dataset. This is due to the limitations of the unrestrained refinement step, which requires an observations/parameters ratio of more than 1.5 for convergence to a minimum. It is crucial to realise that the real limitation cannot be expressed solely in terms of resolution, but is better expressed as the observations/parameters ratio, which depends largely on solvent content. Thus, for a crystal with high solvent content 2.5 Å data will be sufficient, while for a crystal with low solvent content data to 2.0 Å resolution must be available. Obviously the collected data must be of good quality, as judged by Rmerge, I/σ(I) and completeness. The success of refinement can be easily assessed by the crystallographic R factor.

Our experience shows that if the ratio of the number of reflections in the dataset to the number of refined atomic parameters (four parameters per atom: x, y, z, B) is more than 2.0 (resolution ~ 2.0 Å), maximum likelihood refinement as implemented in REFMAC can be used very effectively, as shown for Leishmanolysin. If the observations to parameters ratio drops below 2.0, traditional least-squares refinement as implemented in PROLSQ produces better results, as shown for ChiA. When the observations to parameters ratio drops below 1.5 the method does not work.
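These rules of thumb can be written down as a small helper. The function names are hypothetical, and the treatment of a ratio of exactly 2.0 (assigned here to least squares) is an assumption, since the text only distinguishes `more than' and `below'.

```python
def observations_per_parameter(n_reflections, n_atoms, params_per_atom=4):
    """Ratio of unique reflections to refined parameters (x, y, z, B per atom)."""
    return n_reflections / (params_per_atom * n_atoms)

def suggest_refinement(n_reflections, n_atoms):
    """Map the observations/parameters ratio onto the guidance given above."""
    ratio = observations_per_parameter(n_reflections, n_atoms)
    if ratio > 2.0:
        return "maximum likelihood (REFMAC)"
    if ratio >= 1.5:
        return "least squares (PROLSQ)"
    return "wARP not applicable"

# Example: 20000 unique reflections and a 2000-atom free-atom model
# give a ratio of 2.5.
print(suggest_refinement(20000, 2000))
```

Because the number of atoms per unit cell volume falls as the solvent content rises, the same resolution limit can give quite different ratios for different crystals, which is why the ratio and not the resolution is the relevant criterion.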

Applicability of the averaging method

The averaging method we describe has also been used successfully in our laboratory to combine maps obtained by different phasing techniques. We have combined, with the wARP procedure, MIR phase sets determined for `cold' and `warm' native datasets, phase sets from different solvent flattening protocols and partial model phase sets. The resulting map appears to be of substantially better quality. Unfortunately, this project is still under refinement and we cannot quote exact phase improvement figures. Furthermore, it is unusual to obtain many phase sets with different sources of error. Also, other more standard and theoretically sound procedures have been developed for standard phase combination. Thus, we will not treat it as a test case, although potential users who think this procedure might be applicable to their particular cases are encouraged to contact us about this possibility.

References

1. Lamzin, V.S. & Wilson, K.S. (1993) Automated refinement of protein models. Acta Crystallogr. D49, 129-147.

2. Lamzin, V.S. & Wilson, K.S. (1996) Automated refinement for protein crystallography. In Methods Enzymol.: Macromolecular Crystallography (Carter, C.M. & Sweet, R.M., Eds.), in press.

3. Konnert, J.H. & Hendrickson, W.A. (1980) A restrained-parameter thermal-factor refinement procedure. Acta Crystallogr. A36, 344-350.

4. Murshudov, G., Vagin, A. & Dodson, E. (1996) Application of maximum likelihood refinement. In The refinement of protein structures, Proceedings of Daresbury Study Weekend.

5. CCP4 (1994) Collaborative Computational Project Number 4. The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D50, 760-763.

6. Dauter, Z., Sieker, L.C. & Wilson, K.S. (1992) Refinement of rubredoxin from Desulfovibrio vulgaris at 1.0 Å with and without restraints. Acta Crystallogr. B48, 42-59.

7. Watenpaugh, K.D., Sieker, L.C. & Jensen, L.H. (1980) Crystallographic refinement of rubredoxin at 1.2 Å resolution. J. Mol. Biol. 138, 615-633.

8. Lunin, V.Y. & Woolfson, M.M. (1993) Mean phase error and the map correlation coefficient. Acta Crystallogr. D49, 530-533.

9. Cowtan, K. (1994) Joint CCP4 and ESF-EACBM Newsletter on Protein Crystallography 31, 34-38.

10. Perrakis, A., et al. and Vorgias, C.E. (1994) Crystal structure of a bacterial chitinase at 2.3 Å resolution. Structure 2, 1169-1180.

11. Furey, W. & Swaminathan, S. (1990) PHASES - a program package for the processing and analysis of diffraction data from macromolecules. American Crystallographic Association Meeting Abstracts 18, 73.

12. Weeks, C.M., Hauptman, H.A., Smith, G.D., Blessing, R.H., Teeter, M.M. & Miller, R. (1995) Crambin: a direct solution for 400-atom structure. Acta Crystallogr. D51, 33-38.

13. Frazao, C., Soares, C.M., Carrondo, M.A., Pohl, E., Dauter, Z., Wilson, K.S., Hervas, M., Navarro, J.A., De la Rosa, M.A. & Sheldrick, G.M. (1995) Ab initio determination of the crystal structure of cytochrome c6 and comparison with plastocyanin. Structure 3,