CCP4 Proceedings 1997

MAD-DM At Elettra; A Case Study

Harold R. Powell, University of Cambridge

Introduction

MAD is an extremely demanding technique which can yield good phases from high quality crystals and data. However, in combination with DM, usable maps can be obtained from datasets which are little better than average. The present work is intended to show that, provided some care is taken in the early stages of the process, it is a straightforward technique which is particularly applicable to oligonucleotide crystallography.

Here I concentrate on the aspects of the technique as I have applied it, treating the problem as a variation on MIR using MLPHARE for heavy atom refinement.

The data for the three structures discussed here were all collected at the new synchrotron in Trieste, Italy, on the protein crystallography beamline 5.2R on visits in February and May 1996; they were the first three MAD datasets that I collected, and among the first to be collected at Elettra.

The beamline at Trieste is well suited to MAD because the X-ray source is easily tunable from ~0.62 Å to ~3.1 Å [1]. It supplies 10¹² to 10¹³ monochromatic photons per second; although the X-rays are not quite as well focussed as at the ESRF, it is still an extremely bright source, and the reliability and stability are very high.

It is necessary to process the diffraction data as well as possible; small errors can lead to failure of MAD-DM as it uses extremely small differences between Bijvoet pairs, which are expected to be only slightly larger than the errors in the data themselves. Without concentrating on the data processing here, it should nevertheless be remembered that any outliers flagged in the output from scaling should be noted and if the deviations are particularly large, these reflections should be omitted manually from further processing, at least until the heavy atoms have been located; Patterson maps in particular are very sensitive to the presence of rogue reflections. The SCALEIT statistics for the merging R factors of and between datasets should also be examined; if the differences between the datasets are all about the same, then location of heavy atoms is unlikely to be successful by any means.
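The comparison between a Bijvoet difference and its error can be made concrete with a minimal sketch. It assumes intensities (F²) and their standard deviations are available for both members of a pair, and uses first-order error propagation; the function name and the numbers are illustrative only and are not taken from the datasets described here.

```python
import math

def bijvoet_difference(i_plus, sig_plus, i_minus, sig_minus):
    """Return (delta_f, sigma_delta_f) for one Bijvoet pair of intensities.

    Assumes both intensities are positive, so that F = sqrt(I) is defined.
    """
    f_plus, f_minus = math.sqrt(i_plus), math.sqrt(i_minus)
    # First-order error propagation for F = sqrt(I): sigma(F) = sigma(I) / (2 F).
    sig_f_plus = sig_plus / (2.0 * f_plus)
    sig_f_minus = sig_minus / (2.0 * f_minus)
    delta_f = f_plus - f_minus
    # The two measurements are treated as independent, so errors add in quadrature.
    sigma_delta_f = math.hypot(sig_f_plus, sig_f_minus)
    return delta_f, sigma_delta_f

# Hypothetical pair: I(+) = 100 +/- 4, I(-) = 81 +/- 3.6.
delta_f, sigma = bijvoet_difference(100.0, 4.0, 81.0, 3.6)
signal_to_noise = delta_f / sigma
print(f"delta-F = {delta_f:.2f}, sigma = {sigma:.2f}, ratio = {signal_to_noise:.1f}")
```

A ratio of only a few, as in this made-up pair, is the situation described above: a single badly measured reflection can easily produce a spurious difference of the same size.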

The majority of the calculations performed in these analyses were carried out with standard CCP4 [2] programs; the data for the first example have been made available as part of a worked example on the CCP4 server. Data reduction from raw images was carried out with Denzo and Scalepack [3]; processing with other programs (e.g. MOSFLM and SCALA) will yield data of similar quality. The general scheme followed is outlined in Table 1.

Determination of the X-ray Absorption Edge

Oligonucleotides are often available in much lower quantities than proteins, and this is especially true of those species containing anomalous scatterers; also, crystallization is often difficult and thus few crystals are available. However, the monomer nucleotides or even nucleosides are available pure in large quantities, so in these experiments the XRF spectra were obtained for 5-bromo-2'-deoxyuridine and used to determine the appropriate wavelengths for data collection. The chemical environment of the bromine in 5-bromo-2'-deoxyuridine (the nucleoside) is very similar to that in 5-bromo-uracil (the free base) or even in an oligonucleotide containing 5-bromo-2'-deoxyuridine-5'-phosphate, hence XRF spectra obtained from these species are all extremely similar, and in general similar to the spectrum shown in Mark Peterson's article in this Report.

Table 1: Flowchart of general procedure

Data Collection

The most important point is that the crystals containing the anomalous scatterer must be of high quality. Small crystals help avoid problems due to absorption; as the anomalous differences (delta-F) are very small, a poor absorption correction could completely mask any effect being exploited.

At the synchrotron, the quality of the optics is paramount; it is essential that not only is the wavelength what you think it is, but also that it can be reliably and repeatedly reselected. The X-rays must be stable for extended periods, both in terms of intensity and wavelength. Small variations can easily accumulate into significant errors.

The advent of cryo-cooling of macromolecular crystals is one of the features that has made MAD-DM data collection reasonably straightforward recently. The ability to collect several complete datasets on a single crystal has increased the chance of success of this method considerably.

Many crystallographers make life more difficult for themselves by not trying the 'oil drop' technique, but instead search for cryoprotectants that may well contribute to increased mosaicity and reduction in data quality. Much of the degradation in crystal quality on freezing is due to surface moisture freezing rather than ice formation in the solvent channels inside the crystal [4]. The oil drop method, because it removes this surface moisture, will in many cases prevent crystal damage; it has never failed for me on either DNA or protein crystals. It has the added advantage that the crystal is coated in a hydrophobic layer, so it does not dry out and can be handled for some minutes outside its sitting or hanging drop.

I prefer to mount the crystal in a random orientation; this is advantageous in that the completeness of the datasets is increased over that obtainable from an aligned crystal. With a stable crystal and stable X-rays, there is little to be gained from the careful alignment of the crystal on an axis. The advantage of measuring Bijvoet pairs close together in time seems to be relatively unimportant, in DNA crystallography at least.

Location of Heavy Atoms

Atomic coordinates for the anomalous scatterers in each example were determined using the direct methods option in SHELXS-96 [5] (F² data from Scalepack were processed with SHELXPRO [6] to yield anomalous delta-F values). An example of the results of this strategy for the first sample is in Table 2; it can be seen that this route should be considered as the first choice for heavy atom determination. Direct methods seem to be more 'robust' than the Patterson method, being more resistant to the presence of outliers in the data, and give answers in negligible time.

**Table 2**: SHELXS-96 anomalous delta-F results for Crystal 1. The space group is I222, so the positions found in each solution are equivalent by space group symmetry. Times are for an SG Indigo2, R4K, 150MHz.
	dataset	x	y	z	CPU (s)
Direct methods	inflexion	0.1908	0.0150	0.1676	15.3
	inflexion	0.1877	0.1562	0.1896	15.3
	white-line	0.8088	0.4842	0.1673	17.0
	white-line	0.8105	0.3441	0.1892	17.0
	high E offset	0.6912	0.0155	0.1678	13.9
	high E offset	0.6887	0.1572	0.1903	13.9
Patterson	inflexion	0.6895	0.5170	0.6679	347.0
	inflexion	0.6859	0.6551	0.6897	347.0
	white-line	0.8094	0.9809	0.8324	224.7
	white-line	0.8160	0.8456	0.8101	224.7
	high E offset	-	-	-	-
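The equivalence of the positions in Table 2 can be checked numerically. The sketch below is one way of doing so, under stated assumptions: "equivalent" is taken to include the four rotations of point group 222, origin shifts of 0 or 1/2 along each axis (which include the I-centring translation), and a change of hand, since anomalous differences alone do not fix the hand. The tolerance of 0.01 in fractional coordinates and all function names are illustrative choices.

```python
from itertools import product

# Rotations of point group 222, written as sign triples applied to (x, y, z).
ROTATIONS = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]
# Assumed permitted origin shifts for I222: 0 or 1/2 along each axis.
SHIFTS = list(product((0.0, 0.5), repeat=3))

def frac_distance(p, q):
    """Largest per-axis separation of two fractional positions, modulo 1."""
    return max(min((a - b) % 1.0, (b - a) % 1.0) for a, b in zip(p, q))

def equivalent(p, q, tol=0.01):
    """True if q maps onto p by a rotation, an origin shift and/or inversion."""
    for hand in (1, -1):
        for rot in ROTATIONS:
            for shift in SHIFTS:
                image = tuple(hand * r * c + s for r, c, s in zip(rot, q, shift))
                if frac_distance(p, image) < tol:
                    return True
    return False

# First site of each solution in Table 2.
site1 = [
    (0.1908, 0.0150, 0.1676),  # direct methods, inflexion
    (0.8088, 0.4842, 0.1673),  # direct methods, white-line
    (0.6912, 0.0155, 0.1678),  # direct methods, high E offset
    (0.6895, 0.5170, 0.6679),  # Patterson, inflexion
    (0.8094, 0.9809, 0.8324),  # Patterson, white-line
]
# Second site found from the inflexion data by direct methods.
site2_inflexion = (0.1877, 0.1562, 0.1896)

all_agree = all(equivalent(site1[0], other) for other in site1[1:])
print("site 1 solutions agree:", all_agree)
print("site 1 and site 2 distinct:", not equivalent(site1[0], site2_inflexion))
```

With these assumptions all five entries for the first site map onto one another, while the two sites remain distinct; the Patterson white-line solution matches only when the change of hand is allowed.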

The reliability of direct methods can be judged from several criteria; chief amongst these in my view is that if the same results are obtained from each of the datasets with an anomalous contribution but not from the long wavelength offset, the answer is probably correct. Once (if!) direct methods have failed it may be necessary to calculate Patterson maps, plot Harker sections and interpret these, but in the general case this will not be necessary. In my eagerness to look at electron density, I tend to glance over the SCALEIT statistics while the program output is scrolling past on screen, and only return to them later if difficulties have arisen.

MAD by itself will rarely provide enough phase information to produce interpretable electron density maps; some kind of phase improvement and extension is usually required in addition. We have used the CCP4 program DM, which applies solvent flattening and histogram matching to the data, and this leads to maps which can be of very high quality.

Structure Solutions

Sample 1

A crystal of the cyclic DNA octamer CAT-BrU-CAT-BrU, which has the 5' and 3' ends joined, was used in this study.

Four datasets were collected, one each at a long wavelength offset, at the inflexion point, the white-line maximum and a short wavelength offset (Table 3). Processing of these data showed that they were reasonably complete, and using Scalepack's 'linear R-factor' and 'square R-factor' as guides, they were of reasonable but not exceptional quality.

**Table 3**: Data collection statistics for Sample 1.
Sequence	CATBrUCATBrU
Crystal System	orthorhombic	Space Group	I222
Cell dimensions	a = 22.627 b = 26.002 c = 70.045
Crystal to detector	120mm	Frames	60 x 3.0°
Max. resolution	~ 1.5Å
Dataset (Å)	0.8993	0.9198	0.92054	0.9334
Total data	36470	36152	36277	36223
Unique data	3552	3530	3529	3527
Rmerg (1)	0.066	0.071	0.060	0.040
Rmerg (2)	0.058	0.074	0.067	0.048
Completeness (%)	99.2	98.6	98.5	98.6
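For readers unfamiliar with the two Scalepack statistics named above, the following is a minimal sketch of how merging R factors of this kind can be computed. It assumes the usual definitions (linear: sum of |I - <I>| over sum of I; square: sum of (I - <I>)² over sum of I²), that symmetry-equivalent reflections have already been reduced to a common index, and that singly measured reflections are left out; the function name and the observations are hypothetical, and Scalepack's own treatment may differ in detail.

```python
from collections import defaultdict

def merging_r_factors(observations):
    """Return (linear R, square R) for (hkl, intensity) observations.

    Symmetry-equivalent reflections are assumed to share the same hkl key.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    abs_dev = sum_i = sq_dev = sum_i2 = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue  # a single observation carries no agreement information
        mean = sum(intensities) / len(intensities)
        for value in intensities:
            abs_dev += abs(value - mean)
            sum_i += value
            sq_dev += (value - mean) ** 2
            sum_i2 += value ** 2
    return abs_dev / sum_i, sq_dev / sum_i2

# Hypothetical observations, not taken from Table 3.
observations = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((2, 0, 0), 50.0), ((2, 0, 0), 40.0),
    ((3, 0, 0), 10.0),  # measured once, so ignored
]
linear_r, square_r = merging_r_factors(observations)
print(f"linear R = {linear_r:.3f}, square R = {square_r:.4f}")
```

Because the square form weights large deviations more heavily, it is the more sensitive of the two to a handful of rogue reflections.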

Direct methods gave two possible bromine positions (see Table 2), which was expected from the unit cell dimensions and space group.

Heavy atom refinement according to the scheme in Table 1 gave the results in Table 4. It is worth spending a little time looking at the various figures of quality produced. For the Figures of Merit, values greater than 0.6 can be considered encouraging, and if > 0.8, the problem can be considered well on the way to being solved. The Cullis R-factors, which are calculated for each derivative, should become smaller for a correct answer; final values of RCull(cen) < 0.9 and RCull(acen) < 0.6 for the white-line maximum and short wavelength offset datasets should be seen as encouraging, and an RCull(ano) < 0.5 for the datasets with an anomalous contribution seems a good indicator that the correct answer is being approached.

**Table 4**: Selected MLPHARE and DM statistics for Sample 1.
			(+x, +y, +z)	(-x, -y, -z)
MLPHARE	Totals	FoM (ace)	0.8521	0.8524
		FoM (cen)	0.6646	0.6717
		FoM (all)	0.8164	0.8179
	Deriv #1	Cullis R (ace)	0.58	0.57
		Cullis R (cen)	0.61	0.60
		Cullis R (ano)	0.89	0.89
	Deriv #2	Cullis R (ace)	0.85	0.85
		Cullis R (cen)	0.87	0.86
		Cullis R (ano)	0.30	0.30
	Deriv #3	Cullis R (ace)	0.51	0.50
		Cullis R (cen)	0.54	0.54
		Cullis R (ano)	0.36	0.36
	"Native"	Cullis R (ace)	1.46	1.46
		Cullis R (cen)	1.00	1.00
		Cullis R (ano)	0.35	0.34
DM		FoM-DM	0.881	0.886
		R_free	0.557	0.502
		Real Space R_free	0.349	0.202

Another measure of the correctness of the refinement process can be found by inspection of the refined values of Occ and AOcc (the real and anomalous occupancies), as they should be proportional to delta-f' and f" respectively; even in the best collected datasets, there will be deviations from these relationships which reflect the fact that datasets have not been collected exactly at the inflexion point and white-line maximum (Table 5). However, as long as the proportions are roughly correct, it is important not to worry too much.

**Table 5**: Refined occupancies for bromine atoms, Crystal 1 (Occ compared with delta-f', AOcc compared with f").
	wavelength (Å)	0.8993	0.91980	0.92054*	0.9331
Occ	delta-f' (exp)	5.372	1.279	0	4.489
	Br(1)	0.154	0.043	0	0.129
	Br(2)	0.186	0.058	0	0.158
AOcc	f" (exp)	3.641	3.826	2.167	~0.5
	Br(1)	3.028	3.098	2.998	0.461
	Br(2)	3.306	3.670	3.240	0.491
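The proportionality check described above amounts to dividing each refined occupancy by the corresponding expected scattering factor and seeing whether the ratios are roughly constant. A minimal sketch using the numbers in Table 5 follows; the variable names are illustrative, the "~0.5" entry is taken as 0.5, and the dataset whose expected delta-f' is listed as 0 is skipped for Occ to avoid dividing by zero.

```python
# Values copied from Table 5.
wavelengths = [0.8993, 0.91980, 0.92054, 0.9331]
delta_f_prime = [5.372, 1.279, 0.0, 4.489]
f_double_prime = [3.641, 3.826, 2.167, 0.5]  # last value quoted as "~0.5"

occ = {
    "Br(1)": [0.154, 0.043, 0.0, 0.129],
    "Br(2)": [0.186, 0.058, 0.0, 0.158],
}
aocc = {
    "Br(1)": [3.028, 3.098, 2.998, 0.461],
    "Br(2)": [3.306, 3.670, 3.240, 0.491],
}

def ratios(refined, expected):
    """Refined/expected for each dataset; None where the expected value is 0."""
    return [r / e if e else None for r, e in zip(refined, expected)]

def show(values):
    return ["  -  " if v is None else f"{v:.3f}" for v in values]

for atom in ("Br(1)", "Br(2)"):
    print(atom, "Occ / delta-f':", show(ratios(occ[atom], delta_f_prime)))
    print(atom, "AOcc / f\"    :", show(ratios(aocc[atom], f_double_prime)))
```

For Br(1), for example, the AOcc ratio at 0.92054 Å comes out near 1.38 against roughly 0.8 to 0.9 at the other wavelengths, which is the kind of deviation the text attributes to data not being collected exactly at the intended points of the spectrum.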

The main thing to be remembered about the various measures of quality associated with heavy atom refinement is that they are only guides; the best, and only sure way of knowing that the MAD-DM process has been successful is when calculated electron density is studied and model fitting can begin.

DM was run in a more-or-less default mode of solvent flattening with histogram matching; the only required information from the crystallographer is a reasonable estimate of the solvent fraction of the unit cell. The figure that seems most informative from DM is the Real Space Free R; this can give good information on the correct hand of the structure (which cannot be obtained from MLPHARE), and is also a further indication that the whole process has worked. Note that it is only after processing with DM that there is a significant difference between the two hands, and it is apparent in this case that originally the wrong hand was chosen. Phases are also calculated for many reflections unphased in previous steps, and this phase extension is important in being able to calculate electron density.
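As a rough illustration of how the Real Space R_free values for the two hands might be compared, the sketch below tabulates the figures reported for the three samples (Tables 4, 7 and 9). The cut-off `min_gap` is an arbitrary value chosen for this example and is not a recommendation from DM or from this work; the function name is likewise hypothetical.

```python
# Real Space R_free for the (+x, +y, +z) and (-x, -y, -z) hands,
# copied from Tables 4, 7 and 9.
REAL_SPACE_R_FREE = {
    "Sample 1": (0.349, 0.202),
    "Sample 2": (0.269, 0.255),
    "Sample 3": (0.354, 0.399),
}

def compare_hands(original, inverted, min_gap=0.1):
    """Return (favoured hand, gap, verdict).

    min_gap is an arbitrary illustrative cut-off, not a DM recommendation.
    """
    gap = round(abs(original - inverted), 3)
    favoured = "(+x, +y, +z)" if original < inverted else "(-x, -y, -z)"
    verdict = "clear preference" if gap >= min_gap else "inspect both maps"
    return favoured, gap, verdict

for sample, (original, inverted) in REAL_SPACE_R_FREE.items():
    print(sample, *compare_hands(original, inverted))
```

Only Sample 1 shows a large gap between the hands. For Sample 3 the hand with the worse statistics proved to be the correct one, so output of this kind is at most a prompt to look at the maps, never a substitute for doing so.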

The phases calculated by DM can be used directly by FFT to produce an F(obs) map which can be viewed on a graphics workstation after suitable translation.

Figure 1: Electron density for sample 1 showing obvious base stacking.

Figure 2: Electron density for sample 1 in the region of an A-T base pair.

Sample 2

The second sample was isomorphous with the native structure solved elsewhere. In this case, instead of four datasets, seven were collected; the extra three were collected with wavelengths at -1eV (#5), +1eV (#6) and +2eV (#7) from the measured inflexion point of the nucleotide. This experiment was intended to ensure that we had a dataset as close as possible to the true inflexion point of the oligonucleotide. As it turned out, the real value was between the measured IP and #5.
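The energy offsets quoted above translate into wavelengths through E = hc/λ. The sketch below performs the conversion, assuming that the first wavelength listed in Table 6 (0.92155 Å) is the measured inflexion point; the constant is the rounded value of hc in eV·Å and the function name is illustrative.

```python
HC_EV_ANGSTROM = 12398.42  # h*c in eV * Angstrom (rounded)

def shift_wavelength(wavelength, delta_ev):
    """Wavelength (Angstrom) after changing the photon energy by delta_ev (eV)."""
    energy = HC_EV_ANGSTROM / wavelength
    return HC_EV_ANGSTROM / (energy + delta_ev)

measured_inflexion = 0.92155  # Angstrom; assumed to be the measured inflexion point
offsets = {"#5": -1.0, "#6": 1.0, "#7": 2.0}
shifted = {label: shift_wavelength(measured_inflexion, d) for label, d in offsets.items()}

for label, wavelength in shifted.items():
    print(label, f"{wavelength:.5f}")
```

Under that assumption the three results, 0.92162, 0.92148 and 0.92141 Å, agree with the last three wavelengths listed in Table 6, and show that a 1 eV step near the bromine edge corresponds to a wavelength change of only about 0.00007 Å.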

**Table 6**: Data collection statistics for Sample 2.
Sequence	cyclic CATBrUCATBrU
Crystal System	orthorhombic	Space Group	P2₁2₁2₁
Cell dimensions (Å)	a = 22.80 b = 27.86 c = 55.06
Crystal to detector	120mm	Frames	35 @ 3° (long wavelength offset 27 @ 3°)
Max. resolution	~1.5 Å
Dataset (Å)	0.92155	0.92079	0.9003	0.9334	0.92162	0.92148	0.92141
Total data	23555	23707	23409	18392	23170	23809	23846
Unique data	9753	9789	9597	8834	9610	9837	9865
Rmerg (1)	0.070	0.069	0.072	0.059	0.064	0.085	0.091
Rmerg (2)	0.083	0.082	0.098	0.073	0.079	0.096	0.108
Completeness (%)	85.0	85.1	83.6	76.8	83.7	85.7	85.7

The data collected were not of the same quality as for Crystal 1 (Table 2), but Direct Methods revealed the presence of four heavy atoms in the asymmetric unit, with roughly the same coordinates as those for the four non-base-paired thymine methyl groups in the native.

**Table 7**: Selected DM statistics for Sample 2.
	(+x, +y, +z)	(-x, -y, -z)
FoM-DM	0.924	0.922
R_free	0.434	0.440
Real Space R_free	0.269	0.255

Examination of an F(obs) map in the region of an A-T base pair (Figure 3) reveals that the electron density is interpretable, but less easily than for sample 1. However, with some work the molecule could be successfully fitted even without prior knowledge of the correct structure.

Figure 3: F(obs) electron density map in the region of an AT base pair for Sample 2.

Sample 3

This work is part of an ongoing project led by Dr Christine Cardin of Reading University, and I was in the fortunate position of helping her in this study. The crystal used was grown by Dr Adrienne Adams of Trinity College, Dublin.

The whole analysis from raw images to first electron density map took about one and a half working days, and only took that long because we took our time over it!

The data collected appeared comparable at all stages of the processing to those from sample 1.

Direct methods found one heavy atom in the asymmetric unit. Heavy atom refinement proceeded smoothly, and examination of the measures of quality from DM (Table 9) shows that there is little to choose between the correct and incorrect hand for this structure. However, note that the Real Space R_free values for both hands are far worse than for the previous two samples; this should emphasize the point that all the numbers output by the programs should only be taken as guides!

Electron density in an F(obs) map revealed that the solution from DM with the worse statistics was actually correct. Figure 4 shows the spectacularly good density for the oligonucleotide revealed in the first map calculated; it is not necessary to include a model of the structure to see in Figure 5 the positions of the Br in a BrU-A base pair and most of the base atoms as well.

Figure 4: Electron density for sample 3 showing the base stacking.

Figure 5: Electron density for sample 3 in the region of an A-BrU base pair.

The density corresponding to the bromine atom is highlighted in red in each figure.

**Table 8**: Data Collection Statistics for Sample 3. (* due to crystal decomposition.)
Sequence	ACGTACG-BrU
Crystal System	tetragonal	Space Group	P4₃2₁2
Cell dimensions	a = 41.991 c = 25.301
Crystal to detector	120mm	Frames	30 @ 3° (*15 @ 3°)
Max. resolution	~ 1.6Å
Dataset	0.9344*	0.9216	0.9208	0.9003
Total data	12171	25725	26727	28390
Unique data	4276	5672	5867	6301
Rmerg (1)	0.051	0.071	0.058	0.061
Rmerg (2)	0.074	0.105	0.100	0.096
Completeness (%)	74.3	98.7	98.6	98.2

**Table 9**: Selected DM statistics for Sample 3.
	(+x, +y, +z)	(-x, -y, -z)
FoM-DM	0.871	0.868
R_free	0.526	0.541
Real Space R_free	0.354	0.399

Conclusions

The take-home message from this work is that the facilities to collect data for a MAD-DM experiment and the programs to process these data are available now. MAD-DM is straightforward provided that data are collected carefully from the best available crystals; it is capable of giving excellent electron density which allows rapid and relatively easy structure building. The comparison of F(obs) maps for crystals 1 and 3 shows that it is necessary to examine the electron density rather than rely on the statistics; there can be a marked difference even between apparently similar data.

Acknowledgements

I wish to express my sincere gratitude to the following people and organizations: Eleanor Dodson (York), who guided me through the process of actually getting the phases and improving them; Christine Cardin, Alan Todd (Reading) and Adrienne Adams (Dublin), who provided the ideas and the crystal for sample 3 and much of the labour involved in collecting the data on all three crystals; Stephen Salisbury and Sarah Wilson (CCDC), who provided samples 1 and 2 and also spent sleepless nights at Elettra; the CCDC (for my salary!); University Library, Cambridge (for giving me time off to come to York for this meeting); and especially all the staff at Elettra, who have made collecting data there such a rewarding and interesting experience.

References

[1] see the WWW page http://www.elettra.trieste.it

[2] Collaborative Computational Project Number 4. 1994. "The CCP4 Suite: Programs for Protein Crystallography", Acta Cryst. D50, 760-763.

[3] Z. Otwinowski, Denzo and Scalepack, film processing programs for macromolecular crystallography. Yale University, New Haven, 1995.

[4] see, for example, http://www-structure.llnl.gov/Xray/cryo-notes/Cryonotes.html

[5] SHELXS-96, G. M. Sheldrick, Universität Göttingen, 1996.

[6] SHELXPRO, G. M. Sheldrick, Universität Göttingen, 1996.