Multiwavelength anomalous dispersion phasing strategies
investigated with a brominated oligonucleotide.

Mark R. Peterson
Structural Chemistry Section, Department of Chemistry,
University of Manchester, Oxford Road, Manchester,
M13 9PL, England, U.K.
(Current Address: Wellcome Sciences Institute, Department of Biochemistry,
University of Dundee, Dundee, DD1 4HN, Scotland, U.K.)

Abstract

Multiwavelength anomalous dispersion (MAD) methods were used to analyse the crystal structure of d(CGCG^BrCG), extending the work presented in Peterson, Harrop, McSweeney, Leonard, Thompson, Hunter and Helliwell (1996) J. Synch. Rad. 3, 24-34. The brominated oligonucleotide d(CGCG^BrCG) of chemical formula crystallises in space group with unit cell dimensions a=11.97, b=30.98, c=44.85 Å, . It was chosen as a test crystal to evaluate the MAD method itself and to commission station PX9.5 for several reasons: it was radiation insensitive; it had a very good concentration of anomalous scatterers, i.e. two bromines in two hundred and forty light atoms; and the bromine K edge was very near to the critical wavelength flux output of the SRS wiggler. It also diffracted strongly, due to the relatively small unit cell, in spite of the rather small crystal volume. Data to a resolution of 1.65 Å were collected at four wavelengths about the bromine K absorption edge using synchrotron radiation at Station PX9.5, SRS, Daresbury. Traditionally, the maximum of f " is not coincident with the minimum in f '; however, in this case both are observed on the same data set, . Hence and could be maximised using only two wavelengths. Various wavelength-combination phasing strategies were then studied, ranging from four to two wavelengths. Density modification (DM) phase improvement procedures were also employed on these combinations, giving highly interpretable maps even for unoptimised two-wavelength cases.

Data Collection

Data collection was conducted at Station PX9.5 at the Synchrotron Radiation Source (SRS) in Daresbury, England. The single crystal selected for the data collection had a pseudo-hexagonal plate morphology of dimensions 0.2 x 0.1 x 0.01 mm. As the anomalous scattering factors are derived from the atomic absorption coefficient, a XANES (X-ray absorption near edge structure) experiment was also carried out on station PX9.5 to decide upon the precise wavelengths to be used in the data collection. The wavelength bandpass for the beam was set at 4.4 x 10^-4 by restricting the vertical divergence of the beam by a factor of two with the use of slits upstream of the focussing mirror.

Upon inspection of a test diffraction image, it could be seen that the crystal was relatively well aligned, i.e. the Bijvoet mates could be measured on the same or adjacent images. No further crystal alignment was undertaken. The wavelengths for the diffraction measurements were chosen to optimise the phasing power by (a) maximising the f " effect and (b) for different wavelengths for each hkl. Hence, four wavelengths were chosen: (1) a reference on the long wavelength side of the edge ; (2) at the absorption edge inflection point ; (3) at the "white line" absorption maximum ; (4) a reference on the short wavelength side of the edge . The choices of and follow what are known as f ' dip and f " max respectively.

For each wavelength the crystallographic data were collected as a series of images, each involving a 4° rotation of the crystal. For each 4° sweep the total exposure time was 60 seconds. In total 120° of data were collected for each of the four wavelengths. A fifth data set, covering another 120°, was then collected on the same crystal, immediately after the MAD data, at the "white line" (i.e. ) but with the crystal misaligned by offsetting one of the goniometer head arcs by approximately 30°. This allowed reflections previously in the blind region to be measured and combined with the data set.
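The collection scheme above implies a simple exposure budget, sketched here as a short calculation. Only the sweep width, exposure per sweep, angular range and number of data sets come from the text; the variable names are illustrative, and detector readout and wavelength-change overheads are not included because they are not stated.

```python
# Exposure budget implied by the data collection parameters quoted in the text.
SWEEP_DEG = 4.0      # rotation per image (degrees)
EXPOSURE_S = 60.0    # exposure per 4 degree sweep (seconds)
RANGE_DEG = 120.0    # total rotation range per data set (degrees)
N_DATA_SETS = 5      # four MAD wavelengths plus the misaligned "white line" set

images_per_set = int(RANGE_DEG / SWEEP_DEG)
exposure_per_set_min = images_per_set * EXPOSURE_S / 60.0
total_images = images_per_set * N_DATA_SETS
total_exposure_min = exposure_per_set_min * N_DATA_SETS

print(images_per_set, exposure_per_set_min, total_images, total_exposure_min)
```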

Merging statistics from the five data sets are displayed in Table 1.

The weak and negative intensities were made consistent with a Wilson distribution of structure factor amplitudes using TRUNCATE (CCP4). The computer programs CAD and SCALEIT (CCP4) were employed to combine the five data sets into one file and to put them on an overall common scale. This was done with respect to , it was treated as the 'native'. It was indeed found that had the largest MFID between all other data sets.

SCALEIT provides useful estimates of the largest acceptable dispersive and absorptive differences between and within the different data sets. Because Patterson methods are sensitive to spurious large differences, it was important to reject any unacceptably large differences as outliers. The final SCALEIT statistics are shown in Table 2.

Dispersive and absorptive Patterson maps were then generated with FFT. The bromine sites could be readily identified using both the anomalous and dispersive Patterson maps. From the three Harker sections in both maps, two consistent bromine sites could be easily found. The quality of these Patterson maps can be seen in Figures 1 and 2. The positions of the two bromine sites were 0.3241, 0.2009, 0.0100 (Site A) and 0.5010, 0.1807, 0.2310 (Site B) respectively.
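As an illustration of how a site maps onto the three Harker sections, the following sketch computes the Harker vectors expected for a given fractional position. It assumes, purely for illustration, space group P2(1)2(1)2(1), which is one of the orthorhombic groups that has three Harker sections; the space group symbol and the function name `harker_vectors` are assumptions and not taken from the text.

```python
import numpy as np

# General positions of P2(1)2(1)2(1) written as (sign per axis, translation).
# ASSUMPTION: this space group is used only for illustration.
SYMOPS = [
    (np.array([1, 1, 1]), np.array([0.0, 0.0, 0.0])),    # x, y, z
    (np.array([-1, -1, 1]), np.array([0.5, 0.0, 0.5])),  # 1/2-x, -y, 1/2+z
    (np.array([-1, 1, -1]), np.array([0.0, 0.5, 0.5])),  # -x, 1/2+y, 1/2-z
    (np.array([1, -1, -1]), np.array([0.5, 0.5, 0.0])),  # 1/2+x, 1/2-y, -z
]

def harker_vectors(site):
    """Return the vectors between a site and each of its symmetry mates,
    reduced to the range [0, 1) in fractional coordinates."""
    site = np.asarray(site, dtype=float)
    vectors = []
    for sign, trans in SYMOPS[1:]:
        mate = sign * site + trans
        vectors.append((site - mate) % 1.0)
    return vectors

# Bromine sites quoted in the text (fractional coordinates).
for name, site in (("A", (0.3241, 0.2009, 0.0100)),
                   ("B", (0.5010, 0.1807, 0.2310))):
    for vec in harker_vectors(site):
        print(name, np.round(vec, 4))
```

Each printed vector has one coordinate equal to 1/2, which is the section on which the corresponding self-vector peak would be sought under the assumed symmetry.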

Phase calculations

Each bromine atom had its coordinates, temperature factors and occupancies (both real and anomalous) refined in MLPHARE (CCP4) for ten cycles. The refined positions of the two bromine atoms were used in MLPHARE on both hands . MLPHARE treats the data sets collected at different wavelengths as isomorphous derivatives, with one data set being chosen as the 'native'. To maintain a consistent positive dispersive difference between the other data sets, the f ' dip data set ( ) was chosen as the native. Dispersive differences between and the other data sets give rise to isomorphous differences, especially and with respect to , which were treated as apparent real occupancies of the anomalous scatterers. For the 'native' data set ( ) the real occupancies of the anomalous scatterers were fixed to zero initially. The figures of merit of the MAD phases, using all four wavelengths (excluding the data set), were 0.86/0.82 to 1.65 Å resolution for the acentric/centric data respectively, for both hands. The f ' and f " anomalous scattering factors were added to the form factor list, both being arbitrarily set equal to one electron so that the real and anomalous occupancies corresponded to the number of electrons involved in the dispersive and absorptive differences respectively, as the data sets had previously been placed on a common absolute scale via SCALEIT and TRUNCATE. Table 3 gives the relevant phasing statistics for each derivative against the native ( ) and also compares the theoretical values of the anomalous coefficients f ' and f " (Sasaki, 1989) at each wavelength with the coefficients extracted at each wavelength via the occupancies in MLPHARE.

The phases from MLPHARE were then combined with the structure factor amplitudes from the , native data set, and a MAD electron density map was calculated via FFT (CCP4). The MAD maps were calculated on both hands (Figs. 3 (a) and (b)) at 1.65 Å resolution. The figures of merit for both sets of phases do not distinguish between correct and incorrect enantiomers. The problem is only resolved upon inspection of the MAD electron density maps for "chemical sense". That is, the map calculated on the correct hand (Fig. 3(a)) showed the bases clearly, and building of the model with O (Jones et al. (1989)) could be easily started from the known heavy atom positions. The map calculated on the wrong hand was totally uninterpretable (Fig. 3 (b)).
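For orientation, the figure of merit quoted for the MAD phases is conventionally the modulus of the centroid of the phase probability distribution of a reflection. The sketch below evaluates that standard definition on a sampled distribution; it is not MLPHARE's code, and the function name and the example distributions are illustrative assumptions.

```python
import numpy as np

def figure_of_merit(phi, prob):
    """Return (figure of merit, centroid phase) for a phase probability
    distribution sampled on a uniform grid of angles phi (radians)."""
    phi = np.asarray(phi, dtype=float)
    prob = np.asarray(prob, dtype=float)
    centroid = np.sum(prob * np.exp(1j * phi)) / np.sum(prob)
    return abs(centroid), np.angle(centroid)

phi = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)

# Hypothetical distributions: one sharp single peak, and one with two
# symmetric peaks (an unresolved phase ambiguity).
unimodal = np.exp(5.0 * np.cos(phi - 1.0))
bimodal = np.exp(5.0 * np.cos(phi - 1.0)) + np.exp(5.0 * np.cos(phi + 1.0))

print(figure_of_merit(phi, unimodal))
print(figure_of_merit(phi, bimodal))
```

A single sharp peak gives a figure of merit near one, while an unresolved two-fold ambiguity lowers it. Under this definition the value measures only how sharply a phase is determined, not whether the chosen hand is correct, in line with the observation above that both hands gave the same figures of merit.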

Key observations on the MAD Work.

In the variation of f ' and f " with wavelength, only two wavelengths need to be measured to yield a at one wavelength and a change via of F; between the two wavelengths (Okaya and Pepinsky (1956); Hoppe and Jakubowski (1975); and Helliwell (1979)). The choice of wavelengths to maximise and was made with reference to the fluorescence spectrum. A key objective is to make the centres of the phasing circles in the Harker phasing diagram well separated and non-collinear, which is a necessary and sufficient condition for phasing (Helliwell (1984)). Traditionally, the maximum of f " is not coincident with the minimum in f '. Hence, three wavelengths would be needed in such a situation for fully moving the centres of the phasing circles apart. In this study however, although was expected to have the largest Friedel anomalous difference, in fact that was the case for the (f ' dip) data set (e.g. see Ranom values in Table 1). In light of being the f " maximum, was taken as 'native' to confirm if was indeed at the f ' dip. This was done by comparing MFIDs between data sets where then are taken as the 'native' data sets. It was indeed found that had the largest MFID between all other data sets (see Table 2). In such a case, where the f " maximum and the f ' minimum are both observed on the same data set, i.e. , one data set becomes essentially redundant, i.e. in making the biggest anomalous differences. Hence, various alternative strategies of combinations were investigated.
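To give a feel for the size of the two signals discussed above, the following sketch evaluates a commonly used estimate of the expected Bijvoet (absorptive) and dispersive diffraction ratios from the number of anomalous scatterers and the total number of atoms. The atom counts are those quoted in the abstract; the values of f ", the f ' difference and the effective atomic number `z_eff` are placeholder assumptions for illustration only (6.7 is the value often quoted for protein atoms and may not suit a nucleic acid), and the function names are hypothetical.

```python
import math

def bijvoet_ratio(n_anom, n_total, f_double_prime, z_eff):
    """Estimated <|dF(+/-)|>/<|F|> for n_anom anomalous scatterers among
    n_total atoms of effective atomic number z_eff."""
    return math.sqrt(n_anom / (2.0 * n_total)) * 2.0 * f_double_prime / z_eff

def dispersive_ratio(n_anom, n_total, delta_f_prime, z_eff):
    """Estimated <|dF(lambda_i - lambda_j)|>/<|F|> for a given f' difference."""
    return math.sqrt(n_anom / (2.0 * n_total)) * abs(delta_f_prime) / z_eff

N_ANOM = 2      # bromine atoms (from the abstract)
N_TOTAL = 240   # light atoms (from the abstract)
Z_EFF = 6.7     # ASSUMED effective atomic number
F_DP = 4.0      # ASSUMED f" in electrons, illustrative only
D_FP = 7.0      # ASSUMED f' difference in electrons, illustrative only

print(bijvoet_ratio(N_ANOM, N_TOTAL, F_DP, Z_EFF))
print(dispersive_ratio(N_ANOM, N_TOTAL, D_FP, Z_EFF))
```

Both estimates scale with the square root of the fraction of anomalous scatterers, which is why a high concentration of bromine atoms makes a favourable test case.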

Phase Information and Electron Density Map Quality from Various Wavelength Combinations.

The following analysis can essentially be split into three categories, involving data sets recorded at four, three and two wavelengths respectively, in a variety of combinations, to explore both experimental strategies for phasing and theoretical/computational strategies of phase improvement (see Figure 4 and Table 4 for respective map quality and FOMs). The experimental strategies were published in Peterson et al. (1996).

Case 1:

This combination of wavelengths is the case described previously where the f " anomalous effects of each wavelength are all utilised along with the isomorphous effects between and each of the other three wavelengths. The map was of excellent quality and structural moieties could be easily characterised.

Case 2:

This three wavelength case, and the next, is to compare the two possible choices of reference wavelength. Sometimes, due to lack of SR beam time and/or prolonged exposure times, it may be only feasible to collect data at three wavelengths. The reference wavelength, has no anomalous signal as it is situated on the long wavelength side of the Br K edge. The map, however, was of excellent quality and could be easily characterised.

Case 3:

The reference wavelength, has a good anomalous signal as it is situated on the short wavelength side of the absorption edge, unlike . The overall figure of merit was certainly improved compared with case 2. The map was again of excellent quality and could be easily characterised.

Case 4:

The theoretical minimum case for unique phase determination involves two wavelengths. This is akin to the 'two-short-wavelength-method' of Hoppe and Jakubowski (1975). It is required that the centres of the phasing circles be well separated and non-collinear, and this is achieved well here (Helliwell (1984)). The pairing has the largest dispersive difference, whilst, also has the maximum Friedel difference. The electron density map was of high quality and totally interpretable.

Case 5:

This combination of wavelengths, stimulated by correspondence from D. H. Templeton, was used to see if the map could be phased with two extremely close wavelengths (i.e. only 0.0007 Å apart!) that might be adversely affected by dichroism effects. Also the pairing has half the dispersive signal compared to the theoretical minimum, case 4, . However, has the largest anomalous difference whereas has the next largest anomalous difference.

Density Modification Procedures for Improvement of Phase Quality.

The principle of density modification (DM) is to improve the experimental phases by imposing restrictions on the density in real space and then using the phases of the modified map to alter or replace the experimental phases. In protein crystallography these are important methods for phase improvement. Moreover, they may be applied so as to reduce the number of wavelengths needed in a MAD phase determination experiment and/or to use wavelengths very close in value, but with reduced (less optimal) values of f " or . The map modification process embodied in the program DM (Cowtan (1994)) was used on the various wavelength phasing combinations.
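The principle just described can be sketched as a toy one-dimensional cycle: compute a map from the observed amplitudes and current phases, impose a real-space restraint (here a crude flattening of the lowest-density region), and take the phases of the modified map. This is not the algorithm of the DM program; the function name `dm_cycle`, the flattening rule, the `solvent_fraction` parameter and the test density are all illustrative assumptions.

```python
import numpy as np

def dm_cycle(f_obs, phases, solvent_fraction=0.5):
    """One toy density modification cycle on a 1-D map.
    The observed amplitudes are kept by the caller; only new phases are returned."""
    # Map from observed amplitudes and current phases (real part only).
    rho = np.fft.ifft(f_obs * np.exp(1j * phases)).real
    # Treat the lowest-density fraction of grid points as "solvent".
    cutoff = np.quantile(rho, solvent_fraction)
    solvent = rho < cutoff
    rho_mod = rho.copy()
    if solvent.any():
        rho_mod[solvent] = rho[solvent].mean()  # flatten the solvent region
    # Phases of the modified map replace the input phases.
    return np.angle(np.fft.fft(rho_mod))

# Hypothetical 1-D "structure": a few peaks on an otherwise empty cell.
rho_true = np.zeros(32)
rho_true[[3, 4, 5, 10, 11, 20]] = [1.0, 3.0, 1.0, 2.0, 4.0, 2.0]
f_true = np.fft.fft(rho_true)
f_obs = np.abs(f_true)

# Start from perturbed phases and run a few cycles.
rng = np.random.default_rng(0)
phases = np.angle(f_true) + rng.normal(0.0, 0.5, size=f_true.shape)
for _ in range(5):
    phases = dm_cycle(f_obs, phases, solvent_fraction=0.5)
```

In a real procedure the modified phases would normally be combined with the experimental phase probabilities rather than simply replacing them, and further restraints such as histogram matching would be applied.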

Case 1: Density Modified

The quality of the original map was very good; however, DM improved the map quality around all the bases. All bases now had well defined, complete electron density apart from base 7, which still had a lack of connectivity at one bond.

Case 2: Density Modified

Seven bases (1, 3, 8, 9, 10, 11 and 12) that originally had incomplete density (side chains missing or lack of connectivity) were sufficiently improved to show well resolved connected density. The remainder of the bases, which had previously suffered from a lack of connectivity, were still not significantly altered.

Case 3: Density Modified

Eight bases (1, 3, 4, 8, 9, 10, 11 and 12) that originally had incomplete density (side chains missing or lack of connectivity) were sufficiently improved via DM to show well resolved connected density. The remainder of the bases, which suffered from a lack of connectivity, were not significantly altered.

Case 4: Density Modified

Eight bases (3, 4, 6, 8, 9, 10, 11 and 12) which were defined by density with a lack of connectivity at at least one bond now showed well defined connected density after DM. The remaining four bases showed a clear improvement in density quality, e.g. base 1 now has the nitrogenous side chain defined.

Case 5: Density Modified

The original map had most structural moieties in the correct position. DM further increased the map quality considerably, so much so that all the bases are easily characterised. Bases 3, 4, 5, 10 and 11 now had well defined connected density compared to the lack of connectivity experienced in the original map at these positions. Bases 1, 6, 9 and 12 showed improved density, whereas bases 7 and 8 were still interpretable, but were slightly better defined in the original map. Base 2 showed no significant change in density. As might be expected, this modified map was not of as high a quality as the modified case 4 map.

Discussion and Concluding Remarks.

alone yields the largest f " value, as expected from theory, if not the Kronig-Kramers transform curve. Hence, the choice of two wavelengths, a reference wavelength, or with , whilst being the theoretical minimum number of wavelengths, also yielded the biggest and f " differences in the diffraction data. The use of two wavelengths may be of interest when the concentration of anomalous scatterers in the system is high, and when a three- or four-wavelength data collection strategy is not favourable (e.g. due to restricted beam time, or when long exposure times per diffraction image are needed).

Density modification was then considered for the various wavelength scenarios. There is a special interest in the two-wavelength cases, which simplify the experimental and beamline needs. Key points are now discussed further. The already good map quality in the phasing combination was reinforced further after the DM procedure and structure solution became even easier. The isomorphous difference between data sets and is half that of the previous two cases mentioned above, 3.68 electrons, but this is generated by a change in wavelength of only 0.0007 Å! The advantage of this is that the beam position incident onto the sample would be essentially identical for the two wavelengths. The original phases and map were of only reasonable quality before DM procedures. The DM phases produced a highly interpretable map in which the structure could be easily solved. Structure solution can thus be obtained, thanks to these modification procedures, even when the isomorphous signal is not optimised. Overall, DM could perhaps be further enhanced if the electron density 'data bank' used for histogram matching actually consisted of nucleic acid density instead of protein density (which had to be used here). In essence, a key result is that cases 1 to 4 become comparable in terms of the FOMs of the phases after DM.

¹ Compared with Peterson et al. (1996) the total numbers of reflections are now 2399, 636, and 3035 throughout. In the previous publication a coding error in CCP4 MLPHARE had led to the rejection of some especially large -ve . This coding error has been rectified in a new release of the program. There was no visible impact of this error on the map quality and comparisons, and no impact on the figure of merit values of the reflections that were phased, which in any case constituted a large fraction of the total available.

In Peterson et al. (1996), it was reasoned that dichroism effects were not evident in the f ' and f " values, in essence because the maximum induced f " and differences were induced with respect to in agreement with theory but somewhat unexpected. However, it was pointed out by David Templeton (pers. comm.) that for the two independent Br sites (A and B) in the crystallographic asymmetric unit, there did appear to be a variation between the two sites' f ' and f " values, which had a maximum at . Hence, at the effect of different atomic environments of the A and B sites might explain this, in a similar way to the previously reported bromide example of Templeton and Templeton (1995), in which there was a very marked edge shift, on edge, for the parallel and perpendicular polarisation components of 0.00031 Å (estimated from figure 3 of that paper). Therefore, the , pair in the analysis would suffer the most if dichroism were present to a large degree. Since Figure 5 (b) shows good quality phasing and electron density map quality, it can be concluded that dichroism was not a major factor in the f ' and f " values that we have encountered. Nevertheless, further experiments are planned to explore the values of f ' and f " at finer sampling, and to test for dichroism, which must be present to some degree.

In summary, this work successfully evaluated and compared a variety of MAD experimental and computational procedures for phase improvement. It provides guidance in planning future experiments and/or new instruments, and is therefore a significant contribution to the methods of protein crystal structure determination. Aspects of the work are published in Peterson et al. (1996).

Acknowledgements

Thanks are due to J. R. Helliwell, W. N. Hunter and G. A. Leonard for discussions, and to S. J. Harrop and S. M. McSweeney for data collection assistance at Station 9.5, SRS, Daresbury. Correspondence on possible dichroism in the f ' and f " values at and was between D. H. Templeton and J. R. Helliwell.

References

CCP4 (1994) Acta Cryst. D50, 760-763.

Sasaki, S. (1989) KEK Report 88-14, Tsukuba 305, Japan.

Okaya, Y. and Pepinsky, R. (1955) Phys. Rev. 98, 1857-58.

Hoppe, W. and Jakubowski, U. (1975) In Anomalous Scattering, 437-61.

Helliwell, J. R. (1979) Daresbury study weekend, DL/SCI/R13, 1-6.

Helliwell, J. R. (1984) Reports on Progress in Physics 47, 1403-1409.

Peterson, M. R. et al. (1996) J. Synch. Rad. 3, 24-34.

Cowtan, K. (1994) Newsletter on protein crystallography, 31, 34-38.

Templeton, D. and Templeton, L. (1995) J. Synch. Rad. 2, 31-35.