------------ CCP4 Newsletter - January 1997 ------------


# AUTO-INDEXING OSCILLATION IMAGES USING A PATTERSON FUNCTION

John W. Campbell, CCLRC Daresbury Laboratory

## INTRODUCTION

The possibility of using a three dimensional FFT as the basis of an auto-indexing method was mentioned to me in an informal discussion with Phil Evans. This article describes some tests carried out to examine how useful such a procedure might be as the basis of an auto-indexing or crystal orientation determination procedure for monochromatic protein oscillation images. The trials described focussed on the use of a single oscillation image for the orientation determination. The trials were carried out using both simulated and real data.


## BASIS OF THE METHOD

The basis of the method is simply that the Fourier Transform can provide a direct transformation between a reciprocal lattice (which may be determined experimentally) and a real space set of vectors from which the cell vectors and the cell orientation may be found. In theory this should provide a more direct route to the required information than the more usual procedures involving the determination and clustering of difference vectors in reciprocal space (see refs. 2, 3 & 4).

To carry out the procedure, a fine three dimensional orthogonal reciprocal lattice grid (parallel to a set of laboratory axes) is set up. A set of reciprocal lattice points is determined from the observed diffraction data. For each of these reciprocal lattice points, an 'intensity' value of 1.0 is assigned to the nearest grid point of the orthogonal lattice. All other points of this orthogonal lattice are assigned intensities of zero. A three dimensional Patterson function is then computed from the orthogonal lattice using an FFT program and the resulting Patterson function is searched for the vectors to determine the cell and its orientation.
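The idea can be illustrated in one dimension with a short Python sketch. It is not the procedure used in the trials: a direct cosine sum stands in for the FFT program, and the 40 Angstrom cell, the 2.0 Angstrom resolution limit, the 1.0 Angstrom wavelength and the selection of reflections are all made-up values chosen for illustration. The function names are likewise hypothetical.

```python
import math

def nint(v):
    """Nearest integer, rounding halves away from zero (as Fortran NINT does)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def patterson_1d(grid_indices, npt):
    """1-D Patterson from unit 'intensities' at the given grid indices.

    The terms for h and -h combine into a cosine, so the map is
    P(j) = sum_h cos(2*pi*h*j/npt), scaled here to an origin peak of 100.
    """
    n = len(grid_indices)
    return [100.0 * sum(math.cos(2.0 * math.pi * h * j / npt)
                        for h in grid_indices) / n
            for j in range(npt)]

# Hypothetical example values (not taken from the trials).
cell = 40.0                          # true cell edge, Angstroms
alam = 1.0                           # wavelength, Angstroms
res = 2.0                            # resolution limit, Angstroms
npt = 256                            # grid divisions
rlgrid = 2.0 * alam / (res * npt)    # reciprocal grid spacing (dimensionless)
step = res / 2.0                     # real-space step = FFT cell / npt

# A sparse, arbitrary subset of the reflection orders 1..19.
orders = [2, 3, 5, 8, 11, 13, 14, 17, 19]
grid_indices = [nint((alam * h / cell) / rlgrid) for h in orders]

pmap = patterson_1d(grid_indices, npt)

# Search vectors shorter than half the FFT cell, skipping the origin.
best_j = max(range(1, npt // 2), key=lambda j: pmap[j])
print("strongest non-origin peak at", best_j * step, "Angstroms")
```

With these made-up values the strongest peak away from the origin falls at grid point 40, that is at 40 Angstroms, the cell edge that was put in, even though each reflection has been moved to its nearest grid point.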

## POTENTIAL PROBLEMS

A number of problem areas are obvious. In the first instance, it is necessary to carry out the calculations using a 'fine' grid for the 3-dimensional FFT since the finer the grid, the smaller the rounding errors in assigning the observed reciprocal lattice points to the nearest FFT grid points. How fine this grid needs to be in practice is probably one of the main factors determining whether the method has practical value using today's computers.

Assuming a sufficiently fine grid can be used, the following problem areas still remain.

• The very sparse sampling of the reciprocal lattice.

• The asymmetry of the sampling of the reciprocal lattice.

• The errors introduced by the fact that a finite oscillation angle is used when recording the image.

• Errors in the measurement of the primary beam position, the spot positions and the crystal to image distance.

The latter two categories of error, of course, affect any method using oscillation images.

## INITIAL TRIALS

Some initial trials were carried out in one and two dimensions using grids from 2048 down to 512 points with a data resolution of 1.0 Angstroms and cell parameters up to about 85 Angstroms. Typically a random sample of 5% of the predicted reciprocal lattice points was included in the FFT calculation. As the results seemed promising, the trials were extended into three dimensions. The first trials in three dimensions were carried out using a grid of 512x512x512 points. Although these again looked promising, such a 3-D FFT took around 13 minutes of cpu time, with an elapsed time of around 45 minutes, on an Indy workstation. This suggested that the method was unlikely to be practical unless a coarser grid could be used, and a grid of 256x256x256 was then tried. This reduced the cpu requirement to less than 2 minutes and it was this grid that was used in all the subsequent examples unless otherwise stated. The computation time could probably be reduced significantly by restricting the number of grid divisions to a power of 2 and making more use of the fact that the input data are very sparse.
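As a rough consistency check on these timings, the following sketch assumes, purely for illustration, that the work in a 3-D FFT scales as N log N in the total number of grid points N. That scaling is an assumption about the FFT program rather than something measured in the trials, and `fft_work` is a hypothetical helper.

```python
import math

def fft_work(npt):
    """Relative cost of a 3-D FFT on an npt^3 grid, assuming N*log2(N) scaling."""
    n = npt ** 3
    return n * math.log2(n)

ratio = fft_work(512) / fft_work(256)
cpu_512_minutes = 13.0                  # cpu time quoted in the text for 512^3
print("work ratio 512^3 / 256^3:", ratio)
print("estimated cpu minutes at 256^3:", cpu_512_minutes / ratio)
```

Under this assumption the coarser grid needs about one ninth of the work, giving an estimate of roughly a minute and a half, which is consistent with the observed figure of less than 2 minutes.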

## PROGRAMS USED

The main programs used in the trials were the FFT and PEAKMAX programs from the CCP4 program suite and IMSTILLS from the MOSFLM suite. A series of jiffy programs were also written and used.

Special versions of the routines OUT_SPOTS

These were used in building temporary versions of the ROTGEN program to create some simulated data for trying out the method.

PREP_MTZOSC

Prepare an MTZ file for input to the FFT program from reciprocal lattice coordinates written from a version of ROTGEN with a special version of OUT_SPOTS included.

STILLS_MTZ

Prepare an MTZ file for input to the FFT program from reciprocal lattice points calculated from spot positions measured using IMSTILLS.

PVEC

Analyse selected peaks as output from PEAKMAX.

PVEC_REFN

Analyse selected peaks as output from PEAKMAX and carry out some limited refinement.

CELL_VEC

Find vectors within given distance and angular criteria from peaks output by PEAKMAX.

FIND_ORIENT

Output cell parameters and missetting angles from a set of three selected cell vectors.

## THE FFT GRID

As indicated above, almost all the trials were carried out using a grid of 256x256x256 for the FFT calculations. This meets the following requirements:

• It is about the finest grid which could currently be envisaged as the basis for a practical method.

• The grid is a power of two which would be most beneficial for future optimisation.

• It is reasonably well matched to the size of cell envisaged, provided that reasonably low resolution data are used for the indexing - see further below.

The grid used for the FFT and its relationship to the crystal reciprocal lattice for data of a given resolution are illustrated in two dimensions in the following diagram.

Figure 1 Choice of FFT Grid for Crystal Data of a Given Resolution

Let npt be the number of divisions in each dimension of the grid and res be the resolution of the data to be used, then the cell size for the 3-D Patterson function to give vectors of the correct dimensions is given by:

```
   FFT cell size = npt*res/2.0
```
The grid spacing rlgrid is given by:
```
   rlgrid = 2.0*alam/(res*npt)
```
where alam is the wavelength at which the data are collected.

The FFT grid corresponds to an index range of (-npt/2 + 1) to (npt/2 - 1).

The conversion of a reciprocal lattice point (dimensionless) to the nearest indices (hkl) for the FFT input data is given by:

```
   h = NINT (x/rlgrid)
   k = NINT (y/rlgrid)
   l = NINT (z/rlgrid)
```
where x,y,z are the dimensionless reciprocal lattice coordinates for an observed reflection.
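The expressions above can be collected into a small Python sketch. The function names are hypothetical, the reflection coordinates in the usage lines are invented for illustration, and the grid parameters are those of the PGM row of Table 2. Note that Python's built-in `round()` rounds halves to the nearest even integer, so a separate `nint` is written to mimic the Fortran NINT.

```python
import math

def fft_cell_size(npt, res):
    """Real-space cell edge (Angstroms) of the Patterson: npt*res/2."""
    return npt * res / 2.0

def grid_spacing(alam, res, npt):
    """Reciprocal grid spacing rlgrid (dimensionless units)."""
    return 2.0 * alam / (res * npt)

def nint(v):
    """Nearest integer with halves rounded away from zero, like Fortran NINT."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def to_grid_index(x, y, z, rlgrid, npt):
    """Map dimensionless coordinates to (h, k, l), or None if off the grid."""
    hkl = tuple(nint(c / rlgrid) for c in (x, y, z))
    limit = npt // 2 - 1                # index range is -limit .. +limit
    if any(abs(i) > limit for i in hkl):
        return None
    return hkl

# Grid parameters for the PGM image: 1.00 A wavelength, 3.0 A data, 256 grid.
npt, res, alam = 256, 3.0, 1.00
rlgrid = grid_spacing(alam, res, npt)
print(fft_cell_size(npt, res))          # 384.0, the FFT cell listed for PGM
# Hypothetical reflection coordinates, for illustration only.
print(to_grid_index(0.10, -0.05, 0.02, rlgrid, npt))
```

Points beyond the resolution limit fall outside the index range and are rejected here by returning `None`; how the actual jiffy programs treated such points is not stated.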

It is probably desirable, when it comes to searching for vectors in the 3-D Patterson function, that the primary cell vectors should be less than half the FFT cell dimension in length. If the FFT grid is to be restricted to 256, this means that the resolution of the data to be included in the calculation will need to be restricted to give a sufficiently large FFT cell based on the expressions given above. For a cell of 200 Angstroms the data input would need to be restricted to a resolution of, say, 3.5 Angstroms whilst for a cell of 100 Angstroms, 2.0 Angstrom data would be suitable.
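The two suggested resolution limits can be checked against the half-cell criterion. Requiring the cell edge to be less than half of npt*res/2 gives res > 4*cell/npt; the helper name below is hypothetical.

```python
def min_resolution(cell, npt=256):
    """Resolution value (Angstroms) above which a cell vector of the given
    length is shorter than half the FFT cell, npt*res/2."""
    return 4.0 * cell / npt

for cell, chosen in ((200.0, 3.5), (100.0, 2.0)):
    limit = min_resolution(cell)
    print(cell, "A cell: need res >", limit, "A; text suggests", chosen, "A")
```

For a 256 grid the limits come out at 3.125 and 1.5625 Angstroms, so the suggested values of 3.5 and 2.0 Angstroms both leave some margin.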

## STEPS FOLLOWED

The calculations were carried out in the following stages:

Find spots or simulate data

For real data, find spot positions from the oscillation image using the program IMSTILLS.

For simulated data, use a temporary version of the program ROTGEN with a modified spots output routine linked in.

Create an MTZ file

Calculate the nearest grid points to the reciprocal lattice coordinates computed from the spot positions data (or read in for simulated trials) and output this data as an MTZ reflection file for input to the FFT program. For real data, the program STILLS_MTZ is used and for simulated data, the program PREP_MTZOSC is used.
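The conversion from a measured spot position to dimensionless reciprocal lattice coordinates can be sketched with the standard Ewald sphere construction. This is not a description of the code in STILLS_MTZ: the axis conventions (beam along +z, flat detector normal to the beam, spot coordinates measured from the direct beam centre) and the function names are assumptions, and the image is treated as a still, which ignores the rotation during the oscillation.

```python
import math

def spot_to_reciprocal(xd, yd, dist):
    """Dimensionless reciprocal lattice coordinates for a detector spot.

    xd, yd : spot position relative to the direct-beam centre (mm)
    dist   : crystal to detector distance (mm)
    Returns s - s0, where s0 is the unit vector along the beam (+z)
    and s is the unit vector from the crystal to the spot.
    """
    r = math.sqrt(xd * xd + yd * yd + dist * dist)
    return (xd / r, yd / r, dist / r - 1.0)

def d_spacing(rl, alam):
    """Resolution (Angstroms) of a dimensionless reciprocal lattice vector."""
    return alam / math.sqrt(sum(c * c for c in rl))

# Hypothetical spot 45.0 mm from the beam centre, using the PGM distance
# of 137.0 mm and wavelength of 1.00 Angstroms.
rl = spot_to_reciprocal(45.0, 0.0, 137.0)
print(rl, d_spacing(rl, 1.00))
```

The resulting coordinates are in the same dimensionless units as rlgrid, so they can be divided by the grid spacing and rounded to give the FFT indices.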

Calculate a 3-D Patterson Function

A three dimensional Patterson function in space group P1bar is calculated using the FFT program. The map is scaled to give an origin peak of height 100.0.

Find peaks in the Patterson

The Patterson map is searched for peaks using the PEAKMAX program. In most cases a threshold of 30.0 (i.e. just under 1/3 of the origin peak) was used for the search.
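The kind of search involved can be sketched as follows. This is a generic local-maximum search over a periodic map held as nested Python lists, with a hypothetical function name and a tiny made-up map; it is not the algorithm of PEAKMAX.

```python
from itertools import product

def find_peaks(pmap, threshold):
    """Return (height, (i, j, k)) for grid points at or above threshold that
    are strictly higher than all 26 neighbours, with periodic wrap-around."""
    n1, n2, n3 = len(pmap), len(pmap[0]), len(pmap[0][0])
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    peaks = []
    for i, j, k in product(range(n1), range(n2), range(n3)):
        v = pmap[i][j][k]
        if v < threshold:
            continue
        if all(v > pmap[(i + a) % n1][(j + b) % n2][(k + c) % n3]
               for a, b, c in offsets):
            peaks.append((v, (i, j, k)))
    return sorted(peaks, reverse=True)

# Tiny made-up 6x6x6 map: an origin peak of 100 and two lesser peaks.
n = 6
demo = [[[0.0] * n for _ in range(n)] for _ in range(n)]
demo[0][0][0] = 100.0
demo[3][0][2] = 62.5
demo[0][4][0] = 25.0        # below the threshold of 30, so not reported
peaks = find_peaks(demo, 30.0)
print(peaks)
```

The peak of height 25 is discarded by the threshold of 30, leaving the origin peak and the peak of height 62.5.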

Analyse the vectors Found

This is currently done with a number of jiffy programs.

PVEC and PVEC_REFN select peaks above a requested threshold, find the vector lengths within a requested range and output details of the vectors selected and the angles between them. PVEC_REFN also attempts some preliminary refinement of these vectors against the reciprocal space data, using either a general least squares routine or a method described by Clegg for refining 't' vectors (ref. 1).

CELL_VEC searches for solutions in which each of three vectors has a length within a given range and the vectors are separated by at least the requested minimum angles.
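A simplified sketch of this kind of search is given below. It uses a single length range for all three vectors and one minimum angle, both simplifications, and the function names and test vectors are invented; it should not be read as the logic of CELL_VEC itself.

```python
import math
from itertools import combinations

def length(v):
    """Length of a vector given as a tuple of components."""
    return math.sqrt(sum(c * c for c in v))

def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cosang = sum(a * b for a, b in zip(u, v)) / (length(u) * length(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

def candidate_triples(vectors, lo, hi, min_angle):
    """Triples of vectors with lengths in [lo, hi] whose pairwise angles
    are at least min_angle away from both 0 and 180 degrees."""
    keep = [v for v in vectors if lo <= length(v) <= hi]
    result = []
    for trio in combinations(keep, 3):
        angles = [angle_deg(u, v) for u, v in combinations(trio, 2)]
        if all(min_angle <= a <= 180.0 - min_angle for a in angles):
            result.append(trio)
    return result

# Made-up peak vectors (Angstroms): three cell edges, a double-length
# vector along the first edge and one short vector near the origin.
vectors = [(40.0, 0.0, 0.0), (0.0, 50.0, 0.0), (0.0, 0.0, 60.0),
           (80.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
triples = candidate_triples(vectors, 30.0, 100.0, 30.0)
for trio in triples:
    print(trio)
```

The angular test rejects any triple containing both the 40 and 80 Angstrom vectors, since they are parallel, but a multiple of a cell edge can still appear in an otherwise acceptable triple, so the candidates need further inspection.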

Determine the Orientation

The program FIND_ORIENT can be used to determine the cell parameters and crystal setting from three vectors selected manually from the output of the PVEC_REFN program. This data can be fed into ROTGEN to check whether the correct orientation has basically been found.
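The cell parameter part of this step follows directly from the lengths of the three vectors and the angles between them, as in the sketch below. The function name is hypothetical, and the missetting angles are not covered because they depend on axis conventions that are not given here.

```python
import math

def cell_from_vectors(va, vb, vc):
    """Return (a, b, c, alpha, beta, gamma) from three real-space cell vectors.
    Lengths are in the units of the input; angles are in degrees."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def angle(u, v):
        cosang = sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
        return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

    return (norm(va), norm(vb), norm(vc),
            angle(vb, vc), angle(va, vc), angle(va, vb))

# Vectors built from the PGM reference cell in an arbitrary orientation
# (a along x, b along y), purely to exercise the function.
beta = math.radians(120.5)
va = (96.2, 0.0, 0.0)
vb = (0.0, 85.8, 0.0)
vc = (81.9 * math.cos(beta), 0.0, 81.9 * math.sin(beta))
cell = cell_from_vectors(va, vb, vc)
print([round(p, 1) for p in cell])
```

Run on these constructed vectors, the function returns the PGM cell that was used to build them, which is only a check on the arithmetic and not a test on measured data.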

## YEAST PGM - SIMULATED AND REAL DATA

Two simulated data trials were carried out for a crystal of the enzyme Yeast Phosphoglycerate Mutase (PGM) (Monoclinic, C2, a=96.2 b=85.8 c=81.9 beta=120.5). In the first of these, the ideal three dimensional reciprocal lattice coordinates were calculated (using a modified version of ROTGEN) for the reflections which would occur on a two degree oscillation image. In the second case data were calculated for a 0.75 degree oscillation image again using a modified version of ROTGEN. In this case the reciprocal lattice points were calculated from the predicted spot positions on the detector thus including the unavoidable source of error due to the use of a finite oscillation angle. The procedure was then applied to the first real set of data, a 2.0 degree oscillation PGM image recorded on image plate (small MAR). All three sets corresponded to the same orientation of the crystal. A crystal to image distance of 250.0 mm was used for the simulated data and the experimental distance of 137.0 mm was used for the real data.

Using the jiffy programs described above, the Patterson peaks were analysed and in each case the unit cell vectors could be clearly seen. Other high peaks obviously resulted from vectors to the 'C' centre, across the cell faces etc. In this example the cell vectors are all of a similar magnitude and the vectors corresponding to multiples of the unit cell were outside the range examined. Such multiple cell length vectors were however clear in some of the other examples described below.

The sections of the Patterson map containing the peaks corresponding to the cell vectors are illustrated for the two sets of simulated data and the real data.

Figure 2 PGM Trials with Ideal and Real Data

For the real data, a series of runs were done making adjustments to the initial starting position for the centre of the primary beam. The best position for the beam, as judged by the closeness of the cell dimensions obtained to those of a reference set, was very close to the refined centre position from a run of MOSFLM. The results were encouraging in spite of the large oscillation range used and in spite of the fact that the 3.0 Angstrom data used occupied only the central section of an image recorded to 1.8 Angstroms resolution.

## EXAMPLE DATASETS USED

The following table gives a complete list of the protein crystals used in the indexing trials:

```
Table 1: Protein Crystals used for Auto-Indexing Tests

Code      Protein                            Space-group

PGM       Yeast Phosphoglycerate Mutase      C2
LYS       HEWL Tetragonal Lysozyme           P43212
PRI       Prismane protein                   P212121
PST       Pig Serum Transferrin              C2
NPL       Narcissus Pseudonarcissus Lectin   C222
INS       Insulin                            P212121
```
All data used were collected on the Daresbury SRS. The PGM images were collected by H.C. Watson and J.W. Campbell. The LYS, PRI and PST images were collected by E.M.H. Duke and the NPL and INS images were collected by P.J. Rizkallah and M.Z. Papiz.

## RESULTS FROM REAL EXAMPLES

The parameters used for the recording and processing of the test images (one from each dataset) are shown in the following table:

```
Table 2: Parameters used with example datasets

Sample  Detector  Lambda    ctof   Osc.   Resolution    No.    FFT   Peaks-in-FFT
type    type         (A)    (mm)  angle   Obs.  Used  spots   cell    rms  thresh

PGM     IP          1.00   137.0    2.0    1.8   3.0    278  384.0   4.24      30
LYS     IP          0.92   310.0    1.0    2.1   2.5    311  320.0   4.00      35
PRI     IP          1.70   270.0    1.5    3.4   3.5    232  448.0   4.77      50
PST     IP          0.92   310.0    1.0    2.1   2.5    206  320.0   4.93      50
NPL     CCD         0.87    90.0    1.0    1.4   2.5    162  320.0   5.56      30
INS     CCD         0.87    90.0    0.1    1.4   2.5    141  320.0   5.95      40

Note: Grid for FFT was 256 in all cases
```
Two figures show the Patterson peaks at the ends of the unit cell vectors. In each case a detail for the section (a 20x20 grid point box) is shown. The elongation of the peaks in the vertical direction on the plots corresponds to an elongation in the direction of the X-ray beam and is presumably due to the lower resolution of the recorded reciprocal lattice data in that direction.

Figure 3 Patterson Peaks from MAR Image-Plate Examples

Figure 4 Patterson Peaks from MAR CCD Examples

As might be expected, the weakest of the three unit cell vectors in the PGM example was the one lying approximately in the direction of the X-ray beam. The effect of sparser data in this direction and also the finite size of the oscillation angle are presumably factors affecting such peaks. The two degree oscillation angle is probably larger than desirable in any case but it is encouraging that the required vectors are nevertheless still well defined. The vector to the 'C' face centre is stronger than that of the 'a' vector and is also shown in the first figure.

In both the LYS and PRI cases, all the desired vectors showed up well.

The most disturbing result was in the case of the PST crystal (C face centred, monoclinic cell) where the 'a' axis vector could not be detected. On the other hand, the vector to the face centre gave a strong peak and the choice of this vector with the two other clearly identifiable cell vectors gave a valid primitive cell. The FIND_ORIENT program was used to determine the crystal orientation for this primitive cell and a simulation using ROTGEN gave a good match to the observed image. The procedure was repeated using 3.5 Angstrom data with a 256x256x256 grid and also 3.0 Angstrom data with the finer 512x512x512 grid. In both these cases, the 'a' vector could be seen.
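The relationship between the C-centred cell and this primitive cell can be checked from the reference values of a and b for PST given in Table 3. The sketch below places a along x and b along y, which is possible without knowing beta because a and b are perpendicular in a monoclinic cell with b unique. The choice of (a - b)/2 as the centring vector, rather than one of its equivalents, is an assumption made here to obtain the obtuse angle.

```python
import math

# Reference C-centred cell edges for PST from Table 3 (Angstroms).
a, b = 224.8, 45.2

va = (a, 0.0, 0.0)
vb = (0.0, b, 0.0)
centre = tuple((p - q) / 2.0 for p, q in zip(va, vb))    # (a - b)/2

# Length of the centring vector and its angle to the b axis.
length = math.sqrt(sum(c * c for c in centre))
cosang = sum(p * q for p, q in zip(centre, vb)) / (length * b)
angle = math.degrees(math.acos(cosang))
print(length, angle)
```

The length and angle obtained are close to the reference primitive cell values of 114.7 Angstroms and 101.3 degrees listed in Table 3.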

The NPL data collected on the MAR CCD system gave reasonable peaks for the cell vectors.

The INS data were collected in narrow phi slices of 0.1 degrees and, as in the other cases, a single image was used. In this case the cell dimensions were not supplied, only the information that the cell was orthorhombic, and the cell dimensions were nevertheless readily determined.

The results of the cell determinations are shown in the following table together with reference values for comparison.

```
Table 3: Determined and reference cell parameters.

Test-----Determined-cell-parameters-----  -----Reference-cell-parameters------
         a     b     c  alph  beta  gamm       a     b     c  alph  beta  gamm

PGM   97.1  85.7  82.7  90.5 121.4  89.0    96.2  85.8  81.9       120.5
LYS   79.5  80.6  38.3  86.4  87.7  89.8    79.6        38.5
PRI   65.0  65.9 154.8  90.1  90.5  89.7    65.1  65.7 155.2
PST     --  45.2  79.6  90.0    --    --   224.8  45.2  79.2
      45.2  79.6 116.6 106.6 101.3  90.0    45.2  79.3 114.7 104.9 101.3  89.9
NPL   71.8 101.1  38.3  89.2  86.3  90.9    73.1 100.9  37.0
INS   52.8  57.7  36.2  91.4  91.3  90.8    50.9  56.9  37.1

Note: The second PST row gives the primitive cell formed with the vector to the face centre
```

```
Table 4: Peak heights in the Patterson of cell vectors

Sample  -------Peak-Height--------
             a      b      c   ab-cen

PGM       35.0   55.0   62.5     73.6
LYS       61.3   75.8   57.7
PRI       62.0   85.9   59.8
PST         --   96.1   76.7     56.0
NPL       33.7   32.1   67.2
INS       40.0   60.5   76.6
```
It was generally observed that peaks with heights of 30 or more were found for the expected vectors. With the current refinement procedures for the individual 't' vectors, the examples show errors of up to about 3% in the lengths of the cell vectors found and up to 3.7 degrees in the cell angles.

## CONCLUSIONS

The results obtained would seem to confirm that the proposed method could be put to practical use. A customised program would need to be written incorporating the stages described, with particular emphasis on speeding up the FFT calculation step. The refinement of the individual potential cell vectors also needs further attention. Once suitable cell vectors have been selected, the subsequent stages, including refinement of the cell and its orientation, cell reduction etc., would be analogous to those used in already well established procedures.

As with other auto-indexing or orientation determining procedures, it is important to have good values for the position of the centre of the primary beam and for the crystal to image distance.

## REFERENCES

1. Clegg W., (1984) "Enhancements of the Auto-Indexing Method for Cell Determination in Four-Circle Diffractometry", J. Appl. Cryst. 17 334-336.

2. Kabsch W., (1988) "Automatic Indexing of Rotation Diffraction Patterns", J. Appl. Cryst. 21 67-71.

3. Kim S., (1989) "Auto-Indexing Oscillation Photographs", J. Appl. Cryst. 22 53-60.

4. Higashi T., (1990) "Auto-Indexing of Oscillation Images", J. Appl. Cryst. 23 253-257.

## ACKNOWLEDGEMENTS

My thanks are due to Phil Evans (MRC Cambridge) for drawing my attention to the subject and to my colleagues at the Daresbury Laboratory, in particular Liz Duke, Pierre Rizkallah and Miroslav Papiz, for making available some test images for the trials.