News of MOSFLM 6.0

Harry Powell, MRC-LMB, Cambridge.

The recent release of MOSFLM 6.0 contains a number of new features. Two new detector types have been added - the LIPS (Large Image Plate Scanner at ESRF) and SBC1 (Westbrook detector at APS).

The principal improvement is the inclusion of the Fourier auto-indexing routines written by Ingo Steller in the Rossmann group at Purdue University. These are a set of freely available programs written in ANSI C, using ANSI FORTRAN FFT routines from the National Center for Atmospheric Research, Colorado; they are also available in the DPS Data Processing Suite for the ADSC detector used at MacCHESS.

A few local modifications have been made to the routines to increase their utility. Chief among these is that reflections may be selected from multiple images, which is especially useful for weakly diffracting crystals or where the reciprocal lattice has a principal axis close to the X-ray beam. The incorporation of the routines gives MOSFLM a robust and reliable indexing mode, comparable in effectiveness to that used in Denzo.

For the first time we have included a protocol in the build for PCs running Linux. The program has been built successfully on a Pentium MMX 233MHz with 128Mb RAM under RedHat release 5.1-2 using the GNU C compiler gcc 2.7.2.3-14 and FORTRAN compiler egcs-g77 1.0.3a-14. MOSFLM builds in around 10 minutes on this system. This method can also be used on a PowerPC machine (e.g. Macintosh) running LinuxPPC Release 4.

Executables for the following platforms/operating systems are available (see below) on the LMB's ftp server.

Digital Alphas: Digital UNIX 4.0
SGIs:           Irix 5.3, Irix 6.2, Irix 6.4, Irix 6.5
Intel PC:       RedHat Linux 5.1
PPC Macintosh:  LinuxPPC Release 4

Alternatively, MOSFLM can be built locally against the CCP4 release 3.4 libraries: after extracting the directory structure, simply type "make" under the csh or tcsh shell. The HOSTTYPE environment variable is checked and the Makefiles are chosen accordingly. Irix 6.4 and 6.5 are accommodated by setting HOSTTYPE to an appropriate value.

All the builds require that the CLIB environment variable has been set to the directory containing the two libraries libccp4.a and lib_xdlview.a; a customized version of the code for lib_xdlview.a required for a build under Linux has been prepared by Joachim Meyer of EMBL.
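
The build prerequisites described above can be checked before typing "make". What follows is only a minimal sketch in Python and is not part of the MOSFLM distribution: the function name check_build_env is hypothetical, and it tests nothing beyond what the text states, namely that HOSTTYPE is set and that CLIB points to a directory holding libccp4.a and lib_xdlview.a.

```python
import os

# The two libraries the text says must be present in $CLIB.
REQUIRED_LIBS = ("libccp4.a", "lib_xdlview.a")


def check_build_env(env=None):
    """Return a list of problems that would likely stop a MOSFLM build.

    `env` is a mapping of environment variables (defaults to os.environ).
    This helper is hypothetical; it only mirrors the requirements in the prose.
    """
    env = os.environ if env is None else env
    problems = []

    # The Makefiles are chosen from HOSTTYPE, so it has to be set.
    if not env.get("HOSTTYPE"):
        problems.append("HOSTTYPE is not set")

    clib = env.get("CLIB")
    if not clib:
        problems.append("CLIB is not set")
    elif not os.path.isdir(clib):
        problems.append(f"CLIB is not a directory: {clib}")
    else:
        # Both libraries must be found inside the CLIB directory.
        for lib in REQUIRED_LIBS:
            if not os.path.isfile(os.path.join(clib, lib)):
                problems.append(f"{lib} not found in {clib}")
    return problems
```

Calling check_build_env() with no arguments inspects the current environment; an empty list means the stated prerequisites are met. It does not verify that the HOSTTYPE value is one the Makefiles recognise.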

MOSFLM can be built either with or without the existing autoindexing code REFIX as supplied by Wolfgang Kabsch. If the source code for REFIX is required, Andrew Leslie should be contacted directly.

The program is available via anonymous ftp from ftp.mrc-lmb.cam.ac.uk/pub/mosflm.

During the recent CCP4 Study Weekend in Sheffield, a number of people commented that MOSFLM seems to run more slowly than Denzo when integrating images. This prompted me to do a spot of benchmarking. I took two datasets collected from protein crystals produced in the LMB and ran comparative jobs for both programs on our Digital Alpha 'farm'. This is a bank of processors given over to batch processing, each running a single job at a time, so the timings of the programs should not be affected by time-sharing considerations. For each job, the images were copied onto a scratch disk local to the processor being used, to minimize the effect of network traffic.

Hardware: Digital Alphaservers, 500MHz processors, 128Mb main memory, 1.5Gb swap. In each case the batch job was submitted 10 times and each submission was run independently. The Denzo and MOSFLM command files were written to reflect what a reasonably experienced user of either program might use for initial processing of the datasets; no attempt was made to optimize the processing, hence the relatively poor agreement between the datasets.

The time taken to convert Denzo .x files to .mtz format has been ignored in the calculations, as it is possible that some users prefer not to use the CCP4 suite for further processing.

Post-processing for both datasets was performed using programs from the CCP4 suite.

Dataset 1

Hen egg-white lysozyme, space group P4₃2₁2, 45 images, Mar 30cm. Resolution range 2 - 60 Å, overall R(merg) between datasets on F 0.048, RF_I 0.062, Wted_R 0.053

Dataset 2

Hepatitis B capsid with ligand bound, space group C2, 60 images, Mar 30cm. Resolution range 6 - 60 Å, overall R(merg) between datasets on F 0.095, RF_I 0.129, Wted_R 0.070

             N(obs)  N(ref)  N(merg)  Rmeas  Complete  Mean     Max    Min
HEWL Denzo    42095   27826     8544  0.125    97.4%   2'16.5"  2'22"  2'14"
     MOSFLM   47956   28293     8554  0.117    98.6%   2'40.5"  2'43"  2'33"

HepB Denzo   168104   87028    61308  0.246   100%     6'57.2"  7'15"  6'44"
     MOSFLM  198425   87545    61589  0.256   100%     6'50.2"  6'59"  6'39"

The conclusion from this is that the two programs perform comparably as regards speed, and this should not be a deciding factor in choosing which program to use.
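
To put a number on "comparably", the mean times in the table can be turned into a MOSFLM/Denzo ratio for each dataset. The following is a small illustrative sketch in Python; the helper names to_seconds and mosflm_to_denzo_ratio are made up for this note, and only the mean times quoted in the table are used.

```python
def to_seconds(stamp):
    """Convert a time written as M'SS.s" (for example 2'16.5") to seconds."""
    minutes, seconds = stamp.rstrip('"').split("'")
    return int(minutes) * 60 + float(seconds)


# Mean integration times copied from the table above.
MEAN_TIMES = {
    "HEWL": {"Denzo": "2'16.5\"", "MOSFLM": "2'40.5\""},
    "HepB": {"Denzo": "6'57.2\"", "MOSFLM": "6'50.2\""},
}


def mosflm_to_denzo_ratio(dataset):
    """Mean MOSFLM time divided by mean Denzo time for one dataset."""
    times = MEAN_TIMES[dataset]
    return to_seconds(times["MOSFLM"]) / to_seconds(times["Denzo"])


for name in MEAN_TIMES:
    print(f"{name}: MOSFLM/Denzo = {mosflm_to_denzo_ratio(name):.2f}")
```

The ratios come out at about 1.18 for the lysozyme data and about 0.98 for the hepatitis B data, i.e. MOSFLM is somewhat slower in one case and marginally faster in the other.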

It has been suggested that another explanation for the observation is that MOSFLM might run more slowly on machines with limited RAM due to excessive paging, as it keeps two images in memory simultaneously as opposed to one for Denzo. This seems unlikely to provide a complete answer as the following test indicates. I ran tests on an Intel PC (Pentium MMX, 233MHz) running Linux with different amounts of RAM, all frames on an internal hard disk.

Using the lysozyme dataset above, I obtained the following times for complete integration, again for 10 jobs each:

128Mb: mean 9'02.3",  min 8'58",  max 9'05"
32Mb:  mean 10'44.4", min 10'39", max 10'47"
24Mb:  mean 11'19.5", min 11'16", max 11'27"

MOSFLM would not run on a PC with only 16Mb of RAM. Obviously it runs more slowly with less memory, but the difference is not as great as the 5 - 10 times slower that had been mentioned at the Sheffield CCP4 workshop.
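
The size of the slowdown can be worked out directly from the mean times listed above. This is a rough sketch in Python, with a hypothetical helper named slowdown; it simply divides each mean by the 128Mb mean and makes no allowance for the spread between runs.

```python
# Mean integration times for the lysozyme dataset as (minutes, seconds),
# keyed by installed RAM in Mb; values copied from the list above.
MEAN_BY_RAM = {128: (9, 2.3), 32: (10, 44.4), 24: (11, 19.5)}


def slowdown(ram_mb, baseline_mb=128):
    """Mean run time at `ram_mb` divided by the mean at `baseline_mb`."""
    def secs(entry):
        minutes, seconds = entry
        return minutes * 60 + seconds

    return secs(MEAN_BY_RAM[ram_mb]) / secs(MEAN_BY_RAM[baseline_mb])


for ram in sorted(MEAN_BY_RAM, reverse=True):
    print(f"{ram:>3}Mb: {slowdown(ram):.2f}x the 128Mb time")
```

On these figures the 32Mb machine takes about 1.19 times as long as the 128Mb one and the 24Mb machine about 1.25 times as long, well short of a factor of 5.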

Harry Powell
