Report from the ISGO Workshop: "Automation of X-ray Structure Determination for Structural Genomics"
October 8-9th 2002, BESSY
Organisers: Victor Lamzin, Thomas Terwilliger and Uwe Mueller

The workshop aimed to cover all aspects from automated expression through to data collection and structure determination. This report is distilled from my handwritten notes; if anyone wants to see the programme, participant list or abstract book, or wants more detail about a particular presentation, then please let me know - PJB.

Glossary:
HTP = high throughput
MM = macromolecular, macromolecule(s)
SB = structural biology
SG = structural genomics
db = database

Tuesday 8th October

George DeTitta (Buffalo) described the technologies used in their HTP lab for MM crystallisation (see http://www.hwi.buffalo.edu): an automatic delivery system using the microbatch-under-oil method, plus automated photography and processing of the results, with a MySQL database to store them (>10 million images). The service is available to anyone who wants to submit samples: it started in Feb 2000 and currently has over 200 collaborators, ~70% of whom are doing "standard" SB.

Ray Stevens (Syrrx Inc) talked about the public-private collaboration between JCSG and Syrrx for developing HTP expression and crystallisation technologies. They use baculovirus/mammalian systems for expression of eukaryotes, and nanovolume hanging-drop crystallisations with an Oracle db to store images; 95% of xtals are taken directly to the diffrn expt. This has been applied to "JCSG Proteome 1" (Thermotoga), started Aug 2001, with the aim of expressing the whole genome (results so far: PNAS 99 (18) 11664 (2002)).

Session on "Synchrotron Beamlines and Experimental Station Automation": representatives from various synchrotrons (Thomas Earnest ALS, Peter Kuhn SSRL, Ed Mitchell ESRF, Uwe Mueller PSF, Masaki Yamamoto RIKEN). Many common threads:

1. The overall philosophy is to ultimately automate the entire procedure ("xtals in, structures out"), starting by automating the easiest parts.

2. Current status: everyone now has automatic robotic sample changers - the advantage is that all the xtals in the dewar can initially be screened rapidly, and the "best" xtals (as determined by the user) can be scheduled to be shot first. There appears to be no standardisation across synchrotrons for mounting systems or sample tracking (e.g. some use barcodes, SSRL uses "code pins").

3. A number of projects are looking at automatic xtal characterisation under software control, e.g. DNA (ESRF), ALS with Paul Adams (they are using MOSFLM but would prefer to use DENZO), and SSRL using MOSFLM.

One interesting point (Peter Kuhn): at Stanford they did a process/error analysis and found 69 steps from expression to xtal mounting. Even if each step is 99% error-free, only 50% of the final proteins will be the ones you think you have (0.99^69 ~ 0.5; a one-line check appears at the end of these Tuesday notes). So automation is a good idea!

Automated model-building and first map interpretation:

Dusan Turk talked about his MAIN 2000 program; the philosophy is to provide automated tools to assist users in building structures (density modification, building/manipulation, semi-automated rebuilding tools).

Tassos Perrakis talked about "user-free" procedures for model-building and refinement via ARP/wARP: start from native diffrn data plus some initial phases (2.5A expt data; phase extension methods mean that the resoln and quality of the phases are largely irrelevant). Final models are typically 90-95% complete and fairly accurate (rmsd 0.2A, essentially error-free). The CCP4i interface requires only an input file (data + phases) and the total number of residues (used for estimating the solvent content). Also "guiSIDE": side-chain docking to sequence, fitting the best rotamers from Richardson's db. ARP/wARP is also an experiment in using as many publicly available external software libs as possible (e.g. Clipper), for rapid development of new applications with a reduced maintenance burden on the program developers.
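A side note of mine (not from the talk): the residue count is presumably converted into a solvent-content estimate via the standard Matthews-coefficient calculation. A minimal sketch in Python, assuming ~110 Da per residue and the usual ~1.23 A^3/Da protein specific volume (my own illustration, not the actual ARP/wARP code):

    # Rough solvent-content estimate from residue count (Matthews 1968).
    def solvent_fraction(cell_volume_A3, n_mol_per_cell, n_residues):
        mw = 110.0 * n_residues                      # ~110 Da per residue
        vm = cell_volume_A3 / (n_mol_per_cell * mw)  # Matthews coeff, A^3/Da
        return 1.0 - 1.23 / vm                       # ~1.23 A^3/Da of protein

    # e.g. a 200-residue protein, 4 molecules in a 210,000 A^3 cell:
    print(round(solvent_fraction(2.1e5, 4, 200), 2))  # -> 0.48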
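And the one-line check of Peter Kuhn's 69-step arithmetic promised above (just the compound probability, assuming independent steps):

    # Chance that all 69 steps succeed if each is 99% reliable:
    print(f"{0.99 ** 69:.2f}")  # -> 0.50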
Wednesday 9th October

Tom Ioerger (Texas A&M) talked about TEXTAL (automated model building from an electron density map using pattern recognition). The procedure is: assign C-alphas (CAPRA - a neural net predicts their positions from the density), link them up to make a C-alpha chain, then assign the side-chains (LOOKUP). There are various protocols, e.g. combining this with a refinement step. It currently works best with 2.8A data; there are problems with tightly bound cofactors. (See http://textal.tamu.edu:12321 - users can try TEXTAL via a web server.)

Tom Terwilliger talked about RESOLVE (model building at moderate resoln), which uses a library of protein fragments derived from solved structures. The procedure is: FFT-based identification of helices and strands (FFFEAR-like); extension with tripeptide libraries; linking the fragments together with probabilistic identification of side-chains from rotamer templates; probabilistic sequence alignment; and a "molecular assembly" step (make the most compact assembly which obeys NCS etc). This can be combined with refinement using REFMAC5 in an iterative procedure (a schematic is sketched below, after the phasing talks). He also mentioned "iterative model-rebuilding" as a method for removing model bias, e.g. if you start with a model from Molecular Replacement. (NB both TEXTAL and Tom's work are part of the PHENIX project.)

Sjors Scheres (Bijvoet Center for Biomolecular Research) talked about "conditional optimisation" as an alternative formalism for protein structure refinement. The goal is to enlarge the radius of convergence of refinement by finding alternative formulations of prior information for "unlabelled" atoms (pairs = bonds, triples = joints, quadruples = peptide planes etc). The optimum number of layers (i.e. atoms) is 9. He concluded that conditional optimisation combines flexible searching behaviour with the incorporation of extensive prior information at 2A; for simplified cases it can refine a structure from a random atom distribution. (The work has been published in Acta Cryst 2001 - I didn't get the full reference.)

Advances in Phasing, Experimental Techniques and Software:

Zbigniew Dauter (NSLS) talked about efficient SAD phasing (the "one and a half wavelength method"). He reviewed the history of SAD and argued that the method has great potential (especially with the advent of better software for phasing and autobuilding), particularly if you have accurate data (in this context accuracy and redundancy matter more than high resoln), since it is very fast (important for HTP projects).

Manfred Weiss (EMBL Hamburg) and Bi-Cheng Wang (University of Georgia) each also talked about the use of SAD (also referred to as "single-wavelength anomalous signal", or SAS, since it is actually an atomic property) for "direct crystallography", i.e. using the anomalous signal from native crystals to solve the structure. This should be possible because elements such as sulphur and phosphorus are common in proteins, and ~30% of proteins are metalloproteins. Manfred talked about the practical issues (e.g. the problems associated with using soft X-rays) but concluded that the soln of the "average protein" (>35kDa?) from native xtals is now within reach. B-C looked ahead to how advances in software and hardware technologies and beamline automation could help to make this technique a standard procedure.
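A back-of-the-envelope note of mine (not from the talks): the feasibility of such native-SAD experiments rests on the expected Bijvoet ratio, for which the standard Hendrickson-style estimate is <|dF|>/<F> ~ sqrt(N_A/2N_P) x (2 f''_A/Z_eff), with Z_eff ~ 6.7 e for the "average" non-H protein atom. In Python:

    from math import sqrt

    def bijvoet_ratio(n_anom, f_dprime, n_protein_atoms, z_eff=6.7):
        # Expected <|dF|>/<F> from n_anom anomalous scatterers among
        # n_protein_atoms non-H protein atoms (rule-of-thumb only).
        return sqrt(n_anom / (2.0 * n_protein_atoms)) * 2.0 * f_dprime / z_eff

    # e.g. a ~35 kDa protein: ~320 residues, ~2500 non-H atoms,
    # ~10 sulphurs with f''(S) ~ 0.56 e at Cu K-alpha:
    print(f"{bijvoet_ratio(10, 0.56, 2500):.1%}")  # -> 0.7%

A signal of well under 1% is why the accuracy and redundancy of the data matter so much here.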
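And the promised schematic of Tom Terwilliger's iterative build-and-refine procedure. This sketches only the control flow; the two run_* helpers are hypothetical placeholders, not the real RESOLVE/REFMAC5 interfaces:

    # Schematic only: alternate model-building with refinement so that
    # each rebuild sees improved phases.
    def run_resolve_build(data, phases, start_model=None):
        raise NotImplementedError("wrap a real RESOLVE building job here")

    def run_refmac(data, model):
        raise NotImplementedError("wrap a real REFMAC5 refinement job here")

    def iterative_build(data, phases, n_cycles=5):
        model = None
        for _ in range(n_cycles):
            # Rebuild into the current map; rebuilding from scratch each
            # cycle is also what removes model bias, e.g. after MR.
            model = run_resolve_build(data, phases, start_model=model)
            # Refine the rebuilt model and recompute improved phases.
            model, phases = run_refmac(data, model)
        return model, phases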
Wladek Minor (University of Virginia) talked about an integrated system for xtallographic data collection and analysis at the X9B beamline at the NSLS. Essentially this is HKL2000 extended to control the beamline equipment; it uses a client-server architecture and allows data collection to be done over the internet.

Richard Morris (Global Phasing) talked about Bayesian molecular replacement. (An aside: presumably, as the number of known structures increases, MR will become the method of choice for HTP?) MR is a global optimisation problem; it is difficult to rank the solns accurately, and often solns cannot be refined (the radius of convergence is too small). Bricogne's "generalised MR" method (1997) uses a full Bayesian statistical treatment and results in a smooth transition from the "Patterson regime" to the "Fourier regime" (i.e. from "no phase info" to "some phase info"), plus other advantages: e.g. the packing function generated for a particular rotation soln gives "prior knowledge" in the sense that it limits the possible translation solns. He also mentioned an object-oriented Java/C/Fortran library, the "Bayesian Analysis and Log Likelihood System" (or "BALLS"!), developed by Clemens Vonrhein at Global Phasing; it currently includes tools for rotational sampling, calculation of the packing function, log likelihood, Bayesian updates etc. It is possible that the API will be made publicly available.

Paul Adams spoke about the PHENIX project (see http://www.phenix-online.org). They expect to release a beta test version at the start of 2003 (it will include heavy atom search, SOLVE/RESOLVE and TEXTAL). They are currently working with the PDB to streamline the deposition procedure, and hope to have funding to incorporate single-xtal electron microscopy.

Peter Briggs
29/10/2002