Report from the ISGO Workshop: "Automation of X-ray Structure Determination for Structural Genomics"
October 8-9th 2002, BESSY
Organisers: Victor Lamzin, Thomas Terwilliger and Uwe Mueller

The workshop aimed to cover all aspects from automated expression through to data collection and structure determination. This report is distilled from my handwritten notes; if anyone wants to see the programme, participant list or abstract book, or wants more detail about a particular presentation, then please let me know - PJB.

Glossary:
HTP = high throughput
MM = macromolecular, macromolecule(s)
SB = structural biology
SG = structural genomics
db = database

Tuesday 8th October

George DeTitta (Buffalo) described the technologies used in their HTP lab for MM crystallisation (see http://www.hwi.buffalo.edu): an automatic delivery system using the microbatch-under-oil method, plus automated photography and processing of the results, with a MySQL database to store them (>10 million images). The service is available to anyone who wants to submit samples: it started in Feb 2000 and currently has over 200 collaborators, ~70% of whom are doing "standard" SB.

Ray Stevens (Syrrx Inc) talked about the public-private collaboration between JCSG and Syrrx for developing HTP expression and crystallisation technologies. They use baculovirus/mammalian systems for expression of eukaryotes, and nanovolume hanging-drop crystallisations with an Oracle db to store images; 95% of xtals are taken directly to the diffrn expt. This has been applied to "JCSG Proteome 1" (Thermotoga), started Aug 2001, with the aim of expressing the whole genome (results so far: PNAS 99 (18) 11664 (2002)).

Session on "Synchrotron Beamlines and Experimental Station Automation": representatives from various synchrotrons (Thomas Earnest ALS, Peter Kuhn SSRL, Ed Mitchell ESRF, Uwe Mueller PSF, Masaki Yamamoto RIKEN). Many common threads:

1. The overall philosophy is to ultimately automate the entire procedure ("xtals in, structures out"), starting by automating the easiest parts.

2. Current status: everyone now has automatic robotic sample changers - the advantage is that all the xtals in the dewar can initially be screened rapidly, and the "best" xtals (as determined by the user) can be scheduled to be shot first. There appears to be no standardisation across synchrotrons for mounting systems or sample tracking (e.g. some use barcodes, SSRL uses "code pins").

3. A number of projects are looking at automatic xtal characterisation under software control, e.g. DNA (ESRF), ALS with Paul Adams (they are using MOSFLM but would prefer to use DENZO), and SSRL using MOSFLM.

One interesting point (Peter Kuhn): at Stanford they did a process/error analysis and found 69 steps from expression to xtal mounting. Even if each step is 99% error-free, only 50% of the final proteins will be the ones you think you have (0.99^69 ~ 0.5; a one-line check appears at the end of these Tuesday notes). So automation is a good idea!

Automated model-building and first map interpretation:

Dusan Turk talked about his MAIN 2000 program; the philosophy is to provide automated tools to assist users in building structures (density modification, building/manipulation, semi-automated rebuilding tools).

Tassos Perrakis talked about "user-free" procedures for model-building and refinement via ARP/wARP: start from native diffrn data plus some initial phases (2.5A expt data; phase extension methods mean that the resoln and quality of the phases are largely irrelevant). Final models are typically 90-95% complete and fairly accurate (rmsd 0.2A, essentially error-free). The CCP4i interface requires only an input file (data + phases) and the total number of residues (used for estimating the solvent content). Also "guiSIDE": side-chain docking to sequence, fitting the best rotamers from Richardson's db. ARP/wARP is also an experiment in using as many publicly available external software libs as possible (e.g. Clipper), for rapid development of new applications with a reduced maintenance burden on the program developers.
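A side note of mine (not from the talk): the residue count is presumably converted into a solvent-content estimate via the standard Matthews-coefficient calculation. A minimal sketch in Python, assuming ~110 Da per residue and the usual ~1.23 A^3/Da protein specific volume (my own illustration, not the actual ARP/wARP code):

    # Rough solvent-content estimate from residue count (Matthews 1968).
    def solvent_fraction(cell_volume_A3, n_mol_per_cell, n_residues):
        mw = 110.0 * n_residues                      # ~110 Da per residue
        vm = cell_volume_A3 / (n_mol_per_cell * mw)  # Matthews coeff, A^3/Da
        return 1.0 - 1.23 / vm                       # ~1.23 A^3/Da of protein

    # e.g. a 200-residue protein, 4 molecules in a 210,000 A^3 cell:
    print(round(solvent_fraction(2.1e5, 4, 200), 2))  # -> 0.48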
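And the one-line check of Peter Kuhn's 69-step arithmetic promised above (just the compound probability, assuming independent steps):

    # Chance that all 69 steps succeed if each is 99% reliable:
    print(f"{0.99 ** 69:.2f}")  # -> 0.50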
Wednesday 9th October

Tom Ioerger (Texas A&M) talked about TEXTAL (automated model building from an electron density map using pattern recognition). The procedure is: assign C-alphas (CAPRA - a neural net predicts their positions from the density), link them up to make a C-alpha chain, then assign the side-chains (LOOKUP). There are various protocols, e.g. combining this with a refinement step. It currently works best with 2.8A data; there are problems with tightly bound cofactors. (See http://textal.tamu.edu:12321 - users can try TEXTAL via a web server.)

Tom Terwilliger talked about RESOLVE (model building at moderate resoln), which uses a library of protein fragments derived from solved structures. The procedure is: FFT-based identification of helices and strands (FFFEAR-like); extension with tripeptide libraries; linking the fragments together with probabilistic identification of side-chains from rotamer templates; probabilistic sequence alignment; and a "molecular assembly" step (make the most compact assembly which obeys NCS etc). This can be combined with refinement using REFMAC5 in an iterative procedure (a schematic is sketched below, after the phasing talks). He also mentioned "iterative model-rebuilding" as a method for removing model bias, e.g. if you start with a model from Molecular Replacement. (NB both TEXTAL and Tom's work are part of the PHENIX project.)

Sjors Scheres (Bijvoet Center for Biomolecular Research) talked about "conditional optimisation" as an alternative formalism for protein structure refinement. The goal is to enlarge the radius of convergence of refinement by finding alternative formulations of prior information for "unlabelled" atoms (pairs = bonds, triples = joints, quadruples = peptide planes etc). The optimum number of layers (i.e. atoms) is 9. He concluded that conditional optimisation combines flexible searching behaviour with the incorporation of extensive prior information at 2A; for simplified cases it can refine a structure from a random atom distribution. (The work has been published in Acta Cryst 2001 - I didn't get the full reference.)

Advances in Phasing, Experimental Techniques and Software:

Zbigniew Dauter (NSLS) talked about efficient SAD phasing (the "one and a half wavelength method"). He reviewed the history of SAD and argued that the method has great potential (especially with the advent of better software for phasing and autobuilding), particularly if you have accurate data (in this context accuracy and redundancy matter more than high resoln), since it is very fast (important for HTP projects).

Manfred Weiss (EMBL Hamburg) and Bi-Cheng Wang (University of Georgia) each also talked about the use of SAD (also referred to as "single-wavelength anomalous signal", or SAS, since it is actually an atomic property) for "direct crystallography", i.e. using the anomalous signal from native crystals to solve the structure. This should be possible because elements such as sulphur and phosphorus are common in proteins, and ~30% of proteins are metalloproteins. Manfred talked about the practical issues (e.g. the problems associated with using soft X-rays) but concluded that the soln of the "average protein" (>35kDa?) from native xtals is now within reach. B-C looked ahead to how advances in software and hardware technologies and beamline automation could help to make this technique a standard procedure.
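A back-of-the-envelope note of mine (not from the talks): the feasibility of such native-SAD experiments rests on the expected Bijvoet ratio, for which the standard Hendrickson-style estimate is <|dF|>/<F> ~ sqrt(N_A/2N_P) x (2 f''_A/Z_eff), with Z_eff ~ 6.7 e for the "average" non-H protein atom. In Python:

    from math import sqrt

    def bijvoet_ratio(n_anom, f_dprime, n_protein_atoms, z_eff=6.7):
        # Expected <|dF|>/<F> from n_anom anomalous scatterers among
        # n_protein_atoms non-H protein atoms (rule-of-thumb only).
        return sqrt(n_anom / (2.0 * n_protein_atoms)) * 2.0 * f_dprime / z_eff

    # e.g. a ~35 kDa protein: ~320 residues, ~2500 non-H atoms,
    # ~10 sulphurs with f''(S) ~ 0.56 e at Cu K-alpha:
    print(f"{bijvoet_ratio(10, 0.56, 2500):.1%}")  # -> 0.7%

A signal of well under 1% is why the accuracy and redundancy of the data matter so much here.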
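And the promised schematic of Tom Terwilliger's iterative build-and-refine procedure. This sketches only the control flow; the two run_* helpers are hypothetical placeholders, not the real RESOLVE/REFMAC5 interfaces:

    # Schematic only: alternate model-building with refinement so that
    # each rebuild sees improved phases.
    def run_resolve_build(data, phases, start_model=None):
        raise NotImplementedError("wrap a real RESOLVE building job here")

    def run_refmac(data, model):
        raise NotImplementedError("wrap a real REFMAC5 refinement job here")

    def iterative_build(data, phases, n_cycles=5):
        model = None
        for _ in range(n_cycles):
            # Rebuild into the current map; rebuilding from scratch each
            # cycle is also what removes model bias, e.g. after MR.
            model = run_resolve_build(data, phases, start_model=model)
            # Refine the rebuilt model and recompute improved phases.
            model, phases = run_refmac(data, model)
        return model, phases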
Wladek Minor (University of Virginia) talked about an integrated system for xtallographic data collection and analysis at the X9B beamline at the NSLS. Essentially this is HKL2000 extended to control the beamline equipment; it uses a client-server architecture and allows data collection to be done over the internet.

Richard Morris (Global Phasing) talked about Bayesian molecular replacement. (An aside: presumably, as the number of known structures increases, MR will become the method of choice for HTP?) MR is a global optimisation problem; it is difficult to rank the solns accurately, and often solns cannot be refined (the radius of convergence is too small). Bricogne's "generalised MR" method (1997) uses a full Bayesian statistical treatment and results in a smooth transition from the "Patterson regime" to the "Fourier regime" (i.e. from "no phase info" to "some phase info"), plus other advantages: e.g. the packing function generated for a particular rotation soln gives "prior knowledge" in the sense that it limits the possible translation solns. He also mentioned an object-oriented Java/C/Fortran library, the "Bayesian Analysis and Log Likelihood System" (or "BALLS"!), developed by Clemens Vonrhein at Global Phasing; it currently includes tools for rotational sampling, calculation of the packing function, log likelihood, Bayesian updates etc. It is possible that the API will be made publicly available.

Paul Adams spoke about the PHENIX project (see http://www.phenix-online.org). They expect to release a beta test version at the start of 2003 (it will include heavy atom search, SOLVE/RESOLVE and TEXTAL). They are currently working with the PDB to streamline the deposition procedure, and hope to have funding to incorporate single-xtal electron microscopy.

Peter Briggs
29/10/2002