Report from the ISGO International Conference on Structural Genomics (ICSG) 2002
October 10-13th 2002, Berlin, Germany

NB: This report is distilled from my handwritten notes. If anyone wants to see the programme, participant list or abstract book, or wants more detail about a particular presentation, then please let me know. If anyone is interested in some or all of my full notes for the whole workshop + conference (24 sides of A4!) then please let me know - PJB.

Thursday 10th October

Robert Huber opened the conference with a talk about proteasomes (whose functions include removal of denatured proteins and degradation of short-lived signalling molecules). These have become a target for drug design but are large and complex. He described attempts to understand the functional mechanism based on an understanding of structure.

Chris Dobson talked about the problems of understanding the mechanisms behind protein folding. Molecules are able to fold accurately and quickly in a packed cellular environment, and correct folding is essential to biological function. Large complex molecules appear to be assembled from smaller pre-folded molecules. Misfolding is the cause of a variety of diseases: a particular example being protein amyloid diseases (e.g. Parkinson's disease), where misfolded proteins aggregate and form amyloid fibres. He suggests that proteins can be thought of as "evolved polymers", and that certain sequences are more prone to misfolding and aggregation than others. Since often only a few key residues are responsible for misfolding, it may be possible to design sequences that preserve function whilst being more resistant to these problems.

In the session on "Protein Production for Structural Genomics", Dave Stuart talked about the development of the Oxford Protein Production Facility, a 3yr MRC-funded project to establish HTP protein production and crystallisation, which is focused on biomedically significant targets (cancer, immune cell proteome, Herpes virus).

Naomi Chayen talked about methods for HTP xtallisation (drop under oil in microbatch arrays). She suggests that "containerless" methods (e.g. a drop suspended between two layers of oil) might be suitable for xtallising membrane proteins.

Geoffrey Waldo (Los Alamos) talked about engineering proteins for improved solubility/crystallisability (he points out that 40% of proteins in the TB genome are insoluble). He creates crystallisable constructs via "directed evolution": generate mutants and screen for improved solubility etc, then make further mutants from the best of those, with the screening done using GFP as a reporter (a toy sketch of this loop follows below). It wasn't clear whether function was significantly impaired at the same time as improving crystallisability.
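As an aside, the mutate-screen-select loop Waldo described can be caricatured in a few lines of Python. Everything here is invented for illustration - the starting sequence, the mutation model, and the solubility_score function standing in for the GFP-fluorescence readout - so this is a sketch of the general idea, not Waldo's actual protocol:

    import random

    random.seed(0)
    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    def solubility_score(seq):
        # invented stand-in for the GFP-reporter readout: simply
        # reward a lower proportion of strongly hydrophobic residues
        hydrophobic = sum(seq.count(aa) for aa in "FILVWY")
        return 1.0 - hydrophobic / len(seq)

    def mutate(seq, n_mutations=1):
        # introduce random point mutations into the construct
        seq = list(seq)
        for _ in range(n_mutations):
            pos = random.randrange(len(seq))
            seq[pos] = random.choice(AMINO_ACIDS)
        return "".join(seq)

    parent = "MFLIVKWYSTRADEGHKNPQ"   # invented starting construct
    for generation in range(10):
        # screen a library of mutants; the best becomes the next parent
        library = [mutate(parent) for _ in range(50)]
        best = max(library, key=solubility_score)
        if solubility_score(best) > solubility_score(parent):
            parent = best
        print(f"gen {generation}: score {solubility_score(parent):.2f}")

The point of the loop is that each round only needs a cheap screen (the reporter), not a full functional assay - which is also why the question of whether function survives the process remained open.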
Friday 11th October

Kurt Wuthrich's plenary gave an overview of the history of determining 3D NMR solution structures of biological macromolecules. He pointed out that NMR can study proteins in solution, e.g. in vivo fluids, and that it has a valuable role to play as a complementary method to xtallography: e.g. NMR studies of conformational states of individual molecules in supramolecular structures, like membranes.

The following session examined some of the challenges of studying membrane proteins, particularly using xtallography (apologies, from my notes I am unsure whether these are general problems or specific to particular proteins), for example: lipid co-crystallisation (difficult to remove), and xtal packing that can be discontinuous or anisotropic. Different ways of expressing these proteins have (dis)advantages: in membranes (correctly folded but often toxic to the host, and difficult to preserve the fold whilst purifying); as inclusion bodies (partially folded or unfolded, so difficult to refold, but can give high yields); in helical arrays (rarely observed but useful for EM).

Chris Sander's plenary gave an overview of the "Challenge of Structural Genomics". His vision is to solve the 3D structure of "all" proteins, to use for classification, understanding molecular evolution, drug design and so on. He believes we only need to experimentally determine a subset of proteins sufficient to cover "protein space"; the rest can then be determined by homology modelling (he estimates 4000 structures are required to cover 90% of Pfam). One can then perform an evolutionary classification of all proteins by structure and function (it is possible to have structural divergence and functional convergence). Another challenge is the study of "functional complexes", i.e. protein-protein interactions, and the study of all gene products in 3D (not just proteins, e.g. also functional RNA).

In the following session on "Structural Genomics and Disease", Wim Hol talked about the SGPP project ("Structural Genomics of Pathogenic Protozoa") and the issues associated with using structural genomics approaches for fighting disease. He argued that there are great problems with predicting active sites from structural/functional similarity, and pointed out that new drugs need to be delivered in pairs, to reduce the evolution of drug resistance.

Eric Adam talked about the platform being developed by Syrrx to go rapidly from gene to structure to drug (in one case in just 4 months). He suggested that the availability of structures leads to a chemically diverse set of potential drugs.

Jean-Paul Renaud's plenary focused on "Characterising Orphan Nuclear Receptors through Structural Studies". "Orphan receptors" are those for which no ligand is known; basically they want to find the ligands. The problem, as before, is low solubility and/or low crystallisability of the target proteins, often due to the absence of the ligand.

Sung-Hou Kim talked about the structural genomics ideal of going from structure to function. Over 60 genomes have been sequenced, containing 10^3 to 10^5 genes each, and the functions of over 50% of these genes are either unknown or uninferable from the sequence; furthermore, 8-20% of functional annotations are estimated to be in error. He concludes that sequence alone is insufficient for functional inference, and talked about the idea of "functional genomics": molecular function (chemistry and physics), and cellular function (i.e. pathways of molecular functions).

Aled Edwards (Toronto) argued that structural proteomics efforts purify ten times more soluble protein than is needed for structure determination via xtallography, and that this excess could be used in rapid biochemical screens testing for function.

The session on "NMR Methods for Structural Genomics" contained talks on automating the process of structure determination from NMR data. The current bottleneck is that assignments of resonances and NOEs have to be done manually. Miguel Llinas (Pittsburgh) talked about CLOUDS, which performs assignments automatically and generates proton densities, analogous to xtallographic maps, from which protein structures can be determined. The geometry and overall quality of CLOUDS structures are comparable with previously manually-determined structures.
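To make the restraint-driven step concrete: at its simplest, NMR structure calculation turns a set of inter-proton distance restraints (e.g. NOE-derived upper bounds) into coordinates that satisfy them. The fragment below is a deliberately naive illustration of that idea, with invented restraints and a crude iterative correction scheme - real programs (CLOUDS, and CYANA described next) use far more sophisticated algorithms:

    import numpy as np

    rng = np.random.default_rng(0)

    # invented NOE-style restraints: (atom_i, atom_j, upper bound in A)
    restraints = [(0, 1, 4.0), (1, 2, 4.0), (2, 3, 4.0), (0, 3, 6.0)]
    n_atoms = 4
    coords = rng.uniform(-10.0, 10.0, size=(n_atoms, 3))  # random start

    for step in range(2000):
        for i, j, upper in restraints:
            d_vec = coords[j] - coords[i]
            d = np.linalg.norm(d_vec)
            if d > upper:
                # restraint violated: pull the pair together slightly
                shift = 0.1 * (d - upper) * d_vec / d
                coords[i] += shift
                coords[j] -= shift

    # report residual violations after refinement
    for i, j, upper in restraints:
        d = np.linalg.norm(coords[j] - coords[i])
        print(f"{i}-{j}: {d:.2f} A (upper bound {upper} A)")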
Peter Guntert has alternative software, CYANA, which automates two of the stages of conventional NMR structure solution (collection of conformational restraints and structure calculation) - see http://www.guentert.com. Other talks in the same session concentrated on NMR as a powerful complementary technique to xtallography.

Saturday 12th October

Janet Thornton's plenary covered tools to aid "functional annotation" - determination of biochemical and biological function and various other properties (e.g. location of the active site, which ligands bind to it) from the protein structure. She is developing a program called ProFunc (with Roman Laskowski and James Watson) to do this automatically. There are already many programs available for automatic identification of protein family from structure comparison (SAP, SSM, DALI), but membership of the same superfamily doesn't necessarily imply the same function. Also, below 35% sequence identity the function is almost certain to differ (e.g. binding to a different substrate). ProFunc uses 3D templates (Andrew Wallace et al, 1997) to search new structures for possible binding sites, and other methods to determine multimeric states, biological interfaces etc. She argues that "data integration" is required to pull together information from different sources so it can be used for functional annotation.

In the "Bioinformatics: Protein Space" session, Sung-Hou Kim talked about the idea of a "molecular function-fold family dictionary" (like a periodic table of protein fold domains). He introduced the concept of "structural distance" as a measure of pairwise structural dissimilarity; performing various analyses, he concludes that structures are clustered in particular regions of 3D fold-space.

Lisa Holm talked about a new program, "MaxFlow", for performing structure/sequence alignments. It uses a method called "transitive alignment", which moves between closely related sequences as stepping stones to reach more distant homologues - this method seems able to generate accurate alignments across large evolutionary distances.

Allampura Babu (PSF Berlin) talked about a "web book" (essentially a web-based LIMS) for handling structural genomics data. It has a dictionary of data items based on a PDB initiative. The basic architecture consists of a web server (Apache) and a database server (MySQL), with scripting done in Python; Netscape was used as the web browser and development was done on a Linux platform - so all components are freely available.

In the second bioinformatics session, Olivier Lichtarge (Houston) introduced the idea of "evolution as a computational filter", to try to identify functional sites in protein structures via comparisons of sequences. The concept is that every branch point in the evolutionary tree is like a virtual screen - so look for changes in trace residues which alter the function, as these point to changes in the binding sites (clusters of such residues in the structure are more likely to overlap with real functional sites - I guess you also need the structure in order to determine the clustering).

Kengo Kinoshita (Yokohama) introduced eF-site (= electrostatic-surface of Functional SITE, see http://www.pdbj.org/eF-site), a method for identifying functional sites. It is similar to e.g. DOCK in that it uses graph theory and searches against a database - except that in this case it compares a site against other sites (rather than against possible ligands). It was able to predict the function of a hypothetical protein (in spite of the presence of a new fold), later confirmed directly by expt. A toy illustration of this kind of graph-based site comparison follows below.
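By way of illustration, the graph-matching idea - treating two functional sites as labelled graphs and looking for the largest distance-consistent correspondence between them - can be sketched as follows. The pseudo-sites, property labels and tolerance are all invented for the example; this is not the eF-site algorithm itself, which works on electrostatic surface properties:

    import math

    # each "site" is a list of (property, (x, y, z)) pseudo-atoms (invented)
    site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
              ("hydrophobic", (0, 4, 0))]
    site_b = [("donor", (1, 1, 0)), ("acceptor", (4, 1, 0)),
              ("hydrophobic", (1, 5, 0)), ("donor", (9, 9, 9))]

    def compatible(pair1, pair2, tol=1.0):
        # two matched pairs are consistent if they preserve distances
        (a1, b1), (a2, b2) = pair1, pair2
        da = math.dist(site_a[a1][1], site_a[a2][1])
        db = math.dist(site_b[b1][1], site_b[b2][1])
        return abs(da - db) <= tol

    # candidate matches: nodes with the same chemical property
    candidates = [(i, j) for i in range(len(site_a))
                  for j in range(len(site_b))
                  if site_a[i][0] == site_b[j][0]]

    best = []
    def grow(current, remaining):
        # brute-force search for the largest consistent correspondence
        global best
        if len(current) > len(best):
            best = list(current)
        for k, cand in enumerate(remaining):
            if all(cand[0] != c[0] and cand[1] != c[1]
                   and compatible(cand, c) for c in current):
                grow(current + [cand], remaining[k + 1:])

    grow([], candidates)
    print("best correspondence:", best)

A large overlap between two sites then suggests (as in the hypothetical-protein example above) a shared function even when the folds differ.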
In the session on "X-ray Methods for Structural Genomics", Victor Lamzin talked about the most significant developments in the new version of ARP/wARP: model building is faster and models are more complete; it can deal with poorer starting phases; and building is possible at a lower resoln limit, e.g. it can achieve 50% completeness at 2.9A. It is important to use good quality data (e.g. excluding poor data in the innermost resoln shell resulted in a more complete model - is there a possibility of feeding this information back into data collection?). It can also deal with large multimeric structures, and offers basic automated ligand building. Future developments include: extension to lower resoln (~3A); ~100% model completeness (poorly defined loops, NCS); and improved ligand building.

Tom Terwilliger (in addition to plugging PHENIX) talked briefly about SOLVE (essential feature: a reliable method of scoring putative heavy atom sites which can discriminate between correct and incorrect solns) and RESOLVE (iterative model building at moderate resoln, using template matching into electron density plus probabilistic sequence alignment).

Wayne Anderson (Chicago) gave an overview of his method for automating structure determination using existing software (CNS, SOLVE/RESOLVE, SHELXD). Essentially it has a central MySQL database (the "queen ant") which distributes work to "worker ants" (Unix daemons idling until they receive some input, one ant for each stage, e.g. "scaling by CNS"). It is easily extensible to a distributed computing environment, and will be released as open source at some stage.

Dominika Borek (Dallas) suggested a novel data collection protocol. In a "typical" Se-MAD expt, 10% of the errors are due to radiation damage and 1-2% are systematic and random errors; the latter can be reduced by using a strategy which maximises the number of photons collected, while the effects of the extra damage are corrected for using a model of the resultant changes in intensities. In future it may even be possible to use radiation damage as a source of additional phase information to aid structure determination.
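As a cartoon of the dose-correction idea: if the same reflection is measured repeatedly as dose accumulates, one can fit a simple decay model and extrapolate back to zero dose. The numbers and the linear model below are invented for illustration - Borek's actual treatment of radiation damage is considerably more sophisticated than this:

    import numpy as np

    # toy data: one reflection measured repeatedly as dose accumulates
    dose = np.array([1.0, 2.0, 3.0, 4.0, 5.0])            # arbitrary units
    intensity = np.array([980.0, 955.0, 931.0, 910.0, 884.0])

    # fit a simple linear decay model I(D) = I0 + k*D, extrapolate to D=0
    k, i0 = np.polyfit(dose, intensity, 1)
    print(f"zero-dose intensity: {i0:.1f} (decay {k:.1f} per dose unit)")

The corrected (zero-dose) intensities are what you would have measured from an undamaged crystal, which is why collecting more photons need not make the damage errors worse.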
Sunday 13th October

Helen Berman talked about efforts at the PDB to facilitate integration of structural genomics activities and to enable HTP deposition. Examples include: the PDB Structural Genomics Portal (http://www.rcsb.org/pdb/strucgn.html); the Target database (http://targetdb.pdb.org, somewhere for groups to register targets or to see if someone else is already working on them); data dictionaries (http://deposit.pdb.org/mmcif, to enable data exchange between local databases; includes dictionaries for X-ray/NMR/protein production); and tools for HTP deposition (http://deposit.pdb.org/software, a standalone version of ADIT plus other validation tools, available as open source for individuals to integrate into their own projects).

Mitch Guss (Sydney, Australia) reported back on efforts to facilitate publication of the results from structural genomics. He conceded that as yet structural genomics has not delivered an avalanche of new structures, but suggested it may happen in future - in which case the community needs to examine the way that macromolecular structures are published. He compared the situation now with the problem that the small molecule community faced 30yrs earlier, where now most structures are published electronically. He suggests that scientific publication is something which "adds value" (e.g. by a discussion of biological importance); which acts as a "statement of record"; which is refereed (nb: PDB depositions are validated but not refereed); and which is long-lived (won't be changed in future). "Electronic publication" is taken to refer only to the method of distribution, i.e. the publication itself still cannot be changed over time.

Tom Terwilliger talked generally about the principles of the ISGO: to develop standards and policies for structural genomics; to sponsor international meetings and workshops; to promote co-operation (including public-private efforts); and to promote publication of data. It has a number of task forces (deposition, publication, IPR, target tracking, etc). Key principles: free exchange of data and materials; deposition of coordinates and mandatory data in the PDB immediately on completion of structure determination; public release in a short time (<6 months); open exchange of targets via the target database. There are particular concerns regarding patenting; the ISGO supports the position that patents "need to have a high degree of utility". The website is http://www.isgo.org, mailing list news@isgo.org.

Stephen Burley (Structural GenomiX, aka SGX) gave a plenary lecture in which he stressed the need for fast-turnaround xtallography in structure-based drug discovery. The drug discovery procedure is iterative: at each stage, information on structure feeds into chemistry etc to suggest the set of leads that will be examined in the next cycle. Essentially it is necessary to be able to co-crystallise and solve structures of bound ligands on the time scale of one of these cycles (~4 weeks). He also introduced a method he called "virtual screening by docking":
1. A large database of binding candidates is reduced to a much smaller "enriched" database by applying 1st generation methods (e.g. DOCK) as a rapid filter.
2. A scoring method which accurately ranks ligands in terms of binding energies is applied to the enriched set.
Historically, scoring functions have been poor indicators because conformational changes can cause large changes in binding energy, so the SGX method is to take a starting conformation with the ligand, add solvent and then run molecular dynamics to generate a series of "snapshots" with different binding energies. In this way they can generate an average binding energy which can be used for scoring.
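In caricature, the snapshot-averaging idea reduces to ranking ligands by the mean of per-snapshot interaction energies rather than by any single conformation. The energies below are invented for illustration (computing them for real requires the MD machinery described above):

    import numpy as np

    # hypothetical per-snapshot interaction energies (kcal/mol) from MD
    snapshots = {
        "ligand_A": np.array([-9.8, -10.4, -9.1, -10.9, -9.6]),
        "ligand_B": np.array([-12.0, -6.5, -7.1, -6.8, -6.9]),
    }

    for name, energies in snapshots.items():
        print(f"{name}: mean {energies.mean():.2f} kcal/mol "
              f"(single snapshots range {energies.min():.1f} "
              f"to {energies.max():.1f})")

    # rank by ensemble average rather than by any one conformation
    ranked = sorted(snapshots, key=lambda n: snapshots[n].mean())
    print("ranking:", ranked)

Note that ligand_B would win on its best single snapshot but loses on the average - which is exactly the failure mode of single-conformation scoring that the SGX approach is meant to avoid.

Peter Briggs 29/10/2002
--------------------------------------------------------------------------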