NEWS FROM THE UPPSALA SOFTWARE FACTORY - 8

Les amis d'O

Gerard J. Kleywegt
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden

In this article we shall take a closer look at three of the lesser-known utility programs from Uppsala that work in conjunction with O [1].

SOD

SOD [2] stands for "Sequences to O Datablocks", i.e. it is a program that converts sequences into information that can be used in or by O. The program can read individual and aligned sequences in a number of formats. It can be used to do the following:

to generate an O datablock of the sequence of your protein (task INIT). This is a quick way to get this information into O when you are about to assign the sequence to your Cα trace. The datablock produced by SOD can be used with the sam_init_db command in O to initialise the data structures for a new protein molecule.
to generate O macros with which one can quickly build a homology model or, more interestingly, a molecular replacement search model (task HOMO). To do this, SOD requires two aligned sequences: one of a protein whose structure is known (and available, which unfortunately is not always the same thing), and one of a related protein for which you want to generate a model. SOD uses the aligned sequences to generate an O macro that consists mostly of Mutate instructions (mutate_insert, mutate_delete, and mutate_replace). For a homology model, every residue that differs between the two sequences is replaced by the residue type of the protein to be modelled, and deletions and insertions are included (although they will still have to be modelled by the user). For a molecular replacement search probe, on the other hand, residues that differ between the two sequences are replaced by alanines and deletions are carried out, but insertions are not made. (Other tools for generating molecular replacement search models were discussed in a previous episode of this series, available at URL: http://alpha2.bmc.uu.se/~gerard/manuals/factory_6.html.)
to do a pairwise comparison of one sequence with one or more others (task PAIR). For each comparison an O datablock is generated that contains an integer code for every residue: 0 = conserved residue, 1 = mutation, 2 = insertion in other sequence, 3 = deletion in other sequence, 4 = outside other sequence. This datablock can be used to colour the molecule, e.g. using the paint_case command; a Cα trace will then reveal where mutations, insertions and deletions occur in the 3D structure.
to analyse multiple aligned sequences (task MULT). This option is useful for presenting information about sequence conservation in a large family of related proteins (provided the structure of one of them is available). The following O datablocks will be produced (assuming that the molecule is called "M1" in O):
- M1_RESIDUE_POSSIBLE - listing all residue types encountered for every residue;
- M1_RESIDUE_CONSERVED - degree of conservation (%) of each residue type in the sequence;
- M1_RESIDUE_VARIATION - a count of the number of different residue types observed at each position;
- .ID_SOD - a temporary .id_template showing all of the above properties when you click on an atom;
- @M1_SOD - a macro to produce three objects from your molecule: CONS (Cα trace colour-ramped by M1_RESIDUE_CONSERVED), VARI (Cα trace colour-ramped by M1_RESIDUE_VARIATION), and GRAD (Cα trace coloured in steps according to M1_RESIDUE_CONSERVED).
Simply reading the datablock file into O and executing the macro will produce the three graphics objects.
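The per-residue bookkeeping behind tasks PAIR, HOMO and MULT can be sketched in a few lines of Python. This is not SOD's code, only a minimal illustration assuming pre-aligned, equal-length one-letter sequences with "-" as the gap character and the sequence of the known structure given first. The function names are hypothetical, and three conventions are guesses of our own: code 2 (insertion in the other sequence) is placed on the reference residue preceding the insertion; model edits are returned as abstract (action, residue number, residue type) tuples rather than real O Mutate syntax; and conservation is the percentage of all aligned sequences (gaps included in the count) that carry the reference residue type.

```python
GAP = "-"


def _check(*seqs: str) -> None:
    # All sequences in an alignment must have the same length.
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must have equal length")


def pair_codes(ref: str, other: str) -> list[int]:
    """PAIR-style codes, one per residue of ref: 0 conserved, 1 mutation,
    2 insertion in other, 3 deletion in other, 4 outside other sequence."""
    _check(ref, other)
    ref, other = ref.upper(), other.upper()
    occupied = [i for i, c in enumerate(other) if c != GAP]
    first, last = (occupied[0], occupied[-1]) if occupied else (len(other), -1)
    codes: list[int] = []
    for i, (r, o) in enumerate(zip(ref, other)):
        if r == GAP:
            # Extra residue in `other`: flag the preceding ref residue (assumed convention).
            if o != GAP and codes:
                codes[-1] = 2
            continue
        if i < first or i > last:
            codes.append(4)
        elif o == GAP:
            codes.append(3)
        else:
            codes.append(0 if r == o else 1)
    return codes


def model_edits(known: str, target: str, mr_probe: bool = False) -> list[tuple[str, int, str]]:
    """HOMO-style edit list turning the known structure into a model of target.
    Residue numbers count residues of `known` from 1; an insert goes after that residue."""
    _check(known, target)
    edits: list[tuple[str, int, str]] = []
    resnum = 0
    for k, t in zip(known.upper(), target.upper()):
        if k != GAP:
            resnum += 1
        if k != GAP and t == GAP:
            edits.append(("delete", resnum, k))      # deletions are made in both modes
        elif k == GAP and t != GAP:
            if not mr_probe:                          # insertions only for a homology model
                edits.append(("insert", resnum, t))
        elif k != t:
            new = "A" if mr_probe else t              # MR probe: differing residues become Ala
            if new != k:
                edits.append(("replace", resnum, new))
    return edits


def mult_stats(alignment: list[str]) -> tuple[list[str], list[float], list[int]]:
    """MULT-style profiles for the residues of alignment[0]: residue types seen,
    % of sequences sharing the reference type, and number of distinct types."""
    _check(*alignment)
    seqs = [s.upper() for s in alignment]
    possible: list[str] = []
    conserved: list[float] = []
    variation: list[int] = []
    for col, r in enumerate(seqs[0]):
        if r == GAP:
            continue  # no residue in the structure at this column
        column = [s[col] for s in seqs]
        types = sorted({c for c in column if c != GAP})
        possible.append("".join(types))
        conserved.append(100.0 * column.count(r) / len(seqs))
        variation.append(len(types))
    return possible, conserved, variation
```

For example, `pair_codes("ACDEFG-HIK", "--DQFGWH--")` flags the two N-terminal and two C-terminal residues as outside the other sequence, E as a mutation and G as preceding an insertion. In practice the resulting integer list would be written out as a residue datablock and coloured with paint_case, as described for task PAIR.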

ODBMAN

One of the useful features in O is its use of datablocks (of type real, integer, character or text) to represent information pertaining to a molecule as a whole, to each of its residues, or to each of its atoms [3]. Although O contains a number of commands to manipulate datablocks, a separate utility program (ODBMAN, for "O DataBlock MANipulation" [4]) is also available. Its options (besides trivial I/O-related ones) fall into the following categories:

extracting information from other sources (EXtract commands). With these commands, information can be extracted from text files and stored as real or integer datablocks. The input can be either formatted or field-oriented (i.e., containing fields separated by tabs or spaces). A separate option is available to extract information from a PROCHECK [5] output file (residue type and name, secondary structure assignment according to the DSSP algorithm, the area of the Ramachandran plot in which each residue lies, the number of bad contacts for each residue, and its H-bond energy).
manipulating individual entries of a datablock (SEt commands). This includes options to set all entries of a datablock to a particular value, to set a consecutive stretch of entries to a particular value, to set a consecutive stretch of entries individually, and to "translate" information from other datablocks (e.g., to generate an integer datablock representing secondary structure from a character datablock). Using these options, it is not too difficult to colour a protein structure in the colours of the Dutch flag, for instance.
manipulating entire datablocks. This includes options to do simple arithmetic on integer and real datablocks, to smooth datablocks, and to modify character datablocks.
analysis of datablocks. Options are available to list statistics and produce histograms of individual datablocks, and to produce line plots and scatter plots of individual datablocks or of one datablock versus another.
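To make the EXtract, SEt and smoothing categories more concrete, here is a plain-Python sketch of what such operations do to a per-residue datablock held as a list of values. It is not ODBMAN's implementation or command syntax: the function names, the 1-based inclusive residue ranges, the 1-based column numbering, and the use of a centred moving average for smoothing are all assumptions made for illustration. The integer codes 1, 2 and 3 used for the Dutch flag are arbitrary stand-ins for red, white and blue.

```python
def extract_field(lines: list[str], column: int) -> list[float]:
    """Field-oriented extraction: take the given 1-based column of every
    non-blank line, with fields separated by tabs or spaces."""
    return [float(line.split()[column - 1]) for line in lines if line.strip()]


def set_stretch(values: list, first: int, last: int, value) -> None:
    """Set entries first..last (1-based, inclusive) to a single value, in place."""
    for i in range(first - 1, last):
        values[i] = value


def translate(codes: list[str], mapping: dict[str, int], default: int = 0) -> list[int]:
    """Turn a character datablock (e.g. secondary structure) into an integer one."""
    return [mapping.get(c, default) for c in codes]


def smooth(values: list[float], window: int = 3) -> list[float]:
    """Centred moving average (window should be odd); it shrinks at the chain ends."""
    half = window // 2
    out: list[float] = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Dutch flag: three equal stretches of a 9-residue integer datablock.
flag = [0] * 9
set_stretch(flag, 1, 3, 1)   # hypothetical code for red
set_stretch(flag, 4, 6, 2)   # hypothetical code for white
set_stretch(flag, 7, 9, 3)   # hypothetical code for blue
```

With a colour assigned to each of the three codes, `flag` gives the red, white and blue stretches; `translate(list("HHHEE-T"), {"H": 1, "E": 2})` maps helix and strand to 1 and 2 and everything else to 0.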

O2D

Many of the Uppsala programs (e.g., ODBMAN, MOLEMAN2, LSQMAN, DATAMAN, MAPMAN) produce (ASCII) plot files in a meta-format. O2D [6] is a simple program to convert such plot files into other formats. Usually, this program will be used to convert plot files into PostScript files, but it can also produce tab-delimited ASCII files, which can be read by most popular spreadsheet and graphing programs and hence used to produce more professional-looking graphs. In addition, the SGI version of the program allows the user to plot data interactively in graphics windows. O2D can produce line and scatter plots, histograms and simple pie charts of 1D data, and contour plots of 2D data. In interactive mode, there are also simple facilities for integrating curves or contour plots and for manipulating the display. The (ASCII) meta-format for both 1D and 2D plots is simple, using six-character keywords (see the manual for details). There is also a C-shell script available that converts many plot files to PostScript in one batch.
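Two of the conversions mentioned above, exporting data as tab-delimited text and integrating a curve, are simple enough to sketch. The Python below is an illustration, not O2D's code: it assumes only what the text states, namely that each meta-format line starts with a six-character keyword. The keyword names in the example lines (XLABEL, YLABEL) are hypothetical placeholders rather than O2D's documented vocabulary (the manual [6] lists the real keywords), and the trapezoidal rule is our own choice of integration method.

```python
def split_keyword(line: str) -> tuple[str, list[str]]:
    """Split a meta-format line into its six-character keyword and remaining fields."""
    return line[:6].strip().upper(), line[6:].split()


def to_tab_delimited(x: list[float], y: list[float], xlabel: str = "x", ylabel: str = "y") -> str:
    """Two-column, tab-separated text for spreadsheet and graphing programs."""
    rows = [f"{xlabel}\t{ylabel}"]
    rows += [f"{a:g}\t{b:g}" for a, b in zip(x, y)]
    return "\n".join(rows) + "\n"


def integrate(x: list[float], y: list[float]) -> float:
    """Area under a sampled 1D curve by the trapezoidal rule (x in increasing order)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


# Hypothetical header lines; the keyword names are placeholders, not O2D's own.
header = ["XLABEL Residue number", "YLABEL RMS deviation"]
labels = dict(split_keyword(line) for line in header)
```

Slicing on a fixed width rather than splitting on whitespace is what makes short keywords such as "END" parse correctly when they are padded with spaces or stand alone on a line.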

AVAILABILITY

SOD, ODBMAN, and O2D are part of the X-UTIL package, which is available free of charge to academic users from ftp://alpha2.bmc.uu.se/pub/gerard/xutil/. Commercial users may contact GJK for more information (mailto:gerard@xray.bmc.uu.se). For more information about O, contact Alwyn Jones (mailto:alwyn@xray.bmc.uu.se). The O WWW site is at http://imsb.au.dk/~mok/o/, and the Uppsala Software Factory can be found at http://alpha2.bmc.uu.se/~gerard/manuals/.

REFERENCES

[1] Jones, T.A., Zou, J.Y., Cowan, S.W. and Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A47, 110-119.
[2] The manual for this program is available at URL: http://alpha2.bmc.uu.se/~gerard/manuals/sod_man.html
[3] Jones, T.A. and Kjeldgaard, M. (1997). Electron density map interpretation. Meth. Enzymol. 277, in press.
[4] The manual for this program is available at URL: http://alpha2.bmc.uu.se/~gerard/manuals/odbman_man.html
[5] Laskowski, R.A., MacArthur, M.W., Moss, D.S. and Thornton, J.M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283-291.
[6] The manual for this program is available at URL: http://alpha2.bmc.uu.se/~gerard/manuals/o2d_man.html

Last updated 17 July 1997.
