Institut de Pharmacologie et Biologie Structurale, IPBS, 205 route
de Narbonne, 31077 TOULOUSE, Cedex, France
e-mail : manu@ipbs.fr
Handling and treatment of crystallographic data commonly requires several programs. They are provided by the ccp4 project and by other sources. The treatments include:
Although possible, interactive work is rather tedious, inefficient and error-prone, as every program requires reading and writing several data, log, and command files. It thus seems more appropriate to write scripts in which all calls, input and output file names, etc. are coded, so that a single script executes all the programs. However, large scripts end up resembling any other computer program: depending on how they have been written, they can be more or less difficult to understand, and thus difficult to maintain or modify. We have tried to implement a tool that lets users write large but nevertheless clear and easy-to-maintain scripts.
It is most common, when writing scripts which execute scientific programs, to use traditional unix shells, such as sh, ksh, csh or one of their dialects. This is a good and well-known approach when the objective is only to chain the execution of several programs.
However, it appears that data treatments now make it necessary to use more and more complex scripts, invoking not only CCP4 programs, but also programs from other sources (particularly xplor or cns). An example of such a complex script is one that runs a given program in a loop and exits from the loop only when a given condition is fulfilled. The execution of the program should be considered as a black box, the behaviour of which may be described as:
The output files of a given program are often the inputs of another one, defining some piping mechanism.
Another case arises when the script should analyze the log file, either to extract some results and print a summary, or to take a decision based upon those results. It is thus important to be able to extract information easily from ascii files.
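
As a minimal sketch of this kind of log analysis, the following Perl fragment extracts one value per cycle from a piece of text and takes a decision based on the last value. The log layout, the variable names and the 0.28 threshold are assumptions made for the illustration; real program logs are formatted differently.

```perl
use strict;
use warnings;

# Hypothetical log excerpt; real program logs are laid out differently.
my $log_text = <<'END_LOG';
 Cycle 1   R factor  0.312   R free  0.355
 Cycle 2   R factor  0.287   R free  0.341
 Cycle 3   R factor  0.274   R free  0.338
END_LOG

# Collect the R factor printed for each cycle.
my %r_factor;
for my $line (split /\n/, $log_text) {
    if ($line =~ /Cycle\s+(\d+)\s+R factor\s+([\d.]+)/) {
        $r_factor{$1} = $2;
    }
}

# Print a short summary, one line per cycle.
for my $cycle (sort { $a <=> $b } keys %r_factor) {
    printf "cycle %d: R = %.3f\n", $cycle, $r_factor{$cycle};
}

# Take a decision based on the value of the last cycle
# (the 0.28 threshold is arbitrary, chosen for the example).
my ($last_cycle) = sort { $b <=> $a } keys %r_factor;
my $converged = $r_factor{$last_cycle} < 0.28 ? 1 : 0;
print "converged: $converged\n";
```

With a shell script, the same extraction would typically require calls to external programs; here it stays inside the script itself.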
In both cases, the file will be treated inside the script, and will probably not be used later. Thus its name does not have to be chosen by, or even known to, the user; however, one must define an algorithm for the script to choose unique file names, avoiding any conflict. This "black box concept" is most easily implemented using object oriented programming. One thus needs a scripting language with two main characteristics:
Neither of these two characteristics is correctly implemented in traditional Unix shell languages: leaving object orientation aside, users have to call external unix programs such as grep, awk, etc. to extract information from ascii files. However, starting from version 5, the Perl language is well suited to these constraints:
occp4.pm module.
A Perl module was written which allows the user to write Perl scripts in an object oriented way. This module is loaded through a classical require Perl instruction (cf. example).
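Before the execution of a program, an object, whose class is called occp4, is created (cf. example). The parameters of the constructor are the name of the program to run (a ccp4 program, for instance), followed by pairs of (logical name, file name) describing its input and output files.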
As noted previously, in many cases, the files created or read by the programs do not have to survive after the script: they can be treated as temporary files, so the user does not have to worry about the choice of a filename. In those cases, the special name 'CHANA' (acronym of 'CHoose A NAme for me') causes the object to compute a unique temporary name for this file. The algorithm involved is implemented by the occp4 module in such a way that names are generated in an "orderly" manner, which helps when debugging the script. The temporary files are deleted by the destructor of the object, unless otherwise specified.
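
The idea of an "orderly" name generator can be sketched as follows. This is not the algorithm used by occp4.pm, which is not described here; the function choose_a_name, the tmp_ prefix and the use of the process id with a counter are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Hypothetical generator: one counter per script run, names built from
# the process id and the counter so that they sort in creation order.
{
    my $counter = 0;
    sub choose_a_name {
        my ($requested) = @_;
        # Ordinary names are returned unchanged.
        return $requested unless $requested =~ /^CHANA(\..*)?$/;
        # Keep the extension, if any (as in 'CHANA.map').
        my $ext = defined $1 ? $1 : '';
        $counter++;
        return sprintf("tmp_%d_%03d%s", $$, $counter, $ext);
    }
}

my $map  = choose_a_name('CHANA.map');     # e.g. tmp_12345_001.map
my $pdb  = choose_a_name('CHANA');         # e.g. tmp_12345_002
my $kept = choose_a_name('outfile.map');   # returned unchanged
print "$map\n$pdb\n$kept\n";
```

Because the counter only increases, the names reflect the order in which the files were requested, which makes it easier to relate a leftover file to a step of the script.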
The behaviour of the object is then controlled by the member functions; the most important functions are listed here:
iofiles, iofildel: specify the input-output pairs of (logicals, file names) after the constructor has been called, or retrieve the file name if it was chosen by the system.
keywords, keywrep, keywdel: generate and modify a list of (keyword, value) pairs controlling the behaviour of the program, as described in the documentation.
input_src, input_string, input_file: these functions allow the user to let the object read its control script not only from the list of (keyword, value) pairs, but also from a string or from a file; they are particularly useful when dealing with programs which do not belong to ccp4.
logfile: allows the user to know the name of the (temporary) logfile computed by the object. It is also possible to force the object to use another log file. Forcing all objects to share the same logfile allows the user to keep a single logfile for the whole script.
debug: controls the messages printed by the object at key moments in its life (mainly when the program is executed or when the object is destroyed).
run: starts the program. The execution code of the program is returned, so that error handling is very easy to achieve.
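
The error handling made possible by a returned execution code can be illustrated with plain Perl, independently of the module. The function run_program below is a hypothetical helper written for this sketch (it is not part of occp4.pm); it relies only on the standard system function and uses the running perl interpreter as a stand-in for a scientific program.

```perl
use strict;
use warnings;

# Run an external command and return its exit code (0 means success).
sub run_program {
    my (@command) = @_;
    my $status = system(@command);
    return -1 if $status == -1;                      # could not be started
    return 128 + ($status & 127) if $status & 127;   # killed by a signal
    return $status >> 8;                             # ordinary exit code
}

# $^X is the running perl interpreter, used here as a stand-in program.
my $ok     = run_program($^X, '-e', 'exit 0');
my $failed = run_program($^X, '-e', 'exit 3');
print "first run: $ok, second run: $failed\n";
print "error handling: second run failed\n" if $failed;
```

Since a non-zero value is true in Perl, a test such as die "..." if (run_program(...)) stops the script as soon as a program reports a failure.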
The destructor is called by the system, as explained below. It essentially removes all the temporary files handled by the object, i.e. the files whose names were declared as "CHANA". It is thus very simple to implement pipe-like mechanisms, feeding a program with the output of another program, without having to worry about intermediate file names.
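
A destructor that cleans up an intermediate file can be sketched with a small stand-alone class. TempOwner is a hypothetical class written for this illustration; it only shows the general Perl mechanism (a DESTROY method) and makes no claim about how occp4.pm is implemented.

```perl
use strict;
use warnings;

package TempOwner;     # hypothetical class, not part of occp4.pm

# The constructor creates the file that the object is responsible for.
sub new {
    my ($class, $file) = @_;
    open(my $fh, '>', $file) or die "cannot create $file: $!";
    print {$fh} "intermediate data\n";
    close($fh);
    return bless { file => $file }, $class;
}

sub file { return $_[0]{file} }

# Called automatically by Perl when the last reference disappears.
sub DESTROY {
    my ($self) = @_;
    unlink $self->{file} if -e $self->{file};
}

package main;

my $name = "tmp_demo_$$.dat";
my $existed_inside;
{
    my $producer = TempOwner->new($name);
    # A second program would read $producer->file here.
    $existed_inside = (-e $producer->file) ? 1 : 0;
}
# The block is closed: the object is gone, and so is its file.
my $exists_after = (-e $name) ? 1 : 0;
print "inside block: $existed_inside, after block: $exists_after\n";
```

The intermediate file exists for exactly as long as the object that owns it, so no explicit cleanup code is needed at the end of the script.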
Perl provides a sophisticated memory management system. Some knowledge of its logic is important to use the occp4 objects correctly. While the constructor is explicitly called when a new object is created, the destructor is implicitly called when the object goes out of scope.
use strict instruction
In order to make your script clearer, and to be sure of the scope of your variables (and consequently of your objects), you should declare use strict at the beginning of the script (cf. example); with this declaration, you are assured that:
Every variable must be declared with the my instruction before being used
Every variable declared with the my instruction is local to the enclosing block; this ensures that the memory allocated for this variable is released at the end of the block
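
These two points can be checked with a short stand-alone script. The variable names are arbitrary, and the string eval is used only so that the compilation error raised by use strict can be caught and displayed instead of stopping the demonstration.

```perl
use strict;
use warnings;

my $total = 0;
for my $i (1 .. 3) {
    my $square = $i * $i;    # $square exists only inside this block
    $total += $square;
}
# Referring to $square here would be a compile-time error under "use strict".

# An undeclared variable is rejected at compile time; the string eval
# lets us catch the error instead of aborting the whole script.
my $compiled = eval q{ use strict; $undeclared = 1; 1 };
my $error    = $@;

print "total: $total\n";
print "strict rejected the undeclared variable\n" unless $compiled;
```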
Objects in Perl are so-called "blessed" references (references play the role that pointers play in languages like C). Working only with local variables, as noted above, the following rules may be applied in order to control the destructor call:
When an object is created (calling its constructor with an instruction like $obj = new occp4(...)):
some memory allocation is performed for the object
the object is initialized
the constructor returns a reference pointing to the created object
The reference is stored inside $obj
It is legal to write instructions like $o1=$obj. Only the reference is then copied; the object itself is not duplicated.
If the scope of $o1 is longer than the scope of $obj, the object is still alive even when $obj is destroyed. The exact rule is: "the object is alive as long as there is at least one reference pointing to it". The object will be destroyed, and thus the destructor will be called, when the last reference pointing to this object goes out of scope or is assigned another value, for instance with the instruction $obj = 0. See the third example for a short illustration of this property.
Thus, although the destructor cannot be called explicitly as it can in other object oriented languages like C++, you may control exactly when it is called by the system, just by controlling the assignment or the scope of reference variables.
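
The rule "alive as long as at least one reference points to it" can be observed with a minimal class whose destructor records when it is called. Tracked and the @events list are hypothetical names introduced for this sketch only.

```perl
use strict;
use warnings;

my @events;            # records the order in which things happen

package Tracked;       # hypothetical class used only for this demonstration
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { my ($self) = @_; push @events, "destroy $self->{name}" }

package main;

my $o1;
{
    my $obj = Tracked->new('A');
    $o1 = $obj;                  # copies the reference, not the object
}                                # $obj is gone, but $o1 keeps 'A' alive
push @events, 'end of block';
$o1 = 0;                         # last reference reassigned: 'A' is destroyed
push @events, 'after reassignment';

print join("\n", @events), "\n";
```

The destructor runs between the end of the block and the line following the reassignment, that is, exactly when the last reference disappears and not when the variable that first held the object goes out of scope.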
This section shows 3 short examples of scripts, written in Perl, using the occp4.pm module. Those scripts can be copied and pasted; the numbers shown inside comments refer to the notes below each example script.
occp4.pm:
use instructions and other magical formulas
debug flag is also illustrated here.
#!/usr/local/bin/perl
use strict;
require "occp4.pm";                                               # 1

# Placeholder resolution limits (required by use strict); adapt to your data.
my ($low_res, $high_res) = (20.0, 3.0);

{                                                                 # 2
    my $obj_fft = new occp4 ('fft',
                             'hklin'  => "some_file",
                             'mapout' => 'CHANA.map');            # 3
    $obj_fft->logfile(">>file.log");                              # 4
    $obj_fft->keywords('TITLE'     => "some title",
                       'LABI'      => " F1=F1 SIG1=SIGF1 F2=FC PHI=PHIC",
                       'RESO'      => "$low_res $high_res",
                       'SCALE'     => "F1 3.0 0.0 F2 2.0 0.0",
                       'BINMAPOUT' => ' ');                       # 5
    die("something wrong in fft") if ($obj_fft->run());           # 6

    my $obj_ext = new occp4('extend',
                            'MAPIN'  => $obj_fft->iofiles('MAPOUT'),
                            'MAPOUT' => 'outfile.map',
                            'XYZIN'  => "some_file.pdb");         # 7
    $obj_ext->debug("RVDF");                                      # 8
    $obj_ext->keywords('BORDER' => "5.");
    die("something wrong in extend") if ($obj_ext->run());
}                                                                 # 9
fft
hklin is some file already created
mapout is a temporary file, whose name is chosen by the object
ccp4 keywords are entered
fft is run; the script is killed in the case of an error
extend
mapin file is the mapout file of the previous fft object
mapout file is a definitive one.
$obj_fft and read by $obj_ext) are removed.
mapout file) are kept.
$obj_fft is not removed, because the 'F' debug flag was set for this object; however, a message is printed about this file.
xplor
my $obj_xpl = new occp4 ('xplor.exe');        # 1
$obj_xpl->input_src("F");                     # 2
$obj_xpl->input_file("file.inp");             # 3
$obj_xpl->logfile(">>file.log");
die "ERROR in xplor" if ($obj_xpl->run());
$obj_xpl = 0;                                 # 4
xplor.exe, which must be present in the path
$obj_xpl reference is set to 0, and because no other reference points to our object, it will be destroyed.
my $old_obj = 0;                              # 1
while (1) {                                   # 2
    my $xyzin;
    if ($old_obj) {
        # Feed this iteration with the output of the previous one.
        $xyzin = $old_obj->iofiles("XYZOUT");
    } else {
        $xyzin = "some_initial_value";
    }                                         # 3
    my $obj = new occp4("some_program",       # 4
                        XYZIN  => $xyzin,
                        XYZOUT => "CHANA");
    # keywords etc.
    if (some_condition()) {                   # placeholder: supply your own exit test
        last;                                 # 5
    } else {
        $old_obj = $obj;                      # 6
    }
}                                             # 7
my $loop_obj = $old_obj;                      # 8
$old_obj reference so that it points to the last created object. This prevents the destruction of $obj, although we are going to start a new iteration.
$old_obj is now destroyed, because no other reference points to it. $obj is still alive, and is now pointed to by $old_obj.
$old_obj is the object produced by the penultimate iteration.
occp4.pm is available on the ipbs ftp server ftp://ftp.ipbs.fr/pub/occp4. Also available at the same url is refmac.pl, a Perl rewrite of a script written in ksh by Laurent Maveyraud (refmac.ksh) to perform refmac refinement with bulk solvent correction (in xplor or refmac).