

Writing CCP4 scripts in PERL

E. Courcelle and J.P. Samama

Institut de Pharmacologie et Biologie Structurale, IPBS, 205 route de Narbonne, 31077 TOULOUSE, Cedex, France

e-mail : manu@ipbs.fr

I. Introduction

Handling and processing crystallographic data commonly requires several programs, which are provided by the CCP4 project and by other sources. The treatments include:

Although possible, interactive work is tedious, inefficient and error prone, as every program requires reading and writing several data, log, and command files. It is thus more appropriate to write scripts in which all calls, input and output file names, etc., are coded, so that a single script executes all the programs. However, large scripts end up resembling any other computer program: depending on how they were written, they can be more or less difficult to understand, and thus difficult to maintain or modify. We have tried to implement a tool that lets users write large but nevertheless clear and easy-to-maintain scripts.


II. Choosing a scripting language

When writing scripts that execute scientific programs, it is most common to use traditional Unix shells, such as sh, ksh, csh or one of their dialects. This is a good and well-known approach when the only objective is to chain the execution of several programs.
However, data treatments now require increasingly complex scripts, invoking not only CCP4 programs but also programs from other sources (particularly xplor or cns). One example of such a script is running a given program in a loop and exiting from the loop only when a given condition is fulfilled. The execution of the program should be considered as a black box, the behaviour of which may be described as:

The output files of a given program are often the inputs of another one, defining some piping mechanism.
Another case arises when the script should analyze the log file, either to extract some results and print a summary, or to take a decision based upon those results. It is thus important to be able to extract information easily from ASCII files.
In both cases, the file will be treated inside the script, and will probably not be used later. Thus its name does not have to be chosen by, or even known to the user; however one must define an algorithm for the script to choose unique file names, avoiding any conflict. It is clear that this "black box concept" is most easily implemented using object oriented programming. One thus needs a scripting language with two main characteristics:

Neither of these two characteristics is correctly implemented in traditional Unix shell languages: without even speaking of object orientation, the users have to use external unix programs, like grep, awk, etc. to extract information from ascii files. However, starting from version 5, the Perl language is nicely adapted to these constraints:
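As an illustration of the text-handling side, the following minimal sketch extracts numbers from a log with Perl regular expressions alone, without calling grep or awk. The log excerpt, its line format and the function name parse_cycles are made up for this example; they are not the output format of any CCP4 program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up log excerpt; the line format is an assumption for illustration only.
my $log_text = <<'END_LOG';
 Cycle 1   R factor 0.312   Free R 0.358
 some unrelated line
 Cycle 2   R factor 0.287   Free R 0.341
 Cycle 3   R factor 0.271   Free R 0.333
END_LOG

# Return one hash reference per line that matches the assumed pattern.
sub parse_cycles {
    my ($text) = @_;
    my @cycles;
    for my $line (split /\n/, $text) {
        if ($line =~ /Cycle\s+(\d+)\s+R factor\s+([\d.]+)\s+Free R\s+([\d.]+)/) {
            push @cycles, { cycle => $1, r => $2, rfree => $3 };
        }
    }
    return @cycles;
}

my @cycles = parse_cycles($log_text);
my $last   = $cycles[-1];

# Print a one-line summary built from the extracted values.
printf "%d cycles, final R = %.3f, final Rfree = %.3f\n",
       scalar(@cycles), $last->{r}, $last->{rfree};
```

The same extracted values could just as well drive a decision in the script, for instance whether to run another cycle.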


III. The occp4.pm module.

A Perl module was written which allows the user to write Perl scripts in an object oriented way. This module is loaded through a classical require Perl instruction (cf. example)

Conception of an object:

The constructor:

Before the execution of a program, an object, whose class is called occp4, is created (cf example). The parameters of the constructor are:

As noted previously, in many cases, the files created or read by the programs do not have to survive after the script: they can be treated as temporary files, so the user does not have to worry about the choice of a filename. In those cases, the special name 'CHANA' (acronym of 'CHoose A NAme for me') causes the object to compute a unique temporary name for this file. The algorithm involved is implemented by the occp4 module, in such a way that names are generated in an "orderly" manner. This helps when debugging the script. The temporary files are deleted by the destructor of the object, unless otherwise specified.
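The idea of an "orderly" unique name can be sketched as follows. This is not the algorithm used by occp4.pm, which is not described here; the function resolve_name and the naming scheme (process id, running counter, program name) are assumptions chosen only to show the principle.

```perl
use strict;
use warnings;

{
    my $counter = 0;    # shared by all calls, so names come out in order

    # Replace the placeholder 'CHANA' (with an optional extension) by a
    # unique name; any other file name is returned unchanged.
    sub resolve_name {
        my ($program, $name) = @_;
        return $name unless $name =~ /^CHANA(\..*)?$/;
        my $ext = defined $1 ? $1 : '';
        $counter++;
        # $$ is the process id: it keeps concurrent scripts from colliding.
        return sprintf("tmp_%d_%03d_%s%s", $$, $counter, $program, $ext);
    }
}

print resolve_name('fft',    'CHANA.map'),   "\n";   # a generated name ending in .map
print resolve_name('extend', 'outfile.map'), "\n";   # unchanged: outfile.map
```

Because the counter only increases, the generated names sort in the order in which the objects were created, which is what makes them convenient when debugging.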

The member functions:

The behaviour of the object is then controlled by the member functions; the most important functions are listed here:

The destructor:

The destructor is called by the system, as explained below. It essentially removes all the temporary files handled by the objects, i.e. the files whose names were declared as "CHANA". It is thus very simple to implement pipe-like mechanisms, feeding a program with the output of another program, without having to worry about intermediate file names.
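A destructor of this kind can be sketched in a few lines of Perl. The class TempFileOwner and its keep method are hypothetical and written only for this illustration; they do not reproduce the occp4.pm code.

```perl
use strict;
use warnings;

package TempFileOwner;    # hypothetical class, not part of occp4.pm

sub new {
    my ($class, @files) = @_;
    my $self = { temp_files => [@files], keep => 0 };
    return bless $self, $class;
}

# Ask the object to leave its files on disk, e.g. while debugging.
sub keep {
    my ($self, $flag) = @_;
    $self->{keep} = $flag;
    return $self;
}

# Called automatically by Perl when the last reference disappears.
sub DESTROY {
    my ($self) = @_;
    return if $self->{keep};
    unlink grep { -e $_ } @{ $self->{temp_files} };
}

package main;

my $file = "tempowner_demo_$$.tmp";
{
    open(my $fh, '>', $file) or die "cannot create $file: $!";
    print {$fh} "intermediate data\n";
    close($fh);

    my $owner = TempFileOwner->new($file);
    print "inside block: ", (-e $file ? "present" : "missing"), "\n";
}   # $owner goes out of scope here, so DESTROY removes the file
print "after block:  ", (-e $file ? "present" : "missing"), "\n";
```

The file exists while the owning object is alive and is gone once the block is left, which is the behaviour that makes pipe-like chaining of programs safe.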


IV. Playing with memory management

Perl provides a sophisticated memory management system. A good understanding of its logic is important in order to use the occp4 objects correctly. While the constructor is called explicitly when a new object is created, the destructor is called implicitly, when the last reference to the object disappears (typically when the variable holding it goes out of scope).

The use strict instruction

In order to make your script clearer, and to be sure of the scope of your variables (and consequently of your objects), you should declare use strict at the beginning of the script (cf. example); with this declaration, you are assured that:

How to call the destructor?

Objects in Perl are so-called "blessed" references (other languages like C refer to pointers). Working only with local variables, as noted above, the following rules may be applied in order to control the destructor call:

Thus, although the destructor cannot be called explicitly, as it can in other object-oriented languages like C++, you can control exactly when the system calls it, simply by controlling the assignment and the scope of reference variables.
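The effect of scope and assignment on destruction can be observed with a small stand-alone script. The class Tracked is hypothetical and exists only to record when its destructor runs; it stands in for any object, including an occp4 one.

```perl
use strict;
use warnings;

package Tracked;    # hypothetical class used only to observe destruction

our @destroyed;     # names of destroyed objects, in order of destruction

sub new     { my ($class, $name) = @_; return bless { name => $name }, $class; }
sub DESTROY { my ($self) = @_; push @destroyed, $self->{name}; }

package main;

# Case 1: the only reference is a local variable that leaves its block.
{
    my $first = Tracked->new('scoped');
}
print "after block:       @Tracked::destroyed\n";

# Case 2: the only reference is overwritten by another value.
my $second = Tracked->new('overwritten');
$second = 0;
print "after assignment:  @Tracked::destroyed\n";

# Case 3: a second reference keeps the object alive beyond the inner block.
my $keeper;
{
    my $third = Tracked->new('shared');
    $keeper = $third;
}
print "after inner block: @Tracked::destroyed\n";   # 'shared' is not listed yet
undef $keeper;
print "after undef:       @Tracked::destroyed\n";   # now 'shared' is listed
```

In each case the destructor runs at the moment the last reference to the object is lost, never earlier and never later.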


V. Some examples

This section shows three short examples of scripts written in Perl with the occp4.pm module. These scripts can be copied and pasted; the numbers shown inside comments refer to the notes below each example script.

Use of two CCP4 objects

This example shows the basics of occp4.pm: The debug flag is also illustrated here.

#!/usr/local/bin/perl

use strict;
require "occp4.pm";                                     # 1


my ($low_res, $high_res) = ('20.0', '2.0');             # placeholder resolution limits, adapt to your data

{                                                       # 2
my $obj_fft = new occp4 ('fft',
                         'hklin'=>"some_file",
                         'mapout'=>'CHANA.map');        # 3

$obj_fft->logfile(">>file.log");                        # 4

$obj_fft->keywords('TITLE'=>"some title",
                   'LABI' =>" F1=F1 SIG1=SIGF1 F2=FC PHI=PHIC",
                   'RESO' =>"$low_res $high_res",
                   'SCALE'=>"F1 3.0 0.0 F2 2.0 0.0",
                   'BINMAPOUT'=>' ');                   # 5
die ("something wrong in fft") if ($obj_fft->run());    # 6

my  $obj_ext = new occp4('extend',
                         'MAPIN' =>$obj_fft->iofiles('MAPOUT'),
                         'MAPOUT'=>'outfile.map',
                         'XYZIN' =>"some_file.pdb");    # 7

$obj_ext->debug("RVDF");                                # 8

$obj_ext->keywords('BORDER'=>"5.");
die ("something wrong in extend") if ($obj_ext->run());

}                                                       # 9
  1. Magical formulas
  2. Start a new block
  3. Allocate memory for an object:
  4. The logfile is appended to the main script log file
  5. The ccp4 keywords are entered
  6. fft is run, the script is killed in the case of an error
  7. Allocate memory for another object:
  8. Set some debugging flags:
  9. End of the block:

Use of xplor and explicit destruction of an object

This example shows:
my $obj_xpl = new occp4 ('xplor.exe');                    # 1
$obj_xpl->input_src("F");                                 # 2
$obj_xpl->input_file("file.inp");	                  # 3
$obj_xpl->logfile(">>file.log");
die "ERROR in xplor" if ($obj_xpl->run()); 
$obj_xpl=0 ;		                                  # 4
  1. Allocate memory for a new object:
  2. Select a file as the source of commands for this object: as xplor does not use the CCP4 library, keyword/value pairs are not a relevant way of controlling the program. We prefer to write the xplor commands to a file. However, the commands could also have been written to a string.
  3. The name of the file containing the xplor commands
  4. The $obj_xpl reference is set to 0, and because no other reference points to our object, it will be destroyed.

How to write loops

The following example shows:
my $old_obj=0 ;		                                          # 1
while (1) {		                                          # 2
	my $xyzin ;
	if ($old_obj) {
		$xyzin = $old_obj->iofiles("XYZOUT") ;            # output of the previous iteration
	} else {
		$xyzin = "some_initial_value" ;
	} ;			                                  # 3
	my $obj = new occp4("some_program",	                  # 4
			XYZIN=>$xyzin,
			XYZOUT=> "CHANA") ;
	# keywords, logfile, run, etc.
	if (some_condition()) {	                                  # placeholder for the real exit test
		last ;		                                  # 5
	} else {
		$old_obj = $obj ;	                          # 6
	} ;
} ;				                                  # 7

my $loop_obj = $old_obj;                                          # 8

  1. Allocate memory for a reference, no object is pointed to yet.
  2. start a block, specifying an infinite loop
  3. The input file is specified as follows:
  4. Allocate memory for a new object:
  5. If the condition is met, exit from the loop
  6. If the condition is not met, do not exit:
  7. This iteration is finished; we exit from the block and start a new iteration. However, the object that was pointed to by $obj is still alive, and is now pointed to by $old_obj.
  8. The loop is finished; $old_obj refers to the object produced by the penultimate iteration.

VI. Module availability

occp4.pm is available on the IPBS FTP server at ftp://ftp.ipbs.fr/pub/occp4. Also available at the same URL is refmac.pl, a Perl rewrite of a ksh script by Laurent Maveyraud (refmac.ksh) that performs refmac refinement with bulk solvent correction (in xplor or refmac).

