Contents
Introduction
Basic Python
Advanced Features
CCP4 Molecular Graphics Documentation
Python Scripting
The picture definition file written out by CCP4mg uses a simple, limited set of Python features, which are described below in the Basic Python section. Some further features that might be required to create a more sophisticated script are described under Advanced Features. If you cannot find what you need in the documentation, or you need a little help with picture definition scripts, please contact the developers at stuart.mcnicholas@york.ac.uk.
Plenty of documentation and tutorials can be found at python.org.
Python Essential Reference by David M. Beazley is a good, concise introduction. It may be too concise for novice programmers who might prefer...
File editors such as emacs can work in Python mode to colour text according to its role in the Python syntax. This can be very helpful when editing a file.
For simple scripts all lines should begin at the left margin with no leading spaces or tabs. The error message:
Invalid syntax in line
may be due to wrong indentation. In more complex scripts Python uses left-hand indentation to indicate the grouping of lines for loops and conditional statements.
Everything on a line after a hash # is treated as a comment and ignored by the program. This does not apply to hashes within quotes.
Commands may be split over two or more lines by terminating the first line with a line-continuation character \. Beware: any spaces after the continuation character will result in a syntax error message. The line-continuation character is unnecessary if the break falls within anything enclosed in parentheses (...), brackets [...] or braces {...}. Note that in the documentation examples and in the picture definition files written by CCP4mg the items within parentheses or braces are often split one item per line without any line-continuation character.
A number specified without a decimal point such as 789 will be treated as an integer and a number such as 789.12 will be treated as a floating point number. In most contexts in the picture definition file it does not matter if integer and floating point numbers are interchanged, but try to use the appropriate form.
True and false are indicated by the integers 1 (true) and 0 (false).
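As a minimal sketch of the two ways of splitting a command, using plain Python values rather than CCP4mg objects (the names total and colours are chosen only for illustration):

```python
# Explicit continuation: the backslash must be the very last character on the line.
total = 1 + 2 + \
        3 + 4

# Implicit continuation: no backslash is needed inside (...), [...] or {...}.
colours = [
    'red',
    'green',
    'blue',
]
```

The second form is the one that appears in the files written by CCP4mg.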
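A short illustration of these number forms in plain Python. The names show_axes and show_legend are made up for this sketch and are not CCP4mg attributes:

```python
whole = 789            # no decimal point: an integer
real = 789.12          # decimal point: a floating point number
mixed = whole + real   # mixing the two gives a floating point number

show_axes = 1          # 1 stands for true
show_legend = 0        # 0 stands for false
```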
Text strings can be enclosed in single (') or double (") quotes. Triple quotes (''' or """) can be used to enclose text that includes newlines and single or double quotes. For example
label = '''This is O2' or O2* if you prefer'''
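The three quoting styles side by side, as a small sketch with made-up text:

```python
short = 'O2*'                      # single quotes
with_quote = "O2'"                 # double quotes allow a single quote inside
label = '''Line one mentions O2' and "O2*"
and line two follows a newline'''  # triple quotes allow both kinds of quote and newlines
```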
A list is enclosed in square brackets with comma-separated elements. The elements in a list can be a mixture of integers, floats and strings, for example:
mylist = [ 'hello', 789, 3.14 ]
Within the picture definition file the obvious use of a list is in specifying a file:
filename = ['demo', '2ins.pdb', '/home/lizp/demo_data/2ins.pdb']
The filename list has three string items. The order of the elements in a list is important. To access one particular element in a list use the syntax:
filename[1]
The index of the required element is given in square brackets immediately after the identity of the list. There is a big catch here: in many programming languages, including Python, the first item in a list has index 0 (zero). So the items in the filename list have indices 0, 1 and 2, and in the example above filename[1] has the value '2ins.pdb'.
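The indexing rule can be tried out on the filename list from above. The names project, short_name, full_path and n_items are only illustrative, and len is the standard Python function that returns the number of elements:

```python
filename = ['demo', '2ins.pdb', '/home/lizp/demo_data/2ins.pdb']

project = filename[0]      # first element has index 0
short_name = filename[1]   # second element has index 1
full_path = filename[2]    # third (last) element has index 2
n_items = len(filename)    # number of elements in the list
```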
A dictionary is a group of items like a list but in a dictionary each item has a name (called a key) and the order of items is unimportant. For example, a dictionary is used to define a font:
font = { 'weight' : 'bold', 'slant' : 'i', 'family' : 'utopia', 'size' : '14' }
The dictionary definition is enclosed in braces. Within the picture definition file the keys are always one-word strings which are enclosed in quotes (e.g. 'family' or 'weight'). The value of each item can be an integer, a float, a string or a list, and these can be mixed within any one dictionary (note that the values in the font example all happen to be strings). Note that a colon (:) and not an equals sign (=) separates the key and the value.
In CCP4mg documentation we usually talk about 'objects' and 'types of object' such as MolData or HBonds. In Python terminology the 'types of object' are called 'classes' and the individual objects of a given type are termed 'instances of a class'. For now I will continue to use the CCP4mg terminology of 'object' and 'type of object'. An example of the syntax to create an object in the picture definition file:
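A small sketch of looking up and changing items in the font dictionary by key. The new size value '18' is arbitrary:

```python
font = {'weight': 'bold', 'slant': 'i', 'family': 'utopia', 'size': '14'}

family = font['family']   # look up a value by its key
font['size'] = '18'       # replace the value stored under an existing key

# The order in which the items are written does not matter.
same = font == {'size': '18', 'family': 'utopia', 'slant': 'i', 'weight': 'bold'}
```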
Legend (
   text = 'This is a legend <br>',
   y = 0.29,
   x = 0.3,
   font = { 'weight' : 'bold',
            'slant' : 'i',
            'family' : 'utopia',
            'size' : '14' },
   text_colour = 'complement' )
The new object will be a Legend object and the text, x, y, font and text_colour attributes are defined for the object. Note that the attribute names are not in quotes and the attribute name is separated from its value by an equals sign. This syntax is different from that of dictionaries. One of the attributes for this object is font which is a dictionary which follows the rules for dictionary syntax.
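To see the two syntaxes side by side without needing CCP4mg, the sketch below uses Python's built-in dict function as a stand-in for Legend. This is only an illustration of the notation and says nothing about how Legend itself is implemented:

```python
# Attribute (keyword) syntax: names are not quoted and '=' separates name and value.
from_keywords = dict(weight='bold', slant='i')

# Dictionary syntax: keys are quoted strings and ':' separates key and value.
from_braces = {'weight': 'bold', 'slant': 'i'}
```

Both lines build the same dictionary, which shows that the difference is purely one of notation.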
The identity of an object can be thought of as a pointer to the object's place in the computer memory. In the simple scripts we have looked at so far, when an object is created its identity is not saved. In more complex scripts we may need to ask the object questions about itself or give it commands, and to do this we need to save its identity when it is created. For example you may want to find out what chains and monomers are in the data associated with a model data object:
model_id = MolData ( filename = [ 'demo', '2ins.pdb', '/home/lizp/demo/2ins.pdb'] )
chain_list = model_id.get(mode='chains')
monomers_list = model_id.get(mode='monomers')
In this example model_id is the name (or identity) of the MolData object, i.e. its place in memory. This name can be used with the appropriate methods for that type of object to get information or to do something to the object. In the example the name 'model_id' is used with the get method twice: first to get the list of chains in the model and then to get a list of monomers. Note that the syntax is the object name and the method ('get') separated by a dot, followed by the arguments in parentheses. The syntax for the arguments is the same as for specifying the object attributes (see Objects above), i.e. the name of the argument (not in quotes), an equals sign, and the value for each argument. The 'get' method has one argument called 'mode', whose value determines what information is returned. The returned information is saved under the names 'chain_list' and 'monomers_list'. The names model_id, chain_list and monomers_list are arbitrary; you can use any name that you like for the object or the information, subject to a few limitations given in the next section.
There are a few limitations on the identifiers (i.e. names of objects or data etc.). Identifiers must be one word made up only of alphanumeric characters and underscores. The first character cannot be a number.
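The same object.method(argument = value) pattern can be tried on an ordinary Python list, which has methods of its own. The list contents below are illustrative and are not read from a PDB file:

```python
chain_names = ['B', 'A', 'C']

# Object name, dot, method name, then a named argument in parentheses.
chain_names.sort(reverse=True)

# A method can also return information, which is saved under a name of your choice.
n_a = chain_names.count('A')
```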
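The rules can be written down as a pattern and tested against a few candidate names. This is a sketch: the function looks_like_identifier and the sample names are introduced here only for illustration.

```python
import re

# One word of letters, digits or underscores, not starting with a digit.
IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')

def looks_like_identifier(name):
    """Return True if name follows the rules described above."""
    return IDENTIFIER.match(name) is not None

candidates = ['model_id', 'chain_list2', '2ins', 'my model']
checks = [looks_like_identifier(name) for name in candidates]
```

Python's own reserved words, such as for and if, pass this check but still cannot be used as names.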
Sometimes rather than repeat identical blocks of script multiple times in a file it is better to have one block of script and to 'loop' through it multiple times. For example to load three different pdb files but to write the appropriate commands only once:
pdbname_list = ['1xxx.pdb', '2xxx.pdb', '3xxx.pdb']
for pdbname in pdbname_list:
   MolData( filename = [ 'MYPROJECT' , pdbname, '' ] )
First in this script a list called 'pdbname_list' is defined; it is a list of three pdb files - note that these are text strings so they are in quotes. Then the statement 'for pdbname in pdbname_list:' says to do the subsequent script for each 'pdbname' in the list 'pdbname_list'. Note that the statement ends with a colon and the subsequent line(s) that will be repeated each time round the loop are indented (usually by 3 characters). In this example there is only one line inside the loop which creates a MolData object with a filename whose project directory is 'MYPROJECT' and whose file name is whichever pdbname is set for the time around the loop. Note that the pdbname used in specifying the filename is not in quotes!
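The loop can be tried without CCP4mg by collecting the filename lists instead of creating MolData objects. In this sketch the list filenames and the use of the standard append method are additions for illustration:

```python
pdbname_list = ['1xxx.pdb', '2xxx.pdb', '3xxx.pdb']
filenames = []

for pdbname in pdbname_list:
    # pdbname is a name that holds the current string, so it is not in quotes.
    filenames.append(['MYPROJECT', pdbname, ''])
```

After the loop, filenames holds one three-item list per PDB file, in the same order as pdbname_list.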
Sometimes what you want to do is dependent on some factor; for example when loading a PDB file the required display objects may depend on whether there are any ligand monomers, as in the following script. Note the use of hashes to start 'comment lines'.
# A short list of PDB files that can be found in the
# ccp4mg/tutorial/data directory
pdbname_list = ['1df7.pdb', 'rnase.pdb']

for pdbname in pdbname_list:
   model_id = MolData(filename = [ 'ccp4mg_tutorial' , pdbname, '' ] )

   # Create one display object to show the protein ribbons
   MolDisp ( colour = 'bychain',
             style = 'SPLINE',
             selection_parameters = { 'select' : 'amino_acid' } )

   # If there are any monomers display them as ballnstick
   monomers_list = model_id.get(mode='monomers')
   if len(monomers_list)>0:
      MolDisp ( colour = 'atomtype',
                style = 'BALLSTICK',
                selection_parameters = { 'select' : 'nopeptide',
                                         'monomers' : monomers_list } )
In this script a short list of pdb files is defined as pdbname_list and then we loop over every pdbname in that list. For each pdbname a MolData object is created, along with a MolDisp object which shows the 'amino_acid' selection as ribbons. We then query whether there are any monomer ligands using the get method. Then comes the conditional part of the script. We use the function len (short for length) to find out how long the monomers_list is. If the length of monomers_list is greater than zero then we create another MolDisp object to show the monomers.
Another trivial use of conditional statements, which could be tagged on the end of the previous script, is to write out the number of monomers found:
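The combination of a loop and a length test can be tried on plain data. In this sketch the dictionary monomers_by_file is hypothetical and stands in for the lists that model_id.get(mode='monomers') would return; the file and ligand names are invented:

```python
# Hypothetical results, one monomer list per file.
monomers_by_file = {'a.pdb': ['LIG', 'SO4'], 'b.pdb': []}

files_with_ligands = []
for pdbname in ['a.pdb', 'b.pdb']:
    monomers_list = monomers_by_file[pdbname]
    # Only act when the list is not empty.
    if len(monomers_list) > 0:
        files_with_ligands.append(pdbname)
```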
if len(monomers_list)< 1:
   print pdbname,"has no ligands"
elif len(monomers_list)== 1:
   print pdbname,"has one ligand:",monomers_list[0]
elif len(monomers_list)== 2:
   print pdbname,"has two ligands:",monomers_list[0],'and',monomers_list[1]
elif len(monomers_list)> 2:
   print pdbname,"has lots of ligands",monomers_list
In this example the first test is whether the length of the monomer list is less than 1; then elif (short for else if) is used to make further tests (is the length of the monomer list equal to 1 or 2, or is it more than 2?). Note that an elif test will only be applied if the preceding tests have failed. If a test is successful then the appropriate lines of script after the test are executed. In this example the print command is used to write out the name of the pdb file (pdbname) and information on the number of ligands.
The print statement can be used to write information which will appear ??????. The print statement is followed by a comma-separated list of objects which can be of any type (integer, float, string, list or dictionary).
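A variant of the same chain of tests is sketched below. It builds the message as a string rather than printing it, and it uses else, which catches every case not matched by the tests before it. The values of pdbname and monomers_list are made up:

```python
pdbname = 'rnase.pdb'
monomers_list = ['SO4']   # made-up value for illustration

count = len(monomers_list)
if count < 1:
    message = pdbname + " has no ligands"
elif count == 1:
    message = pdbname + " has one ligand: " + monomers_list[0]
elif count == 2:
    message = pdbname + " has two ligands: " + monomers_list[0] + " and " + monomers_list[1]
else:
    # Reached only when every test above has failed, i.e. count is greater than 2.
    message = pdbname + " has lots of ligands"
```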
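If you prefer to control the layout of the output yourself, one option is to build a single string with the % formatting operator and print that. This sketch uses made-up values; printing a single string in parentheses behaves the same whether your Python treats print as a statement or as a function:

```python
pdbname = 'rnase.pdb'
monomers_list = ['SO4']   # made-up value for illustration
count = len(monomers_list)

# %s inserts any object as text, %d inserts an integer.
line = "%s has %d ligand(s): %s" % (pdbname, count, monomers_list)
print(line)
```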