BIOXHIT/CCP4: Ms 5.2.4 Extending the tracking database
This document is a place to put ideas about extending the
CCP4i tracking database (i.e. database.def) content and functions.
Additional data items
- Project title: a short (e.g. one line) user-specified
description of the project
- Project description: a longer description (like an abstract?)
that could be added by a user. Both the project title and
description could be useful if the user has a lot of projects
or is able to share them with others.
- User-agent: (or "driver application") This would be the
name of the program that acts for the user when making changes to
the database - for example, for jobs run in CCP4i the user agent
would be "ccp4i", for jobs run by XIA2 it would be "xia2" and so
on.
Some kind of versioning information could also be included.
- Subjobs or substeps: these would be smaller jobs
within a single larger job. It is likely that for example a
single run of an automated pipeline would be a "job" but that
the automated process would explicitly divide this run into
smaller jobs.
- History data: currently no history data is stored in the
tracking db; instead it is derived implicitly from filenames.
We could store explicit history links between jobs. There
would be two types of link:
- Data flow: that is, when job X uses data produced
from job Y, and
- Logical flow: that is, when job X follows on
from job Y due to some kind of "application logic".
Application logic means the logic encoded in a script or
other system, which determines that one step follows another
even if there is no apparent data flow - but it could
equally apply to "procedural logic", when a human user
follows a procedure in which the steps are linked by some
logical scheme.
We might also wish to store inferred links (which are what
the ccp4i.history class generates at the moment) and broken
links (links that have been explicitly removed by the user).
- Application control file: currently the name and
location of the CCP4i parameter file is not explicitly stored.
The name of the file is generated as
jobid_taskname.def (e.g. 123_scala.def)
and is stored in the CCP4_DATABASE subdirectory of the project.
To extend the tracking to other applications means explicitly
storing this data, or at least its location.
- Logfile: similar to application control files, in CCP4i
the logfile location isn't explicitly stored at present. The name
of the logfile is generated as
jobid_taskname.log
(e.g. 123_scala.log) and is stored in the project directory.
- Notebook: similar to the application control and log files
above. The notebook data is stored in a file with the name
generated as jobid_notebook.txt (e.g.
123_notebook.txt).
- Tags: this is just an idea. Tags would allow users (or
the system?) to associate one or more arbitrary keywords with
particular jobs. These could be used for selection purposes
by other functions. Needs some thought.
- Operation type: another idea. This would allow the
application to specify the type of operation, and allow program
or script runs to be distinguished from reported jobs or
editing operations. Needs some thought.
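As a rough illustration of how some of the items above might fit together, the following Python sketch models a job record with explicit file locations, a user agent, tags and typed history links. All class, field and function names here (LinkType, HistoryLink, JobRecord, ccp4i_default_files, predecessors) are hypothetical and are not part of CCP4i; the only details taken from the notes above are the file naming conventions and the four kinds of link. The directory holding the notebook file is not stated above, so only its name is generated.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional, Set, Tuple


class LinkType(Enum):
    """Kinds of history link discussed above (the names are assumptions)."""
    DATA_FLOW = "data"        # job X uses data produced by job Y
    LOGICAL_FLOW = "logical"  # job X follows job Y through application logic
    INFERRED = "inferred"     # guessed from filenames rather than recorded
    BROKEN = "broken"         # explicitly removed by the user


@dataclass(frozen=True)
class HistoryLink:
    from_job: int        # the earlier job (Y)
    to_job: int          # the later job (X)
    link_type: LinkType


@dataclass
class JobRecord:
    """Hypothetical job entry holding the proposed extra data items."""
    job_id: int
    task_name: str
    user_agent: str = "ccp4i"            # e.g. "ccp4i" or "xia2"
    control_file: Optional[str] = None   # application control file location
    log_file: Optional[str] = None
    notebook_file: Optional[str] = None
    tags: Set[str] = field(default_factory=set)


def ccp4i_default_files(project_dir: str, job_id: int, task_name: str) -> Dict[str, str]:
    """Build the file names that CCP4i currently generates implicitly."""
    project = PurePosixPath(project_dir)
    return {
        # jobid_taskname.def lives in the CCP4_DATABASE subdirectory
        "control_file": str(project / "CCP4_DATABASE" / f"{job_id}_{task_name}.def"),
        # jobid_taskname.log lives in the project directory
        "log_file": str(project / f"{job_id}_{task_name}.log"),
        # name only: the notebook's directory is not specified in these notes
        "notebook_file": f"{job_id}_notebook.txt",
    }


def predecessors(
    links: Iterable[HistoryLink],
    job_id: int,
    include: Tuple[LinkType, ...] = (LinkType.DATA_FLOW, LinkType.LOGICAL_FLOW),
) -> List[int]:
    """Return the jobs that job_id follows on from, skipping broken links."""
    links = list(links)
    # A BROKEN entry cancels any other link between the same pair of jobs.
    broken = {(l.from_job, l.to_job) for l in links if l.link_type is LinkType.BROKEN}
    return sorted(
        {
            l.from_job
            for l in links
            if l.to_job == job_id
            and l.link_type in include
            and (l.from_job, l.to_job) not in broken
        }
    )
```

An application other than CCP4i would fill in control_file and log_file itself rather than relying on ccp4i_default_files, which is the reason for storing the locations explicitly. Treating a broken link as a separate entry that cancels other links between the same two jobs is only one possible design; it has the advantage that an inferred link the user has removed is not silently regenerated.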
Additional functionality
- Manipulating projects:
- Allow jobs to be moved between projects
- Allow import/export of projects
- Allow projects to be split or merged
- Allow branched projects to be synchronised part
Gathering Feedback
Possible ways to get feedback:
- Talk to Graeme Winter about using the system in XIA2
- Talk to Paul Emsley and Liz Potterton about using the system
in Coot and CCP4mg