Changes to Database Handler in CCP4i
Peter Briggs, CCP4, 24th July 2002
Current Status:
- CCP4i creates a database for each project
- The "database" is a flat file in CCP4i parameter file format (.def)
- Stores project history information (taskname, date, status, title, plus references to input/output files and locations) for each job in the database
- Database handler - which reads/writes database.def files - is embedded in the main CCP4i process (figure 1)
- Handles requests to: register new job; get/set information for registered jobs (including updating status of job & input/output files); delete job records.
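The four kinds of request listed above can be sketched as a small in-memory job registry. This is a minimal illustration only: it is written in Python rather than the Tcl that CCP4i uses, it does not read or write the .def format, and the names `JobRecord`, `ProjectDatabase`, `register_job`, `delete_job` and the default status `"STARTING"` are hypothetical, not taken from CCP4i.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobRecord:
    # Fields mirror the items listed above; the real .def keys are not given in the source.
    taskname: str
    date: str
    status: str
    title: str
    input_files: List[str] = field(default_factory=list)
    output_files: List[str] = field(default_factory=list)


class ProjectDatabase:
    """Hypothetical in-memory stand-in for one project's database."""

    def __init__(self) -> None:
        self._jobs: Dict[int, JobRecord] = {}
        self._next_id = 1

    def register_job(self, taskname: str, date: str, title: str,
                     status: str = "STARTING") -> int:
        """Register a new job and return its id."""
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = JobRecord(taskname, date, status, title)
        return job_id

    def get(self, job_id: int, key: str):
        """Return one item of information for a registered job."""
        return getattr(self._jobs[job_id], key)

    def set(self, job_id: int, key: str, value) -> None:
        """Update one item (for example the status or the output files)."""
        record = self._jobs[job_id]
        if not hasattr(record, key):
            raise KeyError(key)
        setattr(record, key, value)

    def delete_job(self, job_id: int) -> None:
        """Delete the record for a job."""
        del self._jobs[job_id]
```

A caller would register a job, update its status as it runs, and append output files when it finishes, e.g. `db.set(job_id, "status", "FINISHED")`.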
Aims and Approach:
- To separate the database handler process from main CCP4i process (figure 2)
- Use a client-server model with sockets for inter-process communication (standard issue Tcl)
- CCP4i main process & its children (scripts) interact with database only via db handler
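The client-server approach above can be sketched as a standalone handler process that owns the job records and answers one-line text requests over a socket. This is an illustrative Python sketch, not the Tcl implementation; the command words (`NEWJOB`, `SET`, `GET`, `DELETE`), the `OK`/`ERR` replies, and the names `DbServer`, `DbHandler`, `start_server` and `request` are all assumptions made for the example.

```python
import socket
import socketserver
import threading


class DbHandler(socketserver.StreamRequestHandler):
    """Serves one client connection: one request line in, one reply line out."""

    def handle(self):
        for raw in self.rfile:
            words = raw.decode("utf-8").split()
            reply = self.server.dispatch(words)
            self.wfile.write((reply + "\n").encode("utf-8"))


class DbServer(socketserver.ThreadingTCPServer):
    """Hypothetical db handler process; job records live only in memory here."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, DbHandler)
        self.jobs = {}
        self.next_id = 1
        self.lock = threading.Lock()  # serialises access from concurrent clients

    def dispatch(self, words):
        with self.lock:
            try:
                command, args = words[0], words[1:]
                if command == "NEWJOB":
                    (taskname,) = args
                    job_id = self.next_id
                    self.next_id += 1
                    self.jobs[job_id] = {"taskname": taskname, "status": "STARTING"}
                    return f"OK {job_id}"
                if command == "SET":
                    job_id, key, value = args
                    self.jobs[int(job_id)][key] = value
                    return "OK"
                if command == "GET":
                    job_id, key = args
                    return "OK " + self.jobs[int(job_id)][key]
                if command == "DELETE":
                    (job_id,) = args
                    del self.jobs[int(job_id)]
                    return "OK"
                return "ERR unknown command"
            except (IndexError, KeyError, ValueError):
                return "ERR bad request"


def start_server():
    """Start the handler on a free local port in a background thread."""
    server = DbServer(("127.0.0.1", 0))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(address, line):
    """Client side: send one request line and return the reply line."""
    with socket.create_connection(address) as sock:
        sock.sendall((line + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()
```

Any process that can open a socket, whether the CCP4i main process, one of its scripts, or an external program, would use the same `request` call, which is the point of routing all access through the handler. Values in this sketch are limited to single words to keep the parsing trivial.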
Motivations:
- Allows processes other than CCP4i to communicate directly with the project history database e.g. Molecular Graphics or MOSFLM
- More robust than existing implementation (e.g. avoid problems with "unsaved" database information being lost before being "committed" to the database file from the main CCP4i process)
- Allows different database back-ends to replace database.def in future (e.g. MySQL) without significant changes to the rest of CCP4i
- Suitable for extension to a distributed computing environment
Issues:
- Do we start a new db handler process for each CCP4i process, or a new db handler for each database?
- Locking issues: how should multiple processes be allowed access (read-write, read-only) to the same database (queuing/lock-grab/lock-out)?
- Should subprocesses (i.e. scripts) initiated by the main CCP4i "controller" process communicate directly with the db handler, or via the controller? (Important if the controller is separated from the main CCP4i GUI)
- Will the scope of information stored in the database expand in future? (in which case we may need an API to the db handler which can accommodate a broader range of requests than at present)
- What are the security implications in a distributed computing environment?
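To make the locking question above concrete, one of the options mentioned ("lock-out") could look roughly like the following: a process that wants write access creates a lock file atomically, and any other would-be writer is refused until the lock is released. This Python sketch is only one possible policy, not a decision recorded in these notes; the `.lock` suffix and the function names are hypothetical.

```python
import os


def acquire_write_lock(db_path: str) -> bool:
    """Try to take the write lock for db_path; return False if it is already held."""
    lock_path = db_path + ".lock"
    try:
        # O_CREAT | O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner to help diagnose stale locks
    return True


def release_write_lock(db_path: str) -> None:
    """Release a write lock previously taken with acquire_write_lock."""
    os.remove(db_path + ".lock")
```

Under this policy, readers would not need the lock at all; a queuing or lock-grab policy would need extra state, such as a waiting list or a timestamp for detecting stale locks.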