Author: Peter Briggs
Revision: 0.3 Date: 31/01/2003
Provide an API to the CCP4i project history database which can be accessed by: the main CCP4i process; independent processes started by the main CCP4i process (``jobs''); external (non-CCP4i) applications.
In the first instance this must reproduce the existing functionality within CCP4i v1.3.8 for interacting with the database. It must be able to interact with the existing format of the database (currently a flat format CCP4i-parameter or "def" file). However it should also be extensible, that is, be able to accommodate a wider scope of commands and to be easily extended to use different database backends.
Security and authentication are issues which will need to be addressed if the server is accessible over a network. These issues have not yet been explored.
Requirements for CCP4i
The main CCP4i process needs both read and write access to the database, as it needs to be able to register new jobs, set/edit associated information, and delete jobs from the database, as well as request information on the current state of the database. It must also be able to handle ``update'' messages from the database handler when the status of the database changes.
Requirements for Running Scripts (``jobs'')
Running jobs need to be registered with the handler as part of their startup, but after this point only require limited write access (to add output files, register script termination) and no read access to the database.
Requirements for Non-CCP4i Applications
Applications may be written in any language, so the protocol for exchanging requests/information between the handler and the applications needs to be as generic as possible.
Requirements for accessing other Databases
These are not currently known.
The database for a given project is stored on disk as a flat file in CCP4i parameter file format (".def"). The data in the file is read into an array in the main CCP4i process when the project is first opened by the user. Subsequent queries or actions on the database are made via the array held in memory, which is periodically written back out to the file.
Functions for interacting with the database information are tightly embedded in the code for the main CCP4i process. In some cases general database functions are mixed together with CCP4i-specific code, thus the full range of functionality is not easily accessible by non-CCP4i applications.
The current implementation does not meet all the requirements; for example, it does not provide an API which can be used easily by other applications.
The current prototype solution uses a separate database handler process. Application programs interact with the database via socket requests made to the database handler process. This addresses the issues of networked operation, and removes the requirement for multiple language-specific APIs.
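As a rough illustration of the socket-based approach described above, the following Python sketch shows a handler process answering one text request per connection, and a client function that sends such a request. It is not the DbCCP4i protocol: the newline-terminated text messages, the PING request, and the names HandlerSketch, start_handler and send_request are all assumptions made for this example.

```python
import socket
import socketserver
import threading


class HandlerSketch(socketserver.StreamRequestHandler):
    """Answers one newline-terminated text request per connection.

    The request vocabulary here (only PING) is hypothetical.
    """

    def handle(self):
        request = self.rfile.readline().decode("utf-8").strip()
        if request == "PING":
            reply = "OK"
        else:
            reply = "ERROR unknown request"
        self.wfile.write((reply + "\n").encode("utf-8"))


def start_handler(host="127.0.0.1", port=0):
    """Start the handler in a background thread; port 0 lets the OS choose."""
    server = socketserver.ThreadingTCPServer((host, port), HandlerSketch)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_request(address, text):
    """Send one request line to the handler and return its one-line reply."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall((text + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()
```

Because the exchange is plain text over a socket, a client in any language with socket support could issue the same request, which is the property the design relies on to avoid language-specific APIs.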
Other possible solutions (currently not under consideration):
The provisional name for the database handler process is DbCCP4i.
The current model of DbCCP4i consists of three basic sets of components:
A DbCCP4i process can be started either by the main CCP4i process, or separately e.g. via a user command. Each CCP4 project history database should only be accessed by a single DbCCP4i at any one time (this could be controlled via lock files), though a single DbCCP4i could access more than one project database - for example, when a user is browsing the project history data in one project but also has running jobs which are registered in a second project. To conserve system resources each user should only have one DbCCP4i running at any time.
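One possible way to enforce the single-handler rule with lock files is sketched below in Python. It relies on exclusive file creation, so that only one process can succeed in creating the lock. The function names, the lock file location and its contents are assumptions for illustration; the document does not specify any of them.

```python
import os


def acquire_lock(lock_path, owner_info):
    """Try to create the lock file atomically.

    Returns True if this process now holds the lock, or False if the
    lock file already exists (another handler presumably owns the database).
    """
    try:
        # O_EXCL makes creation fail if the file is already present.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(owner_info)
    return True


def release_lock(lock_path):
    """Remove the lock file when the handler stops using the database."""
    os.remove(lock_path)
```

A real implementation would also need a policy for stale lock files left behind by a handler that exited abnormally; this sketch does not address that.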
Ultimately it should be possible to allow several users to access the same database simultaneously via a single DbCCP4i.
Processes which wish to access the database must first register themselves with the DbCCP4i - in a secure environment this should include some authentication procedure. Registered processes may have different interaction requirements, for example: the main CCP4i process needs to be able to read and write to the database, but it also needs to know when the database content changes (so it can update its display); whereas running jobs only need to send information to the database (new output files, job finishes or fails) and do not need to be informed of updates. There will need to be an "update" mode whereby the DbCCP4i broadcasts notice of an updated database to all connected processes which have registered an interest in knowing when certain types of database content have changed.
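The registration and "update" broadcast described above might be modelled along the following lines. This Python sketch is only an illustration: the class RegistrySketch, the change-type strings and the callback-based notification are assumptions, and authentication is left out entirely.

```python
class RegistrySketch:
    """Tracks registered processes and the change types each cares about."""

    def __init__(self):
        # name -> (set of change types of interest, notification callback)
        self._clients = {}

    def register(self, name, interests=(), notify=None):
        """Register a process; jobs may pass no interests and no callback."""
        self._clients[name] = (frozenset(interests), notify)

    def unregister(self, name):
        """Forget a process, e.g. when it shuts down."""
        self._clients.pop(name, None)

    def broadcast(self, change_type, detail):
        """Notify every process interested in this type of change.

        Returns the names of the processes that were notified.
        """
        notified = []
        for name, (interests, notify) in self._clients.items():
            if notify is not None and change_type in interests:
                notify(change_type, detail)
                notified.append(name)
        return notified
```

For example, the main CCP4i process could register with an interest in a hypothetical "job_status" change type, while a running job registers with no interests and is therefore never sent updates.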
There needs to be a mechanism for new processes to detect and connect to an existing DbCCP4i when trying to access a project database - possibly through information stored in the lock file.
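If the lock file were used to advertise a running DbCCP4i, the discovery step could look something like the Python sketch below. The JSON layout with "host" and "port" fields, and both function names, are assumptions for this example rather than anything defined by the design.

```python
import json


def write_handler_address(lock_path, host, port):
    """Record where the running handler is listening (hypothetical format)."""
    with open(lock_path, "w", encoding="utf-8") as lock_file:
        json.dump({"host": host, "port": port}, lock_file)


def read_handler_address(lock_path):
    """Return (host, port) from the lock file, or None if it is unusable."""
    try:
        with open(lock_path, encoding="utf-8") as lock_file:
            info = json.load(lock_file)
        return info["host"], int(info["port"])
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        # Missing file, malformed JSON, or unexpected content.
        return None
```

A new process would call read_handler_address first and start its own handler only if no address is found. A stored address does not prove that the handler is still alive, so a connection attempt would still be needed to confirm it.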
DbCCP4i processes should have a number of different persistence modes. Initially the DbCCP4i will persist as long as it still has registered processes (processes should unregister themselves on shutdown). It should also be possible to leave the DbCCP4i running indefinitely and have processes connect and disconnect as required. There also needs to be a way to cleanly shut down a DbCCP4i e.g. via a user command.
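The persistence modes above can be expressed as a small decision rule, sketched here in Python. The names Persistence and LifetimeSketch are hypothetical. The sketch also assumes that a handler in the first mode waits for at least one registration before it is allowed to exit, which the text does not state.

```python
from enum import Enum


class Persistence(Enum):
    WHILE_CLIENTS = "while_clients"  # exit once the last process unregisters
    INDEFINITE = "indefinite"        # keep running until told to stop


class LifetimeSketch:
    """Decides when a handler should exit, given its persistence mode."""

    def __init__(self, mode=Persistence.WHILE_CLIENTS):
        self.mode = mode
        self.clients = set()
        self.shutdown_requested = False
        self._had_client = False  # assumption: do not exit before first use

    def register(self, name):
        self.clients.add(name)
        self._had_client = True

    def unregister(self, name):
        self.clients.discard(name)

    def request_shutdown(self):
        """Explicit shutdown, e.g. triggered by a user command."""
        self.shutdown_requested = True

    def should_exit(self):
        if self.shutdown_requested:
            return True
        if self.mode is Persistence.INDEFINITE:
            return False
        return self._had_client and not self.clients
```

The handler's main loop would check should_exit after each registration change or shutdown request, and write out any pending database changes before stopping.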
A prototype version of DbCCP4i now exists which has been built on top of CCP4i 1.3.9.