Automation standards and frameworks: from data reduction to structure
This document aims to outline the specific questions that each of the workshop
sessions will try to answer. See
http://www.ebi.ac.uk/msd-srv/docs/bioxhit05_1.html
for information on the workshop.
1. Standards for Frameworks for Automation
This session will focus on technologies rather than on the science - we are
interested in the similarities and differences between developments.
Ideally each talk will give a description of the pipeline
(e.g. is it a script, a set of daemons, or something else?), including the
language(s) it is implemented in, and then focus on answering the
following questions:
- Inter-component communications:
how do components within the pipeline communicate with each other?
- Abstraction of actions: how are tasks expressed in a generic fashion?
That is, how are tasks described in a way which separates the specific
program (e.g. SCALA) from the generic task (e.g. scaling)?
(A minimal sketch of this idea follows this list.)
- Action prerequisites: how do
components get the right data to run? Where is the data stored? How are
decisions made? How is the decision-making expertise stored and accessed?
- Action results: how are
results reported to the "end user"? Do the results get stored anywhere?
If so, where, how and for how long?
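As a concrete illustration of the "abstraction of actions" question above, the
following is a minimal Python sketch. It is not taken from any of the pipelines
under discussion: all class and file names are hypothetical, and SCALA is named
only because it appears in the example above. The idea is that pipeline
components depend on a generic task interface, while program-specific details
live in one concrete implementation.

    from abc import ABC, abstractmethod


    class ScalingTask(ABC):
        """Generic 'scaling' action, defined by its inputs and outputs
        rather than by the program used to perform it."""

        @abstractmethod
        def run(self, unmerged_reflections: str) -> dict:
            """Scale the input reflection file and return a result record."""


    class ScalaScalingTask(ScalingTask):
        """One concrete implementation of the generic task, wrapping a
        specific program (SCALA is named only for illustration)."""

        def run(self, unmerged_reflections: str) -> dict:
            # A real wrapper would build the command line, run the program
            # and parse its log files; a placeholder record is returned here
            # so that the sketch runs as-is.
            return {
                "task": "scaling",
                "program": "scala",
                "input": unmerged_reflections,
                "status": "success",
            }


    if __name__ == "__main__":
        task: ScalingTask = ScalaScalingTask()
        print(task.run("unmerged.mtz"))

Under this arrangement a different scaling program could be substituted by
providing another implementation of the same interface, without changing the
components that call it.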
The aim of this session is to make recommendations on how future
developments should answer these questions.
2. Standards for data exchange between computational units in the structure determination software pipeline
This session will focus on defining a practical hierarchy of
functional blocks within the software
pipeline. A "functional block" could be a single program or a section of
pipeline. It could itself be made up of other smaller functional blocks (e.g.
functions within programs or smaller sections of pipeline).
The point of doing this is to define the interfaces between
functional blocks, as it is across these interfaces that data will need to be
transferred. For each functional block we need to consider what information is
required for:
- Decision making and feedback - that is, what do you need to know to go
into a block, and what do you learn from doing it?
- Data transfer, including metadata (e.g. the expected number of heavy
atoms) - that is, things which are known or required throughout the
process or parts of it, e.g. sequence, spacegroup, cell parameters
(a possible record of this kind is sketched after this list)
- Archiving and deposition - in particular, what is missing from what we
already store? Are there issues to do with storing data which can easily
be regenerated? Can we anticipate requirements for future applications
(e.g. data mining)?
- Definitions of "success" and "failure" - what do these mean?
- Transfer of ambiguous data - we need to consider, for example, that
spacegroup names can have many different representations (also
illustrated in the sketch after this list)
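To make the metadata and ambiguity points above concrete, here is a small
Python sketch. The record fields, the alias table and all names are
illustrative assumptions, not part of any existing data model or of the
BIOXHIT standard.

    from dataclasses import dataclass
    from typing import Tuple


    @dataclass
    class ExperimentMetadata:
        """Items known or required throughout the process or parts of it."""
        sequence: str                     # one-letter amino-acid codes
        spacegroup: str                   # canonical Hermann-Mauguin symbol
        cell: Tuple[float, float, float,  # a, b, c (angstroms)
                    float, float, float]  # alpha, beta, gamma (degrees)
        expected_heavy_atoms: int = 0


    def normalise_spacegroup(name: str) -> str:
        """Illustrative only: map a few spellings of the same spacegroup
        onto a single canonical representation."""
        aliases = {
            "P212121": "P 21 21 21",
            "P 21 21 21": "P 21 21 21",
            "19": "P 21 21 21",  # International Tables spacegroup number
        }
        return aliases.get(name.strip(), name.strip())


    if __name__ == "__main__":
        meta = ExperimentMetadata(
            sequence="MKTAYIAKQR",  # an arbitrary example sequence
            spacegroup=normalise_spacegroup("P212121"),
            cell=(50.0, 60.0, 70.0, 90.0, 90.0, 90.0),
            expected_heavy_atoms=4,
        )
        print(meta)

The point of the normalisation helper is simply that, whichever representation
a component emits, the data crossing an interface should be in one agreed
canonical form.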
The session should also consider how much of this is covered
by existing standards.
The aim of this session is to provide a set of requirements which
can be used as the basis for or input to the BIOXHIT Data Exchange Standards
(task 5.1.2).
3. Toolboxes for Automation
This session will look at existing toolbox developments.
Each talk should describe the toolbox and should include:
- What problems did you set out to solve with the toolboxes?
- What did you learn from writing them?
- What would you do differently and what's missing?
The discussion should then focus on what functionality the
pipelines covered on day 1 need, and where they obtained this functionality
from - for example, using an existing toolbox, using existing programs from a
general software suite, or writing their own functions or "jiffy" utilities.
This would be a change to the current programme.
- What functionality is required to make your pipelines work?
- Where is this functionality already provided? Which functions have you
had to provide yourself?
- What languages and approaches are you using?
The aim of this session is to summarise the answers to these
questions to feed into the report on a software toolbox for automation (BIOXHIT
task 4.7.1).
Outcomes of this meeting
Report from each session of the meeting:
- From day 1: Report on the existing
pipeline developments, comparison of technologies and recommendations for
pipeline frameworks
- From day 2: Report on the
"functional blocks" and the interfaces as a set of requirements to be fed
into the BIOXHIT data model for data transfer
- From day 3: Report on the existing
toolboxes to be fed into the BIOXHIT report on toolbox requirements
We should also agree on follow-up meetings and other actions
as part of this meeting. It is suggested that we add a short wrap-up session at
the end of the third day to set deadlines for the next steps.