From henrick@ebi.ac.uk Thu Jul 7 15:53:34 2005
Date: Thu, 7 Jul 2005 15:33:12 +0100 (BST)
From: Kim Henrick
To: P.J.Briggs
Cc: Peter Briggs, Martyn Winn, Keith Wilson
Subject: Re: quick question on data models (fwd)

Peter

We ship data to the RCSB via mmCIF. This is from a combination of
dictionaries: the official mmCIF, the PDBx (the second contains the
molecular replacement, ccp4 and other additions such as powder,
solution scattering, etc.) and the NMR mmCIF. These are mapped out of
the relational database. We also both completely ignore certain mmCIF
items as irrelevant and never populated. The chemistry is handled in
exchange mmCIFs as ERFs, again all mapped in and out of the databases
with tools to do the mapping, as in

http://www.ebi.ac.uk/msd/documentation/doc2.html

oracif works both ways: we translate the mmCIF into a relational form.

For ccp4, pims and ehtpx we would rather people use XML, but we also
map the PDB format as output by shelx and refmac.

We do not want to see ccp4 use the mmCIF --> XML as in the official
PDBML, as it is an exact copy of the mmCIF in XML and has the same lack
of correct parent-child relationships, although if people decide to use
this XML we will cope.

ccp4/automation should use a different history database, but if you
want something like xtrack then, as that is a final product, the MSD is
the final product.

ehtpx/ispyb have schemas that match ccp4 harvest up to
mosflm/scala/truncate but no more (and enough history to get started
with). ANT and phenix have the history part to some degree.

open mm is an SQL/Java package for mmCIF and is an RCSB product, but
from the Californian RCSB, not the New Jersey RCSB. Each uses a
different database: the New Jersey site uses a C++ version of mmCIF to
load into a different relational database.

As our database schema is derived from Oracle tools/Designer2000, this
is not really UML for pretty pictures; it is really SQL, PL/SQL and
f77 + embedded SQL, C with embedded SQL and some C with the Oracle OCI
low-level API. Newer stuff is all Java for
the warehouse API.

The archive is normalised; the warehouse is simple and Java.

If ccp4 is in control of input through ccp4i and ccp4 i/o, then you can
use the warehouse, which will be much easier and faster, as the data is
in effect 'clean' and consistent: ccp4 programs will always do the i/o
and refmac will know what an ALA is.

If you want to deal in i/o from non-ccp4 programs like CNS/XPLOR/SHELX,
then you will not be in control of what an ATOM is, nor what, for
example, a FOM is in MIR or MOLREP etc., and you will need the
normalised archive to make data consistent.

Actually, using an individual, user-based, localised set of XML with
Java and doing away with a relational database is better for user-level
storage. For site/institute/group-level storage the SQL is better. The
XML/SQL can be made interchangeable.

For ccp4/ccp4i etc., picking a user storage area is much quicker and
easier to do. Add on a totally separate tool to pool data from many
users into a database only when you need to mine the data. Individual
histories through solution to deposition would be easier to manage at
the user level, and this is obviously an XML level.

Both the XML and the SQL can be identical in structure and meaning. You
have to decide this early, along with what you actually want and expect
users to cope with.

Regardless, the MSD database schemas are available and have everything
but the history.

Will the history part ever be used on a wide enough scale to be worth
the extra effort ????????

kim

On Thu, 7 Jul 2005, P.J.Briggs wrote:

>
> Hi Kim
>
> Martyn wanted to clarify something that you wrote to Colin Nave and that
> was copied to us.
> From your message to Colin:
>
> > I have asked why they are doing this when they know
> > full well the MSD has made the complete Xray experiment
> > model and is continuing to work with ccp4 and other groups
> > on the data models for Xray and our data model is freely
> > available and already in part adopted by ccpn for the chemistry
>
> Martyn's questions are:
>
> On Thu, 7 Jul 2005, Martyn Winn wrote:
>
> > Kim's response to Anne's data model is along the
> > lines of EBI already have this. I think he means
> > the EBI back-end database (i.e. SQL data model?)
> > that they use in preference to mmCIF.
>
> > Do EBI talk to RCSB in terms of SQL or mmCIF?
>
> It seems that you are the best person to clarify what you meant by your
> original statement, and what the EBI are using to talk to the RCSB.
>
> Thanks, best wishes
>
> Pete
>
> --
> _____________________________________________________
> Peter J Briggs, pjx@ccp4.ac.uk   Tel: +44 1925 603826
> CCP4, ccp4@ccp4.ac.uk            Fax: +44 1925 603825
> http://www.ccp4.ac.uk/
> Daresbury Laboratory, Daresbury, Warrington WA4 4AD
>

Kim HENRICK  henrick@ebi.ac.uk  ::telephone: +44 (0) 1223 494629
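
The point in Kim's reply that the XML and the SQL can be identical in
structure and meaning, with a separate tool pooling per-user data into a
database only when it needs to be mined, can be illustrated with a
minimal sketch. Everything named here is an assumption made for
illustration: the element names, column names and sample values are
hypothetical and are not taken from the MSD schema or any ccp4 format,
and Python's standard xml.etree.ElementTree and sqlite3 modules stand in
for the Java and Oracle tooling mentioned in the message.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical per-user history file: one <job> element per program run.
USER_HISTORY_XML = """
<history user="user1">
  <job id="1" program="mosflm" status="finished"/>
  <job id="2" program="scala" status="finished"/>
  <job id="3" program="refmac" status="failed"/>
</history>
"""

# The table mirrors the XML one-to-one: each attribute becomes a column.
# Table and column names are illustrative assumptions only.
SCHEMA = """
CREATE TABLE job (
    username TEXT NOT NULL,
    id       INTEGER NOT NULL,
    program  TEXT NOT NULL,
    status   TEXT NOT NULL,
    PRIMARY KEY (username, id)
)
"""


def pool_history(conn, xml_text):
    """Load one user's XML history into the shared job table."""
    root = ET.fromstring(xml_text)
    user = root.get("user")
    rows = [
        (user, int(job.get("id")), job.get("program"), job.get("status"))
        for job in root.iter("job")
    ]
    conn.executemany("INSERT INTO job VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def export_history(conn, user):
    """Rebuild the same XML structure for one user from the table."""
    root = ET.Element("history", user=user)
    query = "SELECT id, program, status FROM job WHERE username = ? ORDER BY id"
    for job_id, program, status in conn.execute(query, (user,)):
        ET.SubElement(root, "job", id=str(job_id), program=program, status=status)
    return root


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
pooled = pool_history(conn, USER_HISTORY_XML)
roundtrip = export_history(conn, "user1")
print(pooled, [job.get("program") for job in roundtrip.iter("job")])
```

Because each XML attribute maps to exactly one column, the same records
can live in a user's local file or in a pooled site database, and the
pooling step can stay a separate tool that is run only when needed.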