Consortium for Management of Experimental Data in Structural Biology

Third imgCIF workshop (new series) at BSR 2007 in Manchester and at Diamond:

The Management of Synchrotron Image Data:
Changes to the imgCIF dictionary and software, interaction with NeXus

Sponsored by DOE under grant ER64212-1027708-0011962, NSF under grant DBI-0610407 and NIH under grant 1R13RR023192-01A1

Workshop Report
Draft for Comments and Corrections

G. Winter
14 Aug 07
C. Broennimann
14 Aug 07
A. Ashton
17 Aug 07
M. Dougherty
17 Aug 07
M. Folk
17 Aug 07

The pace of data collection and the volume of data collected at synchrotron beam lines are increasing. The ACA Data, Standards, and Computing Committee spearheaded an effort to improve the efficiency of the handling and storage of these data by encouraging the adoption of common data formats and standard software interfaces. The first goal was to make the data self-defining, and therefore equally accessible to data-reduction and data-visualization codes. The second goal, for the purposes of secure archiving, was to provide robust internal documentation of the source of the data.

The current effort began in 2005, building on work started in the mid-1990s on a Crystallographic Binary Format (CBF) proposed by Andy Hammersley. That work was the basis for the image-supporting Crystallographic Information Format/Crystallographic Binary Format (imgCIF/CBF). The first imgCIF/CBF workshop took place at Brookhaven National Laboratory in 1997 and proposed a format combining support for an efficient binary representation of images with a fully CIF-compliant ASCII equivalent. An imgCIF/CBF dictionary and software to support the format were created, are available on the web, and are described in Volume G of the IUCr International Tables for Crystallography. Now the community should adopt a consensus standard to manage data at synchrotron beam lines and to make it easier for users to process data taken from various beam lines. Also, as our science evolves, new concepts will be considered: possibilities include NeXus and XML.

The first workshop in the new series, "Management of Synchrotron Image Data: imgCIF File System and Beyond", was held on 22 July 2006 as part of the 2006 ACA meeting in Honolulu, Hawaii. That workshop concluded that it was "the right time for more widespread use of imgCIF ... [and that] SR sources should start writing imgCIF image files as soon as possible, employing the imgCIF dictionary already adopted by the IUCr Committee on the Maintenance of the CIF Standard (COMCIFS) and published on the web and in International Tables Volume G." [from the report of the workshop, see http://www.medsbio.org/meetings/ACA_2006_WK02_Report.html].

Following the Hawaii workshop, intensive work was started in response to these recommendations. Both the imgCIF dictionary and the supporting software library were reviewed and, after meetings at SLS and ESRF, extended. See SLS_report.html and ESRF_report.html for more information on those meetings. The work continued in collaboration with members of the community (see the imgCIF mailing list http://www.iucr.org/iucr-top/cif/cbf/imgcif-l/).

In light of this activity, a workshop on data formats for synchrotron image data was held after the NSLS/CFN meeting on 24 May 2007 at BNL in the Biology Department. Topics discussed included proposed extensions to imgCIF, the use of NeXus, progress on software and the status of imgCIF at Diamond and at SLS. That workshop concluded that work was needed on support for the handling of uncorrected images and true bitmaps, creation of a utility to "tidy" CIFs, creation of an agreed interface with XML, NeXus and HDF, clarification of the specification of the relationship between detector specification and the physical locations of pixels in the laboratory, tags for robotics and remote access, and the creation of more cookbooks. See http://www.medsbio.org/mettings/BNL_May07_imgCIF_Workshop_Report.html.

In the short time between the second and third workshops, work began on many of the items on this task list. In addition, building on discussions with J. Steinbrener at the second workshop, discussions were started with Matt Dougherty on the image needs in the microscopy community and, through Matt Dougherty, with Mike Folk of the HDF Group on techniques for integration between imgCIF and NeXus, HDF and XML. Work on CBFlib continued and version 0.7.8 was released.

The third imgCIF workshop was held in two sessions at BSR 2007 in Manchester and at Diamond. Herbert Bernstein and Alun Ashton organized this workshop. The purpose of this workshop was to provide a review of the status of imgCIF and CBFlib for the European user community and to discuss the integration of imgCIF with NeXus, HDF and XML. The Manchester session was used for an introduction and review of the status of imgCIF and for some discussion. The Diamond Light Source session was used to deal with more detailed technical issues and further discussion.

The first session was in Manchester on Tuesday, 14 August 2007. It provided an introduction to imgCIF and NeXus and a brief review of current progress. This session took place in the Barbirolli Room at Bridgewater Hall from 12:45 to 13:45. The session began with a review of the status of imgCIF by H. Bernstein, followed by a review of the status of use of imgCIF with the Pilatus 6M detector at the Swiss Light Source by C. Broennimann and a presentation of a developer's perspective on imgCIF by G. Winter. The major discussion at this session was about what information, and in what format, should be in the header of an image presented as a CBF.

The second session was at the Diamond Light Source (DLS) on Friday, 17 August 2007, to discuss recent changes in the imgCIF dictionary and software and the interaction with NeXus. This session took place in room 1.17 of Diamond House. It began with short presentations and some discussion from 13:15 to 14:20. Then there was a break, and more substantive discussions continued from 14:30 until about 17:15. The presentations began with H. Bernstein giving a recap of the status of imgCIF and proposals on versioning and integration between imgCIF and NeXus. A. Ashton and C. Nielsen then presented various aspects of the use of imgCIF at DLS. M. Dougherty presented the status of data formats in cryo-electron microscopy. M. Folk reviewed the capabilities and status of HDF.

The Manchester session was attended by 20 people, 14 of whom were formally registered, and the DLS session was attended by 14 people, 11 of whom were able to remain for the full discussions. A total of 28 people participated in one or both sessions.

The discussions at both sessions were constructive and productive and, at times, passionate. While there were sharp disagreements, they were over how best to use imgCIF, not over whether to use imgCIF, and there seemed to be a reasonable prospect that compromise and consensus will eventually be achieved after further discussions and email exchanges. The major points of discussion and their status were:

(The pre-workshop agenda and charge are at http://www.medsbio.org/meetings/BSR_2007_imgCIF_Workshop.html.)

The agenda for the workshop as held was

