Consortium for Management of Experimental Data in Structural Biology
Fifth imgCIF workshop (new series)
in Osaka on Tuesday 26 August 2008 at the XXI Congress of the IUCr

Informal Technical Discussions and Review of imgCIF Status for attendees at the XXI Congress of the IUCr

Herbert J. Bernstein, yaya@dowling.edu
Robert M. Sweet, sweet@bnl.gov
The new imgCIF workshop series has been sponsored in part by DOE under grant ER64212-1027708-0011962, NSF under grant DBI-0610407, and NIH under grant 1R13RR023192-01A1.

Workshop Report

This is a report on the fifth and final workshop in the new series of imgCIF workshops that began with a workshop at the summer 2006 meeting of the American Crystallographic Association. This workshop consisted of informal technical discussions to follow up on some of the work done in the fourth imgCIF workshop at BNL in May 2008 and on one-on-one discussions at the ACA meeting in Knoxville, TN in June 2008. In addition to helpful discussions to bring people up to speed on recent developments, the major products of this meeting were agreements on a minimal set of required tags for a valid imgCIF file and a magic number scheme to clearly identify imgCIF files that do not provide this set of tags.

Workshop Participants (left to right): Petr Salficky, Kay Diederichs, Jeffrey R. Deschamps, Harry Powell, James Hester, I. David Brown, Michael Blum, Chris Nielsen, John Westbrook, Herbert J. Bernstein, Georgi Darakev, Joseph D. Ferrara, Christian Broennimann, Clemens Schulze-Briese. Frances C. Bernstein behind the camera.

The pace of data collection and the volume of data collected at synchrotron beam lines are increasing. The ACA Data, Standards, and Computing Committee spearheaded an effort to improve the efficiency of the handling and storage of these data by encouraging the adoption of common data formats and standard software interfaces. The first goal was to make the data self-defining, and therefore equally accessible to data-reduction and data-visualization codes. The second goal, for the purposes of secure archiving, was to provide robust internal documentation of the source of the data.

The current effort began in 2005, building on work started in the mid-1990s on a Crystallographic Binary Format (CBF) proposed by Andy Hammersley. That work was the basis for the image-supporting Crystallographic Information Format/Crystallographic Binary Format (imgCIF/CBF). The first imgCIF/CBF workshop took place at Brookhaven National Laboratory in 1997 and proposed a format combining support for an efficient binary representation of images with a fully CIF-compliant ASCII equivalent. An imgCIF/CBF dictionary and software to support the format were created, are available on the web, and are described in Volume G of the IUCr International Tables for Crystallography. The community should now adopt a consensus standard for the management of data at synchrotron beam lines, which would make it easier for users to process data taken at various beam lines. Also, as our science evolves, new concepts will be considered: possibilities include NeXus and XML.

The first workshop in the new series, on "Management of Synchrotron Image Data: imgCIF File System and Beyond", was held on 22 July 2006 as part of the 2006 ACA meeting in Honolulu, Hawaii. That workshop concluded that this was "the right time for more widespread use of imgCIF ... [and that] SR sources should start writing imgCIF image files as soon as possible, employing the imgCIF dictionary already adopted by the IUCr Committee on the Maintenance of the CIF Standard (COMCIFS) and published on the web and in International Tables Volume G." [from the report of the workshop, see http://www.medsbio.org/meetings/ACA_2006_WK02_Report.html].

After the Hawaii workshop, intensive work began in response to these recommendations. Both the imgCIF dictionary and the supporting software library were reviewed and, after meetings at SLS and ESRF, extended. See SLS_report.html and ESRF_report.html for more information on those meetings. The work continued in collaboration with members of the community (see the imgCIF mailing list http://www.iucr.org/iucr-top/cif/cbf/imgcif-l/).

In light of this activity, a workshop on data formats for synchrotron image data was held after the NSLS/CFN meeting on 24 May 2007 at BNL in the Biology Department. Topics discussed included proposed extensions to imgCIF, the use of NeXus, progress on software, and the status of imgCIF at Diamond and at SLS. That workshop concluded that work was needed on: support for the handling of uncorrected images and true bitmaps; creation of a utility to "tidy" CIFs; creation of an agreed interface with XML, NeXus and HDF; clarification of the specification of the relationship between the detector specification and the physical locations of pixels in the laboratory; tags for robotics and remote access; and the creation of more cookbooks. See http://www.medsbio.org/mettings/BNL_May07_imgCIF_Workshop_Report.html.

In the short time between the second and third workshops, work began on many of the items on this task list. In addition, building on discussions with Jan Steinbrener at the second workshop, discussions were started with Matt Dougherty on the image needs of the microscopy community and, through Matt Dougherty, with Mike Folk of the HDF Group on techniques for integrating imgCIF with NeXus, HDF and XML. Work on CBFlib continued and version 0.7.8 was released.

The third imgCIF workshop was held in two sessions at BSR 2007 in Manchester and at Diamond. Herbert Bernstein and Alun Ashton organized this workshop. The purpose of this workshop was to provide a review of the status of imgCIF and CBFlib for the European user community and to discuss the integration of imgCIF with NeXus, HDF and XML. The Manchester session was used for an introduction and review of the status of imgCIF and for some discussion. The Diamond Light Source session was used to deal with more detailed technical issues and further discussion. That workshop raised several issues that needed further consideration: the Dectris Pilatus 6m miniCBF headers, integration with NeXus, HDF and XML, and dealing with common issues between microscopy and crystallography image handling. See BSR_2007_imgCIF_Workshop.

After discussion with the funding agencies, and signs of good progress on the adoption of imgCIF, a fourth workshop in the new series of imgCIF workshops was added and it was agreed that we would "follow up with less formal meetings in conjunction with the ACA meeting in early June 2008 in Knoxville, TN and in conjunction with the IUCr meeting in Osaka, Japan in late August 2008."

The last formal workshop in this series on "Raw Image Formats in Structural Biology" was held on 22 May 2008 in the Biology Department (Building 463) at Brookhaven National Laboratory. It was scheduled just after the NSLS/CFN user meeting. Three out of four of the major detector vendors had representatives at this workshop, and all three agreed to cooperate in an effort to define an agreed minimum set of common tags that would be provided in synchrotron diffraction images and to participate in a "bakeoff" to help resolve any open issues with respect to interoperability. Much work remained to be done to move from having imgCIF as an available option to having it used as a routine tool in the collection of images. In addition there was now wide recognition that there is much to be gained from careful consideration of the interactions among multiple raw data image formats in structural biology, such as imgCIF, NeXus, HDF, XML and the microscopy formats. In addition to the effort on the imgCIF bakeoff, it was the consensus of the group that another workshop addressing these more general issues is needed within the next one to two years. See BNL_May08_imgCIF_Workshop.

To follow up on the open issues, one-on-one discussions were held at the ACA meeting in Knoxville, TN in June 2008, and an informal lunch-time group meeting for technical discussions and review of imgCIF status was arranged for attendees at the XXI Congress of the IUCr on 26 August 2008 in Osaka, Japan.

The invitation to the participants was:

"A lot of good things have been happening with imgCIF/CBF lately, but that then means there are technical issues to discuss. If you are planning to be in Osaka for the IUCr meeting and would like to get together over lunch or dinner, please send me your planned days of attendance in Osaka and any particularly favorable/unfavorable lunch/dinner times for imgCIF discussions, and we will try to work out something suitable.

"Groundrules -- no formal presentations, no powerpoint slides, no handouts -- just food (each person pays their own tab), pleasant company and technical discussions on imgCIF/CBF."

The result was a very lively meeting of fifteen people that lasted nearly three hours and which produced solid agreement on what was needed to ensure acceptance of imgCIF by the detector equipment vendors in a way that would be acceptable to commonly used processing software. Certain results of that discussion were taken to COMCIFS for further review and recently tentative agreement seems to have been reached.

Discussion, Recommendations and Conclusions

This informal workshop consisted of several productive simultaneous discussions of the steps needed to achieve more complete acceptance of imgCIF. The major conclusions were:

  1. The licensing of CBFlib under the LGPL appears to be acceptable and workable for the commercial vendors who were in attendance (ADSC, Rayonix, Rigaku and Dectris).

  2. The definition of the minimal required set of tags that should be included in an imgCIF file will be the set of tags that are required by the mosflm cbf wrapper routine (the interface between mosflm and CBFlib) and (thanks to agreement by Harry Powell) that routine will be incorporated into the CBFlib distribution under the LGPL.

  3. Any imgCIF file that does not comply with this requirement will be clearly identified with a "magic number" comment on the first line of the file (see detailed proposal below).

    After subsequent discussions on the imgcif-l discussion list, primarily among James Hester, Harry Powell and Herbert Bernstein, the detailed proposal that seems to have the broadest support is:

      We propose to change the first line of all CBF/imgCIF files that are not fully populated with all the imgCIF tags needed for processing by mosflm and adxv.

      • What problem is being solved? As the use of imgCIF has increased, two very distinct sets of files have appeared: the "miniCBFs" used for the Pilatus 6m detector and more fully populated imgCIF files, such as the ones produced for ADSC detectors. While the information necessary for processing can be discovered from context in handling a miniCBF, it may be necessary to read fairly far into the file to discover that the file is indeed a miniCBF, complicating the design of reading software.

      • The proposed solution. Currently CBF files begin with a magic number comment line

                   1         2         3         4         5
          ###CBF: VERSION n.m

        We propose to extend the magic number comment line with two optional fields to read

                   1         2         3         4         5
          ###CBF: VERSION n.m     style     style_version

        where "style" is a unique CBF style identifier left justified as a single word in columns 25-34 and "style_version" is a left justified integer in columns 35-44.

        Each style will be registered in a central repository along with information on the tags that will be carried for that style and a template of the tags that would be needed to fully populate the file.

      • To facilitate writing DDLm methods to work with this or any other magic number convention, a pseudo-tag _ws.prologue will allow application manipulation of the comments and whitespace from before a data block. The prefix ws will be reserved for this purpose and for similar, related tags. No parser will have to work with this tag; it is provided simply to have an unambiguous algorithmic way to state the relationship with the following actual CIF tags.

      • A new category, data_block_format, is proposed with two new tags to be carried within an imgCIF file to agree with the style and style_version: _data_block_format.data_style and _data_block_format.data_style_version. Ignoring the 0-based vs. 1-based indexing issues, the relationship between the first comment line and these tags can be stated in pseudocode:

        _data_block_format.data_style = trim(_ws.prologue[25:34])
        _data_block_format.data_style_version = trim(_ws.prologue[35:44])
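
The fixed-column layout proposed above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not code from CBFlib or mosflm: the names CbfMagic, parse_magic_line and format_magic_line are hypothetical, the style name "miniCBF" is used purely as an example value (no registered style names are given here), and the 1-based columns 25-34 and 35-44 are mapped onto Python's 0-based slices [24:34] and [34:44].

```python
import re
from typing import NamedTuple, Optional


class CbfMagic(NamedTuple):
    """Fields of the proposed extended magic number line (hypothetical type)."""
    version: str                  # the "n.m" after "###CBF: VERSION "
    style: Optional[str]          # columns 25-34, or None if absent
    style_version: Optional[int]  # columns 35-44, or None if absent


# The existing magic number comment: "###CBF: VERSION n.m"
_MAGIC = re.compile(r"^###CBF: VERSION (\d+\.\d+)")


def parse_magic_line(line: str) -> Optional[CbfMagic]:
    """Return the magic number fields, or None if this is not a CBF magic line."""
    line = line.rstrip("\r\n")
    match = _MAGIC.match(line)
    if match is None:
        return None
    # 1-based columns 25-34 -> 0-based slice [24:34]
    style_field = line[24:34].strip()
    # The proposal asks for a single word; anything else is treated as absent.
    style = style_field if len(style_field.split()) == 1 else None
    # 1-based columns 35-44 -> 0-based slice [34:44]
    version_field = line[34:44].strip()
    style_version = int(version_field) if version_field.isdigit() else None
    return CbfMagic(match.group(1), style, style_version)


def format_magic_line(version: str, style: str, style_version: int) -> str:
    """Build an extended magic line with both optional fields left justified."""
    head = "###CBF: VERSION " + version
    return f"{head:<24}{style:<10}{style_version:<10d}".rstrip()


if __name__ == "__main__":
    example = format_magic_line("1.5", "miniCBF", 1)
    print(example)
    print(parse_magic_line(example))
```

Because both fields are optional, a reader written this way still accepts the current one-field magic line and simply reports the style and style version as absent.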

      As of this writing (29 October 2008), a draft of the code to support this proposal is being implemented in a version of CBFlib that will also have a release of the cbf wrapper from mosflm.


"Frances C. Bernstein" <fcb at bernstein-plus-sons dot com> Bernstein + Sons
"Herbert J. Bernstein" <yaya at bernstein-plus-sons dot com> Dowling College
"Michael Blum" <blum at rayonix dot com> Rayonix
"Christian Broennimann" <christian.broennimann at dectris dot com> Dectris
"I. David Brown" <idbrown at mcmaster dot ca> McMaster University
"Georgi Darakev" <darakevg at gmail dot com> Dowling College
"Jeffrey R. Deschamps" <deschamps at nrl dot navy dot mil> U.S. Naval Research Laboratory
"Kay Diederichs" <kay.diederichs at uni-konstanz dot de> University of Konstanz
"Joseph D. Ferrara" <joseph.ferrara at rigaku dot com> Rigaku
"James Hester" <jamesrhester at gmail dot com> Bragg Institute, ANSTO
"Chris Nielsen" <cn at adsc-xray dot com> ADSC
"Harry Powell" <harry at mrc-lmb dot cam dot ac dot uk> U.K. Medical Research Council - Laboratory of Molecular Biology
"Petr Salficky" <petr.salficky at dectris dot com> Dectris
"Clemens Schulze-Briese" <clemens.schulze at psi dot ch> Swiss Light Source
"John Westbrook" <jwest at rcsb.rutgers dot edu> Rutgers