MEDSBIO | HDRMX List Server

High Data-Rate Macromolecular Crystallography
Meeting 19 March 2025
An HDRMX meeting was held as an online Zoom meeting on 19 March 2025 at
https://diamondlight.zoom.us/j/99127975284?pwd=Icc8ZDoEiDtQYzUz68xnoMkw6aFoi2.1
The theme for this two-hour meeting was upcoming data rates and volumes.
A second two-hour online meeting, with a metadata theme, was held on
Wednesday, 26 March 2025: http://MEDSBIO.org/meetings/HDRMX_26Mar25.html
The host for the 19 March meeting
was Herbert J. Bernstein, hbernstein at bnl dot gov. The note-taker for
the meeting was Aaron S. Brewster, asbrewster at lbl dot gov.
AGENDA
For the 19 March meeting on the high data rate theme
the main questions we wanted to address were:
- How big are the data we need to handle?
- How are we currently handling them?
- How could we handle them in the future?
- What sizes of data do beamline scientists face?
- What is your networking and storage infrastructure?
- How well is it prepared to handle a big detector running at speed?
- How do your beamline users handle their data?
The list of speakers was:
- Dr. Filip Leonarski, Beamline Data Scientist, Paul Scherrer Institute PSI, filip dot leonarski at psi dot ch
pdf
- Graeme Winter, Scientist, Diamond Light Source (but moving shortly), graeme dot winter at gmail dot com
pdf
- Shibom Basu, Beamline Scientist, ESRF ID29, shbasu at embl dot fr
pdf
- Diego Gämperle, Product Owner, Dectris Ltd., diego dot gaemperle at dectris dot com
pdf for both Dectris talks
- Camilla Buhl Larsen, Scientific Solution Architect, Dectris Ltd.
Speakers were brief to ensure most of the time was available for discussion.
Here are the local times for 1500 - 1700 UTC on March 19, 2025:
New York (EDT): 11:00 AM - 1:00 PM
California (PDT): 8:00 AM - 10:00 AM
Chicago (CDT): 10:00 AM - 12:00 PM
London (GMT): 3:00 PM - 5:00 PM
Paris (CET): 4:00 PM - 6:00 PM
Tokyo (JST): 12:00 AM - 2:00 AM (March 20)
Notes:
North America was on Daylight Saving Time (EDT/PDT/CDT)
London was on Greenwich Mean Time (GMT)
Paris was on Central European Time (CET)
Tokyo did not observe Daylight Saving Time
Organization for these meetings is being done on Slack.com.
Contact graeme.winter@gmail.com for an invitation to join HDRMX on slack.com.
If you are a structural biology beamline scientist, a user interested in upcoming directions at such beamlines,
or otherwise interested in high data-rate macromolecular crystallography, please join us. Updates for the meeting will
appear both on medsbio.org and on Slack.com.
The next meeting was an online Zoom meeting on 26 March 2025. There will be a hybrid
meeting on 23 July 2025 immediately after
the meeting of the American Crystallographic Association in Lombard, Illinois.
DECTRIS is pleased to support the efforts of the HDRMX community. More information can be found here:
dectris.com
Draft Meeting Report for review
There were 37 HDRMX participants.
Filip Leonarski spoke about Big Data at the Paul Scherrer Institute PSI.
pdf
The Swiss Light Source 2.0 is back with beam; first X-ray light was expected in the coming days.
There are three synchrotron beamlines and one XFEL endstation:
- X06SA (PXI): A versatile MX beamline for
- Dynamic MX studies
- Spectroscopy
- Chemical crystallography
- Small angle scattering tensor tomography (SAS-TT)
- Cristallina-MX: SwissFEL endstation for serial femtosecond crystallography
- X10SA (PXII): Industry
- Tailored to the needs of our proprietary beamline partners
- Manual & automated data collections
- X06DA:
- Autonomous beamline (queued mode)
- Room-temperature experiments
- Data rates are doubling every two years.
- The Matterhorn detector is a high-flux, single-photon-counting X-ray detector designed
for the Swiss Light Source (SLS) 2.0 upgrade, focusing on applications at synchrotrons and XFELs, with
modules expected to be available in 2026. It is estimated to produce 4-megapixel images at 10 kHz, about
85 GB per second. CPUs are not showing such growth.
- Jungfraujoch is an edge system connecting the detector and the facility
IT infrastructure to reduce data close to the detector and limit data transfer.
It is based on high-end CPUs, FPGAs, and GPUs. Development started in 2019.
It was first tested remotely at the Photon Factory (KEK, Japan) in 2020.
The first iteration used specialized IBM hardware. Now it uses an off-the-shelf x86 server.
- GPU indexing, integrated in CrystFEL and DIALS.
- Data processing at 2 kHz with a 9 MP detector. Save all indexed images and a random 5% of
non-indexed images (see the sketch after this list). Exploring DECTRIS cloud operation.
- Getting ready for the 1.0.0 release of Jungfraujoch and making the code available.
- Challenge for NXmx: no definitions for on-the-fly generated results. Have to use NXcollection groups.
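The frame-retention policy noted above (keep every indexed frame plus a random 5% of non-indexed frames) can be sketched as follows; the indexing-success fraction and array names are hypothetical placeholders, not the actual Jungfraujoch implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

n_frames = 100_000
indexed = rng.random(n_frames) < 0.02        # hypothetical: ~2% of frames index

# Keep every indexed frame, plus a random 5% sample of the non-indexed ones,
# so hit finding and background behaviour can still be audited later.
keep = indexed | (~indexed & (rng.random(n_frames) < 0.05))

print(keep.sum(), "of", n_frames, "frames written to disk")
```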
Graeme Winter spoke about Data capture and correction for Jungfrau 9M.
pdf
The Jungfrau 9M challenge is that it is a 9-megapixel detector with 2 kHz, 16-bit readout,
resulting in a 36 GB per second data rate. The data need to be corrected for per-pixel, per-gain-mode pedestal and gain.
The data then need to be bit-shuffled and compressed (~6:1), then saved to disk.
The process needs to be steady state, i.e. run continuously; for sustainability, a data
veto will be needed to avoid saving blank/empty frames.
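A quick sanity check of these rates, together with a schematic of the per-pixel, per-gain-mode correction, in numpy; the calibration array names and shapes are hypothetical, and the ~6:1 ratio is the compression figure quoted above:

```python
import numpy as np

# Data-rate arithmetic for the Jungfrau 9M as quoted in the talk.
pixels, frame_rate, bytes_per_pixel = 9_000_000, 2_000, 2    # 9 MP, 2 kHz, 16-bit
raw_rate = pixels * frame_rate * bytes_per_pixel             # bytes per second
print(raw_rate / 1e9, raw_rate / 6 / 1e9)                    # 36 GB/s raw, ~6 GB/s after ~6:1 compression

# Schematic per-pixel, per-gain-mode correction: corrected = (raw - pedestal) / gain,
# with pedestal and gain looked up per pixel for the gain mode that pixel reported.
def correct(raw_adu, gain_mode, pedestal, gain):
    ped = np.take_along_axis(pedestal, gain_mode[None], axis=0)[0]
    g = np.take_along_axis(gain, gain_mode[None], axis=0)[0]
    return (raw_adu.astype(np.float32) - ped) / g

h, w = 512, 1024                                             # hypothetical module shape
rng = np.random.default_rng(0)
raw = rng.integers(0, 2**14, size=(h, w)).astype(np.float32)
mode = rng.integers(0, 3, size=(h, w))                       # three gain modes
pedestal = rng.normal(1000.0, 10.0, size=(3, h, w)).astype(np.float32)
gain = rng.normal(40.0, 1.0, size=(3, h, w)).astype(np.float32)
photons = correct(raw, mode, pedestal, gain)
```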
- Compression at 6:1 still leaves 6 GB/s on disk. Need to store only good data. Limiting factors:
memory bandwidth and the real-time requirement.
- FPGAs are time-consuming to implement and slow to debug, though demonstrated to be effective in Jungfraujoch.
- The NVIDIA Grace Hopper chip consists of 72 ARM Neoverse V2 cores and an H100 GPU with
a very high-bandwidth interconnect. The current set-up is:
- SLS detector receivers take the UDP packets and assemble them; ZeroMQ to the GPU for correction,
bit-shuffling, maybe compression, maybe initial analysis. Keep half-modules separated! Use VDS to bring back the full image (see the sketch after this list).
- To develop this we have a simulation environment consisting of 5 servers, a big network, 18 virtual jungfrauDetectorServers, etc.
Current work: optimisation and tuning the simulation (a server sending packets continually is a challenge).
Need to optimize VDS (works in DIALS, not XDS/DURIN). Example data are available at https://zenodo.org/records/15017658 for a 1M instrument (but using the same VDS concept).
VDS extends runtime by 5% for reading; 36 parallel streams make writing much more effective.
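As an illustration of the VDS approach mentioned above, here is a minimal h5py sketch that stitches two half-module files back into one full-frame virtual dataset; the file names, dataset paths, and dimensions are hypothetical, not the actual beamline layout:

```python
import h5py

n_frames, half_rows, cols = 1000, 257, 1030   # hypothetical half-module geometry

# Virtual layout describing the full image: two half-modules stacked vertically.
layout = h5py.VirtualLayout(shape=(n_frames, 2 * half_rows, cols), dtype="uint16")

for i, fname in enumerate(["half_module_0.h5", "half_module_1.h5"]):
    src = h5py.VirtualSource(fname, "data", shape=(n_frames, half_rows, cols))
    layout[:, i * half_rows:(i + 1) * half_rows, :] = src

# The master file holds no pixel data, just a mapping back to the per-stream files,
# so each stream can keep writing independently at full speed.
with h5py.File("master.h5", "w") as f:
    f.create_virtual_dataset("/entry/data/data", layout, fillvalue=0)
```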
Comment from Nick D: for those not familiar with this algorithm, see
github.com/kiyo-masui/bitshuffle; it has been implemented in CUDA.
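For readers unfamiliar with bitshuffle, a conceptual numpy illustration of the transform (not the optimized library code linked above):

```python
import numpy as np

def bitshuffle_block(block: np.ndarray) -> np.ndarray:
    """Regroup a 1-D uint16 block from element order into bit-plane order.

    For low-count detector pixels the high bit planes are almost all zeros,
    so the regrouped byte stream contains long runs that LZ4 compresses well.
    """
    bits = np.unpackbits(block.view(np.uint8).reshape(-1, 2),
                         axis=1, bitorder="little")   # shape (n_pixels, 16)
    return np.packbits(bits.T, bitorder="little")     # all bit 0s, then all bit 1s, ...

rng = np.random.default_rng(0)
pixels = rng.poisson(3.0, size=4096).astype(np.uint16)  # sparse, low-count data
shuffled = bitshuffle_block(pixels)                      # same size, far more compressible
```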
Shibom Basu spoke about Serial crystallography at the EMBL -- ESRF.
pdf
The science
done at EBSL8/ID29 includes drug binding, pH/T-jump, protein dynamics on µs to second timescales,
enzymology, photoactivatable proteins, and photoactivatable ligands (photo-switches
and photocages). The ID29 experimental setup for serial microsecond crystallography (SµX)
is advancing macromolecular structure determination with microsecond X-ray
pulses at a 4th-generation synchrotron.
- ESRF: 6 beamlines. ID29: high data rate, serial crystallography (binding, TR, etc.).
- Chopped pulses to a JF 4M detector. Spots are smeared; data are saved as int32 in ADU, with pedestal correction on the fly.
- Fixed-target chip: 240K images in 20 min. 2 petabytes in 1 year.
- Reject frames, only save hits, GPU hit finding. Live monitoring of spot counts per frame.
- Exploring different data types, like converting to photons, or int16/int8, to allow us to go to 1 kHz.
- Lima: data-acquisition architecture at ESRF. Lima2 improves scalability and real-time processing and provides a high-level API to experiment controls.
- Detector to 100 Gb switch to IBM Power9 machines. 4 receivers, each writing 1000K images, then VDS to a master file for HDF5.
Spot finding output in CXI format for CrystFEL.
- Working on sparsification: save the spots alone (see the sketch after this list).
- 231 Hz.
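The sparsification idea above, keeping only the pixels that carry signal, can be illustrated with a small numpy sketch; the frame size and threshold-based selection here are hypothetical, not the actual ID29 pipeline:

```python
import numpy as np

def sparsify(frame: np.ndarray, threshold: float):
    """Keep only pixels above a threshold, storing coordinates and values.

    A real pipeline would keep a neighbourhood around each found spot plus a
    background model; this only shows where the storage saving comes from.
    """
    ys, xs = np.nonzero(frame > threshold)
    return ys.astype(np.uint16), xs.astype(np.uint16), frame[ys, xs]

rng = np.random.default_rng(1)
frame = rng.poisson(0.5, size=(2048, 2048)).astype(np.uint16)   # mostly background
ys, xs, vals = sparsify(frame, threshold=4)
print(frame.nbytes, ys.nbytes + xs.nbytes + vals.nbytes)   # sparse form is a tiny fraction
```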
Diego Gämperle and Camilla Larsen of Dectris spoke about Addressing Big Data Challenges.
pdf for both Dectris talks
Data rates have risen from a mere 0.14 GB/s in 2007 to over 20 GB/s now.
- DCU (detector control unit) assembles images.
- Interfaces: stream2 (ZeroMQ), filewriter (HDF5), monitor (TIFF).
- Data reduction in the DCU: autosummation, threshold difference mode, compression (bslz4), adaptive output.
- Bitshuffle optimized by and for DECTRIS-specific data.
- 8-bit floating-point compression (3-bit exponent, 5-bit mantissa). Reduces bandwidth by about a third (see the sketch after this list).
- NextGenDCU: FPGA and GPU.
- Community: HDRMX, HDF, and NeXus.
- Summary: compression requires decompression, FAIR principles are key, the edge will become more important, it's an infrastructure problem; will the community embrace lossy compression?
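For the 8-bit floating-point idea above, here is a minimal sketch of one possible unsigned encoding with a 3-bit exponent and 5-bit mantissa; it illustrates the quantization trade-off and is not DECTRIS's actual format:

```python
import numpy as np

EXP_BITS, MANT_BITS = 3, 5          # 8 bits total, no sign bit (counts are non-negative)

def encode_u8float(counts: np.ndarray) -> np.ndarray:
    """Quantize integer counts to mantissa * 2**exponent, packed into one byte."""
    c = counts.astype(np.uint32)
    nbits = np.zeros_like(c)
    pos = c > 0
    nbits[pos] = np.floor(np.log2(c[pos])).astype(np.uint32) + 1   # bit length of each count
    exp = np.clip(nbits.astype(np.int64) - MANT_BITS, 0, 2**EXP_BITS - 1).astype(np.uint32)
    mant = np.minimum(c >> exp, 2**MANT_BITS - 1)                  # truncate; saturate on overflow
    return ((exp << MANT_BITS) | mant).astype(np.uint8)

def decode_u8float(packed: np.ndarray) -> np.ndarray:
    exp = (packed >> MANT_BITS).astype(np.uint32)
    mant = (packed & (2**MANT_BITS - 1)).astype(np.uint32)
    return mant << exp

counts = np.array([0, 1, 31, 100, 4000], dtype=np.uint32)
print(decode_u8float(encode_u8float(counts)))   # small counts exact; large counts keep ~5 significant bits
```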
Gerd Heber noted the HDF User Group (HUG) 2025 meeting:
https://www.hdfgroup.org/hug/hug25/
Camilla Larsen talk (Dectris)
- DECTRIS cloud: analysis tools, open and FAIR, globally accessible.
- Data to the cloud, directly from the beamline or from local storage; a dedicated uplink can be provided.
- Example: 1.2 petabytes in 72 h.
- Cloud in 3 datacenters (US, EU, Asia; Australia is coming). Bandwidth in the EU is 2.7 Tbit.
- Provides compute as well. Used James Holton's xds_bench.com to test; 192 CPUs for a single session.
General discussion
- Missed GW's comment.
- HJB: lossy, always.
- ASB: megahertz vs. kilohertz?
- Fabio DA: fastest is 8 kHz.
- GW: sample delivery is the primary problem.
- Daniel Eriksson: one photon per frame.
- Jack Stubbs: can we make this much sample? Do we mean this much data?
- Dan P: EuXFEL starts to get to megahertz.
- Nick D: data rate is really what's important.
- Filip Leo: burst mode is 100 kHz using an integrating detector, but then there is dead time between bursts; the application is TR crystallography. Ends up at about 2 kHz.
- GW: what limits us is sample delivery and frame rate.
- Shibom B: the plan was to start with 1 kHz; the practicalities are sample production and delivery. Viscous injector and fixed targets. Even at 231 Hz, there won't be enough sample available at the beam.
- ASB: smSFX is not sample limited.
- HJB: multiple beamlines increase the overall data rate for a facility.
Lossy discussion
- GW: MTZ is lossy.
- Diego G: 8-bit floats not yet tested for us, only for EM detectors.
- GW: bitshuffle works well, so this is the baseline. We will have to go lossy in the near future. Work out what the minimally destructive effect is. Even a factor of 10 would be very helpful.
- HJB: tried shorts in 2007, couldn't tell the difference.
- Kay D: CBF files are a single byte per pixel, lossless compression.
- GW: bitshuffle is 2 bits per pixel.
- Gerard B: slight slippage in the gonio makes ultra-fine slicing plus stacking really hard.
- Kay D: it is all about the corner cases. Do we take the penalty of lossy but get bitten by corner cases?
- GW: we bring a collection of different challenges here which will have different requirements.
- Thomas W: we (well, Marina really) tried a bewildering range of different lossy and lossless compression methods. There was only a minor impact on CCano, Rsplit, and Rwork all the way down to a precision of only 2 bits per pixel.
- GW: DIALS and other tools are generic, but there is a need for tuning for specific uses, such as using GPUs.
- Thomas W: what data we want to store depends on the experiment. Fragment screening: the result could be just whether density is present or not, 1 bit. Repeating a sample in that case isn't a problem, but a special sample should be saved. Saving hits in serial is a big gain: 100% in reproducibility and full storage benefits, but it doesn't help with improvability. Computing can be made fast by profiling and looking at dumb, time-wasting things. Not really seeing big resistance to dropping frames.
- ASB: why are people not confident in dropping frames? Also remember that one point of this meeting is to get topics for the workshop at the ACA.
- Sofia T: membrane protein; the datasets were bad and autoprocessing was bad. Having the images was important; knowledgeable people were needed to look by eye.
- GW: the community has changed. Automatic processing has improved and users have become less expert. The MTZ delivered is probably about as good as they expect, and for them that's OK.
- Thomas W: trust in the data processing. The usual suspect is detector geometry, but also calibration of pixel values. How do we get to where we trust these parts?
- GW: constantly nagging beamline scientists to get their beamtime right. People are now recording geometry correctly, and HDRMX is no small part of that, especially because of Eigers. Give scientists more time for calibration!
- Gerard B: the gold standard paper has been helpful here. Some fiddly bits for the next satellite.
- ASB: no time is given at XFELs for calibration outside of user time.
- Thomas W: SACLA collects darks automatically, so users don't have to request separate darks.
- Gerard B: amplitudes after merging lose a lot of information, such as damage.
- Nick S: don't let the pendulum swing too far the other way, where we only submit an MTZ file. Example: we want to retain anomalous pairs; note that all XFEL data are partial, and new methods are still coming.
- Cecilia Casadei: empty-frame rejection should be done, but we need reliable methods to do that.
- GW: what is and what is not a spot is well defined, but features in the background can be more than we'd like. So good spot-finding parameters can usually be obtained.
- HJB: we need user agreement as to what is a spot (like the physics community). FPGAs can then be used to apply that standard to reject frames and do spot finding.
- GW: we are united in saving unmerged intensities!
Planning
- Nick D: need to persuade facilities to pay for network and storage.
- Nick D and ASB: facilities tend to react to needs in play now rather than future possibilities.
- Gerard B: missed comment, but it might have been about encouraging facilities to plan again.
- GW: facilities will want one high-data-rate beamline, but not all beamlines at a facility will be so fast.
Last thoughts
- GW: is everyone OK with making the recording available?
- HJB: start with a dump of these notes and move toward a report.
- Gerard B: the forum is fruitful and productive; we would be worse off if it didn't exist.
- ASB: announcing HDRMX at the ACA session.
- Gerard B: can it be hybrid?
- GW: depends on the facility. If we can, we will.
- Nick S: make a case to the ACA for support. GW noted that we will have two sponsors.
HDRMX is supported in the US by the DIALS National Resource (R24GM154040).
Learn More:
https://dials.github.io/national_resource.html#national-resource