

High Data-Rate Macromolecular Crystallography

26 March 2025

An HDRMX meeting was held online via Zoom on 26 March 2025 at https://diamondlight.zoom.us/j/94415184159?pwd=alxmRRzbVYJdtUiiP1Je5mKYw2zLhU.1

The theme for this two-hour meeting was upcoming metadata needs and changes.

AGENDA

For the 26 March meeting on the metadata theme:

The main questions addressed were the metadata needs and changes arising from the upcoming changes in data.

The list of talks was:

Speakers were brief to ensure that most of the time was available for discussion.

Here were the local times for 1500 - 1700 UTC on March 26, 2025:

New York (EDT): 11:00 AM - 1:00 PM
California (PDT): 8:00 AM - 10:00 AM
Chicago (CDT): 10:00 AM - 12:00 PM
London (GMT): 3:00 PM - 5:00 PM
Paris (CET): 4:00 PM - 6:00 PM
Tokyo (JST): 12:00 AM - 2:00 AM (March 27)

Notes:

North America was on Daylight Saving Time (EDT/PDT/CDT)

London was on Greenwich Mean Time (GMT)

Paris was on Central European Time (CET)

Tokyo did not observe Daylight Saving Time
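The local times listed above can be checked with a short script. This is only a minimal sketch: it assumes the fixed UTC offsets implied by the notes (EDT = UTC-4, CDT = UTC-5, PDT = UTC-7, GMT = UTC+0, CET = UTC+1, JST = UTC+9), and the names OFFSETS and local_window are illustrative rather than part of any HDRMX tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC offsets (in hours) in effect on 26 March 2025.
OFFSETS = {
    "New York (EDT)": -4,
    "Chicago (CDT)": -5,
    "California (PDT)": -7,
    "London (GMT)": 0,
    "Paris (CET)": 1,
    "Tokyo (JST)": 9,
}

# Meeting window in UTC.
START = datetime(2025, 3, 26, 15, 0, tzinfo=timezone.utc)
END = datetime(2025, 3, 26, 17, 0, tzinfo=timezone.utc)


def local_window(start_utc, end_utc, offset_hours):
    """Return the (start, end) datetimes shifted to a fixed UTC offset."""
    tz = timezone(timedelta(hours=offset_hours))
    return start_utc.astimezone(tz), end_utc.astimezone(tz)


for name, hours in OFFSETS.items():
    start, end = local_window(START, END, hours)
    # The day of the month is printed so that a date rollover is visible.
    print(f"{name}: {start:%d %b %H:%M} - {end:%d %b %H:%M}")
```

Using fixed offsets keeps the sketch independent of any installed time zone database. A named-zone lookup would be the more general choice for dates on which the offsets are not known in advance.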

Organization for these meetings is being done on Slack.com. Contact graeme.winter@gmail.com for an invitation to join HDRMX on Slack.com. If you are a structural biology beamline scientist, a user interested in upcoming directions at such beamlines, or otherwise interested in high data-rate macromolecular crystallography, please join us. Updates for the meeting will appear both on medsbio.org and on Slack.com.

DECTRIS is pleased to support the efforts of the HDRMX community. More information can be found here: dectris.com

HDRMX is supported in the US by the DIALS National Resource (R24GM154040). Learn More: https://dials.github.io/national_resource.html#national-resource


Notes

HDRMX Meeting #2 -- 26 March 2025

Chair: Graeme Winter (GW)
Participants: 26

Talks & Presentations

Herbert Bernstein (HJB)

pdf

Finding Molecular Needles in Structural Biology Data and MetaData Haystacks: The Importance of Relational Databases (Herbert J. Bernstein, FPRI, hbernstein at bnl dot gov)

Structural biology depends on data and metadata. As of 23 March 2025, there were 233,249 structures in the RCSB Protein Data Bank as well as 1,068,577 computed structure models, over 1.25 million structures in the CCDC Cambridge Structural Database, and 523,157 structures in the Crystallography Open Database. The volume of structural biology data has been growing exponentially for over half a century. Despite many attempts to find non-relational alternatives, as of 2025 there is no reliable alternative to relational databases for large dynamic databases with multiple readers and writers. In a relational database, all the stored data is organized into relations (tables). Each relation contains all the data for a given category of data, such as atomic sites. Each column of a relation identifies a particular set of data, such as atom x-coordinates. Each row of a table (a tuple) contains all the values for a relevant instance of data. The only way to find a particular tuple is by the values in the columns identified as keys of the relation. A relation may have more than one key column, but it need not.

Failure to organize our exponentially growing data into relational databases would be a mistake.
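The relation, column, tuple, and key vocabulary used in this talk can be illustrated with a small in-memory table. This is a minimal sketch using Python's standard sqlite3 module. The table name atom_site, its columns, the entry identifiers, and the coordinates are all invented for illustration and do not come from the talk or from any real database schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One relation (table) for one category of data: atomic sites.
# The key here is composite: two key columns together identify a tuple.
conn.execute(
    """
    CREATE TABLE atom_site (
        entry_id TEXT    NOT NULL,
        atom_id  INTEGER NOT NULL,
        label    TEXT,
        x REAL, y REAL, z REAL,
        PRIMARY KEY (entry_id, atom_id)
    )
    """
)

# Each row is a tuple holding all the values for one atomic site (made-up data).
rows = [
    ("demo1", 1, "N", 11.0, 2.5, -3.0),
    ("demo1", 2, "CA", 12.1, 3.0, -2.2),
    ("demo2", 1, "N", 0.5, 0.5, 0.5),
]
conn.executemany("INSERT INTO atom_site VALUES (?, ?, ?, ?, ?, ?)", rows)


def find_atom(entry_id, atom_id):
    """Look up a single tuple by the values of its key columns."""
    cur = conn.execute(
        "SELECT label, x, y, z FROM atom_site WHERE entry_id = ? AND atom_id = ?",
        (entry_id, atom_id),
    )
    return cur.fetchone()


print(find_atom("demo1", 2))  # ('CA', 12.1, 3.0, -2.2)
```

Because the key is declared, the database rejects a second row with the same (entry_id, atom_id) pair, which is what lets a key lookup return at most one tuple.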


Aaron Brewster (AB)

pdf

The Wide World of NeXus (Aaron Brewster, LBL, asbrewster at lbl dot gov)

NeXus and the larger community (Aaron Brewster, LBL, asbrewster at lbl dot gov)

NeXus is a unified neutron/X-ray standard data format used in neutron, X-ray, muon, and electron data science. See www.nexusformat.org. It is in use at, or being adopted by, many facilities, communities, and detector manufacturers: SOLEIL and ESRF (France); Diamond and ISIS (UK); PSI, SINQ and SLS (Switzerland); NSLS-II, SNS, APS, Oak Ridge National Laboratory (SNS/HFIR), and Lujan/LANL (USA); KEK and SPring-8 (Japan); DESY and European XFEL (Germany); MAX IV (Sweden); the μSR (muon spin rotation/relaxation/resonance) community; Extreme Light Infrastructure (Czech Republic, Hungary and Romania); CLS (Canada); ALBA (Spain); and Dectris (Eiger). It is governed by the NeXus International Advisory Committee (NIAC). It is a file format designed to capture the entire experiment.


Diego Gaemperle (DECTRIS)

pdf

Metadata @ DECTRIS (Diego Gaemperle, Dectris, diego dot gaemperle at dectris dot com)

Dectris combines raw data and JSON-based metadata into one of two formats: the Filewriter format for writing to files on disk, or the Stream format for immediate capture by applications. The standard metadata specification in current use is NXmx v2024.02, which is the "gold standard" from the Dectris point of view. It incorporates changes required by new detector possibilities. There are currently no open requirements or changes from the Dectris side.

Ongoing metadata developments at Dectris include "NXem", to satisfy the growing number of EM customers who want standardized metadata; API modernization; and DECTRIS Cloud, with 1) a partnership with SciCat, involving LIMS models, metadata standards, and schemas offering better support to the scientific community, and 2) a collaboration with Global Phasing and the scientific community, involving standardization and templates to cover different techniques and beamline specifics.


Ezra Peisach (PDB)

pdf

HDRMX: Perspectives from the PDB (Ezra Peisach, PDB, ezra dot peisach at rcsb dot org)

The PDB has been around since 1971. Metadata is present in the legacy PDB file format as structured remarks. The REMARK 200 records mirrored the Table 1 information from a publication: data source, Rmerge, resolution limits, number of reflections, etc. Structure factor data did not become mandatory until 2008, and what is deposited is amplitude or intensity data, not really structure factors. Before then there was only minimal checking of the agreement between the data and the coordinate model. There is only a loose coupling between model and SF files. The PDB has treated the coordinate file as authoritative for validation: unit cell, space group, and wavelength; data collection statistics; and the PDB attempts to reproduce R-factors. The only connection between the coordinate file and the SF file is _diffrn.id, which is assumed to be 1 and does not handle the connection of multiple datasets well.
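The limitation of a link that is assumed to be 1 can be seen in a toy example. This is a hypothetical sketch, not a description of how PDB software works: the record layouts, wavelengths, block labels, and function names are invented, and only the idea of joining on a diffrn id comes from the talk.

```python
# Hypothetical model-side records, one per data collection.
diffrn_records = [
    {"diffrn_id": 1, "wavelength": 0.9793},
    {"diffrn_id": 2, "wavelength": 1.0332},
]

# Hypothetical structure-factor (SF) blocks, each tagged with a diffrn id.
sf_blocks = [
    {"diffrn_id": 1, "label": "sf_block_A"},
    {"diffrn_id": 2, "label": "sf_block_B"},
]


def link_assuming_id_1(blocks):
    """Shortcut: only look for the SF block whose diffrn id is 1."""
    return [b["label"] for b in blocks if b["diffrn_id"] == 1]


def link_by_id(records, blocks):
    """Explicit join: pair each SF block with the record sharing its diffrn id."""
    wavelength_by_id = {r["diffrn_id"]: r["wavelength"] for r in records}
    return {b["label"]: wavelength_by_id.get(b["diffrn_id"]) for b in blocks}


print(link_assuming_id_1(sf_blocks))          # ['sf_block_A']
print(link_by_id(diffrn_records, sf_blocks))  # both blocks paired with a wavelength
```

With the shortcut, the second dataset is never associated with its own collection metadata. The explicit join keeps every block paired with the right record.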


General Discussion

Topic: Text History in NeXus

Topic: Multi-wedge and Multi-scan Experiments


Topic: XDSplugins & Metadata Updates

Gerard B.: Applauded the open discussion across users, tech companies, software developers, and PDB.

Final thoughts

Next meeting: July at ACA.

General consensus: The meeting was valuable and appreciated by the community.