Data formats

Choosing the right data format makes your research data easier to share, understand, and reuse. For FAIR practice, use formats that are open, well documented, and accepted by your community.

Why this matters

In chemistry, data can include spectra, structures, reactions, and metadata. The chosen format influences whether:

  • others can open your files,
  • repositories accept your submission,
  • your data remain usable in the long term.

Quick guidance for everyday work

  • Prefer open formats whenever possible.
  • Keep both raw data and processed data.
  • Add clear metadata, including units and context.
  • Check repository requirements early.
  • If instruments produce proprietary files, export an open exchange format as well.
  • For structure data, store both an exchange file (e.g. SDF) and identifiers (SMILES, InChI).
  • For spectroscopy, use JCAMP-DX for exchange when available.
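
As an illustration of why an open text format such as JCAMP-DX helps with the metadata points above, the following minimal sketch reads the `##LABEL=value` records from a JCAMP-DX text and checks that a title and axis units are present. The function name `read_jcamp_header` and the sample content are hypothetical; the sample is an illustrative fragment rather than a complete, valid file, and the sketch ignores multi-line records and the data table itself.

```python
def read_jcamp_header(text):
    """Collect simple ##LABEL=value records from JCAMP-DX text into a dict."""
    records = {}
    for line in text.splitlines():
        # JCAMP-DX introduces inline comments with "$$"; drop them.
        line = line.split("$$", 1)[0].strip()
        if not line.startswith("##") or "=" not in line:
            continue  # data rows and continuation lines are skipped in this sketch
        label, value = line[2:].split("=", 1)
        records[label.strip().upper()] = value.strip()
    return records


# Illustrative fragment only (hypothetical values, not a complete file).
sample = "\n".join([
    "##TITLE=Illustrative example",
    "##JCAMP-DX=4.24",
    "##DATA TYPE=INFRARED SPECTRUM",
    "##XUNITS=1/CM",
    "##YUNITS=TRANSMITTANCE",
    "##NPOINTS=3",
    "##XYDATA=(X++(Y..Y))",
    "4000 0.98 0.97 0.95",
    "##END=",
])

header = read_jcamp_header(sample)
# List the metadata fields a reader would need but cannot find.
missing = [key for key in ("TITLE", "XUNITS", "YUNITS") if not header.get(key)]
print(header["XUNITS"], header["YUNITS"], missing)
```

Because the records are plain text, a check like this needs no vendor software; for real files, a maintained JCAMP-DX reader is likely the better choice.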

Recommendations by use case

  • General table-like data exchange: CSV (with clear headers and units).
  • NMR data exchange: JCAMP-DX, nmrML, NMReDATA.
  • Mass spectrometry exchange and archiving: mzML.
  • Crystallography deposition: CIF.
  • Structure exchange: SDF, SMILES, InChI.
  • Chemical table files (Molfile, rxnfile, SDF): use V2000 for broad interoperability; use V3000 if you need advanced features and tool support is confirmed.
  • Spectral exchange: JCAMP-DX is widely supported across techniques; for very large or complex datasets, check whether a technique-specific format (e.g. nmrML for NMR, mzML for mass spectrometry) suits them better.
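
To see which chemical table file version a tool has written, one option is to inspect the counts line, which is the fourth line of a molfile block and normally ends with a `V2000` or `V3000` tag. The sketch below assumes a single molfile block; the function name `ctfile_version` and the two sample blocks are hypothetical and only serve as illustration.

```python
def ctfile_version(molblock):
    """Return 'V2000', 'V3000', or None for a single molfile block."""
    lines = molblock.splitlines()
    if len(lines) < 4:
        return None
    counts_line = lines[3].rstrip()  # three header lines precede the counts line
    for tag in ("V2000", "V3000"):
        if counts_line.endswith(tag):
            return tag
    return None  # very old or malformed files may carry no version tag


# Hypothetical single-atom V2000 block: name, program line, comment, counts line.
v2000_block = "\n".join([
    "example",
    "  hypothetical-writer",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])

# Hypothetical V3000 block: the counts line is a placeholder ending in V3000.
v3000_block = "\n".join([
    "example",
    "  hypothetical-writer",
    "",
    "  0  0  0     0  0            999 V3000",
    "M  V30 BEGIN CTAB",
    "M  V30 COUNTS 1 0 0 0 0",
    "M  V30 BEGIN ATOM",
    "M  V30 1 C 0 0 0 0",
    "M  V30 END ATOM",
    "M  V30 END CTAB",
    "M  END",
])

print(ctfile_version(v2000_block))  # V2000
print(ctfile_version(v3000_block))  # V3000
```

A check like this can run before files are shared, so that V3000 files are only sent to recipients whose tools are confirmed to read them.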

When possible, select formats with broad software support and active community maintenance.

Common pitfalls

  • Keeping only proprietary instrument files without an open export.
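  • Comparing SMILES strings without canonicalization: one molecule can be written as many different valid SMILES, so a plain string comparison misses matches.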
  • Omitting stereochemistry in SMILES when isomers matter.
  • Sharing table files without clear units or column meaning.
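
The last pitfall can be caught automatically before a table is shared. The following sketch assumes a local convention in which every column header has the form `name (unit)`, with `(-)` marking dimensionless columns; this convention, the function name `columns_missing_units`, and the sample data are assumptions made for illustration, not a community standard.

```python
import csv
import io
import re

# Assumed convention for this sketch: "name (unit)" in every column header.
HEADER_WITH_UNIT = re.compile(r"^.+\s\(.+\)$")


def columns_missing_units(csv_text):
    """Return the header names that do not follow the 'name (unit)' pattern."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in header if not HEADER_WITH_UNIT.match(name.strip())]


# Hypothetical example tables.
good = "sample_id (-),temperature (K),yield (%)\nA1,298.15,87.5\n"
bad = "sample_id,temperature,yield (%)\nA1,298.15,87.5\n"

print(columns_missing_units(good))  # []
print(columns_missing_units(bad))   # ['sample_id', 'temperature']
```

Whatever convention is chosen, documenting it next to the data (for example in a README) tells readers what each column means.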

Common chemistry formats

| Format | Data type | Maintainer | Parent Format | Specification |
|---|---|---|---|---|
| JCAMP-DX | multiple | IUPAC | ASCII, Text | open |
| AnIML | multiple | ASTM | XML | open |
| netCDF | multiple | UCAR | CDF | open |
| CSV | multiple | IETF-RFC | ASCII, Text | open |
| ASCII | multiple | (open) | | self explanatory |
| ISA | multiple | ISA Commons Community | TSV or JSON | open |
| UDM | multiple | Pistoia Alliance | XML | open |
| ADF | multiple | Allotrope | HDF5+RDF | for members |
| mzML | mass spectrometry | HUPO/PSI | XML | open |
| ANDI-MS | mass spectrometry | ASTM International | netCDF | open |
| nmrML | NMR | COSMOS | XML | open |
| NMReDATA | NMR | NMReDATA Initiative | SDF | open |
| Bruker FID | NMR | Bruker | (binary) | proprietary |
| mnova | NMR | Mestrelab | (binary) | proprietary |
| Bruker OPUS | spectroscopy | Bruker | (binary) | proprietary |
| Perkin Elmer | spectroscopy | Perkin Elmer | ASCII, Text | proprietary |
| ThermoFisher Grams | spectroscopy | ThermoFisher | binary | proprietary |

Sources and further information