Synthetic Organic / Inorganic Chemistry

Introduction

During the synthesis of a desired compound, every step generates research data: planning, carrying out and documenting an experiment, as well as characterising the obtained product. These data include synthetic procedures, experimental conditions, and both manually recorded data and digital data collected with analytical devices. Processing and interpreting these data can provide a proof of concept for a given reaction, or optimised conditions for future experiments or upscaling.

Data Types

In synthetic chemistry, different types of research data can be obtained, and these data are not limited to the characterisation of synthesised products. A typical experiment starts with its design and planning, followed by carrying out the procedure in a laboratory setting. While the experiment is carried out, observations, experimental conditions, and yields are documented. Ideally, this manually collected research data is recorded digitally in an Electronic Lab Notebook (ELN).

The synthesis of a specific product is followed by the analysis of its properties. Here, both manually determined and digital data can be obtained. Observations and results of analytical methods with no digital output (i.e., no data files) can be added manually to the ELN entry of the experiment. Examples include melting/boiling point, optical rotation, TLC Rf values, or refractive index. Digital data are obtained from analytical instruments, e.g., NMR, IR, MS. Depending on the setup, these data can be uploaded from the analytical devices to an ELN and analysed therein. An overview of file extensions, file sizes and converters for several analytical methods is given in the table below. It is recommended to save raw data files in their proprietary file formats alongside interoperable open file formats, which can be generated with converters or with the software of the analytical device. If no specific open format is currently available, export as .txt or .csv is recommended.
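As a minimal sketch of the .csv fallback mentioned above, the following Python snippet writes a measured curve as plain CSV text. The function name `spectrum_to_csv`, the column labels and the example values are hypothetical and serve only as an illustration; where a converter or the instrument software can produce an established open format, that route should be preferred.

```python
import csv
import io


def spectrum_to_csv(x_values, y_values, x_label, y_label):
    """Return the data points as CSV text with a single header row.

    Hypothetical helper for illustration; not part of any ELN or vendor software.
    """
    if len(x_values) != len(y_values):
        raise ValueError("x and y must have the same number of points")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([x_label, y_label])  # column names, including units
    for x, y in zip(x_values, y_values):
        writer.writerow([x, y])
    return buffer.getvalue()


# Made-up example values, not measured data
csv_text = spectrum_to_csv(
    [4000, 3998, 3996],
    [0.98, 0.97, 0.95],
    "wavenumber / cm^-1",
    "transmittance",
)
print(csv_text)
```

Keeping the units in the header row means the file remains understandable even when it is separated from the instrument that produced it.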

Overall, metadata should always be included when collecting and storing data to allow understanding of the research data in the long term.
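One simple way to keep metadata together with a data file is a small machine-readable record stored next to it. The sketch below assumes a hypothetical set of required fields (`sample_id`, `method`, `instrument`, `operator`, `measured_on`); these names are illustrative only and do not represent a community standard or the schema of any particular ELN.

```python
import json

# Hypothetical minimum set of fields for this sketch; not a community standard.
REQUIRED_FIELDS = ("sample_id", "method", "instrument", "operator", "measured_on")


def build_metadata_record(**fields):
    """Return the metadata as JSON text, refusing records with missing fields."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return json.dumps(fields, indent=2, sort_keys=True)


# Made-up example values
record = build_metadata_record(
    sample_id="EX-001",
    method="1H NMR",
    instrument="400 MHz spectrometer",
    operator="A. Researcher",
    measured_on="2024-01-15",
    solvent="CDCl3",
)
print(record)
```

Rejecting incomplete records at the time of creation is usually easier than reconstructing missing information later.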

ELNs and Other Tools

For planning research data management and creating data management plans, tools such as RDMO and DMP-Online are suitable. Many universities have their own instances of these solutions.

Applying the FAIR principles to your data retrospectively can be tedious and extremely time consuming, particularly if your existing workflows involve a large degree of analogue documentation (e.g. paper lab notebooks). In practice, you need tools that take care of certain aspects of the FAIR principles automatically so that you do not have to apply them manually each time. Electronic Lab Notebooks (ELNs) are very powerful tools that can help you with this. Depending on which ELN you use, metadata can be assigned automatically in both human-readable and machine-readable formats. Furthermore, some ELNs can automatically generate interoperable open file formats for your analytical data. Choosing the right ELN can be challenging, so the selection process should be planned and carried out carefully. You can find out more in the guidance on how to choose the right ELN.

A tool to help you find the right ELN is the so-called ELN finder, a searchable online repository for many different ELNs. It is important to note that no single ELN fits every need: one ELN may be appropriate for one research group, while another may be more appropriate for a different research group. Within NFDI4Chem, Chemotion ELN is the reference instance (find out more in our knowledge base article giving an overview of Chemotion). This means that our developments in automatically applying the FAIR data principles to research data are implemented in Chemotion first.

Chemotion is especially suitable for synthetic chemistry, as it started out as an ELN for this field. It has since been extended to a wider array of scientific disciplines through its LabIMotion extension.

Publishing Data

Publishing research data is important in order to allow for the reuse of data by other researchers or for machine learning. Especially for machine learning, it is crucial that the data is published in a structured and standardised way. Where can you publish your data? Open access data repositories are a good solution to provide your data for reuse by others. Choosing the right repository is crucial and as a general rule of thumb it is better to deposit your data in data-specific or discipline-specific repositories as these enforce more standardisation in how the data are published thus allowing for better machine-readability.

In order to reach as many researchers as possible, choosing the right repository can be crucial (more on this in the article on choosing the right repository).

Above you can see a modified version of the decision tree from our guide on how to choose the right repository. The following table gives an overview of which data fits into which repository:

| Data type | Data format | Suggested repository | Criteria for selection |
| --- | --- | --- | --- |
| Nuclear Magnetic Resonance | Bruker XWIN-NMR format (zip), JCAMP-DX | Chemotion | Passing basic checks, curation |
| Nuclear Magnetic Resonance | Bruker XWIN-NMR format, JEOL format, NMReData, nmrML, ISA JSON | nmrXiv | Validations / Minimum information reporting standards |
| Molecules and their properties, identification, reactions and experimental investigations | Mass spectrometry: JCAMP-DX, mzML, mzXML (open, visualisable and processable), RAW for selected mass data types (processed and converted in JCAMP-DX); IR and Raman: JCAMP-DX; XRD: JCAMP-DX; UV/VIS: JCAMP-DX; Cyclic voltammetry: JCAMP-DX. *Chemotion repo offers the option to convert data from different file formats into JCAMP-DX. | Chemotion | Passing basic checks, curation |
| Inorganic crystal structures | Crystallographic Information File (CIF) | ICSD | Crystal structure data available |
| Organic and metal-organic crystal structures | Crystallographic Information File (CIF) but other supporting file formats accepted | CSD | Cell parameters (single crystal), full coordinates (powder), in CIF format |
| Organic, inorganic and metal-organic crystal structure data | Primarily Crystallographic Information File (CIF) but other supporting file formats accepted | Joint CCDC/FIZ Access Structures Service | At least one CIF file must be included in the submission and structure factor data for all structures should be provided (if possible) |
| Generic data from all disciplines of chemistry, all data that do not fit in the disciplinary repositories | Format-independent | RADAR4Chem | Validation against metadata schema |
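The table above can be condensed into a simple lookup, sketched here in Python. The dictionary keys and the function name `suggest_repositories` are hypothetical labels chosen for this illustration, and the mapping is deliberately simplified: it ignores the selection criteria in the last column, which still need to be checked for each submission.

```python
# Lookup condensed from the table above; keys are hypothetical labels for this sketch.
CRYSTAL_JOINT_SERVICE = "joint CCDC/FIZ Access Structures Service"

REPOSITORIES_BY_DATA_TYPE = {
    "nmr": ["Chemotion", "nmrXiv"],
    "mass spectrometry": ["Chemotion"],
    "ir": ["Chemotion"],
    "raman": ["Chemotion"],
    "xrd": ["Chemotion"],
    "uv/vis": ["Chemotion"],
    "cyclic voltammetry": ["Chemotion"],
    "inorganic crystal structure": ["ICSD", CRYSTAL_JOINT_SERVICE],
    "organic crystal structure": ["CSD", CRYSTAL_JOINT_SERVICE],
    "metal-organic crystal structure": ["CSD", CRYSTAL_JOINT_SERVICE],
}

# Generic fallback for data that do not fit a disciplinary repository
GENERIC_REPOSITORY = "RADAR4Chem"


def suggest_repositories(data_type):
    """Return the suggested repositories for a data type, or the generic fallback."""
    key = data_type.strip().lower()
    return list(REPOSITORIES_BY_DATA_TYPE.get(key, [GENERIC_REPOSITORY]))


print(suggest_repositories("NMR"))
print(suggest_repositories("thermogravimetry"))
```

Any data type that is not listed falls through to the generic repository, mirroring the last row of the table.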

Your own institution may also have additional guidelines and resources for publishing data, so it is always worth consulting the research data management experts at your local institution.

Challenges

For some data types and workflows it is obvious how to comply with the FAIR principles. For others it is not, because no community standards have been set and/or no appropriate open data formats are available. This is especially true for more niche analytical methods. Remember though: FAIR is a spectrum, not an absolute. Even if one of your workflows is not as FAIR as another, making it as FAIR as is currently possible is still worth doing.

Many older devices do not output open data formats, and some devices have no digital output at all. This makes good RDM more challenging, though not impossible given the right tools (e.g. Chemotion’s ChemConverter, which automatically generates open file formats for analytical devices that are not capable of outputting them).

One of the biggest challenges to RDM in chemistry at the moment is the lack of inter-ELN interoperability: it is very challenging, if not impossible, to transfer data between different ELNs. This is a particular obstacle for interdisciplinary collaborations in which the collaborating groups use different ELNs. There are, however, efforts underway to establish inter-ELN interoperability, such as the ELN consortium, of which Chemotion is a member.