Research Data Life Cycle
Introduction
In scientific work, adherence to good research practice is paramount.
To ensure compliance, professional handling of research data is particularly important. Various models illustrate how research data is handled, such as the domain model or the research data life cycle model. The research data life cycle model describes the lifespan of the data, and what happens beyond it, in phases that run from planning to publication or deliberate deletion. Depending on the institution, the funder, etc., there are different approaches to the same model, and each can place different priorities on the data life cycle. The approach taken by NFDI4Chem is shown in the figure on the right.
Phase 1: Experiment Design
In phase 1, you plan your experiment design as well as your data management (formats, storage …). During this planning phase, you should locate existing data and check the legal framework for its use. You should be aware of the various requirements of the research funder, your university, your institution, and your community, and you should write a first draft of your data management plan (DMP). This first draft should also define the responsibilities.
Phase 2: Experiment/Data Collection
New data is created or collected in this phase, e.g. through chemical synthesis, simulations of molecules, or measurements carried out on samples. When conducting the respective experiment, make sure that you also create or record the relevant metadata that describes the data. Document the created data, together with all further information and metadata, in an (electronic) laboratory notebook.
If you use existing data or samples, clarify the rights of use. In this case, provenance metadata may provide you with information. In addition, you should create a link between the existing data and the newly generated data. This link is relevant information and helps you in the re-use phase.
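As a rough illustration of recording metadata together with a link to the existing data it builds on, the following Python sketch writes a small metadata record as JSON. The class name, the field names, the file name, and the example identifier are all hypothetical and do not follow any particular metadata standard; an electronic lab notebook would normally capture this information for you.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from typing import List


@dataclass
class MeasurementRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    sample_id: str
    method: str
    operator: str
    data_file: str
    # Timestamp of when the record was created (UTC, ISO 8601)
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    # Identifiers of existing datasets or samples this measurement builds on
    derived_from: List[str] = field(default_factory=list)


def to_json(record: MeasurementRecord) -> str:
    """Serialise the record so it can be stored next to the data file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Example values are placeholders, not real identifiers.
record = MeasurementRecord(
    sample_id="sample-001",
    method="1H NMR",
    operator="A. Researcher",
    data_file="sample-001_1H.jdx",
    derived_from=["doi:10.0000/example-dataset"],
)
print(to_json(record))
```

Keeping the `derived_from` list in the same record as the descriptive fields means the link to the source data travels with the new data.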
Phase 3: Data Processing
In the data processing phase, the collected data should be digitised, as far as possible, if it is not already in a digital format. An electronic lab notebook can greatly assist in this step and brings the data together, whether digital or digitised. Furthermore, you should think about collecting further information and metadata on the data. Think about the different types of metadata such as descriptive, administrative, or structural metadata. Enriching the data with further, machine-readable metadata makes the data searchable. Moreover, quality assurance is important in this phase. Check, validate and clean the data! You should save the data and prepare a data backup.
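The cleaning and backup steps can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function names are hypothetical, "cleaning" is reduced to dropping readings that are missing, non-numeric, or not finite, and the backup is a simple file copy whose integrity is checked with a SHA-256 checksum. Real quality assurance depends on the method and the data type.

```python
import hashlib
import math
import shutil
from pathlib import Path


def clean_readings(readings):
    """Return only the readings that can be parsed as finite numbers."""
    cleaned = []
    for value in readings:
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # missing or non-numeric entry
        if math.isfinite(number):
            cleaned.append(number)
    return cleaned


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy a file into backup_dir and confirm the copy is identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup of {source} does not match the original")
    return target


print(clean_readings(["1.5", None, "abc", float("nan"), 2]))  # [1.5, 2.0]
```

Comparing checksums after copying catches a corrupted or truncated backup, a problem that would otherwise only surface when the backup is needed.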
Phase 4: Analysis
In the 4th phase, as the name suggests, the focus is on the analysis and interpretation of the data. After the analysis, the data should be evaluated. You should also consider sharing the data with colleagues in a closed and secure environment, for example at project or working group level. Secure environments for sharing data are often provided by universities or federal states through Sync&share solutions. Consult your local research data team about this.
Before sharing data, you should check whether the data is subject to copyright protection or other protective rights.
Phase 5: Disclosure/Publication
While sharing the data and reflecting on it, you should think about archiving the data and using it in scientific publications. If you are not aware of any criteria for archiving and none are specified in your working group or institute, decision-making guides such as the “5 steps to decide what data to keep” outlined by the DCC can help. Based on the established criteria, you determine which of the collected raw data should be archived and which should be deliberately deleted.
In addition to the criteria, the migration of the data into suitable formats and onto suitable media is important for archiving the data. In this step, the data should again be enriched with metadata so that it can be understood in the future without further knowledge about the data.
In addition to archiving, the publication of the data plays a special role. Many research funders expect the data to be published if there are no special reasons not to do so, such as a non-disclosure agreement or the inclusion of personal data. A chemistry-specific or chemistry-related repository such as the Chemotion Repository, NOMAD, or MassBank is recommended for the publication of data. An overview of repositories can be found, for example, at re3data.org or fairsharing.org. re3data.org allows you to filter repositories according to certain criteria such as the assignment of a persistent identifier or access.
Data publishing often takes place at certain milestones, for example, in combination with a text publication or at the end of a project. The final version of the data management plan is also required at the end of a project.
Phase 6: Re-use
In the re-use phase, you or others conduct further research on how the data can be put into new contexts and how new ideas can be generated. Questions such as “How can I further develop my synthesis strategy?” or “Which settings do I need to change in my spectroscopic measurements?” are explored in this phase. New usage scenarios can also arise in chemistry or in other disciplines, such as Big Data applications. Furthermore, these data and the insights gained from them can be valuable for teaching and learning purposes.
In order to return these data to the cycle, it is important that these data are described in detail with metadata, that proper documentation has been carried out in the form of a DMP, and that the data are citable.
What is the potential of your data?
Sources and further information
- DFG Guidelines for Safeguarding Good Research Practice. Code of Conduct
- Introducing research data management as a service suite at RWTH Aachen University
- Review of data management lifecycle models
- DCC: 5 steps to decide what data to keep
- German: Überblick zum Management von Forschungsdaten (FDM I) = Research Data Management - An Overview
- German: Forschungsdaten.info: Informieren und Planen