Chemical Table File Formats
Introduction
Chemical Table files (CTfiles) are a group of text-based chemical file formats that describe collections of atoms such as molecules, intermetallic compounds, mixtures, formulations, polymers or unconnected atoms. A connection table (Ctab) lists the atoms with their x, y and z coordinates in an atom block and the bonds in a bond block. CTfiles may also be used to store additional information. The following section summarises the essential details of these formats.
Molfiles, rxnfiles, SDfiles and RDfiles
The structures and relations of the most relevant CTfiles are shown in figure 1. The molfile is the simplest CTfile; it contains the Ctab, including the atom coordinates, of a single collection of atoms. The reaction file (rxnfile) wraps several molfiles of reactants and products. The specification does not provide for agents such as solvents, reagents or catalysts, which are usually drawn above or below the reaction arrow. The SDfile (structure-data file) contains one or more molfiles and a data block for additional information, while the RDfile (reaction-data file) may also include rxnfiles.
Figure 1: Relationship between molfile, rxnfile, SDfile and RDfile formats following the specification. RGfiles and RDfiles containing molfiles omitted for clarity.
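The nesting of molfiles inside an SDfile can be illustrated with a short Python sketch. It assumes that each record ends with a `$$$$` line and that each molfile ends with an `M  END` line; the function name `split_sdfile` and the sample records (placeholders with empty connection tables) are hypothetical and only serve the illustration.

```python
def split_sdfile(text):
    """Split SDfile text into (molfile, data_block) pairs, one per record."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith("$$$$"):      # assumed record delimiter
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # tolerate a missing final delimiter
        records.append(current)

    pairs = []
    for rec in records:
        # The molfile part is assumed to end at the first "M  END" line.
        end = next((i for i, l in enumerate(rec) if l.startswith("M  END")), None)
        if end is None:
            pairs.append(("\n".join(rec), ""))
        else:
            pairs.append(("\n".join(rec[:end + 1]), "\n".join(rec[end + 1:])))
    return pairs


# Two placeholder records: header block, counts line, no atoms, one data item.
sample = "\n".join([
    "record one", "  example", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    ">  <NAME>", "first", "",
    "$$$$",
    "record two", "  example", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    ">  <NAME>", "second", "",
    "$$$$",
])

pairs = split_sdfile(sample)
print(len(pairs))
```

Each pair keeps the molfile text and the data block text separate, so the two parts can be passed on to different parsers.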
The data block may consist of several data items. Each data item starts with a one-line data header, followed by the data, which may span multiple lines, and is terminated by a blank line.
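A minimal Python sketch of reading such data items is given below. It assumes the common convention in which the data header begins with `>` and carries the field name in angle brackets, and in which a `$$$$` line closes the record; the function name `parse_data_block` and the field names in the example are hypothetical.

```python
import re


def parse_data_block(lines):
    """Collect data items (field name -> text) from the lines of one data block."""
    items = {}
    name = None        # field name of the item currently being read
    values = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("$$$$"):          # assumed end of the record
            break
        if name is None:
            if line.startswith(">"):         # data header, e.g. ">  <NAME>"
                match = re.search(r"<([^>]*)>", line)
                name = match.group(1) if match else line[1:].strip()
                values = []
        elif line.strip() == "":             # blank line terminates the item
            items[name] = "\n".join(values)
            name = None
        else:
            values.append(line)
    if name is not None:                     # item without a closing blank line
        items[name] = "\n".join(values)
    return items


example = [
    ">  <NAME>",
    "ethanol",
    "",
    ">  <COMMENT>",
    "first line",
    "second line",
    "",
    "$$$$",
]

print(parse_data_block(example))
```

Multi-line data are joined with newlines so that the original line structure of each item is preserved.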
CTfiles do not use controlled vocabularies or ontologies in the data block, but these can easily be added or linked because the CTfile formats are text-based. One example is the controlled vocabulary for SDtags of the open analytical data format NMReDATA. The data block can also be used to store supplementary data that enhance the machine-readability of chemical structures.
There are two variants of these formats in use: V2000 and V3000. The extended V3000 format was released to overcome limitations of the V2000 format. The V2000 format supports at most 999 atoms and is therefore not applicable to large molecules. Moreover, the V2000 format does not support R-groups. In addition to lifting these limits, the V3000 format includes a collection block, a template block, an extended connection table (extended Ctab) and enhanced stereochemistry features. However, the V3000 variant is currently not supported by all tools and applications. The V2000 format should therefore be used to maximise interoperability, especially for chemistry dealing with small molecules.
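The following Python sketch shows how a tool might tell the two variants apart and check the V2000 size limit. It assumes the fixed-width V2000 counts line (the fourth line of a molfile), with the atom count in the first three characters and the version tag in characters 34 to 39; the function names and example lines are hypothetical.

```python
V2000_MAX_ATOMS = 999  # limit stated above for the V2000 format


def ctab_version(counts_line):
    """Return 'V2000' or 'V3000' from the version tag, else 'unknown'."""
    tag = counts_line[33:39].strip()
    return tag if tag in ("V2000", "V3000") else "unknown"


def atom_count_v2000(counts_line):
    """Read the atom count from the first three characters of a V2000 counts line."""
    return int(counts_line[0:3])


def fits_v2000(n_atoms):
    """Check whether a structure with n_atoms atoms can be written as V2000."""
    return 0 <= n_atoms <= V2000_MAX_ATOMS


# Example counts lines, assembled piecewise to keep the column widths explicit.
v2000_line = "  3  2" + "  0" * 8 + "999" + " V2000"
v3000_line = "  0  0  0" + " " * 3 + "  0  0" + " " * 12 + "999" + " V3000"

print(ctab_version(v2000_line), atom_count_v2000(v2000_line))
print(ctab_version(v3000_line))
print(fits_v2000(1200))
```

A writer could use a check such as `fits_v2000` to prefer V2000 for interoperability and fall back to V3000 only when a structure exceeds the limit.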
CTfiles were created by MDL Information Systems, which was later acquired by Symyx Technologies. Symyx merged with Accelrys, which was subsequently renamed BIOVIA and is now part of Dassault Systèmes. These formats can be used without licence restrictions, in accordance with European Union Directive 2009/24/EC and judgment C-406/10 of the Court of Justice of the European Union (CJEU).
Sources and further information
Main author: ORCID:0000-0003-4480-8661