What is chemoinformatics?

The health and environmental effects of chemicals are increasingly relevant to the general public, regulatory agencies, academia, and industry. Thanks to the evolution of computational techniques, we now have powerful tools for data management and complex mathematical calculations. In this context, chemoinformatics, one of the most relevant fields within the New Approach Methodologies (NAMs), has emerged as the set of computational methods used to transform and analyse chemical data and to determine the relationship between the chemical structure of molecules and their function. Building on this knowledge, these methods are used to design new compounds or to optimize existing compounds of interest.

Frank K. Brown, who coined the term “chemoinformatics” in 1998, also stated that the main objective of this discipline is to develop chemical compounds better and faster. In drug development, for example, a process that normally lasts 12 to 15 years and requires investments of around 500 million dollars could be made significantly shorter and cheaper. Although chemoinformatics has been closely linked to drug design and development since its origins, its methods apply to bioactive compounds in general, that is, to any compounds that have some type of effect on living organisms, tissues, or cells.

Thanks to chemoinformatics, we can collect, classify, and sort large amounts of experimental data. These data are analysed to identify characteristics that reveal patterns in the behaviour of the compounds. These patterns are captured in models, that is, informative representations of reality. Chemoinformatic structure-activity relationship (SAR) models began as mathematical formulas, but, with the advancement of computing techniques, they have reached high levels of complexity and accuracy, including the use of artificial intelligence through machine learning and/or deep learning. In fact, two characteristics are common to the different fields of chemoinformatics: their large scale (big data) and their twofold aim of, on the one hand, understanding the relationships between the structure and the activity of compounds and, on the other, predicting bioactivity or other property values for compounds that lack the corresponding experimental values.
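
To make the idea of an SAR model “as a mathematical formula” concrete, the following minimal Python sketch fits a one-descriptor linear model, activity = slope × logP + intercept, by ordinary least squares and then predicts the activity of an untested compound. The choice of descriptor (logP), the activity scale (pIC50), and every number are hypothetical toy values; real SAR/QSAR models usually combine many descriptors and often use machine-learning methods instead of a single straight line.

```python
"""Minimal one-descriptor QSAR sketch: activity = slope * logP + intercept.

All descriptor and activity values are invented toy numbers,
not measurements of real compounds.
"""
from statistics import mean


def fit_linear_qsar(descriptor, activity):
    """Fit activity = slope * descriptor + intercept by ordinary least squares."""
    x_bar, y_bar = mean(descriptor), mean(activity)
    # Covariance-like and variance-like sums used by the closed-form OLS solution
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(descriptor, activity))
    sxx = sum((x - x_bar) ** 2 for x in descriptor)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the fitted linear model to a new descriptor value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training set: logP (descriptor) and pIC50 (activity)
train_logp = [1.0, 2.0, 3.0, 4.0]
train_pic50 = [5.1, 5.9, 7.1, 7.9]

model = fit_linear_qsar(train_logp, train_pic50)
print("slope=%.2f intercept=%.2f" % model)
# Predict a compound without an experimental value (hypothetical logP = 2.5)
print("predicted pIC50: %.2f" % predict(model, 2.5))
```

With these toy numbers the fit gives a slope of 0.96 and an intercept of 4.10, so the hypothetical compound with logP 2.5 is predicted at a pIC50 of 6.50.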

These models can predict properties in very diverse areas: physicochemical properties; interactions with other compounds or with macromolecules (proteins, RNA, DNA, etc.); bioactivity; ADME properties (absorption, distribution, metabolism, and excretion of the compounds); human or environmental toxicity (ecotoxicity); and even efficacy in treating a disease of interest. The performance of these models is assessed by comparing their predictions with real data from new experiments, and these data are then used to refine the models in a process known as induced learning.
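
The assessment step can be sketched as comparing predicted values with newly measured ones using standard error metrics. The minimal Python example below computes the root-mean-square error (RMSE) and the coefficient of determination (R²). The choice of these two metrics and all the values are assumptions for illustration, not a prescribed validation protocol.

```python
"""Sketch: assess a model by comparing its predictions with new experimental data.

The observed and predicted values are hypothetical toy numbers.
"""
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between measured and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical new measurements versus what the model predicted beforehand
observed = [6.0, 7.0, 8.0]
predicted = [6.5, 7.0, 7.5]
print("RMSE:", round(rmse(observed, predicted), 3))
print("R^2:", round(r_squared(observed, predicted), 3))
```

For these toy values the RMSE is about 0.408 and R² is 0.75. In the refinement step, the new measurements would then be added to the training data and the model refitted.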

Computational models, also known as in silico models, accelerate the discovery and development of new bioactive compounds because they allow the virtual screening of sets of known compounds (called “libraries”) to rationally select those with the most favourable characteristics. In this way, the number of compounds that must be tested in vitro or in vivo is significantly reduced, making research faster and less expensive, with fewer associated ethical and translational problems.
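
A virtual screen of a library can be sketched as a filter followed by a ranking. This minimal Python example assumes that each compound already has precomputed descriptors and a predicted activity from some model; the compound names and all values are hypothetical. As the filter it uses Lipinski's rule of five (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 acceptors), tolerating one violation. This is a common drug-likeness heuristic, not the only possible selection criterion.

```python
"""Sketch of a property-based virtual screen over a small compound library.

Assumptions: descriptors and predicted activities are precomputed, and the
library entries are hypothetical. Filtering uses Lipinski's rule of five
with one violation tolerated by default.
"""
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float          # molecular weight in Da
    log_p: float               # octanol-water partition coefficient
    h_donors: int              # hydrogen-bond donors
    h_acceptors: int           # hydrogen-bond acceptors
    predicted_activity: float  # e.g. a pIC50 from a hypothetical model


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four thresholds the compound exceeds."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def virtual_screen(library, max_violations=1, top_n=3):
    """Keep compounds within the violation limit, then rank by predicted activity."""
    passing = [c for c in library if rule_of_five_violations(c) <= max_violations]
    passing.sort(key=lambda c: c.predicted_activity, reverse=True)
    return passing[:top_n]


library = [
    Compound("cmpd-A", 320.4, 2.1, 2, 5, 6.8),
    Compound("cmpd-B", 610.7, 6.3, 4, 9, 8.2),  # two violations: rejected
    Compound("cmpd-C", 450.2, 3.8, 1, 7, 7.4),
    Compound("cmpd-D", 280.3, 1.2, 3, 4, 5.9),
    Compound("cmpd-E", 515.6, 4.4, 2, 8, 7.0),  # one violation: kept
]

for hit in virtual_screen(library):
    print(hit.name, hit.predicted_activity)
```

Here the screen returns cmpd-C, cmpd-E, and cmpd-A, in that order. Note that cmpd-B is discarded despite having the highest predicted activity, which illustrates how a filter on favourable properties narrows the list of candidates that would go on to in vitro or in vivo testing.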

The computational search for bioactive compounds is applied in many fields: drug development; the food industry, for selecting additives; the agri-food industry, for producing pesticides, fertilizers, or livestock feed; and the manufacture of cleaning products, paints, and more. Furthermore, in silico models can be used not only to select compounds with specific functions, but also to avoid using compounds that are environmentally toxic or harmful to health.

We are in an era in which we can rely on computational techniques to obtain tangible benefits in academia and industry, helping us both to improve our health and to take care of the environment. Thanks in part to chemoinformatics, a promising future certainly lies ahead.


  1. Brown, F. K. (1998). Chapter 35 – Chemoinformatics: What is it and How does it Impact Drug Discovery. In J. A. Bristol (Ed.), Annual Reports in Medicinal Chemistry (Vol. 33, pp. 375–384). Academic Press. https://doi.org/10.1016/S0065-7743(08)61100-8
  2. Gozalbes, R., & Pineda-Lucena, A. (2011). Small molecule databases and chemical descriptors useful in chemoinformatics: an overview. Combinatorial Chemistry & High Throughput Screening, 14(6), 548–558.
  3. Engel, T., & Gasteiger, J. (2018). Chemoinformatics: Basic Concepts and Methods. John Wiley & Sons.
  4. Trujillo, A. G. P. (2011). La quimioinformática, una herramienta eficiente para desarrollar los medicamentos del futuro [Chemoinformatics, an efficient tool for developing the medicines of the future]. Teoría y Praxis Investigativa, 6(1), 77–86.