
What computational methods can we use to predict properties of substances?

In the previous blog entry, we talked about chemoinformatics within the so-called alternative methods. However, what computational methods can we use to predict a toxicological or physicochemical property of a substance, according to current regulations? How do we decide which one to use? In this post we bring you the answer.

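Four methods are commonly accepted under regulations such as REACH, which governs the production and import of chemicals in Europe: read-across, trend analysis, structure-activity relationship (SAR) models and quantitative structure-activity relationship (QSAR) models.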

These computational methods are based on the calculation of molecular descriptors, which can be defined as numeric values that quantify properties calculated from the structure of a molecule. Descriptors come in different types, ranging from physicochemical (e.g., number of atoms of each element) to topological (one- or two-dimensional relationships between the atoms of the molecule) and structural (3D relationships), among others.
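To make the idea of a descriptor concrete, here is a minimal sketch in Python that derives a few simple descriptors from a molecular formula. The function names (`atom_counts`, `descriptors`) and the small atomic mass table are our own assumptions for illustration; real projects normally rely on a chemoinformatics toolkit and work from the full structure rather than the formula alone.

```python
import re

# Approximate standard atomic masses (g/mol) for a few common elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def atom_counts(formula):
    """Count the atoms of each element in a simple formula such as 'C2H6O'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def descriptors(formula):
    """Return a few simple numeric descriptors (hypothetical selection)."""
    counts = atom_counts(formula)
    return {
        "n_atoms": sum(counts.values()),
        "n_heavy_atoms": sum(n for el, n in counts.items() if el != "H"),
        "mol_weight": round(sum(ATOMIC_MASS[el] * n for el, n in counts.items()), 3),
    }


ethanol = descriptors("C2H6O")
print(ethanol)
```

Each value in the resulting dictionary is one descriptor. A vector of such values is what the prediction methods discussed in this post take as input.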

When there is no experimental information on the molecule of interest, the usual first step is to search for structural analogues, that is, chemical compounds whose structure is similar to that of the molecule of interest. If such analogues exist, the next step is to define the type of prediction: qualitative or quantitative.

Qualitative predictions yield classifications, such as active / inactive or toxic / non-toxic. Quantitative (regression) predictions, on the other hand, yield a numeric value for the property, such as log Kow (logarithm of the octanol-water partition coefficient), log S (logarithm of the water solubility) or boiling point.

If the prediction is qualitative, a first possible computational method is read-across. This method groups substances according to their structural similarities, such as common functional groups or precursors, so that a property of the molecule of interest can be inferred from the data available for similar compounds.
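As a rough sketch of the read-across idea, the Python example below represents each compound as a set of structural features, keeps the analogues whose Tanimoto similarity to the target reaches a threshold, and takes the majority label among them. The feature names, labels and the 0.5 threshold are made up for illustration, and the majority vote is just one simple way of combining analogue data, not a regulatory procedure.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_across(target, analogues, min_similarity=0.5):
    """Majority label among sufficiently similar analogues, or None if there are none."""
    votes = Counter(
        label
        for features, label in analogues
        if tanimoto(target, features) >= min_similarity
    )
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical analogues: (structural features, experimental classification).
analogues = [
    ({"aromatic_ring", "nitro", "methyl"}, "toxic"),
    ({"aromatic_ring", "nitro", "chloro"}, "toxic"),
    ({"aromatic_ring", "hydroxyl"}, "non-toxic"),
    ({"alkyl_chain", "hydroxyl"}, "non-toxic"),
]

target = {"aromatic_ring", "nitro", "ethyl"}
prediction = read_across(target, analogues)
print(prediction)
```

Returning `None` when no analogue is similar enough is a deliberate choice: it signals that read-across is not applicable and another method should be considered.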

If the prediction is quantitative, we can use trend analysis. This method is very similar to read-across, but instead of transferring a classification it fits a regression across the group of analogues in order to predict the value of the property studied.
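Trend analysis can be sketched as an ordinary least-squares line fitted through a group of analogues. In the Python example below, the descriptor (carbon chain length) and the property values are invented so that the arithmetic is easy to follow; they are not experimental data, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data for a group of analogues that differ only in chain length.
chain_length = [4, 5, 6, 8]
property_value = [2.0, 2.5, 3.0, 4.0]

slope, intercept = fit_line(chain_length, property_value)

# Interpolate the property for the untested member with 7 carbons.
predicted = slope * 7 + intercept
print(round(predicted, 2))
```

The target here lies inside the range covered by the analogues. Predictions outside that range are extrapolations and deserve more caution.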

If structural analogues of the molecule of interest are not available, methods based on the structure-activity relationship are used. Both qualitative (SAR) and quantitative (QSAR) methods exist. They are based on the relationship between the chemical structure of a compound and its biological or chemical activity. These methods have been applied since the 19th century, when the biological activity of certain alkaloids was correlated with their molecular composition, but they have made great strides in recent years thanks to, among other factors, increased computational capacity and machine learning algorithms.
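The decision path described in this post can be summarised as a small function. This is only our reading of the text, written as a Python sketch; `choose_method` and its arguments are hypothetical names, and an actual regulatory assessment involves more criteria than these two questions.

```python
def choose_method(has_analogues, prediction_type):
    """Return the candidate methods for a given situation.

    has_analogues   -- True if structural analogues with data are available
    prediction_type -- "qualitative" or "quantitative"
    """
    if prediction_type not in ("qualitative", "quantitative"):
        raise ValueError("prediction_type must be 'qualitative' or 'quantitative'")

    if has_analogues:
        # Analogue-based methods.
        return ["read-across"] if prediction_type == "qualitative" else ["trend analysis"]

    # No analogues: fall back on structure-activity relationship models.
    # SAR is qualitative only; QSAR can serve both kinds of prediction.
    return ["SAR", "QSAR"] if prediction_type == "qualitative" else ["QSAR"]


for analogues in (True, False):
    for kind in ("qualitative", "quantitative"):
        print(analogues, kind, choose_method(analogues, kind))
```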

SAR methods use structural rules and alerts created from expert knowledge to build models that relate substructures or fragments of molecules to a biological or chemical property. They detect a specific structural fragment that is known to be responsible for the property studied, such as toxicity, mutagenicity or carcinogenicity. These methods allow us to make qualitative predictions.
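A structural alert check can be sketched in Python as a lookup of known fragments in the structure of a molecule. The example below is a strong simplification: it searches for text patterns inside SMILES strings, whereas real SAR tools perform proper substructure matching, so the same fragment written differently (for instance an epoxide written as `C1CO1`) would be missed. The alert list is a short hypothetical one chosen for illustration.

```python
# Hypothetical alert list: fragment name -> SMILES text pattern.
ALERTS = {
    "nitro group": "[N+](=O)[O-]",
    "epoxide": "C1OC1",
}


def find_alerts(smiles):
    """Return the names of the alerts whose pattern appears in the SMILES string."""
    return [name for name, pattern in ALERTS.items() if pattern in smiles]


def classify(smiles):
    """Qualitative prediction: 'alert' if any fragment is found, else 'no alert'."""
    return "alert" if find_alerts(smiles) else "no alert"


nitrobenzene = "c1ccccc1[N+](=O)[O-]"
ethanol = "CCO"

print(find_alerts(nitrobenzene), classify(nitrobenzene))
print(find_alerts(ethanol), classify(ethanol))
```

Note that the absence of an alert only means that none of the listed fragments was found. It does not demonstrate that the substance lacks the property.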

QSAR models are generated through statistical and machine learning methods that exploit the correlation between the molecular descriptors and the biological or chemical property studied. The most widely used methods include multiple linear regression, k-nearest neighbours, support vector machines and neural networks. These models allow us to make both qualitative and quantitative predictions.
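As a minimal illustration of a QSAR model, the Python sketch below implements a k-nearest neighbours regression: the property of a new compound is estimated as the mean of the values of the k training compounds closest to it in descriptor space. The descriptors and property values are made up, and `knn_predict` is a hypothetical helper, not the API of any particular QSAR package.

```python
import math


def knn_predict(query, training_set, k=3):
    """Mean property value of the k training compounds nearest to the query."""
    nearest = sorted(training_set, key=lambda row: math.dist(query, row[0]))[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Made-up training data: ((descriptor_1, descriptor_2), property value).
training_set = [
    ((1.0, 0.0), 2.0),
    ((1.2, 0.0), 2.4),
    ((1.1, 1.0), 1.0),
    ((3.0, 2.0), -1.0),
]

prediction = knn_predict((1.1, 0.0), training_set, k=2)
print(round(prediction, 2))
```

For a qualitative prediction, the mean would be replaced by a majority vote over the labels of the neighbours. In practice the descriptors are usually scaled first, so that no single descriptor dominates the distance.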
