Chemoinformatics

The following computational techniques rely exclusively on the chemical structures of compounds with a known activity, and not on the structure of a particular target:

  • Calculation of molecular descriptors: numerical values associated with the structural features of chemical compounds, where different sets of descriptors encode different chemical information. We can calculate more than a thousand of these indices, covering simple structural data (number of atoms, bonds, rings, etc.), topological information (shape, size, molecular branching), physicochemical properties (hydrophilicity/hydrophobicity, polarizability, etc.), and descriptors that depend on molecular conformation.
  • Filtering of chemical compounds according to predefined rules: in the initial stages of medicinal chemistry projects it is common to apply standardized rules such as the well-known “Lipinski rules” (for the selection of drug-like compounds), “Oprea rules” (for the selection of lead-like compounds) or the “rule of three” (for the selection of fragments). In ProtoQSAR we have the means to calculate standard parameters (such as molecular weight, number of hydrogen-bond donors/acceptors, cLogP, number of rotatable bonds, polar surface area (PSA), etc.), which allow us to classify compounds according to these rules.
  • Analysis of chemical similarity and/or diversity: in ProtoQSAR we can characterize molecules with structural fingerprints such as the “MACCS keys”, compare them with standard similarity measures such as the Tanimoto coefficient, and select subsets of compounds based on their structural similarity/diversity.
  • Alignment of small molecules: 3D superposition of potential and known ligands (after conformational sampling of both types of structures) in order to deduce the structural requirements for a given biological activity.
  • QSAR: construction of mathematical models that relate the structure of molecules, encoded in silico as descriptors, to a biological property or activity through the use of statistical tools. Once a correlation has been established, it can be used to predict the property or biological effect of new structures.
  • Read-across (aka “neighborhood behavior”): when there is not enough data to build QSAR models, this method is a simpler alternative based on the well-known principle of “chemical similarity”: chemicals with common structural features usually exhibit similar physicochemical and biological properties. Therefore, compounds that share structural similarities can be grouped into a chemical category, and data from some members of this group can be used to estimate the properties of other members of the same group.
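
As a rough illustration of how three of the techniques above fit together (rule-based filtering, Tanimoto similarity, and read-across), the following Python sketch uses only the standard library. It is not ProtoQSAR's implementation. The Compound class, the function names, the descriptor values, the fingerprint bits and the activity values are all invented for the example. The rule-of-five cut-offs are the commonly quoted ones (molecular weight ≤ 500, cLogP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors). Tolerating one violation, and estimating a property as a similarity-weighted mean of the nearest analogues, are simple conventions assumed here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Compound:
    """Hypothetical record holding precomputed descriptors for one molecule."""
    name: str
    mol_weight: float            # g/mol
    clogp: float                 # calculated logP
    h_donors: int                # hydrogen-bond donors
    h_acceptors: int             # hydrogen-bond acceptors
    fingerprint: FrozenSet[int]  # indices of the "on" bits of a structural key
    activity: Optional[float] = None  # measured property, if available


def lipinski_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five criteria the compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def is_drug_like(c: Compound, max_violations: int = 1) -> bool:
    """Assumed convention: accept compounds with at most one violation."""
    return lipinski_violations(c) <= max_violations


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    # Assumed convention: two empty fingerprints have similarity 0.
    return len(a & b) / union if union else 0.0


def read_across(target: Compound, analogues: List[Compound],
                k: int = 2, min_similarity: float = 0.5) -> Optional[float]:
    """Similarity-weighted mean activity of the k closest analogues.

    Returns None when no analogue with known activity is similar enough.
    """
    scored = [
        (tanimoto(target.fingerprint, a.fingerprint), a.activity)
        for a in analogues
        if a.activity is not None
    ]
    close = [s for s in scored if s[0] >= min_similarity]
    neighbours = sorted(close, key=lambda s: s[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None
    return sum(sim * act for sim, act in neighbours) / total


# Invented example data: three compounds with known activity and one query.
library = [
    Compound("A", 320.0, 2.1, 2, 5, frozenset({1, 2, 3, 4}), activity=6.0),
    Compound("B", 350.0, 3.0, 1, 6, frozenset({1, 2, 3, 5}), activity=7.0),
    Compound("C", 720.0, 6.5, 3, 8, frozenset({7, 8, 9}), activity=4.0),
]
query = Compound("Q", 335.0, 2.5, 2, 5, frozenset({1, 2, 3}))

drug_like = [c for c in library if is_drug_like(c)]  # C breaks two rules
estimate = read_across(query, drug_like)

print([c.name for c in drug_like])
print(estimate)
```

In practice the descriptors and fingerprints would come from a chemoinformatics toolkit rather than being typed in by hand, and a real read-across assessment would also require expert justification of the chemical category, which this sketch does not attempt.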