# Legacy Research

### VSEPR Electron Pair Localisation

The electron density does not directly reveal how electron pairs are localised in a molecule, if at all. To reveal this information, other scalar fields have been invoked. The Laplacian of the electron density is one such function; others include the electron localization function (ELF) [29] and the localized orbital locator (LOL). These scalar functions also have a topological structure, which helps in understanding features of electronic structure such as lone pairs and in probing further into the concept of bonding. A considerable research effort has been devoted to the Laplacian, especially in connection with Gillespie’s VSEPR model, and to ELF (the Silvi group in Paris).
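
As a minimal illustration of how the sign of the Laplacian flags local charge concentration, the sketch below evaluates the Laplacian of the hydrogen 1s density in atomic units. The choice of system and all function names are assumptions made for illustration and are not taken from the work described here. The Laplacian is negative where charge is locally concentrated and positive where it is locally depleted.

```python
import numpy as np

def rho_1s(r):
    """Electron density of the hydrogen 1s orbital (atomic units)."""
    return np.exp(-2.0 * r) / np.pi

def laplacian_rho_1s(r):
    """Analytical Laplacian of rho_1s, i.e. rho'' + (2/r) rho'."""
    return (4.0 - 4.0 / r) * np.exp(-2.0 * r) / np.pi

def numerical_laplacian(f, r, h=1e-4):
    """Radial Laplacian of a spherically symmetric function by central differences."""
    first = (f(r + h) - f(r - h)) / (2.0 * h)
    second = (f(r + h) - 2.0 * f(r) + f(r - h)) / h**2
    return second + 2.0 * first / r

for r in (0.5, 1.0, 2.0):
    analytic = laplacian_rho_1s(r)
    numeric = numerical_laplacian(rho_1s, r)
    label = "concentrated" if analytic < 0 else "depleted or neutral"
    print(f"r = {r:.1f} bohr: analytic {analytic:+.5f}, numeric {numeric:+.5f}, charge locally {label}")
```

For a many-electron atom or a molecule the same sign analysis, applied to a computed density, is what exposes shell structure and lone-pair regions; the one-electron case only demonstrates the convention.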

To quote Gillespie (1996): “In its original formulation the model was based on the concept that valence shell electron pairs behave as if they repel each other and thus keep as far apart as possible. But in recent years more emphasis has been placed on the space occupied by a valence shell electron pair, called the domain of the electron pair, and on the relative sizes and shapes of these domains”. At present, the connection between these domains and the Laplacian is not well established; the ultimate answer lies in the full topology of the Laplacian [37]. This is why we undertook a major rewrite of the program MORPHY to enable a thorough and complete topological analysis of the Laplacian and any other 3D scalar function. Furthermore, the change in the Laplacian’s topology in response to a change in nuclear geometry was studied in detail for the umbrella inversion of ammonia [45].

We joined the lively debate on the existence of multiple bonds between Group 14 elements [60]. Using ELF, we arrived at an answer that is at variance with opinions based on MO theory.

As a final goal we seek to provide an improved physical basis for the VSEPR model, and possibly to modify it in areas where it currently fails [70]. Ultimately, the gap between this model and the physics of wave functions should be narrowed.

This research theme was sponsored via an EPSRC grant (GR/N 04423) (final report) funding Dr NOJ Malcolm.

### Bio-isosterism

#### A QCT Approach to Isosteric Fragment Replacement in Drug Design

We are taking advantage of computational resources at GSK to accumulate a fragment database [1] for use in lead optimization, using modern quantum mechanical *ab initio* calculations. The resulting body of data, produced by around 2 million CPU hours, gives us unprecedented insight into chemical fragments.

Given the maturity of high-throughput technologies, the pharmaceutical industry increasingly appreciates the need for more detailed and higher-quality information. This need is already being addressed by newly introduced products such as ‘Quantum3.1’, used to describe protein-ligand interactions, and by ONIOM calculations to treat large systems. Furthermore, quantum chemical descriptors derived from the charge density have been successfully applied in numerous QSAR studies [8,9], and with continual upgrades to computing power the increased use of *ab initio* methodologies looks set to continue.

At the same time there is increasing interest in fragment-based approaches in emerging ‘De-Novo’ technologies, and in the creation of databases such as ‘Drug Rings’ [10], designed to find isosteric ring replacements using simple chemical descriptors. Storing fragment information in a database removes the need for ‘on-the-fly’ calculations, allowing fast comparison of many thousands or even millions of chemical groups.

The Quantum Chemical Topology (QCT) approach [2,3] used in our research allows the extraction of atom and fragment information from the *ab initio* molecular charge density, as illustrated in Figure 1. QCT is the most rigorous and best documented [5] partitioning method in chemistry. The electronic properties obtained have been linked to important physical characteristics such as pKa [13] and H-bond acceptor strengths [6,7], believed to contribute to compound solubility. Further properties of the charge density have been linked to reactivity [8,9,11], and we aim to use such descriptors to predict isosteric fragment replacements for optimization of biologically active ‘lead’ compounds.

Recent work has revealed the transferable nature of many chemical fragments between similar local environments, allowing us to use characteristics of their charge density to find suitable replacements with similar electronic properties. We hope to combine this approach with a form of metabolic compound profiling to identify problem areas in lead molecules and suggest modifications to overcome them.

**[1]** Quantum Isostere Database (web tool under construction).

**[2]** R. F. W. Bader, ‘Atoms in Molecules. A Quantum Theory’, Oxford Univ. Press, 1990.

**[3]** R. F. W. Bader, P. L. A. Popelier, and T. A. Keith, *Angew. Chem. Int. Ed. Engl.*, 1994, **33**, 620.

**[4]** P. L. A. Popelier, ‘Atoms in Molecules. An Introduction’, Pearson Education, 2000.

**[5]** P. L. A. Popelier, F. M. Aicken, and S. E. O’Brien, in ‘Atoms in Molecules’, Roy. Soc. Chem. Spec. Period. Report, Ed. A. Hinchliffe, 2000.

**[6]** O. Lamarche and J. A. Platts, *Chem. Phys. Lett.*, 2003, **367**, 123.

**[7]** O. Lamarche and J. A. Platts, *Phys. Chem. Chem. Phys.*, 2003, **5**, 677.

**[8]** P. L. A. Popelier, P. J. Smith, and U. Chaudry, *J. Comp. Aided Mol. Des.*, 2004, **18**, 709.

**[9]** P. L. A. Popelier and P. J. Smith, *J. Comp. Aided Mol. Des.*, 2004, **18**, 135.

**[10]** X. Q. Lewell, A. C. Jones et al., *J. Med. Chem.*, 2003, **46**, 3257.

**[11]** A. Toro-Labbe, P. Jacque et al., *Chem. Phys. Lett.*, 2005, **407**, 143.

**[12]** A. S. Kalgutar, I. Gardener et al., *Current Drug Metabolism*, 2005, **6**, 161.

**[13]** U. A. Chaudry and P. L. A. Popelier, *J. Org. Chem.*, 2004, **69**, 233.

### Chemical Bonding

Although a cornerstone of chemistry, the chemical bond continues to fuel debate. Ideas on chemical bonding currently taught and used by mainstream chemists are based on elementary pictures, some even predating quantum mechanics.

The topological approach proposes a computable definition of a bond via the definition of a bond path. Although this is an attractive asset that features strongly in high-resolution crystallography, there are still issues to be settled [83].

The description of hydrogen bonding [22] has benefited from the topological approach, especially in unusual and perhaps controversial cases. This approach offers criteria that delineate conventional hydrogen bonds, so that one can decide whether unconventional candidates are hydrogen bonds based on their topological characteristics. The method is valid for both intra- and intermolecular hydrogen bonds [7][22]. Agostic bonds can be studied according to the same philosophy. We noted [32] that agostic bonds violate all hydrogen bond criteria (except the obvious presence of a bond critical point, BCP). That an agostic bond is not a hydrogen bond has been independently confirmed in the literature, which demonstrates the discriminating power of the hydrogen bond criteria. However, the dihydrogen bond, proposed in the mid-1990s, does satisfy these criteria [31], which enables an unambiguous study of this new type of hydrogen bond.

The nature or even existence of Group 14 “triple” bonds caused a heated discussion, involving several research groups worldwide [60]. A topological analysis of the Electron Localization Function (ELF) indicates that the bond orders are less than three for the REER molecules and less than two for the R2EER2 molecules (where E = Si, Ge, Sn and R = H, CH3). We attribute this decrease in bond order to the decreasing ability of these larger and less electronegative elements to attract electrons into the bonding region, so that an increasing fraction of the electron density remains as essentially non-bonding or lone pair density.

There is a need to extend the topological definition of bonding beyond equilibrium geometries. Furthermore, more work is needed to settle the dispute over whether atomic interaction lines can also indicate steric interactions [83].

### Intermolecular Interactions

This area is important for the development of force fields or intermolecular potentials to serve molecular dynamics calculations. There are two major schemes to study intermolecular interactions: the supermolecule approach and the perturbation approach. The latter defines contributions such as electrostatics, induction, dispersion and repulsion, whereas the former yields a single number for the interaction energy. An intermolecular potential defined within the perturbation approach contains atomic parameters that are given largely transferable values. However, if one aspires to ultimate predictive power, it is appropriate to provide the potential with realistic, tailor-made values. *Ab initio* calculations on monomers are ideal for that purpose, since they can provide values for all components of intermolecular interaction.

An accurate description of the electrostatic component requires a distributed multipole approach. This means that instead of a central site containing all the molecular multipoles up to a very high rank, one has a set of sites per molecule, each site carrying lower-rank multipoles. Such an anisotropic description is vital if one strives for accuracy for an arbitrary system of molecules. Note that this model goes beyond typical (isotropic) point charge models. Analytical first and second derivatives with respect to rigid body coordinates were derived [14] and implemented in Stone’s program ORIENT. When combined with the eigenvector following method, these derivatives constitute a powerful and robust tool to explore potential energy surfaces [16][19][21].
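
The difference between an isotropic point charge model and a distributed multipole model can be sketched with sites that carry a charge and a dipole. The code below is a minimal illustration in atomic units, truncated at rank 1; the site positions, charges, dipoles and function names are hypothetical values chosen for the example and are unrelated to ORIENT or to any published potential.

```python
import numpy as np

def site_potential(point, position, charge, dipole):
    """Electrostatic potential at `point` from one site carrying a charge and a dipole."""
    separation = np.asarray(point, dtype=float) - np.asarray(position, dtype=float)
    distance = np.linalg.norm(separation)
    return charge / distance + np.dot(dipole, separation) / distance**3

def distributed_potential(point, sites):
    """Sum the contributions of all (position, charge, dipole) sites."""
    return sum(site_potential(point, pos, q, mu) for pos, q, mu in sites)

# Hypothetical two-site molecule; all numbers are illustrative only
sites = [
    (np.array([0.0, 0.0, 0.0]), -0.4, np.array([0.0, 0.0, 0.2])),
    (np.array([0.0, 0.0, 1.8]), +0.4, np.array([0.0, 0.0, 0.1])),
]
# Same sites with the dipoles removed: an isotropic point charge model
charges_only = [(pos, q, np.zeros(3)) for pos, q, _ in sites]

for point in ([0.0, 0.0, 6.0], [6.0, 0.0, 0.9]):
    full = distributed_potential(point, sites)
    isotropic = distributed_potential(point, charges_only)
    print(point, f"charges + dipoles: {full:+.5f}   charges only: {isotropic:+.5f}")
```

The site dipoles change the potential by an amount that depends on direction, which is the anisotropy a point charge model cannot capture. A realistic model would continue to quadrupoles and higher ranks on each site.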

We study the performance of the multipole moments of topological atoms, in particular their convergence behaviour in the expansion of the electrostatic potential [39][40] and the electrostatic interaction. We find that claims in the literature that topological atoms are not useful because of poor convergence are exaggerated and unjustified. We have computed the exact Coulomb interaction energy between two topological atoms occurring in the supermolecule, using a 6D integration over two atomic basins [41]. The convergence properties of the corresponding multipole expansion, defined for both intra- and intermolecular interaction, are favourable. Our work so far has shown that it is possible to construct an accurate intermolecular force field based on topological atoms. In particular, distributed polarizabilities benefit from the topological approach in terms of numerical stability and basis set insensitivity [43]. A study on the water dimer [43] showed that hydrogen bonding decreases the overall polarizability, owing to reorganization of charge density along the hydrogen bond. This effect is most significant along the direction of the bond, and is larger for the water molecule whose oxygen atom is involved in the hydrogen bond. Simultaneously, an increase in dipole moment occurs. Placing the dimer in a uniform external field shows that charge flows predominantly along the hydrogen bond.
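
What "convergence of a multipole expansion" means can be seen in the simplest possible case: the expansion of the inverse distance between a point at distance R and a point at distance r < R from a common origin, in Legendre polynomials. This is a textbook sketch, not the expansion over topological atoms studied in [39][40][41]; the numerical values and function names are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def exact_inverse_distance(R, r, cos_theta):
    """1/|R - r| for two vectors of length R and r enclosing the angle theta."""
    return 1.0 / np.sqrt(R**2 + r**2 - 2.0 * R * r * cos_theta)

def truncated_expansion(R, r, cos_theta, max_rank):
    """Sum over l = 0..max_rank of r^l / R^(l+1) * P_l(cos theta); valid for r < R."""
    coefficients = [r**l / R**(l + 1) for l in range(max_rank + 1)]
    return legval(cos_theta, coefficients)

R, r, cos_theta = 4.0, 1.0, 0.5  # illustrative values, arbitrary length unit
exact = exact_inverse_distance(R, r, cos_theta)
for max_rank in (0, 1, 2, 4, 8):
    error = abs(truncated_expansion(R, r, cos_theta, max_rank) - exact)
    print(f"rank <= {max_rank}: absolute error = {error:.2e}")
```

The error shrinks roughly as (r/R) raised to the truncation rank, so convergence is fast when the charge distribution is compact relative to the separation and slow when the two become comparable.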

For trustworthy geometry and energy predictions of local minima of complexes dominated by the electrostatic interaction, it is important to include anisotropy via high-order multipoles. We first assessed the quality of a QCT potential for a set of small van der Waals molecules [46], then proceeded to 27 natural base pairs and compared the results [59] with supermolecular calculations. Finally, we tested such a QCT potential for water clusters up to the nonamer and for serine and tyrosine, each solvated by up to five water molecules.

Through a systematic and rigorous analysis of the electrostatic interaction in 27 DNA base pairs, we find that there is no quantum chemical support for the secondary interaction hypothesis of Jorgensen [58]. We show that the convergence of the electrostatic multipole expansion can be improved by shifting parts to extra non-nuclear sites [61].

The topological part of this research theme is sponsored by EPSRC grant GR/M 18119 (final report), which funded Dr. D. Kosov and subsequently Dr. L. Joubert.

### QCT Algorithms

We find it important to develop our own algorithms and implement them in in-house code. The construction of topological objects, such as an atom or a bond path, is conceptually simple because everything follows from the notion of a gradient path. However, the actual implementation of a topological analysis can be challenging, especially the integration of a property density over an atomic basin. Finding critical points in the general case of any scalar field (e.g. kinetic energy density, Laplacian) of an arbitrary molecule (or unit cell) is easier but has also required special attention [37][62].
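
To make the notion of a critical point search concrete, the sketch below locates a stationary point of a scalar field by Newton-Raphson iteration on the gradient and classifies it by the signs of the Hessian eigenvalues. It assumes a hypothetical model density built from two spherical Gaussians; it is not the algorithm implemented in MORPHY, and a production search needs safeguards (step control, multiple starting points) that are omitted here.

```python
import numpy as np

# Hypothetical model density: one spherical Gaussian per "nucleus" (illustrative only)
CENTRES = np.array([[0.0, 0.0, -1.0], [0.0, 0.0, 1.0]])
WEIGHTS = np.array([1.0, 1.5])
ALPHA = 1.0

def gradient_and_hessian(r):
    """Analytical gradient and Hessian of the model density at position r."""
    grad = np.zeros(3)
    hess = np.zeros((3, 3))
    for centre, weight in zip(CENTRES, WEIGHTS):
        d = r - centre
        value = weight * np.exp(-ALPHA * (d @ d))
        grad += -2.0 * ALPHA * value * d
        hess += value * (4.0 * ALPHA**2 * np.outer(d, d) - 2.0 * ALPHA * np.eye(3))
    return grad, hess

def find_critical_point(start, tol=1e-10, max_iter=50):
    """Newton-Raphson search for a point where the gradient vanishes."""
    r = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        grad, hess = gradient_and_hessian(r)
        if np.linalg.norm(grad) < tol:
            return r
        r = r - np.linalg.solve(hess, grad)
    raise RuntimeError("Newton-Raphson did not converge")

def rank_and_signature(r, threshold=1e-8):
    """Rank = number of non-zero Hessian eigenvalues, signature = sum of their signs."""
    eigenvalues = np.linalg.eigvalsh(gradient_and_hessian(r)[1])
    nonzero = eigenvalues[np.abs(eigenvalues) > threshold]
    return len(nonzero), int(np.sum(np.sign(nonzero)))

cp = find_critical_point([0.1, -0.1, 0.0])
print("critical point:", np.round(cp, 4), "(rank, signature):", rank_and_signature(cp))
```

A (3, -1) result, with two negative curvatures and one positive curvature, is the signature of a bond critical point in the electron density. The same search applies unchanged to any scalar field for which a gradient and a Hessian are available.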

A new approach to atomic integration started with the first analytical representation of an interatomic surface [12]. Subsequent development led to our in-house computer program MORPHY [25], which was later extended with its own atomic integration algorithm with analytical [26] and non-analytical interatomic surfaces [33]. Furthermore, important algorithmic issues have been resolved [66] in order to make integration over arbitrary basins possible. Successful integrations of volume and electronic population have been performed over basins of the Laplacian and were discussed in a contribution [70] to a Faraday Discussion meeting. Differential geometry [28] was used for the first time to characterize interatomic surfaces by means of their curvature.

MORPHY98 was our first commercially released in-house program and currently has 55 licensed customers in over 20 countries worldwide. We are working on a new release of MORPHY, called MORPHY3. A local version of its GUI is already in use in our group.

### Molecular Similarity / QSAR / pKa Prediction

The long-term purpose of this research effort is to provide a new angle of approach to rational drug design and to any area where quantitative structure-activity or structure-property relationships are relevant. We are developing a new method, which we call Quantum Topological Molecular Similarity (QTMS) [44]. This method is under continuous development and originated from the PhD work of Dr. S. E. O’Brien, sponsored by EPSRC grant GR/L65895 (final report). The basic idea goes back to 1995 [18], when a molecular representation was proposed based on properties evaluated at the so-called bond critical points (BCPs). This (hyper)space of properties is called the BCP space. A molecular similarity measure can then be constructed as a distance in BCP space, and molecules can be ranked according to their activity. The first successful QSAR using BCP space was set up for the acidity of benzoic acids [36]. Later it became clear that BCP space was also useful for several other systems of medicinal [53][81][87], physical organic [85] and ecological [57] interest.
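
The idea of a distance in BCP space can be sketched as follows. Each molecule is represented by a small table of properties evaluated at corresponding bond critical points, and similarity is measured by a Euclidean distance between tables. The choice of a plain Euclidean metric, the three properties used and all numerical values are assumptions for illustration; they are not taken from the QTMS publications, and in practice properties on different scales would need to be weighted or standardised.

```python
import numpy as np

# Hypothetical descriptors: one row per corresponding bond,
# columns = (density, Laplacian, ellipticity) at the bond critical point.
MOLECULES = {
    "A": np.array([[0.30, -0.90, 0.02], [0.28, -0.70, 0.10]]),
    "B": np.array([[0.31, -0.92, 0.02], [0.28, -0.71, 0.11]]),
    "C": np.array([[0.25, -0.60, 0.05], [0.35, -1.00, 0.20]]),
}

def bcp_distance(a, b):
    """Euclidean distance between two molecules in BCP space."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

def rank_by_similarity(reference, molecules):
    """Return (name, distance) pairs sorted from most to least similar to `reference`."""
    ref = molecules[reference]
    distances = [
        (name, bcp_distance(ref, descriptors))
        for name, descriptors in molecules.items()
        if name != reference
    ]
    return sorted(distances, key=lambda item: item[1])

for name, distance in rank_by_similarity("A", MOLECULES):
    print(f"{name}: distance to A = {distance:.4f}")
```

The comparison only makes sense when the rows refer to the same bonds in every molecule, which is why such a measure is applied to congeneric series sharing a common skeleton.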

#### The QTMS sequence

**Data generation**: This is where the raw data on the molecules are generated, i.e. their optimised geometries and wave functions. This is done preferentially at several levels of ab initio theory, but semi-empirical calculations can be included (just for geometries).

**Localisation**: This is where the topological approach extracts local information from the wave functions. The molecules are represented in BCP space and/or via their atomic properties. For semi-empirical data, the topological approach just reduces to bond lengths.

**Interpretation**: The topological descriptors are contrasted with experimental data in order to obtain a predictive model. We use tools such as Partial Least Squares (PLS), Neural Networks and Genetic Algorithms.