Optimization and visualization of the edge weights in optimal assignment methods for virtual screening
© Rosenbaum et al.; licensee BioMed Central Ltd. 2013
Received: 15 November 2012
Accepted: 10 March 2013
Published: 26 March 2013
Ligand‐based virtual screening plays a fundamental part in the early drug discovery stage. In a virtual screening, a chemical library is searched for molecules with similar properties to a query molecule by means of a similarity function. The optimal assignment of chemical graphs has proven to be a valuable similarity function for many cheminformatic tasks, such as virtual screening. The optimal assignment assumes all atoms of a query molecule to be equally important, which is not realistic depending on the binding mode of a ligand. The importance of a query molecule’s atoms can be integrated in the optimal assignment by weighting the assignment edges. We optimized the edge weights with respect to the virtual screening performance by means of evolutionary algorithms. Furthermore, we propose a visualization approach for the interpretation of the edge weights.
We evaluated two different evolutionary algorithms, differential evolution and particle swarm optimization, for their suitability for optimizing the assignment edge weights. The results showed that both optimization methods are suited to optimize the edge weights. Furthermore, we compared our approach to the optimal assignment with equal edge weights and two literature similarity functions on a subset of the Directory of Useful Decoys using sophisticated virtual screening performance metrics. Our approach achieved a considerably better overall and early enrichment performance. The visualization of the edge weights enables the identification of substructures that are important for a good retrieval of ligands and for the binding to the protein target.
The optimization of the edge weights in optimal assignment methods is a valuable approach for ligand‐based virtual screening experiments. The approach can be applied to any similarity function that employs the optimal assignment method, which includes a variety of similarity measures that have proven to be valuable in various cheminformatic tasks. The proposed visualization helps to get a better understanding of the binding mode of the analyzed query molecule.
High‐throughput screenings of a chemical library for ligands against a certain protein target play a fundamental part in the early stages of the drug discovery pipeline. The searching of a chemical library in‐silico, also called virtual screening (VS), is the complementary computational approach to high‐throughput screening . The goal of a VS is to enrich molecules that are biologically active against a protein target in a preferably small top fraction of the ranked library. Only molecules with a top rank are then further analyzed with biological assays to validate their biological activity . Using VS, a chemical library comprising several million molecules can be searched for the most promising compounds to test in an experimental screening.
The computational approaches can be divided into structure‐based and ligand‐based VS methods. Structure‐based approaches try to predict the conformation and orientation of potential ligands in the binding pocket of a target protein , which requires experimentally determined three‐dimensional coordinates of the protein structure. Ligand‐based VS methods rank a chemical library operating only on a small set of known ligands that serve as query structures. Hence, ligand‐based methods can still be applied if protein structures are not available .
In a ligand‐based VS experiment, the similarity between every library molecule and a query molecule is measured by some similarity function. Highly similar library compounds are assigned a top rank, whereas molecules with a small similarity to the query molecule are assigned a low rank. Thus, a ligand‐based VS experiment highly depends on the chosen similarity function.
A large variety of different similarity functions have been proposed in the last two decades and the development of new or improvement of existing functions is still a field of active research [5–7]. All similarity functions can be categorized into different classes regarding the representation of the molecules. A simple and fast approach encodes each molecule with a fingerprint like extended‐connectivity fingerprints (ECFP)  or MOLPRINT2D  and calculates the similarity between fingerprints with the well known Tanimoto coefficient. A more elaborate approach interprets the molecules as chemical graphs. The similarity between two chemical graphs can then be evaluated by means of maximum common subgraph approaches  or a variety of graph kernels . Fröhlich et al. introduced the concept of an optimal assignment (OA) of arbitrary objects of two sets in the field of cheminformatics [12, 13]. The OA similarity measure and its various extensions have proven to be a valuable approach in many cheminformatic problems, such as quantitative structure‐activity relationship predictions  and virtual screening [15, 16].
The OA assumes every object, or atom in case of chemical graphs, to be of the same importance. Hence, all parts of a query molecule are equally important for the OA. However, depending on the binding mode of a ligand to its protein target, not all substructures of a ligand might be equally important for binding. A substructure of a ligand that exhibits important interactions (e.g. H‐bonds) with the protein target should contribute considerably more to the binding affinity of a ligand than a substructure that does not directly interact with the protein. Hence, the assumption of equally important atoms might not be realistic in a VS for ligands against a specific protein target.
This study has three aims. First, we present an extension to the existing standard OA method, which allows for assigning a different importance to each atom of a given query molecule. The importance is included by weighting the assignment edge that originates from an atom of the query molecule. The importance of the atoms of the query molecule can be determined by looking at the binding mode of the query if a crystal structure and expert knowledge are at hand. However, a crystal structure is often not available in the early stages of drug discovery or the docking of the query into an existing crystal structure is not satisfying. However, a data set of known ligands and decoys is usually available for ligand‐based VS.
The second aim of this study deals with optimizing the assignment edge weights for optimal VS performance on a data set of known ligands and decoys. Optimizing for optimal VS performance results in a non differentiable objective function with multiple local optima. Evolutionary algorithms for numerical optimization problems, such as a genetic algorithm for numerical problems (realGA), evolution strategies (ES) , particle swarm optimization (PSO) , and differential evolution (DE)  are capable of solving such optimization problems.
ES, realGA, and DE are evolutionary algorithms, which use mutation, crossover, and selection operators to search the minimum of a certain fitness function. The realGA employs mutation and crossover operators that are adapted to real‐valued problems in combination with a fitness value based selection. Evolution strategies employ mutation operators and a deterministic selection based on fitness ranks. DE is similar to realGA regarding selection and crossover, but uses a special mutation operator that is based on differences between population individuals. PSO belongs to the class of swarm intelligence algorithms, which use a set of particles that move with a certain velocity to search the minimum of a fitness function. In this study, we concentrated on DE and PSO because of their performance compared to realGA and ES on two real‐world optimization problems recently tackled by our group [20, 21]. DE and PSO have also been successfully applied in other practical optimization tasks [22–24].
The third aspect of this study concerns the interpretability of the optimized edge weights because besides a good performance, the information, which parts of the molecule are important for a good retrieval of actives, is valuable for a medicinal chemist. The presented approach encodes each assignment edge weight by the radius of the atom it originates from in a Ball & Stick visualization.
The methodology of optimizing the assignment edge weights was presented at the EVOBio2012 conference . Compared to the conference contribution this study addresses several aspects of the edge weight optimization in more detail, augments the method with a visualization, and compares the performance to two literature VS similarity functions.
The performance of our method and the other literature VS methods was evaluated on 13 VS benchmark data sets using sophisticated VS performance metrics for measuring both the overall performance as well as the early enrichment performance. Furthermore, we demonstrate the abilities of the visualization approach on one of the benchmark data sets.
The results show that the optimization of edge weights results in a considerable performance gain compared to the equally weighted OA and two other literature similarity functions. The visualization of the edge weights helps to understand which atoms of the query are important to obtain a good retrieval of active compounds.
This section first gives a short introduction to the OA similarity measure and motivates the importance of the optimization of the edge weights. This part has already been presented in the conference contribution. Second, we present two evolutionary algorithms that are capable of optimizing the assignment edge weights. Finally, we discuss how the edge weights can be visualized to allow the interpretation of the optimization results.
Optimal assignment similarity measure
The first step of the OA approach is the calculation of the matrix S of pairwise inter‐molecule atom similarities. A pairwise atom similarity S i j is calculated by comparing the physio‐chemical properties of each atom with a radial basis function (RBF). The OA uses 24 atom and 8 bond descriptors calculated by the chemical expert system JOELib2 .
Using the Hungarian method , an optimal solution for the OA problem can be computed in O(max(m,n)3).
An example of an OA of two molecular graphs is visualized in Figure 1a.
Optimal assignment edge weights
In a ligand‐based VS against a specific protein target, a single query is used to search a chemical library for other potentially active compounds. Generally, not all parts of the query are equally important for activity. The exact physio‐chemical properties of substructures that exhibit crucial interactions with the protein target should be more important and conserved than the properties of some linker region. Hence, important substructures should receive more attention in the OA procedure than unimportant substructures.
The different importance of the atoms of a query molecule can be represented in the OA approach by weighting the assignment edges that originate from the atoms. Atoms that are part of more important substructures result in larger edge weights whereas atoms within less important parts of the query molecule result in smaller edge weights. Consequently, the contribution of an assignment to the sum of pairwise atom similarities increases with the importance of the substructure that contains the assignment.
The edge weights w i represent the importance of the corresponding atom of the query molecule. The edge weights are not optimized by the OA procedure. They are fixed throughout the similarity calculations of a complete VS run and need to be determined prior to the VS of a chemical library.
An OA with optimized edge weights is visualized in Figure 1b. Setting edge weights close to zero, can result in a very small weighted similarity (red in Figure 1b), even for highly similar atom environments (green in Figure 1a). Consequently, only a few assignments contribute to the sum of weighted similarities S(Q,B). The assignment in Figure 1b only has three assignment edges with a large weight. The focus on a few number of assignments can cause topological errors during the OA because only the correct mapping of the assignments with a large weight is important. However, an advantage is that the OA can focus on the mapping of important pharmacophores.
Edge weight optimization
The assignment edge weights can be determined using expert knowledge of the exact binding mode of the query molecule to its target protein. However, depending on the availability of high quality crystal structures and literature, the exact binding mode might be unknown. Hence, the parts of a query molecule that are important for binding to the protein target are often unknown. Additionally, it is hard to assign exact numerical weights, even with expert knowledge. We propose to optimize the assignment edge weights for optimal VS performance on a data set of known ligands and decoys, which is usually available even in the early stages of drug discovery.
We employed two different types of evolutionary algorithms to optimize the edge weights of a given query structure. The VS performance on a data set of known ligands and decoys, which is called optimization data set throughout this paper, was used as fitness value for the optimization. The fitness value of a given set of weights was calculated by performing a VS run on the optimization data set. To ensure the weight constraints (5), the weights were bounded by [−0.5,0.5] during an optimization step and then normalized to suffice the constraints for the fitness evaluation.
Changing the edge weights, in particular driving weights close to zero, can dramatically change the OA. Changing the OA of several library compounds results in a different VS result. Thus, the fitness landscape contains multiple local optima. Evolutionary algorithms for numerical optimization with a good exploratory behavior should be used for optimization. A good exploratory behavior is crucial to avoid premature convergence to a local optimum, which could be observed for an ES with elitism in a practical study with a multimodal fitness function . Besides a good exploratory behavior, an optimization algorithm should be able to keep a potential global minimum and do a local search around it, which is known as exploitation. A better exploration of the search space can improve the performance on a jagged fitness landscape. However, an algorithm with a good exploration of the search space, but a bad exploitation, can lose a previously found potential optimum. Hence, a balance between exploration and exploitation is desired.
DE and PSO are popular evolutionary algorithms for tackling numerical problems with multiple local optima or dynamically changing fitness functions [28, 29]. Furthermore, realGA and ES without elitism are suited to optimize the assignment edge weights. Implementations of the aforementioned algorithms are available in the optimization framework EvA2 .
In a preliminary experiment we applied the optimizers realGA, ES, DE, and PSO with their EvA2 standard settings. The results indicated that realGA and ES performed worse than DE and PSO for the edge weight optimization. These results are in line with other practical studies performed by our group [20, 21], which calculated fitness values using a complex or dynamically changing model. DE and PSO might perform better for such problems. However, this performance advantage and well performing parameter settings of the algorithms cannot be generalized as stated by the No Free Lunch Theorems for search .
Particle swarm optimization
PSO is a population‐based optimization technique inspired by swarms of fish or birds. Each individual, or swarm particle, x i is characterized by its position in the problem space and its current travel velocity v i (t) that allows it to move in the problem space. The particles are arranged in a logical topology which defines a neighborhood N(x i ) for each particle x i . The motion of a particle is influenced by both individual knowledge and knowledge of neighboring swarm individuals.
The parameters ϕ1 and ϕ2 control the trade‐off between the different attractors. The update is performed by component wise sampling r1,r2 ∼ U(0,1) for randomized exploration, where U is a uniform distribution. The constriction approach includes a constriction factor χ to assure that the swarm reliably converges . Using the constriction approach, it is not necessary to limit the velocity vector to a maximum velocity v m a x . Commonly used neighborhoods are ring, 2D‐grid, or star topologies .
The employed neighborhood and the parameter χ have a major influence on the exploration and exploitation behavior of PSO. A smaller neighborhood results in a slower convergence, but it exhibits a better exploration of the search space. Larger values of χ produce larger velocities of the particles, which enhances the exploration of the search space. However, faster velocities result in a slower convergence. Additionally, a larger population increases the exploration of the search space.
DE is an evolutionary algorithm that replaces the mutation operator of genetic algorithms with a differential operator. DE generates new candidates by recombining a population individual and a vector of differences between selected members of the population. There exist several DE variants, which differ in the calculation of the difference vector [19, 28]. The nomenclature scheme to reference different DE variants is as follows: D E/a/b/c. In this nomenclature, a specifies the vector to be mutated, b defines the number of difference vectors used for mutation, and c defines the recombination scheme. In the following, the parameter c is omitted because we only used binomial recombination (c=bin).
The vector to be mutated x a is a random population individual for a=“rand”, the current best individual for a=“best”, and x i +λ(x b e s t −x i ) for a=“current‐to‐best”. The parameters F,λ∈ [0.3,0.9] control the amplification of the differential variation and the parameter C R ∈ [0,1] represents the crossover rate. The vectors are randomly chosen, mutually exclusive population individuals. The randomly chosen index j r ensures that at least one position is recombined. In contrast to the version of DE/current‐to‐best/1 described by Mezura‐Montes et al. , the EvA2 version of the variant DE/current‐to‐best/1 uses a combination of probabilistic and arithmetic recombination.
The chosen mutation variant and the parameters CR and F influence the exploration and exploitation behavior of DE. The variant rand uses a random population individual for mutation, which results in a better exploratory behavior, but the variant can take long to converge. The variant best uses the best population individual, which can limit the exploration of the search space and result in premature convergence. Higher CR and F values result in a higher mutability, which enhances the exploration. At the same time, a higher mutability might limit the search around a potential global minimum. As for PSO, a larger population results in a better exploration.
Visualization of edge weights
The assignment edge weights represent the importance of the atoms of the query molecule. Due to the integration of local atom environments in the OA procedure, a large edge weight means that in addition to the corresponding atom itself the local environment around the atom is important. Hence, an edge weight needs to be redistributed between the corresponding atom and its local environment according to the decay parameter (1) of the atom similarity calculations.
After the distribution of the weights, the new weight vector w’ is normalized to [0,1] to avoid visualization problems for large weights. Then, the weight of an atom can be visualized by the ball size in Å in a Ball & Stick molecule visualization. A large ball size indicates a larger weight whereas a small ball size indicates a smaller weight.
We evaluated our approach on a wide range of pharmaceutically relevant targets using a subset of the Directory of Useful Decoys (DUD) Release 2 [34, 35]. The DUD contains known actives and mimetic decoys for 40 protein targets. The DUD was designed to serve as an unbiased, publicly available benchmark database for the evaluation of docking methods. Due to a possible analogue enrichment bias, the original DUD is not suited for ligand‐based VS experiments .
Thus, we applied the same preprocessing step as in the study of Jahn et al.  to make the data sets more suitable for ligand‐based VS. In the preprocessing step, a lead‐like filter as suggested by Good and Operea  was applied to the data sets and the active structures were clustered . Each cluster represents a different chemotype or scaffold, which allows for the analysis of the enrichment of structurally diverse compounds.
Filtered DUD data sets used for the VS experiments
Epidermal growth factor receptor
HIV reverse transcriptase
Enoyl ACP reductase
P38 mitogen activated protein
Platelet derived growth factor receptor kinase
Vascular endothelial growth factor receptor
In contrast to docking methods, ligand‐based VS methods require a biologically active query molecule. In line with other VS studies on the DUD data sets [15, 39], we used the ligands of the complexed crystal structures that were used to identify the binding sites for docking algorithms as query molecules.
Evaluation of VS performance
The evaluation of a VS experiment is an important but challenging task, which has been discussed thoroughly in literature [40–42]. A plethora of sophisticated evaluation metrics have been suggested for measuring the performance of VS experiments, each with its strengths and weaknesses . According to recommendations for assessing the VS performance , the VS results should be analyzed regarding three different aspects.
First, the performance on the complete data set should be considered. The overall performance can be measured by the well known area under the ROC curve (AUC). The receiver operator characteristics (ROC) curve plots the fraction of molecules correctly predicted as active (true positive rate or sensitivity) against the fraction of inactive molecules that are falsely predicted as active (false positive rate or 1‐specificity) for every possible decision threshold. The AUC can achieve values in the interval [0,1].
The second aspect is the early enrichment of actives, which originates from the situation in real‐world applications. Usually only a small fraction of hits of a VS experiment is further evaluated in experimental screens due to time and cost requirements. Consequently, it is important to enrich actives within the first few percent of a data set. We employed two different popular metrics, the BEDROC score and the ROC enrichment, to measure the early enrichment performance. The BEDROC score  augments the AUC with a decreasing exponential weighting function that reduces the influence of lower ranked structures. Hence, the amount a structure contributes to the final BEDROC score decreases exponentially with its rank. The weighting function is controlled by a parameter α. High values of α increase the importance of top ranked structures. In our experiments, we set α=53.6, which means that 80% of the final BEDROC score is based on the AUC performance in the first 3% of the ranked data set. The BEDROC score adopts values in the range [0,1].
Multirun stability and convergence speed
Due to the heuristic nature of evolutionary algorithms, the performance of several multirun optimizations can differ substantially. The stability throughout several multiruns is a desired property because the more stable an algorithm is the fewer multiruns are needed to achieve a good optimum, which results in lower runtime requirements. Generally, the performance on different splits of a data set varies more than the performance of several multiruns on a certain split. To obtain a sensible measure for the multirun stability we calculated the standard deviations for each split separately. The resulting deviations were then averaged over the different splits of a data set.
The performance on the different splits and data sets can vary substantially. To take the different performance into account, we also calculated the relative standard deviation (%RSD), which normalizes the standard deviation of a certain split of a data set by the mean performance on this split. However, the %RSD showed a stronger correlation with the mean performance (−0.60) compared to the standard deviations (−0.42). Hence, we decided to use the standard deviations as a measure for multirun stability.
Another important factor when optimizing a certain problem is the convergence speed of the evolutionary algorithm. The fewer fitness evaluations are needed, the faster the runtime. To evaluate the convergence speed, we ran each evolutionary algorithm until a large, fixed number of fitness evaluations was reached. The convergence point of a optimization run was defined as the number of fitness evaluations that are necessary to obtain a performance that is within a range of 1% of the best performance found after termination.
Literature similarity functions
We employed two different literature similarity functions to compare the performance of our approach. First, the ECFP  with a Tanimoto coefficient, which calculates the similarity using 2D fingerprinting information. Second, the 4D flexible atom‐pairs similarity measure (4DFAP) [16, 48], which is a sophisticated method that efficiently compares conformational ensembles of molecules.
The 4DFAP, a 4D‐based similarity measure, efficiently compares the conformational space of two given molecules. The approach can be divided into a preprocessing step and the actual similarity calculation.
In the preprocessing step, a conformational ensemble of a molecule is efficiently encoded by means of generative models. First, a conformational ensemble of a molecule M is generated. From the conformational ensemble the distance profiles of the atom‐pairs are extracted. For each flexible atom‐pair, the distance profile is stored as a Gaussian‐mixture model. The parameters of the Gaussian‐mixture models are trained by a modified Expectation Maximization algorithm.
After the preprocessing, the actual similarity calculation between two molecules can be conducted. It operates on both the molecular graph and the precomputed generative models of the flexible atom‐pairs. Similar to the OA similarity measure, it first computes a matrix S of pairwise inter‐molecule atom similarities S i j . A pairwise atom similarity S i j is calculated by comparing the intra‐molecule atom‐pairs of two molecules. The rigid atom‐pairs are compared by a Tanimoto similarity function and the flexible atom‐pairs are compared by calculating the similarity between the stored GMM models. The final similarity is calculated by performing an OA on the matrix S of pairwise atom similarities. In principle, our approach of optimizing the edge weights could also be applied to the final step of the 4DFAP similarity measure.
To robustly evaluate our method, we generated 10 randomized 50/50 splits. The first half of a data set was used as optimization data set to optimize the assignment edge weights for optimal VS performance. In each fitness evaluation, a complete VS run was performed and evaluated on the first half of a data set. The second half of a data set was used as external test set to obtain 10 VS results for unbiased performance evaluation. The literature similarity measures were also evaluated on these external test sets.
We performed 10 multiruns for each optimization. Each optimization was terminated after 15000 fitness evaluations, which were enough evaluations for each tested algorithm to converge. Due to the runtime requirements of 1.5 million fitness evaluations per data set, we evaluated the evolutionary algorithms only with their EvA2 default parameters.
We employed the constriction PSO implementation of EvA2 with its default parameters: ϕ1 = ϕ2 = 2.05, χ ≈ 0.73, an initial velocity v = 0.2, and a population size of 30 individuals. These parameter settings worked well in earlier problems solved with PSO [20, 50]. As neighborhood, we used a 2D‐grid with range two.
We evaluated the three DE variants DE/rand/1, DE/best/2, and DE/current‐to‐best/1 with the Eva2 default parameters: F = 0.8, C R = 0.6, and λ = 0.6. The population size was set to 30 individuals. The population size is considerable smaller than suggested in a DE parameter study , which might limit the exploration of the search space. However, this size has proven useful in a recent study with a similar number of dimensions .
Results and discussion
The results section is divided in four different subsections. The first subsection compares four different evolutionary algorithms for their suitability for optimizing the assignment edge weights. Then, the influence of the optimized VS performance measure is discussed. The third part compares the OA with optimized edge weights to the equally weighted OA and two literature VS similarity functions. The final subsection demonstrates the interpretability of our approach by visualizing the edge weights of the query of the InhA data set.
Comparison of optimizers
We evaluated the four different evolutionary algorithms PSO, DE/rand/1, DE/best/2, and DE/current‐to‐best/1 regarding VS performance, multirun stability, and convergence speed.
We assessed both the overall performance and the early enrichment performance of the four algorithms on the optimization split. The AUC performance was used to evaluate the overall performance. As an early enrichment metric we applied the BEDROC score. The AUC and BEDROC performance were obtained from the best multirun after optimizing for a maximum AUC and BEDROC score, respectively. While the performance on an external test set should be used to compare different models, the optimization performance is suited to compare the capability of the evolutionary algorithms to find the optimum of a given problem. Hence, we compared the optimization performance of the different algorithms.
While similar results concerning the performance of PSO and DE were obtained in a study that optimizes the parameters of metabolic networks , comparative studies between PSO and DE on various benchmark problems suggest that DE often performs better than PSO [52, 53]. However, the global optimization performance of DE depends a lot on the parameter settings . Increasing the population size might improve the performance of the DE variants.
Both DE/rand/1 and DE/current‐to‐best/1 performed better than DE/best/2. The weights of the multiruns of an iteration exhibit considerable differences even for multiruns with a similar performance, which indicates that the optimization problem is multimodal with several local optima. The variant DE/best/2 has problems to efficiently explore the search space because of the recombination with the best individual of the population. Mezura‐Montes et al.  showed that DE/current‐to‐best/1 exhibits this problem for multimodal problems due to the arithmetic recombination. Including a probabilistic recombination in the variant improves the exploratory behavior.
To evaluate the multirun stability of the evolutionary algorithms, we calculated the average multirun standard deviation of an algorithm when optimizing the AUC and the BEDROC performance. The optimized AUC and BEDROC performance on the optimization split was used for the stability calculations when optimizing the AUC and BEDROC, respectively.
With an average rank of 1.54 and 1.85, PSO and DE/current‐to‐best/1, respectively, were most stable throughout the AUC optimizations. DE/rand/1 and DE/best/2 behaved less stable on several data sets, which resulted in an average rank of 2.77 and 3.85, respectively. When optimizing the BEDROC performance, the PSO, DE/current‐to‐best/1, and DE/rand/1 resulted in a similar multirun stability with an average rank of 2.08, 2.00, and 2.46, respectively. With an average rank of 3.46, DE/best/2 also exhibited the worst stability when optimizing the BEDROC. The problem to efficiently explore the search space is a reason for the lower multirun stability of DE/best/2. The DE variant tends to converge to different local optima more easily than the other tested evolutionary algorithms.
The convergence speeds indicate that 15000 fitness evaluations were enough evaluations for the algorithms to converge. Concerning the optimizers, DE/best/2 exhibited the fasted convergence with an average rank of 1.00 regardless of the optimized performance measure, whereas DE/rand/1 converged most slowly with an average rank of 4.00 and 3.77 when optimizing the AUC and BEDROC, respectively. PSO and DE/current‐to‐best/1 achieved an average rank of 2.15 and 2.85, respectively, when optimizing the AUC performance. When optimizing the BEDROC score, PSO and DE/current‐to‐best/1 exhibit a similar convergence behavior with an average rank of 2.54 and 2.69, respectively. The convergence speeds of the different DE variants are in line with a study on several benchmark functions by Mezura‐Montes et al. . Their study showed that on average DE/best/* exhibits the fastest convergence. In our study, DE/best/2 converged fastest because it converged to suboptimal local optima, which resulted in a lower performance.
On average, all algorithms, but DE/best/2, take between 45% and 80% more evaluation steps to converge when optimizing the BEDROC score, whereas DE/best/2 needs 29% less evaluation steps. A possible explanation for this could be that the convergence point of 1% within the range of the best solution after 15000 evaluation steps is more stringent for the BEDROC score than for the AUC because the BEDROC performance is generally lower than the AUC performance. However, similar results were obtained with a fixed range of 0.01. The differences between the convergence speeds underline the assumption that the problem of optimizing the BEDROC score exhibits a more jagged fitness landscape. The variant DE/best/2 converges faster because it hits a local optimum faster in a fitness landscape with more local optima.
The analysis of performance, multirun stability, and convergence speed favored PSO and DE/current‐to‐best/1 for optimizing the assignment edge weights. Both algorithms exhibit the best performance and multirun stability combined with a reasonably fast convergence speed. Good multirun stability and fast convergence speed allow for the reduction of the number of fitness evaluations and multiruns in a productive environment to speed up the runtime while maintaining the performance. The following analyses are limited to PSO because it is one of the two suitable optimizers.
Influence of the optimized VS metric
The assignment weights can be optimized with respect to any VS metric. The fitness function and with it the resulting model considerably depends on the employed VS metric. Hence, optimizing a certain metric can influence the model performance with respect to another VS metric. To evaluate the influence of the optimized metric, we performed experiments optimizing the AUC, a measure for the overall performance, and the BEDROC score, an early enrichment metric.
To analyze the effect of the two optimization metrics on the performance, we compared the performance of the optimized queries with respect to the AUC, BEDROC, and awROCE5% performance on the external test splits. The results are presented as boxplots in Figure 6. Optimizing the AUC improved the overall performance of the optimized query. The AUC optimized queries resulted in a better AUC performance on all of the benchmark data sets. Vice versa, optimizing the BEDROC score resulted in a better BEDROC performance compared to optimizing the AUC. The BEDROC optimized queries achieved a better median BEDROC performance on all but one data set. However, both the BEDROC and AUC optimized queries result in a similar awROCE5% performance on most of the data sets.
The reason for the difference between the two early enrichment metrics BEDROC and awROCE5% is twofold. First, the BEDROC optimized queries tend to enrich molecules of larger chemotype clusters, which is penalized in the awROCE metric. Molecules of large clusters are likely to occur in the optimization split and the queries tend to be optimized for molecules of these clusters when focusing on an early enrichment. Thus, optimizing for the BEDROC score can result in weights that are applicable for a smaller number of clusters compared to optimizing for the AUC performance. This fact is supported by a better ROCE5% performance of the BEDROC optimized queries. Unlike the awROCE, the ROCE metric does not penalize large clusters. On the FXa data set, the ROCE5% performance favored the BEDROC optimized query whereas the awROCE5% performance slightly favored the AUC optimized query. A detailed analysis of the result table of the split with the most differing awROCE5% performance reveals that both optimized queries enriched an equal number of actives along with the top 5% of true decoys. However, the BEDROC optimized query mainly enriched actives from one of the largest clusters whereas the AUC optimized query also enriched a diverse set of smaller clusters.
The analysis of the AUC and BEDROC optimized queries showed that the optimized VS metric has indeed a substantial influence on the resulting weights and the performance. Thus, care should be taken to select the correct optimization metric. Optimizing the AUC results in a good overall performance, a reasonable early enrichment, and a diversity of chemotypes for most of the data sets. Hence, it should be the default option for the optimization step. Employing the BEDROC score as optimization metric can lead to a substantially better early enrichment. However, this improved enrichment can come at the expense of a diversity of scaffolds or at the expense of the enrichment at a slightly larger fraction of the data set. Optimizing the awROCE would ensure a diversity of scaffolds, but it is not a very robust metric for optimization because of the strict cutoff at a certain fraction of true decoys.
We chose to use the AUC optimized query for the comparison between the OA with optimized edge weights and the literature similarity functions.
Comparison to other similarity measures
In this subsection, we compare the OA with optimized edge weights (PSO‐OA) to its unoptimized pendant, the OA with equal edge weights (OA), and two literature similarity functions, an ECFP with a Tanimoto similarity coefficient (ECFP) and the 4D flexible atom‐pair kernel (4DFAP). The edge weights were optimized for the AUC performance using PSO. We evaluated the AUC, BEDROC, and awROCE5% performance on the external test splits of the 13 filtered DUD data sets.
Performance of PSO‐OA and literature similarity functions
The results on the InhA data set indicate that the optimization of edge weights encodes a structure‐activity model of the molecules of the optimization data set in the query weights, which is not the case for the other similarity functions. Consequently, a considerable performance gain is to be expected. Perfectly fitting a structure‐activity model to the optimization data set might lead to substantial overfitting, a common problem of machine learning approaches. However, the fitness on the external test splits during the optimization runs as well as the final performance on the external splits indicate that overfitting was not a problem.
On the EGFr data set the 4DFAP outperformed all other approaches by a considerable margin. The 4DFAP was able to retrieve all actives within ≈30% of the data set whereas the PSO‐OA and the OA retrieved about 20% of the actives in the last percent of the data set. Hence, the conformational space encoding adds a significant amount of information for the EGFr data set . While the PSO‐OA can improve the enrichment of actives compared to the original OA, it is restricted to the same 2D information as the OA, which limits the performance on the EGFr data set.
In principle, the optimization of edge weights can be applied to any optimal assignment approach. Thus, the weights in the optimal assignment step in the 4DFAP can be optimized with our approach. The atom similarity matrices of the 4DFAP are based on a different type of information (atom‐pair distances) compared to the OA (local atom similarities). Therefore, the optimized weights of the OA cannot be directly applied to the 4DFAP. Experiments with the optimized weights of our study (data not shown) resulted in a considerably reduced performance of the 4DFAP.
Visualization of edge weights on InhA
In this subsection, we demonstrate and discuss the visualization of the optimized assignment edge weights on the InhA data set. The data set is among the three best performing data sets, which is crucial because a reasonable performance is necessary to obtain a sensible visualization. Furthermore, the weighted query of the InhA data set is suited to discuss the strengths and problems of the edge weight optimization and the visualization approach. For the visualization, we repeated the optimization of the edge weights with the complete data set. We performed 5 multiruns with 8000 fitness evaluations, which should be enough according to the multirun stability and the convergence speed of the PSO on the InhA data set. We employed the AUC as optimization metric.
The InhA data set contains ligands against the Enoyl ACP reductase, which is a promising protein target for the treatment of tuberculosis and malaria. The Enoyl ACP reductase mediates the reduction of fatty acid substrates using NAD+ as cofactor. As the DUD data sets were designed for docking, the binding mode of the query can be analyzed using the PDB entry 1P44, which contains the target protein complexed with the query.
The distribution of the weights critically depends on the ligands in the optimization set. If the 15 ligands without a carbonyl exhibit a completely different binding mode it would be less meaningful to force a fit between the query and those ligands. Such a forced mapping can result in a less sensible visualization, in our case the low weighted carbonyl group. An user of the visualization should keep in mind that increased edge weights represent atoms that need to be optimally assigned for a good retrieval of actives, which might not coincide with all features that are actually important for a specific binding mode.
There exist countermeasures to prevent the problem of an increased number of topological errors. First, pseudo weights can be added to the optimized weights, which would reduce the influence of large weights and increase the influence of weights that are close to zero in an optimization without pseudo weights. While pseudo weights might reduce the topological errors that were induced by a focus on a few large weights, they might also reduce the ability of discovering different chemotypes. The second countermeasure results from the general applicability of our method to any similarity function that employs the optimal assignment method. The 2‐step hierarchical assignment (2SHA) , a similarity function that was designed to prevent topological errors by doing a hierarchical assignment, can be subject to our optimization approach. A combination of both approaches might yield an improved VS performance with less topological errors.
In this study we presented an extension of the optimal assignment method for chemical graphs that uses evolutionary algorithms to optimize the assignment edge weights with respect to a VS performance metric. In principle, the method is applicable to all similarity functions that employ the optimal assignment method like 4DFAP [16, 48] or 2SHA . Thus, the VS performance of a variety of VS similarity functions can be improved by optimizing the edge weights. An improved in‐silico VS performance, in particular an increased early enrichment, helps to increase the success rate of further biological assays and thus, reduces cost and time requirements. Additionally, we presented a visualization approach that allows for the interpretation of the optimized weights.
We evaluated 3 different DE variants and PSO for the suitability for the optimization of the OA edge weights. A suitable optimizer combines a good optimization performance, a reasonable multirun stability and a fast convergence speed. Both DE/current‐to‐best/1 and PSO are suitable optimizers, but PSO achieved slightly better results than the DE variant.
Besides DE and PSO, other promising evolutionary algorithms can be applied for optimizing the assignment edge weights. An ES with covariance matrix adaptation (CMA‐ES)  or ant colony optimization (ACO)  could be used to solve the edge weight optimization problem. CMA‐ES and ACO are currently not available in the EvA2 framework, but the implementation and evaluation could be addressed in a further study. Additionally, GA or ES might perform better with different parameter settings.
The optimized VS metric has a considerable influence on the optimized weights. The results indicated that the optimization of the BEDROC score improves the early enrichment performance, whereas optimizing the AUC results in better overall performance and a better enrichment of different scaffolds. We advise to use the optimization of the AUC as default option.
We compared our method to the equally weighted OA and two other literature similarity functions on 13 DUD benchmark data sets. The results showed that the OA with optimized edge weights achieves a considerably better overall and early enrichment performance on a variety of pharmaceutically relevant targets. Particularly, our approach was able to perform so called “scaffold‐hops”, which is important for a pharmaceutical company.
The visualization approach was demonstrated on the weighted query of the InhA data set. The visualization showed that the optimization of edge weights yields sensible results with respect to the binding mode of the query. Thus, our approach helps medicinal chemists to get a better understanding of the binding mode of the query. However, if a large amount of compounds with a considerably different scaffold is present in the optimization set, the optimized weights might not completely coincide with the binding mode of the optimized query.
To conclude, we think that the optimization of edge weights combined with the visualization of the weights is a valuable tool for the VS of chemical libraries.
All employed programs are available free of charge as executable jar and source code at http://www.cogsys.cs.uni-tuebingen.de/software/OAOptimization/. This includes the library that was used for the optimization of the edge weights and a graphical user interface for the visualization of the results of an optimization.
This investigation was supported in parts by the Kompetenznetz Diabetes mellitus (Competence Network for Diabetes mellitus) funded by the Federal Ministry of Education and Research (FKZ 01GI0803‐04) and a grant from the German Federal Ministry of Education and Research to the German Center for Diabetes Research (DZD eV).
- Bajorath J: Integration of virtual and high‐throughput screening. Nat Rev Drug Discov. 2002, 1 (11): 882-894. 10.1038/nrd941.View ArticlePubMed
- Shoichet BK: Virtual screening of chemical libraries. Nature. 2004, 432 (7019): 862-865. 10.1038/nature03197.PubMed CentralView ArticlePubMed
- Kitchen DB, Decornez H, Furr JR, Bajorath J: Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov. 2004, 3 (11): 935-949. 10.1038/nrd1549.View ArticlePubMed
- von Korff, Freyss J, Sander T: Flexophore, a new versatile 3D pharmacophore descriptor that considers molecular flexibility. J Chem Inf Model. 2008, 48 (4): 797-810. 10.1021/ci700359j.View Article
- Bender A, Jenkins JL, Scheiber J, Sukuru SC, Glick M, Davies JW: How similar are similarity searching methods? a principal component analysis of molecular descriptor space. J Chem Inf Model. 2009, 49: 108-119. 10.1021/ci800249s.View ArticlePubMed
- Geppert H, Vogt M, Bajorath J: Current trends in ligand‐based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J Chem Inf Model. 2010, 50 (2): 205-216. 10.1021/ci900419k.View ArticlePubMed
- Sheridan R: Why do we need so many chemical similarity search methods?. Drug Discov Today. 2002, 17 (7): 903-911.View Article
- Rogers D, Hahn M: Extended‐connectivity fingerprints. J Chem Inf Model. 2010, 50: 742-754. 10.1021/ci100050t.View ArticlePubMed
- Bender A, Mussa HY, Glen RC, Reiling S: Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance. J Chem Inf Comput Sci. 2004, 44 (5): 1708-1718. 10.1021/ci0498719.View ArticlePubMed
- Mohr J, Jain B, Sutter A, Laak AT, Steger‐Hartmann T, Heinrich N, Obermayer K: A maximum common subgraph kernel method for predicting the chromosome aberration test. J Chem Inf Model. 2010, 50: 1821-1838. 10.1021/ci900367j.View ArticlePubMed
- Ralaivola L, Swamidass SJ, Saigo H, Baldi P: Graph kernels for chemical informatics. Neural Networks. 2005, 18: 1093-1110. 10.1016/j.neunet.2005.07.009.View ArticlePubMed
- Fröhlich H, Wegner JK, Sieker F, Zell A: Optimal assignment kernels for attributed molecular graphs. Proceedings of the 22nd International Conference on Machine Learning, New York. 2005, ACM, 225-232.
- Fröhlich H, Wegner JK, Sieker F, Zell A: Kernel functions for attributed molecular graphs ‐ a new similarity‐based approach to ADME prediction in classification and regression. QSAR Comb Sci. 2006, 25 (4): 317-326. 10.1002/qsar.200510135.View Article
- Fechner N, Jahn A, Hinselmann G, Zell A: Atomic local neighborhood flexibility incorporation into a structured similarity measure for QSAR. J Chem Inf Model. 2009, 49 (3): 549-560. 10.1021/ci800329r.View ArticlePubMed
- Jahn A, Hinselmann G, Fechner N, Zell A: Optimal assignment methods for ligand‐based virtual screening. J Cheminf. 2009, 1: 14. 10.1186/1758-2946-1-14.View Article
- Jahn A, Rosenbaum L, Hinselmann G, Zell A: 4D Flexible atom‐pairs: an efficient probabilistic conformational space comparison for ligand‐based virtual screening. J Cheminf. 2011, 3: 23. 10.1186/1758-2946-3-23.View Article
- Beyer HG, Schwefel HP: Evolution strategies – a comprehensive introduction. Nat Comput. 2002, 1: 3-52. 10.1023/A:1015059928466.View Article
- Kennedy J, Eberhart R: Particle swarm optimization. Proceedings of the IEEE International Conference on Neural Networks, Perth. 1995, IEEE Computer Society, 1942-1948.View Article
- Storn R, Price K: Differential evolution ‐ a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim. 1997, 11: 341-359. 10.1023/A:1008202821328.View Article
- Beck M, Bayer P, Zell A, de Paly and Jozsef Hecht‐Mendez M: Evaluation of the performance of evolutionary algorithms for optimization of low‐enthalpy geothermal heating plants. Proceedings of the 14th International Genetic and Evolutionary Computation Conference, New York. 2012, ACM Press, 1047-1054.
- Dräger A, Kronfeld M, Ziller MJ, Supper J, Planatscher H, Magnus JB, Oldiges M, Kohlbacher O, Zell A: Modeling metabolic networks in C. glutamicum: a comparison of rate laws in combination with various parameter optimization strategies. BMC Syst Biol. 2009, 3: 5. 10.1186/1752-0509-3-5.PubMed CentralView ArticlePubMed
- Geis M, Middendorf M: A particle swarm optimizer for finding minimum free energy RNA secondary structures. 2007, Piscataway: IEEEView Article
- Rogalsky DepartmentOf, Rogalsky T, Derksen RW, Rt N, Rt N, Kocabiyik S: Differential evolution in aerodynamic optimization. Proceedings of the 46th Annual Conference of the Canadian Aeronautics and Space Institute. 1999, 29-36.
- Vahdat A, NourAshrafoddin N, Ghidary S: Mobile robot global localization using differential evolution and particle swarm optimization. Proceedings of the IEEE Congress on Evolutionary Computation. Edited by: Srinivasan D, Wang L. 2007, Singapore: IEEE Press, 1527-1534.
- Rosenbaum L, Jahn A, Zell A: Optimizing the edge weights in optimal assignment methods for virtual screening with particle swarm optimization. Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, Volume 7246 of Lecture Notes in Computer Science. Edited by: Giacobini M, Vanneschi L, Bush W. 2012, Heidelberg: Springer Berlin, 26-37.
- Guha R, Howard MT, Hutchison GR, Murray‐Rust P, Rzepa H, Steinbeck C, Wegner J, Willighagen EL: The blue obelisk interoperability in chemical informatics. J Chem Inf Model. 2006, 46 (3): 991-998. 10.1021/ci050400b.View ArticlePubMed
- Kuhn HW: The hungarian method for the assignment problem. Naval Res Logist. 1955, 2: 83-97. 10.1002/nav.3800020109.View Article
- Mezura‐Montes E, Velázquez‐Reyes J, Coello Coello CA: A comparative study of differential evolution variants for global optimization. Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, New York. 2006, ACM, 485-492.
- Li X, Branke J, Blackwell T: Particle swarm with speciation and adaptation in a dynamic environment. Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation. 2006, New York: ACM Press, 51-58.
- Kronfeld M, Planatscher H, Zell A: The EvA2 optimization framework. Proceedings of the Learning and Intelligent Optimization Conference (LION IV), Volume 6073 of LNCS. Edited by: Blum C, Battiti R. 2010, Heidelberg: Springer Berlin, 247-250.
- Wolpert D, Macready W: The hungarian method for the assignment problem. IEEE Trans Evol Comput. 1997, 1: 67-82. 10.1109/4235.585893.View Article
- Clerc M, Kennedy J: The particle swarm ‐ explosion, stability, and convergence in a multidimensional complex space. IEEE Trans. Evolut. Comput. 2002, 6: 58-73. 10.1109/4235.985692.View Article
- Kennedy J, Mendes R: Population structure and particle swarm performance. Proceedings of the Congress Evolutionary Computation. 2002, Piscataway: IEEE, 1671-1676.
- Huang N, Shoichet BK, Irwin JJ: Benchmarking sets for molecular docking. J Med Chem. 2006, 49 (23): 6789-6801. 10.1021/jm0608356.PubMed CentralView ArticlePubMed
- DUD: A directory of useful decoys. [http://dud.docking.org]
- Irwin JJ: Community benchmarks for virtual screening. J Comput‐Aided Mol Des. 2008, 22 (3–4): 193-199.View ArticlePubMed
- Good AC, Oprea TI: Optimization of CAMD techniques 3. Virtual screening enrichment studies: a help or hindrance in tool selection?. J Comput‐Aided Mol Des. 2008, 22 (3–4): 169-178.View ArticlePubMed
- Oprea TI, Davis AM, Teague SJ, Leeson PD: Is there a difference between leads and drugs? a historical perspective. J Chem Inf Comput Sci. 2001, 41 (5): 1308-1315. 10.1021/ci010366a.View ArticlePubMed
- Cheeseright TJ, Mackey MD, Melville JL, Vinter JG: FieldScreen: Virtual screening using molecular fields. Application to the DUD data set. J Chem Inf Model. 2008, 48 (11): 2108-2117. 10.1021/ci800110p.View ArticlePubMed
- Hawkins PCD, Warren GL, Skillman AG, Nicholls A: How to do an evaluation: pitfalls and traps. J Comput‐Aided Mol Des. 2008, 22 (3–4): 179-190.PubMed CentralView ArticlePubMed
- Jain AN, Nicholls A: Recommendations for evaluation of computational methods. J Comput‐Aided Mol Des. 2008, 22 (3–4): 133-139.PubMed CentralView ArticlePubMed
- Nicholls A: What do we know and when do we know it?. J Comput‐Aided Mol Des. 2008, 22: 239-255. 10.1007/s10822-008-9170-2.PubMed CentralView ArticlePubMed
- Kirchmair J, Markt P, Distinto S, Wolber G, Langer T: Evaluation of the performance of 3D virtual screening protocols: RMSD comparisons, enrichment assessments, and decoy selection ‐ what can we learn from earlier mistakes?. J Comput‐Aided Mol Des. 2008, 22 (3–4): 213-228.View ArticlePubMed
- Truchon JF, Bayly CI: Evaluating virtual screening methods: good and bad metrics for the “Early Recognition” problem. J Chem Inf Model. 2007, 47 (2): 488-508. 10.1021/ci600426e.View ArticlePubMed
- Good AC, Hermsmeier MA, Hindle S: Measuring CAMD technique performance: a virtual screening case study in the design of validation experiments. J Comput‐Aided Mol Des. 2004, 18 (7): 529-536. 10.1007/s10822-004-4067-1.View ArticlePubMed
- Clark RD, Webster‐Clark DJ: Managing bias in ROC curves. J Comput‐Aided Mol Des. 2008, 22 (3–4): 141-146.View ArticlePubMed
- Mackey MD, Melville JL: Better than random? the chemotype enrichment problem. J Chem Inf Model. 2009, 49 (5): 1154-1162. 10.1021/ci8003978.View ArticlePubMed
- Jahn A, Hinselmann G, Rosenbaum L, Fechner N, Zell A: Boltzmann‐enhanced flexible atom‐pair kernel with dynamic dimension reduction. Mol Inf. 2011, 30 (4): 307-315. 10.1002/minf.201000120.View Article
- Morgan HL: The generation of a unique machine description for chemical structures‐a technique developed at chemical abstracts service. J Chem Doc. 1965, 5 (2): 107-113. 10.1021/c160017a018.View Article
- Kronfeld M, Zell A: Gaussian process assisted particle swarm optimization. Proceedings of the Learning and Intelligent Optimization Conference (LION IV), Volume 6073 of LNCS. Edited by: Blum C, Battiti R. 2010, Heidelberg: Springer Berlin, 139-153.
- Gämperle R, Müller SD, Koumoutsakos P: A Parameter Study for Differential Evolution. Proceedings of the WSEAS International Conference on Advances in Intelligent Systems. 2002, Evolutionary Computation: Fuzzy Systems, 293-298.
- Mezura‐Montes E, Lopez‐Ramirez BC: Comparing bio‐inspired algorithms in constrained optimization problems. Proceedings of the Congress on Evolutionary Computation. 2007, Piscataway: IEEE, 662-669.
- Vesterstrom J, Thomsen R: A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. Proceedings of the Congress on Evolutionary Computation. 2004, Piscataway: IEEE, 1980-1987.
- Kuo MR, Morbidoni HR, Alland D, Sneddon SF, Gourlie BB, Staveski MM, Leonard M, Gregory JS, Janjigian AD, Yee C, Musser JM, Kreiswirth B, Iwamoto H, Perozzo R, Jacobs WR, Sacchettini JC, Fidock DA: Targeting tuberculosis and malaria through inhibition of Enoyl reductase: compound activity and structural data. J Biol Chem. 2003, 278 (23): 20851-20859. 10.1074/jbc.M211968200.View ArticlePubMed
- Stierand K, Rarey M, Drawing the PDB: protein‐ligand complexes in two dimensions. ACS Med Chem Lett. 2010, 1 (9): 540-545. 10.1021/ml100164p.PubMed CentralView ArticlePubMed
- Hansen N, Ostermeier A: Completely derandomized self‐adaptation in evolution strategies. Evol Comput. 2001, 9 (2): 159-195. 10.1162/106365601750190398.View ArticlePubMed
- Colorni A, Dorigo M, Maniezzo V: Distributed optimization by ant colonies. Proceedings of the 1st European Conference on Artificial Life. Edited by: Varela FJ, Bourgine P. 1991, Cambridge: MIT Press, 134-142.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.