
Web Release Date: April 22,
Bootstrap-Based Consensus Scoring Method for Protein–Ligand Docking
Nano Electronics Research Laboratories and Bio-IT Center, Central Research Laboratories, NEC Corporation, 34, Miyukigaoka, Tsukuba, Ibaraki 305-8501, Japan, and Riken, Next-Generation Supercomputer R&D Center, sixth Fl., Meiji Seimei Kan, 2-1-1 Marunouchi, Chiyoda-ku, Tokyo 100-0005
Received June 11, 2007
Abstract:
To improve the performance of a single scoring function used in a protein–ligand docking program, we developed a bootstrap-based consensus scoring (BBCS) method, which is based on ensemble learning. BBCS combines multiple scorings, each of which has the same functional form but a different energy-parameter set. These energy-parameter sets are generated in two steps: (1) generation of training sets by a bootstrap method and (2) optimization of an energy-parameter set against each training set by a Z-score approach, which is based on the energy landscape theory used in protein folding. In this study, we applied BBCS to the FlexX scoring function. Using 50 given complexes, we generated 100 training sets and obtained 100 optimized energy-parameter sets. These parameter sets were tested against 48 complexes not included in the training sets. BBCS was shown to be an improvement over single scoring with a parameter set optimized by the same Z-score approach. Comparing BBCS with the original FlexX scoring function, we found that (1) the success rate of ranking the crystal structure at the top relative to decoys increased from 33.3% to 52.1% and (2) the rank of the crystal structure improved for 54.2% of the complexes and worsened for none. We also found that BBCS performed better than conventional consensus scoring (CS).
Protein–ligand docking programs are used to efficiently discover lead compounds for a target protein from huge compound databases. Since the pioneering work by Kuntz et al.,1 numerous docking programs have been developed.2–15 Docking programs can be assessed from two standpoints: conformation search and scoring. For the former, the question is whether the pose or binding mode of a crystal structure can be reproduced. For the latter, the question is whether the predicted score correlates with the experimentally measured binding affinity. Many reports assessing the performance of docking programs have been published.16–29 Of particular interest is the recent critical assessment reported by Warren et al.21 These reports generally conclude that docking algorithms are highly successful at generating good binding modes, while scoring functions are less successful at correctly identifying a binding mode. More specifically, some docking programs can generate a conformation very close to the crystal binding structure, but one cannot know which conformation created by which program is close to the true structure. In this situation, it is desirable to further improve the scoring method used to rerank the generated conformations. To compensate for the weakness of a single scoring function, a consensus scoring (CS) method combining multiple scoring functions was proposed.22–30 Although CS typically does not perform better than the best of the combined scoring functions, it has the advantage of generally providing stable results. Because the best scoring function can be identified only after the results are analyzed, not before, CS is a reliable method for a blind trial. It has been demonstrated22–30 that CS improves the hit rate or reduces the number of false positives.
In this study, we developed a bootstrap-based consensus scoring (BBCS) method. BBCS combines multiple scorings, each of which has the same functional form but a different energy-parameter set. BBCS is derived from the idea of bootstrap aggregating (bagging),31 which is one kind of ensemble learning. In general, it is not easy to uniquely determine the energy-parameter set that provides the global minimum of binding free energy over all complexes when the training set contains too few data. BBCS, being based on ensemble learning, is a rational way to circumvent this problem.
Multiple energy-parameter sets are determined in two steps: (1) generation of training sets by the bootstrap method and (2) optimization of an energy-parameter set by a Z-score approach against each training set. The Z-score approach is based on energy landscape theory, which describes a funnel-shaped landscape of protein folding32 and has been applied to folding simulation, structure prediction,33–38 and de novo design39 of proteins, as well as to protein–peptide docking.40 We consider that protein–ligand docking also has a funnel-shaped landscape, as discussed by Camacho and Vajda,41 and that the Z-score approach therefore works well for protein–ligand docking.
Machine learning has been applied in several ways to overcome difficulties in structure-based consensus scoring.27, 42, 43 For example, Jacobsson et al. used an ensemble method (bagging) to create rule-based prediction models based on scoring matrices.27 Another example is a recent paper by Antes et al., who applied an ensemble method to neural networks and k-nearest neighbor models.42 Although both their work and ours aim to find a more plausible binding mode among numerous modes, there are marked differences. They used an ensemble method to optimize a parameter set efficiently, while we use it to improve the rank of the crystal structure relative to decoy structures. In other words, they determined a single parameter set for a scoring function, not multiple parameter sets as we do. Their parameter set is optimized so that the binding mode with the lowest rmsd becomes close to the crystal structure, while ours is optimized so that the crystal structure is maximally separated from decoys with respect to score.
Preparation of training and test sets. In this study, 98 protein–ligand complexes were used. These complexes are classified into three types26 according to the dominant interactions between the protein and ligand: electrostatic interaction for type 1, hydrophobic for type 2, and mixed electrostatic and hydrophobic for type 3. The complexes in each type were randomly divided into a “training set” and a “test set,” as listed in Table 1. For each complex, the crystal structure and 100 docked structures (decoys) generated by Wang et al.26 were used. Their docked structures were generated to cover the conformational space of the ligand as completely as possible, so that their distribution is not biased by the docking algorithm employed. Such structures allow various scoring functions to be compared on the same ground. Wang et al. used AutoDock,7 because it can generate a wide variety of conformations during long multiple stochastic searches on the potential energy surface. Wang’s data set is therefore useful for testing the performance of scoring independently of the performance of the docking algorithm. Our intention in using Wang’s data set is to go beyond comparing scoring functions and to refine them. To illustrate the advantage of Wang’s data set over other data sets, such as one generated by the FlexX program with the default protocol, we compared the distributions of conformations in the two data sets in Supporting Information S-1. A remarkable difference is seen for some of the complexes; for the complex shown, FlexX fails to generate conformations sufficiently thoroughly in the low-rmsd range. If such an insufficient set of conformations is used to test scoring, the measured performance of scoring becomes coupled with the performance of the docking algorithm. In contrast, Wang’s data set is more complete, containing conformations searched over a wide range. Therefore, we consider Wang’s data set to be a useful benchmark set for scoring.
Wang’s data set is published online at http://sw16.im.med.umich.edu/software/xtool. In the published structures, bound water molecules are eliminated but metal atoms are included. In this study, metal atoms are also eliminated.
Scoring Function. In BBCS, any scoring function proposed can be employed. We employ the FlexX scoring function, which is given as24

For convenience in optimizing the energy parameters, we rewrite eq 1 in the form

Determining Multiple Energy-Parameter Sets. Multiple energy-parameter sets are determined in two steps, as shown in Figure 1.
Figure 1. Process for determining multiple parameter sets.
Step 1: Generation of multiple training sets.
Each observation in our sampling is taken as one crystal structure plus $N_{\mathrm{decoy}}$ decoy structures for a given protein–ligand system (complex): $C = \{r^{\mathrm{crystal}}, r_1^{\mathrm{decoy}}, r_2^{\mathrm{decoy}}, r_3^{\mathrm{decoy}}, \ldots, r_{N_{\mathrm{decoy}}}^{\mathrm{decoy}}\}$, where $r$ represents the Cartesian coordinates of the structure. The size of the original sample is the number $N_{\mathrm{complex}}$ of complexes. Then, from the original sample, a bootstrap sample $T^{(B)} = \{C^{(1)}, C^{(2)}, C^{(3)}, \ldots, C^{(N_s)}\}$ of size $N_s$ is generated by random sampling with replacement, with the superscript on $C$ indicating the sampled complex. This means that an observation $C$ in the original sample may be included several times in the bootstrap sample $T^{(B)}$, i.e., $C^{(i)} = C^{(j)}$ is allowed for $i \neq j$. By repeating the above procedure $N_{\mathrm{train}}$ times, multiple training sets denoted by $T_{\mathrm{Mul}} = \{T^{(1)}, T^{(2)}, T^{(3)}, \ldots, T^{(N_{\mathrm{train}})}\}$ are generated. In this study, $N_{\mathrm{decoy}}$, $N_s$, and $N_{\mathrm{train}}$ are all 100. In various applications of bagging, $N_{\mathrm{train}}$ of order $10$–$10^3$ is empirically found to be suitable.31
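Step 1 can be illustrated with a minimal Python sketch. It assumes each complex is represented only by an identifier (the crystal structure and decoys would travel with it); the function name `make_bootstrap_sets`, the identifiers, and the fixed seed are illustrative choices, not part of the original method description.

```python
import random

def make_bootstrap_sets(complex_ids, n_sample, n_train, seed=0):
    """Draw n_train bootstrap samples, each of n_sample complexes,
    by sampling with replacement from the original sample."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    return [rng.choices(complex_ids, k=n_sample) for _ in range(n_train)]

# Hypothetical identifiers standing in for the 50 training complexes.
complex_ids = [f"complex_{i:02d}" for i in range(50)]

# N_s = 100 and N_train = 100, as stated in the text.
training_sets = make_bootstrap_sets(complex_ids, n_sample=100, n_train=100)
```

Because each sample here is larger than the original sample (100 draws from 50 complexes), every bootstrap sample necessarily contains repeated complexes, which is the situation described by $C^{(i)} = C^{(j)}$ for $i \neq j$.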
Step 2: Optimization of the parameter set.
We want to determine the parameter set $p^{(B)}$ optimized for the training set $T^{(B)}$. For a set of structures $C = \{r^{\mathrm{crystal}}, r_1^{\mathrm{decoy}}, r_2^{\mathrm{decoy}}, r_3^{\mathrm{decoy}}, \ldots, r_{N_{\mathrm{decoy}}}^{\mathrm{decoy}}\}$ of a given complex, the Z-score is defined as a function of parameter set $p$ as follows,
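As a hedged illustration of a Z-score of this kind, the Python sketch below assumes the form commonly used in energy-landscape work: the score of the crystal structure minus the mean decoy score, divided by the standard deviation of the decoy scores. That specific form, the function names, and the toy numbers are assumptions made for illustration; only the linear score as a parameter-weighted sum of energy components follows the text.

```python
from statistics import mean, pstdev

def score(params, components):
    """Linear score: parameter-weighted sum of energy components."""
    return sum(p * f for p, f in zip(params, components))

def z_score(params, crystal, decoys):
    """Assumed form: (S_crystal - mean(S_decoy)) / std(S_decoy).
    A more negative value means the crystal structure is better separated."""
    s_crystal = score(params, crystal)
    s_decoys = [score(params, d) for d in decoys]
    return (s_crystal - mean(s_decoys)) / pstdev(s_decoys)

# Toy energy components (hypothetical): two terms per structure.
crystal = [-4.0, -2.0]
decoys = [[-1.0, -1.0], [-2.0, 0.0], [-3.0, -1.0]]
z = z_score([1.0, 1.0], crystal, decoys)
```

Under this assumed form, optimizing a parameter set for a training set amounts to choosing the parameters that make the Z-scores of its complexes as negative as possible.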





Combining Multiple Predictors (Consensus Ranking Method). Multiple predictors can be combined in various ways to make a final prediction. Here, we consider two methods, rank-by-wcs (weighted consensus score) and rank-by-number. The multiple predictors under consideration are $p_{\mathrm{Mul}} = \{p^{(1)}, p^{(2)}, p^{(3)}, \ldots, p^{(N_{\mathrm{pred}})}\}$ for BBCS and $P_{\mathrm{CS}}$ = {FlexX, DOCK, GOLD, PMF, ChemScore} for CS.
(i) Rank-by-wcs. We introduce a weighted consensus score as an extension of rank-by-rank.30 First, the protein–ligand docking structures, including the crystal structure and decoys, are ranked in ascending order of the scores calculated by each predictor. A matrix of ranks is then obtained, as shown in Figure 2a. Next, for each structure, the population $P_R = n_R/N_{\mathrm{pred}}$ is calculated, where $n_R$ is the number of predictors, among the $N_{\mathrm{pred}}$ predictors, that rank the structure as $R$th. As a result, a matrix of $P_R$ is obtained, as shown in Figure 2b. For example, DS_1 is ranked first, second, and third by 25%, 19%, and 12% of all predictors, respectively. A weighted consensus score, $S_{\mathrm{WCS}}$, is defined by using $P_R$ as a weight factor:
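The rank matrix and the population matrix described above can be sketched in Python as follows. The function names and the toy score table are hypothetical, ties are broken by structure index (an assumption), and the sketch stops at the population $P_R$; it does not compute the weighted consensus score itself.

```python
def rank_structures(scores):
    """Rank structures in ascending order of score: lowest score gets rank 1.
    Ties are broken by structure index (an assumption of this sketch)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def population_matrix(score_table):
    """score_table[k][s] is the score of structure s under predictor k.
    Returns P with P[s][R-1] = n_R / N_pred, the fraction of predictors
    that rank structure s as R-th."""
    n_pred = len(score_table)
    n_struct = len(score_table[0])
    counts = [[0] * n_struct for _ in range(n_struct)]
    for scores in score_table:
        for s, rank in enumerate(rank_structures(scores)):
            counts[s][rank - 1] += 1
    return [[c / n_pred for c in row] for row in counts]

# Toy example: 4 predictors scoring 3 docking structures.
score_table = [
    [-5.0, -3.0, -1.0],
    [-2.0, -4.0, -1.0],
    [-6.0, -2.0, -3.0],
    [-4.0, -1.0, -2.0],
]
P = population_matrix(score_table)
```

In this toy table, structure 0 is ranked first by three of the four predictors and second by one, so its row of populations is 0.75, 0.25, 0.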


(ii) Rank-by-Number. As an alternative, consensus rank by rank-by-number30 is also calculated. First, an average score is calculated for every docking structure as follows:


Why BBCS Works. A semi-complete scoring function could be constructed if all possible conformations of all complexes existing in nature were used for optimizing a parameter set. Note that it is complete only within the parameter space (e.g., adjustment factor $p = \{\alpha_i\}$) of a given scoring function. However, because it is impossible to use all complexes as a training set, some error between the scoring function actually constructed and the semi-complete one is inevitable. The error becomes large especially when the number of data in the training set is small. In this case, BBCS, whose key idea is based on bootstrap aggregating (bagging) in ensemble learning, reduces the error in scores between the actual scoring function and the semi-complete one. On the basis of bagging theory,31 we can explain why BBCS works, for example, in the case of rank-by-number. Here, $e_A$ is defined as the error of the combined predictor:



Validation of Z-score Approach (Optimization Method). Figure 3 shows the correlation between binding scores and rmsd values of the heavy atom positions of a ligand against the crystal structure as a reference, where scores are obtained with the original FlexX parameter set45 and a typical optimized parameter set in $p_{\mathrm{Mul}}$. For the plot of 1d3p (Figure 3a), the original parameter set succeeds in ranking the crystal structure at the top but leaves only a marginal score gap between the crystal structure and decoys. In contrast, the optimized parameter set makes the score gap quite clear. For the plot of 1buc (Figure 3b), the optimized parameter set makes the crystal structure top-ranked (low score), but the original parameter set does not. These results are examples of how the Z-score approach successfully discriminates the crystal structure from decoys. The plots with the optimized parameter set appear to describe a funnel-shaped energy landscape.
Figure 3. Typical plots of binding score vs rmsd value: (a) 1d3p and (b) 1buc. The triangles represent values obtained from eq 1 by using the original FlexX parameter set, and the circles represent values obtained from eq 2 by using a typical optimized parameter set in $p_{\mathrm{Mul}}$.
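Figure 4 shows the average Z-scores calculated by

$$Z_{\mathrm{BBCS}}^{\mathrm{ave}} = \frac{1}{N_{\mathrm{pred}}} \sum_{B=1}^{N_{\mathrm{pred}}} Z(p^{(B)})$$

where $Z(p^{(B)})$ is the Z-score of a complex evaluated with parameter set $p^{(B)}$ in $p_{\mathrm{Mul}}$.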
Figure 4. Average of Z-score, $Z_{\mathrm{BBCS}}^{\mathrm{ave}}$, for each complex in the training and test sets. The average is taken over all parameter sets contained in the multiple parameter sets $p_{\mathrm{Mul}}$.
It is interesting to compare $Z_{\mathrm{BBCS}}^{\mathrm{ave}}$ with $Z_{\mathrm{FlexX}}$ calculated with the original parameter set45 of the FlexX scoring function. Table 2 lists the distribution of $Z_{\mathrm{BBCS}}^{\mathrm{ave}} - Z_{\mathrm{FlexX}}$. Negative values of $Z_{\mathrm{BBCS}}^{\mathrm{ave}} - Z_{\mathrm{FlexX}}$ mean that $p_{\mathrm{Mul}}$ provides a larger score gap than the original parameter set does, i.e., higher performance in discriminating the crystal structure from decoys. Negative values were obtained for high percentages of the complexes in the training set (76%) and the test set (68.75%).
Validation of BBCS. To determine whether multiple parameter sets used in BBCS are more effective than a single parameter set optimized using all complexes in the training set without bootstrapping, we compared the results of BBCS with those of one reparameterization of the FlexX scoring function, referred to as “single scoring”. To make a fair comparison, the parameter set for single scoring was also optimized with the Z-score approach.
We compared their performances against the 48 complexes in the test set in two ways. The first comparison was with respect to the success rate of recognizing the crystal structure, which was calculated as the percentage of complexes whose crystal structure or low-rmsd structures are top-ranked relative to decoys. The results are summarized in Table 3, which also includes results discussed in the next subsections. BBCS attains the highest success rate of 52% in recognizing the crystal structure, compared with 18–33% for the others. It is clear that both BBCS variants, especially BBCS by rank-by-wcs, perform better than single scoring. This conclusion remains valid even if we relax the success criteria by counting low-rmsd structures with rmsd < 1.0 or 2.0 Å as hits.
The second comparison was of the rank $R$ of the crystal structure relative to 100 decoys. Table 4 lists the percentage of complexes giving positive, zero, and negative $D_{S-B}$, where $D_{S-B}$ is an indicator defined by

$$D_{S-B} = R_{\mathrm{single}} - R_{\mathrm{BBCS}}$$
Figure 5. Average of $D_{X-B}$ values, $\bar{D}_{X-B}$, over the 48 complexes. Note that the averages of $R_{\mathrm{single}}$, $R_{\mathrm{FlexX}}$, and $R_{\mathrm{CS}}$ are 10.4, 10.8, and 11.5, respectively.
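The rank-difference statistics used in Table 4 and Figure 5 (the rank difference $R_X - R_{\mathrm{BBCS}}$, the percentages of positive, zero, and negative differences, and their average) can be sketched in Python as follows. The function name and the toy ranks are hypothetical and are not data from this study.

```python
def rank_difference_summary(ranks_x, ranks_bbcs):
    """Summarize D = R_X - R_BBCS over complexes.
    Positive D means BBCS ranks the crystal structure higher (better) than method X."""
    diffs = [rx - rb for rx, rb in zip(ranks_x, ranks_bbcs)]
    n = len(diffs)
    return {
        "positive_pct": 100.0 * sum(d > 0 for d in diffs) / n,
        "zero_pct": 100.0 * sum(d == 0 for d in diffs) / n,
        "negative_pct": 100.0 * sum(d < 0 for d in diffs) / n,
        "mean": sum(diffs) / n,
    }

# Toy ranks of the crystal structure for four complexes (hypothetical values).
ranks_x = [3, 1, 10, 7]
ranks_bbcs = [1, 1, 4, 9]
summary = rank_difference_summary(ranks_x, ranks_bbcs)
```

For these toy ranks the differences are 2, 0, 6, and -2, so half of the complexes improve under BBCS, a quarter are unchanged, and a quarter worsen.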
Comparison of BBCS and the Original FlexX Scoring Function. Now that we have established the advantage of BBCS over single scoring on parameter sets determined by the same Z-score approach, let us proceed to a comparison of BBCS with the original FlexX scoring function.45 Table 3 shows that BBCS has a higher success rate for recognizing the crystal structure or low-rmsd structures than the original FlexX, especially for rank-by-wcs. Figure 6 shows the $D_{F-B}$ value of each complex in the test set, where

$$D_{F-B} = R_{\mathrm{FlexX}} - R_{\mathrm{BBCS}}$$
Comparing single scoring with the Z-score approach and original FlexX, one sees from Tables 3 and 4 that they are quite similar in performance in spite of the difference in optimization method.
In this paper, we have restricted ourselves to rescoring conformations generated by AutoDock. To achieve the best BBCS performance for conformations generated by the FlexX docking program, it may be advantageous to retrain the scoring function only against a large number of conformations created by FlexX. In practice, generated conformations are biased to some extent because the conformation search and the scoring function are coupled. Thus, it is not efficient to train against a set that includes conformations particularly disfavored by the FlexX docking program. However, determining such a parameter set is beyond the scope of this paper.
Comparison of BBCS and CS. The scoring functions used for CS were FlexX and D-score, G-score, PMF, and ChemScore, the latter four of which are included in CScore of the Sybyl module (Tripos Inc.). The number of predictors giving the best or better performance for each method should be used when different methods are compared. Thus, the number of predictors is 100 for BBCS and 5 for CS in this study.
Table 3 shows that, compared with CS, BBCS gives a higher success rate for recognizing the crystal structure, especially for rank-by-wcs. Figure 7 shows $D_{C-B}$ for each complex in the test set, where

$$D_{C-B} = R_{\mathrm{CS}} - R_{\mathrm{BBCS}}$$
CS gave the worst performance among all scorings in Tables 3 and 4; it was even worse than the predictor FlexX contained in CS. The difference between BBCS and CS lies in whether predictors are constructed based on a statistically rational method (bootstrapping) or constructed by simply gathering scoring functions. For CS, there is no guarantee that predictors will supplement each other’s weak points if ones that give quite different results or almost the same results are gathered.
The CPU time required is another important consideration. As long as docked conformations are generated by a single procedure and then rescored by the scoring functions, CPU time is not proportional to the number of predictors; i.e., the CPU time of BBCS is not simply 20 times longer than that of CS in this study. In fact, there was little difference in CPU time between them. This is because, for BBCS, once the values of $F_i$ in eq 1 are calculated as a function of coordinates, the score for any parameter set is obtained simply as $\sum_i c_i F_i$, whereas for CS, all scoring functions must be calculated separately because their functional forms differ.
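The cost argument can be made concrete with a short Python sketch: the energy components $F_i$ are computed once per structure, and every parameter set then reuses them through a weighted sum. The function name `score_all` and the toy numbers are hypothetical, and computing the components from coordinates is outside the sketch.

```python
def score_all(components, param_sets):
    """components[s][i] is the precomputed term F_i for structure s;
    param_sets[k][i] is the coefficient c_i of parameter set k.
    Returns scores[k][s] = sum_i c_i * F_i, reusing the cached components."""
    return [
        [sum(c * f for c, f in zip(params, comps)) for comps in components]
        for params in param_sets
    ]

# Toy values: 2 structures with 2 energy terms each, 3 parameter sets.
components = [[-2.0, 1.0], [-1.0, 0.5]]
param_sets = [[1.0, 1.0], [2.0, 0.5], [0.5, 2.0]]
scores = score_all(components, param_sets)
```

Adding a further parameter set costs only one extra weighted sum per structure, whereas adding a scoring function of a different functional form would require evaluating that function from the coordinates again.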
Worst Case by BBCS. There remain some complexes for which BBCS still fails to rank the crystal structure highly among decoys; the worst case is 1tet, for which $R_{\mathrm{BBCS}}$ was 75th among 101 protein–ligand docking structures for rank-by-wcs. Hence, it is interesting to examine why BBCS failed for 1tet. As shown in Figure 8a, the ligand is a citrate. Figure 8b shows the difference between the ligand binding sites of the crystal structure and of the top-ranked (incorrect) structure. For the crystal structure, the citrate binds to the receptor on the surface. To our knowledge, many docking programs regard such a structure as energetically and structurally unstable. On the other hand, for the top-ranked structure, the citrate binds to the receptor in the pocket, which is usually regarded as a more plausible site. The correct binding site thus cannot be predicted, for the following reason. The binding mode of the crystal structure in Figure 8c shows two important histidine residues. It is likely that their imidazole rings are positively charged and make strong ion–ion interactions with two carboxylate ions of the citrate, as discussed by Shoham et al.46 In our study, however, the histidine residues were treated as neutral. That is a main reason why the crystal structure was not top-ranked. However, it must be remarked that one generally assumes pH = 7.0 for a blind trial, without considering the delicate protonation state of histidines, and that, in this case, histidines are neutral. On the condition that histidines are neutral, it is no wonder that the incorrect structure obtained the top rank, because hydrogen bonds between the ligand and four residues are plausibly formed for that structure. Thus, not only the performance of the scoring function but also adequate assignment of the protonation state is very important.
Figure 8. Structure of 1tet. (a) Ligand: citrate. (b) Binding site of the crystal structure and the top-ranked (incorrect) structure obtained by BBCS. (c) Binding mode of the crystal structure. (d) Binding mode of the (incorrect) top-ranked structure.
BBCS was examined for pose prediction in protein–ligand docking. BBCS comes from two key ideas: bagging to reduce the statistical error and the Z-score approach to distinguish the crystal structure from decoys. BBCS was shown to be an improvement over single scoring in a comparison of results obtained with the same Z-score approach. BBCS also performed better than the original FlexX and CS in ranking the crystal structure. In particular, BBCS’s success rate of recognizing the crystal structure at the top was higher. As for the strategies for combining multiple scoring functions, rank-by-wcs performed better than rank-by-number. Once the parameter sets optimized in BBCS are obtained, one can rescore placements generated by an arbitrary docking algorithm in the following steps.
(1) Perform only scoring of their placements by FlexX module to obtain individual energy components.
(2) Calculate rescores of the placements by using the parameter sets optimized in BBCS.
(3) Perform consensus scoring based on the rescores obtained.
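Step 3 can be sketched in Python for the rank-by-number style of consensus, assuming the rescores from step 2 are already available as a table. The function name, the toy rescores, and the convention that placements are ordered by ascending average score (lower is better) are assumptions of this sketch.

```python
def consensus_rank_by_average(rescores):
    """rescores[k][s] is the score of placement s under parameter set k.
    Returns (order, avg): placement indices sorted from best (lowest average
    score) to worst, and the average score of each placement."""
    n_pred = len(rescores)
    n_place = len(rescores[0])
    avg = [sum(rescores[k][s] for k in range(n_pred)) / n_pred
           for s in range(n_place)]
    order = sorted(range(n_place), key=lambda s: avg[s])
    return order, avg

# Toy rescores (hypothetical): 3 parameter sets, 3 placements.
rescores = [
    [-3.0, -5.0, -1.0],
    [-4.0, -2.0, -1.5],
    [-3.5, -6.0, -0.5],
]
order, avg = consensus_rank_by_average(rescores)
```

In this toy case, placement 1 has the lowest average score and is therefore placed first in the consensus ordering, even though one parameter set prefers placement 0.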
It would be interesting to apply BBCS to other scoring functions different in form from the FlexX function.
This work is supported in part by New Energy and Industrial Technology Development Organization, Japan (NEDO) under the project of Development of Basic Technologies for Advanced Production Methods Using Microorganism Functions.
Sections S1 and S2. This material is available free of charge via the Internet at http://pubs.acs.org.
* Corresponding author. E-mail address: h-fukunishi@bu.jp.nec.com. Phone: +81 298 856 6155. Fax: +81 298 856 6136.
† Nano Electronics Research Laboratories, NEC Corporation.
‡ Bio-IT Center, NEC Corporation.
§ Riken, Next-Generation Supercomputer R&D Center.
1. Kuntz, I. D.; Blaney, J. M.; Oatley, S. J.; Sheridan, R. P.; Langridge, R.; Ferrin, T. E. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 1982, 161, 269–288.
2. Abagyan, R. A.; Totrov, M. M.; Kuznetsov, D. A. ICM: a new method for structure modeling and design: applications to docking and structure prediction from the distorted native conformation. J. Comput. Chem. 1994, 15, 488–506.
3. Rarey, M.; Kramer, B.; Lengauer, T.; Klebe, G. A fast flexible docking method using an incremental construction algorithm. J. Mol. Biol. 1996, 261, 470–489.
4. Welch, W.; Ruppert, J.; Jain, A. N. Hammerhead: fast, fully automated docking of flexible ligands to protein binding sites. Chem. Biol. 1996, 3, 449–462.
5. Jones, G.; Willett, P.; Glen, R. C.; Leach, A. R.; Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997, 267, 727–748.
6. McMartin, C.; Bohacek, R. QXP: Powerful, rapid computer algorithms for structure-based drug design. J. Comput.-Aided Mol. Des. 1997, 11, 333–344.
7. Morris, G. M.; Goodsell, D. S.; Halliday, R.; Huey, R.; Hart, W. E.; Belew, R. K.; Olson, A. J. Automated docking using a lamarckian genetic algorithm and an empirical binding free energy function. J. Comput. Chem. 1998, 19, 1639–1662.
8. Baxter, C. A.; Murray, C. W.; Clark, D. E.; Westhead, D. R.; Eldridge, M. D. Flexible docking using Tabu search and an empirical estimate of binding affinity. Proteins 1998, 33, 367–382.
9. Hou, T.; Wang, J.; Chen, L.; Xu, X. Automated docking of peptides and proteins by using a genetic algorithm combined with a tabu search. Protein Eng. 1999, 12, 639–647.
10. Liu, M.; Wang, S. MCDOCK: a Monte Carlo simulation approach to the molecular docking problem. J. Comput.-Aided Mol. Des. 1999, 13, 435–451.
11. Perola, E.; Xu, K.; Kollmeyer, T. M.; Kaufmann, S. H.; Prendergast, F. G.; Pang, Y. P. Successful virtual screening of a chemical database for farnesyltransferase inhibitor leads. J. Med. Chem. 2000, 43, 401–408.
12. Ewing, T. J.; Makino, S.; Skillman, A. G.; Kuntz, I. D. DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases. J. Comput.-Aided Mol. Des. 2001, 15, 411–428.
13. Zavodszky, M. I.; Sanschagrin, P. C.; Korde, R. S.; Kuhn, L. A. Distilling the essential features of a protein surface for improving protein–ligand docking, scoring, and virtual screening. J. Comput.-Aided Mol. Des. 2002, 16, 883–902.
14. Jain, A. N. Surflex: fully automatic flexible molecular docking using a molecular similarity-based search engine. J. Med. Chem. 2003, 46, 499–511.
15. Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelley, M.; Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin, P. S. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739–1749.
16. Kellenberger, E.; Rodrigo, J.; Muller, P.; Rognan, D. Comparative evaluation of eight docking tools for docking and virtual screening accuracy. Proteins 2004, 57, 225–242.
17. Perola, E.; Walters, W. P.; Charifson, P. S. A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance. Proteins 2004, 56, 235–249.
18. Kontoyianni, M.; McClellan, L. M.; Sokol, G. S. Evaluation of docking performance: Comparative data on docking algorithms. J. Med. Chem. 2004, 47, 558–565.
19. Kroemer, R. T.; Vulpetti, A.; McDonald, J. J.; Rohrer, D. C.; Trosset, J. Y.; Giordanetto, F.; Cotesta, S.; McMartin, C.; Kihlen, M.; Stouten, P. F. W. Assessment of docking poses: Interactions-based accuracy classification (IBAC) versus crystal structure deviations. J. Chem. Inf. Comput. Sci. 2004, 44, 871–881.
20. Kontoyianni, M.; Sokol, G. S.; McClellan, L. M. Evaluation of library ranking efficacy in virtual screening. J. Comput. Chem. 2005, 26, 11–22.
21. Warren, G. L.; Andrews, C. W.; Capelli, A.-M.; Clarke, B.; LaLonde, J.; Lambert, M. H.; Lindvall, M.; Nevins, N.; Semus, S. F.; Senger, S.; Tedesco, G.; Wall, I. D.; Woolven, J. M.; Peishoff, C. E.; Head, M. S. A critical assessment of docking programs and scoring functions. J. Med. Chem. 2006, 49, 5912–5931.
22. Charifson, P. S.; Corkery, J. J.; Murcko, M. A.; Walters, W. P. Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. J. Med. Chem. 1999, 42, 5100–5109.
23. Bissantz, C.; Folkers, G.; Rognan, D. Protein-based virtual screening of chemical databases: 1. Evaluation of different docking/scoring combinations. J. Med. Chem. 2000, 43, 4759–4767.
24. Stahl, M.; Rarey, M. Detailed analysis of scoring functions for virtual screening. J. Med. Chem. 2001, 44, 1035–1042.
25. Clark, R. D.; Strizhev, A.; Leonard, J. M.; Blake, J. F.; Matthew, J. B. Consensus scoring for ligand/protein interactions. J. Mol. Graph. Model. 2002, 20, 281–295.
26. Wang, R.; Lu, Y.; Wang, S. Comparative evaluation of 11 scoring functions for molecular docking. J. Med. Chem. 2003, 46, 2287–2303.
27. Jacobsson, M.; Liden, P.; Stjernschantz, E.; Bostrom, H.; Norinder, U. Improving structure-based virtual screening by multivariate analysis of scoring data. J. Med. Chem. 2003, 46, 5781–5789.
28. Verdonk, M. L.; Berdini, V.; Hartshorn, M. J.; Mooij, W. T. M.; Murray, C. W.; Taylor, T. D.; Watson, P. Virtual screening using protein-ligand docking: Avoiding artificial enrichment. J. Chem. Inf. Comput. Sci. 2004, 44, 793–806.
29. Yang, J. M.; Chen, Y. F.; Shen, T. W.; Kristal, B. S.; Hsu, D. F. Consensus scoring criteria for improving enrichment in virtual screening. J. Chem. Inf. Model. 2005, 45, 1134–1146.
30. Wang, R.; Wang, S. How does consensus scoring work for virtual library screening? An idealized computer experiment. J. Chem. Inf. Comput. Sci. 2001, 41, 1422–1426.
31. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
32. Bryngelson, J. D.; Onuchic, J. N.; Socci, N. D.; Wolynes, P. G. Funnels, Pathways, and the energy landscape of protein folding: a synthesis. Proteins 1995, 21, 167–195.
33. Koretke, K. K.; Luthey-Schulten, Z. A.; Wolynes, P. G. Self-consistently optimized energy functions for protein structure prediction by molecular dynamics. Proc. Natl. Acad. Sci. USA 1998, 95, 2932–2937.
34. Fukunishi, H.; Watanabe, O.; Takada, S. On the Hamiltonian replica exchange method for efficient sampling of biomolecular system: Application to protein structure prediction. J. Chem. Phys. 2002, 116, 9058–9067.
35. Fujitsuka, Y.; Takada, S.; Luthey-Schulten, Z. A.; Wolynes, P. G. Optimizing physical energy functions for protein folding. Proteins 2004, 54, 88–103.
36. Chikenji, G.; Fujitsuka, Y.; Takada, S. Protein folding mechanisms and energy landscape of src SH3 domain studied by a structure prediction toolbox. Chem. Phys. 2004, 307, 157–162.
37. Chikenji, G.; Fujitsuka, Y.; Takada, S. Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study. Proc. Natl. Acad. Sci. USA 2006, 103, 3141–3146.
38. Fujitsuka, Y.; Chikenji, G.; Takada, S. SimFold energy function for de novo protein structure prediction: Consensus with Rosetta. Proteins 2006, 62, 381–398.
39. Jin, W.; Kambara, O.; Sasakawa, H.; Tamura, A.; Takada, S. De novo design of foldable proteins with smooth folding funnel: automated negative design and experimental verifications. Structure 2003, 11, 581–590.
40. Liu, Z.; Dominy, B. N.; Shakhnovich, E. I. Structural Mining: Self-Consistent Design on Flexible Protein-Peptide Docking and Transferable Binding Affinity Potential. J. Am. Chem. Soc. 2004, 126, 8515–8528.
41. Camacho, C. J.; Vajda, S. Protein docking along smooth association pathways. Proc. Natl. Acad. Sci. USA 2001, 98, 10636–10641.
42. Antes, I.; Merkwirth, C.; Lengauer, T. POEM: Parameter Optimization Using Ensemble Methods: Application to Target Specific Scoring Functions. J. Chem. Inf. Model. 2005, 45, 1291–1302.
43. Teramoto, R.; Fukunishi, H. Supervised consensus scoring for docking and virtual screening. J. Chem. Inf. Model. 2007, 47, 526–534.
44. Böhm, H. J. Prediction of binding constants of protein ligands: A fast method for the prioritization of hits obtained from de novo design or 3D database search programs. J. Comput.-Aided Mol. Des. 1998, 12, 309–323.
45. FlexX, version 1.12.2 L; BioSolveIT GmbH: Sankt Augustin, Germany.
46. Shoham, M.; Scherf, T.; Anglister, J.; Levitt, M.; Merritt, E. A.; Hol, W. G. J. Structural diversity in a conserved cholera toxin epitope involved in ganglioside binding. Protein Sci. 1995, 4, 841–848.
Table 1. List of Protein–Ligand Complexes (PDB Codes) Classified According to the Dominant Interactions between Protein and Ligand

Table 2. Distribution of $Z_{\mathrm{BBCS}}^{\mathrm{ave}} - Z_{\mathrm{FlexX}}$

Table 3. Success Rate of Recognizing the Crystal Structure (Percentage of Complexes Whose Crystal Structure or Low rmsd Structures Are Top-Ranked)
a Rank-by-wcs. b Rank-by-number. c Parameter set optimized by Z-score approach.

Table 4. Comparison of BBCS and Various Scoring with Respect to the Rank R of Crystal Structure (a)
a Percentages of complexes giving positive, zero, and negative $D_{X-B}$ values are shown for all 48 complexes in the test set. $D_{X-B}$ (= $R_X - R_{\mathrm{BBCS}}$) is the rank difference. b Rank difference between single scoring with Z-score approach and BBCS. c Rank difference between original FlexX and BBCS. d Rank difference between CS and BBCS.

Table 5. Rank Difference Averaged over Complexes Whose $R_X$ Falls in the Indicated Range
a Average of rank differences between single scoring with Z-score approach and BBCS. b Average of rank differences between original FlexX and BBCS. c Average of rank differences between CS and BBCS. d $\bar{D}_{C-B}$ value when 4xia is omitted.