Article
Three Descriptor Model Sets a High Standard for the CSAR-NRC HiQ Benchmark
Abstract

Here we report the results obtained with a proteochemometric approach for predicting ligand binding free energies of the CSAR-NRC HiQ benchmark data set. Using distance-dependent atom-type pair descriptors in a bagged stepwise multiple linear regression (MLR) model with subsequent complexity reduction, we identified three descriptors that suffice to build a very robust regression model for the CSAR-NRC HiQ data set. On the out-of-bag test set, the model has a cross-validated R² (R²cv) of 0.55, a mean unsigned error (MUEcv) of 1.19, and a root-mean-square error (RMSEcv) of 1.49. The selected descriptors are the count of protein atoms in a shell between 4.5 Å and 6 Å around each heavy ligand atom (excluding oxygen and phosphorus), the count of sulfur atoms in the vicinity of tryptophan, and the count of aliphatic ligand hydroxy hydrogens. The first two descriptors have a positive sign, indicating that they contribute favorably to the binding free energy, whereas the count of hydroxy hydrogens contributes unfavorably. The fact that such a simple model can be so effective raises a couple of questions, which are addressed in the article.
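As an illustration of the two ideas named in the abstract, the following Python sketch shows one way a shell-count descriptor and an out-of-bag evaluation of a bagged linear regression could be computed. It is not the authors' implementation. The function names `shell_count` and `bagged_mlr_oob`, the half-open shell interval (4.5 Å inclusive, 6 Å exclusive), the summation over ligand atoms, the use of the coefficient of determination for R², and the number of bootstrap rounds are all assumptions made for this example. The stepwise descriptor selection and the complexity reduction step are not reproduced.

```python
import numpy as np


def shell_count(ligand_xyz, ligand_elements, protein_xyz,
                r_inner=4.5, r_outer=6.0, excluded=("H", "O", "P")):
    """Count protein atoms in a distance shell around eligible ligand atoms.

    Assumption: counts are summed over all ligand heavy atoms that are not
    oxygen or phosphorus, and the shell is r_inner <= d < r_outer (in Å).
    """
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    keep = np.array([e not in excluded for e in ligand_elements], dtype=bool)
    if not keep.any() or len(protein_xyz) == 0:
        return 0
    # Pairwise distances: (eligible ligand atoms) x (protein atoms)
    diff = ligand_xyz[keep][:, None, :] - protein_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return int(np.count_nonzero((dist >= r_inner) & (dist < r_outer)))


def bagged_mlr_oob(X, y, n_bags=200, seed=0):
    """Fit ordinary least squares on bootstrap samples and score each
    complex only with models that did not see it (out-of-bag)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # intercept plus descriptors
    rng = np.random.default_rng(seed)
    pred_sum = np.zeros(n)
    pred_cnt = np.zeros(n)
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)          # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)     # rows left out of it
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        pred_sum[oob] += A[oob] @ coef
        pred_cnt[oob] += 1
    seen = pred_cnt > 0                           # rows with any OOB prediction
    err = pred_sum[seen] / pred_cnt[seen] - y[seen]
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y[seen] - y[seen].mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mue": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }


if __name__ == "__main__":
    # Synthetic data only; these are not CSAR-NRC HiQ values.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    y = 1.0 + X @ np.array([0.5, 0.3, -0.4]) + rng.normal(scale=0.1, size=100)
    print(bagged_mlr_oob(X, y))
```

The synthetic coefficients mirror only the signs reported in the abstract (two positive, one negative); their magnitudes are arbitrary. Averaging out-of-bag predictions per complex before scoring is one common convention, and the article may compute its statistics differently.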
History
- Published in Issue: September 26, 2011
- Article ASAP: June 27, 2011
- Just Accepted Manuscript: May 30, 2011
- Received: January 21, 2011