pKa Prediction of Monoprotic Small Molecules the SMARTS Way
Abstract
Given the high attrition rate of lead compounds in drug development today, achieving favorable absorption, distribution, metabolism, elimination, and toxicity profiles is a necessity. The ability to accurately predict bioavailability can help save time and money during the screening and optimization processes. As several robust programs already exist for predicting logP, we have turned our attention to the fast and robust prediction of pKa for small molecules. Using curated data from the Beilstein Database and Lange’s Handbook of Chemistry, we have created a decision tree based on a novel set of SMARTS strings that can accurately predict the pKa of monoprotic compounds with an R² of 0.94 and a root mean squared error of 0.68. Leave-some-out (10%) cross-validation achieved a Q² of 0.91 and a root mean squared error of 0.80.
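The statistics quoted above (R², root mean squared error, and a cross-validated Q² from leaving out 10% of the data at a time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the stand-in "leaf mean" model, and the toy values are all hypothetical, and Q² is computed here with one common definition (the R² formula applied to held-out predictions). A real SMARTS-based tree would assign each molecule to a leaf by substructure matching, which is not shown.

```python
import math
import random
from statistics import mean


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - avg) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def leave_some_out(xs, ys, fit, fraction=0.10, seed=0):
    """Predict every sample with a model trained without that sample's fold.

    The data are shuffled once and split into folds of about `fraction`
    of the samples; each fold is held out in turn.
    """
    n = len(xs)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    fold_size = max(1, round(n * fraction))
    predictions = [None] * n
    for start in range(0, n, fold_size):
        held_out = set(order[start:start + fold_size])
        train = [i for i in range(n) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            predictions[i] = model(xs[i])
    return predictions


def fit_leaf_means(leaf_labels, values):
    """Hypothetical stand-in for a decision tree: predict the mean of each leaf."""
    by_leaf = {}
    for label, value in zip(leaf_labels, values):
        by_leaf.setdefault(label, []).append(value)
    leaf_mean = {label: mean(vals) for label, vals in by_leaf.items()}
    overall = mean(values)
    # Fall back to the overall mean for a leaf never seen in training.
    return lambda label: leaf_mean.get(label, overall)


# Toy placeholder data (NOT from the paper): two leaves with made-up values.
labels = ["leaf_a"] * 10 + ["leaf_b"] * 10
values = [4.0 + 0.1 * i for i in range(10)] + [9.5 + 0.1 * i for i in range(10)]

fitted = fit_leaf_means(labels, values)
fit_pred = [fitted(x) for x in labels]
print("R2  :", r_squared(values, fit_pred), "RMSE:", rmse(values, fit_pred))

cv_pred = leave_some_out(labels, values, fit_leaf_means, fraction=0.10)
print("Q2  :", r_squared(values, cv_pred), "RMSE:", rmse(values, cv_pred))
```

Because each held-out prediction comes from a model that never saw that sample, Q² is typically somewhat lower than R² and the cross-validated error somewhat higher, which is the pattern reported in the abstract.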