Generative Deep Learning for Targeted Compound Design
- Tiago Sousa, Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
- João Correia, Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
- Vítor Pereira, Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
- Miguel Rocha* (Email: [email protected]), Centre of Biological Engineering, Campus Gualtar, University of Minho, 4710-057 Braga, Portugal
Abstract

In the past few years, de novo molecular design has increasingly relied on generative models from the emergent field of Deep Learning to propose novel compounds that are likely to possess desired properties or activities. De novo molecular design finds applications in fields ranging from drug discovery and materials science to biotechnology. A panoply of deep generative models, including architectures such as Recurrent Neural Networks, Autoencoders, and Generative Adversarial Networks, can be trained on existing data sets to generate novel compounds. Typically, the new compounds follow the same underlying statistical distributions of properties exhibited by the training data set. Additionally, different optimization strategies, including transfer learning, Bayesian optimization, reinforcement learning, and conditional generation, can direct the generation process toward desired aims regarding biological activities, synthesis processes, or chemical features. Given the recent emergence of these technologies and their relevance, this work presents a systematic and critical review of deep generative models and related optimization methods for targeted compound design, and of their applications.
This publication is licensed under
License Summary*
You are free to share (copy and redistribute) this article in any medium or format within the parameters below:
Creative Commons (CC): This is a Creative Commons license.
Attribution (BY): Credit must be given to the creator.
Non-Commercial (NC): Only non-commercial uses of the work are permitted.
No Derivatives (ND): Derivative works may be created for non-commercial purposes, but sharing is prohibited.
*Disclaimer
This summary highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. Carefully review the actual license before using these materials.
Introduction
Representing Molecules
Figure 1

Figure 1. Acetaminophen (center) under various molecular representations. Top-left: Sequence-based representations. Before being fed to the models, these sequences are usually one-hot encoded. Top-right: Graph-based representations. While connection matrices are a suitable input for standard architectures, graphs can also be handled directly using graph neural networks. Bottom: Three-dimensional representations, images from PubChem. (26) Graphs may be enhanced by including 3D information as node attributes, such as internal distances and angles, or coordinates in a system such as Cartesian space. Molecular surfaces can be voxelized into a 3D grid for easier processing.
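The one-hot encoding step mentioned in the caption can be sketched as follows. This is a minimal illustration, assuming character-level tokenization (production tokenizers usually treat multi-character atoms such as `Cl` or `Br` as single tokens); the function names `build_vocab`, `one_hot_encode`, and `one_hot_decode` are hypothetical and not taken from any library.

```python
def build_vocab(smiles_list):
    """Collect the sorted set of characters seen in the given SMILES strings."""
    return sorted({ch for smiles in smiles_list for ch in smiles})


def one_hot_encode(smiles, vocab):
    """Return one row per character, with a single 1 at that character's vocabulary index."""
    index = {ch: i for i, ch in enumerate(vocab)}
    return [[1 if index[ch] == col else 0 for col in range(len(vocab))]
            for ch in smiles]


def one_hot_decode(matrix, vocab):
    """Invert one_hot_encode by reading off the position of the 1 in each row."""
    return "".join(vocab[row.index(1)] for row in matrix)


# SMILES string for acetaminophen, the molecule shown in Figure 1.
ACETAMINOPHEN = "CC(=O)Nc1ccc(O)cc1"

# Assumption: the vocabulary is built from this single molecule; in practice
# it would be built from the whole training set.
vocab = build_vocab([ACETAMINOPHEN])
encoded = one_hot_encode(ACETAMINOPHEN, vocab)
```

Each row of `encoded` corresponds to one position in the sequence, so the matrix has one row per character and one column per vocabulary entry.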
1D Sequences
SMILES
InChI
DeepSMILES
SELFIES
2D (Chemical) Structures
3D Structures
Databases
database | molecules | information |
---|---|---|
ChEMBL (43) | 2M compounds | bioactive drug-like small molecules |
ExCAPE-DB (46) | 1M compounds | active/inactive molecules by target |
ZINC (47) | 750M compounds | drug-like molecules, available for purchase |
PubChem (26) | 111M compounds | mostly small molecules |
DrugBank (42) | 13K drug entries | approved and experimental drugs |
GDB-17 (4) | 166B compounds | combinatorially generated molecules |
REAL database (48) | 1.95B compounds | database of enumerated structures |
Tox21 (49) | 11K compounds | toxicity data for various assays |
QM8 (50) | 22K compounds | electronic spectra and excited state energy |
QM9 (51) | 134K compounds | geometric, energetic, electronic, thermodynamic |
PDBbind (52) | 17K compounds | 3D structures and binding affinity |
Number of available molecules reported as of October 2020.
Deep Learning models for De Novo Molecular Design
Architectures
Figure 2

Figure 2. Top-left: Three-layer Recurrent Neural Network (RNN), both rolled and unrolled. In each layer, the output of a step, besides flowing to the next layer, also flows to the next step of the layer itself. These recurrent connections are depicted in the unfolded view of the network as vertical arrows. Top-right: Variational Autoencoder (VAE), where the input is encoded to the parameters of a statistical distribution, namely, the means (μ) and standard deviations (σ). In practice, these correspond to two vectors which, in the sampling step, are interpreted as a set of means and standard deviations. Bottom-left: Generative Adversarial Network (GAN), composed of a generator and a discriminator. Training seeks not a minimum but a useful equilibrium between the generator and the discriminator. Bottom-right: Adversarial Autoencoder (AAE), where the attached discriminator must discern between encoded points and samples drawn from a prior statistical distribution.
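The VAE sampling step described in the caption, where two vectors are read as means and standard deviations, might look like the sketch below. It assumes independent Gaussian latent dimensions and draws each coordinate as μ + σ·ε with ε from a standard normal (commonly called the reparameterization trick, a name the caption does not use). The helper name `sample_latent` and the example values are hypothetical.

```python
import random


def sample_latent(means, std_devs, rng):
    """Draw one latent vector z with z[i] = means[i] + std_devs[i] * eps[i], eps ~ N(0, 1)."""
    if len(means) != len(std_devs):
        raise ValueError("means and std_devs must have the same length")
    return [mu + sigma * rng.gauss(0.0, 1.0)
            for mu, sigma in zip(means, std_devs)]


# Illustrative encoder outputs for a 3-dimensional latent space (made-up values).
rng = random.Random(0)
mu = [0.5, -1.0, 2.0]
sigma = [0.1, 0.2, 0.3]
z = sample_latent(mu, sigma, rng)
```

With all standard deviations set to zero the sample collapses onto the means, which is a quick sanity check of the parameterization.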
Recurrent Neural Networks
Generative Adversarial Networks
Autoencoders
Generating Molecules
Figure 3

Figure 3. Three-layer RNN, unfolded over four time-steps. In autoregressive sequence generation, the process is started with a special start token, here “G”. The model then predicts the next token, which is sampled and used as input for the next step. Generation ends when a stop token is predicted.
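The generation loop in the caption can be sketched with a toy stand-in for the trained network. The start token "G" comes from the caption; the stop token "E", the lookup table `TOY_MODEL`, and the function `sample_sequence` are assumptions made for illustration. Unlike this table, a real RNN conditions each prediction on its hidden state, and therefore on the whole prefix rather than only the previous token.

```python
import random

START, STOP = "G", "E"  # "G" is from the caption; "E" is an assumed stop token

# Toy stand-in for a trained model: previous token -> next-token probabilities.
TOY_MODEL = {
    "G": {"C": 1.0},
    "C": {"C": 0.5, "O": 0.3, "E": 0.2},
    "O": {"C": 0.4, "E": 0.6},
}


def sample_sequence(model, rng, max_len=50):
    """Sample tokens one at a time until the stop token or max_len is reached."""
    token, generated = START, []
    for _ in range(max_len):
        probs = model[token]
        # Sample the next token and feed it back in as the next input.
        token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if token == STOP:
            break
        generated.append(token)
    return "".join(generated)


sequence = sample_sequence(TOY_MODEL, random.Random(0))
```

The `max_len` cap is a common safeguard against sequences that never emit the stop token.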
Figure 4

Figure 4. Left: In sequential graph generation, a graph is built by evaluating a current partial graph, adding a node/edge and repeating until the network outputs a stop signal. Right: In the one-shot generation of graphs, probabilities over the full adjacency matrix and node/edge attribute tensors are produced. The graph is then obtained by taking a sample or the argmax of these outputs.
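The argmax variant of one-shot graph decoding described in the caption can be sketched as below. The atom and bond vocabularies, the probability values, and the function `decode_graph` are hypothetical; only the upper triangle of the edge tensor is read, which assumes an undirected graph.

```python
ATOM_TYPES = ["C", "N", "O"]
BOND_TYPES = [None, "single", "double"]  # None stands for "no bond"


def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


def decode_graph(node_probs, edge_probs):
    """Turn per-node and per-pair probability vectors into atoms and bonds via argmax."""
    atoms = [ATOM_TYPES[argmax(p)] for p in node_probs]
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):  # upper triangle only
            bond = BOND_TYPES[argmax(edge_probs[i][j])]
            if bond is not None:
                bonds.append((i, j, bond))
    return atoms, bonds


# Made-up model outputs for a 3-node graph.
node_probs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
no_bond = [0.9, 0.05, 0.05]
edge_probs = [
    [no_bond, [0.1, 0.8, 0.1], no_bond],
    [no_bond, no_bond, [0.2, 0.1, 0.7]],
    [no_bond, no_bond, no_bond],
]
atoms, bonds = decode_graph(node_probs, edge_probs)
```

Replacing `argmax` with sampling from each probability vector gives the stochastic variant the caption also mentions; neither guarantees chemical validity without additional checks.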
Figure 5

Figure 5. Left: General procedure for the generation of 3D shapes as proposed by Skalic et al. (72) The convolutional decoder of a VAE is used to produce a 3D molecular shape which is converted to SMILES by a captioning network. Right: General process for generating molecules as 3D point sets, proposed by Gebauer et al. (73) It is conceptually similar to the sequential graph generation, operating on point sets with an internal coordinate system.
Evaluating Generative Models
Generating Compounds of Interest
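method | strategy | representation | architecture | ref |
---|---|---|---|---|
Transfer Learning | | SMILES | Stacked RNN | (18), (34), (59), (62), (63), (82) |
Transfer Learning | | SMILES | | (83) |
Transfer Learning | | Graph | GNN | (84) |
Transfer Learning | | 3D point sets | SchNet + 2 MLP | (73) |
Reinforcement Learning | Pretrain + RL | SMILES | Stacked RNN | (18), (58), (82), (85) |
Reinforcement Learning | Pretrain + RL | SMILES | VAE | (86) |
Reinforcement Learning | Pretrain + RL | Graph | two RNNs | (87) |
Reinforcement Learning | Adversarial + RL | SMILES | GAN | (60), (88−90) |
Reinforcement Learning | Adversarial + RL | Graph | GNNs | (91), (92) |
Reinforcement Learning | Adversarial + RL | Graph | GAN | (68) |
Latent Space Navigation | Bayesian Optimization | SMILES | VAE | (10), (93) |
Latent Space Navigation | Bayesian Optimization | SMILES | AAE and VAE | (94) |
Latent Space Navigation | Bayesian Optimization | SMILES (production rules) | VAE | (95), (96) |
Latent Space Navigation | Bayesian Optimization | Graph (junction trees) | VAE | (97) |
Latent Space Navigation | Bayesian Optimization | Graph | VAE | (98) |
Latent Space Navigation | Gradient Ascent | Graph (junction trees) | VAE | (97) |
Latent Space Navigation | Gradient Ascent | Graph | VAE | (66), (99) |
Latent Space Navigation | DL Model | SMILES | GAN + AE (32) | (36) |
Latent Space Navigation | DL Model | Graph (junction trees) | CycleGAN + VAE (97) | (100) |
Latent Space Navigation | GA | SMILES | AE | (101) |
Latent Space Navigation | PSO | SMILES | AE | (102) |
Latent Space Navigation | CLaSS | SMILES | VAE | (103) |
Conditioned | Conditioned | SMILES | VAE | (61) |
Conditioned | Conditioned | SMILES | Stacked RNN | (104) |
Conditioned | Conditioned | SMILES | Two AAEs | (105) |
Conditioned | Conditioned | SMILES (production rules) | Two GANs | (106) |
Conditioned | Conditioned | SELFIES | VAE | (107) |
Conditioned | Conditioned | Graph | GNNs | (65), (84) |
Conditioned | Conditioned | Graph | VAE | (69) |
Conditioned | Conditioned | Graph (junction trees) | VAE | (108) |
Conditioned | Conditioned | 3D shape | VAE + RNN | (72) |
Conditioned | Conditioned | 3D shape | VAE + GAN | (109) |
Conditioned | Semisupervised | SMILES | VAE | (110) |
Conditioned | Semisupervised | SMILES | AAE | (64) |
Conditioned | Semisupervised | Graph (scaffold extension) | VAE | (111) |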
Screening
Transfer Learning
Figure 6

Figure 6. In transfer learning, a general model is first trained on a large data set and then fine-tuned toward generating the desired properties with a smaller, focused, data set.
Reinforcement Learning
Figure 7

Figure 7. Top: The model is first pretrained through maximum likelihood estimation, learning the structure of the output space along with general chemical rules. Then, using RL, the model is optimized for specific properties such as binding affinity or solubility. While similar in concept to transfer learning, the use of RL allows one to bias the model toward a wider range of objectives. Bottom: Directed generation with RL and GAN. This method leverages adversarial training to produce feasible molecules and RL to bias the generation toward desired properties.
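The "pretrain, then bias with RL" idea in the caption can be illustrated with a REINFORCE-style policy-gradient update on a toy categorical policy. This is only a sketch under strong assumptions: the "pretrained" logits, the three-token action space, and the reward function are invented for illustration, and a real model would produce whole molecules scored by a property predictor rather than single tokens.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(logits, reward_fn, rng, steps=500, lr=0.1):
    """Shift a categorical policy toward high-reward actions with REINFORCE updates."""
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = reward_fn(action)
        # Gradient of log pi(action) w.r.t. the logits is onehot(action) - probs.
        for k in range(len(logits)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * reward * grad
    return logits


# Assumed "pretrained" policy that initially prefers token 0.
pretrained = [1.0, 0.5, 0.0]
# Hypothetical reward: only token 2 has the desired property.
tuned = reinforce(pretrained, lambda a: 1.0 if a == 2 else 0.0, random.Random(0))
p_before = softmax(pretrained)[2]
p_after = softmax(tuned)[2]
```

Because the reward can be any scalar function of the generated output, including nondifferentiable ones, this style of update is not limited to objectives for which a focused training set exists.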
Exploration and Exploitation of Molecules Latent Space
Figure 8

Figure 8. Here, the latent space of an AE is used as a reversible and continuous molecular representation allowing for the application of various optimization algorithms.
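One of the optimization algorithms that a continuous latent space permits is plain gradient ascent on a property predictor. The sketch below assumes a toy, analytically differentiable predictor (negative squared distance to a hypothetical optimum `target`); in practice the predictor would be a trained network and the optimized point would be passed through the decoder, which is omitted here.

```python
def predicted_property(z, target):
    """Toy property surrogate: highest (zero) when z equals target."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def property_gradient(z, target):
    """Analytic gradient of predicted_property with respect to z."""
    return [-2.0 * (zi - ti) for zi, ti in zip(z, target)]


def gradient_ascent(z0, target, lr=0.1, steps=100):
    """Move a latent point uphill on the property surrogate."""
    z = list(z0)
    for _ in range(steps):
        grad = property_gradient(z, target)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z


# Hypothetical starting point (e.g., the encoding of a known molecule) and optimum.
z_start = [0.0, 0.0]
target = [1.0, -2.0]
z_opt = gradient_ascent(z_start, target)
```

Bayesian optimization, genetic algorithms, or particle swarms would replace `gradient_ascent` with a search that only needs property evaluations rather than gradients.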
Bayesian Optimization
Genetic Algorithms and Particle Swarms
Gradient-Based Methods
Deep Learning for Latent Space Navigation
Conditioned and Semisupervised Generation
Figure 9

Figure 9. Top: In conditioned generation, the desired properties are introduced as explicit inputs to the model. These properties are precomputed for each compound of the training set and used during training to induce a correlation between the two. This correlation is then leveraged during the generation process to target specific property values. Bottom: In the semisupervised case of conditioned generation, only part of the training set has the desired properties available. To overcome this, a predictor network is trained on the labeled instances and used to predict the properties of unlabeled ones.
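The data preparation implied by the caption can be sketched in two small steps: filling missing property labels with a predictor (the semisupervised case) and attaching the property value to the model input (the conditioned case). Everything named here is hypothetical, including `toy_predictor`, which stands in for a trained property network, and the choice to condition by simple concatenation.

```python
def fill_missing_labels(dataset, predictor):
    """Replace missing (None) property labels with the predictor's estimate."""
    return [(smiles, label if label is not None else predictor(smiles))
            for smiles, label in dataset]


def conditioned_input(latent, property_value):
    """Append the target property to a latent vector so a decoder can condition on it."""
    return list(latent) + [property_value]


def toy_predictor(smiles):
    """Hypothetical stand-in for a trained predictor: fraction of 'O' characters."""
    return smiles.count("O") / len(smiles)


# Partially labeled training set: the second molecule has no measured property.
dataset = [("CCO", 0.2), ("CC(=O)O", None)]
filled = fill_missing_labels(dataset, toy_predictor)
decoder_input = conditioned_input([0.1, 0.2], filled[1][1])
```

At generation time the same concatenation would be used with a user-chosen property value instead of a label from the training set.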
Synthetic Accessibility
Current Applications
Drug Development
target | directed generation | in silico activity | in vitro activity | in vivo activity | ref |
---|---|---|---|---|---|
RXR, PPAR | transfer learning | SPiDER | 5 synthesized; reported: 4 active | | (63) |
RXR | transfer learning | SPiDER, WHALES | 4 synthesized; reported: 2 active | | (62) |
JAK3 selective | reinforcement learning | docking | 1 synthesized; reported: active and selective for JAK3 | | (64) |
kinase inhibitors | reinforcement learning | | 50 purchased (similar); reported: 7 active | | (89) |
DRD2, 5-HT1A, 5-HT2A | transfer learning | MT-DNN on ECFP4 | 1+6 analogues synthesized; reported: active for the 3 receptors | 1+6 analogues; 1 active and acceptable safety | (127) |
VEGFR-2 | train on actives | docking | 5 synthesized; reported: 3 active and noncytotoxic | | (17) |
DDR1 | reinforcement learning | SOM pharmacophore | 6 synthesized; reported: 2 active and stable | 1 tested; half-life 3.5 h; 10 analogues tested | (86) |
p300/CBP inhibitors | transfer learning | docking | 1+26 analogues synthesized; reported: active, selective, and stable | good bioavailability, efficacy, safety | (128) |
LXR agonists | transfer learning | | 25 synthesized, 3 purchased; reported: 12 active | | (129) |
COVID-19
Organic Photovoltaics
Future Directions of Research
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation programme (Grant Agreement Number 814408).
References
This article references 139 other publications.
- 1. Polishchuk, P. G.; Madzhidov, T. I.; Varnek, A. Estimation of the size of drug-like chemical space based on GDB-17 data. J. Comput.-Aided Mol. Des. 2013, 27, 675, DOI: 10.1007/s10822-013-9672-4
- 2. Schneider, G. Automating drug discovery. Nat. Rev. Drug Discovery 2018, 17, 97–113, DOI: 10.1038/nrd.2017.232
- 3. DiMasi, J. A.; Grabowski, H. G.; Hansen, R. W. Innovation in the pharmaceutical industry: New estimates of R&D costs. Journal of Health Economics 2016, 47, 20–33, DOI: 10.1016/j.jhealeco.2016.01.012
- 4. Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864–2875, DOI: 10.1021/ci300415d
- 5. Walters, W. P. Virtual Chemical Libraries: Miniperspective. J. Med. Chem. 2019, 62, 1116–1124, DOI: 10.1021/acs.jmedchem.8b01048
- 6. Hartenfeller, M.; Zettl, H.; Walter, M.; Rupp, M.; Reisen, F.; Proschak, E.; Weggen, S.; Stark, H.; Schneider, G. DOGS: reaction-driven de novo design of bioactive compounds. PLoS Comput. Biol. 2012, 8, e1002380, DOI: 10.1371/journal.pcbi.1002380
- 7. Spiegel, J.; Durrant, J. AutoGrow4: An open-source genetic algorithm for de novo drug design and lead optimization. J. Cheminf. 2020, 12, 25, DOI: 10.1186/s13321-020-00429-4
- 8. Jensen, J. H. A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. Chem. Sci. 2019, 10, 3567–3572, DOI: 10.1039/C8SC05372C
- 9. Yoshikawa, N.; Terayama, K.; Sumita, M.; Homma, T.; Oono, K.; Tsuda, K. Population-based De Novo Molecule Generation, Using Grammatical Evolution. Chem. Lett. 2018, 47, 1431–1434, DOI: 10.1246/cl.180665
- 10. Gómez-Bombarelli, R.; Wei, J. N.; Duvenaud, D.; Hernández-Lobato, J. M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; Aspuru-Guzik, A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent. Sci. 2018, 4, 268–276, DOI: 10.1021/acscentsci.7b00572
- 11. Gawehn, E.; Hiss, J. A.; Schneider, G. Deep learning in drug discovery. Mol. Inf. 2016, 35, 3–14, DOI: 10.1002/minf.201501008
- 12. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press, 2016.
- 13. Chollet, F. Deep learning with Python; Manning Publications Co: Shelter Island, NY, 2018.
- 14. Foster, D. Generative deep learning: teaching machines to paint, write, compose, and play; O’Reilly Media, 2019.
- 15. White, D.; Wilson, R. C. Generative models for chemical structures. J. Chem. Inf. Model. 2010, 50, 1257–1274, DOI: 10.1021/ci9004089
- 16. Sanchez-Lengeling, B.; Aspuru-Guzik, A. Inverse molecular design using machine learning: Generative models for matter engineering. Science 2018, 361, 360–365, DOI: 10.1126/science.aat2663
- 17Yuan, W.; Jiang, D.; Nambiar, D. K.; Liew, L. P.; Hay, M. P.; Bloomstein, J.; Lu, P.; Turner, B.; Le, Q.-T.; Tibshirani, R.; Khatri, P.; Moloney, M. G.; Koong, A. C. Chemical Space Mimicry for Drug Discovery. J. Chem. Inf. Model. 2017, 57, 875– 882, DOI: 10.1021/acs.jcim.6b00754[ACS Full Text
], [CAS], Google Scholar
17https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXjs12gtrs%253D&md5=f237aa1f08c24e416756cc80f018b3d1Chemical Space Mimicry for Drug DiscoveryYuan, William; Jiang, Dadi; Nambiar, Dhanya K.; Liew, Lydia P.; Hay, Michael P.; Bloomstein, Joshua; Lu, Peter; Turner, Brandon; Le, Quynh-Thu; Tibshirani, Robert; Khatri, Purvesh; Moloney, Mark G.; Koong, Albert C.Journal of Chemical Information and Modeling (2017), 57 (4), 875-882CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)The authors describe a new library generation method, Machine-based Identification of Mols. Inside Characterized Space (MIMICS) that generates sets of mols. inspired by a text-based input. MIMICS-generated libraries were found to preserve distributions of properties while simultaneously increasing structural diversity. Newly identified MIMICS-generated compds. were found to be bioactive as inhibitors of specific components of the unfolded protein response (UPR) and the VEGFR2 pathway in cell-based assays, thus confirming that applicability of this methodol. towards drug design applications. Wider application of MIMICS could facilitate the efficient utilization of chem. space. - 18Segler, M. H.; Kogej, T.; Tyrchan, C.; Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 2018, 4, 120– 131, DOI: 10.1021/acscentsci.7b00512[ACS Full Text
], [CAS], Google Scholar
18https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXitVCjsLfP&md5=708f40422c7a911c629525ce5b66088bGenerating Focused Molecule Libraries for Drug Discovery with Recurrent Neural NetworksSegler, Marwin H. S.; Kogej, Thierry; Tyrchan, Christian; Waller, Mark P.ACS Central Science (2018), 4 (1), 120-131CODEN: ACSCII; ISSN:2374-7951. (American Chemical Society)In de novo drug design, computational strategies are used to generate novel mols. with good affinity to the desired biol. target. In this work, we show that recurrent neural networks can be trained as generative models for mol. structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated mols. correlate very well with the properties of the mols. used to train the model. In order to enrich libraries with mols. active toward a given biol. target, we propose to fine-tune the model with small sets of mols., which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test mols. that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria), it reproduced 28% of 1240 test mols. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel mols. for drug discovery. - 19Elton, D. C.; Boukouvalas, Z.; Fuge, M. D.; Chung, P. W. Deep learning for molecular design─a review of the state of the art. Mol. Syst. Des. Eng. 
2019, 4, 828– 849, DOI: 10.1039/C9ME00039A[Crossref], [CAS], Google Scholar19https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXhtVWktLjN&md5=a0925fdace3a12af31b2f6fc1edef89bDeep learning for molecular design-a review of the state of the artElton, Daniel C.; Boukouvalas, Zois; Fuge, Mark D.; Chung, Peter W.Molecular Systems Design & Engineering (2019), 4 (4), 828-849CODEN: MSDEBG; ISSN:2058-9689. (Royal Society of Chemistry)In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired by these successes, researchers are now applying deep generative modeling techniques to the generation and optimization of mols.-in our review we found 45 papers on the subject published in the past two years. These works point to a future where such systems will be used to generate lead mols., greatly reducing resources spent downstream synthesizing and characterizing bad leads in the lab. In this review we survey the increasingly complex landscape of models and representation schemes that have been proposed. The four classes of techniques we describe are recursive neural networks, autoencoders, generative adversarial networks, and reinforcement learning. After first discussing some of the math. fundamentals of each technique, we draw high level connections and comparisons with other techniques and expose the pros and cons of each. Several important high level themes emerge as a result of this work, including the shift away from the SMILES string representation of mols. towards more sophisticated representations such as graph grammars and 3D representations, the importance of reward function design, the need for better stds. for benchmarking and testing, and the benefits of adversarial training and reinforcement learning over max. likelihood based training.
- 20Schwalbe-Koda, D.; Ǵomez-Bombarelli, R. In Machine Learning Meets Quantum Physics; Schütt, K. T., Chmiela, S., von Lilienfeld, O. A., Tkatchenko, A., Tsuda, K., Müller, K.-R., Eds.; Springer International Publishing: Cham, 2020; pp 445– 467.
- 21Zhavoronkov, A.; Vanhaelen, Q.; Oprea, T. I. Will Artificial Intelligence for Drug Discovery Impact Clinical Pharmacology?. Clin. Pharmacol. Ther. (N. Y., NY, U. S.) 2020, 107, 780– 785, DOI: 10.1002/cpt.1795
- 22Bian, Y.; Xie, X.-Q. Generative chemistry: drug discovery with deep learning gener- ative models. J. Mol. Model. 2021, 27, 71, DOI: 10.1007/s00894-021-04674-8[Crossref], [PubMed], [CAS], Google Scholar22https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3MXhsValsLrP&md5=bf1851d6e96138674bc169295356eee6Generative chemistry: drug discovery with deep learning generative modelsBian, Yuemin; Xie, Xiang-QunJournal of Molecular Modeling (2021), 27 (3), 71CODEN: JMMOFK; ISSN:0948-5023. (Springer)A review. The de novo design of mol. structures using deep learning generative models introduces an encouraging soln. to drug discovery in the face of the continuously increased cost of new drug development. From the generation of original texts, images, and videos, to the scratching of novel mol. structures the creativity of deep learning generative models exhibits the height machine intelligence can achieve. The purpose of this paper is to review the latest advances in generative chem. which relies on generative modeling to expedite the drug discovery process. This review starts with a brief history of artificial intelligence in drug discovery to outline this emerging paradigm. Commonly used chem. databases, mol. representations, and tools in cheminformatics and machine learning are covered as the infrastructure for generative chem. The detailed discussions on utilizing cutting-edge generative architectures, including recurrent neural network, variational autoencoder, adversarial autoencoder, and generative adversarial network for compd. generation are focused. Challenges and future perspectives follow.
- 23Chen, H.; Engkvist, O.; Wang, Y.; Olivecrona, M.; Blaschke, T. The rise of deep learning in drug discovery. Drug Discovery Today 2018, 23, 1241– 1250, DOI: 10.1016/j.drudis.2018.01.039[Crossref], [PubMed], [CAS], Google Scholar23https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC1MvjvFyqtQ%253D%253D&md5=d6cbdd98ede30181802cca1786cd5a95The rise of deep learning in drug discoveryChen Hongming; Engkvist Ola; Olivecrona Marcus; Blaschke Thomas; Wang YinhaiDrug discovery today (2018), 23 (6), 1241-1250 ISSN:.Over the past decade, deep learning has achieved remarkable success in various artificial intelligence research areas. Evolved from the previous research on artificial neural networks, this technology has shown superior performance to other machine learning algorithms in areas such as image and voice recognition, natural language processing, among others. The first wave of applications of deep learning in pharmaceutical research has emerged in recent years, and its utility has gone beyond bioactivity predictions and has shown promise in addressing diverse problems in drug discovery. Examples will be discussed covering bioactivity prediction, de novo molecular design, synthesis prediction and biological image analysis.
- 24Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; Zhao, S. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discovery 2019, 18, 463– 477, DOI: 10.1038/s41573-019-0024-5[Crossref], [PubMed], [CAS], Google Scholar24https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXosF2rtrY%253D&md5=211782aeea3d8b9f50368f89177a70d2Applications of machine learning in drug discovery and developmentVamathevan, Jessica; Clark, Dominic; Czodrowski, Paul; Dunham, Ian; Ferran, Edgardo; Lee, George; Li, Bin; Madabhushi, Anant; Shah, Parantu; Spitzer, Michaela; Zhao, ShanrongNature Reviews Drug Discovery (2019), 18 (6), 463-477CODEN: NRDDAG; ISSN:1474-1776. (Nature Research)A review. Drug discovery and development pipelines are long, complex and depend on numerous factors. Machine learning (ML) approaches provide a set of tools that can improve discovery and decision making for well-specified questions with abundant, high-quality data. Opportunities to apply ML occur in all stages of drug discovery. Examples include target validation, identification of prognostic biomarkers and anal. of digital pathol. data in clin. trials. Applications have ranged in context and methodol., with some approaches yielding accurate predictions and insights. The challenges of applying ML lie primarily with the lack of interpretability and repeatability of ML-generated results, which may limit their application. In all areas, systematic and comprehensive high-dimensional data still need to be generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the factors needed to validate ML approaches, the application of ML can promote data-driven decision making and has the potential to speed up the process and reduce failure rates in drug discovery and development.
- 25Engel, T., Gasteiger, J., Eds. Chemoinformatics: basic concepts and methods; Wiley-VCH: Weinheim, 2018; OCLC: 1012130305.
- 26Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B. A.; Thiessen, P. A.; Yu, B.; Zaslavsky, L.; Zhang, J.; Bolton, E. E. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 2019, 47, D1102– D1109, DOI: 10.1093/nar/gky1033
- 27Ash, S.; Cline, M.; Homer, R. W.; Hurst, T.; Smith, G. SYBYL Line Notation (SLN): A Versatile Language for Chemical Structure Representation. J. Chem. Inf. Comput. Sci. 1997, 37, 71– 79, DOI: 10.1021/ci960109j[ACS Full Text
], [CAS], Google Scholar
27https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaK2sXksVCitA%253D%253D&md5=9c53b87aec0a69043773cd771c662152SYBYL Line Notation (SLN): A Versatile Language for Chemical Structure RepresentationAsh, Sheila; Cline, Malcolm A.; Homer, R. Webster; Hurst, Tad; Smith, Gregory B.Journal of Chemical Information and Computer Sciences (1997), 37 (1), 71-79CODEN: JCISD8; ISSN:0095-2338. (American Chemical Society)SYBYL Line Notation (SLN) is an ASCII language used to represent chem. structures, including common org. mols., macromols., polymers, and combinatorial libraries. SLN is also used to express substructural (2D) queries and includes a complete facility for Markush representation. This concise language is ideal for database storage of chem. entities as well as for network communication of structures and queries. - 28Koniver, D. A.; Wiswesser, W. J.; Usdin, E. Wiswesser Line Notation: Simplified Techniques for Converting Chemical Structures to WLN. Science 1972, 176, 1437– 1439, DOI: 10.1126/science.176.4042.1437[Crossref], [PubMed], [CAS], Google Scholar28https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaE38Xks1Srt7c%253D&md5=e7b61344874233d0114df299b83e3a9dWiswesser Line Notation. Simplified techniques for converting chemical structures to WLNKoniver, Deena A.; Wiswesser, William J.; Usdin, EarlScience (Washington, DC, United States) (1972), 176 (4042), 1437-9CODEN: SCIEAS; ISSN:0036-8075.Techniques were developed for the generation of Wiswesser Line Notations (WLN), which require knowledge neither of rules for manual conversion of structures to line notations nor of computer programing. The desired WLN are obtained simply by drawing the structures of the compds. of interest on a tablet, which is linked to an appropriately programmed computer.
- 29Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Model. 1988, 28, 31– 36, DOI: 10.1021/ci00057a005[ACS Full Text
], [CAS], Google Scholar
29https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaL1cXnsVeqsA%253D%253D&md5=04592975f9dd3c0ce3c1ad618ba2b17dSMILES, a chemical language and information system. 1. Introduction to methodology and encoding rulesWeininger, DavidJournal of Chemical Information and Computer Sciences (1988), 28 (1), 31-6CODEN: JCISD8; ISSN:0095-2338.The SMILES (simplified mol. input line entry system) chem. notation system is described for information processing. The system is based on principles of mol. graph theory and it allows structure specification by use of a very small and natural grammar well suited for high-speed machine processing. The system is easy to use, has high machine compatibility, and allows many computer applications, including notation generation, const. speed database retrieval, substructure searching, and property prediction models. - 30O’Boyle, N. M. Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI. J. Cheminf. 2012, 4, 22, DOI: 10.1186/1758-2946-4-22[Crossref], [CAS], Google Scholar30https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38XhvVSiur7I&md5=eb59d742c5dec35b5d2c90417acd223dTowards a universal SMILES representation - a standard method to generate canonical SMILES based on the InChIO'Boyle, Noel M.Journal of Cheminformatics (2012), 4 (), 22CODEN: JCOHB3; ISSN:1758-2946. (Chemistry Central Ltd.)Background: There are two line notations of chem. structures that have established themselves in the field: the SMILES string and the InChI string. The InChI aims to provide a unique, or canonical, identifier for chem. structures, while SMILES strings are widely used for storage and interchange of chem. structures, but no std. exists to generate a canonical SMILES string. 
Results: I describe how to use the InChI canonicalisation to derive a canonical SMILES string in a straightforward way, either incorporating the InChI normalizations (Inchified SMILES) or not (Universal SMILES). This is the first description of a method to generate canonical SMILES that takes stereochem. into account. When tested on the 1.1 m compds. in the ChEMBL database, and a 1 m compd. subset of the PubChem Substance database, no canonicalisation failures were found with Inchified SMILES. Using Universal SMILES, 99.79% of the ChEMBL database was canonicalised successfully and 99.77% of the PubChem subset. Conclusions: The InChI canonicalisation algorithm can successfully be used as the basis for a common std. for canonical SMILES. While challenges remain - such as the development of a std. arom. model for SMILES - the ability to create the same SMILES using different toolkits will mean that for the first time it will be possible to easily compare the chem. models used by different toolkits.
- 31Bjerrum, E. J. SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv (Machine Learning) , May 17, 2017, 703.07076, ver. 2.Google ScholarThere is no corresponding record for this reference.
- 32Bjerrum, E. J.; Sattarov, B. Improving chemical autoencoder latent space and molec- ular de novo generation diversity with heteroencoders. Biomolecules 2018, 8, 131, DOI: 10.3390/biom8040131[Crossref], [CAS], Google Scholar32https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXit12hurjL&md5=c0e3ae0b5ec0126d70e59045568b1a90Improving chemical autoencoder latent space and molecular de novo generation diversity with heteroencodersBjerrum, Esben Jannik; Sattarov, BorisBiomolecules (2018), 8 (4), 131/1-131/17CODEN: BIOMHC; ISSN:2218-273X. (MDPI AG)Chem. autoencoders are attractive models as they combine chem. space navigation with possibilities for de novo mol. generation in areas of interest. This enables them to produce focused chem. libraries around a single lead compd. for employment early in a drug discovery project. Here, it is shown that the choice of chem. representation, such as strings from the simplified mol.-input line-entry system (SMILES), has a large influence on the properties of the latent space. It is further explored to what extent translating between different chem. representations influences the latent space similarity to the SMILES strings or circular fingerprints. By employing SMILES enumeration for either the encoder or decoder, it is found that the decoder has the largest influence on the properties of the latent space. Training a sequence to sequence heteroencoder based on recurrent neural networks (RNNs) with long short-term memory cells (LSTM) to predict different enumerated SMILES strings from the same canonical SMILES string gives the largest similarity between latent space distance and mol. similarity measured as circular fingerprints similarity. Using the output from the code layer in quant. structure activity relationship (QSAR) of five mol. 
datasets shows that heteroencoder derived vectors markedly outperforms autoencoder derived vectors as well as models built using ECFP4 fingerprints, underlining the increased chem. relevance of the latent space.
- 33Arús-Pous, J.; Johansson, S. V.; Prykhodko, O.; Bjerrum, E. J.; Tyrchan, C.; Rey- mond, J.-L.; Chen, H.; Engkvist, O. Randomized SMILES strings improve the quality of molecular generative models. J. Cheminf. 2019, 11, 71, DOI: 10.1186/s13321-019-0393-0
- 34Moret, M.; Friedrich, L.; Grisoni, F.; Merk, D.; Schneider, G. Generative molecular design in low data regimes. Nature Machine Intelligence 2020, 2, 171– 180, DOI: 10.1038/s42256-020-0160-y
- 35van Deursen, R.; Ertl, P.; Tetko, I. V.; Godin, G. GEN: highly efficient SMILES explorer using autodidactic generative examination networks. J. Cheminf. 2020, 12, 22, DOI: 10.1186/s13321-020-00425-8
- 36Prykhodko, O.; Johansson, S. V.; Kotsias, P.-C.; Arús-Pous, J.; Bjerrum, E. J.; En- gkvist, O.; Chen, H. A de novo molecular generation method using latent vector based generative adversarial network. J. Cheminf. 2019, 11, 74, DOI: 10.1186/s13321-019-0397-9
- 37Heller, S. R.; McNaught, A.; Pletnev, I.; Stein, S.; Tchekhovskoi, D. InChI, the IUPAC International Chemical Identifier. J. Cheminf. 2015, 7, 23, DOI: 10.1186/s13321-015-0068-4[Crossref], [CAS], Google Scholar37https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC2MbpslOrtQ%253D%253D&md5=4acc4f470f8cdb9b4f84558fd3302470InChI, the IUPAC International Chemical IdentifierHeller Stephen R; Stein Stephen; Tchekhovskoi Dmitrii; McNaught Alan; Pletnev IgorJournal of cheminformatics (2015), 7 (), 23 ISSN:1758-2946.This paper documents the design, layout and algorithms of the IUPAC International Chemical Identifier, InChI.
- 38Winter, R.; Montanari, F.; Nóe, F.; Clevert, D.-A. Learning continuous and data- driven molecular descriptors by translating equivalent chemical representations. Chem. Sci. 2019, 10, 1692– 1701, DOI: 10.1039/C8SC04175J[Crossref], [PubMed], [CAS], Google Scholar38https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXit1aqsbnO&md5=57678211dbeac5a8135e2d38c41eeee2Learning continuous and data-driven molecular descriptors by translating equivalent chemical representationsWinter, Robin; Montanari, Floriane; Noe, Frank; Clevert, Djork-ArneChemical Science (2019), 10 (6), 1692-1701CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)There has been a recent surge of interest in using machine learning across chem. space in order to predict properties of mols. or design mols. and materials with the desired properties. Most of this work relies on defining clever feature representations, in which the chem. graph structure is encoded in a uniform way such that predictions across chem. space can be made. In this work, we propose to exploit the powerful ability of deep neural networks to learn a feature representation from low-level encodings of a huge corpus of chem. structures. Our model borrows ideas from neural machine translation: it translates between two semantically equiv. but syntactically different representations of mol. structures, compressing the meaningful information both representations have in common in a low-dimensional representation vector. Once the model is trained, this representation can be extd. for any new mol. and utilized as a descriptor. In fair benchmarks with respect to various human-engineered mol. fingerprints and graph-convolution models, our method shows competitive performance in modeling quant. structure-activity relationships in all analyzed datasets. Addnl., we show that our descriptor significantly outperforms all baseline mol. 
fingerprints in two ligand-based virtual screening tasks. Overall, our descriptors show the most consistent performances in all expts. The continuity of the descriptor space and the existence of the decoder that permits deducing a chem. structure from an embedding vector allow for exploration of the space and open up new opportunities for compd. optimization and idea generation.
- 39O’Boyle, N.; Dalke, A. DeepSMILES: An Adaptation of SMILES for Use in Machine- Learning of Chemical Structures; preprint, ChemRxiv , September 19, 2018, ver. 1. DOI: 10.26434/chemrxiv.7097960.v1 .
- 40Krenn, M.; Häse, F.; Nigam, A.; Friederich, P.; Aspuru-Guzik, A. Self-referencing em- bedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology 2020, 1, 045024, DOI: 10.1088/2632-2153/aba947
- 41Faulon, J.-L., Bender, A., Eds. Handbook of chemoinformatics algorithms; Chapman & Hall/CRC mathematical and computational biology series; Chapman & Hall/CRC: Boca Raton, FL, 2010; Chapter 1. OCLC: ocn226357322.
- 42Wishart, D. S. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074– D1082, DOI: 10.1093/nar/gkx1037[Crossref], [PubMed], [CAS], Google Scholar42https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXitlGisbvI&md5=986b28c7ea546596a26dd3ba38f05feeDrugBank 5.0: a major update to the DrugBank database for 2018Wishart, David S.; Feunang, Yannick D.; Guo, An C.; Lo, Elvis J.; Marcu, Ana; Grant, Jason R.; Sajed, Tanvir; Johnson, Daniel; Li, Carin; Sayeeda, Zinat; Assempour, Nazanin; Iynkkaran, Ithayavani; Liu, Yifeng; Maciejewski, Adam; Gale, Nicola; Wilson, Alex; Chin, Lucy; Cummings, Ryan; Le, Diana; Pon, Allison; Knox, Craig; Wilson, MichaelNucleic Acids Research (2018), 46 (D1), D1074-D1082CODEN: NARHAD; ISSN:1362-4962. (Oxford University Press)DrugBank is a web-enabled database contg. comprehensivemol. information about drugs, their mechanisms, their interactions and their targets. First described in 2006, Drug- Bank has continued to evolve over the past 12 years in response to marked improvements to web stds. and changing needs for drug research and development. This year's update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years. In many cases, existing data content has grown by 100% or more over the last update. For instance, the total no. of investigational drugs in the database has grown by almost 300%, the no. of drug-drug interactions has grown by nearly 600% and the no. of SNP-assocd. drug effects has grown more than 3000%. Significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions. A great deal of brand new data have also been added to DrugBank 5.0. 
This includes information on the influence of hundreds of drugs on metabolite levels (pharmacometabolomics), gene expression levels (pharmacotranscriptomics) and protein expression levels (pharmacoproteomics). New data have also been added on the status of hundreds of newdrug clin. trials and existing drug repurposing trials. Many other important improvements in the content, interface and performance of the DrugBank website have been made and these should greatly enhance its ease of use, utility and potential applications in many areas of pharmacol. research, pharmaceutical science and drug education.
- 43Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large- scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100– D1107, DOI: 10.1093/nar/gkr777[Crossref], [PubMed], [CAS], Google Scholar43https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3MXhs12htbjN&md5=aedf7793e1ca54b6a4fa272ea3ef7d0eChEMBL: a large-scale bioactivity database for drug discoveryGaulton, Anna; Bellis, Louisa J.; Bento, A. Patricia; Chambers, Jon; Davies, Mark; Hersey, Anne; Light, Yvonne; McGlinchey, Shaun; Michalovich, David; Al-Lazikani, Bissan; Overington, John P.Nucleic Acids Research (2012), 40 (D1), D1100-D1107CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)ChEMBL is an Open Data database contg. binding, functional and ADMET information for a large no. of drug-like bioactive compds. These data are manually abstracted from the primary published literature on a regular basis, then further curated and standardized to maximize their quality and utility across a wide range of chem. biol. and drug-discovery research problems. Currently, the database contains 5.4 million bioactivity measurements for more than 1 million compds. and 5200 protein targets. Access is available through a web-based interface, data downloads and web services at: https://www.ebi.ac.uk/chembldb.
- 44Landrum, G. RDKit: open-source cheminformatics software , 2016.
- 45Steinbeck, C.; Hoppe, C.; Kuhn, S.; Floris, M.; Guha, R.; Willighagen, E. Recent De- velopments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics. Curr. Pharm. Des. 2006, 12, 2111– 2120, DOI: 10.2174/138161206777585274[Crossref], [PubMed], [CAS], Google Scholar45https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD28XmslWqsL0%253D&md5=4e7e47ffe75b600ee3f81309a4bbb609Recent developments of the Chemistry Development Kit (CDK) - an open-source Java library for chemo- and bioinformaticsSteinbeck, Christoph; Hoppe, Christian; Kuhn, Stefan; Floris, Matteo; Guha, Rajarshi; Willighagen, Egon L.Current Pharmaceutical Design (2006), 12 (17), 2111-2120CODEN: CPDEFP; ISSN:1381-6128. (Bentham Science Publishers Ltd.)The Chem. Development Kit (CDK) provides methods for common tasks in mol. informatics, including 2D and 3D rendering of chem. structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Implemented in Java, it is used both for server-side computational services, possibly equipped with a web interface, as well as for applications and client-side applets. This article introduces the CDK's new QSAR capabilities and the recently introduced interface to statistical software.
- 46Sun, J.; Jeliazkova, N.; Chupakhin, V.; Golib-Dzib, J.-F.; Engkvist, O.; Carlsson, L.; Wegner, J.; Ceulemans, H.; Georgiev, I.; Jeliazkov, V.; Kochev, N.; Ashby, T. J.; Chen, H. ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics. J. Cheminf. 2017, 9, 17, DOI: 10.1186/s13321-017-0222-2[Crossref], [CAS], Google Scholar46https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXlsFGhsbc%253D&md5=1bd39132077c5c91f19bae5ea47c1b27ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomicsSun, Jiangming; Jeliazkova, Nina; Chupakin, Vladimir; Golib-Dzib, Jose-Felipe; Engkvist, Ola; Carlsson, Lars; Wegner, Joerg; Ceulemans, Hugo; Georgiev, Ivan; Jeliazkov, Vedrin; Kochev, Nikolay; Ashby, Thomas J.; Chen, HongmingJournal of Cheminformatics (2017), 9 (), 17/1-17/9CODEN: JCOHB3; ISSN:1758-2946. (Chemistry Central Ltd.)Chemogenomics data generally refers to the activity data of chem. compds. on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing vol. of chemogenomics data offers exciting opportunities to build models based on Big Data. Prepg. a high quality data set is a vital step in realizing this goal and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL) including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data not only for building predictive models of in silico polypharmacol. and offtarget effects but also for the validation of chemoinformatics approaches in general.
space populated with novel and diverse compds. - 91You, J.; Liu, B.; Ying, Z.; Pande, V.; Leskovec, J. Graph convolutional policy network for goal-directed molecular graph generation. Adv. Neural Inf. Process. Syst. 2018, 31, 6410– 6421Google ScholarThere is no corresponding record for this reference.
- 92Karimi, M.; Hasanzadeh, A.; Shen, Y. Network-principled deep generative models for designing drug combinations as graph sets. Bioinformatics 2020, 36, i445– i454, DOI: 10.1093/bioinformatics/btaa317[Crossref], [PubMed], [CAS], Google Scholar92https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXis1ymt7zE&md5=b737e2db6e7e76d0753debee66246ed3Network-principled deep generative models for designing drug combinations as graph setsKarimi, Mostafa; Hasanzadeh, Arman; Shen, YangBioinformatics (2020), 36 (Suppl._1), i445-i454CODEN: BOINFP; ISSN:1367-4811. (Oxford University Press)Motivation: Combination therapy has shown to improve therapeutic efficacy while reducing side effects. Importantly, it has become an indispensable strategy to overcome resistance in antibiotics, antimicrobials and anticancer drugs. Facing enormous chem. space and unclear design principles for small-mol. combinations, computational drug-combination design has not seen generative models to meet its potential to accelerate resistance-overcoming drug combination discovery. Results: We have developed the first deep generative model for drug combination design, by jointly embedding graph-structured domain knowledge and iteratively training a reinforcement learning-based chem. graph-set designer. First, we have developed hierarchical variational graph auto-encoders trained end-to-end to jointly embed gene-gene, gene-disease and disease-disease networks. Novel attentional pooling is introduced here for learning disease representations from assocd. genes' representations. Second, targeting diseases in learned representations, we have recast the drug-combination design problem as graph-set generation and developed a deep learning-based model with novel rewards. Specifically, besides chem. validity rewards, we have introduced novel generative adversarial award, being generalized sliced Wasserstein, for chem. diverse mols. 
with distributions similar to known drugs. We have also designed a network principle-based reward for disease-specific drug combinations. Numerical results indicate that, compared to state-of-the-art graph embedding methods, hierarchical variational graph auto-encoder learns more informative and generalizable disease representations. Results also show that the deep generative models generate drug combinations following the principle across diseases. Case studies on four diseases show that network-principled drug combinations tend to have low toxicity. The generated drug combinations collectively cover the disease module similar to FDA-approved drug combinations and could potentially suggest novel systems pharmacol. strategies. Ourmethod allows for examg. and following network-based principle or hypothesis to efficiently generate disease-specific drug combinations in a vast chem. combinatorial space.
- 93Griffiths, R.-R.; Hernández-Lobato, J. M. Constrained Bayesian optimization for auto- matic chemical design using variational autoencoders. Chem. Sci. 2020, 11, 577– 586, DOI: 10.1039/C9SC04026A[Crossref], [PubMed], [CAS], Google Scholar93https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXitFOis7bK&md5=628684b606d9d93ccf21b674438acd6aConstrained Bayesian optimization for automatic chemical design using variational autoencodersGriffiths, Ryan-Rhys; Hernandez-Lobato, Jose MiguelChemical Science (2020), 11 (2), 577-586CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)Automatic Chem. Design is a framework for generating novel mols. with optimized properties. The original scheme, featuring Bayesian optimization over the latent space of a variational autoencoder, suffers from the pathol. that it tends to produce invalid mol. structures. First, we demonstrate empirically that this pathol. arises when the Bayesian optimization scheme queries latent space points far away from the data on which the variational autoencoder has been trained. Secondly, by reformulating the search procedure as a constrained Bayesian optimization problem, we show that the effects of this pathol. can be mitigated, yielding marked improvements in the validity of the generated mols. We posit that constrained Bayesian optimization is a good approach for solving this kind of training set mismatch in many generative tasks involving Bayesian optimization over the latent space of a variational autoencoder.
- 94Blaschke, T.; Olivecrona, M.; Engkvist, O.; Bajorath, J.; Chen, H. Application of Generative Autoencoder in De Novo Molecular Design. Mol. Inf. 2018, 37, 1700123, DOI: 10.1002/minf.201700123
- 95Kusner, M. J.; Paige, B.; Hernández-Lobato, J. M. Grammar variational autoencoder. Proc. 34th Int. Conf. Mach. Learn. 2017, 70, 1945– 1954Google ScholarThere is no corresponding record for this reference.
- 96Dai, H.; Tian, Y.; Dai, B.; Skiena, S.; Song, L. Syntax-directed variational autoencoder for structured data. arXiv (Machine Learning) , February 24, 2018, 1802.08786, ver 1.Google ScholarThere is no corresponding record for this reference.
- 97Jin, W.; Barzilay, R.; Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. Proc. 35th Int. Conf. Mach. Learn. 2018, 50, 2323– 2332Google ScholarThere is no corresponding record for this reference.
- 98Samanta, B.; De, A.; Jana, G.; Chattaraj, P. K.; Ganguly, N.; Rodriguez, M. G. NeVAE: A Deep Generative Model for Molecular Graphs. Proceedings of the AAAI Conference on Artificial Intelligence 2019, 33, 1110– 1117, DOI: 10.1609/aaai.v33i01.33011110
- 99Bresson, X.; Laurent, T. A Two-Step Graph Convolutional Decoder for Molecule Generation. arXiv (Machine Learning) , June 15, 2019, 1906.03412, ver 2.Google ScholarThere is no corresponding record for this reference.
- 100Maziarka, L.; Pocha, A.; Kaczmarczyk, J.; Rataj, K.; Danel, T.; Warcho-l, M. Mol- CycleGAN: a generative model for molecular optimization. J. Cheminf. 2020, 12, 2, DOI: 10.1186/s13321-019-0404-1[Crossref], [CAS], Google Scholar100https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXotFarsg%253D%253D&md5=243c1aeef3e517ee774fcfabebf80260Mol-CycleGAN: a generative model for molecular optimizationMaziarka, Lukasz; Pocha, Agnieszka; Kaczmarczyk, Jan; Rataj, Krzysztof; Danel, Tomasz; Warchol, MichalJournal of Cheminformatics (2020), 12 (1), 2CODEN: JCOHB3; ISSN:1758-2946. (SpringerOpen)Designing a mol. with desired properties is one of the biggest challenges in drug development, as it requires optimization of chem. compd. structures with respect to many complex properties. To improve the compd. design process, we introduce Mol-CycleGAN-a CycleGAN-based model that generates optimized compds. with high structural similarity to the original ones. Namely, given a mol. our model generates a structurally similar one with an optimized value of the considered property. We evaluate the performance of the model on selected optimization objectives related to structural properties (presence of halogen groups, no. of arom. rings) and to a physicochem. property (penalized logP). In the task of optimization of penalized logP of drug-like mols. our model significantly outperforms previous results.
- 101Sattarov, B.; Baskin, I. I.; Horvath, D.; Marcou, G.; Bjerrum, E. J.; Varnek, A. De Novo Molecular Design by Combining Deep Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping. J. Chem. Inf. Model. 2019, 59, 1182– 1196, DOI: 10.1021/acs.jcim.8b00751[ACS Full Text
], [CAS], Google Scholar
101https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXjtlCisbc%253D&md5=44d2f043cc64e112b2caa7851ed1eb4eDe Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic MappingSattarov, Boris; Baskin, Igor I.; Horvath, Dragos; Marcou, Gilles; Bjerrum, Esben Jannik; Varnek, AlexandreJournal of Chemical Information and Modeling (2019), 59 (3), 1182-1196CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Here we show that Generative Topog. Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused mol. libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set mols. were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topog. map. Targeted map zones can be used for generating novel mol. structures by sampling assocd. latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compd. properties "on the fly". The generated focused mol. libraries were shown to contain original and a priori feasible compds. which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estn. procedures (pharmacophore matching, docking). - 102Winter, R.; Montanari, F.; Steffen, A.; Briem, H.; Nóe, F.; Clevert, D.-A. Efficient multi-objective molecular optimization in a continuous latent space. Chem. Sci. 
2019, 10, 8016– 8024, DOI: 10.1039/C9SC01928F[Crossref], [PubMed], [CAS], Google Scholar102https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXhtlaltrjO&md5=c63599dc8a30df8f805ce5457ecc053aEfficient multi-objective molecular optimization in a continuous latent spaceWinter, Robin; Montanari, Floriane; Steffen, Andreas; Briem, Hans; Noe, Frank; Clevert, Djork-ArneChemical Science (2019), 10 (34), 8016-8024CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)One of the main challenges in small mol. drug discovery is finding novel chem. compds. with desirable properties. In this work, we propose a novel method that combines in silico prediction of mol. properties such as biol. activity or pharmacokinetics with an in silico optimization algorithm, namely Particle Swarm Optimization. Our method takes a starting compd. as input and proposes new mols. with more desirable (predicted) properties. It navigates a machine-learned continuous representation of a drug-like chem. space guided by a defined objective function. The objective function combines multiple in silico prediction models, defined desirability ranges and substructure constraints. We demonstrate that our proposed method is able to consistently find more desirable mols. for the studied tasks in relatively short time. We hope that our method can support medicinal chemists in accelerating and improving the lead optimization process.
- 103Chenthamarakshan, V.; Das, P.; Hoffman, C. S.; Strobelt, H.; Padhi, I.; Lim, W. K.; Hoover, B.; Manica, M.; Born, J.; Laino, T.; Mojsilovic, A. CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models. NeurIPS 2020 2020.Google ScholarThere is no corresponding record for this reference.
- 104Kotsias, P.-C.; Arús-Pous, J.; Chen, H.; Engkvist, O.; Tyrchan, C.; Bjerrum, E. J. Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks. Nature Machine Intelligence 2020, 2, 254– 265, DOI: 10.1038/s42256-020-0174-5
- 105Shayakhmetov, R.; Kuznetsov, M.; Zhebrak, A.; Kadurin, A.; Nikolenko, S.; Aliper, A.; Polykovskiy, D. Molecular Generation for Desired Transcriptome Changes With Ad- versarial Autoencoders. Front. Pharmacol. 2020, 11, 269, DOI: 10.3389/fphar.2020.00269[Crossref], [PubMed], [CAS], Google Scholar105https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXhvFKksbbE&md5=36064731a6cb6395596e418d379f2782Molecular generation for desired transcriptome changes with adversarial autoencodersShayakhmetov, Rim; Kuznetsov, Maksim; Zhebrak, Alexander; Kadurin, Artur; Nikolenko, Sergey; Aliper, Alexander; Polykovskiy, DaniilFrontiers in Pharmacology (2020), 11 (), 00269CODEN: FPRHAU; ISSN:1663-9812. (Frontiers Media S.A.)Gene expression profiles are useful for assessing the efficacy and side effects of drugs. In this paper, we propose a new generative model that infers drug mols. that could induce a desired change in gene expression. Our model-the Bidirectional Adversarial Autoencoder-explicitly separates cellular processes captured in gene expression changes into two feature sets: those related and unrelated to the drug incubation. The model uses related features to produce a drug hypothesis. We have validated our model on the LINCS L1000 dataset by generating mol. structures in the SMILES format for the desired transcriptional response. In the expts., we have shown that the proposed model can generate novel mol. structures that could induce a given gene expression change or predict a gene expression difference after incubation of a given mol. structure.
- 106Ḿendez-Lucio, O.; Baillif, B.; Clevert, D.-A.; Rouquíe, D.; Wichard, J. De novo gener- ation of hit-like molecules from gene expression signatures using artificial intelligence. Nat. Commun. 2020, 11, 1– 10, DOI: 10.1038/s41467-019-13807-w
- 107Born, J.; Manica, M.; Oskooei, A.; Cadow, J.; Rodŕıguez Mart́ınez, M. PaccMannRL: Designing Anticancer Drugs From Transcriptomic Data via Reinforcement Learning. In Research in Computational Molecular Biology; Springer: Cham, 2020; pp 231– 233.
- 108Jin, W.; Yang, K.; Barzilay, R.; Jaakkola, T. Learning Multimodal Graph-to-Graph Translation for Molecular Optimization. arXiv (Machine Learning) , January 28, 2019, 1812.01070, ver. 3.Google ScholarThere is no corresponding record for this reference.
- 109Masuda, T.; Ragoza, M.; Koes, D. R. Generating 3D Molecular Structures Conditional on a Receptor Binding Site with Deep Generative Models. arXiv (Chemical Physics) , November 23, 2020, 2010.14442, ver. 3.Google ScholarThere is no corresponding record for this reference.
- 110Kang, S.; Cho, K. Conditional Molecular Design with Deep Generative Models. J. Chem. Inf. Model. 2019, 59, 43– 52, DOI: 10.1021/acs.jcim.8b00263[ACS Full Text
], [CAS], Google Scholar
110https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXhtlantb3N&md5=d2c3a3ff1f2189698828775e89c2a885Conditional Molecular Design with Deep Generative ModelsKang, Seokho; Cho, KyunghyunJournal of Chemical Information and Modeling (2019), 59 (1), 43-52CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Although machine learning has been successfully used to propose novel mols. that satisfy desired properties, it is still challenging to explore a large chem. space efficiently. In this paper, we present a conditional mol. design method that facilitates generating new mols. with desired properties. The proposed model, which simultaneously performs both property prediction and mol. generation, is built as a semisupervised variational autoencoder trained on a set of existing mols. with only a partial annotation. We generate new mols. with desired properties by sampling from the generative distribution estd. by the model. We demonstrate the effectiveness of the proposed model by evaluating it on drug-like mols. The model improves the performance of property prediction by exploiting unlabeled mols. and efficiently generates novel mols. fulfilling various target conditions. - 111Lim, J.; Hwang, S.-Y.; Moon, S.; Kim, S.; Kim, W. Y. Scaffold-based molecular design with a graph generative model. Chem. Sci. 2020, 11, 1153– 1164, DOI: 10.1039/C9SC04503A[Crossref], [CAS], Google Scholar111https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXit1Ortr%252FO&md5=be36a65abcce15f18b4c1e529bffd905Scaffold-based molecular design with a graph generative modelLim, Jaechang; Hwang, Sang-Yeon; Moon, Seokhyun; Kim, Seungsu; Kim, Woo YounChemical Science (2020), 11 (4), 1153-1164CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)Searching for new mols. in areas like drug discovery often starts from the core structures of known mols. 
Such a method has called for a strategy of designing deriv. compds. retaining a particular scaffold as a substructure. On this account, our present work proposes a graph generative model that targets its use in scaffold-based mol. design. Our model accepts a mol. scaffold as input and extends it by sequentially adding atoms and bonds. The generated mols. are then guaranteed to contain the scaffold with certainty, and their properties can be controlled by conditioning the generation process on desired properties. The learned rule of extending mols. can well generalize to arbitrary kinds of scaffolds, including those unseen during learning. In the conditional generation of mols., our model can simultaneously control multiple chem. properties despite the search space constrained by fixing the substructure. As a demonstration, we applied our model to designing inhibitors of the epidermal growth factor receptor and show that our model can employ a simple semi-supervised extension to broaden its applicability to situations where only a small amt. of data is available.
- 112Varnek, A., Ed. Tutorials in chemoinformatics; John Wiley & Sons, Inc: Hoboken, NJ, 2017.
- 113Engel, T., Gasteiger, J., Eds. Applied chemoinformatics: achievements and future opportunities; Wiley-VCH: Weinheim, 2018; OCLC: 1034693178.
- 114Kadurin, A.; Aliper, A.; Kazennov, A.; Mamoshina, P.; Vanhaelen, Q.; Khrabrov, K.; Zhavoronkov, A. The cornucopia of meaningful leads: Applying deep adversarial au- toencoders for new molecule development in oncology. Oncotarget 2017, 8, 10883– 10890, DOI: 10.18632/oncotarget.14073[Crossref], [PubMed], [CAS], Google Scholar114https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC1c%252FpvFKruw%253D%253D&md5=677ef0264494eb8a7ef8c6584c1202abThe cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncologyKadurin Artur; Khrabrov Kuzma; Kadurin Artur; Aliper Alexander; Kazennov Andrey; Mamoshina Polina; Vanhaelen Quentin; Zhavoronkov Alex; Kadurin Artur; Kadurin Artur; Kazennov Andrey; Zhavoronkov Alex; Mamoshina Polina; Zhavoronkov AlexOncotarget (2017), 8 (7), 10883-10890 ISSN:.Recent advances in deep learning and specifically in generative adversarial networks have demonstrated surprising results in generating new images and videos upon request even using natural language as input. In this paper we present the first application of generative adversarial autoencoders (AAE) for generating novel molecular fingerprints with a defined set of parameters. We developed a 7-layer AAE architecture with the latent middle layer serving as a discriminator. As an input and output the AAE uses a vector of binary fingerprints and concentration of the molecule. In the latent layer we also introduced a neuron responsible for growth inhibition percentage, which when negative indicates the reduction in the number of tumor cells after the treatment. To train the AAE we used the NCI-60 cell line assay data for 6252 compounds profiled on MCF-7 cell line. The output of the AAE was used to screen 72 million compounds in PubChem and select candidate molecules with potential anti-cancer properties. 
This approach is a proof of concept of an artificially-intelligent drug discovery engine, where AAEs are used to generate new molecular fingerprints with the desired molecular properties.
- 115Alpaydin, E. Introduction to machine learning, 2nd ed.; Adaptive computation and machine learning; MIT Press: Cambridge, Mass, 2010; OCLC: ocn317698631.Google ScholarThere is no corresponding record for this reference.
- 116Raschka, S. Python machine learning: unlock deeper insights into machine learning with this vital guide to cutting-edge predictive analytics; Community experience distilled; Packt Publishing Open Source: Birmingham, UK; Mumbai, 2016.Google ScholarThere is no corresponding record for this reference.
- 117Frazier, P. I. A Tutorial on Bayesian Optimization. arXiv (Machine Learning) , July 8, 2018, 1807.02811, ver. 1.Google ScholarThere is no corresponding record for this reference.
- 118Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R. P.; De Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 2016, 104, 148– 175, DOI: 10.1109/JPROC.2015.2494218
- 119Das, P.; Sercu, T.; Wadhawan, K.; Padhi, I.; Gehrmann, S.; Cipcigan, F.; Chen- thamarakshan, V.; Strobelt, H.; Santos, C. D.; Chen, P.-Y.; Yang, Y. Y.; Tan, J.; Hedrick, J.; Crain, J.; Mojsilovic, A. Accelerating antimicrobial discovery with controllable deep generative models and molecular dynamics. arXiv (Machine Learning) , February 26, 2020, 2005.11248, ver. 2.Google ScholarThere is no corresponding record for this reference.
- 120Kingma, D. P.; Mohamed, S.; Rezende, D. J.; Welling, M. Semi-supervised learning with deep generative models. Adv. Neural Inf. Process. Syst. 2014, 3581– 3589Google ScholarThere is no corresponding record for this reference.
- 121Gao, W.; Coley, C. W. The synthesizability of molecules proposed by generative mod- els. J. Chem. Inf. Model. 2020, 60, 5714– 5723, DOI: 10.1021/acs.jcim.0c00174[ACS Full Text
], [CAS], Google Scholar
121https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXmsFKrurw%253D&md5=5f8ff43137b0489f41b5c58518bec270The Synthesizability of Molecules Proposed by Generative ModelsGao, Wenhao; Coley, Connor W.Journal of Chemical Information and Modeling (2020), 60 (12), 5714-5723CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)The discovery of functional mols. is an expensive and time-consuming process, exemplified by the rising costs of small mol. therapeutic discovery. One class of techniques of growing interest for early stage drug discovery is de novo mol. generation and optimization, catalyzed by the development of new deep learning approaches. These techniques can suggest novel mol. structures intended to maximize a multiobjective function, e.g., suitability as a therapeutic against a particular target, without relying on brute-force exploration of a chem. space. However, the utility of these approaches is stymied by ignorance of synthesizability. To highlight the severity of this issue, we use a data-driven computer-aided synthesis planning program to quantify how often mols. proposed by state-of-the-art generative models cannot be readily synthesized. Our anal. demonstrates that there are several tasks for which these models generate unrealistic mol. structures despite performing well on popular quant. benchmarks. Synthetic complexity heuristics can successfully bias generation toward synthetically tractable chem. space, although doing so necessarily detracts from the primary objective. This anal. suggests that to improve the utility of these models in real discovery workflows, new algorithm development is warranted. - 122Horwood, J.; Noutahi, E. Molecular Design in Synthetically Accessible Chemical Space via Deep Reinforcement Learning. ACS Omega 2020, 5, 32984– 32994, DOI: 10.1021/acsomega.0c04153[ACS Full Text
], [CAS], Google Scholar
122https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXisFOmtL3P&md5=1037d12603e7a52332c11e66e8885838Molecular Design in Synthetically Accessible Chemical Space via Deep Reinforcement LearningHorwood, Julien; Noutahi, EmmanuelACS Omega (2020), 5 (51), 32984-32994CODEN: ACSODF; ISSN:2470-1343. (American Chemical Society)The fundamental goal of generative drug design is to propose optimized mols. that meet predefined activity, selectivity, and pharmacokinetic criteria. Despite recent progress, we argue that existing generative methods are limited in their ability to favorably shift the distributions of mol. properties during optimization. We instead propose a novel Reinforcement Learning framework for mol. design in which an agent learns to directly optimize through a space of synthetically accessible drug-like mols. This becomes possible by defining transitions in our Markov decision process as chem. reactions and allows us to leverage synthetic routes as an inductive bias. We validate our method by demonstrating that it outperforms existing state-of-the-art approaches in the optimization of pharmacol. relevant objectives, while results on multi-objective optimization tasks suggest increased scalability to realistic pharmaceutical design problems. - 123Gottipati, S. K.; Sattarov, B.; Niu, S.; Pathak, Y.; Wei, H.; Liu, S.; Blackburn, S.; Thomas, K.; Coley, C.; Tang, J. Learning to navigate the synthetically accessible chemical space using reinforcement learning. Int. Conf. Mach. Learn. 2020, 3668– 3679Google ScholarThere is no corresponding record for this reference.
- 124Bradshaw, J.; Paige, B.; Kusner, M. J.; Segler, M.; Hernández-Lobato, J. M. Barking up the right tree: an approach to search over molecule synthesis DAGs. Adv. Neural Inf. Process. Syst. 2020, 6852– 6866Google ScholarThere is no corresponding record for this reference.
- 125Imrie, F.; Bradley, A. R.; van der Schaar, M.; Deane, C. M. Deep generative models for 3d linker design. J. Chem. Inf. Model. 2020, 60, 1983– 1995, DOI: 10.1021/acs.jcim.9b01120[ACS Full Text
], [CAS], Google Scholar
125https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXltlenurs%253D&md5=430c929761c5020d1013538e028ba1bcDeep Generative Models for 3D Linker DesignImrie, Fergus; Bradley, Anthony R.; van der Schaar, Mihaela; Deane, Charlotte M.Journal of Chemical Information and Modeling (2020), 60 (4), 1983-1995CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Rational compd. design remains a challenging problem for both computational methods and medicinal chemists. Computational generative methods have begun to show promising results for the design problem. However, they have not yet used the power of three-dimensional (3D) structural information. We have developed a novel graph-based deep generative model that combines state-of-the-art machine learning techniques with structural knowledge. Our method ("DeLinker") takes two fragments or partial structures and designs a mol. incorporating both. The generation process is protein-context-dependent, utilizing the relative distance and orientation between the partial structures. This 3D information is vital to successful compd. design, and we demonstrate its impact on the generation process and the limitations of omitting such information. In a large-scale evaluation, DeLinker designed 60% more mols. with high 3D similarity to the original mol. than a database baseline. When considering the more relevant problem of longer linkers with at least five atoms, the outperformance increased to 200%. We demonstrate the effectiveness and applicability of this approach on a diverse range of design problems: fragment linking, scaffold hopping, and proteolysis targeting chimera (PROTAC) design. As far as we are aware, this is the first mol. generative model to incorporate 3D structural information directly in the design process. The code is available at https://github.com/oxpig/DeLinker. - 126Yang, Y.; Zheng, S.; Su, S.; Zhao, C.; Xu, J.; Chen, H. 
SyntaLinker: automatic fragment linking with deep conditional transformer neural networks. Chem. Sci. 2020, 11, 8312– 8322, DOI: 10.1039/D0SC03126G[Crossref], [PubMed], [CAS], Google Scholar126https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXhsVagsrvK&md5=403cc7a09f96ecbe27afa3ce6bee51a2SyntaLinker: automatic fragment linking with deep conditional transformer neural networksYang, Yuyao; Zheng, Shuangjia; Su, Shimin; Zhao, Chao; Xu, Jun; Chen, HongmingChemical Science (2020), 11 (31), 8312-8322CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)Linking fragments to generate a focused compd. library for a specific drug target is one of the challenges in fragment-based drug design (FBDD). Hereby, we propose a new program named SyntaLinker, which is based on a syntactic pattern recognition approach using deep conditional transformer neural networks. This state-of-the-art transformer can link mol. fragments automatically by learning from the knowledge of structures in medicinal chem. databases (e.g.ChEMBL database). Conventionally, linking mol. fragments was viewed as connecting substructures that were predefined by empirical rules. In SyntaLinker, however, the rules of linking fragments can be learned implicitly from known chem. structures by recognizing syntactic patterns embedded in SMILES notations. With deep conditional transformer neural networks, SyntaLinker can generate mol. structures based on a given pair of fragments and addnl. restrictions. Case studies have demonstrated the advantages and usefulness of SyntaLinker in FBDD.
- 127Tan, X. Automated design and optimization of multitarget schizophrenia drug candidates by deep learning. Eur. J. Med. Chem. 2020, 204, 112572, DOI: 10.1016/j.ejmech.2020.112572[Crossref], [PubMed], [CAS], Google Scholar127https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXhsVCksbnI&md5=dd01837fd3bdbb488e8781cea7d5103bAutomated design and optimization of multitarget schizophrenia drug candidates by deep learningTan, Xiaoqin; Jiang, Xiangrui; He, Yang; Zhong, Feisheng; Li, Xutong; Xiong, Zhaoping; Li, Zhaojun; Liu, Xiaohong; Cui, Chen; Zhao, Qingjie; Xie, Yuanchao; Yang, Feipu; Wu, Chunhui; Shen, Jingshan; Zheng, Mingyue; Wang, Zhen; Jiang, HualiangEuropean Journal of Medicinal Chemistry (2020), 204 (), 112572CODEN: EJMCA5; ISSN:0223-5234. (Elsevier Masson SAS)Complex neuropsychiatric diseases such as schizophrenia require drugs that can target multiple G protein-coupled receptors (GPCRs) to modulate complex neuropsychiatric functions. Here, we report an automated system comprising a deep recurrent neural network (RNN) and a multitask deep neural network (MTDNN) to design and optimize multitarget antipsychotic drugs. The system has successfully generated novel mol. structures with desired multiple target activities, among which high-ranking compd. 3 was synthesized, and demonstrated potent activities against dopamine D2, serotonin 5-HT1A and 5-HT2A receptors. Hit expansion based on the MTDNN was performed, 6 analogs of compd. 3 were evaluated exptl., among which compd. 8 not only exhibited specific polypharmacol. profiles but also showed antipsychotic effect in animal models with low potential for sedation and catalepsy, highlighting their suitability for further preclin. studies. The approach can be an efficient tool for designing lead compds. with multitarget profiles to achieve the desired efficacy in the treatment of complex neuropsychiatric diseases.
- 128 Yang, Y.; Zhang, R.; Li, Z.; Mei, L.; Wan, S.; Ding, H.; Chen, Z.; Xing, J.; Feng, H.; Han, J.; Jiang, H.; Zheng, M.; Luo, C.; Zhou, B. Discovery of Highly Potent, Selective, and Orally Efficacious p300/CBP Histone Acetyltransferases Inhibitors. J. Med. Chem. 2020, 63, 1337–1360, DOI: 10.1021/acs.jmedchem.9b01721
- 129 Grisoni, F.; Huisman, B.; Button, A.; Moret, M.; Atz, K.; Merk, D.; Schneider, G. Combining generative artificial intelligence and on-chip synthesis for de novo drug design. ChemRxiv, December 30, 2020, ver. 1. DOI: 10.26434/chemrxiv.13498587.v1.
- 130 Shaker, N.; Abou-Zleikha, M.; AlAmri, M.; Mehellou, Y. A Generative Deep Learning Approach for the Discovery of SARS CoV2 Protease Inhibitors. ChemRxiv, April 23, 2020, ver. 1. DOI: 10.26434/chemrxiv.12170337.v1.
- 131 Born, J.; Manica, M.; Cadow, J.; Markert, G.; Mill, N. A.; Filipavicius, M.; Martínez, M. R. PaccMannRL on SARS-CoV-2: Designing antiviral candidates with conditional generative models. arXiv (Quantitative Methods), July 6, 2020, 2005.13285, ver. 3.
- 132 Hachmann, J.; Olivares-Amaya, R.; Atahan-Evrenk, S.; Amador-Bedolla, C.; Sánchez-Carrera, R. S.; Gold-Parker, A.; Vogt, L.; Brockway, A. M.; Aspuru-Guzik, A. The Harvard clean energy project: large-scale computational screening and design of organic photovoltaics on the world community grid. J. Phys. Chem. Lett. 2011, 2, 2241–2251, DOI: 10.1021/jz200866s
- 133 Jørgensen, P. B.; Mesta, M.; Shil, S.; Lastra, J. M. G.; Jacobsen, K. W.; Thygesen, K. S.; Schmidt, M. N. Machine learning-based screening of complex molecules for polymer solar cells. J. Chem. Phys. 2018, 148, 241735, DOI: 10.1063/1.5023563
- 134 Yuan, Q.; Santana-Bonilla, A.; Zwijnenburg, M. A.; Jelfs, K. E. Molecular generation targeting desired electronic properties via deep generative models. Nanoscale 2020, 12, 6744–6758, DOI: 10.1039/C9NR10687A
- 135 Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv (Computation and Language), December 6, 2017, 1706.03762, ver. 5.
- 136 Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv (Computation and Language), May 24, 2019, 1810.04805, ver. 2.
- 137 Brown, T. B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. arXiv (Computation and Language), July 22, 2020, 2005.14165, ver. 4.
- 138 Grechishnikova, D. Transformer neural network for protein-specific de novo drug generation as a machine translation problem. Sci. Rep. 2021, 11, 1–13, DOI: 10.1038/s41598-020-79682-4
- 139 Zheng, S.; Lei, Z.; Ai, H.; Chen, H.; Deng, D.; Yang, Y. Deep Scaffold Hopping with Multi-modal Transformer Neural Networks. ChemRxiv, September 28, 2020, ver. 1. DOI: 10.26434/chemrxiv.13011767.v1.
Figure 1. Acetaminophen (center) under various molecular representations. Top-left: Sequence-based representations. Before being fed to the models, these sequences are usually one-hot encoded. Top-right: Graph-based representations. While connection matrices are a suitable input for standard architectures, graphs can also be handled directly by graph neural networks. Bottom: Three-dimensional representations, images from PubChem. (26) Graphs may be enhanced by including 3D information as node attributes, such as internal distances and angles, or coordinates in a system such as Cartesian space. Molecular surfaces can be voxelized into a 3D grid for easier processing.
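As an illustration of the one-hot encoding step mentioned in the caption, the following minimal sketch encodes the SMILES string of acetaminophen. It assumes character-level tokenization, which is a simplification: multi-character tokens such as Cl or Br would require a proper tokenizer. The function name `one_hot_encode` and the way the vocabulary is built are illustrative choices, not taken from any specific work reviewed here.

```python
def one_hot_encode(smiles, vocabulary):
    """Return one binary row per character, with a single 1 at the token's index."""
    index = {token: i for i, token in enumerate(vocabulary)}
    return [
        [1 if index[ch] == i else 0 for i in range(len(vocabulary))]
        for ch in smiles
    ]


# SMILES string of acetaminophen.
smiles = "CC(=O)Nc1ccc(O)cc1"

# Assumption: the vocabulary is simply the sorted set of characters in this string.
# In practice it would be built from the whole training set.
vocabulary = sorted(set(smiles))

encoded = one_hot_encode(smiles, vocabulary)
```

The result is a matrix with one row per sequence position and one column per vocabulary token, which is the form typically passed to sequence models.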
Figure 2. Top-left: Three-layer Recurrent Neural Network (RNN), both rolled and unrolled. In each layer, the output of a step, besides flowing to the next layer, also flows to the next step of the layer itself. These recurrent connections are depicted in the unfolded view of the network as vertical arrows. Top-right: Variational Autoencoder (VAE), where the input is encoded to the parameters of a statistical distribution, namely, the means (μ) and standard deviations (σ). In practice, these correspond to two vectors which, in the sampling step, are interpreted as a set of means and standard deviations. Bottom-left: Generative Adversarial Network (GAN), composed of a generator and a discriminator. Training seeks not a minimum but a useful equilibrium between the generator and the discriminator. Bottom-right: Adversarial Autoencoder (AAE), where the attached discriminator must discern between encoded points and samples drawn from a prior statistical distribution.
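The VAE sampling step described in the caption can be sketched as follows. This is a minimal illustration that assumes a diagonal Gaussian latent distribution and uses the standard reparameterization form z = μ + σ·ε with ε drawn from a standard normal. The function name `sample_latent` and the example values are hypothetical.

```python
import random


def sample_latent(means, stds, rng):
    """Draw one latent vector: z_i = mu_i + sigma_i * eps_i, with eps_i ~ N(0, 1)."""
    return [mu + sigma * rng.gauss(0.0, 1.0) for mu, sigma in zip(means, stds)]


# Hypothetical encoder outputs for a three-dimensional latent space.
means = [0.5, -1.0, 2.0]
stds = [0.1, 0.3, 0.05]

z = sample_latent(means, stds, random.Random(42))
```

With all standard deviations set to zero, the sample collapses to the mean vector, which corresponds to decoding from the most likely latent point.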
Figure 3. Three-layer RNN, unfolded over four time-steps. In autoregressive sequence generation, the process is started with a special start token, here “G”. The model then predicts the next token, which is sampled and used as input for the next step. Generation ends when a stop token is predicted.
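The autoregressive loop described in the caption can be sketched as below. The trained RNN is replaced by a hypothetical stand-in function, `toy_next_token_probs`, that returns a fixed token distribution. The stop token “E”, the toy vocabulary, and the length limit are assumptions made only for illustration.

```python
import random

START, STOP = "G", "E"  # "G" follows the caption; "E" is an assumed stop token.


def toy_next_token_probs(prefix):
    """Hypothetical stand-in for a trained RNN: maps a token prefix to probabilities."""
    if len(prefix) >= 6:
        return {STOP: 1.0}  # force termination after five generated tokens
    return {"C": 0.5, "O": 0.2, "N": 0.2, STOP: 0.1}


def generate(next_token_probs, rng, max_len=50):
    """Sample tokens one at a time, feeding each sampled token back as input."""
    tokens = [START]
    while len(tokens) < max_len:
        probs = next_token_probs(tokens)
        choices, weights = zip(*probs.items())
        token = rng.choices(choices, weights=weights, k=1)[0]
        if token == STOP:
            break
        tokens.append(token)
    return "".join(tokens[1:])  # drop the start token


sample = generate(toy_next_token_probs, random.Random(0))
```

Sampling from the predicted distribution, rather than always taking the most probable token, is what lets repeated calls produce different sequences.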
Figure 4. Left: In sequential graph generation, a graph is built by evaluating a current partial graph, adding a node/edge and repeating until the network outputs a stop signal. Right: In the one-shot generation of graphs, probabilities over the full adjacency matrix and node/edge attribute tensors are produced. The graph is then obtained by taking a sample or the argmax of these outputs.
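For the one-shot case, the decoding step can be sketched as taking the argmax over per-node and per-edge probability tensors. The example below is a minimal illustration under stated assumptions: the probabilities are hand-written rather than produced by a model, the edge tensor is symmetric so that only its upper triangle is read, and the names `decode_graph`, `atom_types`, and `bond_types` are hypothetical.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


def decode_graph(node_probs, edge_probs, atom_types, bond_types):
    """Pick the most probable atom type per node and bond type per node pair."""
    atoms = [atom_types[argmax(p)] for p in node_probs]
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):  # upper triangle only
            bond = bond_types[argmax(edge_probs[i][j])]
            if bond is not None:
                bonds.append((i, j, bond))
    return atoms, bonds


atom_types = ["C", "N", "O"]
bond_types = [None, "single", "double"]  # None stands for "no bond"

node_probs = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
no = [1.0, 0.0, 0.0]
edge_probs = [
    [no, [0.1, 0.8, 0.1], [0.9, 0.05, 0.05]],
    [[0.1, 0.8, 0.1], no, [0.2, 0.1, 0.7]],
    [[0.9, 0.05, 0.05], [0.2, 0.1, 0.7], no],
]

atoms, bonds = decode_graph(node_probs, edge_probs, atom_types, bond_types)
```

Replacing the argmax with sampling from each distribution gives the stochastic variant mentioned in the caption. Neither variant guarantees a chemically valid graph, so a validity check would normally follow.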
Figure 5. Left: General procedure for the generation of 3D shapes as proposed by Skalic et al. (72) The convolutional decoder of a VAE is used to produce a 3D molecular shape, which is converted to SMILES by a captioning network. Right: General process for generating molecules as 3D point sets, proposed by Gebauer et al. (73) It is conceptually similar to sequential graph generation, operating on point sets with an internal coordinate system.
Figure 6. In transfer learning, a general model is first trained on a large data set and then fine-tuned toward generating the desired properties with a smaller, focused data set.
Figure 7. Top: The model is first pretrained through maximum likelihood estimation, learning the structure of the output space along with general chemical rules. Then, using RL, the model is optimized for specific properties such as binding affinity or solubility. While similar in concept to transfer learning, the use of RL allows one to bias the model toward a wider range of objectives. Bottom: Directed generation with RL and GAN. This method leverages adversarial training to produce feasible molecules and RL to bias the generation toward desired properties.
Figure 8. Here, the latent space of an AE is used as a reversible and continuous molecular representation allowing for the application of various optimization algorithms.
Figure 9. Top: In conditioned generation, the desired properties are introduced as explicit inputs to the model. These properties are precomputed for each compound of the training set and used during training to induce a correlation between the two. This correlation is then leveraged during the generation process to target specific property values. Bottom: In the semisupervised case of conditioned generation, only part of the training set has the desired properties available. To overcome this, a predictor network is trained on the labeled instances and used to predict the properties of unlabeled ones.
References
This article references 139 other publications.
- 1 Polishchuk, P. G.; Madzhidov, T. I.; Varnek, A. Estimation of the size of drug-like chemical space based on GDB-17 data. J. Comput.-Aided Mol. Des. 2013, 27, 675, DOI: 10.1007/s10822-013-9672-4
- 2 Schneider, G. Automating drug discovery. Nat. Rev. Drug Discovery 2018, 17, 97–113, DOI: 10.1038/nrd.2017.232
- 3 DiMasi, J. A.; Grabowski, H. G.; Hansen, R. W. Innovation in the pharmaceutical industry: New estimates of R&D costs. Journal of Health Economics 2016, 47, 20–33, DOI: 10.1016/j.jhealeco.2016.01.012
- 4 Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864–2875, DOI: 10.1021/ci300415d
- 5 Walters, W. P. Virtual Chemical Libraries: Miniperspective. J. Med. Chem. 2019, 62, 1116–1124, DOI: 10.1021/acs.jmedchem.8b01048
- 6 Hartenfeller, M.; Zettl, H.; Walter, M.; Rupp, M.; Reisen, F.; Proschak, E.; Weggen, S.; Stark, H.; Schneider, G. DOGS: reaction-driven de novo design of bioactive compounds. PLoS Comput. Biol. 2012, 8, e1002380, DOI: 10.1371/journal.pcbi.1002380
- 7. Spiegel, J.; Durrant, J. AutoGrow4: An open-source genetic algorithm for de novo drug design and lead optimization. J. Cheminf. 2020, 12, 25, DOI: 10.1186/s13321-020-00429-4
- 8. Jensen, J. H. A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. Chem. Sci. 2019, 10, 3567–3572, DOI: 10.1039/C8SC05372C
- 9. Yoshikawa, N.; Terayama, K.; Sumita, M.; Homma, T.; Oono, K.; Tsuda, K. Population-based De Novo Molecule Generation, Using Grammatical Evolution. Chem. Lett. 2018, 47, 1431–1434, DOI: 10.1246/cl.180665
- 10. Gómez-Bombarelli, R.; Wei, J. N.; Duvenaud, D.; Hernández-Lobato, J. M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; Aspuru-Guzik, A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent. Sci. 2018, 4, 268–276, DOI: 10.1021/acscentsci.7b00572
- 11. Gawehn, E.; Hiss, J. A.; Schneider, G. Deep learning in drug discovery. Mol. Inf. 2016, 35, 3–14, DOI: 10.1002/minf.201501008
- 12. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press, 2016.
- 13. Chollet, F. Deep learning with Python; Manning Publications Co: Shelter Island, NY, 2018.
- 14. Foster, D. Generative deep learning: teaching machines to paint, write, compose, and play; O’Reilly Media, 2019.
- 15. White, D.; Wilson, R. C. Generative models for chemical structures. J. Chem. Inf. Model. 2010, 50, 1257–1274, DOI: 10.1021/ci9004089
- 16. Sanchez-Lengeling, B.; Aspuru-Guzik, A. Inverse molecular design using machine learning: Generative models for matter engineering. Science 2018, 361, 360–365, DOI: 10.1126/science.aat2663
- 17. Yuan, W.; Jiang, D.; Nambiar, D. K.; Liew, L. P.; Hay, M. P.; Bloomstein, J.; Lu, P.; Turner, B.; Le, Q.-T.; Tibshirani, R.; Khatri, P.; Moloney, M. G.; Koong, A. C. Chemical Space Mimicry for Drug Discovery. J. Chem. Inf. Model. 2017, 57, 875–882, DOI: 10.1021/acs.jcim.6b00754
- 18. Segler, M. H.; Kogej, T.; Tyrchan, C.; Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 2018, 4, 120–131, DOI: 10.1021/acscentsci.7b00512
- 19. Elton, D. C.; Boukouvalas, Z.; Fuge, M. D.; Chung, P. W. Deep learning for molecular design - a review of the state of the art. Mol. Syst. Des. Eng. 2019, 4, 828–849, DOI: 10.1039/C9ME00039A
- 20. Schwalbe-Koda, D.; Gómez-Bombarelli, R. In Machine Learning Meets Quantum Physics; Schütt, K. T., Chmiela, S., von Lilienfeld, O. A., Tkatchenko, A., Tsuda, K., Müller, K.-R., Eds.; Springer International Publishing: Cham, 2020; pp 445–467.
- 21. Zhavoronkov, A.; Vanhaelen, Q.; Oprea, T. I. Will Artificial Intelligence for Drug Discovery Impact Clinical Pharmacology? Clin. Pharmacol. Ther. (N. Y., NY, U. S.) 2020, 107, 780–785, DOI: 10.1002/cpt.1795
- 22. Bian, Y.; Xie, X.-Q. Generative chemistry: drug discovery with deep learning generative models. J. Mol. Model. 2021, 27, 71, DOI: 10.1007/s00894-021-04674-8
- 23. Chen, H.; Engkvist, O.; Wang, Y.; Olivecrona, M.; Blaschke, T. The rise of deep learning in drug discovery. Drug Discovery Today 2018, 23, 1241–1250, DOI: 10.1016/j.drudis.2018.01.039
- 24. Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; Zhao, S. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discovery 2019, 18, 463–477, DOI: 10.1038/s41573-019-0024-5
- 25. Engel, T., Gasteiger, J., Eds. Chemoinformatics: basic concepts and methods; Wiley-VCH: Weinheim, 2018; OCLC: 1012130305.
- 26. Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B. A.; Thiessen, P. A.; Yu, B.; Zaslavsky, L.; Zhang, J.; Bolton, E. E. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 2019, 47, D1102–D1109, DOI: 10.1093/nar/gky1033
- 27. Ash, S.; Cline, M.; Homer, R. W.; Hurst, T.; Smith, G. SYBYL Line Notation (SLN): A Versatile Language for Chemical Structure Representation. J. Chem. Inf. Comput. Sci. 1997, 37, 71–79, DOI: 10.1021/ci960109j
- 28. Koniver, D. A.; Wiswesser, W. J.; Usdin, E. Wiswesser Line Notation: Simplified Techniques for Converting Chemical Structures to WLN. Science 1972, 176, 1437–1439, DOI: 10.1126/science.176.4042.1437
- 29. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36, DOI: 10.1021/ci00057a005
- 30. O’Boyle, N. M. Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI. J. Cheminf. 2012, 4, 22, DOI: 10.1186/1758-2946-4-22
- 31. Bjerrum, E. J. SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv (Machine Learning), May 17, 2017, 703.07076, ver. 2.
- 32. Bjerrum, E. J.; Sattarov, B. Improving chemical autoencoder latent space and molecular de novo generation diversity with heteroencoders. Biomolecules 2018, 8, 131, DOI: 10.3390/biom8040131
- 33. Arús-Pous, J.; Johansson, S. V.; Prykhodko, O.; Bjerrum, E. J.; Tyrchan, C.; Reymond, J.-L.; Chen, H.; Engkvist, O. Randomized SMILES strings improve the quality of molecular generative models. J. Cheminf. 2019, 11, 71, DOI: 10.1186/s13321-019-0393-0
- 34. Moret, M.; Friedrich, L.; Grisoni, F.; Merk, D.; Schneider, G. Generative molecular design in low data regimes. Nature Machine Intelligence 2020, 2, 171–180, DOI: 10.1038/s42256-020-0160-y
- 35. van Deursen, R.; Ertl, P.; Tetko, I. V.; Godin, G. GEN: highly efficient SMILES explorer using autodidactic generative examination networks. J. Cheminf. 2020, 12, 22, DOI: 10.1186/s13321-020-00425-8
- 36. Prykhodko, O.; Johansson, S. V.; Kotsias, P.-C.; Arús-Pous, J.; Bjerrum, E. J.; Engkvist, O.; Chen, H. A de novo molecular generation method using latent vector based generative adversarial network. J. Cheminf. 2019, 11, 74, DOI: 10.1186/s13321-019-0397-9
- 37Heller, S. R.; McNaught, A.; Pletnev, I.; Stein, S.; Tchekhovskoi, D. InChI, the IUPAC International Chemical Identifier. J. Cheminf. 2015, 7, 23, DOI: 10.1186/s13321-015-0068-4[Crossref], [CAS], Google Scholar37https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC2MbpslOrtQ%253D%253D&md5=4acc4f470f8cdb9b4f84558fd3302470InChI, the IUPAC International Chemical IdentifierHeller Stephen R; Stein Stephen; Tchekhovskoi Dmitrii; McNaught Alan; Pletnev IgorJournal of cheminformatics (2015), 7 (), 23 ISSN:1758-2946.This paper documents the design, layout and algorithms of the IUPAC International Chemical Identifier, InChI.
- 38Winter, R.; Montanari, F.; Noé, F.; Clevert, D.-A. Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem. Sci. 2019, 10, 1692– 1701, DOI: 10.1039/C8SC04175J
- 39O’Boyle, N.; Dalke, A. DeepSMILES: An Adaptation of SMILES for Use in Machine-Learning of Chemical Structures; preprint, ChemRxiv, September 19, 2018, ver. 1. DOI: 10.26434/chemrxiv.7097960.v1
- 40Krenn, M.; Häse, F.; Nigam, A.; Friederich, P.; Aspuru-Guzik, A. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology 2020, 1, 045024, DOI: 10.1088/2632-2153/aba947
- 41Faulon, J.-L., Bender, A., Eds. Handbook of chemoinformatics algorithms; Chapman & Hall/CRC mathematical and computational biology series; Chapman & Hall/CRC: Boca Raton, FL, 2010; Chapter 1. OCLC: ocn226357322.
- 42Wishart, D. S. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074– D1082, DOI: 10.1093/nar/gkx1037
- 43Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100– D1107, DOI: 10.1093/nar/gkr777
- 44Landrum, G. RDKit: open-source cheminformatics software, 2016.
- 45Steinbeck, C.; Hoppe, C.; Kuhn, S.; Floris, M.; Guha, R.; Willighagen, E. Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics. Curr. Pharm. Des. 2006, 12, 2111– 2120, DOI: 10.2174/138161206777585274
- 46Sun, J.; Jeliazkova, N.; Chupakhin, V.; Golib-Dzib, J.-F.; Engkvist, O.; Carlsson, L.; Wegner, J.; Ceulemans, H.; Georgiev, I.; Jeliazkov, V.; Kochev, N.; Ashby, T. J.; Chen, H. ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics. J. Cheminf. 2017, 9, 17, DOI: 10.1186/s13321-017-0222-2
- 47Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G. ZINC: A Free Tool to Discover Chemistry for Biology. J. Chem. Inf. Model. 2012, 52, 1757– 1768, DOI: 10.1021/ci3001277
- 48Shivanyuk, A.; Ryabukhin, S.; Tolmachev, A.; Bogolyubsky, A.; Mykytenko, D.; Chupryna, A.; Heilman, W.; Kostyuk, A. Enamine real database: Making chemical diversity real. Chem. Today 2007, 25, 58– 59
- 49Huang, R.; Xia, M.; Nguyen, D.-T.; Zhao, T.; Sakamuru, S.; Zhao, J.; Shahane, S. A.; Rossoshek, A.; Simeonov, A. Tox21Challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Front. Environ. Sci. 2016, 3, 85, DOI: 10.3389/fenvs.2015.00085
- 50Ramakrishnan, R.; Hartmann, M.; Tapavicza, E.; Von Lilienfeld, O. A. Electronic spectra from TDDFT and machine learning in chemical space. J. Chem. Phys. 2015, 143, 084111, DOI: 10.1063/1.4928757
- 51Ramakrishnan, R.; Dral, P. O.; Rupp, M.; Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 2014, 1, 140022, DOI: 10.1038/sdata.2014.22
- 52Wang, R.; Fang, X.; Lu, Y.; Wang, S. The PDBbind database: Collection of binding affinities for protein-ligand complexes with known three-dimensional structures. J. Med. Chem. 2004, 47, 2977– 2980, DOI: 10.1021/jm030580l
- 53Cho, K.; Van Merriënboer, B.; Bahdanau, D.; Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv (Computation and Language), October 7, 2014, 1409.1259, ver. 2.
- 54Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural computation 1997, 9, 1735– 1780, DOI: 10.1162/neco.1997.9.8.1735
- 55Goodfellow, I. J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. arXiv (Machine Learning), June 10, 2014, 1406.2661, ver. 1.
- 56Kingma, D. P.; Welling, M. Auto-encoding variational bayes. arXiv (Machine Learning), May 1, 2014, 1312.6114, ver. 10.
- 57Makhzani, A.; Shlens, J.; Jaitly, N.; Goodfellow, I.; Frey, B. Adversarial autoencoders. arXiv (Machine Learning), May 25, 2016, 1511.05644, ver. 2.
- 58Olivecrona, M.; Blaschke, T.; Engkvist, O.; Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminf. 2017, 9, 48, DOI: 10.1186/s13321-017-0235-x
- 59Gupta, A.; Müller, A. T.; Huisman, B. J.; Fuchs, J. A.; Schneider, P.; Schneider, G. Generative Recurrent Networks for De Novo Drug Design. Mol. Inf. 2018, 37, 1700111, DOI: 10.1002/minf.201700111
- 60Guimaraes, G. L.; Sanchez-Lengeling, B.; Outeiral, C.; Farias, P. L. C.; Aspuru-Guzik, A. Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models. arXiv (Machine Learning), February 7, 2018, 1705.10843, ver. 3.
- 61Lim, J.; Ryu, S.; Kim, J. W.; Kim, W. Y. Molecular generative model based on conditional variational autoencoder for de novo molecular design. J. Cheminf. 2018, 10, 31, DOI: 10.1186/s13321-018-0286-7
- 62Merk, D.; Grisoni, F.; Friedrich, L.; Schneider, G. Tuning artificial intelligence on the de novo design of natural-product-inspired retinoid X receptor modulators. Commun. Chem. 2018, 1, 68, DOI: 10.1038/s42004-018-0068-1
- 63Merk, D.; Friedrich, L.; Grisoni, F.; Schneider, G. De Novo Design of Bioactive Small Molecules by Artificial Intelligence. Mol. Inf. 2018, 37, 1700153, DOI: 10.1002/minf.201700153
- 64Polykovskiy, D.; Zhebrak, A.; Vetrov, D.; Ivanenkov, Y.; Aladinskiy, V.; Mamoshina, P.; Bozdaganyan, M.; Aliper, A.; Zhavoronkov, A.; Kadurin, A. Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery. Mol. Pharmaceutics 2018, 15, 4398– 4405, DOI: 10.1021/acs.molpharmaceut.8b00839
- 65Li, Y.; Vinyals, O.; Dyer, C.; Pascanu, R.; Battaglia, P. Learning Deep Generative Models of Graphs. arXiv (Machine Learning), March 8, 2018, 1803.03324, ver. 1.
- 66Liu, Q.; Allamanis, M.; Brockschmidt, M.; Gaunt, A. Constrained graph variational autoencoders for molecule design. Adv. Neural Inf. Process. Syst. 2018, 7795– 7804
- 67Mercado, R.; Rastemo, T.; Lindelof, E.; Klambauer, G.; Engkvist, O.; Chen, H.; Bjerrum, E. J. Graph networks for molecular design. Mach. Learn.: Sci. Technol. 2021, 2, 025023, DOI: 10.1088/2632-2153/abcf91
- 68De Cao, N.; Kipf, T. MolGAN: An implicit generative model for small molecular graphs. arXiv (Machine Learning), May 30, 2018, 1805.11973, ver. 1.
- 69Simonovsky, M.; Komodakis, N. Graphvae: Towards generation of small graphs using variational autoencoders. International Conference on Artificial Neural Networks. 2018, 11139, 412– 422, DOI: 10.1007/978-3-030-01418-6_41
- 70Ma, T.; Chen, J.; Xiao, C. Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders. Adv. Neural Inf. Process. Syst. 2018, 7113– 7124
- 71Hawkins, P. C. D. Conformation Generation: The State of the Art. J. Chem. Inf. Model. 2017, 57, 1747– 1756, DOI: 10.1021/acs.jcim.7b00221
- 72Skalic, M.; Jiménez, J.; Sabbadin, D.; De Fabritiis, G. Shape-Based Generative Modeling for de Novo Drug Design. J. Chem. Inf. Model. 2019, 59, 1205– 1214, DOI: 10.1021/acs.jcim.8b00706
- 73Gebauer, N.; Gastegger, M.; Schütt, K. T. Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules. NeurIPS. 2019.
- 74Ragoza, M.; Masuda, T.; Koes, D. R. Learning a Continuous Representation of 3D Molecular Structures with Deep Generative Models. arXiv (Quantitative Methods), November 15, 2020, 2010.08687, ver. 3.
- 75Preuer, K.; Renz, P.; Unterthiner, T.; Hochreiter, S.; Klambauer, G. Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery. J. Chem. Inf. Model. 2018, 58, 1736– 1741, DOI: 10.1021/acs.jcim.8b00234
- 76Arús-Pous, J.; Blaschke, T.; Ulander, S.; Reymond, J.-L.; Chen, H.; Engkvist, O. Exploring the GDB-13 chemical space using deep generative models. J. Cheminf. 2019, 11, 1– 14, DOI: 10.1186/s13321-019-0341-z
- 77Brown, N.; Fiscato, M.; Segler, M. H.; Vaucher, A. C. GuacaMol: benchmarking models for de novo molecular design. J. Chem. Inf. Model. 2019, 59, 1096– 1108, DOI: 10.1021/acs.jcim.8b00839
- 78Polykovskiy, D.; Zhebrak, A.; Sanchez-Lengeling, B.; Golovanov, S.; Tatanov, O.; Belyaev, S.; Kurbanov, R.; Artamonov, A.; Aladinskiy, V.; Veselov, M. Molecular sets (MOSES): a benchmarking platform for molecular generation models. Front. Pharmacol. 2020, 11, 11, DOI: 10.3389/fphar.2020.565644
- 79Renz, P.; Van Rompaey, D.; Wegner, J. K.; Hochreiter, S.; Klambauer, G. On fail- ure modes in molecule generation and optimization. Drug Discovery Today: Technol. 2019, 32–33, 55– 63, DOI: 10.1016/j.ddtec.2020.09.003[Crossref], [PubMed], [CAS], Google Scholar79https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BB3svkt1yqtg%253D%253D&md5=ca243ca3af904c3fa64f9e0e2c2b2c2cOn failure modes in molecule generation and optimizationRenz Philipp; Hochreiter Sepp; Klambauer Gunter; Van Rompaey Dries; Wegner Jorg KurtDrug discovery today. Technologies (2019), 32-33 (), 55-63 ISSN:.There has been a wave of generative models for molecules triggered by advances in the field of Deep Learning. These generative models are often used to optimize chemical compounds towards particular properties or a desired biological activity. The evaluation of generative models remains challenging and suggested performance metrics or scoring functions often do not cover all relevant aspects of drug design projects. In this work, we highlight some unintended failure modes in molecular generation and optimization and how these evade detection by current performance metrics.
- 80Cieplinski, T.; Danel, T.; Podlewska, S.; Jastrzebski, S. We should at least be able to Design Molecules that Dock Well. arXiv (Biomolecules) December 28, 2020, 2006.16955, ver. 3.Google ScholarThere is no corresponding record for this reference.
- 81Zhang, J.; Mercado, R.; Engkvist, O.; Chen, H. Comparative study of deep generative models on chemical space coverage. ChemRxiv , May 2, 2021, ver. 3. DOI: 10.26434/chemrxiv.13234289.v3 .
- 82Blaschke, T.; Arús-Pous, J.; Chen, H.; Margreitter, C.; Tyrchan, C.; Engkvist, O.; Papadopoulos, K.; Patronov, A. REINVENT 2.0: An AI Tool for De Novo Drug Design. J. Chem. Inf. Model. 2020, 60, 5918, DOI: 10.1021/acs.jcim.0c00915[ACS Full Text
], [CAS], Google Scholar
82https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXitFOnsbbJ&md5=b06445c0516f122adff2e7c82d7ca70cREINVENT 2.0: An AI Tool for De Novo Drug DesignBlaschke, Thomas; Arus-Pous, Josep; Chen, Hongming; Margreitter, Christian; Tyrchan, Christian; Engkvist, Ola; Papadopoulos, Kostas; Patronov, AtanasJournal of Chemical Information and Modeling (2020), 60 (12), 5918-5922CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)In the past few years, we have witnessed a renaissance of the field of mol. de novo drug design. The advancements in deep learning and artificial intelligence (AI) have triggered an avalanche of ideas on how to translate such techniques to a variety of domains including the field of drug design. A range of architectures have been devised to find the optimal way of generating chem. compds. by using either graph- or string (SMILES)-based representations. With this application note, we aim to offer the community a prodn.-ready tool for de novo design, called REINVENT. It can be effectively applied on drug discovery projects that are striving to resolve either exploration or exploitation problems while navigating the chem. space. It can facilitate the idea generation process by bringing to the researcher's attention the most promising compds. REINVENT's code is publicly available at https://github.com/MolecularAI/Reinvent. - 83Bung, N.; Krishnan, S. R.; Bulusu, G.; Roy, A. De novo design of new chemical entities for SARS-CoV-2 using artificial intelligence. Future Med. Chem. 
2021, 13, 575, DOI: 10.4155/fmc-2020-0262[Crossref], [PubMed], [CAS], Google Scholar83https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3MXmtlOkurY%253D&md5=4ad8a669ca9c653df25f57c508fdefa0De novo design of new chemical entities for SARS-CoV-2 using artificial intelligenceBung, Navneet; Krishnan, Sowmya R.; Bulusu, Gopalakrishnan; Roy, ArijitFuture Medicinal Chemistry (2021), 13 (6), 575-585CODEN: FMCUA7; ISSN:1756-8919. (Newlands Press Ltd.)The novel coronavirus SARS-CoV-2 has severely affected the health and economy of several countries. Multiple studies are in progress to design novel therapeutics against the potential target proteins in SARS-CoV-2, including 3CL protease, an essential protein for virus replication. In this study we employed deep neural network-based generative and predictive models for de novo design of small mols. capable of inhibiting the 3CL protease. The generative model was optimized using transfer learning and reinforcement learning to focus around the chem. space corresponding to the protease inhibitors. Multiple physicochem. property filters and virtual screening score were used for the final screening. We have identified 33 potential compds. as ideal candidates for further synthesis and testing against SARS-CoV-2.
- 84Li, Y.; Zhang, L.; Liu, Z. Multi-objective de novo drug design with conditional graph generative model. J. Cheminf. 2018, 10, 33, DOI: 10.1186/s13321-018-0287-6[Crossref], [CAS], Google Scholar84https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXmtFWiu7s%253D&md5=6cef99e5a789668f55ba72e4e65da160Multi-objective de novo drug design with conditional graph generative modelLi, Yibo; Zhang, Liangren; Liu, ZhenmingJournal of Cheminformatics (2018), 10 (), 33/1-33/24CODEN: JCOHB3; ISSN:1758-2946. (Chemistry Central Ltd.)Recently, deep generative models have revealed itself as a promising way of performing de novo mol. design. However, previous research has focused mainly on generating SMILES strings instead of mol. graphs. Although available, current graph generative models are are often too general and computationally expensive. In this work, a new de novo mol. design framework is proposed based on a type of sequential graph generators that do not use atom level recurrent units. Compared with previous graph generative models, the proposed method is much more tuned for mol. generation and has been scaled up to cover significantly larger mols. in the ChEMBL database. It is shown that the graph-based model outperforms SMILES based models in a variety of metrics, esp. in the rate of valid outputs. For the application of drug design tasks, conditional graph generative model is employed. This method offers highe flexibility and is suitable for generation based on multiple objectives. The results have demonstrated that this approach can be effectively applied to solve several drug design problems, including the generation of compds. contg. a given scaffold, compds. with specific drug-likeness and synthetic accessibility requirements, as well as dual inhibitors against JNK3 and GSK-3ss.
- 85Blaschke, T.; Engkvist, O.; Bajorath, J.; Chen, H. Memory-assisted reinforcement learning for diverse molecular de novo design. J. Cheminf. 2020, 12, 1– 17, DOI: 10.1186/s13321-020-00473-0
- 86Zhavoronkov, A. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 2019, 37, 1038– 1040, DOI: 10.1038/s41587-019-0224-x[Crossref], [PubMed], [CAS], Google Scholar86https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXhs12gurnM&md5=b15262b61b9172ab2bc37e534a70f010Deep learning enables rapid identification of potent DDR1 kinase inhibitorsZhavoronkov, Alex; Ivanenkov, Yan A.; Aliper, Alex; Veselov, Mark S.; Aladinskiy, Vladimir A.; Aladinskaya, Anastasiya V.; Terentiev, Victor A.; Polykovskiy, Daniil A.; Kuznetsov, Maksim D.; Asadulaev, Arip; Volkov, Yury; Zholus, Artem; Shayakhmetov, Rim R.; Zhebrak, Alexander; Minaeva, Lidiya I.; Zagribelnyy, Bogdan A.; Lee, Lennart H.; Soll, Richard; Madge, David; Xing, Li; Guo, Tao; Aspuru-Guzik, AlanNature Biotechnology (2019), 37 (9), 1038-1040CODEN: NABIF9; ISSN:1087-0156. (Nature Research)We have developed a deep generative model, generative tensorial reinforcement learning (GENTRL), for de novo small-mol. design. GENTRL optimizes synthetic feasibility, novelty, and biol. activity. We used GENTRL to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases, in 21 days. Four compds. were active in biochem. assays, and two were validated in cell-based assays. One lead candidate was tested and demonstrated favorable pharmacokinetics in mice.
- 87Popova, M.; Shvets, M.; Oliva, J.; Isayev, O. MolecularRNN: Generating real- istic molecular graphs with optimized properties. arXiv (Machine Learning) , May 31, 2019, 1905.13372, ver. 1.Google ScholarThere is no corresponding record for this reference.
- 88Sanchez-Lengeling, B.; Outeiral, C.; Guimaraes, G. L.; Aspuru-Guzik, A. Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (ORGANIC). ChemRxiv , August 18, 2017, ver. 3. DOI: 10.26434/chemrxiv.5309668.v3 .
- 89Putin, E.; Asadulaev, A.; Vanhaelen, Q.; Ivanenkov, Y.; Aladinskaya, A. V.; Aliper, A.; Zhavoronkov, A. Adversarial Threshold Neural Computer for Molecular de Novo De- sign. Mol. Pharmaceutics 2018, 15, 4386– 4397, DOI: 10.1021/acs.molpharmaceut.7b01137[ACS Full Text
], [CAS], Google Scholar
89https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXlt1Kks7c%253D&md5=a83d9ef283d643d2b8e81f050c4e7be2Adversarial threshold neural computer for molecular de novo designPutin, Evgeny; Asadulaev, Arip; Vanhaelen, Quentin; Ivanenkov, Yan; Aladinskaya, Anastasia V.; Aliper, Alex; Zhavoronkov, AlexMolecular Pharmaceutics (2018), 15 (10), 4386-4397CODEN: MPOHBP; ISSN:1543-8384. (American Chemical Society)In this article, we propose the deep neural network Adversarial Threshold Neural Computer (ATNC). The ATNC model is intended for the de novo design of novel small-mol. org. structures. The model is based on generative adversarial network architecture and reinforcement learning. ATNC uses a Differentiable Neural Computer as a generator and has a new specific block, called adversarial threshold (AT). AT acts as a filter between the agent (generator) and the environment (discriminator + objective reward functions). Furthermore, to generate more diverse mols. we introduce a new objective reward function named Internal Diversity Clustering (IDC). In this work, ATNC is tested and compared with the Org. model. Both models were trained on the SMILES string representation of the mols., using four objective functions (internal similarity, Muegge druglikeness filter, presence or absence of sp3-rich fragments, and IDC). The SMILES representations of 15K druglike mols. from the ChemDiv collection were used as a training data set. For the different functions, ATNC outperforms Org. Combined with the IDC, ATNC generates 72% of valid and 77% of unique SMILES strings, while Org. generates only 7% of valid and 86% of unique SMILES strings. For each set of mols. generated by ATNC and Org., we analyzed distributions of four mol. descriptors (no. of atoms, mol. wt., logP, and tpsa) and calcd. five chem. statistical features (internal diversity, no. of unique heterocycles, no. of clusters, no. of singletons, and no. of compds. 
that have not been passed through medicinal chem. filters). Anal. of key mol. descriptors and chem. statistical features demonstrated that the mols. generated by ATNC elicited better druglikeness properties. We also performed in vitro validation of the mols. generated by ATNC; results indicated that ATNC is an effective method for producing hit compds. - 90Putin, E.; Asadulaev, A.; Ivanenkov, Y.; Aladinskiy, V.; Sanchez-Lengeling, B.; Aspuru-Guzik, A.; Zhavoronkov, A. Reinforced Adversarial Neural Computer for de Novo Molecular Design. J. Chem. Inf. Model. 2018, 58, 1194– 1204, DOI: 10.1021/acs.jcim.7b00690[ACS Full Text
], [CAS], Google Scholar
90https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXpsVChtrs%253D&md5=b10c44dcadf9fb1afc4e65cc7469730fReinforced Adversarial Neural Computer for de Novo Molecular DesignPutin, Evgeny; Asadulaev, Arip; Ivanenkov, Yan; Aladinskiy, Vladimir; Sanchez-Lengeling, Benjamin; Aspuru-Guzik, Alan; Zhavoronkov, AlexJournal of Chemical Information and Modeling (2018), 58 (6), 1194-1204CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)In silico modeling is a crucial milestone in modern drug design and development. Although computer-aided approaches in this field are well-studied, the application of deep learning methods in this research area is at the beginning. In this work, we present an original deep neural network (DNN) architecture named RANC (Reinforced Adversarial Neural Computer) for the de novo design of novel small-mol. org. structures based on the generative adversarial network (GAN) paradigm and reinforcement learning (RL). As a generator RANC uses a differentiable neural computer (DNC), a category of neural networks, with increased generation capabilities due to the addn. of an explicit memory bank, which can mitigate common problems found in adversarial settings. The comparative results have shown that RANC trained on the SMILES string representation of the mols. outperforms its first DNN-based counterpart Org. by several metrics relevant to drug discovery: the no. of unique structures, passing medicinal chem. filters (MCFs), Muegge criteria, and high QED scores. RANC is able to generate structures that match the distributions of the key chem. features/descriptors (e.g., MW, logP, TPSA) and lengths of the SMILES strings in the training data set. Therefore, RANC can be reasonably regarded as a promising starting point to develop novel mols. with activity against different biol. targets or pathways. In addn., this approach allows scientists to save time and covers a broad chem. 
space populated with novel and diverse compds. - 91You, J.; Liu, B.; Ying, Z.; Pande, V.; Leskovec, J. Graph convolutional policy network for goal-directed molecular graph generation. Adv. Neural Inf. Process. Syst. 2018, 31, 6410– 6421Google ScholarThere is no corresponding record for this reference.
- 92Karimi, M.; Hasanzadeh, A.; Shen, Y. Network-principled deep generative models for designing drug combinations as graph sets. Bioinformatics 2020, 36, i445– i454, DOI: 10.1093/bioinformatics/btaa317[Crossref], [PubMed], [CAS], Google Scholar92https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXis1ymt7zE&md5=b737e2db6e7e76d0753debee66246ed3Network-principled deep generative models for designing drug combinations as graph setsKarimi, Mostafa; Hasanzadeh, Arman; Shen, YangBioinformatics (2020), 36 (Suppl._1), i445-i454CODEN: BOINFP; ISSN:1367-4811. (Oxford University Press)Motivation: Combination therapy has shown to improve therapeutic efficacy while reducing side effects. Importantly, it has become an indispensable strategy to overcome resistance in antibiotics, antimicrobials and anticancer drugs. Facing enormous chem. space and unclear design principles for small-mol. combinations, computational drug-combination design has not seen generative models to meet its potential to accelerate resistance-overcoming drug combination discovery. Results: We have developed the first deep generative model for drug combination design, by jointly embedding graph-structured domain knowledge and iteratively training a reinforcement learning-based chem. graph-set designer. First, we have developed hierarchical variational graph auto-encoders trained end-to-end to jointly embed gene-gene, gene-disease and disease-disease networks. Novel attentional pooling is introduced here for learning disease representations from assocd. genes' representations. Second, targeting diseases in learned representations, we have recast the drug-combination design problem as graph-set generation and developed a deep learning-based model with novel rewards. Specifically, besides chem. validity rewards, we have introduced novel generative adversarial award, being generalized sliced Wasserstein, for chem. diverse mols. 
with distributions similar to known drugs. We have also designed a network principle-based reward for disease-specific drug combinations. Numerical results indicate that, compared to state-of-the-art graph embedding methods, hierarchical variational graph auto-encoder learns more informative and generalizable disease representations. Results also show that the deep generative models generate drug combinations following the principle across diseases. Case studies on four diseases show that network-principled drug combinations tend to have low toxicity. The generated drug combinations collectively cover the disease module similar to FDA-approved drug combinations and could potentially suggest novel systems pharmacol. strategies. Ourmethod allows for examg. and following network-based principle or hypothesis to efficiently generate disease-specific drug combinations in a vast chem. combinatorial space.
- 93Griffiths, R.-R.; Hernández-Lobato, J. M. Constrained Bayesian optimization for auto- matic chemical design using variational autoencoders. Chem. Sci. 2020, 11, 577– 586, DOI: 10.1039/C9SC04026A[Crossref], [PubMed], [CAS], Google Scholar93https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXitFOis7bK&md5=628684b606d9d93ccf21b674438acd6aConstrained Bayesian optimization for automatic chemical design using variational autoencodersGriffiths, Ryan-Rhys; Hernandez-Lobato, Jose MiguelChemical Science (2020), 11 (2), 577-586CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)Automatic Chem. Design is a framework for generating novel mols. with optimized properties. The original scheme, featuring Bayesian optimization over the latent space of a variational autoencoder, suffers from the pathol. that it tends to produce invalid mol. structures. First, we demonstrate empirically that this pathol. arises when the Bayesian optimization scheme queries latent space points far away from the data on which the variational autoencoder has been trained. Secondly, by reformulating the search procedure as a constrained Bayesian optimization problem, we show that the effects of this pathol. can be mitigated, yielding marked improvements in the validity of the generated mols. We posit that constrained Bayesian optimization is a good approach for solving this kind of training set mismatch in many generative tasks involving Bayesian optimization over the latent space of a variational autoencoder.
- 94Blaschke, T.; Olivecrona, M.; Engkvist, O.; Bajorath, J.; Chen, H. Application of Generative Autoencoder in De Novo Molecular Design. Mol. Inf. 2018, 37, 1700123, DOI: 10.1002/minf.201700123
- 95Kusner, M. J.; Paige, B.; Hernández-Lobato, J. M. Grammar variational autoencoder. Proc. 34th Int. Conf. Mach. Learn. 2017, 70, 1945– 1954Google ScholarThere is no corresponding record for this reference.
- 96Dai, H.; Tian, Y.; Dai, B.; Skiena, S.; Song, L. Syntax-directed variational autoencoder for structured data. arXiv (Machine Learning) , February 24, 2018, 1802.08786, ver 1.Google ScholarThere is no corresponding record for this reference.
- 97Jin, W.; Barzilay, R.; Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. Proc. 35th Int. Conf. Mach. Learn. 2018, 50, 2323– 2332Google ScholarThere is no corresponding record for this reference.
- 98Samanta, B.; De, A.; Jana, G.; Chattaraj, P. K.; Ganguly, N.; Rodriguez, M. G. NeVAE: A Deep Generative Model for Molecular Graphs. Proceedings of the AAAI Conference on Artificial Intelligence 2019, 33, 1110– 1117, DOI: 10.1609/aaai.v33i01.33011110
- 99Bresson, X.; Laurent, T. A Two-Step Graph Convolutional Decoder for Molecule Generation. arXiv (Machine Learning) , June 15, 2019, 1906.03412, ver 2.Google ScholarThere is no corresponding record for this reference.
- 100Maziarka, L.; Pocha, A.; Kaczmarczyk, J.; Rataj, K.; Danel, T.; Warcho-l, M. Mol- CycleGAN: a generative model for molecular optimization. J. Cheminf. 2020, 12, 2, DOI: 10.1186/s13321-019-0404-1[Crossref], [CAS], Google Scholar100https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXotFarsg%253D%253D&md5=243c1aeef3e517ee774fcfabebf80260Mol-CycleGAN: a generative model for molecular optimizationMaziarka, Lukasz; Pocha, Agnieszka; Kaczmarczyk, Jan; Rataj, Krzysztof; Danel, Tomasz; Warchol, MichalJournal of Cheminformatics (2020), 12 (1), 2CODEN: JCOHB3; ISSN:1758-2946. (SpringerOpen)Designing a mol. with desired properties is one of the biggest challenges in drug development, as it requires optimization of chem. compd. structures with respect to many complex properties. To improve the compd. design process, we introduce Mol-CycleGAN-a CycleGAN-based model that generates optimized compds. with high structural similarity to the original ones. Namely, given a mol. our model generates a structurally similar one with an optimized value of the considered property. We evaluate the performance of the model on selected optimization objectives related to structural properties (presence of halogen groups, no. of arom. rings) and to a physicochem. property (penalized logP). In the task of optimization of penalized logP of drug-like mols. our model significantly outperforms previous results.
- 101Sattarov, B.; Baskin, I. I.; Horvath, D.; Marcou, G.; Bjerrum, E. J.; Varnek, A. De Novo Molecular Design by Combining Deep Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping. J. Chem. Inf. Model. 2019, 59, 1182– 1196, DOI: 10.1021/acs.jcim.8b00751[ACS Full Text
], [CAS], Google Scholar
101https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXjtlCisbc%253D&md5=44d2f043cc64e112b2caa7851ed1eb4eDe Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic MappingSattarov, Boris; Baskin, Igor I.; Horvath, Dragos; Marcou, Gilles; Bjerrum, Esben Jannik; Varnek, AlexandreJournal of Chemical Information and Modeling (2019), 59 (3), 1182-1196CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Here we show that Generative Topog. Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused mol. libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set mols. were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topog. map. Targeted map zones can be used for generating novel mol. structures by sampling assocd. latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compd. properties "on the fly". The generated focused mol. libraries were shown to contain original and a priori feasible compds. which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estn. procedures (pharmacophore matching, docking). - 102Winter, R.; Montanari, F.; Steffen, A.; Briem, H.; Nóe, F.; Clevert, D.-A. Efficient multi-objective molecular optimization in a continuous latent space. Chem. Sci. 
2019, 10, 8016– 8024, DOI: 10.1039/C9SC01928F[Crossref], [PubMed], [CAS], Google Scholar102https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXhtlaltrjO&md5=c63599dc8a30df8f805ce5457ecc053aEfficient multi-objective molecular optimization in a continuous latent spaceWinter, Robin; Montanari, Floriane; Steffen, Andreas; Briem, Hans; Noe, Frank; Clevert, Djork-ArneChemical Science (2019), 10 (34), 8016-8024CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)One of the main challenges in small mol. drug discovery is finding novel chem. compds. with desirable properties. In this work, we propose a novel method that combines in silico prediction of mol. properties such as biol. activity or pharmacokinetics with an in silico optimization algorithm, namely Particle Swarm Optimization. Our method takes a starting compd. as input and proposes new mols. with more desirable (predicted) properties. It navigates a machine-learned continuous representation of a drug-like chem. space guided by a defined objective function. The objective function combines multiple in silico prediction models, defined desirability ranges and substructure constraints. We demonstrate that our proposed method is able to consistently find more desirable mols. for the studied tasks in relatively short time. We hope that our method can support medicinal chemists in accelerating and improving the lead optimization process.
- 103Chenthamarakshan, V.; Das, P.; Hoffman, C. S.; Strobelt, H.; Padhi, I.; Lim, W. K.; Hoover, B.; Manica, M.; Born, J.; Laino, T.; Mojsilovic, A. CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models. NeurIPS 2020 2020.Google ScholarThere is no corresponding record for this reference.
- 104Kotsias, P.-C.; Arús-Pous, J.; Chen, H.; Engkvist, O.; Tyrchan, C.; Bjerrum, E. J. Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks. Nature Machine Intelligence 2020, 2, 254– 265, DOI: 10.1038/s42256-020-0174-5
- 105Shayakhmetov, R.; Kuznetsov, M.; Zhebrak, A.; Kadurin, A.; Nikolenko, S.; Aliper, A.; Polykovskiy, D. Molecular Generation for Desired Transcriptome Changes With Ad- versarial Autoencoders. Front. Pharmacol. 2020, 11, 269, DOI: 10.3389/fphar.2020.00269[Crossref], [PubMed], [CAS], Google Scholar105https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXhvFKksbbE&md5=36064731a6cb6395596e418d379f2782Molecular generation for desired transcriptome changes with adversarial autoencodersShayakhmetov, Rim; Kuznetsov, Maksim; Zhebrak, Alexander; Kadurin, Artur; Nikolenko, Sergey; Aliper, Alexander; Polykovskiy, DaniilFrontiers in Pharmacology (2020), 11 (), 00269CODEN: FPRHAU; ISSN:1663-9812. (Frontiers Media S.A.)Gene expression profiles are useful for assessing the efficacy and side effects of drugs. In this paper, we propose a new generative model that infers drug mols. that could induce a desired change in gene expression. Our model-the Bidirectional Adversarial Autoencoder-explicitly separates cellular processes captured in gene expression changes into two feature sets: those related and unrelated to the drug incubation. The model uses related features to produce a drug hypothesis. We have validated our model on the LINCS L1000 dataset by generating mol. structures in the SMILES format for the desired transcriptional response. In the expts., we have shown that the proposed model can generate novel mol. structures that could induce a given gene expression change or predict a gene expression difference after incubation of a given mol. structure.
- 106Ḿendez-Lucio, O.; Baillif, B.; Clevert, D.-A.; Rouquíe, D.; Wichard, J. De novo gener- ation of hit-like molecules from gene expression signatures using artificial intelligence. Nat. Commun. 2020, 11, 1– 10, DOI: 10.1038/s41467-019-13807-w
- 107Born, J.; Manica, M.; Oskooei, A.; Cadow, J.; Rodŕıguez Mart́ınez, M. PaccMannRL: Designing Anticancer Drugs From Transcriptomic Data via Reinforcement Learning. In Research in Computational Molecular Biology; Springer: Cham, 2020; pp 231– 233.
- 108Jin, W.; Yang, K.; Barzilay, R.; Jaakkola, T. Learning Multimodal Graph-to-Graph Translation for Molecular Optimization. arXiv (Machine Learning) , January 28, 2019, 1812.01070, ver. 3.Google ScholarThere is no corresponding record for this reference.
- 109Masuda, T.; Ragoza, M.; Koes, D. R. Generating 3D Molecular Structures Conditional on a Receptor Binding Site with Deep Generative Models. arXiv (Chemical Physics) , November 23, 2020, 2010.14442, ver. 3.Google ScholarThere is no corresponding record for this reference.
- 110Kang, S.; Cho, K. Conditional Molecular Design with Deep Generative Models. J. Chem. Inf. Model. 2019, 59, 43– 52, DOI: 10.1021/acs.jcim.8b00263[ACS Full Text
], [CAS], Google Scholar
110https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXhtlantb3N&md5=d2c3a3ff1f2189698828775e89c2a885Conditional Molecular Design with Deep Generative ModelsKang, Seokho; Cho, KyunghyunJournal of Chemical Information and Modeling (2019), 59 (1), 43-52CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Although machine learning has been successfully used to propose novel mols. that satisfy desired properties, it is still challenging to explore a large chem. space efficiently. In this paper, we present a conditional mol. design method that facilitates generating new mols. with desired properties. The proposed model, which simultaneously performs both property prediction and mol. generation, is built as a semisupervised variational autoencoder trained on a set of existing mols. with only a partial annotation. We generate new mols. with desired properties by sampling from the generative distribution estd. by the model. We demonstrate the effectiveness of the proposed model by evaluating it on drug-like mols. The model improves the performance of property prediction by exploiting unlabeled mols. and efficiently generates novel mols. fulfilling various target conditions. - 111Lim, J.; Hwang, S.-Y.; Moon, S.; Kim, S.; Kim, W. Y. Scaffold-based molecular design with a graph generative model. Chem. Sci. 2020, 11, 1153– 1164, DOI: 10.1039/C9SC04503A[Crossref], [CAS], Google Scholar111https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXit1Ortr%252FO&md5=be36a65abcce15f18b4c1e529bffd905Scaffold-based molecular design with a graph generative modelLim, Jaechang; Hwang, Sang-Yeon; Moon, Seokhyun; Kim, Seungsu; Kim, Woo YounChemical Science (2020), 11 (4), 1153-1164CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)Searching for new mols. in areas like drug discovery often starts from the core structures of known mols. 
Such a method has called for a strategy of designing deriv. compds. retaining a particular scaffold as a substructure. On this account, our present work proposes a graph generative model that targets its use in scaffold-based mol. design. Our model accepts a mol. scaffold as input and extends it by sequentially adding atoms and bonds. The generated mols. are then guaranteed to contain the scaffold with certainty, and their properties can be controlled by conditioning the generation process on desired properties. The learned rule of extending mols. can well generalize to arbitrary kinds of scaffolds, including those unseen during learning. In the conditional generation of mols., our model can simultaneously control multiple chem. properties despite the search space constrained by fixing the substructure. As a demonstration, we applied our model to designing inhibitors of the epidermal growth factor receptor and show that our model can employ a simple semi-supervised extension to broaden its applicability to situations where only a small amt. of data is available.
- 112Varnek, A., Ed. Tutorials in chemoinformatics; John Wiley & Sons, Inc: Hoboken, NJ, 2017.
- 113Engel, T., Gasteiger, J., Eds. Applied chemoinformatics: achievements and future opportunities; Wiley-VCH: Weinheim, 2018; OCLC: 1034693178.
- 114Kadurin, A.; Aliper, A.; Kazennov, A.; Mamoshina, P.; Vanhaelen, Q.; Khrabrov, K.; Zhavoronkov, A. The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget 2017, 8, 10883– 10890, DOI: 10.18632/oncotarget.14073
- 115Alpaydin, E. Introduction to machine learning, 2nd ed.; Adaptive computation and machine learning; MIT Press: Cambridge, Mass, 2010; OCLC: ocn317698631.
- 116Raschka, S. Python machine learning: unlock deeper insights into machine learning with this vital guide to cutting-edge predictive analytics; Community experience distilled; Packt Publishing Open Source: Birmingham, UK; Mumbai, 2016.
- 117Frazier, P. I. A Tutorial on Bayesian Optimization. arXiv (Machine Learning), July 8, 2018, 1807.02811, ver. 1.
- 118Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R. P.; De Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 2016, 104, 148– 175, DOI: 10.1109/JPROC.2015.2494218
- 119Das, P.; Sercu, T.; Wadhawan, K.; Padhi, I.; Gehrmann, S.; Cipcigan, F.; Chenthamarakshan, V.; Strobelt, H.; Santos, C. D.; Chen, P.-Y.; Yang, Y. Y.; Tan, J.; Hedrick, J.; Crain, J.; Mojsilovic, A. Accelerating antimicrobial discovery with controllable deep generative models and molecular dynamics. arXiv (Machine Learning), February 26, 2020, 2005.11248, ver. 2.
- 120Kingma, D. P.; Mohamed, S.; Rezende, D. J.; Welling, M. Semi-supervised learning with deep generative models. Adv. Neural Inf. Process. Syst. 2014, 3581– 3589
- 121Gao, W.; Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 2020, 60, 5714– 5723, DOI: 10.1021/acs.jcim.0c00174
- 122Horwood, J.; Noutahi, E. Molecular Design in Synthetically Accessible Chemical Space via Deep Reinforcement Learning. ACS Omega 2020, 5, 32984– 32994, DOI: 10.1021/acsomega.0c04153
- 123Gottipati, S. K.; Sattarov, B.; Niu, S.; Pathak, Y.; Wei, H.; Liu, S.; Blackburn, S.; Thomas, K.; Coley, C.; Tang, J. Learning to navigate the synthetically accessible chemical space using reinforcement learning. Int. Conf. Mach. Learn. 2020, 3668– 3679
- 124Bradshaw, J.; Paige, B.; Kusner, M. J.; Segler, M.; Hernández-Lobato, J. M. Barking up the right tree: an approach to search over molecule synthesis DAGs. Adv. Neural Inf. Process. Syst. 2020, 6852– 6866
- 125Imrie, F.; Bradley, A. R.; van der Schaar, M.; Deane, C. M. Deep generative models for 3D linker design. J. Chem. Inf. Model. 2020, 60, 1983– 1995, DOI: 10.1021/acs.jcim.9b01120
- 126Yang, Y.; Zheng, S.; Su, S.; Zhao, C.; Xu, J.; Chen, H. SyntaLinker: automatic fragment linking with deep conditional transformer neural networks. Chem. Sci. 2020, 11, 8312– 8322, DOI: 10.1039/D0SC03126G
- 127Tan, X. Automated design and optimization of multitarget schizophrenia drug candidates by deep learning. Eur. J. Med. Chem. 2020, 204, 112572, DOI: 10.1016/j.ejmech.2020.112572
- 128Yang, Y.; Zhang, R.; Li, Z.; Mei, L.; Wan, S.; Ding, H.; Chen, Z.; Xing, J.; Feng, H.; Han, J.; Jiang, H.; Zheng, M.; Luo, C.; Zhou, B. Discovery of Highly Potent, Selective, and Orally Efficacious p300/CBP Histone Acetyltransferases Inhibitors. J. Med. Chem. 2020, 63, 1337– 1360, DOI: 10.1021/acs.jmedchem.9b01721