A Robotics-Inspired Screening Algorithm for Molecular Caging Prediction

We define a molecular caging complex as a pair of molecules in which one molecule (the “host” or “cage”) possesses a cavity that can encapsulate the other molecule (the “guest”) and prevent it from escaping. Molecular caging complexes can be useful in applications such as molecular shape sorting, drug delivery, and molecular immobilization in materials science, to name just a few. However, the design and computational discovery of new caging complexes is a challenging task, as it is hard to predict whether one molecule can encapsulate another because their shapes can be quite complex. In this paper, we propose a computational screening method that predicts whether a given pair of molecules form a caging complex. Our method is based on a caging verification algorithm that was designed by our group for applications in robotic manipulation. We tested our algorithm on three pairs of molecules that were previously described in a pioneering work on molecular caging complexes and found that our results are fully consistent with the previously reported ones. Furthermore, we performed a screening experiment on a data set consisting of 46 hosts and four guests and used our algorithm to predict which pairs are likely to form caging complexes. Our method is computationally efficient and can be integrated into a screening pipeline to complement experimental techniques.

1 Implementation details Figure S1: Preparation of input geometries for the algorithm from a publicly available CCDC database entry.

Proposed algorithm
The typical workflow for input preparation in our algorithm is shown in Figure S1. The obtained models in XYZR format (each line contains X, Y , Z coordinates of the ball center and its radius) can be directly used in the algorithm. In addition to two molecular models, the algorithm requires a discretization of the SO(3) orientation space [1,2] which defines the precision (ε-core) and computational time of the algorithm: Input: models of the host and guest (*.xyzr); discretization of SO(3) orientation space (set of intervals).
Output: number of bounded connected components of the free space of the guest.
In our experiments, we always use the same discretization of SO(3), which was pre-computed using the algorithm proposed by Yershova et al. [1]. It is a uniform grid over SO(3) that consists of 36,864 points; our configuration space approximation is thus based on 36,864 orientation slices.
If the algorithm reports 0 bounded connected components (BCCs), then the free space of the guest consists of only one unbounded connected component meaning that the guest is not caged by the host. Otherwise, a positive number of BCCs means that the guest is caged. Moreover, the existence of several BCCs indicates that the motion of the guest inside the host cavity is restricted.
For experimental purposes, algorithms were implemented in C++11, with the aid of the Computational Geometry Algorithms Library (CGAL) [3].

pyWINDOW
The open-source Python package pyWINDOW (whose version 0.0.2 is publicly available at https://github.com/ marcinmiklitz/pywindow) was used for the determination of pore-limiting diameters (PLDs) of single molecules [4]. The primary method calculate windows() was used with parameters processes=8 (parallelization on 8 CPUs) and adjust=N, where N = 1, 5, 25 for 250, 1250, and 6250 sample points on a sampling sphere, respectively. The best approximation of each window diameter was calculated as the minimum of the values obtained using N = 1, 5, 25 and various orientations of the input geometry. Calculations of circumcircle radii were performed using the built-in method circumcircle() with manually selected atom sets.

Zeo++
The open-source C++ package Zeo++ (whose version 0.3 is publicly available at http://www.zeoplusplus.org/ download.html upon registration) was used to determine pore diameters in crystal structures. Crystallographic Information Files (CIF) from corresponding CCDC entries (see Fig. S2) were used as input. In particular, the routine network with parameter -ha for improved accuracy was invoked.

Algorithm runtimes
All computations were performed on a desktop computer with Intel Core i7-4790K CPU and 32 GB RAM. . Running the algorithm with N = 250 did not produce correct results in many cases. Therefore increasing N up to 6250 was required for the comparison with our algorithm. b Algorithms like pyWINDOW do not allow detecting caging complexes directly; they require the determination of PLD followed by a comparison with the diameter of a spherical guest; therefore runtimes are the same for both experiments.  Figure S2: Hosts selected for PLD determination. Each entry presents a structure, an abbreviation given in the reference paper (CCC was used in lieu of abbreviation for chiral covalent cage [5]), and the identifier of the corresponding CCDC database entry. Each structure is accompanied by a green sphere with radius equal to PLD. Figure S3: Hosts selected for the screening experiment. Each entry presents a structure, an abbreviation given in the reference paper (number replaced with X if the name was missing [6,7]), and the identifier of the corresponding CCDC database entry. Detailed structures as well as references to the first reports of host molecules can be found in the work by Miklitz et al. [8] 3.2 Guests Figure S4: Guests used in the experiments. Each entry presents a structure, an abbreviation given in the reference paper or present work, and the identifier of the corresponding CCDC database entry.

Conformations
CC3 conformations were obtained from the published molecular dynamics trajectory [4]. Molecular dynamics trajectories for the guest molecules were generated using the Gromacs software [9]. Gromacs input was prepared using ACPYPE [10] and AMBER force field [11]. Simulations were run in the NVT ensemble using the modified Berendsen thermostat [12] set at temperature 300 K. 1 fs step was used in all simulations, with equilibration run of 10 ps, followed by a production run of 10 ns (snapshot taken every 1 ps). Root mean square displacement of atoms between conformations was calculated either with the built-in Gromacs RMSD routine gmx rms, or with the open-source RMSD utility (publicly available at https://github.com/charnley/rmsd). RMSD values for all guests did not exceed 0.05Å (maximum value for IB is 0.05Å); the RMSD value for CC3 was determined to be 0.25Å. Figure S5: Results of the screening of 184 host-guest pairs: s -strong caging complex, w -weak caging complex, n -not a caging complex. FB -fluorobenzene, CB -chlorobenzene, BB -bromobenzene, IB -iodobenzene.