Molecular Perception for Visualization and Computation: The Proxima Library

Proxima is a molecular perception library designed with a dual purpose: to be used with immersive molecular viewers (providing any required feature not supported by third-party libraries) and to be integrated in workflow managers, where it provides the functionalities needed for the first steps of molecular modeling studies. It thus stands at the boundary between visualization and computation. The purpose of the present article is to provide a general introduction to the first release of Proxima, describe its most significant features, and highlight its performance by means of some case studies. The current version of Proxima is available for evaluation purposes at https://bitbucket.org/sns-smartlab/proxima/src/master/.

something deeply related to biological systems and/or proteins. Fragments are more generally seen as chemically distinguishable portions of a molecular system. Two portions are "chemically distinguishable" when one of two conditions holds:
• They are two disconnected components of the system (there are no covalent bonds between them).
• They are made of different types of residues.
As an example, two non-bonded chains are seen as different fragments even if they share the same chain identifier in the PDB file. Furthermore, if polynucleotides and polypeptides are bonded together and share the same chain identifier in the PDB file, they are still perceived by Proxima as different fragments. Deeper inspection of additional residue types will be considered in future developments.
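The two criteria above can be combined in a single graph traversal. The following is a minimal sketch, not Proxima's implementation: the function name `perceive_fragments`, the per-atom `residue_kind` labels and the input layout are assumptions made for illustration. Bonds that join atoms of different residue kinds are ignored, so the connected components of the remaining graph satisfy both conditions.

```python
from collections import deque


def perceive_fragments(residue_kind, bonds):
    """Group atoms into fragments (hypothetical helper, not the Proxima API).

    residue_kind: one label per atom, e.g. "peptide" or "nucleotide".
    bonds: pairs of 0-based atom indexes joined by a covalent bond.
    Returns a list of fragments, each a sorted list of atom indexes.
    """
    n_atoms = len(residue_kind)
    neighbours = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        # A bond between different residue kinds does not merge fragments.
        if residue_kind[a] == residue_kind[b]:
            neighbours[a].append(b)
            neighbours[b].append(a)

    seen = [False] * n_atoms
    fragments = []
    for start in range(n_atoms):
        if seen[start]:
            continue
        # Breadth-first search collects one connected component.
        seen[start] = True
        queue = deque([start])
        fragment = []
        while queue:
            atom = queue.popleft()
            fragment.append(atom)
            for other in neighbours[atom]:
                if not seen[other]:
                    seen[other] = True
                    queue.append(other)
        fragments.append(sorted(fragment))
    return fragments


# Toy system: a peptide part bonded to a nucleotide part, plus an isolated atom.
kinds = ["peptide", "peptide", "nucleotide", "nucleotide", "peptide"]
toy_bonds = [(0, 1), (1, 2), (2, 3)]
toy_fragments = perceive_fragments(kinds, toy_bonds)
```

In this toy system the chain identifier plays no role: only connectivity and residue kind decide the fragment boundaries.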

Z-Matrix
Proxima can create a so-called Z-Matrix [2] file as output for molecular systems loaded from PDB or XYZ files. This feature is important in making Proxima compatible with several computational chemistry packages. The Z-Matrix is computed by a Breadth First Search (BFS) along the molecular graph, during which bond angles and torsions are evaluated.
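As an illustration of how a BFS can drive a Z-Matrix construction, the sketch below visits the atoms of a single connected molecule and defines each one by a distance, an angle and a torsion relative to atoms already visited. It is a simplified sketch under stated assumptions, not the code used in Proxima: the name `build_zmatrix`, the choice of reference atoms (BFS ancestors first, then the earliest visited atoms) and the absence of any handling for collinear references are all illustrative choices.

```python
import math
from collections import deque


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_angle(p, q, r):
    """Angle p-q-r in degrees."""
    u, v = _sub(p, q), _sub(r, q)
    cosine = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def torsion(p, q, r, s):
    """Torsion p-q-r-s in degrees, from the standard atan2 formula."""
    b1, b2, b3 = _sub(q, p), _sub(r, q), _sub(s, r)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, _dot(n1, n2)))


def build_zmatrix(coords, bonds):
    """Return rows (atom, refs, distance, angle, torsion) in BFS order."""
    neighbours = [[] for _ in coords]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # BFS from atom 0; assumes one connected molecule.
    parent, order, queue = {0: None}, [], deque([0])
    while queue:
        atom = queue.popleft()
        order.append(atom)
        for other in sorted(neighbours[atom]):
            if other not in parent:
                parent[other] = atom
                queue.append(other)

    rows = []
    for position, atom in enumerate(order):
        refs, ancestor = [], parent[atom]
        while ancestor is not None and len(refs) < 3:
            refs.append(ancestor)
            ancestor = parent[ancestor]
        for candidate in order[:position]:  # fall back to visited atoms
            if len(refs) == 3:
                break
            if candidate not in refs:
                refs.append(candidate)
        pts = [coords[atom]] + [coords[i] for i in refs]
        dist = math.dist(pts[0], pts[1]) if len(refs) >= 1 else None
        ang = bond_angle(pts[0], pts[1], pts[2]) if len(refs) >= 2 else None
        tor = torsion(pts[0], pts[1], pts[2], pts[3]) if len(refs) == 3 else None
        rows.append((atom, refs, dist, ang, tor))
    return rows


# Water with an O-H distance of 0.96 and an H-O-H angle of 104.5 degrees.
theta = math.radians(104.5)
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0),
         (0.96 * math.cos(theta), 0.96 * math.sin(theta), 0.0)]
water_rows = build_zmatrix(water, [(0, 1), (0, 2)])
```

Each row needs only atoms that appear earlier in the BFS order, which is the property a Z-Matrix reader relies on when rebuilding Cartesian coordinates.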

Mol and Mol2 file formats
Proxima can also create mol and mol2 files [3] from molecular systems. This has proven useful in interfacing Proxima with common molecular viewers. As an example, JMol [4] reads the mol file format, but without performing an explicit bond perception on the input system. The mol2 file is also required by the HTEQ software [5], which computes torsional parameters for biaryl systems.
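To make clear what "a mol file with explicit bonds" carries, the sketch below writes a minimal V2000 mol block from a list of atoms and perceived bonds. It follows the publicly documented fixed-width layout of the format and is only an illustration: the function name `write_mol_block` and its arguments are assumptions, and the output produced by Proxima itself may contain additional fields.

```python
def write_mol_block(title, atoms, bonds):
    """Return a minimal V2000 mol block as a string (illustrative helper).

    atoms: list of (element_symbol, x, y, z).
    bonds: list of (i, j, order) with 1-based atom indexes, as in the format.
    """
    lines = [title, "  sketch", ""]  # three header lines
    # Counts line: atom count, bond count, eight unused fields, 999, version.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for symbol, x, y, z in atoms:
        # Atom block: three 10.4 coordinates, the symbol, then default fields.
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3s} 0" + "  0" * 11)
    for i, j, order in bonds:
        # Bond block: the two atom indexes, the bond order, default fields.
        lines.append(f"{i:3d}{j:3d}{order:3d}" + "  0" * 4)
    lines.append("M  END")
    return "\n".join(lines) + "\n"


water_block = write_mol_block(
    "water",
    [("O", 0.0, 0.0, 0.0), ("H", 0.96, 0.0, 0.0), ("H", -0.24, 0.93, 0.0)],
    [(1, 2, 1), (1, 3, 1)],
)
```

A viewer that does not perceive bonds on its own can draw the connectivity directly from the bond block written here.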

Chemical Ring Perception
Benchmarks of the ring perception algorithm are reported here for peptides made of increasing numbers of histidine residues. In particular, Fig. S1 reports the execution times of the ring perception with and without the block partitioning scheme.

Parallelizing Ring Computation
A further optimization of our software has been the parallelization of the code with threads. To this end, we used the OpenMP set of compiler directives, which is supported by several free and commercial compilers (e.g. GCC, LLVM, Visual Studio). It is worth noting that, while "critical" sections limit parallelism, a decent speedup on a few cores is enough for most applications. The execution times are shown in Fig. S2. All the benchmarks in this work were performed on a machine equipped with a 64-bit Intel Core i7-6700 CPU (4 cores with a base clock frequency of 3.40 GHz and Hyper-Threading technology) and 32 GB of RAM.

Caffeine
In the following, we show some examples of Molecular Electrostatic Potential (MEP) surfaces computed by Proxima and visualized with the Caffeine software. In Fig. S4, the MEP iso-surfaces associated with a vanishing electrostatic potential value are shown for the benzene molecule. In Fig. S3, instead, the corresponding MEP iso-surface for the glycine molecule is shown. The green surface is the one computed using the Gasteiger charges, while the blue one is computed with the QeQ charges. It is noteworthy that the Gasteiger method tends to polarize the individual atoms more strongly, thus resulting in a more corrugated surface.
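When the MEP is built from atomic partial charges, as with the Gasteiger charges mentioned above, its value at a point is the Coulomb sum V(r) = Σ q_i / |r - R_i| (atomic units). The sketch below evaluates that sum; it is a minimal illustration, assuming point charges only, and the function name `electrostatic_potential` is hypothetical rather than part of Proxima.

```python
import math


def electrostatic_potential(point, charged_atoms):
    """Coulomb potential at `point` from point charges, in atomic units.

    charged_atoms: list of ((x, y, z), charge), positions in bohr and
    charges in units of the elementary charge.
    """
    total = 0.0
    for position, charge in charged_atoms:
        distance = math.dist(point, position)
        if distance == 0.0:
            raise ValueError("potential is singular at an atomic position")
        total += charge / distance
    return total


# A +1/-1 pair: the potential vanishes on the plane halfway between them.
dipole = [((-1.0, 0.0, 0.0), 1.0), ((1.0, 0.0, 0.0), -1.0)]
mid_plane_value = electrostatic_potential((0.0, 2.0, 3.0), dipole)
```

Evaluating this function on a regular grid and extracting the zero level gives the kind of vanishing-potential iso-surface discussed above; more strongly polarized charges move that surface closer to the individual atoms.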

Hybridisation
The following rules are used for computing the hybridisation:
• sd: number of bonds < 3 AND atomic number ≥ 21 AND max angle < 90 + tolerance AND min angle > 90 - tolerance
• sp: number of bonds < 3 AND (max angle + min angle)/2 < 180 + tolerance AND (max angle + min angle)/2 > 180 - tolerance
• sd²: number of bonds < 4 AND atomic number ≥ 21 AND max angle < 90 + tolerance AND min angle > 90 - tolerance
• sp²: number of bonds < 4 AND (max angle + min angle)/2 < (120 + tolerance*0.55) AND (max angle + min angle)/2 > (120 - tolerance*0.55)
• sd³: number of bonds < 5 AND atomic number ≥ 21 AND max angle < 109.5 + tolerance AND min angle > 109.5 - tolerance
• sp³: number of bonds < 5 AND max angle < 109.5 + tolerance AND min angle > 109.5 - tolerance
• sp²d²: number of bonds < 5 AND atomic number ≥ 21 AND max angle < 90 + tolerance AND min angle > 90 - tolerance
• sp³d: number of bonds < 6 AND atomic number ≥ 21 AND max angle < 180 + tolerance AND min angle > 90 - tolerance AND there is at least one 120° angle
• sd⁴: number of bonds < 7 AND atomic number ≥ 21 AND max angle < 123 + tolerance AND min angle > 73 - tolerance
• sd⁵: number of bonds < 7 AND atomic number ≥ 21 AND max angle < 116.5 + tolerance AND min angle > 63.5 - tolerance
• sp³d²: number of bonds < 7 AND atomic number ≥ 21 AND max angle < 180 + tolerance AND min angle > 90 - tolerance
• sp³d³: number of bonds < 8 AND atomic number ≥ 21 AND max angle < 180 + tolerance AND min angle > 72 - tolerance AND there is at least one 120° angle

Figure S4: The MEP surface for the benzene molecule computed with Proxima. The surface is shown in the C.A.V.E. system at the SMART laboratory of Scuola Normale Superiore [6] through the Caffeine software [7]. In the corner, the same MEP surface from a different point of view, rendered with the desktop version of the Caffeine software [7].
Here, a tolerance of 10 degrees is taken for atomic numbers below 21, and of 20 degrees otherwise. Typically, the hybridisation of terminal atoms is left unknown. There are some special cases, though, that are treated differently, namely oxygen, nitrogen and carbon atoms. Terminal oxygen and nitrogen atoms are set to sp² if they are bonded to an sp² atom, and to sp³ otherwise. This is reasonable since they have lone pairs and can contribute to resonance structures. For terminal carbon atoms, instead, the hybridisation of the bonded atom is checked: if this atom is sp² and is not bonded to other sp² atoms, the terminal carbon atom is set to sp². Otherwise, if the bonded atom is sp² but is also bonded to other sp² atoms, an sp³ hybridisation is assigned to the terminal carbon atom. The same reasoning applies to the sp case.
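The geometric rules for non-terminal atoms can be sketched as an ordered table of tests. The code below is an illustration under explicit assumptions, not Proxima's source: the rules are tried in the order in which they are listed and the first match wins, "at least one 120° angle" is read as "within the tolerance of 120°", and the function name `classify_hybridisation` is hypothetical. The special cases for terminal atoms are not covered.

```python
def classify_hybridisation(n_bonds, atomic_number, angles):
    """Return a label such as "sp3", or "unknown" (illustrative helper).

    angles: all bond angles (degrees) centred on the atom.
    """
    if n_bonds < 2 or not angles:
        return "unknown"  # terminal atoms are handled by separate rules
    tol = 10.0 if atomic_number < 21 else 20.0
    hi, lo = max(angles), min(angles)
    mid = (hi + lo) / 2.0
    heavy = atomic_number >= 21
    # Assumption: an angle counts as "120 degrees" if it lies within tol.
    has_120 = any(abs(a - 120.0) <= tol for a in angles)

    def between(upper, lower):
        return hi < upper + tol and lo > lower - tol

    # Assumption: rules are evaluated in listed order, first match wins.
    rules = [
        ("sd", n_bonds < 3 and heavy and between(90, 90)),
        ("sp", n_bonds < 3 and 180 - tol < mid < 180 + tol),
        ("sd2", n_bonds < 4 and heavy and between(90, 90)),
        ("sp2", n_bonds < 4 and 120 - 0.55 * tol < mid < 120 + 0.55 * tol),
        ("sd3", n_bonds < 5 and heavy and between(109.5, 109.5)),
        ("sp3", n_bonds < 5 and between(109.5, 109.5)),
        ("sp2d2", n_bonds < 5 and heavy and between(90, 90)),
        ("sp3d", n_bonds < 6 and heavy and between(180, 90) and has_120),
        ("sd4", n_bonds < 7 and heavy and between(123, 73)),
        ("sd5", n_bonds < 7 and heavy and between(116.5, 63.5)),
        ("sp3d2", n_bonds < 7 and heavy and between(180, 90)),
        ("sp3d3", n_bonds < 8 and heavy and between(180, 72) and has_120),
    ]
    for label, matches in rules:
        if matches:
            return label
    return "unknown"


water_oxygen = classify_hybridisation(2, 8, [104.5])
ethylene_carbon = classify_hybridisation(3, 6, [121.3, 121.3, 117.4])
octahedral_iron = classify_hybridisation(6, 26, [90.0] * 12 + [180.0] * 3)
```

Keeping the rules in a table makes the dependence on evaluation order explicit: several conditions overlap (for instance sd³ and sp²d² for four-coordinate heavy atoms), so a different ordering would assign different labels.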

PyProxima
PyProxima is a Python interface intended to broaden the use of the Proxima library by allowing non-C++ users to take advantage of its features. This Python module has been generated with the Cython software. Most of the original methods implemented in Proxima are currently included in PyProxima and more will come in future updates. There are no external dependencies except for OpenMP (required for the parallel version of the code) and the Eigen library (used, e.g., for linear algebra calculations). To install the software library, the following files are required:
• The Proxima C++ source code

ProximaConsole
ProximaConsole is a console application designed for using Proxima to compute bonds, hydrogen bonds, charges and molecular rings. ProximaConsole can be run with the following command: An input file is given to the executable (pdb or xyz files are accepted as inputs) and a mol file and a JSON file are generated as output. The mol file contains the geometry with the bonds computed by Proxima. The JSON file, instead, contains the additionally computed rings, charges and hydrogen bonds. To trigger such computations, additional flags can be given to ProximaConsole, such as those shown in Fig. S5.
An example of the generated JSON file is given in Fig. S6. All of the computed quantities are stored in the "vms_proxima" object. The "data_mol" object is the molecular system, with the covalent bonds computed by Proxima, stored in the mol file format. The "vms_charges" array stores the charges of the atoms (in the same order as the atom indexes). The "vms_frags" array stores fragments such as the rings: each fragment is an object described by a name (the "fr_name" value) and the set of indexes of the atoms that take part in the fragment (the "fr_index" array). Finally, the "vms_hbonds" array stores hydrogen bonds. Each hydrogen bond is an object described by a "hb_force" value that identifies the strength of the hydrogen bond and, again, an array of indexes ("hb_atoms") of the atoms that take part in the hydrogen bond, in the order [donor, hydrogen, acceptor]. The ProximaConsole software is a compiled software available for Windows, Linux and macOS upon request.

-cycles This flag is used for computing rings with Horton's algorithm.
-gasteiger This flag is used for computing charges with the Gasteiger method.
-fq This flag is used for computing charges with the FQ method.
-hbonds This flag is used for computing hydrogen bonds.
-mep This flag is used for computing the electrostatic potential and writing it as a cube file.
-spin This flag is followed by the total electron spin of the given system.
-charge This flag is followed by the total charge of the given system.
Figure S5: ProximaConsole additional flags.
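A downstream script can read the JSON output with the standard library alone. The sketch below is only an illustration of the structure described in the text, with the key names read as "vms_proxima", "data_mol", "vms_charges", "vms_frags" ("fr_name", "fr_index") and "vms_hbonds" ("hb_force", "hb_atoms"). The nesting of the arrays inside "vms_proxima", the storage of "data_mol" as a string, the index base and every numerical value in the embedded example are assumptions made up for the sketch; Fig. S6 remains the reference for the real layout.

```python
import json

# Made-up example that mimics the described structure (values are invented).
EXAMPLE_OUTPUT = """
{
  "vms_proxima": {
    "data_mol": "(mol block omitted)",
    "vms_charges": [-0.41, 0.21, 0.20],
    "vms_frags": [
      {"fr_name": "ring_1", "fr_index": [0, 1, 2]}
    ],
    "vms_hbonds": [
      {"hb_force": 0.8, "hb_atoms": [0, 1, 2]}
    ]
  }
}
"""


def summarize_output(text):
    """Collect charges, fragments and hydrogen bonds from the JSON text."""
    root = json.loads(text)["vms_proxima"]
    fragments = {
        frag["fr_name"]: list(frag["fr_index"])
        for frag in root.get("vms_frags", [])
    }
    hbonds = []
    for hbond in root.get("vms_hbonds", []):
        # Order stated in the text: [donor, hydrogen, acceptor].
        donor, hydrogen, acceptor = hbond["hb_atoms"]
        hbonds.append((donor, hydrogen, acceptor, hbond["hb_force"]))
    return {
        "charges": list(root.get("vms_charges", [])),
        "fragments": fragments,
        "hbonds": hbonds,
    }


summary = summarize_output(EXAMPLE_OUTPUT)
```

Using `get` with an empty default keeps the reader tolerant of runs in which a flag such as -hbonds or -cycles was not given and the corresponding array may be absent.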

Complementary Tools
The PyProxima and ProximaConsole tools allow users to employ the library directly as a preprocessing tool (in a way akin to, e.g., Antechamber [8]) or to import and use it in a Python-based environment or pipeline. To further increase flexibility, we devised an interactive GUI for non-expert users. This is called ProximaGUI and is based on the widespread JMol software [4] for the visualization of chemical structures. ProximaGUI is a compiled software available for Windows, Linux and macOS upon request.