Open-Source, Python-Based Redevelopment of the ChemShell Multiscale QM/MM Environment

: ChemShell is a scriptable computational chemistry environment with an emphasis on multiscale simulation of complex systems using combined quantum mechanical and molecular mechanical (QM/MM) methods. Motivated by a scienti ﬁ c need to e ﬃ ciently and accurately model chemical reactions on surfaces and within microporous solids on massively parallel computing systems, we present a major redevelopment of the ChemShell code, which provides a modern platform for advanced QM/MM embedding models. The new version of ChemShell has been re-engineered from the ground up with a new QM/ MM driver module, an improved parallelization framework, new interfaces to high performance QM and MM programs, and a user interface written in the Python programming language. The redeveloped package is capable of performing QM/MM calculations on systems of signi ﬁ cantly increased size, which we illustrate with benchmarks on zirconium dioxide nanoparticles of over 160000 atoms.

Thi s v e r sio n is b ei n g m a d e a v ail a bl e in a c c o r d a n c e wit h p u blis h e r p olici e s. S e e h t t p://o r c a . cf. a c. u k/ p olici e s. h t ml fo r u s a g e p olici e s. Co py ri g h t a n d m o r al ri g h t s fo r p u blic a tio n s m a d e a v ail a bl e in ORCA a r e r e t ai n e d by t h e c o py ri g h t h ol d e r s .

INTRODUCTION
Combined quantum mechanical/molecular mechanical (QM/ MM) calculations are an efficient means of calculating localized chemistry in complex molecular and solid systems.
In the QM/MM approach, a region of interest in the system is identified where an electronic structure description is required, for example the site of a reaction or a defect. A quantum mechanical calculation is performed on this region, while the environment around it is described with a classical molecular mechanics force field. This approach ensures that local chemical processes are modeled with sufficient accuracy, while avoiding the expense of treating the whole system quantum mechanically. ChemShell is a general-purpose scriptable computational chemistry environment 1,2 with an emphasis on QM/MM simulations. ChemShell implements a flexible, modular approach to QM/MM, where a variety of QM and MM programs (either built in as libraries or called through external interfaces) may be used to evaluate energies and gradients of the quantum and classical regions, while ChemShell takes on the task of coupling the results to obtain the combined QM/ MM energy and gradient, including an appropriate treatment of the boundary region that couples the two subsystems. Highlevel tasks such as geometry optimization and molecular dynamics are also handled at the ChemShell level, ensuring that these are carried out in a consistent way regardless of the choice of energy evaluator.
Previously, ChemShell has been used to model a wide range of chemical systems using QM/MM methods, particularly in the fields of biomolecular and materials modeling. In biochemistry, QM/MM methods are well-suited to simulating enzymatic reaction mechanisms 3 and ChemShell has been used to study many important enzymatic processes, with recent investigations targeting oxidoreductases such as flavin-containing amine oxidases, 4 lignin peroxidase, 5 multicopper oxidases, 6 [Fe] hydrogenase, 7 Ni−Fe carbon monoxide dehydrogenases, 8 iron-containing dioxygenases, 9−13 and iron-containing halogenases; 14,15 transferases such as protein kinases 16 and glycosyltransferases; 17 hydrolases such as the lipase family 18 and matrix metalloproteinases; 19,20 lyases such as the radical SAM superfamily; 21 and ligases such as carbapenem synthase. 22 ChemShell continues to be particularly heavily used for studies of the cytochrome P450 superfamily of enzymes owing to their importance in metabolic processes, 23 with recent studies investigating reactivity networks, 24 active site chemoselectivity, 25 regio-and enantioselectivity of fatty acid hydroxylation, 26,27 and redox-catalyzed drug metabolism mechanisms. 28 ChemShell has also been used to study prototypical copper complexes, 29 electron hole transport in proteins, 30 protein−protein interfaces, 31 covalent inhibitors for drug design, 32 DNA repair mechanisms, 33,34 and enzymatic reactions in crystals. 35,36 As well as studying reaction mechanisms, ChemShell is used to determine QM/MM spectroscopic properties through use of appropriate QM programs, with recent work involving IR, 37 NMR, 38 and Mossbauer 15,39 calculations. The QM and QM/ MM molecular dynamics driver can be used to study chemical reactivity and kinetics at arbitrary levels of theory, 40−42 and excited state optimization and molecular dynamics methods are also supported, which have recently been used to study chromophore emission properties in proteins, 43,44 light-oxygen-voltage domains in blue-light-sensitive proteins, 45 and OLEDs. 46 In materials chemistry, QM/MM techniques are increasingly used to model defects, localized electronic states, sorbed species, and catalytic reactions on surfaces, where they have distinct advantages over the widely used periodic methods in their avoidance of artificial periodicity, capability to handle charged (surface) defects without a posteriori correction schemes, and the availability of a clearly defined vacuum level; they may also prove to be more economical in computational resources. The solid state QM/MM embedded cluster model implemented in ChemShell has recently been used to investigate defect formation in wide band gap semiconductors; 47 the band energies of TiO 2 polymorphs 48 and band alignment of mixed-phase TiO 2 ; 49 the energetics of Mg doping, 50 stabilization of silicon and oxygen dopants, 51 and multiband luminescence in GaN; 52 and the nature of oxygen vacancies in transparent conducting oxides. 53 The embedded cluster approach is also extensively used for the study of microporous catalysts 54,55 and has been adapted to describe one-dimensional CdS nanowires. 56 Recent surface studies have included the adsorption of CO 2 on MgO 57 and Mn-doped MgO 58 and its reactivity with H 2 , 59 water oxidation on TiO 2 , 60 and reactivity on ice surfaces using instanton theory. 61−63 ChemShell has also been used to study molecular crystals, 64 metal−organic frameworks (MOFs), 65 and the aggregateinduced emission and the influence of structural distortion on the optical structure of nanoparticles. 66,67 Further discussion of the methodology for solid state embedding is available in ref 68.
The original ChemShell program, with an interface written in the Tcl scripting language 69 (herein referred to as "Tcl-ChemShell"), is a well-established, mature software package, having been under active development for over 20 years. This article describes our work to redevelop ChemShell using the Python programming language 70 ("Py-ChemShell"). This work was motivated for several reasons. First, Python has become established in recent years as one of the most popular programming languages in the computational chemistry community, and the redevelopment provides a more familiar and appealing working environment for new users. Second, Python is a more powerful and flexible language than Tcl, with better support for mathematical operations, complex data structures, and a wider range of support libraries. Third, the redevelopment has offered the opportunity to rationalize the underlying codebase in order to make ongoing development of new embedding methods more straightforward. This includes the development of a significantly improved parallelization framework, with associated new functionality including a general purpose finite difference gradient module and new directly linked interfaces to QM and MM codes to enable highly scalable QM/MM materials simulations. Finally, we have taken the opportunity afforded by rewriting the software codebase to change to an open source licensing model. The new Py-ChemShell package is available for download under the GNU LGPL v3 free software license at www.chemshell.org.

PYTHON IMPLEMENTATION
In the original Tcl-ChemShell program, the Tcl scripting language is used mainly to provide the user interface layer, with complex data structures and manipulation implemented in the more flexible C programming language, and some modules and interfaces to external packages written in Fortran. In Py-ChemShell, we have exploited the power of the Python language and the associated mathematical library NumPy 71 to simplify this structure, with both the user interface and data structures handled at the Python level, while the data structures can be directly accessed by Fortran modules and interface code without the need for C wrappers, as described below.
2.1. User Interface. Py-ChemShell retains the flexible scripting approach taken by Tcl-ChemShell, and we intend that Py-ChemShell inputs will look familiar to experienced ChemShell users, albeit translated into a new programming language. As a simple example, we present a script that carries out a purely quantum mechanical optimization of a water molecule using the NWChem QM code with the BLYP density functional and a 6-31G basis set: This script can be run either in an ordinary Python interpreter after importing the Py-ChemShell library, or through calling the user-compiled ChemShell executable as follows: chemsh h2o.py where h2o.py is the Python input script. The ChemShell executable is particularly intended for parallel calculations, as described in section 3.
2.2. Data Structures. The ChemShell object types previously implemented in C have been reimplemented as Python classes in accordance with the principles of objectoriented programming. An overview of these classes is provided in Figure 1. At the highest level are classes defining computational tasks, such as the SP class for single-point calculations, and the Opt class for geometry optimization. Different levels of theory are then defined, with classes

Journal of Chemical Theory and Computation
Article corresponding to each external software package interfaced to ChemShell. The classes are grouped by type of theory (QM or MM) and can be combined together using the QMMM class. Underneath this is the Fragment class that holds information on the chemical system such as the geometry, thus corresponding to the Fragment object in Tcl-ChemShell. In Py-ChemShell additional information such as shells (Shells), point charges (BQs), link atoms (Links), and periodic boundary conditions (Cell) are contained in subclasses that are accessible from the Fragment object.
Further classes include Field, which, as in Tcl-ChemShell, contains potential and field data on a grid, and Result, which collates the results of a calculation into a single object including energy, gradient, and Hessian information.
The user creates a Py-ChemShell object by instantiating one of the above classes. User instances can have arbitrary names. For example, in the script in section 2.1, my_mol is an instance of class Fragment while my_qm is an instance of the theory class NWChem. Each Py-ChemShell object instance contains data which is stored in instance variables, as with any other Python object. All array data (such as a list of atomic coordinates, for example) is stored in NumPy arrays to ensure high performance and ease of data manipulation. Each instance may also contain scalar data (such as the number of atoms, for example) and instances of other Py-ChemShell classes. The names of these attributes also serve as input keywords in the user interface, so that instances can be initialized when created. In the script in section 2.1, the line my_mol = Fragment(coords='water.xyz') creates an instance of the Fragment object named my_mol and initializes it by passing in the name of a file containing an XYZ geometry using the coords keyword, which refers to the NumPy array holding the geometry data. Once loaded, this data can then be accessed and edited using standard Python syntax as my_mol.coords.
Some other attributes are used as options to control the computational job. For example, my_qm.restart = True forces the QM calculation to restart based on a previously saved wave function from NWChem. Another set of attributes are read-only properties which are provided for the convenience of the user to enquire, such as radius, volume, and so on.
Py-ChemShell objects also contain methods, which are functions that act on the associated object; these correspond to the procedures in Tcl-ChemShell which are used to manipulate object data. For example, to remove the first atom from the my_mol fragment, the delete() method would be called as follows: my_mol.delete(0) 2.3. Python/Fortran Coupling. While the ChemShell user interface has been reimplemented in Python and ChemShell objects are now implemented as Python classes, a Fortran layer is required for certain lower level operations.

Journal of Chemical Theory and Computation
Article These include the integration of modules written in Fortran such as the routines for cutting clusters and the DL-FIND geometry optimization library, and the ability to interface directly to external packages written in Fortran. The parallelization framework in Py-ChemShell has also been written in Fortran so that it is not necessary to launch a Python interpreter on every MPI process (see section 3).
To address this need we have developed a general-purpose library, DL_PY2F, which enables Py-ChemShell objects to be directly accessed from the Fortran layer so that data can be passed in a straightforward manner to and from Fortran modules and external libraries. When DL_PY2F is run, it creates a table of pointers to the Python object data (such as NumPy arrays and references to other objects) together with their datatypes. This table is accessed from the Fortran side through a convenient dictionary-like interface of key and value pairs, through which any data within the object can be obtained and amended. For example, for the DL-FIND optimization library described in section 2.6, the user options selected for a geometry optimization, together with the initial structure, are passed to DL-FIND by DL_PY2F. The calculated energy and gradients required over the course of the optimization are obtained via a callback mechanism. DL-FIND then updates the new sets of positions directly using DL_PY2F.
Unlike other Python/Fortran coupling approaches, DL_PY2F is specifically designed for interfacing with modern Fortran and fully supports complex datatypes such as those used by Py-ChemShell. Full technical details of DL_PY2F will be published in a companion paper.
2.4. External Interfaces. Py-ChemShell supports calculations using a range of QM and MM packages through user interfaces written in Python that automatically handle both the creation of input files and the parsing of outputs in a consistent manner across packages. Each interface is implemented as a Python class, with optional attributes to specify general settings, such as energy evaluation methods, and any program-specific settings for each software package. In the example input script in section 2.1, my_qm is defined as an instance of the NWChem class with an appropriate set of QM options.
For the initial release of Py-ChemShell, a subset of the interfaces available in Tcl-ChemShell were targeted, specifically those that are particularly well-suited for calculations on highperformance computing platforms. For each external code a loosely coupled interface is available, where the code is executed through a system call, or alternatively a directly linked interface where the external code is compiled into ChemShell as a library. The latter mode is important for parallel execution as described in section 3.
NWChem is an ab initio quantum chemistry software package developed by a consortium of scientists led by the Environmental Molecular Sciences Laboratory at the US Pacific Northwest National Laboratory. 72 It supports Hartree− Fock, density functional theory, and several high-level ab initio methods. NWChem is highly scalable with an architecture built on the Global Arrays toolkit (see section 3). The NWChem package is freely available under the open source Educational Community License 2.0. As part of the ChemShell redevelopment project, we have implemented a frozen density embedding model 73,74 in NWChem for use with ChemShell QM/MM calculations, which will be described in a follow-up paper.
GAMESS-UK is a quantum chemistry package developed by Daresbury Laboratory and collaborators 75 as part of the UK's Collaborative Computational Project for the Electronic Structure of Molecules (CCP1). It supports Hartree−Fock, DFT, and MCSCF calculations together with a variety of post Hartree−Fock methods. GAMESS-UK is highly scalable through MPI parallelism or Global Arrays builds. The GAMESS-UK package has a proprietary license but is available free of charge to UK-based researchers.
LSDalton is a quantum chemistry program which forms part of the Dalton suite 76 led by developers at the University of Oslo. LSDalton contains a highly efficient, linear-scaling implementation of Hartree−Fock and density functional theory suitable for large molecular systems with robust wave function and response optimization procedures. The LSDalton program is open source and available for download under the GNU LGPL v3 license. QM/MM calculations with Chem-Shell/LSDalton support hybrid MPI/OpenMP execution and have been shown to give excellent scaling on high performance platforms. 77 GULP (General Utility Lattice Program) is a molecular mechanics software package developed at Curtin University. 78 ChemShell fully supports all forms of QM/MM embedding with GULP, including polarized embedding using shell-model force fields. It is particularly well-suited for QM/MM materials chemistry calculations as it offers excellent support for materials force fields. GULP has a proprietary license but is free of charge for academic users.
DL_POLY 4 is a general purpose molecular dynamics program 79 developed by a team led by Daresbury Laboratory under the UK's Collaborative Computational Project for Computer Simulation of Condensed Phases (CCP5). In Tcl-ChemShell, the original replicated data version of DL_POLY ("DL_POLY Classic") was available as a built-in module, which offered support for QM/MM calculations of up to roughly 50000 atoms. In Py-ChemShell, we have implemented an interface to the more modern DL_POLY 4 program, which is parallelized based on a domain decomposition strategy allowing much larger MM environments to be included in the QM/MM calculation (see section 3). DL_POLY 4 has a proprietary license but is free of charge for academic users.
Further QM and MM programs will be interfaced to Py-ChemShell in future releases.
2.5. QM/MM Driver. The original Tcl/C "hybrid" module for QM/MM calculations in Tcl-ChemShell has been rewritten in Python with an emphasis on flexibility to support ongoing embedding model developments. In the initial release of Py-ChemShell, the standard additive QM/MM scheme has been implemented as described in the original QUASI project publication, 1 with full support for both link-atom and ionic boundary methods for handling the QM/MM boundary region. Mechanical, electrostatic, and polarized (shell model) embedding are all implemented in full. The charge shifting scheme for avoiding overpolarization of link atoms is available and automatically applied when electrostatic embedding is selected. Link atom forces are resolved as in the original implementation.
QM/MM calculations are performed in an analogous manner to Tcl-ChemShell, with a QMMM class used to define the overall calculation, which contains subclasses to define the QM and MM components. The new QM/MM driver has been rigorously validated against the original Tcl-ChemShell implementation. In the initial release, MM force fields must

Journal of Chemical Theory and Computation
Article be defined in either GULP or DL_POLY format, but future releases will include a mechanism for automated import of standard MM force fields to support biomolecular QM/MM calculations.
2.6. Geometry Optimization with DL-FIND. DL-FIND is an open-source geometry optimization library 80 developed at Daresbury Laboratory. It is the recommended optimizer module in Tcl-ChemShell and Py-ChemShell, implementing a wide range of optimization algorithms. Local energy minimization methods include steepest descent, conjugate gradients, Newton−Raphson, Quasi-Newton (BFGS), and damped molecular dynamics. Transition states can be located via partitioned rational function optimization (P-RFO) or the dimer method, 81 and full minimum energy reaction paths can be optimized using the nudged elastic band (NEB) method. 82,83 Global optimization can be performed using a genetic algorithm or stochastic search, and multiple electronic state crossings can be found using three conical intersection optimization algorithms. 84,85 Reaction rates can be calculated using harmonic transition-state theory and quantum tunneling paths optimized using instanton theory. 86 During a DL-FIND optimization, the main optimization cycle is controlled by the chosen optimizer, and energies and gradients are requested from ChemShell as required. The modular design of ChemShell allows any choice of QM, MM, or combined QM/MM method to be used in conjunction with DL-FIND, which needs no information on the underlying method used. Several techniques have however been implemented within DL-FIND to facilitate the optimization of large QM/MM systems. For example, the calculation of a Hessian matrix rapidly becomes intractable for large systems, but this can be avoided by using the limited memory variant of the BFGS algorithm 87 (L-BFGS) for minimization and dimer method for transition state optimization. A number of coordinate systems are implemented, which are converted automatically within DL-FIND to/from Cartesian coordinates, including the highly efficient hybrid delocalized coordinate system (HDLC), which calculates delocalized internal coordinates using a divide-and-conquer approach that splits a complex system into manageable residues. 88 Microiterative methods have also been implemented for minimization, transition state searches, and nudged elastic band, in order to increase the efficiency of QM/MM optimization. 89 These methods relax the MM environment fully after every QM step, relying on the fact that MM energy and gradient evaluations are usually much quicker than QM to reduce the overall time to optimization.
In Py-ChemShell, DL-FIND is invoked through the Opt Python class, which is designed to be extensible to enable linking in other optimization modules in future. The Opt class interfaces with DL-FIND using the DL_PY2F library as described in section 2.3, with optimized structures returned as Py-ChemShell objects.
2.7. Generation of Finite Cluster Models. ChemShell uses an embedded cluster approach for QM/MM calculations of materials, in which a finite cluster is cut from a periodic system in order to model localized states such as point defects and adsorbed molecules. 1 This approach has a number of advantages over periodic models: the method is well-suited to QM packages that do not support periodic MM background charges, there are no inaccuracies due to periodically repeating defects, charged states can be readily calculated, and a common reference energy can be defined for ionization potentials of different systems. The disadvantage is that periodic electrostatic interactions are lost, but these can be compensated for by fitting point charges around the outside of the cluster to match the electrostatic potential and derivatives in a sampling region of the original periodic system.
In Tcl-ChemShell, a cluster cutting module is available that automates much of the cluster setup process, including the charge fitting scheme. This module has been ported to Py-ChemShell with a new Python wrapper interfacing the code via DL_PY2F.
The point charge fitting scheme can be computationally intensive as Ewald summations over the periodic system are required to calculate the electrostatic potential and its derivatives over a set of atomic centers and grid points in the sampling region. These calculations are however independent and so amenable to parallelization. An MPI replicated data version of the module is currently under development and will be released in a future version of Py-ChemShell.

TASK-FARMING PARALLELISM
3.1. Methodology. Task-farming parallelism is useful in situations where a set of independent calculations have to be performed. 90, 91 The task farm consists of all available processors, which are then divided into subsets of processors

Journal of Chemical Theory and Computation
Article called workgroups. The tasks are split between the workgroups, which then work on the separate tasks in parallel. As the calculations are independent, no information needs to be exchanged between workgroups during this time, and sharing of results can be postponed until all the tasks have completed. In computational chemistry, it is common to carry out calculations containing a number of single-point energy evaluations that are independent from each other. Typical examples of applications include the nudged elastic band method for reaction path optimization, finite-difference numerical gradient and Hessian calculations, and optimization methods based on a population of structures (e.g., genetic algorithms). Figure 2 illustrates the task-farmed MPI parallel environment of Py-ChemShell. The MPI processes are evenly grouped into workgroups, each containing one Master process and zero or more Replica processes. Each Master process launches a Python interpreter instance, which parses the user input script and invokes external programs to execute computational chemistry tasks. If the size of the workgroup is greater than one, these tasks will be run in parallel across the Master and Replica processes. This results in a two-level parallelization framework that allows Py-ChemShell to execute multiple parallel energy evaluations simultaneously.
This approach follows the previously implemented taskfarming framework in Tcl-ChemShell 91 but goes beyond it in several important respects.
First, as indicated in Figure 2, we allow parallel tasks within workgroups to be executed either over the full workgroup or on a subset of the workgroup processes. This is useful for QM/ MM simulations where the QM part of the calculation is usually the dominant computational expense and the most amenable to parallelization, whereas the MM region is generally relatively small and quick to compute, such that using a subset of the processes may reduce the time to solution due to avoiding unnecessary parallel communication overheads.
Second, task-farming with the Global Arrays (GA) library is fully supported. The GA library is used in a variety of popular quantum chemistry packages, including NWChem and GAMESS-UK, and our implementation ensures that taskfarmed QM/MM calculations with NWChem can be performed in ChemShell. In order to support task-farming, GA processor groups are set up at the initialization of the calculation to match exactly the MPI workgroups, such that all GA and MPI calls apply to the same set of processes.
Third, we have implemented new parallel tasks within Py-ChemShell, including a new numerical finite difference gradient module, which can be helpful when using levels of theory for which analytic gradients are not available. The finite difference gradient module is seamlessly integrated into ChemShell so that it can be used in place of an analytical gradient wherever the user requests it and can be run either in serial or parallel as desired. One-point and two-point finite difference gradients are available.
3.2. QM/MM Task-Farming Benchmarks. To validate and benchmark the task-farming implementation, we performed a series of two-point finite difference gradient calculations. An MgO cluster containing 2263 atoms and 1133 shell centers was constructed (see Figure 3), and numerical gradients were calculated over the 34 QM atoms (102 degrees of freedom, i.e., 204 QM/MM single-point calculations). In each single-point calculation, a full relaxation of the shell centers is undertaken until the QM and MM regions are self-consistently converged (tolerance: 1.0 × 10 −4 a.u.), after which a final energy evaluation is carried out for estimating the gradients.
In Table 1, we present example QM/MM benchmark calculations using GAMESS-UK as the external QM driver and G U L Pa st h eM Md r i v e r .T h eQ Mc a l c u l a t i o n sw e r e performed using density functional theory with the BLYP functional. 93,94 The atoms in the QM region were described using a modified def2-SVP basis set 95 with Stuttgart effective core potentials 96 placed on the Mg ions in the QM region and on the cations in the boundary region. The GULP force field used was as described in previous studies of MgO. 57 Taskfarming scalability has been benchmarked by varying the number of workgroups in a fixed allocation of 48 cores on the ARCHER supercomputing service (a Cray XC30 platform with two 2.7 GHz 12-core Ivy Bridge processors per node). The table lists the wall time for these calculations with respect to changing the number of workgroups (n workgroups ), and Figure 4 visualizes the profiles. Overall, we observe a 5.7-fold speedup for a 24 workgroup calculation compared to the non-taskfarmed reference (n workgroups = 1). Note that the scaling profile of the task-farmed calculations is nonlinear as n workgroups grows, and it always has a minimum point. This is due to the increase of computational time for a single-point GAMESS-UK or GULP execution as n cores per workgroup decreases (see dashed lines in Figure 4).
To demonstrate the scalability of task-farming parallelization to higher core counts, we also carried out similar two-point finite difference gradient calculations while increasing both n workgroups and the total number of processor cores (n cores ), while fixing n cores per workgroup at 48. Results are shown in Table 2 and Figure 5. The speedup obtained in this way on 8 workgroups (384 ARCHER cores) is 7.26, illustrating the inherent linearity in the task-farming parallelization of the finite difference task. In this case, the time used for a single-point GAMESS-UK or GULP execution remains roughly equal because the available n cores per workgroup is a constant

Journal of Chemical Theory and Computation
Article number. Note that superlinear scaling is possible because of subtle variations in the shell relaxation starting points, but this is an artifact of the benchmark rather than a systematic effect.
Although these benchmarks cannot be compared in a straightforward manner with Tcl-ChemShell (as the finite difference gradient module is not available in that program), the observed performance is broadly in line with the previously implemented task-farming parallelization framework, 91 as would be expected given that the overall timings are determined for the most part by the parallel scaling performance of the GAMESS-UK and GULP packages. In the next section, we consider a system which is beyond the capability of Tcl-ChemShell to calculate.
3.3. QM/MM Nanoparticle Benchmarks. To assess the performance of Py-ChemShell for large QM/MM systems, we have carried out NWChem/DL_POLY 4 single-point QM/ MM energy evaluations for a ZrO 2 nanoparticle of radius 75.0 Å, containing 162994 atoms, as illustrated in Figure 6. It is not possible to treat such a system in Tcl-ChemShell because the built-in version of DL_POLY Classic used for MM energy evaluations uses replica data parallelism with an upper limit of The total wall time (t total in h) for the calculation quoted is the average over workgroups, and the time for single-point (SP) calculations is averaged over all calculations (t SP in sec). n workgroups , n cores , n cores/workgroup , and n SP refer to the number of workgroups, processor cores, processor cores per workgroup, and single-point calculations, respectively. The active region for the calculation is the QM region (

Journal of Chemical Theory and Computation
Article roughly 50000 atoms. This limit is removed in Py-ChemShell through interfacing to DL_POLY 4. The QM calculations were carried out at the Hartree−Fock level using the def2-TZVP basis set, 95 with the corresponding ECP 97 for Zr. The MM force field used was as described in previous studies of ZrO 2 . 98 The resulting time used for the calculations are listed in Table 3, which shows that Py-ChemShell scales efficiently for this system. To demonstrate the facility for using different number of processes for the QM and MM parts of the calculation, we have restricted the DL_POLY energy evaluations to 8 cores, while the computationally far more intensive NWChem calculations uses the full set of processes. The overall wall time is therefore dominated by the NWChem QM calculation (>99%), and good scaling is observed as the number of cores increases (speedup column).

CONCLUSION AND OUTLOOK
The ChemShell software package has been completely rewritten with the aim of providing a modern, scriptable platform for multiscale computational chemistry. This redevelopment of ChemShell features an easy-to-use Pythonbased interface and a high-performance Fortran parallelization framework. This article describes the features of the code available in the initial public release in December 2017 (version 17.0), which is focused on QM/MM calculations for materials chemistry. However, the code continues to undergo rapid development, and a number of additional features are targeted for release in the near future. These include a built-in molecular dynamics driver incorporating the open source DL_POLY Classic code 99 and support for biomolecular QM/ MM calculations through automatic import of biomolecular force fields via an interface to the DL_FIELD program, 100 bringing these aspects of Py-ChemShell up to feature equivalence with Tcl-ChemShell. Further functionality not currently available in Tcl-ChemShell will continue to be introduced, including full support for embedded cluster calculations with the ORCA QM code 101 and a new interface to the density functional tight binding package DFTB+ for fast, semiempirical DFT calculations. 102 In ongoing projects, we are implementing a periodic QM/MM embedding scheme together with interfaces to periodic QM programs including CP2K 103 and CRYSTAL 104 and developing new methods for spectroscopic signatures based on the frozen density embedding model developed in NWChem. Future releases will also include methods for "adaptive" QM/MM calculations, where molecules can be exchanged between the QM and MM regions. A graphical user interface is also under development through a plug-in to the open source Aten visualizer. 105 For further information about the ChemShell project and to download the Py-ChemShell code, please visit www.chemshell. org.

* S Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.8b01036.
Summary description of archived structures, basis sets, effective core potentials, and force field definitions (PDF) An archive of structures, basis sets, effective core potentials, and force field definitions used in the MgO and ZrO 2 benchmark calculations (ZIP)