PiNN: A Python Library for Building Atomic Neural Networks of Molecules and Materials

Atomic neural networks (ANNs) constitute a class of machine learning methods for predicting potential energy surfaces and physico-chemical properties of molecules and materials. Despite many successes, developing interpretable ANN architectures and implementing existing ones efficiently are still challenging. This calls for reliable, general-purpose and open-source codes. Here, we present a Python library named PiNN as a solution toward this goal. In PiNN, we designed a new interpretable and high-performing graph convolutional neural network variant, PiNet, and implemented the established Behler-Parrinello high-dimensional neural network. These implementations were tested using datasets of isolated small molecules, crystalline materials, liquid water and an aqueous alkaline electrolyte. PiNN comes with a visualizer called PiNNBoard to extract the chemical insight "learned" by ANNs, provides analytical stress tensor calculations and interfaces to both the Atomic Simulation Environment and a development version of the Amsterdam Modeling Suite. Moreover, PiNN is highly modularized, which makes it useful not only as a standalone package but also as a chain of tools for developing and implementing novel ANNs. The code is distributed under a permissive BSD license and is freely accessible at https://github.com/Teoroo-CMC/PiNN/ with full documentation and tutorials.


Introduction
One major task of computational chemistry is to map the structure of a molecule or a material to its property, i.e. f: {x_i, Z_i} → P. When P is the total energy, the task is to devise computational methods that find approximate solutions to the Schrödinger equation, as Dirac foresaw in his 1929 account 1 and as generations of computational and theoretical chemists have been devoted to. What is even more challenging is the reverse mapping f: P → {x_i, Z_i}, i.e., to propose new structures which have properties of particular value.
To address these challenges, machine learning (ML) has attracted considerable attention and effort in computational chemistry and materials discovery, 2-4 and many different types of ML methods have been successfully applied in these areas. In this work, we focus on atomic neural networks (ANNs), which have been very successful in predicting physico-chemical properties, approximating potential energy surfaces (PES) 5,6 and allowing for simulations of large-scale systems with the accuracy of reference electronic structure calculations at only a fraction of the computational cost.
In spite of this great promise and present success, the development of ANNs is not straightforward. ANNs must preserve the rotational, translational and permutational invariances of the system, which was recognized in the early days of ANN development. 7 Besides using functions of internal coordinates, which are rotationally and translationally invariant, a symmetrization process was proposed to also preserve permutational invariance. However, such models could only be applied to systems of a given size, and large-scale simulations were not possible.
Behler-Parrinello neural networks (BPNNs), 5 or high-dimensional neural network potentials, introduced the ansatz of partitioning the total potential energy of the system into effective atomic contributions. Not only does the atomic energy ansatz enable applying a trained ANN to systems of different sizes, but it also transforms the problem of describing the full system into that of describing the local chemical environment of each atom. 8,9 The BPNN architecture relies on calculating fingerprints of the atomic environment using a set of symmetry functions, which need to be selected before the fitting procedure can begin. 5 In contrast, in convolutional neural networks, one of the most successful end-to-end techniques in the field of deep learning which won the ImageNet competition in 2012, 18 the features are automatically learned via a feature hierarchy rather than handcrafted. Because molecules and materials can be viewed as fully connected graphs, this led to the development of graph convolutional neural networks (GCNNs) for atomic systems. 6,19,20 Hierarchical atomic features can be obtained by applying multistage concatenated convolution operations, and this leads to impressive performance for a variety of systems. 6,21-26 Among those, SchNet 21 is a leading example of extending GCNN methods to the modeling of both molecules and materials.
Despite these advances, developing interpretable ANN architectures 27,28 and implementing existing ones efficiently are still challenging. This calls for reliable, general-purpose and open-source codes. 29-33 Here, we present a Python library named PiNN as a solution toward this goal.
PiNN supports both BPNNs and GCNNs. The unique features of PiNN include i) a new type of interpretable and high-performing GCNN variant, namely PiNet, and ii) the interpretation of GCNNs in PiNN by visualizing feature activations with a user-friendly JavaScript plugin, PiNNBoard, instead of performing dimension reduction of embedding vectors 22,25,26 or generating a 3D potential map of a test particle. 21 Moreover, PiNN provides analytical stress tensor calculations for lattice optimizations and constant-pressure MD simulations.
In the following, we first introduce PiNN's representation and abstraction of ANNs with a focus on PiNet, an interpretable GCNN architecture we developed. Then, we discuss the implementation and package features of PiNN. After that, we present different case studies for molecules, crystalline materials and liquids. Finally, we conclude with an outlook.

Representing local chemical environments
The many-body expansion decomposes the total potential energy E_tot of a system of N atoms into n-body terms E^(n):

E_tot({x_i, Z_i}) = Σ_i E^(1)_i + Σ_{i<j} E^(2)_ij + Σ_{i<j<k} E^(3)_ijk + ···,

where x_i is the atomic position and Z_i is the atomic number. In the ANNs discussed in this work, E_tot by construction equals the sum of all effective atomic energies E_i:

E_tot = Σ_{i=1}^{N} E_i.   (1)

Note that one may also apply this construction to other atomic properties, such as partial charges.
In the construction of ANNs, descriptors which satisfy certain conditions, such as symmetry invariance, are needed. Two categories of descriptors have been developed to represent local chemical environments around atoms. One is inspired by the many-body expansion and includes radial and angular terms 34 (although this does not mean that only two-body and three-body information is captured), e.g. atom-centered symmetry functions, 5,35 the Faber-Christensen-Huang-Lilienfeld (FCHL) representation 36 and local reference frames. 37 The other uses the expansion of the atomic density in terms of orthogonal radial functions and spherical harmonics, 38 e.g. the smooth overlap of atomic positions (SOAP). 39 However, as pointed out recently, 38,40 these phenomenological categories are in fact connected, and one can systematically build up many-body representations from the atomic cluster expansion in terms of single-bond basis functions, 40 tensor products of symmetrized two-body dyads 38 or finite powers of the two-body base kernel. 41 The key insight of these recent works is that the full set of bond vectors originating from a given central atom in an atomic configuration of a molecule or material contains the complete information about the corresponding structure, and a many-body representation can be generated through interactions of bond vectors. In fact, the same inspiration is shared by GCNNs, where the starting point is a directed graph of the structure with annotated bonds as edges, and the many-body representation in the latent space is generated from a series of graph convolution blocks (see Section "Pairwise Interaction and Interaction Pooling" for more discussion).

Behler-Parrinello and graph convolutional neural networks
Here, we will briefly describe two important ANN architectures relevant to PiNN.
In BPNNs, the chemical environment around an atom is represented by a vector of symmetry functions. 5,8 Such symmetry functions are rotationally, translationally and permutationally invariant, and can capture both radial and angular features of the chemical environment within a cutoff radius R_c. Each chemical element has its own set of symmetry functions, as well as its own neural network architecture and fitted parameters. The adoption of the atomic energy ansatz (Eq. 1) in BPNNs makes the resulting ANN applicable to systems with an arbitrary number of atoms.
A GCNN 6,19,20 is a combination of a graph neural network and a convolutional neural network. A graph neural network considers the atoms as nodes and the pairwise interactions between them as weighted edges. Node and edge feature vectors are iteratively updated through, e.g., a message passing function. 42 The ingredient of convolution is often realized as a learnable radial filter which gathers information from neighboring atoms within a cutoff radius R_c and creates a feature hierarchy. Because the generation of node features includes the element specificity by construction (see Eq. 2 and Eq. 3 in the next Section), a GCNN has exactly the same subnet for each element.

Pairwise interaction and interaction pooling
In PiNN, we describe both BPNN and GCNN with two abstractions (Fig. 1): the pairwise interaction operation (PI) and the interaction pooling operation (IP). We start by labeling atom i with a set of scalar numbers, or an atomic property P_i. It is sufficient to use the nuclear charge Z_i or a one-hot embedding of the element as the starting point. Then, we create the pairwise interaction I_ij as a function of the initial atomic properties of two atoms and their distance r_ij:

I^{t+1}_ij = PI(P^t_i, P^t_j, r_ij),   (2)

where t is an iterator. For BPNN, t = 0, and for GCNN, t + 1 goes up to the number of graph convolution (GC) blocks (Fig. 1). The opposite of the PI operation is IP. Namely, this operation creates an atomic property from all the pairwise interactions associated with that atom. This is done by passing the summation over all the pairwise interactions to another function called IP:

P^{t+1}_i = IP(Σ_j I^{t+1}_ij).   (3)

The summation ensures the permutation invariance of the generated atomic property.
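As a toy illustration of these two operations, the following sketch builds a scalar PI + IP update for one atom and checks that the summation in the IP step makes the result invariant to the ordering of neighbors. The functional forms `pi_op` and `ip_op` are hypothetical placeholders, not PiNN's actual layers:

```python
import numpy as np

def pi_op(p_i, p_j, r_ij):
    """Toy pairwise-interaction (PI) step: a hypothetical smooth function
    of the two atomic properties and the interatomic distance (cf. Eq. 2)."""
    return np.tanh(p_i + p_j) * np.exp(-r_ij ** 2)

def ip_op(interaction_sum):
    """Toy interaction-pooling (IP) step (cf. Eq. 3): a nonlinearity applied
    to the sum over neighbours; the sum enforces permutation invariance."""
    return np.tanh(interaction_sum)

p = np.array([8.0, 1.0, 1.0])          # e.g. nuclear charges as P^0
r = np.array([[0.0, 1.0, 1.6],
              [1.0, 0.0, 1.1],
              [1.6, 1.1, 0.0]])        # pairwise distances

def update_atom(i, neighbour_order):
    """One PI + IP update of atom i, visiting neighbours in a given order."""
    s = sum(pi_op(p[i], p[j], r[i, j]) for j in neighbour_order)
    return ip_op(s)

# the pooled property does not depend on the order of the neighbours
assert np.isclose(update_atom(0, [1, 2]), update_atom(0, [2, 1]))
```

Any smooth PI form works here; it is the summation inside IP that carries the permutation invariance.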
The combination of the PI and IP operations essentially creates an updated atomic property, with information collected from neighboring atoms. This is referred to as atomic fingerprints, continuous-filter convolution or neural message passing in the literature. 9,21,42 Through multiple GC blocks, PI + IP → PI + IP → ···, I_ij also gets updated in this process. The actual forms of the IP and PI functions differ in each ANN, and this gives the freedom to create novel architectures (Fig. 1).
To illustrate the many-body representation in GCNN, we use a single water molecule as an example of how three-body interactions are generated from a two-body representation, as shown in Fig. 2. In the embedding stage, the water molecule turns into a directed graph with node features P^t_i and edge features I^t_ij for i = 1, 2, 3 and j = 1, 2, 3. It is worth noting that I^t_ij is a high-dimensional feature vector and that I^t_ij is not necessarily the same as I^t_ji. At t = 0, the PI operation takes the node features P^0_i, P^0_j and the bond r_ij to generate the interaction I^1_ij. Subsequently, the IP operation updates the node feature P^1_i by summing all interactions centered at atom i. P^1_i is equivalent to a vector of radial symmetry function values for atom i in BPNNs. At t = 1, the PI operation generates a new interaction I^2_ij by repeating the same procedure, which is followed by another IP operation. Importantly, P^2_i is not only a three-body function but also a unique representation of the local chemical environment of atom i. Moreover, the pairwise interaction I^2_ij is already a three-body function in GCNN, so the inclusion of explicit triplewise interactions I_ijk is possible but unnecessary.
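This emergence of angular information can be checked numerically. In the toy model below (again with hypothetical `pi_op`/`ip_op` forms, not PiNet's layers), the oxygen feature of a water molecule is insensitive to the H-O-H angle after one GC block, since only O-H distances enter, but becomes angle-dependent after two blocks, because the H-H distance has flowed into the hydrogen features:

```python
import math

def pi_op(p_i, p_j, r):
    """Hypothetical pairwise interaction of two scalar properties."""
    return math.tanh(p_i + p_j) * math.exp(-r)

def gc_block(props, coords):
    """One PI + IP update (toy scalar version) for every atom."""
    out = []
    for i in range(len(coords)):
        s = sum(pi_op(props[i], props[j], math.dist(coords[i], coords[j]))
                for j in range(len(coords)) if j != i)
        out.append(math.tanh(s))
    return out

def water(angle_deg, d_oh=0.96):
    """O at origin, two H atoms at fixed O-H distance and a given angle."""
    a = math.radians(angle_deg)
    return [(0.0, 0.0, 0.0),
            (d_oh, 0.0, 0.0),
            (d_oh * math.cos(a), d_oh * math.sin(a), 0.0)]

p0 = [8.0, 1.0, 1.0]                      # nuclear charges as P^0
bent, wide = water(104.5), water(120.0)   # same bonds, different angle

o_after_1_bent = gc_block(p0, bent)[0]
o_after_1_wide = gc_block(p0, wide)[0]
assert math.isclose(o_after_1_bent, o_after_1_wide)   # two-body only

o_after_2_bent = gc_block(gc_block(p0, bent), bent)[0]
o_after_2_wide = gc_block(gc_block(p0, wide), wide)[0]
assert not math.isclose(o_after_2_bent, o_after_2_wide)  # angle resolved
```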
To close this section, we note that although multiple GC blocks are capable of generating multi-body representations, there is no general proof regarding the uniqueness of GCNN representations. To the authors' knowledge, the closest example shows that GCNNs can be as powerful as the Weisfeiler-Lehman algorithm in detecting isomorphic graphs. 43

Figure 2: Illustrating how three-body representations for a water molecule are generated from two-body representations and graph convolution blocks in GCNN. I^t_ij and P^t_i are the pairwise interaction between atoms i and j and the node feature vector of atom i, respectively. t is the iterator over graph convolution blocks.

Architecture of PiNet
With the formalism of PI and IP, we designed and implemented a GCNN variant called PiNet.This network is motivated by the aim to make the activations (pairwise interactions and atomic properties) of ANNs interpretable.
The main idea behind PiNet is to define the pairwise interaction as a function of distance whose exact form depends on the atomic properties of both interacting atoms. In other words, the weight matrix W_ij used for generating the pairwise interaction I_ij depends on both atomic properties P_i and P_j. This gives each component of the pairwise interaction I_ij a different radial dependence, which is unique to PiNet and differs from the common approach of using a single radial-dependent filter function (attention mask). 21,23,24 In PiNet, the PI operation is split into three steps (Figure 1c): (i) expressing the interatomic distances in a radial basis e_ij; (ii) activation through the PI "layer", a feed-forward NN generating a weight matrix W_ij from the atomic properties P_i and P_j; (iii) activation through the II (Interaction-to-Interaction) layer, a feed-forward NN using the information from the previous two steps as input and generating the interaction I_ij.
The radial basis e_ij for the pair of atoms i and j is calculated as n_basis Gaussian functions multiplied by a cutoff function f_c:

e_ij,n = exp(−η (r_ij − r_n)²) · f_c(r_ij),   n = 1, …, n_basis.

The centers of the Gaussian functions r_1, r_2, …, r_{n_basis} are chosen to be evenly spaced between 0 and the cutoff radius R_c, and the hyperparameter η determines the width of the Gaussian functions. We have chosen the cutoff function f_c given in Ref. 8,

f_c(r) = 0.5 [cos(π r / R_c) + 1] for r ≤ R_c and f_c(r) = 0 otherwise,

which ensures that the interaction and its gradient vanish at the cutoff radius R_c. The PI layer generates the weight matrix W_ij from the concatenated atomic properties P_i and P_j. The II layer then calculates I_ij by means of a different feed-forward NN. Since the radial basis e_ij decays to zero beyond the cutoff radius and all biases are set to zero in the II layer, the smoothness of the PES in PiNet is guaranteed, in contrast to other approaches where the interaction is generated directly from the distance and the atomic properties. 22,25 It is worth mentioning that although PiNet does not include a triplewise filter I_ijk in the current implementation, angular information is nonetheless captured through the iteration of the graph convolution block.
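The radial basis and cutoff can be sketched in a few lines; the values of R_c, n_basis and η below are illustrative choices, not PiNN defaults:

```python
import math

def f_cut(r, r_c):
    """Cosine cutoff: both the value and the first derivative vanish at r = r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def radial_basis(r_ij, r_c=4.5, n_basis=10, eta=3.0):
    """Gaussian radial basis multiplied by the cutoff, with centers evenly
    spaced between 0 and r_c (a sketch of the e_ij construction)."""
    centers = [k * r_c / (n_basis - 1) for k in range(n_basis)]
    fc = f_cut(r_ij, r_c)
    return [math.exp(-eta * (r_ij - r_n) ** 2) * fc for r_n in centers]

e = radial_basis(2.0)
assert len(e) == 10
# the whole basis, and hence the interaction built from it, vanishes at R_c
assert all(abs(x) < 1e-12 for x in radial_basis(4.5))
```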
After the PI operation, the updated atomic property P_i is calculated from all pairwise interactions I_ij as part of an IP operation (Eq. 3). Before passing this atomic property on to the next GC block in the iteration, a PP (Property-to-Property) layer (another feed-forward NN) is used for further refinement, see Figure 1b.

Modularized ANN
PiNN is modularized so as to cater to the needs of different users. The task of building an ANN is split into three stages: dataset preparation, ANN definition and model definition. As shown in Fig. 3, PiNN primarily consists of three modules corresponding to these three stages: input/output (io), networks and models. The network does not specify the physical meaning of atomic predictions; that is instead defined through the loss function in a model. The model also contains hyperparameters such as the learning rate and the regularization.
This design enables a user to easily import an arbitrary dataset without touching the code base. Similarly, researchers interested in a new ANN architecture can implement it as one of the "networks" and use the remaining modules to test its performance. Finally, the "models" module can be extended to use the existing networks for different property predictions (e.g. dipole moments and partial charges in the Case Studies), or to interface with other external codes. Notably, this modularized structure also decouples the implementation of common neural network building blocks, such as activation functions and optimization algorithms, from the ANN architecture. For example, a number of activation functions (e.g. tanh, logistic or softplus) can be chosen in PiNN as adjustable hyperparameters of the network architecture.

Dataset preparation
PiNN is implemented on top of TensorFlow's estimator API, 44 which requires the training data to be represented with the dataset class.
To enable easier utilization of data from different sources, we provide the functionality of creating custom dataset loaders. Given a list of data files and a reader function, the dataset loader can be used to split the dataset, as is commonly required for neural network training. Similar procedures can be applied to trajectory files or databases. Further instructions for building datasets are provided in the documentation. 45 Several types of dataset loaders are provided, e.g. for the QM9 dataset, 46 the ANI-1 dataset, 47 Numpy 48 formatted datasets, ASE 49 databases, RuNNer-format 9,50 datasets and CP2K 51 trajectories in XYZ format.
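Schematically, a custom loader boils down to a reader function applied to a shuffled and split file list. The names `read_xyz_frame` and `load_split` below are hypothetical illustrations, not PiNN's actual API:

```python
import random

def read_xyz_frame(path):
    """Hypothetical reader: parse one structure file into a dict of arrays.
    Parsing is faked here; a real reader would fill coordinates and elements."""
    return {"path": path, "coord": [], "elems": []}

def load_split(files, reader, splits=(0.8, 0.2), seed=0):
    """Shuffle the file list, split it into train/validation subsets
    and apply the reader to each subset."""
    files = list(files)
    random.Random(seed).shuffle(files)
    n_train = int(splits[0] * len(files))
    train_files, vali_files = files[:n_train], files[n_train:]
    return ([reader(f) for f in train_files],
            [reader(f) for f in vali_files])

train, vali = load_split([f"frame_{i}.xyz" for i in range(10)], read_xyz_frame)
assert len(train) == 8 and len(vali) == 2
```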
In addition, the dataset objects can be saved in TensorFlow's tfrecord file format, for fast reading and for serving the dataset from a remote file system.

Interfaces with ASE and AMS
PiNN comes with interfaces connecting the trained neural network potential to ASE 49 through its calculator class (see Fig 4), and to a development version of AMS 52 as an external engine.
Periodic boundary conditions are seamlessly supported. The neighbor list and pairwise distances of periodic systems are efficiently calculated using a cell-list algorithm 53 that we implemented in TensorFlow.
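The idea of the cell-list algorithm can be sketched as follows: bin atoms into cells at least as wide as the cutoff, then compare each atom only with atoms in the 27 surrounding cells, giving roughly linear scaling with system size. This is a plain-Python sketch for a cubic periodic box, not the TensorFlow implementation in PiNN:

```python
from collections import defaultdict

def cell_list_neighbors(coords, box, r_c):
    """Neighbour pairs within r_c in a cubic periodic box of edge `box`,
    found via a cell list (minimum-image convention for distances)."""
    n_cells = max(1, int(box // r_c))     # cell edge box/n_cells >= r_c
    cell_len = box / n_cells
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (int(x / cell_len) % n_cells,
               int(y / cell_len) % n_cells,
               int(z / cell_len) % n_cells)
        cells[key].append(idx)

    def min_image(d):
        return d - box * round(d / box)

    pairs = set()
    for (cx, cy, cz), members in list(cells.items()):
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    other = ((cx + dx) % n_cells,
                             (cy + dy) % n_cells,
                             (cz + dz) % n_cells)
                    for i in members:
                        for j in cells.get(other, ()):
                            if j <= i:
                                continue
                            r2 = sum(min_image(a - b) ** 2
                                     for a, b in zip(coords[i], coords[j]))
                            if r2 < r_c ** 2:
                                pairs.add((i, j))
    return sorted(pairs)

coords = [(0.1, 0.1, 0.1), (1.0, 0.1, 0.1), (4.9, 0.1, 0.1)]
pairs = cell_list_neighbors(coords, box=5.0, r_c=1.2)
assert (0, 1) in pairs and (0, 2) in pairs  # (0, 2) found via the periodic image
```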
With either ASE or AMS, the PiNN implementations of BPNNs and PiNet can be used to run, for example, geometry optimizations and molecular dynamics (MD) simulations of both gas-phase and condensed-phase systems. The simulation trajectories can be visualized using the respective graphical user interfaces. Moreover, the implementation of PiNN as an ASE calculator allows it to be easily interfaced with other codes. For example, the path-integral molecular dynamics (PIMD) code i-PI 54 can communicate with ASE calculators via a socket protocol, allowing PIMD simulations to be run with PiNN energies and forces.

Pressure calculations
Computing the stress tensor requires attention, especially when the potential is not pairwise-additive. 55 Although an ANN potential is not pairwise-additive in form, the stress tensor calculated using the pairwise form F_ij · r_ij yields the same result as the atomic form given in Ref. 55. These results were also validated against the finite difference of the potential energy with respect to a change of the volume. Therefore, the instantaneous pressure P(t) during an MD simulation is calculated as

P(t) = (1/3Ω) [ Σ_i |p_i|²/m_i − Σ_i Σ_{j>i} (∂E/∂r_ij) · r_ij ],   (8)

where ∂E/∂r_ij is the derivative of the potential energy with respect to the pairwise displacement vector r_ij, 56 p_i is the momentum of atom i, m_i is the mass of that atom and Ω is the system volume. The first term on the right-hand side of the equation is the ideal gas contribution and the second term is the virial pressure output by PiNN.
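The equivalence between the pairwise virial and the volume derivative of the energy can be illustrated with a toy pair potential; the exponential potential below is purely illustrative, not an ANN:

```python
import math

def pair_energy(r):
    """Toy repulsive pair potential (illustrative only)."""
    return math.exp(-r)

def total_energy(coords):
    e = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            e += pair_energy(math.dist(coords[i], coords[j]))
    return e

def virial_pressure(coords, volume):
    """Virial term -(1/3V) * sum_{i<j} (dE/dr_ij) . r_ij for the toy potential."""
    vir = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            dEdr = -math.exp(-r)   # scalar derivative of the pair energy
            vir += dEdr * r        # (dE/dr_ij) . r_ij for a central potential
    return -vir / (3.0 * volume)

def fd_pressure(coords, volume, h=1e-6):
    """P = -dE/dV via a uniform scaling of all coordinates (finite difference)."""
    def scaled_energy(v):
        s = (v / volume) ** (1.0 / 3.0)
        return total_energy([[s * x for x in c] for c in coords])
    return -(scaled_energy(volume + h) - scaled_energy(volume - h)) / (2 * h)

coords = [[0.0, 0.0, 0.0], [1.1, 0.2, 0.0], [0.3, 1.4, 0.5]]
V = 10.0
assert abs(virial_pressure(coords, V) - fd_pressure(coords, V)) < 1e-6
```

The same consistency check, between the pairwise virial and the finite-difference pressure, is what the text above describes for the ANN potential.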

Visualization of atomic neural networks with PiNNboard
It has been shown that a visualization of the activations can provide insights into their functions in convolutional neural networks. 57 Such a visualization can also serve as a diagnostic tool to inspect and improve network architectures. Therefore, we developed a tool, PiNNboard, to visualize the activations of ANNs in atomic or pairwise form. PiNNboard was implemented as a plugin for TensorBoard, TensorFlow's visualization toolkit. 44 Here we demonstrate PiNNboard with a minimalistic PiNet, trained on a subset of the QM9 dataset containing 50604 organic molecules consisting of only C, H and O atoms. The network was trained for 3 million steps on the internal energy at 0 K (U_0); details about the training setup can be found in the Case Studies section below.
Notably, three independently trained networks (Fig. 6) reach quite different atomic energy partitions as well as different pairwise interactions for the test molecules, in spite of their identical network structure, identical hyperparameters and similar mean absolute errors (MAE). Therefore, one needs to be careful when interpreting atomic energies extracted from ANNs, as has also been pointed out recently. 58 We also notice that the trained network with the smallest prediction error for the test molecule (No. 1 in Fig. 6) also provides the best chemical interpretability. This suggests not only that PiNet is capable of state-of-the-art performance but also that good outcomes from PiNet can be rationalized in a chemically intuitive manner.

Case studies
For all discussed benchmarks, a network with 5 graph convolution (GC) blocks is used; the parameters are given in Table 1. Hyperbolic tangent activation functions were used in all layers, except for the output layer, where a linear activation function is used.
Unless otherwise stated, an 80:20 dataset split was used for the case studies, meaning that 80% of the structures of each dataset were randomly chosen to train the neural network and the remaining 20% were used for validation. The Adam 59 optimizer in TensorFlow 44 was used for gradient descent updates with a batch size of 100 samples; the learning rate was set to 0.0003 and decayed by a factor of 0.994 every 100000 steps, with other parameters kept unchanged. A gradient-norm clipping strategy was employed to avoid exploding gradients. 60 The trainings were terminated after 1-3 million gradient descent steps, which typically takes a day on a single NVIDIA TITAN V GPU card. Rather than comparing against every published GCNN variant, 22-26 SchNet, 21 which pioneered the application of GCNNs to modeling both molecules and materials, will be quoted as the main reference to put our results into perspective.
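The decayed learning rate and the gradient-norm clipping mentioned above can be sketched as follows; this is plain Python mirroring the stated hyperparameters, not PiNN's TensorFlow code:

```python
import math

def learning_rate(step, lr0=3e-4, decay=0.994, decay_steps=100_000):
    """Staircase exponential decay: multiply by `decay` every `decay_steps`."""
    return lr0 * decay ** (step // decay_steps)

def clip_by_global_norm(grads, max_norm):
    """Rescale the whole gradient vector if its L2 norm exceeds max_norm,
    preserving its direction (the usual global-norm clipping strategy)."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

assert learning_rate(0) == 3e-4
assert abs(learning_rate(1_000_000) - 3e-4 * 0.994 ** 10) < 1e-15
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
assert abs(math.hypot(*clipped) - 1.0) < 1e-12
```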

QM9 dataset
The QM9 dataset 46 comprises 134k small organic molecules with computed electronic, energetic and thermodynamic properties at the B3LYP/6-31G(2df,p) level of theory, 61-63 and is often used for benchmarking ANNs. As commonly done in the literature, 3054 structures which failed a consistency check were excluded during training and evaluation. 21,24 PiNet reaches an MAE of 0.012 eV for the prediction of the internal energy at 0 K, in comparison with 0.014 eV for SchNet. It is worth mentioning that the so-called chemical accuracy from thermochemistry measurements is about 0.043 eV.
As an example of property prediction, we used PiNet to predict partial charges by regressing only the molecular dipole moment µ:

µ = | Σ_i q̃_i x_i |,

where q̃_i is the predicted partial charge on atom i and x_i its position. To ensure that the predicted total charge of each molecule is zero, we added a constraint term to the loss function.
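The constraint idea can be sketched as a penalized dipole loss; this is a toy scalar version, not PiNet's actual loss function or network:

```python
import numpy as np

def dipole_loss(q, coords, mu_ref, penalty=10.0):
    """Squared error of the dipole magnitude plus a penalty pushing the
    total charge towards zero (a sketch of the constraint term)."""
    mu = np.linalg.norm(coords.T @ q)          # |sum_i q_i x_i|
    return (mu - mu_ref) ** 2 + penalty * np.sum(q) ** 2

# toy diatomic along x: charges +-q separated by d give dipole q*d
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
q = np.array([0.4, -0.4])
assert np.isclose(dipole_loss(q, coords, mu_ref=0.6), 0.0)
```

A neutral charge set reproducing the reference dipole drives the loss to zero; any net charge is penalized regardless of the dipole fit.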
By implementing this dipole model in PiNet, we predicted the dipole moment with an MAE of 0.018 D for the QM9 dataset. The network used to predict the dipole for the QM9 dataset was trained with a learning rate of 0.0001 and a batch size of 200 structures instead.
To further validate the PiNet dipole model, we also calculated partial charges using the CM5 charge model, which has been parameterized to reproduce dipole moments from experiments or high-level quantum mechanical calculations. 64 The CM5 charges were calculated at the B3LYP/6-31G(2df,p) level of theory using the Gaussian package, 65 matching the original conditions under which the QM9 dataset was generated. Note that CM5 charges were not used in the training of PiNet. When comparing the partial charges predicted by PiNet with those from CM5, 64 a good correlation was found, as shown in Fig. 7, indicating that PiNet can generate physically meaningful partial charges. This result is particularly encouraging in light of the fact that only the scalar dipole moment was used during the fitting. 66

Materials Project and Perovskite datasets
We used the dataset ("MP-crystals-2018.6.1") provided by MEGNet, 25 which contains DFT-computed energies and band gaps for 69640 crystals extracted from the Materials Project. 67 Training of PiNet was done with 60000 crystal structures from this dataset. The trained PiNet yields an MAE of 0.029 eV/atom for the prediction of the formation energy of the test configurations, in comparison with 0.035 eV/atom for SchNet using the same number of structures in the training set. To put these numbers into perspective, we note that the accuracy of experimental measurements of formation energies is about 0.082 eV/atom. 68 Thus, PiNet also provides sub-chemical accuracy for materials modeling.
In addition, PiNet was benchmarked on a dataset consisting of 18928 perovskite structures published by Castelli et al. 69 The MAE of the total energy relative to the convex hull (used for assessing thermodynamic metastability) achieved by the trained PiNet is 0.042 eV/atom with a 60:40 split of the dataset. This compares with the same MAE obtained by the Crystal Graph Convolutional Neural Network (CGCNN) with an 80:20 split instead. 70 The learning curve of PiNet for this dataset is shown in Fig. 8 (on a logarithmic scale).

Liquid water dataset
Here we showcase the application of the BPNN implemented in PiNN to liquid water, using the dataset published by Morawietz et al. 71 based on the BLYP functional. 62,72 To facilitate the training, we also augmented this dataset using the original BPNN implementation 71 (see the Supporting Information for details). The set of symmetry functions was chosen to match the original one; 16 however, hyperparameters such as the learning rate and the optimizer are specific to PiNN. MD simulations were performed with the Berendsen thermostat 73 implemented in ASE and a patched Berendsen barostat (to include the ideal gas contribution missing in Eq. 8; for details see Ref. 45).
After training, the BPNN implemented in PiNN reaches a root mean squared error (RMSE) of 7 meV/H_2O for the energy and 60 meV/Å for the force components. These can be compared to 2 meV/H_2O and 70 meV/Å for energy and force, respectively, reported in Ref. 16. To validate this BPNN potential, we further carried out ab initio molecular dynamics (AIMD) simulations of the liquid water system in both the NVT (constant particle number, volume and temperature) and NPT (constant particle number, pressure and temperature) ensembles with CP2K 51 and the BLYP functional. 62,72 Details of the AIMD simulations can be found in the Supporting Information.

Proton transfer reactions
Finally, we demonstrate that PiNet can be applied to reactive MD simulations in which covalent bonds break and form.In particular, we take the example of proton transfer reactions in aqueous NaOH solutions.
We reused and slightly modified the dataset for NaOH(aq) solutions from Ref. 74, generated with the RPBE density functional 75 and Grimme's D3 dispersion correction. 76 This dataset was originally constructed for use with BPNN, and the resulting BPNN potential was tested for various thermodynamic and dynamical properties of NaOH solutions. 77,78 In this case, we used smaller layers in PiNet (with 16 nodes per layer rather than 64) than in the other case studies to prevent overfitting, and optimized the parameters primarily to minimize the error in the predicted forces. We obtained an excellent RMSE of 0.11 eV/Å for the force components for both the training and validation sets, indicating that the fit did not suffer from overfitting.
Environment-dependent proton transfer free energy profiles for NaOH solutions were calculated with PiNet (Figure 10). For each OH⁻, the proton transfer coordinate δ_min is calculated as the difference in length between the hydrogen bond along which the proton is transferred and the covalent O-H bond that is broken. 79 The corresponding equilibrium MD simulations were run using an interface to the Amsterdam Modeling Suite (AMS), 52 and the technical settings of the MD simulations were chosen to be the same as in Ref. 74 for the sake of comparison.
Figure 10 illustrates how the free-energy landscape for proton transfer depends on the local hydrogen-bonding environments around OH⁻ and H_2O. Just as revealed in previous work using BPNNs 74 and ab initio simulations, 79 proton transfer occurs predominantly via a presolvation mechanism: OH⁻ mostly accepts four hydrogen bonds in its equilibrium structure, but the forward PT barrier is quite high in this case (blue curve). If, instead, the OH⁻ accepts only three hydrogen bonds via a hydrogen bond fluctuation (red curve), the forward PT barrier is much smaller.
Figure 11 shows the time required to evaluate the energy and forces for different system sizes of liquid water (averaged over 100 samples), using the PiNet model parameterized for NaOH. This highlights one of the appealing features of ANNs, namely that the computational cost grows linearly with the number of atoms.

Conclusion
Here we present PiNN -an open-source Python library for building ANNs of molecules and materials.In the current version of PiNN, BPNN and our GCNN variant PiNet have been implemented and benchmarked against several publicly available datasets as well as in-house data.
Built with the TensorFlow framework, PiNN allows for fast training of ANNs with GPUs. PiNN interfaces with ASE and AMS and provides several useful package features, such as analytical stress tensor calculations and a visualizer of trained ANNs called PiNNBoard.
With modularized building blocks, PiNN can be used not only as a standalone package but also as a chain of tools to develop novel ANNs.In this work, we showed how such ANNs can be used for approximating potential energy surfaces, allowing for fast and accurate reactive MD simulations, or for directly predicting several different physico-chemical properties of molecules and materials, such as dipole moments and partial charges.
In the current implementation of PiNN, the primary focus is on predicting atomic properties. In the near future, we aim to include predictions of electronic properties such as the polarization, as well as the coupling to an external field in periodic systems. These extensions will be quite important for modeling electrochemical systems with finite-field MD simulations. 80,81 Moreover, we are interested in incorporating active learning and on-the-fly potential generation.
Last but not least, PiNN will take advantage of the evolving TensorFlow ecosystem, which gives access to novel optimization methods implemented in its peripheral packages, such as K-FAC (Kronecker-Factored Approximated Curvature) 82 for fast training of ANNs.

Liquid water dataset

The reference structures of liquid water were calculated at the BLYP level of theory to validate the BPNN implementation in PiNN. To aid the fitting procedure, we augmented the dataset with 2841 structures of liquid water distributed in the same density range. We used the original BPNN from Morawietz et al. S1 implemented in the RuNNer code S3,S4 to run molecular dynamics simulations of liquid water, and extracted random snapshots from these simulations for inclusion in the training set. The reference data, to which we trained our own BPNN implemented in PiNN, thus contained energies and forces calculated using DFT as well as using the original BPNN from Morawietz et al., which has been shown to have first-principles quality.

AIMD simulations of liquid water
The electronic structure of liquid water was solved applying DFT in the BLYP approximation S5,S6 as implemented in CP2K. S7 Triple-ζ basis sets with two additional polarization functions (TZV2P) and a charge density cutoff of 600 Ry were used. Core electrons were taken into account using the dual-space Goedecker-Teter-Hutter (GTH) pseudopotentials. S8 The model system consisted of 64 water molecules in a cubic box of length 12.432 Å as the initial condition.
For the NVT simulation, the Bussi-Donadio-Parrinello thermostat S9 was applied to keep the temperature at 330 K. For the NPT simulation, the Martyna-Tuckerman-Tobias-Klein barostat S10 was employed as well. In both simulations, the time step was chosen to be 0.5 fs.
Trajectories were collected for 20 ps in each case, which has been shown to be sufficient to obtain a reliable estimate of the density.S11
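From such trajectories, the oxygen-oxygen radial distribution function (cf. Fig 9) can be estimated by histogramming O-O distances under the minimum-image convention and normalizing against the ideal-gas pair count. A compact numpy sketch for a cubic box (illustrative, not the analysis code used in this work):

```python
import numpy as np

def rdf(positions_frames, box, r_max, n_bins=100):
    """Radial distribution function g(r) for one species in a cubic box.

    positions_frames: array (n_frames, n_atoms, 3), e.g. oxygen positions.
    box: cubic box side length; r_max should not exceed box / 2.
    """
    n_frames, n_atoms, _ = positions_frames.shape
    edges = np.linspace(0.0, r_max, n_bins + 1)
    hist = np.zeros(n_bins)
    for pos in positions_frames:
        diff = pos[:, None, :] - pos[None, :, :]
        diff -= box * np.round(diff / box)        # minimum-image convention
        r = np.sqrt((diff ** 2).sum(-1))
        r = r[np.triu_indices(n_atoms, k=1)]      # unique pairs only
        hist += np.histogram(r, bins=edges)[0]
    # Normalize by the expected pair count in each spherical shell.
    rho = n_atoms / box ** 3
    shell = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    ideal = rho * shell * n_atoms / 2.0 * n_frames
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, hist / ideal
```

For uncorrelated (ideal-gas-like) positions the normalization yields g(r) ≈ 1 at all distances, which provides a simple sanity check of the implementation.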

Figure 1: Illustration of the abstractions of the BPNN and GCNN frameworks in PiNN, for calculating atomic energies E i (Eq. 1). Operations containing trainable variables are filled with yellow. Operations inside the dashed box are not yet implemented in PiNN but extendable. See Section "Pairwise Interaction and Interaction Pooling" for more explanations.
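Both frameworks in Figure 1 share the atomic decomposition of Eq. 1: the total energy is a sum of per-atom contributions produced by element-dependent output networks acting on learned atomic features. The toy sketch below illustrates only this bookkeeping, not PiNN code; the per-element "networks" here are arbitrary linear functions standing in for trained multilayer perceptrons.

```python
import numpy as np

# Toy per-element "networks": map an atomic feature vector to an atomic
# energy. In a real ANN these are trained multilayer perceptrons.
element_nets = {
    1: lambda p: float(p.sum()) * 0.1,   # H
    8: lambda p: float(p.sum()) * 0.5,   # O
}

def total_energy(elements, properties):
    """Eq. 1: E = sum_i E_i, where E_i depends on the element Z_i."""
    return sum(element_nets[z](p) for z, p in zip(elements, properties))

# A "water molecule" with three-dimensional atomic feature vectors.
Z = [8, 1, 1]
P = np.ones((3, 3))
print(total_energy(Z, P))  # 0.5*3 + 0.1*3 + 0.1*3, approximately 2.1
```

This decomposition is what makes the models size-extensive: adding atoms adds terms to the sum rather than changing the network architecture.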

Figure 3: Illustration of the modularized structures in PiNN.

Figure 4: Code example for using a trained model as an ASE calculator.

Figure 5: Visualization of a minimalistic PiNet for a 2,3-dimethylfuran molecule, generated directly with PiNNboard. For simplicity, only one graph convolution block without PP and II layers was used. The atoms and bonds in each box indicate the normalized activations of the atomic properties and pairwise interactions, respectively. Dashed lines show the normalized trainable weights. Activations of pairwise interactions with an absolute value smaller than 0.3 are not shown for the sake of clarity.

In Fig 5, the activations of this minimalistic PiNet for the test molecule 2,3-dimethylfuran are visualized using PiNNboard. Colored atoms indicate the contributions of each individual atom to the atomic properties. Colored bonds between atoms indicate the pairwise interactions whose contributions are significant. Indeed, most of the interactions identified by the PI layer in PiNet can be recognized as covalent bonds in Fig 5. Interestingly, strong activations in the pairwise interactions are also observed between hydrogen atoms and their second neighbors in the case of the methyl groups.

Figure 6: Activations of the graph convolution and output layers after three independent trainings of the minimalistic PiNet shown in Fig 5. The prediction error in U 0 for the test molecule and the mean absolute error (MAE) on the validation set are also listed.

Figure 7: Correlation of the partial charges predicted by PiNet with those calculated from CM5 for the QM9 dataset.

Figure 8: MAE of the energy above the hull predicted by PiNet as a function of the number of training configurations for the Perovskites dataset.

Figure 9: Calculated oxygen-oxygen radial distribution function of liquid water at a) 330 K (NVT) and b) 330 K and 1 bar (NPT), using the BPNN implemented in PiNN and AIMD. The level of theory is BLYP.

Figure 10: PiNet-calculated proton transfer free energy profile in 2.6 mol/L NaOH(aq) for OH - and H 2 O accepting different numbers of hydrogen bonds. The numbers beneath the molecules in the legend at the top refer to the total number of accepted hydrogen bonds. Here, we used a common definition of a hydrogen bond, for which the O-O distance is smaller than 3.5 Å and the hydrogen-bonding angle is smaller than 30 degrees.

Figure 11: Computational cost as a function of system size for the NaOH PiNet model on a single GPU card.

Acknowledgement
CZ is grateful to Uppsala University for a start-up grant, to the Swedish Research Council for a starting grant (No. 2019-05012) and to the Swedish National Strategic e-Science program eSSENCE for funding. MH received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 798129. Support from the NVIDIA Corporation GPU grant program and a Google Cloud Platform (GCP) research credits award is also gratefully acknowledged. Part of the simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at UPPMAX and PDC.

Table 1: Network parameters used in the Case Studies.
b The PI layer does not contain any hidden layers, and the output dimension of the PI layer equals the number of elements in W ij.