Knowledge Graph Approach to Combustion Chemistry and Interoperability

In this paper, we demonstrate through examples how the concept of a Semantic Web-based knowledge graph can be used to integrate combustion modeling into cross-disciplinary applications and, in particular, how inconsistency issues in chemical mechanisms can be addressed. We discuss the advantages of linked data, which form the essence of a knowledge graph, and how we implement them in a number of interconnected ontologies, specifically in the context of combustion chemistry. Central to this is OntoKin, an ontology we have developed for capturing both the content and the semantics of chemical kinetic reaction mechanisms. OntoKin is used to represent example mechanisms from the literature in a knowledge graph, which is itself part of an existing, more general knowledge graph and of an ecosystem of autonomous software agents acting on it. We describe a web interface that allows users to interact with the system, upload mechanisms, compare existing ones, and query species and reactions across the knowledge graph. The utility of the knowledge-graph approach is demonstrated for two use-cases: querying across multiple mechanisms from the literature and modeling the atmospheric dispersion of pollutants emitted by ships. As part of the query use-case, our ontological tools are applied to identify variations in the rate of a hydrogen abstraction reaction from methane as represented by 10 different mechanisms.


■ INTRODUCTION
Modeling combustion in devices as part of relevant applications, such as pollution prediction, necessarily covers multiple domains. As an example, consider the prediction of emissions from ships, which involves, at the very least, a fuel model, an internal combustion engine model, data on wind direction and speed, an atmospheric dispersion model, and terrain and building models. In practice, this requires compatibility of data obtained from various sources in different formats and seamless interaction between various pieces of software: in short, interoperability.
Chemical kinetic fuel models, i.e., reaction mechanisms, form an essential part of any simulation of emissions from a combustion device but may not always be readily available for a particular fuel of interest and thus may need to be created from existing databases. This can be achieved, for example, either through automated mechanism generation tools 1 or by assembling subsets of species and reactions from one or more previously published mechanisms.
When trying to assemble a mechanism by combining collections of species and reactions from multiple sources, one encounters two well-known classes of consistency problems. 2,3 The first relates to unique identification: what should be one and the same species may have been given different names or labels in models originating from different sources. Conversely, species that ought to be distinct may have been given identical labels in different mechanisms. The second problem relates to data inconsistency: the same species or reaction from different sources may have been assigned different thermodynamic or kinetic parameter values, at times with variations well beyond the reported uncertainties.
What the two seemingly unrelated challenges of interoperability and consistency have in common is that they can both be addressed at the same time using ideas from the Semantic Web. 4 The Semantic Web offers the ability to connect previously isolated pieces of data, associate meaning with them, and represent knowledge extracted from them. It is this collection of entities and the connections between them that defines the knowledge graph. Autonomous software agents 5 can then navigate this graph to manipulate it and to interact with human and machine users.
A natural way to implement a knowledge graph is by means of ontologies, 6 i.e., collections of entities and the relationships between them. There have been several attempts to build a Chemical Semantic Web 7 using chemical ontologies 8 that represent elements and substances, in response to the increasing interest in generating knowledge from chemical data and in facilitating data sharing. A number of ontologies have been developed to capture and represent the semantics and knowledge of chemicals and chemical interactions at different levels of granularity. OntoCAPE (Ontology for Computer Aided Process Engineering) 9 was developed as a formal ontology for modeling chemical processes, including the concepts (classes) of elements, species, and reactions. In addition, a number of cross-domain ontologies that cover aspects of chemical modeling have been developed. ChEBI 10 is an ontology created for representing concepts and relations belonging to chemistry and biology. PubChemRDF 11 represents the structures and metadata of chemical substances and compounds. Beyond these semantic resources, there are initiatives that have led to well-established chemical databases, such as PubChem, 12 PrIMe, 2 and Reaxys (https://www.reaxys.com).
The J-Park Simulator (JPS, http://www.theworldavatar.com) is an implementation of a universal knowledge graph that uses semantic representation to harness the reasoning and inferencing power of ontologies to perform cross-domain simulations.
The purpose of this paper is to present a proof of principle of how the concept of a knowledge graph can be used to address both the problem of interoperability in cross-domain applications involving combustion and the problem of naming and data inconsistencies in chemical reaction mechanisms. We do so through two examples. In the first, we apply ontological tools we have developed to query across multiple mechanisms from the literature and find inconsistencies in the rate of a hydrogen abstraction reaction from methane as represented by 10 different mechanisms. In the second, we integrate kinetic fuel models in the form of mechanisms with an internal combustion engine model, real-time weather and ship location data, and an atmospheric pollutant dispersion model to simulate emissions from ships.
■ KNOWLEDGE-GRAPH APPROACH
World Avatar. The J-Park Simulator (JPS) is an automation-centric implementation of a World Avatar: a decentralized, privacy-aware, extensible system that supports data-driven decision-making using data and models that can be either publicly available or privately owned and that are represented and linked in a knowledge graph (Figure 1). While respecting the access restrictions in place, the approach allows automated intelligent software agents to navigate information objects with different levels of accessibility in order to generate, store, and analyze data, and it enables the interoperability of data and models across multiple domains.
Linked Data 13 is the state-of-the-art approach for building a web of data with semantics. JPS gives data structure and semantics by means of a knowledge graph that is built on Linked Data principles and uses ontologies. This allows the representation both of data, encompassing empirically observed results and calculated outputs, that record the state of a system and of the models involved (both physics- and data-based) that characterize the system as a function of its state and other model parameters. JPS facilitates the automation of tasks via an ecosystem of computational and representational agents (of various types, 14 featuring behaviors 15 including simple, composite, sequential, and parallel) that operate on the knowledge graph. The OntoAgent ontology 16 provides the logical infrastructure and the coverage of concepts and properties needed to codify agents.
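To make the Linked Data idea concrete, the sketch below stores a handful of facts as subject−predicate−object triples and answers a query by matching triple patterns with wildcards, which is the basic operation a SPARQL engine performs against a triple store. It is a minimal, in-memory illustration under stated assumptions: every IRI and property name in it (under http://example.org/kg#) is hypothetical and not a term of the JPS ontologies, and a real deployment would use an RDF library and a store such as RDF4J.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical namespace and terms, for illustration only (not JPS/OntoKin vocabulary).
EX = "http://example.org/kg#"

GRAPH: Set[Triple] = {
    (EX + "mech1", EX + "containsSpecies", EX + "mech1_CH4"),
    (EX + "mech1", EX + "containsSpecies", EX + "mech1_OH"),
    (EX + "mech2", EX + "containsSpecies", EX + "mech2_CH4"),
    # Links from mechanism-local species nodes to one shared real-world species node.
    (EX + "mech1_CH4", EX + "denotes", EX + "methane"),
    (EX + "mech2_CH4", EX + "denotes", EX + "methane"),
}


def match(pattern: Pattern, triples: Set[Triple]) -> Iterator[Triple]:
    """Yield each triple that agrees with the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def mechanisms_containing(species: str, triples: Set[Triple]) -> Set[str]:
    """Join two patterns: ?inst denotes species . ?mech containsSpecies ?inst ."""
    instances = {s for s, _, _ in match((None, EX + "denotes", species), triples)}
    return {m for m, _, inst in match((None, EX + "containsSpecies", None), triples)
            if inst in instances}
```

Following links from each mechanism-local species node to a shared node is what lets a single query span several mechanisms.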
JPS has been applied to many aspects of Industry 4.0 17 owing to its codification of the operational semantics of models and data. An example is the development of process optimization solutions for the Eco-Industrial Park (EIP) on Jurong Island in Singapore. An EIP consists mainly of product manufacturers and service providers that collaborate to address issues related to CO2 footprint and particulate emissions and to recover and reuse waste materials and heat, thereby achieving environmental and economic benefits. 18 An EIP may involve the supply of recovered waste heat to district heating, material exchanges, energy systems, and wastewater treatment networks, which can be modeled at different levels, such as unit operations, processes, plants, and networks, and optimized for improved performance. 19, 20 A number of ontologies that connect seamlessly with the relevant branches of OntoCAPE 21 have been developed for JPS. These include OntoEIP, 22 designed for resource and transportation networks and chemical process plants; an EIP energy system ontology, 23 built for a decision-making system that integrates data from heterogeneous sources; and a biodiesel plant ontology, 24 built for simulating and optimizing biodiesel production.
The work described in this paper is positioned within this context. It addresses the needs of JPS by developing an ontology to represent chemical mechanisms and by integrating the corresponding data into the JPS knowledge graph. This supports the automation of processes within JPS by enabling intelligent agents to query and manipulate the knowledge graph, and thus to search for and retrieve mechanisms for a given task.
ACS Omega http://pubs.acs.org/journal/acsodf Article
OntoKin, OntoCompChem, and OntoSpecies. OntoKin 25 is a chemical ontology specialized for representing and managing chemical kinetic reaction mechanisms. OntoKin uses Description Logic (DL) to include the semantics of chemical data in its representation of reaction mechanisms. This offers advantages such as interoperability between chemical kinetic systems, the ability of agents to comprehend chemical mechanisms automatically, the capability to perform complex semantic queries on mechanisms in the Web environment, and easy detection of thermodynamic, transport, and reaction data inconsistencies across mechanisms. OntoCompChem 26 is an ontology for quantum chemistry calculations. It is an extension of the Gainesville Core 27 ontology and CompChem. 28 The goal of OntoCompChem is to add DL-based semantics of chemical data to computational chemistry calculations. This enables interoperability between quantum chemistry software, allows automated agents to understand such calculations, and reduces the consumption of computational resources through the reuse of already performed calculations.
OntoSpecies is an ontology designed to capture both generic and domain-specific information about species, such as empirical formula, molecular weight, and standard enthalpy of formation. The ontology focuses on linking quantum chemistry calculations represented in OntoCompChem with reaction mechanisms codified in OntoKin. Owing to its generic structure, it can be used to map existing species databases, and it is suitable for harvesting and curating species data to build high-quality species resources.
Figure 2 illustrates the three ontologies with a small subset of their concepts, data properties, and relations that are the building blocks of the knowledge graph. For OntoKin, the figure shows the Mechanism, Species, and Thermo Model concepts. The ontological model of the Mechanism concept consists of the data and metadata of a mechanism. The Species concept includes the data properties and relations of a chemical species. The Thermo Model concept defines the structure of the thermodynamic models required for a species. The hasQuantumCalculationIRI data property holds an Internationalized Resource Identifier (IRI) that connects the thermodynamic model to computational chemistry calculations of a species. The hasUniqueSpeciesIRI data property holds an IRI that connects a species in a mechanism to its corresponding representation in OntoSpecies. The OntoKin ontology is available at http://www.theworldavatar.com/ontology/ontokin/OntoKin.owl.
Figure 2 also depicts the G16, Geometry Optimization, Molecule, and Atom concepts of OntoCompChem. The G16 concept is an ontological model for the representation of electronic structure calculations, while Geometry Optimization represents the molecular geometry of both stable minima and transition-state species. The hasCoordinates object property is used to codify the three-dimensional (3D) geometry of a molecule.
The hasUniqueSpeciesIRI data property links the computational chemistry calculations of a species to its corresponding representation in OntoSpecies by means of an IRI. The OntoCompChem ontology is available at http://theworldavatar.com/ontology/ontocompchem/ontocompchem.owl. Figure 2 includes the Species, Empirical Formula, Element Number, and Element concepts of OntoSpecies. The Species concept is designed to model a real-world species. Element defines the ontological structure used to describe a chemical element or an atom, whereas Element Number links a chemical element to its quantity within a species. Among the data properties of OntoSpecies are dc:identifier, which codifies the unique identifier of a species, and skos:altLabel, which codifies its alternative names. Following best practices in ontology development, these properties are reused from Dublin Core (dc) 29 and the Simple Knowledge Organization System (skos), 30 respectively. This modeling choice separates the names of a species from its identity: a species that has multiple names can still be recognized uniquely via its identifier (an approach also taken, for example, by the CAS Registry and PrIMe 2). OntoSpecies thus addresses the species naming issues mentioned in the Introduction, including those involving isomers, by enforcing a unique entry for each real-world species. The OntoSpecies ontology is available at http://www.theworldavatar.com/ontology/ontospecies/OntoSpecies.owl.
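The separation of names from identity can be illustrated with the benzene labels that appear in the query results discussed in this paper (C6H6, A1, and A1-C6H6). The sketch below is a minimal, hypothetical model, not the OntoSpecies schema: each unique identifier carries a set of alternative labels (the role played by skos:altLabel), and each (mechanism, local label) pair points at one identifier (loosely the role of hasUniqueSpeciesIRI). The identifiers hyp:benzene and hyp:methane and the mechanism names mechA to mechC are placeholders.

```python
from typing import Dict, Set, Tuple

# Hypothetical identifiers with their alternative labels (cf. dc:identifier / skos:altLabel).
ALT_LABELS: Dict[str, Set[str]] = {
    "hyp:benzene": {"benzene", "C6H6", "A1", "A1-C6H6"},
    "hyp:methane": {"methane", "CH4"},
}

# (hypothetical mechanism, label used in that mechanism) -> unique species identifier.
SPECIES_LINKS: Dict[Tuple[str, str], str] = {
    ("mechA", "C6H6"): "hyp:benzene",
    ("mechB", "A1"): "hyp:benzene",
    ("mechC", "A1-C6H6"): "hyp:benzene",
    ("mechA", "CH4"): "hyp:methane",
}


def occurrences(identifier: str) -> Dict[str, str]:
    """Return {mechanism: local label} for every mechanism linked to this species."""
    return {mech: label
            for (mech, label), ident in SPECIES_LINKS.items()
            if ident == identifier}


def identifier_for(label: str) -> Set[str]:
    """Candidate identifiers for a name; more than one result flags an ambiguous label."""
    return {ident for ident, labels in ALT_LABELS.items() if label in labels}
```

A query keyed on the identifier finds benzene in all three mechanisms regardless of the local label, while a name lookup that returns several identifiers signals that the label alone cannot be trusted.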
Populating the Knowledge Graph. For this paper, the OntoKin knowledge graph is populated by integrating the ontological representations of 50 arbitrarily chosen, publicly available mechanisms from the literature. The largest mechanism contains more than 2800 species and 18 000 reactions, whereas the smallest contains 14 species and 33 reactions. Deployed in an RDF4J triple store, the knowledge graph comprises a total of over 16 million subject−predicate−object triples.
The agent that creates instances in the knowledge graph when a mechanism is uploaded relies on a conversion agent to convert between CHEMKIN 31 mechanism files and Web Ontology Language (OWL) files. The conversion agent supports the transformation of mechanisms in both directions between CHEMKIN and OWL, which is also used to prove

■ RESULTS AND DISCUSSION
This section introduces two use-cases to show how the OntoKin ontology and the mechanism-integrated JPS knowledge graph can be applied: querying across mechanisms and modeling the atmospheric dispersion of pollutants emitted by ships.
Querying across Mechanisms. OntoKin has been developed to allow any user to upload chemical mechanisms to the JPS knowledge graph and to query the knowledge graph to retrieve and compare species and reaction data. A web-based user interface (UI) demonstrating this is available at http://theworldavatar.com/ontokin. A screenshot of the UI is shown in Figure 3.
The OntoKin system consists of three main components: the UI, a business logic layer, and the underlying JPS knowledge graph. The UI allows users to upload mechanisms in the CHEMKIN format. The business logic layer includes a CHEMKIN-to-OWL conversion agent, an OWL file consistency checking agent, an OWL file-uploading component, and a query component. The conversion agent can assess the validity of a CHEMKIN mechanism. At a minimum, the kinetic mechanism and the thermodynamic data files must be uploaded; transport data and surface chemistry files are optional. If the user-provided files represent a complete mechanism, the converter proceeds with the conversion and reports success or failure. If the conversion succeeds, a consistency check is performed using the HermiT reasoner. If the OWL file passes the consistency check, it is uploaded to the JPS knowledge graph.
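The upload path just described is a chain of gated stages, each of which can stop the process. The sketch below expresses that control flow in Python, with the converter, the consistency check, and the triple-store upload passed in as functions so that it runs without any of the real components. The file-role names, UploadResult, and process_upload are hypothetical; in OntoKin these stages are separate agents and components, with HermiT performing the consistency check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical file-role names; the text only states which files are required or optional.
REQUIRED_FILES = {"kinetics", "thermo"}
OPTIONAL_FILES = {"transport", "surface"}


@dataclass
class UploadResult:
    ok: bool
    stage: str  # stage at which processing stopped ("upload" on success)
    message: str = ""


def process_upload(
    files: Dict[str, str],
    convert: Callable[[Dict[str, str]], Optional[str]],
    is_consistent: Callable[[str], bool],
    store: Callable[[str], None],
) -> UploadResult:
    """Validation -> conversion -> consistency check -> upload, stopping at the first failure."""
    unknown = set(files) - REQUIRED_FILES - OPTIONAL_FILES
    if unknown:
        return UploadResult(False, "validation", f"unexpected files: {sorted(unknown)}")
    missing = REQUIRED_FILES - set(files)
    if missing:
        return UploadResult(False, "validation", f"missing required files: {sorted(missing)}")
    owl = convert(files)  # CHEMKIN -> OWL; None signals a failed conversion
    if owl is None:
        return UploadResult(False, "conversion", "CHEMKIN-to-OWL conversion failed")
    if not is_consistent(owl):  # e.g. a call out to an OWL reasoner such as HermiT
        return UploadResult(False, "consistency", "ontology is inconsistent")
    store(owl)  # e.g. add the resulting triples to the triple store
    return UploadResult(True, "upload")
```

Injecting the stages as functions keeps the gating logic testable on its own and makes it easy to swap, for instance, one reasoner for another.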
The UI allows users to select from a list of predefined queries (see Figure 3). The UI translates the user input into a SPARQL (SPARQL Protocol and RDF Query Language) query that is used to search the knowledge graph. The results are displayed as charts or tables in the UI. The predefined queries allow users to identify the mechanisms that contain a species of interest and to compare the thermodynamic data of a species and the rate coefficients of a reaction across mechanisms.
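To illustrate how a predefined query can be parameterized by user input, the sketch below fills a species label into a SPARQL SELECT template that lists the mechanisms containing that species, escaping the label so that it cannot break out of the string literal. The rdfs: prefix is the standard RDF Schema namespace, but the ex: namespace and the ex:inMechanism property are hypothetical stand-ins; the actual OntoKin class and property names should be taken from the published ontology.

```python
# Standard RDF Schema namespace.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical namespace standing in for the real OntoKin vocabulary.
EX = "http://example.org/hypothetical#"

# SPARQL string-literal escapes (ECHAR) for characters that could break the literal.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def sparql_string(value: str) -> str:
    """Quote a user-supplied value as a double-quoted SPARQL string literal."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in value) + '"'


def mechanisms_with_species(label: str) -> str:
    """Build a SELECT query listing mechanisms that contain a species with this label."""
    return (
        f"PREFIX rdfs: <{RDFS}>\n"
        f"PREFIX ex: <{EX}>\n"
        "SELECT DISTINCT ?mechanism WHERE {\n"
        "  ?species rdfs:label ?label ;\n"
        "           ex:inMechanism ?mechanism .\n"
        f"  FILTER(STR(?label) = {sparql_string(label)})\n"
        "}"
    )
```

Escaping matters here because the label comes straight from the UI; building the literal by plain string concatenation would let a stray quote alter the query.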
An example of how to use the UI and the mechanisms in the knowledge graph is shown in Figure 4, which compares the heat capacity of benzene across a selection of mechanisms in the knowledge graph. 32−39 We note that in this case the UI allows us to retrieve the information from the knowledge graph even though benzene appears under three different names: C6H6, 33−36,39 A1, 32,37 and A1-C6H6. 38 As is well known, the thermodynamic data used for benzene vary across the literature.
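Heat capacities such as those compared in Figure 4 are evaluated from each mechanism's thermodynamic data. In CHEMKIN-format thermodynamic files these are conventionally given as seven-coefficient NASA polynomials for a low- and a high-temperature range, with Cp/R = a1 + a2T + a3T^2 + a4T^3 + a5T^4. The sketch below assumes that format; the function names are hypothetical, and it contains no benzene data from any cited mechanism.

```python
from typing import Dict, Sequence

R = 8.314462618  # molar gas constant, J/(mol K)


def cp_nasa7(T: float, low: Sequence[float], high: Sequence[float], t_mid: float) -> float:
    """Molar heat capacity in J/(mol K) from 7-coefficient NASA polynomials.

    Cp/R = a1 + a2*T + a3*T**2 + a4*T**3 + a5*T**4; the low-range set applies up to t_mid.
    Coefficients a6 and a7 enter only the enthalpy and entropy, so they are ignored here.
    """
    a1, a2, a3, a4, a5 = (low if T <= t_mid else high)[:5]
    return R * (a1 + T * (a2 + T * (a3 + T * (a4 + T * a5))))  # Horner form


def relative_spread(values: Dict[str, float]) -> float:
    """(max - min) / min across mechanisms: a simple measure of disagreement."""
    lo, hi = min(values.values()), max(values.values())
    return (hi - lo) / lo
```

Evaluating every mechanism's polynomial on a common temperature grid and computing the spread at each point gives a quantitative version of the visual comparison in the figure.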
Furthermore, the UI allows querying the rate parameters of a reaction of interest. Figure 5 shows pre-exponential factors, and reaction rates as a function of temperature, for a hydrogen abstraction reaction from methane as reported in previous studies. 32−41 Temperature exponents and activation energies are also available via the UI but are not shown here. As before, we find variations in the reported rate parameters. We emphasize that the selection of mechanisms for this study is entirely arbitrary, as one of the goals of this paper is to demonstrate the suitability of the UI for identifying and exploring the information available in the knowledge graph.
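Because each mechanism may pair a different pre-exponential factor with a different temperature exponent and activation energy, comparing A alone can mislead; the rate coefficients themselves are best compared at common temperatures. The sketch below evaluates the modified Arrhenius form k = A T^b exp(-Ea/RT) for several parameter sets and reports the largest-to-smallest ratio at each temperature. It assumes CHEMKIN's default units (Ea in cal/mol, A in units based on mol, cm, and s), and the function names and any parameter values are hypothetical, not taken from the mechanisms cited above.

```python
import math
from typing import Dict, Iterable, Tuple

R_CAL = 1.987204  # molar gas constant in cal/(mol K); CHEMKIN's default Ea unit is cal/mol


def k_arrhenius(T: float, A: float, b: float, Ea: float) -> float:
    """Modified Arrhenius rate coefficient k = A * T**b * exp(-Ea / (R T))."""
    return A * T**b * math.exp(-Ea / (R_CAL * T))


def max_to_min_ratio(params: Dict[str, Tuple[float, float, float]],
                     temperatures: Iterable[float]) -> Dict[float, float]:
    """For each temperature, the ratio of the largest to the smallest k across mechanisms."""
    ratios = {}
    for T in temperatures:
        ks = [k_arrhenius(T, A, b, Ea) for A, b, Ea in params.values()]
        ratios[T] = max(ks) / min(ks)
    return ratios
```

A ratio close to 1 over the whole temperature range means the mechanisms agree even if their individual A, b, and Ea values differ; a large ratio points to a genuine discrepancy worth checking against the reported uncertainties.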
Atmospheric Dispersion of Pollutants Emitted by Ships. In Singapore, the Green Port Programme (GPP), which is part of the Maritime Singapore Green Initiative (MSGI), came into effect on January 1, 2020. It uses an incentive-driven model to encourage ocean-going vessels anchoring at the Port of Singapore to reduce their emissions for environmental sustainability. 42 The GPP reduces port or harbor dues by 25% if ships use liquefied natural gas (LNG) as a marine fuel and meet the energy efficiency design index (EEDI) defined by the International Maritime Organization (IMO). The GPP is therefore an incentive rather than a mandate to use a specific fuel. Although the sulfur content of clean fuels used in such vessels is subject to an upper limit (≤0.50% m/m), the emissions of sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), carbon monoxide (CO), and particulate matter (PM2.5 and PM10) from each ship are not prescribed and can vary considerably from ship to ship.
Predicting the dispersion of emissions from ships involves heterogeneous data, models, and tools from different domains. Figure 6 illustrates how interoperability can be achieved in applications involving multiple domains, using a cross-domain use-case from JPS (http://www.theworldavatar.com/JPS/?lat=52.076&lon=4.31&zoom=14.5&tilt=0.0&rotation=0.6). As shown in the figure, the SRM Engine Suite (https://cmclinnovations.com/products/srm), a software package developed to evaluate the performance of and emissions from internal combustion engines, simulates the exhaust emissions from a ship's diesel engine within JPS. ADMS, the atmospheric dispersion modeling system (https://cerc.co.uk/environmental-software.html), simulates the dispersion of pollutants emitted from each point source. ADMS uses real-time weather data that agents extract from the web and add to the JPS knowledge graph. In the simulations, SRM uses reaction mechanisms that an agent retrieves automatically from the knowledge graph via SPARQL queries using the IRIs of the mechanisms. The knowledge graph returns the corresponding mechanism in RDF, which is converted into a form that SRM can process. In this use-case, we use several ontologies, including OntoKin and OntoCAPE, to enable interoperability between software from different domains. The atmospheric dispersion of the emissions is visualized in JPS using Google Maps (Figure 7).
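The conversion step in this chain, from mechanism data held in the knowledge graph back to a file that a simulation code can read, can be illustrated by writing reactions in the CHEMKIN style: each line gives the reaction equation followed by A, b, and Ea, inside a REACTIONS ... END block. This is a minimal sketch assuming the SPARQL results have already been unpacked into plain values; the Reaction class, the function names, and the column widths are hypothetical, and the input format SRM actually consumes is not specified in this paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Reaction:
    """Reaction equation plus modified Arrhenius parameters (CHEMKIN default units assumed)."""
    equation: str
    A: float   # pre-exponential factor
    b: float   # temperature exponent
    Ea: float  # activation energy, cal/mol


def to_chemkin_line(r: Reaction) -> str:
    """Format one reaction as: equation, A, b, Ea (blank-separated columns)."""
    return f"{r.equation:<40s}{r.A:14.6E} {r.b:9.4f} {r.Ea:12.3f}"


def reactions_block(reactions: Iterable[Reaction]) -> str:
    """Wrap formatted reaction lines in a REACTIONS ... END section."""
    lines = ["REACTIONS"] + [to_chemkin_line(r) for r in reactions] + ["END"]
    return "\n".join(lines)
```

The fixed column widths are only a readability choice; the essential point is that the numbers leave the knowledge graph with enough precision that a round trip through RDF does not alter the rate expressions.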

■ CONCLUSIONS
In this paper, we have demonstrated how a knowledge-graph approach can be used to address naming and data inconsistency problems in chemical kinetics and to achieve interoperability, making it possible to describe complex combustion-derived air pollution scenarios. We presented two use-cases. In the first, we used OntoKin, an ontological model that captures the semantics of chemical kinetic reaction mechanisms as they are used in combustion, to represent a collection of mechanisms from the literature and thus integrate them into the knowledge graph of the J-Park Simulator. We applied the ontological tools we have developed to query across multiple mechanisms and identified variations in thermodynamic data as well as in reaction rates. These tools provide a first step toward querying and comparing mechanisms via the Semantic Web. In the second use-case, we integrated a kinetic fuel model with an internal combustion engine model, real-time weather and ship location data, and an atmospheric pollutant dispersion model to simulate emissions from ships, thus establishing interoperability between a number of software agents and heterogeneous data sources. In future work, the amount of data in the knowledge graph will be scaled up, including links to other types of data sources and the identification of the highest-quality thermodynamic and kinetic data, and more advanced tools for human and machine interaction will be developed in the form of more intelligent agents acting on the knowledge graph.