Integrated Probabilistic Annotation: A Bayesian-Based Annotation Method for Metabolomic Profiles Integrating Biochemical Connections, Isotope Patterns, and Adduct Relationships

In a typical untargeted metabolomics experiment, the huge amount of complex data generated by mass spectrometry necessitates automated tools for the extraction of useful biological information. Each metabolite generates numerous mass spectrometry features. The association of these experimental features to the underlying metabolites still represents one of the major bottlenecks in metabolomics data processing. While certain identification (e.g., by comparison to authentic standards) is always desirable, it is usually achievable only for a limited number of compounds, and scientists often deal with a significant amount of putatively annotated metabolites. The confidence in a specific annotation is usually assessed by considering different sources of information (e.g., isotope patterns, adduct formation, chromatographic retention times, and fragmentation patterns). IPA (integrated probabilistic annotation) offers a rigorous and reproducible method to automatically annotate metabolite profiles and evaluate the resulting confidence of the putative annotations. It is able to provide a rigorous measure of our confidence in any putative annotation and is also able to update and refine our beliefs (i.e., background prior knowledge) by incorporating different sources of information in the annotation process, such as isotope patterns, adduct formation and biochemical relations. The IPA package is freely available on GitHub (https://github.com/ francescodc87/IPA), together with the related extensive documentation. W trying to convert raw mass spectrometry (MS) data into useful biological information, the association of the detected experimental features (or groups of features) to specific metabolites plays a pivotal role. This process, called annotation, is one of the major bottlenecks for untargeted metabolomics, and the definition of what represents a valid “metabolite identification” is still under discussion. The Metabolomics Standards Initiative (MSI, http://www. metabolomics-msi.org) defines 4 different levels of identification. The highest confidence is achieved with level 1, which requires at least two orthogonal molecular properties to be confirmed (e.g., retention time and exact mass) with the aid of a pure standard analyzed in the same analytical conditions. Levels 2 and 3 (putatively annotated compounds or compound classes) are achieved with a simple comparison of molecular properties against literature and databases. Level 4 refers to unknown compounds. Confident and certain identification (level 1) requires a significant effort and is not always achievable for a number of reasons: it is expensive, as in many cases, it is necessary to rebuild the database of standards when the analytical conditions change, and often there is no commercially available standard for the relevant metabolites. Therefore, in most cases only a level 2 identification is achieved, by comparing the detected m/zs with available databases, such as KEGG, HMDB, ECMDB, Lipid Maps, and PubChem. Because of the nature of the technology, MS data show a high grade of redundancy, with several features corresponding to the same metabolite due to naturally occurring isotopes, adduct formation and in-source fragmentation. Such redundancy negatively affects the annotation process by increasing the number of possible misidentifications. In the past few years, a number of different methods have been developed in order to improve the performance of the annotation process. For example, Creek et al. successfully applied a retention time prediction approach. Through this approach, they removed 40% of the misannotated compounds, which showed inconsistency between m/z and retention time. Other methods tried to tackle the Received: May 21, 2019 Accepted: September 11, 2019 Published: September 11, 2019 Article pubs.acs.org/ac Cite This: Anal. Chem. 2019, 91, 12799−12807 © 2019 American Chemical Society 12799 DOI: 10.1021/acs.analchem.9b02354 Anal. Chem. 2019, 91, 12799−12807 This is an open access article published under a Creative Commons Attribution (CC-BY) License, which permits unrestricted use, distribution and reproduction in any medium, provided the author and source are cited. D ow nl oa de d vi a 94 .6 .2 28 .1 97 o n Ja nu ar y 4, 2 02 1 at 1 1: 05 :0 9 (U T C ). Se e ht tp s: //p ub s. ac s. or g/ sh ar in gg ui de lin es f or o pt io ns o n ho w to le gi tim at el y sh ar e pu bl is he d ar tic le s. annotation by grouping together the features that appear to be related to the same compound. Such an approach reduces the number of queries in the databases and the false annotation of adducts, fragments and isotopes. It is noteworthy that, regardless how sophisticated, the grouping algorithm is prone to errors, especially when coelution is observed. Additionally, this approach does not help in those cases where the same feature has more than one hit in the database. An interesting approach called CliqueMS was recently introduced by Senan et al. This approach aims for the annotation of LC-MS features by considering the similarity between coelution profiles and a calculated natural frequency of adduct formation observed in real complex biological samples and pure compounds from the NIST database. Rogers et al. (2009) demonstrated that through a Bayesian approach it is possible to improve the annotation process by including different sources of information. Here we present IPA (integrated probabilistic annotation), a Bayesian annotation method building on this earlier work, which is able to provide a statistically rigorous assessment of our confidence in any putative annotation and is also able to update and refine our beliefs by incorporating different sources of information, such as isotope patterns, adduct formation, and biochemical relations. Compared to the original implementation of Bayesian metabolite annotation, IPA provides substantial improvements. In fact, it is able to (a) integrate retention time information in the estimation of prior probabilities; (b) integrate the possibility that the peak under consideration is based on an “unknown” (i.e., not present in the database under consideration); (c) treat each source of information separately giving different weight to each of them; and (d) reject implausible connections for adducts and isotopes according to several criteria (e.g., mismatching retention time). The method has been validated both through a simulated data set and through two untargeted metabolomics experiments specifically designed to provide a reliable benchmark for annotation methods. Furthermore, the flexibility and the modular design of our method will allow the easy integration of additional sources of information (e.g., fragmentation patterns or ion mobility drift times). ■ MATERIALS AND METHODS The typical data processing pipeline for untargeted LC/MSbased metabolomics usually ends up with a set of several thousands of redundant metabolic “features”, generated by the presence of hundreds of metabolites in the biological sample analyzed. Each of these features is defined by 3 quantities: its average mass-to-charge ratio (m/z), average retention time (RT), and maximum intensity. When fragmentation data is not available, the annotation of these features is usually based on the matching of all or some of the m/z values with different databases, which often leads to uncertain or incorrect assignments. When considering the association of a specific feature to one adduct of a specific compound our confidence could be simply based on how close the measured and theoretical m/z values are. However, our initial belief will change according to several observations: do we also observe the naturally occurring isotopes of the compound considered? Do the isotope signals show the expected relative intensity? Do we see the expected fragments or adducts that are usually detected in our system due to this compound? How plausible is the presence of this compound in the analyzed sample? Is this compound biochemically related to any of the other compounds found in the sample, for example as a commonly seen degradation product? The method described in this manuscript is based on the simple assumption that each annotation can be informed by all the others. Therefore, the performance of the annotation process can be significantly improved by incorporating additional information. 1. Adduct formation and in-source fragmentation: It is well-known that each compound generates several peaks due to in-source fragmentation and adduct formation. Even with the softest ionization method, electrospray ionization, in-source fragmentation is always present and often generates fragments of identical m/z to common metabolites. Ignoring this kind of information can easily cause misannotation, and it has been integrated into the IPA method as it can significantly help the annotation process. 2. Isotope patterns: Even with 1 ppm mass accuracy the information about the expected isotopes can significantly improve the annotation process. Isotope peak intensity can also be informative; in fact, given the chemical formula, we are able to predict the isotopes’ relative abundance and use this information during annotation; 3. Biochemical connectivity: When analyzing a biological sample, it is safe to assume that almost all metabolites are members of the same metabolic network, that is, they are connected to other compounds in the same sample by simple (bio)chemical transformations. Considering this information during annotation helps to shift our beliefs toward compounds that are more likely to be part of the same metabolic network and away from more exotic compounds. For example, the presence of a particular compound becomes more likely if its biosynthetic precursors are also observed in the experiment. All of these sources of evidence are routinely considered in the manual annotation of LC/MS metabolomics data, and IPA formalizes them in a unified statistical framework. Formal IPA Description. Database. The IPA method relies on a structured database encoding all the information used during the annotation process. For every compound considered, the database must contain an unequivocal identifier, compound name, chemical formula, monoisotopic mass, positive and negative main adducts (i.e., the adduct known or thought to be the most intense) and positive and negative adducts and fragments (i.e., the complete list of adducts and/or fragments that are known or thought to be formed by the compound). Additionally, the database can contain a list of alternative names, a list of unequivocal identifiers for the reactions involving the compound, a list of alternative identifiers and a retention time range where the compound is considered more likely to be detected. The IPA package already includes a database containing all the compounds found in the KEGG database with the addition of several compounds needed for the validation examples considered here (e.g., all the compounds involved in the mevalonate pathway and limonene biosynthesis). This database has been initially populated assuming that all compounds show the same behavior: only a handful of adducts are generated by each compound ([M + H], [M + Na], [M + 2H], and [2M + H] for positive mode and [M − H]−, [2M − H]−, [M − 2H]2−, and [3M − H]− for negative mode) and the protonated ([M + H]) and deprotonated ([M Analytical Chemistry Article DOI: 10.1021/acs.analchem.9b02354 Anal. Chem. 2019, 91, 12799−12807 12800 − H]−) ions are always considered as the main ions depending on the ionization mode. It should be noted that this assumption has only been used for the initial population of the database, and can be adjusted by the user as considered appropriate, given that adduct formation is different for different metabolites and for different experimental conditions. Moreover, the KEGG reaction database has been used to populate the list of reactions involving each compound. The IPA package provides the functionality for updating the database according to previous observations. For example in the course of this work, several standard mixes have been analyzed and used to update the database as described in the Supporting Information. Prior Probabilities. Given the data set containing the measured m/z values, retention times, and intensities associated with all the M detected features, the function f ind.hits() is used to identify all the database hits given a userdefined accuracy window. With the help of the enviPat package, this function also maps all the possible isotopes for each formula identified resulting in the collection of all the C adducts, fragments, and isotopes that could be associated with the measured mass-to-charge ratios. When trying to associate measured to theoretical mass-to-charge ratios, the conventional approach relies on selecting the theoretical mass-to-charge ratios that show an accuracy value better than an instrumentspecific threshold. In the case of multiple hits and absence of additional information, it is reasonable to assume that the most likely annotation corresponds to the match with the best accuracy (closest mass match). Following this simple reasoning, the compute.Priors() function evaluates the prior probabilities using a Gaussian model based on the difference between measured and theoretical mass-to-charge ratio. While a Gaussian model appears to be the most reasonable choice, different noise models can be considered. For example, it would be easy to implement a model based on a uniform distribution where all assignments showing an accuracy better than a specific threshold have the same likelihood. Let Z be the (C × M) binary matrix of assignments, where the single element zc,m = 1 if the mass-to-charge ratio m is assigned to the formula c. The likelihood is defined as p z x y x y ( 1, , , ) ( 0, ) c m m c m c prior , 2 2 σ σ = = − | (1) where yc represents the cth theoretical mass-to-charge ratio and xm represents the mth measured mass-to-charge ratio. Describing the measurement error as a Gaussian distribution with zero mean requires the definition of the model standard deviation σ, which is related to the mass accuracy of the instrument used. A different standard deviation is computed for each mass-to-charge ratio according to the user-provided ppm value according to

W hen trying to convert raw mass spectrometry (MS) data into useful biological information, the association of the detected experimental features (or groups of features) to specific metabolites plays a pivotal role. This process, called annotation, is one of the major bottlenecks for untargeted metabolomics, 1 and the definition of what represents a valid "metabolite identification" is still under discussion. 2 The Metabolomics Standards Initiative (MSI, http://www. metabolomics-msi.org) defines 4 different levels of identification. 3 The highest confidence is achieved with level 1, which requires at least two orthogonal molecular properties to be confirmed (e.g., retention time and exact mass) with the aid of a pure standard analyzed in the same analytical conditions. Levels 2 and 3 (putatively annotated compounds or compound classes) are achieved with a simple comparison of molecular properties against literature and databases. Level 4 refers to unknown compounds. Confident and certain identification (level 1) requires a significant effort and is not always achievable 2 for a number of reasons: it is expensive, as in many cases, it is necessary to rebuild the database of standards when the analytical conditions change, and often there is no commercially available standard for the relevant metabolites. Therefore, in most cases only a level 2 identification is achieved, by comparing the detected m/zs with available databases, such as KEGG, 4,5 HMDB, 6 ECMDB, 7,8 Lipid Maps, 9 and PubChem. 10 Because of the nature of the technology, MS data show a high grade of redundancy, with several features corresponding to the same metabolite due to naturally occurring isotopes, adduct formation and in-source fragmentation. 11 Such redundancy negatively affects the annotation process by increasing the number of possible misidentifications. In the past few years, a number of different methods have been developed in order to improve the performance of the annotation process. 12 For example, Creek et al. successfully applied a retention time prediction approach. Through this approach, they removed 40% of the misannotated compounds, which showed inconsistency between m/z and retention time. Other methods tried to tackle the annotation by grouping together the features that appear to be related to the same compound. 14−17 Such an approach reduces the number of queries in the databases and the false annotation of adducts, fragments and isotopes. It is noteworthy that, regardless how sophisticated, the grouping algorithm is prone to errors, especially when coelution is observed. Additionally, this approach does not help in those cases where the same feature has more than one hit in the database. An interesting approach called CliqueMS was recently introduced by Senan et al. 18 This approach aims for the annotation of LC-MS features by considering the similarity between coelution profiles and a calculated natural frequency of adduct formation observed in real complex biological samples and pure compounds from the NIST database. Rogers et al. (2009) demonstrated that through a Bayesian approach it is possible to improve the annotation process by including different sources of information. Here we present IPA (integrated probabilistic annotation), a Bayesian annotation method building on this earlier work, which is able to provide a statistically rigorous assessment of our confidence in any putative annotation and is also able to update and refine our beliefs by incorporating different sources of information, such as isotope patterns, adduct formation, and biochemical relations. Compared to the original implementation of Bayesian metabolite annotation, 19,20 IPA provides substantial improvements. In fact, it is able to (a) integrate retention time information in the estimation of prior probabilities; (b) integrate the possibility that the peak under consideration is based on an "unknown" (i.e., not present in the database under consideration); (c) treat each source of information separately giving different weight to each of them; and (d) reject implausible connections for adducts and isotopes according to several criteria (e.g., mismatching retention time). The method has been validated both through a simulated data set and through two untargeted metabolomics experiments specifically designed to provide a reliable benchmark for annotation methods. Furthermore, the flexibility and the modular design of our method will allow the easy integration of additional sources of information (e.g., fragmentation patterns or ion mobility drift times).

■ MATERIALS AND METHODS
The typical data processing pipeline for untargeted LC/MSbased metabolomics usually ends up with a set of several thousands of redundant metabolic "features", generated by the presence of hundreds of metabolites in the biological sample analyzed. 21−25 Each of these features is defined by 3 quantities: its average mass-to-charge ratio (m/z), average retention time (RT), and maximum intensity. When fragmentation data is not available, the annotation of these features is usually based on the matching of all or some of the m/z values with different databases, which often leads to uncertain or incorrect assignments. When considering the association of a specific feature to one adduct of a specific compound our confidence could be simply based on how close the measured and theoretical m/z values are. However, our initial belief will change according to several observations: do we also observe the naturally occurring isotopes of the compound considered? Do the isotope signals show the expected relative intensity? Do we see the expected fragments or adducts that are usually detected in our system due to this compound? How plausible is the presence of this compound in the analyzed sample? Is this compound biochemically related to any of the other compounds found in the sample, for example as a commonly seen degradation product? The method described in this manuscript is based on the simple assumption that each annotation can be informed by all the others. Therefore, the performance of the annotation process can be significantly improved by incorporating additional information.
1. Adduct formation and in-source fragmentation: It is well-known that each compound generates several peaks due to in-source fragmentation and adduct formation. Even with the softest ionization method, electrospray ionization, 26,27 in-source fragmentation is always present and often generates fragments of identical m/z to common metabolites. 28 Ignoring this kind of information can easily cause misannotation, and it has been integrated into the IPA method as it can significantly help the annotation process. 2. Isotope patterns: Even with 1 ppm mass accuracy the information about the expected isotopes can significantly improve the annotation process. 29 Isotope peak intensity can also be informative; in fact, given the chemical formula, we are able to predict the isotopes' relative abundance 30,31 and use this information during annotation; 3. Biochemical connectivity: When analyzing a biological sample, it is safe to assume that almost all metabolites are members of the same metabolic network, that is, they are connected to other compounds in the same sample by simple (bio)chemical transformations. Considering this information during annotation helps to shift our beliefs toward compounds that are more likely to be part of the same metabolic network and away from more exotic compounds. For example, the presence of a particular compound becomes more likely if its biosynthetic precursors are also observed in the experiment.
All of these sources of evidence are routinely considered in the manual annotation of LC/MS metabolomics data, and IPA formalizes them in a unified statistical framework. Formal IPA Description. Database. The IPA method relies on a structured database encoding all the information used during the annotation process. For every compound considered, the database must contain an unequivocal identifier, compound name, chemical formula, monoisotopic mass, positive and negative main adducts (i.e., the adduct known or thought to be the most intense) and positive and negative adducts and fragments (i.e., the complete list of adducts and/or fragments that are known or thought to be formed by the compound). Additionally, the database can contain a list of alternative names, a list of unequivocal identifiers for the reactions involving the compound, a list of alternative identifiers and a retention time range where the compound is considered more likely to be detected. The IPA package already includes a database containing all the compounds found in the KEGG database 4 It should be noted that this assumption has only been used for the initial population of the database, and can be adjusted by the user as considered appropriate, given that adduct formation is different for different metabolites and for different experimental conditions. Moreover, the KEGG reaction database has been used to populate the list of reactions involving each compound. The IPA package provides the functionality for updating the database according to previous observations. For example in the course of this work, several standard mixes have been analyzed and used to update the database as described in the Supporting Information.
Prior Probabilities. Given the data set containing the measured m/z values, retention times, and intensities associated with all the M detected features, the function f ind.hits() is used to identify all the database hits given a userdefined accuracy window. With the help of the enviPat package, 31 this function also maps all the possible isotopes for each formula identified resulting in the collection of all the C adducts, fragments, and isotopes that could be associated with the measured mass-to-charge ratios. When trying to associate measured to theoretical mass-to-charge ratios, the conventional approach relies on selecting the theoretical mass-to-charge ratios that show an accuracy value better than an instrumentspecific threshold. In the case of multiple hits and absence of additional information, it is reasonable to assume that the most likely annotation corresponds to the match with the best accuracy (closest mass match). Following this simple reasoning, the compute.Priors() function evaluates the prior probabilities using a Gaussian model based on the difference between measured and theoretical mass-to-charge ratio. While a Gaussian model appears to be the most reasonable choice, different noise models can be considered. For example, it would be easy to implement a model based on a uniform distribution where all assignments showing an accuracy better than a specific threshold have the same likelihood. Let Z be the (C × M) binary matrix of assignments, where the single element z c,m = 1 if the mass-to-charge ratio m is assigned to the formula c. The likelihood is defined as where y c represents the cth theoretical mass-to-charge ratio and x m represents the mth measured mass-to-charge ratio. Describing the measurement error as a Gaussian distribution with zero mean requires the definition of the model standard deviation σ, which is related to the mass accuracy of the instrument used. A different standard deviation is computed for each mass-to-charge ratio according to the user-provided ppm value according to m z ppm / 2 10 6 σ = · × (2) Using eq 2 to estimate the standard deviation, we are assuming that the instrument's accuracy is better than the user-defined threshold ≅ 95.45% of the times. It is reasonable to assume that the probability of any measured mass-to-charge ratio not being related to any of the chemical formulas contained in the database is always greater than zero, that is, there is always a finite probability that the observed mass-to-charge ratio corresponds to an unexpected or novel molecule. To take care of this, the function accepts an user-defined value for the accuracy expressed in ppm (ppm u ) associated with the "unknown" molecule. The estimation of the prior probabilities is then performed according to Algorithm 1. Moreover, the prior estimation has been designed in such a way that it is easy to include additional sources of information. For example, there may be evidence from previous experiments or from literature indicating that specific compounds are more or less likely to be detected in the sample analyzed. This prior knowledge can be easily added to the estimation of the prior as a multiplicative term (p prior ). Despite the relatively poor reproducibility of liquid chromatography retention times between different laboratories or experimental runs, IPA also offers the possibility of considering information regarding RT. In fact, for each compound contained in the database, it is possible to provide an RT range (as broad as reasonable) outside of which the presence of the compound is considered to be unlikely in light of previous experimental evidence or an RT prediction method. 13,33 The multiplicative term related to the retention time (p RT ) will be equal to a user-defined value (≤1), when the RT is found outside this range, and will be equal to 1 otherwise. Additional multiplicative terms (e.g., regarding fragmentation patterns) could be easily implemented.
Creating Connectivity Matrices. As mentioned in the introduction, the IPA package improves the quality of the annotation process by considering the relationships between adducts, fragments, isotopes, and biochemically related compounds. Such information is encoded in three connectivity matrices generated by specific functions.
T h e build.add.connenctivity.matrix() creates a (C × C) binary matrix containing the relationships between adducts and fragments. By default, this matrix connects each monoisotopic adduct to its related main ion, typically the protonated or deprotonated version of the molecule. Alternatively, the user can select the f ully connected parameter, and W add will connect all adducts related to the same compound with each other. 2. Isotope matrix (W iso ): The build.iso.connectivity.matrix() function creates a (C × C) matrix connecting each m/z to its two (or one) isotopologues showing the highest predicted abundance. If the ith and jth m/z values are connected by an isotope relationship, the w i,j element contains the expected intensity ratio between the two. Thanks to this information, the package can filter out isotope connections showing inconsistent intensity ratios. compound or by considering a limited number of biochemical reaction classes that usually occur in a metabolic network. 34 Posterior Probabilities. When annotating by simply querying a database, finding more than one possible hit is common. In such cases, the identification of the most reasonable assignment can be aided by the assignments of the other m/z values detected. For example, the presence of isotopes consistent with the predicted abundance ratios helps to reduce the number of plausible compounds. 35 This can be easily achieved when the assignments of the isotopes are considered as certain and independent. In reality, all assignments are potentially interdependent, which makes the incorporation of the information about possible connections between m/z values (i.e., computing the posterior probabilities) very challenging, as all the possible assignments have to be considered at once. It is possible, however, to define a prior distribution for one mass conditioned on the current assignments of all the other masses and this can be used to build an efficient Gibbs sampling scheme 19 which allows us to draw samples from the posterior distributions. Such conditional priors can be defined according to different sources of information (i.e., different networks of possible interactions). For example, the conditional prior probability of the mth m/z being assigned to the cth formula, given the network of all the possible biochemical connections defined in the W bio binary matrix is defined as where C is the number of chemical formulas in the database, while the parameter δ B can be thought of as the parameter for a Dirichlet prior on a multinomial distribution over β cm and expresses our confidence on the information encoded by the connectivity matrix considered (W bio ). In fact, the smaller is δ B , the higher is the effect of the connections on the conditional prior and consequently on the posterior probabilities. β cm represents the number of relationships the cth compound shows with all the other formulas that are currently assigned to all the other m/z values. It can be computed as where 1 is the (M × 1) vector of ones and W c • and Z • m represent respectively the cth row of the connectivity matrix W and the mth column of the assignments matrix Z. Similarly to eq 3, it is also possible to define the conditional prior related to the adducts and isotope relationships The computation of the number of connections for adducts and isotopes is slightly more complicated. In fact, both adduct and isotope relationships should be considered only between m/z values that might be generated from the same molecule. If two detected m/z values do not appear to be related to the same compound (e.g., they show a very different RT), not all possible adduct or isotope connections should be counted when computing the conditional prior. The IPA package can handle this problem in three different ways, according to the information available.
1. If the RT for each measured m/z is provided, the only connections considered are the ones between mass-tocharge ratios showing an RT difference lower than a user-defined threshold. 2. It is possible to group the detected mass-to-charge ratios that are more likely to be related to the same compound with a correlation based approach. 14 IPA is able to consider this grouping through an (M × 1) group label vector. 3. Alternatively, it is possible to provide an (M × M) correlation matrix. If two m/z values show a correlation lower than a user-defined threshold, all possible adduct of isotope connections between them are not considered. Regarding the isotopes, it is also possible to consider information about peak intensity to filter out connections. As mentioned before, it is possible to estimate the expected intensity ratio between isotopes, and this information is stored in the W iso matrix. If the measured intensities are provided, IPA is able to filter out isotope connections between m/z values if the observed intensity ratio is different from the theoretical one (given a user-defined tolerance). The Gibbs sampler has been implemented by considering the following distribution: Normalizing over C, it is finally possible to obtain a proper discrete distribution. The functionality described in this section is available through the function IPAposteriors(), where the Gibbs sampler is implemented as described in Algorithm 2.
Here, the user has to define both the number of iterations (N) and the number of initial iterations (burn in ) that will be ignored when computing the posterior probabilities.   This synthetic experiment contains 15 compounds involved in the mevalonate pathway and limonene synthesis 32 (specifically, the compounds highlighted in blue in Figure 1 Additionally, all the possible isotopes with a predicted relative abundance higher than 5% were included in the data set. Including all adducts and isotopes, a total of 83 m/z values were simulated for the positive mode, and 95 were simulated for the negative mode. A realistic experimental outcome was simulated by adding Gaussian noise to the m/z values ( (0, ) 2 σ ), where appropriate σ values were computed for each m/z assuming an instrument accuracy of 10 ppm (see eq 2). For each detected m/z, the measured intensities were chosen to be coherent with the theoretical abundance of the isotopes, and the same RT value (±2 s) was assigned to all the simulated m/z values related to the same compound. With the goal of showing that every source of information can positively affect the annotation process, the IPA method was applied in three settings: (1) only considering isotope relationships, (2) considering isotopes and adducts relationships, and (3) considering isotopes and adducts relationships and biochemical connections. The scripts used for the generation of the data sets and for the analysis with the IPA package are provided as Supporting Information. Even considering this simple toy example, several simulated features do not have a clear annotation. For example, the feature simulated for the [M + 2H] 2+ adduct of NADP + (C00006) shows several hits in the database when the annotation process is based on mass only. As shown in Figure  2A, the IPA method changes our initial belief toward the correct annotation by considering that it is the only one showing any kind of relationship with the other possible annotations.
The example reported in Figure 2B is slightly more complicated. When considering the measured m/z values as the sole source of information, the feature considered (simulated for the [M + H] + adduct of isopentenyl pyrophosphate (C00129)) has three possible annotations in our database. Two of them (C00129 [M + H] + , isopentenyl pyrophosphate and C00235 [M + H] + , dimethylallyl pyrophosphate) are equally likely as they have the same chemical formula. It should be noted that both these compounds are actually present in the simulated data set. The third possible annotation (C18875 [M + 2H] 2+ , novaluron) is initially the most likely having a m/z closer to the measured one. By considering all the possible connections with the other features present in the data set, this latter annotation becomes extremely unlikely despite being the one with the smallest difference between theoretical and measured m/z. Having one biochemical connection more, the C00129 [M + H] + annotation becomes slightly more likely than C00235 [M + H] + . One should notice that all these assignment are interconnected, therefore this little advantage also increases the posterior probabilities associated with the other adducts associated with C00129. A couple of cherry picked examples cannot be used as an assessment of the annotation performance of the method.
For a comprehensive assessment, the logarithmic predictive score (LPS) was computed as where p i is the probability given to the correct assignment for the ith measured m/z. In the best case scenario, all the probabilities considered by eq 8 are equal to 1, that is, all features are correctly identified, therefore LPS = 0. In all other situations, LPS assumes a negative value. As shown in Figure 3, the integration of each of the additional sources of information drastically improves the predictive power of the annotation method for both simulated data sets.
Escherichia coli Data Set. To show the value of our method in real-life conditions, we designed two different untargeted metabolomics experiments. In the first experiment, a cell extract of a culture of Escherichia coli was chosen as a biological matrix and divided into 4 groups. With the aim of providing a data set that could also be used easily as a benchmark for biomarker discovery methods, three of the groups were spiked with different amounts of 30 standards as described in Supporting Information. The data processing performed through xcms 36 and the mzMatch pipeline 21 led to a data set containing 1961 features. The IPA method was applied to the obtained data set with default parameters and

Analytical Chemistry
Article using the updated database described in Supporting Information. Additionally, during the estimation of the prior probabilities, the multiplicative term p prior was used to take into account our initial belief on the presence of the considered compounds in the biological samples here considered. Specifically, the p prior is equal to 1 when the compound is also present in the E. coli Metabolome database, 7,8 and equal to 0.8 otherwise. After removing the features for which no hit was found considering a maximum accuracy of 15 ppm, the 1093 remaining features went through 5000 iterations of the Gibbs sampler (1000 burn_in). The Gibbs sampler represents the most computationally demanding step of the method described here, and with this data set, took ∼4 h (an average of 2.89 s per iteration) on a Linux desktop with 32 Gb memory and 8-core Intel Xeon E5-2620 2.1 GHz processor. As shown in Figure 4, one of the features detected in this experiment could be associate to the [M + H] + adducts of 3 different compounds: (1) L-proline (C00148), (2) D-proline (C00763), and (3) 3-acetamidopropanal (C00763). Having the same chemical formula, these three annotations should show the same initial probability when considering the mass as only source of information. The estimated prior probability estimated by the IPA package is slightly higher for L-proline (≃38.5%) since it is the only one, among the three

Analytical Chemistry
Article possibilities, that is also present in the ECMDB database. Moreover, the estimation of the posterior probabilities further shifts our belief toward this annotation (posterior probability associated with L-proline ≃ 64.5%). As mentioned before, 30 different compounds were spiked in these samples. In positive mode, only 24 were detected and for 22 of them the IPA method increased the probability associated with the correct annotation of their main adduct (see Table S6). In negative mode, 20 of these compounds were detected and the IPA method shifted the probabilities toward the correct annotation for 18 of their main adducts (see Table S7). When dealing with real-life data, a comprehensive assessment of the method performance is not easy. In contrast to what is possible with the synthetic experiment, the correct annotation in real-life data will only be known for a very small fraction of the detected features. This makes the calculation of a meaningful LPS value impossible. Nevertheless, it is possible to quantify the impact of IPA on the annotation. Out of the 929 features showing more than one possible hit in the database, 268 (≃29%) showed a maximum difference between prior and posterior probabilities higher than 10%. More importantly, for 186 of these features (≃20%), the IPA method changed the most likely annotation or was a tie-breaker in the case of equal prior probabilities. The full results of this analysis together with the data set are available as Supporting Information.
Beer Data Set. In the second experiment, IPA was tested in another untargeted metabolomics experiment where 21 different beers (7 indian pale ales, 7 lagers, and 7 porters) were analyzed as previously described (see Supporting Information for details). After the data processing, a data set containing 3042 features was obtained. The IPA method was applied to this data set using the same parameters and the same database used in the previous example. During the estimation of the prior probabilities, the multiplicative term p prior was again used to take into account our initial belief on the presence of the considered compounds in the samples here considered. Specifically, the p prior is equal to 1, when the compound considered has been previously detected in published metabolomics studies involving beers, 37−42 and equal to 0.8 otherwise. After removing the features for which no hit was found considering a maximum accuracy of 15 ppm, the 2139 remaining features went through 5000 iterations of the Gibbs samples (1000 burn_in). The Gibbs sampler represents the most computationally demanding step of the method described here, and with this data set set took ∼9 h (an average of 6.4 s per iteration) on a Linux desktop with 32 Gb memory and 8-core Intel Xeon E5-2620 2.1 GHz processor. Also in this case, the IPA method is able to provide a probabilistic assessment of our confidence in each annotation. For example, Figure 5 shows one of the detected features, which m/z could be associated with 8 different compound all having the same chemical formula. The prior probability associated with L-tyrosine by IPA is slightly higher than the other possible annotations (≃15.2%) because it has previously been detected in a similar experiment. 38 After the estimation of the posterior probabilities, IPA makes this annotation extremely more likely (≃67.5%). Also in this case, it is possible to quantify the impact of the IPA method. Out of the 1846 features showing more than one possible hit in the database, 683 (≃37%) showed a maximum difference between prior and posterior probabilities higher than 10%. More importantly, for 618 of these features (≃33%) the IPA method changed the most likely annotation or was a tie-breaker in the case of equal prior probabilities. The full results of this analysis together with the data set are available as Supporting Information. The examples shown in Figures 4 and 5 might seem obvious to an expert in metabolomics, but they are not to the traditional automated annotation methods. They have been chosen to highlight how the IPA approach is able to replicate the reasoning of an expert. Two additional examples in the Supporting Information (Figures S1 and S2) further highlight the value of isotope and adduct connections in the automated annotation. To the best of our knowledge, no other method provides the same kind of functionalities introduced by IPA (i.e., a rigorous statistical integration of evidence, with real pvalues rather than arbitrary scores). For this reason, a comparison with other methods is not straightforward. To validate the annotation precision, IPA has been compared with the xMSannotator package. 17 The results are reported in the Supporting Information.

■ CONCLUSION
The IPA method here presented implements a Bayesian-based approach able to incorporate several sources of information within the annotation process. This leads to a significant increase of the predictive power for assigning measured m/z values to putative formulas. Not only does IPA provide more reliable annotations of compounds, it is also able to quantify our confidence in such annotations and re-evaluate them when new information is provided. The IPA package provides a rigorous and comprehensive probabilistic assessment of the confidence in each annotation, which will be highly valuable for the downstream interpretation of the results. Moreover, IPA is also able to store and successfully utilize the additional information gained from previous experiments, thus leading to an iterative improvement of annotations, especially for data sets collected on the same experimental setup and similar biological samples. This "continuous learning" ability of the IPA approach mimics one of the most important features of human manual annotations and ensures that insights from earlier data sets are maintained for future exploitation.

* S Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.9b02354.
Description of standards analysis and database update (PDF) data, code, and results of the E. coli data set analysis, data, code and results of the beer data set analysis, and data, code, and results of the comparison with xMSannotator can be found