Chemical Multiverse and Diversity of Food Chemicals

Food chemicals play a fundamental role in our lives, with a broad impact on nutrition and disease prevention and marked economic implications for the food industry. The number of food chemical compounds in public databases has increased substantially in the past few years, and these compounds can be characterized using chemoinformatics approaches. We and other groups have explored public food chemical libraries containing up to 26,500 compounds. This study aimed to analyze the chemical contents, diversity, and chemical space coverage of food chemicals and additives, hereafter referred to as food components. The food components analyzed in this study come from a public database with more than 70,000 compounds, including those predicted via omics techniques. We concluded that food components have distinctive physicochemical properties and constitutional descriptors despite sharing many chemical structures with natural products. On average, food components have large molecular weights and include several apolar structures with saturated hydrocarbons. Compared to reference databases, food component structures have low scaffold and fingerprint-based diversity and high structural complexity, as measured by the fraction of sp3 carbons. These structural features are associated with a large fraction of macronutrients such as lipids. Lipids in food components were broken down by an analysis of their maximum common substructures. The chemical multiverse representation of food chemicals, built from different sets of representations, showed a larger coverage of chemical space than natural products and FDA-approved drugs.
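As an illustration of the fingerprint-based diversity mentioned above, the following minimal sketch computes pairwise Tanimoto similarities between fingerprints represented as sets of on-bit indices. The toy fingerprints and the function names are assumptions made for illustration only; they are not the data or the code used in this study, where the bit sets would come from a chemoinformatics toolkit. A lower median pairwise similarity indicates a more diverse compound set.

```python
from itertools import combinations
from statistics import median


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        # Convention assumed here: two empty fingerprints have similarity 0.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pairwise_similarities(fingerprints):
    """Similarity of every unordered pair of fingerprints in the list."""
    return [tanimoto(x, y) for x, y in combinations(fingerprints, 2)]


# Hypothetical toy fingerprints (on-bit indices), not real molecules.
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {3, 4, 5, 6}]
sims = pairwise_similarities(toy_fps)
median_similarity = median(sims)
print(sims, median_similarity)
```

The median is used here as the summary value because it is less sensitive than the mean to a few very similar pairs; this choice is also an assumption of the sketch.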


Table S1
Descriptive statistics of physicochemical and constitutional descriptors computed for food components (FooDB), natural products (UNPD-A), FDA-approved drugs, and commercially available compounds from FooDB.

Table S2
Descriptive statistics of complexity indexes for food components (FooDB), natural products (UNPD-A), FDA-approved drugs, and commercially available compounds from FooDB.

Figure S1
Distribution of physicochemical properties and constitutional descriptors of interest among approved drugs, compounds of FooDB, commercially available compounds of FooDB, and natural products.

Figure S3
Examples of chemical structures present in some clusters found in food components.

Figure S4
Representative maximum substructures of natural products and FDA-approved drugs, computed for some clusters. n represents the number of molecules that share the substructure within the cluster.

Table S3
Descriptive statistics of natural product-likeness scores computed for food components (FooDB), natural products (UNPD-A), FDA-approved drugs, and commercially available compounds from FooDB.

Table S4
Descriptive statistics of similarity distribution computed for food components (FooDB), natural products (UNPD-A), FDA-approved drugs, and commercially available compounds from FooDB.

Table S5
Summary of the food components profiling according to their biosynthetic pathway, superclass, and class (based on NPClassifier).

Table S6
Summary of the natural products from UNPD-A profiling predicted according to their biosynthetic pathway, superclass, and class (based on NPClassifier).

Table S7
Summary of the FDA-approved drugs profiling predicted according to their biosynthetic pathway, superclass, and class (based on NPClassifier).

Table S8
Summary of the commercially available food components profiling, predicted according to their biosynthetic pathway, superclass, and class (based on NPClassifier).

Figure S5
Chemical multiverse visualization of food components, and their comparison with natural products and approved drugs, using t-SNE and ECFP6 as molecular representations.

Table S1. Descriptive statistics of physicochemical and constitutional descriptors computed for food components (FooDB), natural products (UNPD-A), FDA-approved drugs, and commercially available compounds from FooDB.

Figure S1. Distribution of physicochemical properties and constitutional descriptors of interest among approved drugs (orange), compounds of FooDB (red), commercially available compounds of FooDB (green), and natural products (UNPD-A; yellow): a) number of heavy atoms, b) number of ring structures, c) number of heteroatoms, d) number of alicyclic rings of carbon, e) number of alicyclic rings with heteroatoms, f) number of aromatic rings of carbon, g) number of aromatic rings with heteroatoms, h) number of aromatic rings, i) number of acidic atoms. Dotted lines are used for ease of visualization.

Figure S1 (continued). Distribution of physicochemical properties and constitutional descriptors of interest among approved drugs (orange), compounds of FooDB (red), commercially available compounds of FooDB (green), and natural products (UNPD-A; yellow): j) number of aromatic atoms, k) number of basic atoms, l) number of nitrogen atoms, m) number of oxygen atoms, n) number of chiral centers, o) number of halogen atoms.

Figure S2. Density plot of CSP3 vs. DataWarrior complexity index pairwise comparison, computed for a) food components (FooDB, gray-red), b) natural products (UNPD-A, gray-yellow), c) FDA-approved drugs (gray-orange), and d) commercially available compounds from FooDB (gray-green). The density of data points is represented on a continuous scale from denser (colored) to less dense (gray).

Figure S3. Examples of chemical structures present in some clusters that are found in food components (continued).


Figure S4. Representative maximum substructures of natural products (UNPD-A) and FDA-approved drugs, computed for some clusters. The number below each structure is the number of molecules that share the substructure within each cluster.

Figure S5. Chemical multiverse visualization of food components, and their comparison with natural products and approved drugs, using t-SNE and ECFP6 as molecular representations.

Table S2. Descriptive statistics of complexity indexes for food components (FooDB), natural products (UNPD-A), FDA-approved drugs, and commercially available compounds from FooDB.
a std: standard deviation. b min: minimum value. c Q1: first quartile, the value below which 25% of the data points fall when sorted in increasing order. d Q3: third quartile, the value below which 75% of the data points fall when sorted in increasing order. e max: maximum value.

Table S3. Descriptive statistics of natural product-likeness scores computed for food components (FooDB), natural products (UNPD-A), FDA-approved drugs, and commercially available compounds from FooDB.
a std: standard deviation. b min: minimum value. c Q1: first quartile, the value below which 25% of the data points fall when sorted in increasing order. d Q3: third quartile, the value below which 75% of the data points fall when sorted in increasing order. e max: maximum value.
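The statistics named in the footnotes above (std, min, Q1, Q3, max) can be reproduced for any list of scores with the Python standard library. The sketch below is illustrative only: the toy values are hypothetical, the sample standard deviation is assumed (the population form may have been used instead), and the "inclusive" quartile method is one of several conventions, so results may differ slightly from the values reported in the tables.

```python
from statistics import mean, quantiles, stdev


def describe(values):
    """Summary statistics matching the table footnotes (sample std assumed)."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "std": stdev(values),
        "min": min(values),
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "max": max(values),
    }


# Hypothetical toy scores, not data from this study.
toy_scores = [-1.0, 0.0, 0.5, 1.0, 2.0]
summary = describe(toy_scores)
print(summary)
```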

Table S5. Summary of the food components profiling according to their biosynthetic pathway, superclass, and class (based on NPClassifier).

Table S6. Summary of the natural products from UNPD-A profiling predicted according to their biosynthetic pathway, superclass, and class (based on NPClassifier).