Electronic Nose for Improved Environmental Methane Monitoring

Reducing emissions of the key greenhouse gas methane (CH4) is increasingly highlighted as being important to mitigate climate change. Effective emission reductions require cost-effective ways to measure CH4 to detect sources and verify that mitigation efforts work. We present here a novel approach to measure methane at atmospheric concentrations by means of a low-cost electronic nose strategy where the readings of a few sensors are combined, leading to errors down to 33 ppb and coefficients of determination, R², up to 0.91 for in situ measurements. Data from methane, temperature, humidity, and atmospheric pressure sensors were used in customized machine learning models to account for environmental cross-effects and quantify methane in the ppm–ppb range in both indoor and outdoor conditions. The electronic nose strategy was confirmed to be versatile, with improved accuracy when more reference data were supplied to the quantification model. Our results pave the way toward the use of networks of low-cost sensor systems for the monitoring of greenhouse gases.


REFERENCE EQUIPMENT
The results of the gas mixing system (GMS) and the ultra-portable greenhouse gas analyzer (UGGA) calibration using the gas chromatograph (GC) are shown in Figure S1. Mean uncertainties relative to the GC are included in the graph.

HUMIDITY AND TEMPERATURE STUDY INSIDE THE GAS TEST CHAMBER
The water vapor (H2O(g)) concentration and temperature (T) along the gas test chamber, reported by the SHTC1 sensor before starting the calibration measurements at three different points (the introduced gas, in the chamber near the inlet, and at the central part of the chamber), are summarized in Table S1. While introducing 15.3 g•m⁻³ (76.0% RH at 22.6 °C) by means of the GMS, the SHTC1 reported a mean value of 15.5 g•m⁻³ (35.6% RH at 37.0 °C) inside the chamber. Thus, we found similar H2O(g) concentrations before and along the chamber, about 15.4 g•m⁻³, with errors below 4%.

Figure S1. Methane concentration supplied by our gas mixing system and the corresponding concentrations measured with an ultraportable greenhouse gas analyser and a gas chromatograph.

Table S1. Notes: Relative humidity and temperature values measured with a SHTC1 before and along the gas test chamber when the gas mixing system was supplying 76% relative humidity, and the corresponding H2O(g) concentrations calculated at 1018 hPa.
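The relation between relative humidity, temperature, and water vapor concentration quoted in this section can be illustrated with a minimal sketch. It assumes the Magnus approximation for the saturation vapor pressure and the ideal gas law; this is not necessarily the conversion used by the authors, and the pressure dependence implied by the 1018 hPa note is ignored. The function name is hypothetical.

```python
import math

def absolute_humidity_g_m3(rh_percent: float, temp_c: float) -> float:
    """Water vapor concentration (g/m^3) from relative humidity (%) and temperature (deg C)."""
    # Assumption: Magnus approximation for saturation vapor pressure over water, in hPa.
    e_sat_hpa = 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))
    # Actual water vapor partial pressure, converted from hPa to Pa.
    e_pa = rh_percent / 100.0 * e_sat_hpa * 100.0
    # Ideal gas law: mass concentration = e * M_w / (R * T).
    molar_mass_water = 18.015  # g/mol
    gas_constant = 8.314       # J/(mol K)
    return e_pa * molar_mass_water / (gas_constant * (temp_c + 273.15))

# Conditions quoted in the text: gas supplied by the GMS and gas inside the chamber.
inlet = absolute_humidity_g_m3(76.0, 22.6)
chamber = absolute_humidity_g_m3(35.6, 37.0)
print(f"inlet: {inlet:.1f} g/m^3, chamber: {chamber:.1f} g/m^3")
```

Under these assumptions both conditions give values within a few percent of the 15.3 and 15.5 g•m⁻³ reported above, which illustrates why a much lower relative humidity at the higher chamber temperature corresponds to nearly the same water vapor concentration.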

STICKER LABELS STUDY
The results of the two measurements, with and without sticker labels, were compared and no relevant difference was observed after the sensor stabilization period (see Figure S2). We conclude that the sticker labels used did not influence the measurements.

REGIONS OF INTEREST AND FEATURE CALCULATION DETAILS
The three types of two-minute interval features used for sensor signal evaluation (mean value, slope, and fast Fourier transform) were chosen because they are not mathematically dependent on each other and therefore report different types of information from the sensor signal. They were calculated in the following way:
1. Mean value: summing the values of the sensor signal at minute i and minute i+1, and dividing by two.
2. Linear slope: between the sensor signal (y) at minute i and minute i+1, and time (t), i.e., the steepness of the sensor signal, Δy/Δt.
3. Fast Fourier transform (FFT): summing the values of the sensor signal at minute i and minute i+1, dividing by two, and then converting the sensor signal from the time domain to the frequency domain using the FFT algorithm¹ in Matlab software, followed by calculating its absolute value and then its natural logarithm.
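The three feature calculations can be sketched as follows. This is an illustration only, written in Python rather than the Matlab used in this work, and it rests on two assumptions that the text does not spell out: the two-minute intervals are taken as non-overlapping pairs of one-minute samples, and the Fourier transform is applied to the series of two-minute means within a region of interest. A naive discrete Fourier transform is used so that the example has no dependencies; it returns the same values an FFT routine would. The function name is hypothetical.

```python
import cmath
import math

def two_minute_features(signal, dt_min=1.0):
    """Return (means, slopes, log_fft) for non-overlapping pairs of one-minute samples."""
    # Assumption: samples i and i+1 form one two-minute interval, without overlap.
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    means = [(a + b) / 2.0 for a, b in pairs]
    slopes = [(b - a) / dt_min for a, b in pairs]  # delta y / delta t
    n = len(means)
    # Naive discrete Fourier transform of the series of means.
    spectrum = [
        sum(means[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]
    # Natural logarithm of the magnitude (a zero magnitude would need special handling).
    log_fft = [math.log(abs(c)) for c in spectrum]
    return means, slopes, log_fft

# Hypothetical sensor readings, one per minute.
means, slopes, log_fft = two_minute_features([1.0, 3.0, 5.0, 9.0])
print(means, slopes, log_fft)
```

The three outputs carry different information: the mean tracks the signal level, the slope tracks its local trend, and the log-magnitude spectrum summarizes how the level varies across the interval series.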
The relevance of each feature depends on the case studied. This was tested by calculating the variable importance in projection (VIP) scores. The VIP scores measure the contribution of each variable to the model and can be used to rank the variables by their importance. Usually, the variables with VIP scores higher than 1 are the most important. The features that obtained values higher than 1 differed between the cases we studied, indicating that the relevance of each feature depends on the particularities of each situation. The area under the sensor signal and the difference between consecutive values were tested as well and gave regression coefficients similar to those of the mean value and slope, respectively. For this reason, and to avoid overfitting, these two extra features were discarded. For the three features used, we cannot conclude which are the most relevant, only that all of them were important to train models that could adapt to different scenarios.

The results shown in Figure S5 present a linear trend for increasing CH4 and H2O(g) concentrations, except for values around 2 ppm CH4. We attribute this behavior to particularities of the GMS in this concentration region, which were also observed when the system was calibrated with the GC (Figure S1).

water vapor concentration affordable with our gas mixing system, 1.1 g•m⁻³, showing faster stabilization time when compared to gas mixtures with higher water vapor concentrations.

We observe a stabilization time longer than 12 h when measurements start with H2O(g) concentrations of 14.0 g•m⁻³ (Figure S6a). The baseline drift during stabilization is lower for TGSC than for TGSE sensors (Figure S6a). Calibrations starting with 1.1 g•m⁻³ (3.0% RH at 33.5 °C) show a stabilization of the signals of both types of sensors in less than 4 h (Figure S6b). Therefore, the sensing material itself needs less than 4 h to report a stable signal after initialization of the sensor, but this time increases with increasing H2O(g) concentration.

PARTIAL LEAST SQUARES REGRESSION EXAMPLE COEFFICIENTS AND PROCEDURE
The PLSR quantification coefficients obtained for one specific e-nose with the laboratory measurements are summarized in Table S3.

Table S3. Example of PLSR laboratory coefficients
Notes: Quantification coefficients and constant term for an example model trained and tested in the laboratory under controlled conditions (data shown in Figure 4). Sensor1 and Sensor3 refer to TGS2611-E00 sensors, Sensor2 to a TGS2611-C00 sensor, and H2O(g) to the humidity supplied by the gas mixing system. Importantly, note that these coefficients are only valid for this specific LCSS and cannot be used for other sensors or LCSS.
As an example of how to use the PLSR coefficients, we use the standardized value of H2O(g) at 14 g•m⁻³ and the three standardized sensor signal voltage values in the ROI for 2.5 ppm CH4 (we chose point 60 of the 280 available in this region), i.e., the predictor variables that give us the CH4 concentration (symbols explained in the main text): X H2O(g) = 1.5714
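The way such coefficients are applied can be sketched as below: a linear combination of standardized predictors plus the constant term, followed by back-standardization (multiplying by the standard deviation of the response training data and adding its mean). Only the H2O(g) value of 1.5714 comes from the text. Every other number, including the coefficients, the sensor values, and the mean and standard deviation, is a hypothetical placeholder and not a value from Table S3; the function and variable names are likewise illustrative.

```python
def predict_ch4_ppm(x_std, coefficients, constant, y_mean, y_std):
    """Apply standardized PLSR coefficients, then back-standardize the result to ppm."""
    y_standardized = constant + sum(coefficients[name] * x_std[name] for name in coefficients)
    # Back-standardization: scale by the training response std and add its mean.
    return y_standardized * y_std + y_mean

# Standardized predictors: H2O(g) is from the text, the sensor values are placeholders.
x_std = {"H2O": 1.5714, "Sensor1": 0.2, "Sensor2": -0.1, "Sensor3": 0.3}
# Placeholder coefficients and constant term, NOT the values of Table S3.
coefficients = {"H2O": -0.5, "Sensor1": 1.0, "Sensor2": 0.5, "Sensor3": 1.0}
constant = 0.0
# Placeholder mean and standard deviation of the response training data, in ppm.
ch4_ppm = predict_ch4_ppm(x_std, coefficients, constant, y_mean=5.0, y_std=2.0)
print(f"{ch4_ppm:.4f} ppm")
```

Because the coefficients act on standardized inputs, the same standardization (mean and standard deviation of the training predictors) has to be applied to any new sensor reading before it is passed to the model.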

COMPARISON BETWEEN PUBLISHED APPROACHES
The response of semiconductor materials, and in particular SnO2, towards different gases has been reported previously.² TGS-type sensors have existed for several decades and their response towards gases has been studied under controlled laboratory conditions.³ For example, the following Clifford equation has been suggested when maintaining constant current values and exposing the TGS sensors to O2, CH4, and H2O(g):

Clifford equation (4)

In this equation, R0 is the sensor resistance in dry air while RS is the resistance at a certain CH4 and H2O(g) concentration, pO2 is the relative oxygen partial pressure (being one for air), and β is the power-law exponent, which varies from one sensor to another between 0.25 and 0.55.

Other attempts to quantify CH4 via field calibrations⁴,⁵ are exemplified by:

) − 0.7083)) Riddick non-linear equation (7)

where RH is the relative humidity in % and T the air temperature in °C.

In our LCSS, the TGS sensors are connected in series to a resistor (RL = 4.7 Ω), the circuit voltage supplied is 5 V (VC), and the voltage measured (Vout) varies in relation to the sensing material resistance (RS). Therefore, to calculate RS/R0 we use⁶:

where V0 is the lowest sensor output voltage measured during lab calibration at 0 ppm CH4 with the GMS at 12.5 g•m⁻³ H2O(g) (35% RH at 33 °C) for the Wetland site, 12.1 g•m⁻³ H2O(g) (30% RH at 33 °C) for the Sludge screw room and Sludge piles, or at minimum background levels, 11.8 g•m⁻³ H2O(g) (61% RH at 22 °C) for the Garden at 2.069 ppm CH4.
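A resistance ratio of this kind can be obtained from the measured voltages with the standard voltage-divider relation, sketched below. This is an assumption about the circuit, not the authors' equation, which is not reproduced here: the measured voltage is taken to be the drop across the series resistor, so that the sensor resistance is RL·(VC − Vout)/Vout. The function name and the example voltages are hypothetical.

```python
def rs_over_r0(v_out: float, v0: float, v_c: float = 5.0) -> float:
    """Sensor resistance ratio Rs/R0 from voltages measured across the series resistor."""
    if not (0.0 < v_out < v_c and 0.0 < v0 < v_c):
        raise ValueError("voltages must lie strictly between 0 V and the supply voltage")
    # Voltage divider: Rs = RL * (Vc / Vout - 1). The series resistance RL cancels
    # in the ratio, so only the three voltages are needed.
    return (v_c / v_out - 1.0) / (v_c / v0 - 1.0)

# Hypothetical readings: 0.5 V at the reference condition, 1.0 V with more CH4 present.
ratio = rs_over_r0(v_out=1.0, v0=0.5)
print(f"Rs/R0 = {ratio:.3f}")
```

Since the lowest output voltage corresponds to the highest sensor resistance, using it as the reference yields ratios of at most one, which decrease as the CH4 concentration rises.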
We used the equations from former methods to quantify CH4 using several of our data sets from different field sites, and compared the results with our approach, as shown in Table S5 and Figures S8 to S11. For our approach, it must be considered that 20% of the data was used to test the model and that we plot the two-minute means of the UGGA to make them correspond to our trained model. From these results it can be observed that the generalized equations proposed by former methods do not adapt to the particularities that different situations or sensors present, except for the Clifford equation, which was developed and tested only under controlled laboratory conditions. Therefore, the coefficients from the former methods would need to be adapted for every sensor, as our model does, giving new quantification coefficients for each particular set of sensors and field site.

For this reason, the former methods were also evaluated by fitting the equation coefficients to the same data used to train our model (referred to as adjusted coefficients). The coefficients were adjusted, i.e., fitted to our data, by using iterative least squares estimation. To ensure that the comparison between former and current methods is made with comparable data, the former methods were also tested with the two-minute interval data. The R² results obtained for the previous methods improved by 0.1 in the best case (see Table S6), while the RMSE decreased down to 3.7 ppm for the methods with negative R² values and increased up to 2.2 ppm for the rest.

Table S5. Notes: Comparison of the root mean squared error and R² obtained using different published approaches, with both the proposed coefficients and the adjusted coefficients, against the same error measures obtained for the partial least squares regression method presented in this work, for the Wetland, Sludge screw room, Sludge piles, and Garden field data. Note that R² can be negative when the mean value represents the data better than the fit obtained.⁷

Table S6. Notes: Comparison of the root mean squared error and R² using 1- or 2-min averaged data for the previously published approaches and the method presented in this work for the Sludge screw room (best results). Note that R² can be negative when the slope and/or the intercept terms are fixed, so that the mean value represents the data better than the fit obtained.⁷
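The two error measures, and the reason R² can become negative, can be illustrated with a short sketch using the usual definitions (RMSE as the root of the mean squared residual, R² as one minus the ratio of residual to total sum of squares). The reference and predicted concentrations below are made-up numbers chosen only to show the effect; they are not data from this work.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference CH4 values (ppm) and two hypothetical predictions.
reference = [2.0, 2.5, 3.0, 3.5]
good_fit = [2.1, 2.4, 3.1, 3.4]     # small scatter around the reference
biased_fit = [4.0, 4.5, 5.0, 5.5]   # fixed coefficients giving a constant 2 ppm offset

print(rmse(reference, good_fit), r_squared(reference, good_fit))
print(rmse(reference, biased_fit), r_squared(reference, biased_fit))
```

The biased prediction follows the trend of the reference perfectly, yet its residual sum of squares exceeds the total sum of squares, so R² falls below zero: simply predicting the mean of the reference would have been closer. This is the situation that arises when published coefficients are applied unchanged to a different sensor or site.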

ACCURACY STUDY
To evaluate how R² and RMSE depend on the number of points used to train and test the PLSR, we used the case with the largest amount of reference data (Garden). Figure S12 shows the RMSE and R² as a function of the number of data points. The R² and RMSEtrain decrease with an increasing number of data points, and RMSEtest is always lower than RMSEtrain (always below 100 ppb).
The latter indicates that our model is not overfitted; otherwise, RMSEtrain would be lower than RMSEtest. The difference between R²train and R²test can help to understand whether the model generalizes well, i.e., whether the model would predict correct concentrations with unseen data. A larger difference between R²train and R²test means less ability to generalize. Thus, our model generalizes well in most cases.
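The train/test comparison described here can be sketched with a deliberately simple stand-in model. The example below uses an ordinary least squares line instead of PLSR, synthetic data instead of sensor data, and a fixed every-fifth-point hold-out to mimic the 20% test fraction; all of these are assumptions made for illustration and do not reproduce the procedure of this work.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic, hypothetical data: a linear signal with a small deterministic disturbance.
x = [i / 10 for i in range(50)]
y = [2.0 * xi + 1.0 + 0.05 * ((7 * i) % 5 - 2) for i, xi in enumerate(x)]

# Hold out every fifth point, i.e. 20% of the data, for testing.
test_idx = set(range(4, 50, 5))
x_train = [x[i] for i in range(50) if i not in test_idx]
y_train = [y[i] for i in range(50) if i not in test_idx]
x_test = [x[i] for i in sorted(test_idx)]
y_test = [y[i] for i in sorted(test_idx)]

slope, intercept = fit_line(x_train, y_train)
pred_train = [slope * v + intercept for v in x_train]
pred_test = [slope * v + intercept for v in x_test]

rmse_train, rmse_test = rmse(y_train, pred_train), rmse(y_test, pred_test)
r2_train, r2_test = r_squared(y_train, pred_train), r_squared(y_test, pred_test)
print(f"RMSE train/test: {rmse_train:.3f}/{rmse_test:.3f}, R2 gap: {abs(r2_train - r2_test):.4f}")
```

A test error far above the training error, or a large gap between the two R² values, would signal overfitting; comparable values, as in this toy case, suggest that the fitted relation carries over to data not used for training.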

Figure S2. Temporal signal evolution of a TGS2611-E00 sensor without and with sticker label, showing no relevant differences after stabilization.

Figure S3. Example of regions of interest from a measurement performed in the laboratory with

Figure S4. Temporal evolution of the signal of 32 TGS2611-C00 sensors exposed to different concentrations of methane, ranging from 1 to 9 ppm, and water vapor, ranging from 4.5 to 14 g•m⁻³,

Figure S5.

Figure S6. (a) Comparison between the TGS2611-E00 and TGS2611-C00 temporal signal evolutions at 14.0, 12.1, and 10.4 g•m⁻³ of water vapor when exposed to the same concentrations

Figure S7. Boxplot results of analysis of variance showing the differences in the sensor signal

In the Clifford equation, [CH4] and [H2O] are the concentrations of methane and water vapor expressed in volumetric ppm, and KCH4 and KH2O are constants with dimensions of ppm⁻¹.

Figure S8. Comparison between methane concentration reported from reference measurements

Figure S9. Comparison between methane concentration reported from reference measurements

Figure S10. Comparison between methane concentration reported from reference

Figure S11. Comparison between methane concentration reported from reference

Figure S12. Comparison of R² and RMSE as a function of the number of data points used for

Table S2. Results of the two-way analysis of variance of the absolute response of the two types of TGS sensors at seven different water vapor concentrations and seven different methane concentrations. SS is the sum of squares, df the degrees of freedom, MS the mean square, F0 the value of a statistical test with an F-distribution, and p-value the probability that the test statistic will take on a value that is at least as extreme as the observed value of the statistic.

Table S4. Examples of PLSR field coefficients.
Coefficients | Site 1: Garden | Site 2: Sludge piles | Site 3: Sludge screw room | Site 4: Wetland
Notes: Quantification coefficients and constant terms for the partial least squares regression models trained herein and tested in the different field sites studied in our work, and the corresponding root mean squared error, methane range, percentage ratio between root mean squared error and methane range, and number of reference points available from the reference greenhouse gas analyzer for each field site. Sensor1 and Sensor3 are TGS2611-E00 and Sensor2 is a TGS2611-C00, while H2O(g), T, and P are the relative humidity, temperature, and atmospheric pressure monitored by the BME680. See main text for details. Importantly, note that these coefficients are only valid for the specific LCSS and cannot be used for other sensors or LCSS.

To convert the result into ppm, we need to multiply the result by the standard deviation of the response training data set and add the mean value of the response training data set (back-standardization), which for our laboratory measurements were:

Note that RMSEtrain and R²train correspond to the well-known measures of error calculated with the training data, while RMSEtest and R²test are the same measures of error but calculated with the test data.