Chromatographic Fingerprinting Enables Effective Discrimination and Identitation of High-Quality Italian Extra-Virgin Olive Oils

The challenging process of high-quality food authentication takes advantage of highly informative chromatographic fingerprinting and its identitation potential. In this study, the unique chemical traits of the complex volatile fraction of extra-virgin olive oils from Italian production are captured by comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry and explored by pattern recognition algorithms. The consistent realignment of untargeted and targeted features across 73 samples, including oils obtained from different olive cultivars (n = 24), harvest years (n = 3), and processing technologies, provides a solid foundation for sample identification and discrimination based on production region (n = 6). Through a dedicated multivariate statistics workflow, identitation is achieved by two-level partial least-squares (PLS) regression, which highlights region diagnostic patterns accounting for between 58 and 82 untargeted and targeted compounds, while sample classification is performed by sequential application of soft independent modeling for class analogy (SIMCA) models, one for each production region. Samples are correctly classified in five of the six single-class models, for which the quality parameters [i.e., sensitivity, specificity, precision, efficiency, and area under the receiver operating characteristic curve (AUC)] are equal to 1.00.


■ INTRODUCTION
Olive oil (OO) is one of the pillars of the Mediterranean diet and represents the main source of fats in the countries of the Mediterranean basin. 1 In particular, extra-virgin olive oil (EVOO) is recognized as the most valuable product among the edible oils; 2 it is extracted from fresh olive fruits (Olea europaea L.) by mechanical or physical technologies that preserve the composition of the lipid fraction while limiting autoxidation reactions and alterations of its native quality. 3 The increasing demand for olive oil of high quality, i.e., EVOO (Commission of the European Communities, 1991; "EU Food Qual. Labels", 2021; IOC, 2015), is due not only to its nutritional and health values, which derive from the presence of antioxidants (i.e., tocopherols and phenolic compounds) and a high oleic acid content, but also to its peculiar sensory characteristics, 2 which are strongly related to olive cultivar, pedoclimatic conditions of the harvest region, olive ripeness, and extraction technology (EU Food Qual. Labels, 2021).
In this context, any analytical methodology capable of delineating chemical patterns informative of the different functional variables influencing the composition of EVOO is useful and has the potential to support the valorization of high-quality products, facilitate sensory quality evaluation/screenings on the basis of commercial classification, and counteract fraudulent practices. 4−7 In this latter context, the accurate fingerprinting of the unsaponifiable fraction and of minor components by comprehensive two-dimensional (2D) gas chromatography coupled to mass spectrometry (GC × GC−MS) and/or to parallel flame ionization detection (MS/FID) was successful in identifying admixtures of OO with other fats and/or establishing the product freshness/shelf life. 8 Active research in the development of GC fingerprinting methodologies also includes investigations on EVOO volatiles. Monodimensional (1D)-GC fingerprinting accompanied by accurate profiling was recently applied to validate the role of sesquiterpene hydrocarbons as geographical origin markers 7 in EVOOs from different cultivars and production areas. In their study, Quintanilla-Casas et al. 7 confirmed the superior discrimination power of the total volatile fingerprint (100% correct classification) obtained by GC−MS raw data processing followed by suitable supervised chemometrics, compared to the targeted profiling of selected sesquiterpenoids, whose rate of correct classification ranged between 46 and 100% as a function of the production country.
Moreover, using volatile fingerprinting based on 1D-GC, it was also possible to support the commercial classification of OO based on sensory panel evaluation. 9 EVOOs are in fact characterized by peculiar yet essential aroma qualities such as green, grassy, and fruity notes, whose perception is at the basis of commercial classification based on EU regulations. Sensory quality, together with the compositional/chemical standards to be complied with 5 and the absence of off-flavors, guides the OO classification into EVOO (median of the defects = 0.0 and median of the positive attribute > 0.0), virgin olive oil (VOO: median of the defects between 0.0 and 3.5 and median of the positive attribute > 0.0), and lampante olive oil (LOO: median of the defects > 6.0) in the presence of sensory defects (rancid, fusty/muddy, musty, and winey/vinegary), even at lower levels (e.g., median of the defects ≠ 0). 5 In this study, we take a step forward in the direction of validating a powerful and highly flexible chromatographic fingerprinting workflow with superior identitation (i.e., defining the identity of a particular food based on the characteristic features that make it singular or unique 10 ) and classification effectiveness compared to existing tools based on 1D-GC data. The improved separation capacity of GC × GC, the analyte retention logic over the separation space, 11 and the comprehensive capture of a component's features generated by time-of-flight mass spectrometry (TOF MS) detection make the resulting 2D fingerprints the sample's unique traits for effective and reliable authentication. Moreover, the specificity of the third information dimension of the system (i.e., EI-MS fragmentation patterns) gives access to a higher level of information, as with any confirmatory analytical technique.
Compared to existing studies adopting GC × GC as the profiling and/or fingerprinting technique, 12−14 the combined information derived from untargeted and targeted (UT) features is here explored in the challenging scenario of Italian high-quality EVOO production, connoted by an impressive heritage of olive genetic varieties, with about 540 different registered cultivars 15 and 46 protected designation of origin (PDO) products from different geographical locations (i.e., regions) over the entire territory.
The challenge posed by the complexity and high chemical dimensionality of EVOO volatiles is tackled by a dedicated workflow, named combined untargeted and targeted fingerprinting (UT fingerprinting), 16 where the information from patterns of known and unknown components is accurately tracked across many samples and its identitation, discrimination, and classification power is examined in detail with a focus on regional characters. Furthermore, the synergy between profiling and fingerprinting is also examined by observing the distribution of key-aroma compounds and potent odorants strongly correlated to positive and/or negative odor qualities. 17−19

■ MATERIALS AND METHODS
Chemicals. Pure reference standards of α- and β-thujone and methyl-2-octynoate used as internal standards (ISs), n-alkanes (from n-C7 to n-C25) used for linear retention index (I T ) calibration, and pure reference compounds for identity confirmation were supplied by Merck (Milan, Italy). Cyclohexane (HPLC grade) for n-alkane dilution and pure dibutyl phthalate used to prepare IS working solutions were also from Merck (Milan, Italy).
Extra-Virgin Olive Oil Samples. Extra-virgin olive oils (EVOOs) were supplied within the VIOLIN project 20 selection. They were obtained from olives of different cultivars harvested between 2016 and 2018 over the Italian territory; all samples were certified as EVOOs by accredited laboratories (ISO 17025:2018) and by the official sensory panel test. Details on the sample set under study, counting 73 samples, are provided in the Supporting Information Table S1 together with harvest/production regions (i.e., Umbria n = 7, Garda lake n = 10, Lazio n = 11, Puglia n = 12, Sicilia n = 13, and Toscana n = 20). Supporting Information Figure S1 shows geographical locations of selected EVOO production sites.
The ISs were preloaded onto the SPME device by sampling 5.0 μL of α/β-thujone and methyl-2-octynoate IS solution (100 mg L −1 ) placed in a 20 mL headspace vial. IS preloading was performed by exposing the SPME device to the HS kept at 40°C for 5 min.
Sampling was carried out on 0.100 ± 0.005 g of oil samples, precisely weighed in 20 mL headspace vials, at 40°C for 60 min under constant stirring. The amount of sample was chosen to match HS linearity conditions for most of the characteristic analytes of the EVOO volatile fraction. 21,24,25 After extraction, the SPME device was automatically transferred to the split/splitless injection port of the GC × GC system kept at 250°C, and thermal desorption was carried out for 5 min.
GC × GC-TOF MS: Instrument Setup and Conditions. GC × GC analyses were performed on an Agilent 7890B GC unit (Agilent Technologies, Wilmington DE) coupled to a Markes BenchTOF-Select mass spectrometer featuring tandem ionization (Markes International, Llantrisant, U.K.). The GC transfer line was set at 270°C. TOF MS tuning parameters were set for single ionization at 70 eV, and the scan range was set at 40−350 m/z with a spectra acquisition frequency of 100 Hz. The system was equipped with a two-stage KT 2004 loop-type thermal modulator (Zoex Corporation, Houston, TX) cooled with liquid nitrogen and controlled by Optimode v2.0 (SRA Instruments, Cernusco sul Naviglio, Milan, Italy). Modulation period (P M ) and hot jet pulse times were set, respectively, at 3.5 s and 300 ms, with a cold jet stream at the mass flow controller (MFC) from 40 to 8% of the total flow along the run duration. No secondary oven was adopted in the GC × GC setup.
GC × GC Columns and Settings. The column set was configured as follows: 1  The GC split/splitless injector port was kept at 250°C and operated in the split mode with a split ratio of 1:20. The carrier gas was helium at a constant nominal flow of 1.3 mL min −1 . The oven temperature programming was set as follows: from 40°C (2 min) to 240°C (10 min) at 3.5°C min −1 .
The n-alkane solution for I T determination was analyzed under the following conditions: split/splitless injector in the split mode, split ratio 1:50, injector temperature 250°C, and injection volume 1 μL.
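The linear retention index calibration mentioned above can be illustrated with a minimal sketch. It assumes the usual temperature-programmed interpolation between the two bracketing n-alkanes (the van den Dool and Kratz form); the function name, the dictionary layout, and the retention times in the usage note are hypothetical and are not taken from the study.

```python
from bisect import bisect_right


def linear_retention_index(t_x, alkane_times):
    """Linear retention index of an analyte eluting at time t_x.

    alkane_times maps the carbon number of each n-alkane to its first-dimension
    retention time (same units as t_x); times are assumed to increase with
    carbon number. Hypothetical helper, not the authors' software.
    """
    carbons = sorted(alkane_times)
    times = [alkane_times[c] for c in carbons]
    if not times[0] <= t_x <= times[-1]:
        raise ValueError("analyte elutes outside the n-alkane calibration window")
    # Position of the n-alkane eluting just before (or together with) the analyte.
    k = min(bisect_right(times, t_x) - 1, len(times) - 2)
    n, n_next = carbons[k], carbons[k + 1]
    fraction = (t_x - times[k]) / (times[k + 1] - times[k])
    # Linear interpolation between the two bracketing n-alkanes.
    return 100.0 * (n + (n_next - n) * fraction)
```

For example, with made-up calibration times `{7: 5.0, 8: 8.0, 9: 12.0}` (min), an analyte eluting at 10.0 min lies halfway between n-C8 and n-C9 and receives an index of 850.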
Combined Untargeted and Targeted (UT) Fingerprinting Workflow. The data processing workflow was designed to comprehensively capture the chemical signature of volatiles from EVOO samples by computing both peak and peak-region features from untargeted (unknowns) and targeted components located over the 2D space. The approach, named UT fingerprinting, was designed on EVOO volatile patterns and further adapted to compositional peculiarities of samples in other fields. 11 In this study, the targeting (i.e., identification) of analytes was done as the last step of the process after chromatogram realignment over reliable peaks from untargeted components/features.
The generation of untargeted features (i.e., peaks and peak regions) and their realignment across all sample chromatograms were performed by template matching, 26 which actively uses metadata collected for 2D peaks and peak regions (i.e., retention times, MS spectrum, and detector response) above a signal-to-noise (S/N) threshold value of 100 23 to establish correspondences across 2D patterns. Realignment specificity is ensured by active constraints on MS similarity [i.e., a threshold value of 750 for direct match factor (DMF) and reverse match factor (RMF)] between template (reference) and candidate (analyzed) "peak spectra". 23,27−29 The chromatographic fingerprinting was performed automatically by the GC Image Investigator V2.9 (GC Image LLC, Lincoln NE) on a random selection of sample chromatograms (n = 25) acquired across a time-frame of 2 weeks. It aligned the 25 chromatograms through reliable peaks for registration and generated a composite chromatogram over which peak-region features were delineated and extracted to form a feature template for further processing. Reliable peaks in this study were those that positively matched across all but one of the selected 25 chromatograms (i.e., most constrained condition option).
The resulting feature template includes untargeted (reliable) peaks and peak regions comprehensively capturing the chemical composition of samples. Figure 1A shows the pseudocolor image of a Sicilian EVOO (#S1) overlaid with 591 peak regions (red graphics) and 159 targeted peaks (green circles). Targeting of informative compounds, including EVOO key-aroma compounds, ripening indicators, and potent odorants responsible for coded defects, 30 was performed at the end of the realignment process over the entire set of chromatograms (n = 73). Identifications were confirmed by authentic standards when available in the authors' laboratory (criterion "a" in Table 1) or by spectral similarity DMF ≥ 900, RMF ≥ 950, and I T tolerance ± 20 units (criterion "b", corresponding to tentative identification in Table 1). Table 1 lists target analytes with 1 D and 2 D retention times ( 1 t R ; 2 t R ). The output table collecting 2D peaks and peak regions aligned across all chromatograms with feature-related metadata ( 1 D and 2 D retention times, MS spectrum, base peak and molecular ion m/z, and TIC response) was stored and made available for further processing.
Supporting Information Table S2 lists untargeted and targeted peak-region features included in the UT template, together with their experimental 1 D I T values, retention times in the two analytical dimensions ( 1 t R , 2 t R ), % relative standard deviation (% RSD) on retention times across all analyses, and reference MS spectral signature from the peak-apex spectrum.
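The two filtering rules described above (an MS similarity constraint on candidate matches and the "all but one" rule for reliable peaks) can be sketched as follows. This is only an illustration of the selection logic, not the GC Image implementation: the function names and the data layout are hypothetical, and treating the 750 threshold as inclusive (≥) is an assumption.

```python
def passes_ms_constraint(dmf, rmf, threshold=750):
    """True when both match factors reach the similarity threshold.

    The inclusive comparison is an assumption; the text only states the
    threshold value.
    """
    return dmf >= threshold and rmf >= threshold


def reliable_peaks(match_table, max_missing=1):
    """Return the peak ids matched in all but at most `max_missing` chromatograms.

    match_table maps a (hypothetical) peak id to a list of booleans, one per
    chromatogram, telling whether the template peak was positively matched.
    """
    return [
        peak_id
        for peak_id, matched in match_table.items()
        if matched.count(False) <= max_missing
    ]
```

With 25 chromatograms, a peak missing from one chromatogram is still retained as reliable, whereas a peak missing from two is discarded.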
Method Performance Parameters. Repeatability was evaluated on analytical descriptors considered fundamental for an accurate chromatographic fingerprinting based on both 2D peak patterns and analyte responses. Therefore, % RSD was calculated on retention times and analyte % response (% normalized 2D volumes over IS) for all targeted compounds and on analytical replicates of the same sample analyzed every 2 days over the 2 weeks of the study (n = 6). Results are reported in Table 1. Mean % RSDs on retention times were 0.34% for 1 D ( 1 t R ) and 3.01% for 2 D ( 2 t R ). Maximum % RSD on percent response was instead 20.93%, reported for nonanal, while the mean value was 11.98%.
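As a small worked illustration of the repeatability figures, the sketch below computes a % RSD from replicate values and an IS-normalized response. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify; function names and the replicate values in the usage note are invented.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Percent relative standard deviation of replicate measurements.

    Uses the sample standard deviation (n - 1); this choice is an assumption.
    """
    average = mean(values)
    if average == 0:
        raise ValueError("mean is zero, % RSD is undefined")
    return 100.0 * stdev(values) / abs(average)


def normalized_response(analyte_volume, is_volume):
    """Analyte 2D peak volume expressed as a percentage of the internal standard volume."""
    return 100.0 * analyte_volume / is_volume
```

For six hypothetical replicate retention times of 10.0, 10.2, 9.8, 10.1, 9.9, and 10.0 min, `percent_rsd` gives about 1.41%.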
Data Acquisition and 2D Data Processing. Data were acquired by TOF-DS software (Markes International, Llantrisant, U.K.) and processed by the GC Image V2.9 suite (GC Image, LLC Lincoln, NE).
The data files of peak-region features from each chromatogram were exported in the ".xls" format (Microsoft Excel) and then converted to the MATLAB format (version R2017b). All of the multivariate analyses were performed using PLS_Toolbox 8.6.1 (Eigenvector Research, Manson, WA) for the MATLAB environment (MathWorks Inc., Massachusetts, R2017b). Principal component analysis (PCA), partial least-squares regression (PLS), and soft independent modeling for class analogy (SIMCA) were applied as exploratory analysis, variable selection, and classification method, respectively. In addition, data were preprocessed by autoscaling before model development. Microsoft Excel spreadsheet was used for similarity analysis.
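Autoscaling, the preprocessing step mentioned above, centers each feature on its mean and divides it by its standard deviation so that abundant and trace volatiles weigh equally in the models. The sketch below is a plain-Python illustration under the assumption of the sample standard deviation (n − 1); it is not the PLS_Toolbox routine, and the handling of constant columns is a choice made here.

```python
from statistics import mean, stdev


def autoscale(matrix):
    """Mean-center and scale every column (feature) to unit standard deviation.

    matrix is a list of rows (samples), each a list of feature responses.
    Constant columns are set to 0.0 to avoid division by zero (an assumption).
    """
    columns = list(zip(*matrix))
    means = [mean(column) for column in columns]
    stds = [stdev(column) for column in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in matrix
    ]
```

For a toy 3 × 2 matrix `[[1, 10], [2, 20], [3, 30]]`, both columns become `[-1, 0, 1]` after scaling, even though their raw responses differ by a factor of 10.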

■ RESULTS AND DISCUSSION
Chromatographic fingerprinting based on comprehensive two-dimensional separations has a great potential for discrimination and identification of samples based on their chemical signatures, a process described as identitation. 10 Moreover, it offers further advantages when mass spectrometry is used at the detection level, providing additional information for analyte putative identification. This step gives access to a higher information level on sample properties and characteristics. 7,18,31 The strategy adopted to decrypt the hidden information from volatile patterns of EVOOs harvested in different Italian regions collects information from untargeted and targeted (UT) features. It is a fingerprinting approach designed to comprehensively map all detectable volatiles from GC × GC-TOF MS analyses. 16 Chromatogram processing was done by a validated workflow described in the Combined Untargeted and Targeted (UT) Fingerprinting Workflow section; the output was a data matrix of dimensions 73 × 591 (i.e., samples × features) with a subset of 159 identified (targeted) compounds.
The next section highlights the fundamental role of high-resolution separations and retention pattern logic for the effective identitation of samples. Machine learning, based on multivariate statistics and modeling algorithms, will be presented as a key tool to access a higher level of information to identify distinctive regional marker patterns.
Complex and Multidimensional EVOO Volatilome. EVOO is highly appreciated by consumers because of its unique and characteristic flavor, which reflects the chemical complexity and dimensionality 32 of its volatile fraction, characterized by the presence of many compounds, especially carbonyls (e.g., aldehydes, ketones), esters, alcohols, and hydrocarbons (e.g., linear, aromatic, terpenoids, etc.). Odor-active compounds, with a low odor perception threshold, and volatiles lacking sensory features (i.e., interferents), 33 concur in the modulation of the "odor code" while triggering aroma perception, whose objectification by instrumental methods is challenging. 9,34 However, EVOO volatiles encrypt additional information about relevant functional variables including olive cultivars, the olive tree's harvest region and local pedoclimatic conditions, olive ripeness, technological processes, and storage condition. 1,2,35,36 Figure 1A shows the pseudocolor image of the volatile fraction of a Sicilian EVOO (Sicilia origin, #S1) comprehensively mapped through untargeted and targeted (UT) peak regions (red graphics); identified (targeted) analytes are highlighted by green circles. Patterns of analytes, following a retention logic based on the relative retention exerted by the polar × semipolar column combination adopted, are highlighted in Figure 1B,C.
Compounds derived from oxidative cleavage of linoleic and linolenic acids, promoted by lipoxygenase (LOX) and hydroperoxide lyase (HPL) pathways, constitute the LOX signature (green-color area in Figure 1B and enlarged area in Figure 1C), which is the most abundant fraction in high-quality EVOOs. 1,13 It is characterized by the presence of C6 and C5 compounds, in particular aldehydes, alcohols, ketones, and esters (e.g., hexanal, (E)-2-hexenal, 1-penten-3-ol, 1-hexanol, 1-penten-3-one, hexyl acetate, etc.), fundamental to define positive attributes such as fruity and green. 14,21
Saturated and unsaturated aldehydes (respectively, in brown and orange) are mainly produced by oxidation of unsaturated fatty acids. 37 While C6 and C5 unsaturated aldehydes from LOX are correlated to positive attributes, the others, with a higher molecular weight and low odor threshold (e.g., (E)-2-heptenal, (E)-2-octenal, (E)-2-decenal, heptanal, octanal, and nonanal), are indicated in many studies as responsible for the rancid off-flavor with unpleasant and penetrating notes. 19,37−39
Alcohols (purple line in Figure 1B), represented by 30 congeners here identified, have a strong retention in the 1 D polar column and are well separated from informative carbonyls. Of them, the most relevant are 1-octen-3-ol, 1-nonanol, and 1-decanol because of their decisive role in defining sensory defects eliciting fatty, rancid, earthy, and mushroom-like notes. 24,36
Short-chain fatty acids (black line in Figure 1B) derive from the oxidation of the corresponding aldehydes, 19,37 with propanoic and butanoic acids as the most odor-active, followed by pentanoic and heptanoic acids. Their presence was correlated to the perception of rancid and fusty defects. 1,37,38
Hydrocarbons (in cyan in Figure 1B) have a negligible contribution in the definition of the EVOO flavor, although some unsaturated derivatives (i.e., 3-ethyl-1,5-octadiene and 4,8-dimethyl-1,3,7-nonatriene) were linked to green and fruity notes 13 or to rancid and fishy aroma. 1 Moreover, a series of C10 alkenes, i.e., 3,4-diethyl-1,5-hexadiene (RS or SR), 3,4-diethyl-1,5-hexadiene (meso), (5Z)-3-ethyl-1,5-octadiene, (5E)-3-ethyl-1,5-octadiene, (E,Z)-3,7-decadiene, and (E,E)-3,7-decadiene, whose elution region is highlighted in blue in Figure 1B, are known to be diagnostic markers of early ripening stages of olives (Angerosa, Camera, D'Alessandro, & Mellerio, 1998), while n-octane is an indicator of over-ripening. 3,16,24
Journal of Agricultural and Food Chemistry pubs.acs.org/JAFC Article
Figure 2. Workflow including data processing (i.e., fingerprinting and profiling) and machine learning.
The presence and abundance of terpenes (gray rectangles in Figure 1B) are of particular interest because of their role as indicators of geographical origin 7,14 or of ripening, e.g., α-farnesene. 3 Moreover, they contribute to defining positive attributes, such as wood, lemon, and rose-like odors. 13,40 Lactones are generally detected in low but variable amounts in EVOO, and their relative concentration is cultivar-specific. 36 Esters as well, closely eluting to lactones, contribute to defining fruity notes, with C6 and C5 derivatives deriving from the LOX pathway that dominate the class. 1,21,36
Multivariate Analysis. First, an exploratory unsupervised analysis was carried out applying PCA; data structure was examined to check whether geographical-origin-related intrinsic groupings of olive oil samples were detectable. Then, six two-level PLS regression models (one for each concerned Italian region) were built to obtain the variable importance in projection (VIP) scores and to select the variables that contribute the most to characterize each EVOO belonging to a particular Italian region against the rest of the samples. From these PLS models, the six characteristic volatile patterns, one for each geographical region, were delineated and a similarity analysis of the characteristic pattern of each region was carried out by applying the nearness index. Finally, six one-input class SIMCA classification models were developed and validated. Figure 2 shows the multivariate analysis workflow designed to capture informative and diagnostic patterns capable of correctly classifying/discriminating EVOO production regions.
Exploratory Analysis. PCA was initially performed considering the 591 variables (i.e., peak-region features) per sample (n = 73). After inspection of this first PCA model, three variables (phthalide, (E)-2-hexenal, and toluene) were removed because their loadings were very large in all cases and masked the behavior of the other variables. Finally, a new PCA with 588 variables was developed and all of the successive multivariate analysis steps were carried out with these variables.
The new PCA model was built with 12 principal components, which explained a total variance of 79.73%. Figure 3 displays the score plot on PC1 vs PC2. Some particular grouping trends were observed for the olive oil samples from Sicilia, Lazio, and Umbria. In addition, Garda and Puglia were spread over the bottom and top halves of PC1. Notice that the variance explained by both PCs is approximately 30% of the total variability. This implies that the main source of variability in the data is not related to geographical origin. Nevertheless, it is sufficient to propose classificatory models.
Variable Selection: Characteristic Profile. The variable importance in the projection (VIP) score, which summarizes the overall contribution of each variable to the PLS model, was used as the variable selection strategy to highlight characteristic volatile patterns for each region. The "greater-than-one-rule" was applied for selecting the VIP scores, and only about 12% of the total number of variables (588) were selected as characteristics. In this way, the number of selected variables per region was Garda, 76; Sicilia, 58; Toscana, 71; Lazio, 82; Puglia, 71; and Umbria, 70; accounting for a total of 121 variables. Table 2 shows the numbers of LVs chosen as well as the percentage of variance explained for each model.
Tables S3−S8 list, for each Italian region, the specific variables and include both untargeted and targeted components.
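The VIP-based selection can be illustrated with a short sketch that starts from the outputs of an already fitted single-response PLS model. It assumes the commonly used VIP definition (weights normalized per latent variable and weighted by the response variance each latent variable explains); whether PLS_Toolbox applies exactly this form is not stated in the text. Function names and the toy numbers are hypothetical.

```python
from math import sqrt


def vip_scores(weights, scores, y_loadings):
    """VIP score of each variable for a single-response PLS model.

    weights:    p x A list of rows (one row per variable, one column per LV)
    scores:     n x A list of rows (one row per sample)
    y_loadings: list of A response loadings
    """
    n_vars = len(weights)
    n_lv = len(y_loadings)
    # Response variance captured by each latent variable: q_a^2 * t_a' t_a.
    explained = [
        y_loadings[a] ** 2 * sum(row[a] ** 2 for row in scores) for a in range(n_lv)
    ]
    # Squared norm of each weight vector, used to normalize the weights.
    weight_norm2 = [sum(row[a] ** 2 for row in weights) for a in range(n_lv)]
    total = sum(explained)
    return [
        sqrt(
            n_vars
            * sum(explained[a] * weights[j][a] ** 2 / weight_norm2[a] for a in range(n_lv))
            / total
        )
        for j in range(n_vars)
    ]


def select_characteristic(vip, threshold=1.0):
    """Indices of the variables kept by the greater-than-one rule."""
    return [j for j, value in enumerate(vip) if value > threshold]
```

Under this definition the squared VIP scores average to 1 over all variables, which is why a threshold of 1 separates variables contributing more than average from the rest.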
Similarity Study. The similarity analysis among the region characteristic patterns was carried out by calculating similarity indices. Such indices are defined as a number between 0 and 1, which describes the equivalence of two objects characterized by multivariate data; the value 0 indicates maximum difference and 1 implies maximum similarity. In this study, the nearness index (NEAR) 41 was employed; it can be calculated by eq 1 where x c i and x r i symbolize each element of the considered and reference characteristic profiles, respectively. Note that eq 1 has two terms. The second term is a quotient between the sum of distances between the different elements of the two vectors (global distance) and the value of the sum of these elements. In this way, a normalized global distance between 0 and 1 is calculated. This second term is subtracted from 1 to convert the distance (which measures dissimilarity) into similarity so that 1 represents the total coincidence and 0 represents the null coincidence. Equation 1 could also be reformulated in the matrix notation as reported in eq 2 where, correspondingly, X c i and X R i are the considered and reference characteristic profile vectors, respectively (the superscript T denotes the transposed matrix).
To carry out the similarity study, a new reduced ternary vector consisting of 0, 1, and 2 codes for each region was built from the regional characteristic profiles; results are reported in Table 3. The following rules were applied to establish the aforementioned codes:
• 0: It was assigned to those variables not selected as part of the regional characteristic pattern, e.g., variable 32 corresponding to methyl benzoate was selected only for the Puglia profile, and thus, this variable was codified with the value 0 for the remaining reduced ternary vectors.
• 1: It was assigned to those variables whose VIP scores ranged from 0 to 1, e.g., variable 46 corresponding to ethyl benzoate was selected for Lazio, Puglia, and Umbria profiles.
• 2: It was assigned to those variables whose VIP scores were greater than 1, e.g., variable 8 corresponding to α-copaene had a VIP score greater than 1 for Garda, Sicilia, Toscana, Lazio, and Puglia profiles and a VIP score between 0 and 1 for the Umbria profile. Thus, in the five patterns (Garda, Sicilia, Toscana, Lazio, and Puglia), this variable was codified with value 2, and in Umbria with value 1.
Once the reduced ternary vectors from characteristic patterns were pairwise compared, a similarity matrix was constructed from the found NEAR values, which is shown in Figure 4.
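The pairwise comparison of the coded regional vectors (codes 0, 1, and 2) can be sketched as below. The formula follows the verbal description of eq 1 given above, i.e., one minus the summed element-wise distances divided by the summed elements of both vectors; taking the distance as an absolute difference and returning 1.0 for two all-zero vectors are assumptions made here, and the example codes are invented.

```python
def nearness_index(considered, reference):
    """NEAR similarity between two coded profiles of equal length.

    Assumed form: 1 - sum(|c_i - r_i|) / sum(c_i + r_i), as read from the
    verbal description of eq 1.
    """
    if len(considered) != len(reference):
        raise ValueError("profiles must have the same number of variables")
    global_distance = sum(abs(c - r) for c, r in zip(considered, reference))
    total = sum(c + r for c, r in zip(considered, reference))
    if total == 0:
        return 1.0  # two empty profiles are treated as identical (assumption)
    return 1.0 - global_distance / total


def similarity_matrix(profiles):
    """NEAR value for every ordered pair of regions in a name -> vector mapping."""
    return {
        (a, b): nearness_index(profiles[a], profiles[b])
        for a in profiles
        for b in profiles
    }
```

For two hypothetical coded profiles `[2, 0, 1, 2]` and `[2, 1, 0, 2]`, the global distance is 2 and the summed elements are 10, giving NEAR = 0.8; identical profiles give 1 and profiles with no shared variables give 0.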
As can be seen in the similarity matrix, in all cases, the NEAR value is significantly less than 1, indicating that the volatile profile/pattern between the regions is significantly dissimilar. Therefore, it may be used to classify samples according to geographical origin. In addition, there were five variables out of the 121 selected that were present in all characteristic patterns having a code higher than 1. It was therefore decided to remove them from the classification models as their contribution to the discrimination among regions would not be relevant.
Classification According to Harvest/Production Region Unique Signature. The most conventional way to develop a classification model is based on building a model with two input classes, the target class and the nontarget class, but a valid alternative is performing the same classification method by training with a single input class, i.e., the target class. 42 Working with one input class classification has significant advantages in food authentication: the model is trained using the data from representative samples from genuine foods (target class) and no other samples are required. In fact, some authors have stated that it is advisable to develop models using a one-class classifier in the case of food authentication. Indeed, if a well-known discriminant method such as partial least squares-discriminant analysis (PLS-DA) is used and a new sample does not belong to any of the modeled classes, the discriminant analysis is unable to properly define the belonging of the sample to one particular class. Conversely, a one-class classifier such as SIMCA establishes an acceptance region around the target class, delimiting the target samples from those of other classes. 43 SIMCA involves building a classification method in which each class of the training set is modeled independently and the assignment of an unknown sample as belonging to a specific class is based on the nearest distance to the corresponding regions established in the space of principal components. Six one-input class SIMCA models were built, one for each Italian region. Each individual model was developed using the 116 untargeted/targeted features, which were selected in at least one of the regional characteristic patterns. The aim was to generate overall models suitable for application in routine analysis. Otherwise, when a sample of unknown origin has to be classified, it would be unclear which region's characteristic variables should be selected. With a common set of variables, any of the classification models developed can be applied, and it will be possible to assign a class to the sample. Table 4 shows the numbers of PCs chosen for each model and the samples used in the training and validation steps.
Class boundaries were established for each predefined target class model on the basis of the values of Hotelling's T 2 and residual Q statistics. The classification criteria of the samples regarding each region were defined using a combination of the reduced T 2 and Q statistic values. Thus, for a sample to be considered as belonging to a certain target class, both values must be less than 1.0. Because the number of available samples from each Italian region was limited, each single-class model training was carried out employing all of the samples belonging to the concerned target class. Then, all 73 samples, both those belonging to the target class and those not, were used for validation purposes. All of the samples were correctly classified in five of the six single-class models, and the quality parameters such as sensitivity, specificity, precision, efficiency (accuracy), and area under the receiver operating characteristic curve (AUC) were equal to 1.00. 42 The only model that misclassified one of the samples was the Garda model, in which a Garda sample was considered as non-Garda. Thus, in this model, the sensitivity, specificity, precision, efficiency (accuracy), and AUC were equal to 0.90, 1.00, 1.00, 0.99, and 0.95, respectively. The classification plots of each model are shown in the supplementary material (Supporting Information Figures S3−S8).
Regional Signatures and GC × GC Identitation Potential. Based on the information shown in Table 3, it is possible to derive some conclusions about peculiar chemical traits specific to certain regions. For example, compounds #28 (n-hexane), #109 (1-penten-3-ol), and #386, #475, and #510 (all unidentified) are characteristic of the Garda region. In the same way, compound #141 ((E,E)-2,4-hexadienal) is specific to Sicilia samples, compound #95 (2-ethyl-2-hexenal) to Lazio, and compound #27 (n-octane) to Umbria. Further compounds could be identified as characteristic of more than one region, e.g., compound #245 (unidentified) is associated with Garda and Umbria. Following this assignment methodology, and considering the presence/absence of a few previously selected volatile compounds, a classification tree rule could be deduced to unambiguously classify any EVOO sample from any of the six considered regions. However, a larger set of representative olive oil samples from each of the regions would be needed for such a classification tree to be sufficiently reliable.
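The acceptance rule and the quality parameters can be checked with a few lines of arithmetic. In the sketch below, "reduced" statistics are taken to be values already divided by their critical limits (so that 1.0 is the class boundary), and the AUC of a hard accept/reject decision is assumed to reduce to the mean of sensitivity and specificity; both readings are assumptions, and the function names are hypothetical. The Garda counts follow from the text: 10 Garda samples with one rejected, and 63 non-Garda samples all rejected.

```python
def accepted(t2_reduced, q_reduced, limit=1.0):
    """A sample belongs to the target class only if both reduced statistics are below the limit."""
    return t2_reduced < limit and q_reduced < limit


def classification_metrics(tp, fn, tn, fp):
    """Quality parameters of a one-class model from its confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": tp / (tp + fp),
        "efficiency": (tp + tn) / (tp + fn + tn + fp),
        # Single operating point: AUC taken as the mean of sensitivity and specificity (assumption).
        "auc": (sensitivity + specificity) / 2,
    }


# Garda model: 9 of 10 Garda samples accepted, all 63 non-Garda samples rejected.
garda = classification_metrics(tp=9, fn=1, tn=63, fp=0)
```

Rounded to two decimals, `garda` gives 0.90, 1.00, 1.00, 0.99 (72/73), and 0.95, in line with the values reported for the Garda model.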
The classification strategy proposed in this study is based on using the whole UT fingerprint of volatiles that is established by considering simultaneously all of the compounds that have been selected as characteristic of at least one of the regions concerned. In this way, the one-class SIMCA classification models are applied sequentially to any EVOO sample, regardless of geographic origin, so that the oil is assigned to one of the regions. This overall classification approach based on the use of UT fingerprinting, i.e., identitation, overcomes the main drawback for the routine application of single-step multivariate models.
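The sequential application of the six one-class models can be sketched as a simple loop over regions. The example assumes that, for one sample, the reduced T² and Q values have already been computed against each regional model; the function name and all numerical values are hypothetical and serve only to show the decision logic.

```python
def assign_regions(reduced_stats, limit=1.0):
    """Regions whose one-class model accepts the sample.

    reduced_stats maps a region name to the (reduced T2, reduced Q) pair
    obtained by projecting the sample onto that region's model. An empty
    list means that no model accepts the sample.
    """
    return [
        region
        for region, (t2_reduced, q_reduced) in reduced_stats.items()
        if t2_reduced < limit and q_reduced < limit
    ]


# Invented values for one sample of unknown origin.
sample = {
    "Garda": (1.8, 2.3),
    "Sicilia": (0.4, 0.7),
    "Toscana": (1.1, 0.9),
    "Lazio": (2.5, 3.0),
    "Puglia": (0.8, 1.6),
    "Umbria": (1.9, 2.2),
}
regions = assign_regions(sample)  # only the Sicilia model accepts this sample
```

A sample rejected by every model yields an empty list, which is the behavior that distinguishes one-class modeling from a discriminant method that must always pick one of the trained classes.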
Moreover, the strategy takes full advantage of the high-resolution power of GC × GC that effectively maps all detectable volatile components including (a) those related to major functional variables (e.g., olive cultivar, 13 olive ripening stage, 16 harvest year, and processing technology 12 ), here playing a confounding role in regional classification; and (b) several potent odorants delineating EVOO sensory features. The latter might be masked by coelution phenomena occurring in 1D-GC 18 while resulting in less effective identitation and poorly informative profiling processes.
List of analyzed samples grouped according to the production region, olive cultivars as declared in the label, certifications according to the EU quality schemes and/or conventional/organic production, and production year (Tables ST1−ST8); EVOO's production areas/regions (Figures SF1−SF7) (PDF); Compound name (XLSX)