Surface Enhanced Raman Spectroscopy for Quantitative Analysis: Results of a Large-Scale European Multi-Instrument Interlaboratory Study

Surface-enhanced Raman scattering (SERS) is a powerful and sensitive technique for the detection of fingerprint signals of molecules and for the investigation of a series of surface chemical reactions. Many studies have introduced quantitative applications of SERS in various fields, and several SERS methods have been implemented for each specific application, differing in performance characteristics, analytes, instruments, and analytical matrices. In general, very few methods have been validated according to international guidelines. As a consequence, the application of SERS in highly regulated environments is still considered risky, and the perception of a poorly reproducible and insufficiently robust analytical technique has persistently delayed its routine implementation. Collaborative trials are a type of interlaboratory study (ILS) frequently performed to ascertain the quality of a single analytical method. The idea of an ILS of quantification with SERS arose within the framework of Working Group 1 (WG1) of the EU COST Action BM1401 Raman4Clinics in an effort to overcome the problematic perception of quantitative SERS methods. Here, we report the first interlaboratory SERS study ever conducted, involving 15 laboratories and 44 researchers. In this study, we tried to define a methodology to assess the reproducibility and trueness of a quantitative SERS method and to compare different methods. In our opinion, this is a first important step toward a “standardization” process of SERS protocols, proposed not by a single laboratory but by a larger community.


Overview
IMPORTANT. Please read carefully ALL the procedures before starting the experimental activities.
Each Participant will receive a WG1-RR kit (see Section 2) containing:
- paracetamol powder, required for setup characterization;
- the necessary materials to prepare the analyte samples (CALIBRATION samples and TEST samples);
- the SERS substrates (solid substrates or metal colloids) necessary to perform the measurements.
Six possible SERS METHODS have been selected for this Round Robin test, each method being defined by a substrate and an excitation wavelength: sAg@514/532, cAg@514/532, sAg@785, cAg@785, sAu@785, cAu@785.
Each Participant will prepare the samples (see Section 3) and perform the measurements (see Section 4) according to the protocols defined in this Standard Operating Procedure (SOP), for each method assigned to her/his Lab (2 or 3 methods for each Participant).
Samples will be prepared by serial dilution with PBS from stock solutions (see Section 3).
Measurements will be collected according to the following experimental design (see Section 4 for further details). For each method, the following samples will be made available:
1. a CALIBRATION set (10 samples C0, C1, … C9), to build the regression model, and
2. a TEST set (5 samples X1, X2, … X5), to validate the model.
Wherever possible, specific experimental parameters (e.g. maximum laser power density, optics for illumination and collection) will be provided (see Section 4).
Once the data have been collected, they will be submitted to a centralized data analysis (by the RR coordinator) to get a list of figures of merit (FOMs) characterizing the analytical performance for each calibration. Once the analysis is done, details on how the analysis has been performed will be made available to all Participants.
- 1 Eppendorf tube (labelled as X) with adenine, to be used for validation.
IMPORTANT: upon receiving the kit, these tubes should be stored in the refrigerator at 4 °C until used.
- 1 small plastic Petri dish (labelled as PBS) with a phosphate buffered saline (PBS) tablet.
And, depending on the type of measurements to be performed by the Participant, the SERS substrates (solid substrates or metal colloids).

3. Samples preparation

Materials/reagents needed
Before starting, make sure you have the following materials and reagents (not included in the kit):
- common plastic and glass tubes/vessels;
- adequate micropipettes, recently purchased or calibrated: to minimize experimental errors, volume measurements should preferably be performed with these.

SERS measurements on COLLOIDAL substrates

SUMMARY. For each sample (CALIBRATION or TEST), 1 spectrum must be acquired with each of the 3 different COLLOIDAL substrate batches (#1, #2 and #3) provided, for a total of 3 spectra/sample (see Figure 4). This means each Participant is expected to collect, for each method using a COLLOIDAL substrate, a total of 3 × 10 = 30 spectra for the CALIBRATION set, plus a total of 3 × 5 = 15 spectra for the TEST set, for a grand total of 45 spectra. Before starting data acquisition, it is strongly advised that the Participant checks the colloidal substrates by collecting a UV-vis extinction spectrum and comparing it with that of the freshly prepared colloids included in the kit.

UV-vis check of the COLLOIDAL substrates
To collect a UV-vis extinction spectrum using a quartz cuvette with a 10 mm path length, dilute the colloid with milliQ water at a 1:4 ratio (i.e. 1 part colloid, 4 parts water). If 2 mm path length cuvettes are used, use undiluted colloids. If the extinction spectrum presents significant differences in band maximum position or overall shape with respect to the spectrum of fresh colloids, contact the Coordinator.

PREPARE A SLIDE.
Take a glass microscope slide, cover it first with aluminium foil (to avoid fluorescence from the glass) and then with Parafilm (to keep the surface hydrophobic and ensure the formation of a drop with a certain height; Figure 5.1 and 5.2). ALTERNATIVE: instead of these glass-aluminium-Parafilm slides, UV-quality CaF2 microscope slides can also be used.
4.1.3. ADD SAMPLES TO SUBSTRATES.
Using a P100, add 25 µL of a sample to 25 µL of a COLLOIDAL substrate in a 1.5 mL Eppendorf tube and rapidly mix (a few seconds); then immediately transfer the whole volume of the mixture (i.e. a 50 µL drop, P100) onto a slide under the microscope objective for data collection, ensuring that the laser power density at the sample does not exceed the values indicated below (Table 4). Collect the spectrum in the region 400-2000 cm⁻¹, adjusting collection parameters to maximize the signal-to-noise ratio.
The first sample to be analysed should be C9, which is expected to yield the most intense signal; take care to adjust the laser power and the exposure time to avoid saturation of the detector while maximizing the intensity of the most intense band of adenine. The number of accumulated scans should be enough to ensure an excellent signal-to-noise ratio. Examples of spectra from C1 and C9 samples obtained with different methods are shown on page 16 (Figure 9). Spectra should be saved with filenames according to the schemes detailed in Section 5.
IMPORTANT: after collecting the spectrum for the C9 sample, data collection should be randomized, i.e. samples should NOT be measured in order of increasing or decreasing concentration.

SERS measurements on SOLID substrates
SUMMARY. For each sample (CALIBRATION or TEST), 3 spectra from different spots will be acquired from each of 3 different, independent substrates, for a total of 9 spectra/sample (see Figure 6). This means each Participant is expected to collect, for each method using a SOLID substrate, a total of 9 × 10 = 90 spectra for the CALIBRATION set, plus a total of 9 × 5 = 45 spectra for the TEST set, for a grand total of 135 spectra.

ADD SAMPLES TO SUBSTRATES.
Deposit a drop of each sample onto each SOLID substrate (aiming at the centre of the substrate) using a P100, taking care not to touch the substrate surface with the pipette tip (Figure 7.1).

RINSE SUBSTRATES.
Wait for the deposited sample drops to dry at room temperature (up to a maximum of approximately 1.5 h, see Figure 8) and, holding the substrates with a pair of tweezers, rinse them by rapidly dipping them into milliQ water 3 times (Figure 7.2); take extra care when handling the substrates with the tweezers. Let the remaining water on the substrate dry (ca. 30-45 min) before proceeding.

ACQUIRE DATA.
To collect a spectrum, focus the laser beam onto the substrate using a 10x or 20x objective (Figure 7.3), ensuring that the laser power density at the sample does not exceed the values indicated below (Table 5). Collect the spectrum in the region 400-2000 cm⁻¹, adjusting collection parameters to maximize the signal-to-noise ratio. The first sample to be analysed should be C9, which is expected to yield the most intense signal; take care to adjust the laser power and the exposure time to avoid saturation of the detector while maximizing the intensity of the most intense band of adenine. The number of accumulated scans should be enough to ensure an excellent signal-to-noise ratio. For each substrate, 3 spectra should be collected from different spots. Examples of spectra from C1 and C9 samples obtained with different methods are shown on page 16 (Figure 9). Spectra should be saved with filenames according to the schemes detailed in Section 5.
OPTIONAL: once the 3 spectra from random spots on a substrate are collected according to the protocol above, Participants are welcome to acquire small maps as additional data.
IMPORTANT: after collecting the spectrum for the C9 sample, data collection should be randomized, i.e. samples should NOT be measured in order of increasing or decreasing concentration.
NOTE: the spectra on page 16 are all from Ag substrates; SERS spectra of adenine on Au substrates are slightly different from those on Ag, so do not worry if you "get more bands" than in the spectra shown in the SOP.

Characterization of the experimental setup
After the collection of the spectra from samples, acquire, for each method used for sample analysis, a normal Raman spectrum of the REF powder in the region 400-2000 cm⁻¹, using the same optics and laser power used for the SERS measurements (as in 4.1-4.2), but increasing the exposure time or the number of accumulated scans by a factor of 10. This spectrum will be used to characterize each setup in terms of wavenumber calibration and sensitivity.

We recommend inspecting the data (e.g. visual inspection of spectra) after each stage of data pre-processing and analysis, to spot any adverse effects on the data sets. All decisions should be based on a priori spectroscopic knowledge of the system. Although such a strategy may not be the optimal one for any single data set, it leads to more robust model performance across data sets.
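As an optional check of the wavenumber calibration, the band positions in the REF (paracetamol) spectrum can be compared with tabulated Raman-shift standards. The sketch below uses a subset of the ASTM E1840 acetaminophen shifts; treat the exact band list and matching tolerance as illustrative choices, not part of the SOP:

```python
import numpy as np
from scipy.signal import find_peaks

# Subset of tabulated paracetamol (acetaminophen) Raman shifts, cm-1 (ASTM E1840)
REF_SHIFTS = np.array([651.6, 857.9, 1168.5, 1236.8, 1323.9, 1648.4])

def calibration_offsets(wavenumber, intensity, tol=10.0):
    """Observed-minus-reference offset for each tabulated band.

    Each band is matched to the nearest detected peak within `tol` cm-1;
    unmatched bands give NaN."""
    peaks, _ = find_peaks(intensity, prominence=0.05 * intensity.max())
    peak_pos = wavenumber[peaks]
    offsets = []
    for ref in REF_SHIFTS:
        d = peak_pos - ref
        j = np.argmin(np.abs(d))
        offsets.append(d[j] if abs(d[j]) <= tol else np.nan)
    return np.array(offsets)
```

A mean offset larger than the spectrometer's nominal accuracy (e.g. 1-2 cm⁻¹) suggests the wavenumber axis should be recalibrated before the spectra are pooled.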

Remarks on software to be used
This is a software-independent protocol, i.e. it describes the various operations to be performed on the data in such a way that they can be carried out using different software and applications. Various data analysis programs and packages exist, ranging from general-purpose tools to those targeting specific data analysis tasks. At the University of Trieste, where the centralized data analysis for the ILS was performed, the analysis was carried out using the R statistical computing environment (http://www.r-project.org/), version 3.5.1. Throughout the protocol, metrics, parameters, and approaches that may differ in other software are explained wherever possible.

DATA IMPORT/LOADING
Timing: depends on the import method and on the software used.
1.1. Import data of a single dataset (i.e. one method from one lab) into the selected software, making sure all the metadata available in the filenames (i.e. labcode, substrate, laser, method, sample, type, concentration, batch, replica) is attached to each spectrum.
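For illustration, attaching the filename metadata to each spectrum might look like the sketch below. The filename pattern shown here is hypothetical (the actual naming scheme is the one detailed in Section 5):

```python
import re
from pathlib import Path

# Hypothetical filename scheme (the actual scheme is defined in Section 5):
#   <labcode>_<substrate><laser>_<type><concindex>_b<batch>_r<replica>.txt
#   e.g. LAB03_cAg785_C4_b1_r2.txt
PATTERN = re.compile(
    r"(?P<labcode>LAB\d+)_(?P<substrate>[cs](?:Ag|Au))(?P<laser>\d+)_"
    r"(?P<type>[CX])(?P<conc>\d+)_b(?P<batch>\d+)_r(?P<replica>\d+)\.txt"
)

def parse_metadata(path):
    """Extract the per-spectrum metadata encoded in the filename."""
    m = PATTERN.fullmatch(Path(path).name)
    if m is None:
        raise ValueError(f"filename does not follow the naming scheme: {path}")
    meta = m.groupdict()
    meta["method"] = meta["substrate"] + "@" + meta["laser"]  # e.g. cAg@785
    return meta
```

Failing loudly on a non-matching filename (rather than skipping the file) makes missing or mislabelled spectra visible at import time.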

1.2.
Carefully inspect the data, checking for anomalies, poor-quality spectra (e.g. missing data) and artefacts (e.g. cosmic rays).

DATA PRE-PROCESSING
Timing: 5-10 min (depending on the size/kind of the data set, e.g. colloids/solid substrates).

The main goal of this step is to improve the accuracy of the study by minimizing variation within the data that does not pertain to the analytical information.
[SMOOTHING, DOWNSAMPLING AND SELECTION OF SPECTRAL RANGE FOR ANALYSIS]

2.1.
Smooth, down-sample and restrict the spectra to the ~400 to ~1650 cm⁻¹ wavenumber range by applying a LOESS function to obtain a uniform data step of 3 cm⁻¹.
NOTE: interpolation is obtained by locally fitting by weighted least squares; that is, for the fit at point x, the fit is made using points in a neighborhood of x, weighted by their distance from x (with differences in 'parametric' variables being ignored when computing the distance). The size of the neighborhood is controlled by α = 0.75: the neighborhood includes a proportion α of the points, and these have tri-cubic weighting (proportional to (1 − (dist/maxdist)³)³).

2.2.3. Use this baseline-subtracted mean C0 spectrum as the customized reference for the EMSC algorithm. (This is important to ensure that zero signals will be truly zero in the calibration.)
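Step 2.1 can be carried out with any LOESS/LOWESS routine (e.g. loess in R, where the centralized analysis was performed). As a software-independent illustration, here is a minimal numpy sketch of a local straight-line fit with tri-cubic weights (the iterative robustness passes of full LOESS implementations are omitted):

```python
import numpy as np

def tricube(u):
    """Tri-cubic weight (1 - |u|^3)^3 for |u| <= 1."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - u**3) ** 3

def loess_resample(x, y, xnew, alpha=0.75):
    """Locally weighted straight-line fit evaluated at each point of xnew.

    For each target point, the neighborhood contains a fraction `alpha`
    of the data points, with tri-cubic distance weighting."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k = max(2, int(np.ceil(alpha * len(x))))      # points per neighborhood
    out = np.empty(len(xnew), float)
    for i, x0 in enumerate(np.asarray(xnew, float)):
        d = np.abs(x - x0)
        idx = np.argsort(d)[:k]                   # k nearest data points
        w = tricube(d[idx] / d[idx].max())
        # weighted least-squares line, evaluated at x0
        slope, intercept = np.polyfit(x[idx], y[idx], 1, w=np.sqrt(w))
        out[i] = slope * x0 + intercept
    return out

# Uniform 3 cm-1 grid over the ~400-1650 cm-1 analysis range
grid = np.arange(400.0, 1650.0 + 3.0, 3.0)
```

Evaluating the fit directly on the new grid performs smoothing, down-sampling and range restriction in a single pass, as required by the protocol.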

2.2.4.
Perform spectral correction on all spectra (C0-C9) using the EMSC procedure with polynomials up to the second degree.
[Checkpoint] A visual check should be performed to ensure that the corrected C0 spectra are flat (Figure 12).
NOTE: an R package (EMSC) and a MATLAB graphical user interface for EMSC with several extensions are freely available (https://nofimamodeling.org/). Other implementations of EMSC are also available through Quasar (free, https://quasar.codes) or several commercial software packages.

[PEAK AREA CALCULATION]

3.1. Integrate the area of the SERS intensity between 715 and 750 cm⁻¹ for the specific ring-breathing Raman mode of adenine (Figure 13).
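For illustration, the core of the EMSC correction of step 2.2.4 (second-degree polynomial baseline) fits a few lines of numpy; the dedicated implementations referenced above provide many extensions. This sketch assumes all spectra share the wavenumber axis of the reference:

```python
import numpy as np

def emsc(spectra, reference):
    """Extended Multiplicative Signal Correction with polynomials
    up to the second degree.

    spectra   : (n_spectra, n_points) array
    reference : (n_points,) reference spectrum (here, the
                baseline-subtracted mean C0 spectrum)"""
    x = np.linspace(-1.0, 1.0, reference.size)
    # Design matrix: reference plus constant, linear and quadratic terms
    D = np.column_stack([reference, np.ones_like(x), x, x**2])
    coefs, *_ = np.linalg.lstsq(D, spectra.T, rcond=None)   # (4, n_spectra)
    b = coefs[0]                                            # multiplicative part
    baseline = D[:, 1:] @ coefs[1:]                         # additive part
    corrected = (spectra - baseline.T) / b[:, None]
    return corrected, coefs
```

After correction, a C0 (blank) spectrum fitted mostly by the polynomial terms comes out flat, which is exactly what the checkpoint above asks to verify.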

3.2. Calculate the average area (A) for each concentration (c).
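Steps 3.1-3.2 might be sketched as follows (trapezoidal integration of the 715-750 cm⁻¹ band is assumed here; the SOP does not prescribe a specific quadrature rule):

```python
import numpy as np

def band_area(wavenumber, intensity, lo=715.0, hi=750.0):
    """Trapezoidal integral of the SERS intensity over the adenine
    ring-breathing band."""
    sel = (wavenumber >= lo) & (wavenumber <= hi)
    y, x = intensity[sel], wavenumber[sel]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def average_areas(spectra, wavenumber, concentrations):
    """Mean band area A for each distinct concentration c.

    spectra        : (n_spectra, n_points) array
    concentrations : (n_spectra,) array, one value per spectrum"""
    areas = np.array([band_area(wavenumber, s) for s in spectra])
    levels = np.unique(concentrations)
    return levels, np.array([areas[concentrations == c].mean() for c in levels])
```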
[DATA FRAME CREATION]

3.3.
Create two separate data frames (one for the calibration, or train, set and one for the validation, or test, set of the model), with the average areas (A) and the adenine concentrations (C0-9 and X1-5, respectively) (Figure 14).

3.4. Plot the calibration data (Figure 15). The convention is to plot the instrument response data (A) on the y-axis and the values for the standards (c) on the x-axis.
[Checkpoint] Inspect the plot for possible extreme values of both c (high-leverage points) and A (outliers).
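With pandas, the data-frame creation and the calibration plot could look like this (column and file names are illustrative):

```python
import pandas as pd

def make_frames(cal_conc, cal_areas, test_labels, test_areas):
    """Build the calibration (Train) and validation (Test) data frames."""
    train = pd.DataFrame({"c": cal_conc, "A": cal_areas})
    test = pd.DataFrame({"sample": test_labels, "A": test_areas})
    return train, test

def plot_calibration(train, path="calibration.png"):
    """Scatter plot with the response A on the y-axis and the standard
    concentrations c on the x-axis, as per the convention above."""
    import matplotlib.pyplot as plt  # imported lazily; plotting is optional
    ax = train.plot.scatter(x="c", y="A")
    ax.set_xlabel("adenine concentration c")
    ax.set_ylabel("mean band area A")
    ax.figure.savefig(path)
    plt.close(ax.figure)
```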

Figure 4: Two examples of concentration vs. response plots. The plot on the left (A) is an example of a linear correlation.
[REGRESSION ANALYSIS]
3.5. Generate a fitted model of the form A = β₀ + β₁c + ε, finding the values of the parameters β₀ and β₁ that best fit the data in the Train set by least squares. Least-squares linear regression can be carried out using most common statistical software. For manual calculation see, for instance, (Ellison et al., 2009).
[Checkpoint] Ensure that the x and y data have been correctly assigned! Regression of c on A is not the same as the regression of A on c (except in the highly improbable case where all the points lie exactly on a straight line).
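For illustration, step 3.5 with scipy (note that, per the checkpoint above, A is the response y and c the regressor x):

```python
import numpy as np
from scipy import stats

def fit_calibration(c, A):
    """Ordinary least-squares straight line A = b0 + b1*c on the Train set."""
    res = stats.linregress(c, A)   # x = concentration, y = response
    return res.intercept, res.slope, res.rvalue

# Made-up example: a perfect line A = 1 + 2c
c = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
A = 1.0 + 2.0 * c
b0, b1, r = fit_calibration(c, A)   # b0 = 1.0, b1 = 2.0, r = 1.0
```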

3.6.
[OPTIONAL] Calculate the uncertainty associated with the calibration line, and plot the confidence and prediction bands (Figure 16A). It is also useful to obtain a plot of the residuals (Figure 16B).
NOTE: uncertainty in a linear regression relationship can be expressed by a 95% confidence interval (green band) and a 95% prediction interval (red band). The confidence interval is a range of values that is likely to include the true regression line (slope and intercept) at a given confidence level (usually 95%). The prediction interval is a range of values that will contain a single future response value at a given confidence level; it is always wider than the confidence interval because it accounts both for the spread in the data and for the uncertainty in the model parameters. Confidence intervals are accurate when the sampling distribution of the estimator is close to normal, which usually occurs in sufficiently large samples; the prediction interval, instead, is accurate only when the errors themselves are close to normal, a condition that is not affected by sample size.
NOTE: when the regression assumptions are met, the residual plot should have zero mean, constant spread and no global trends: the residuals should be scattered approximately randomly around zero, with no trend in the spread of residuals with concentration.
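The bands follow the standard OLS half-width formulas t·s·√(1/n + (c₀ − c̄)²/Sxx) for the confidence interval and t·s·√(1 + 1/n + (c₀ − c̄)²/Sxx) for the prediction interval; a sketch:

```python
import numpy as np
from scipy import stats

def calibration_bands(c, A, c_grid, alpha=0.05):
    """Fitted line with confidence and prediction bands.

    Returns (fit, ci, pi): the fitted response on c_grid and the
    half-widths of the (1 - alpha) confidence and prediction intervals."""
    c, A = np.asarray(c, float), np.asarray(A, float)
    n = len(c)
    b1, b0 = np.polyfit(c, A, 1)
    resid = A - (b0 + b1 * c)
    s = np.sqrt(np.sum(resid**2) / (n - 2))      # residual standard deviation
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    Sxx = np.sum((c - c.mean()) ** 2)
    d = (c_grid - c.mean()) ** 2 / Sxx
    fit = b0 + b1 * c_grid
    ci = t * s * np.sqrt(1.0 / n + d)            # confidence half-width
    pi = t * s * np.sqrt(1.0 + 1.0 / n + d)      # prediction half-width
    return fit, ci, pi
```

Plotting fit ± ci and fit ± pi over a dense c_grid reproduces the green and red bands described in the note above.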
NOTE: assuming that the relationship between A and c is reasonably linear, i.e. well represented by a straight line, there are two obvious ways to fit a linear calibration line for predicting c from A. The so-called "classical" calibration approach fits a regression line of A on c, while the "inverse" regression treats c as the response and A as the regressor, fitting a regression line of c on A. Both regressions, classical or inverse calibration, are carried out by least-squares minimization. Even if the "classical" calibration yields the best linear unbiased estimate of the calibration curve A(c), it has been shown that predictions with inverse calibration are more precise than those with the classical calibration (Centner et al., 1998).

[OPTIONAL] Calculating and Interpreting Regression Statistics
The (Pearson product-moment) correlation coefficient r can be obtained from software or calculated using the equation
r = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) / √[Σᵢ(xᵢ − x̄)² · Σᵢ(yᵢ − ȳ)²].
The value of r will lie in the range ±1. The closer |r| is to 1, the stronger the correlation between the variables. r can be usefully interpreted as indicative of good linearity only when the data are reasonably evenly distributed along the x-axis, with no serious anomalies.
[Checkpoint] Correlation coefficients are simple to calculate but easily misinterpreted. The calibration curve must always be plotted and inspected by eye; otherwise, a straight-line relationship might wrongly be deduced from the calculation of r alone. An r value close to 0 does not necessarily mean that there is no relationship: a useful non-linear relationship would not necessarily lead to a high linear correlation coefficient.

For a simple linear regression, the coefficient of determination r² is the square of the correlation coefficient r between the observed outcomes and the observed predictor values. The coefficient of determination is a statistical measure of how well the regression predictions approximate the real data points; an r² of 1 indicates that the regression predictions perfectly fit the data.
NOTE: the F-test of overall significance is the hypothesis test for this relationship. In a calibration experiment, r² should be significantly different from zero, as it is essential that the y and x values are highly correlated. The p-value should therefore be very small (far less than the usual 0.05) and the F value should be very much greater than the critical value if the calibration is to be useful.

The LOD can be estimated from the calibration data as
LOD = (t(α, ν) · s(y,x) / b) · √[ 1/m + 1/(p·q) + x̄² / Σᵢ(xᵢ − x̄)² ]
where
s(y,x) is the standard deviation of the residuals;
b is the slope of the calibration curve;
xᵢ is the value of the analyte concentration at calibration level i (e.g. for C1, C2, … C9), with the sum running over the calibration measurements;
x̄ is the mean analyte concentration computed over all calibration levels;
t(α, ν) is the value from the t-distribution for probability level α (one-sided test) and ν = (p·q) − 2 degrees of freedom, with p being the number of calibration levels and q the number of replicates at each calibration level;
m is the number of replicate analyses.

NOTE:
The limit of detection (LOD) and the limit of quantification (LOQ) are key parameters characterizing the performance of the whole test method. The selection of the procedure for estimating LOD and LOQ primarily depends on legal requirements, as well as on the availability of blank samples, the presence of "noise" in the spectra and its applicability for calculations, or the practicability of calibration experiments. The statistical-mathematical approach presented here is based on elements taken mainly from DIN 32645:2008-11 and ISO 11843-2:2000, under a set of assumptions/conditions on the calibration data.

NOTE: if, after plotting the data and examining the regression statistics, the calibration data are judged to be satisfactory, the estimated intercept α and slope β of the calibration line can "directly" be used to estimate the concentration of the analyte in the Test set.
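Finally, estimating the Test-set concentrations amounts to inverting the calibration line, c = (A − α)/β. A minimal sketch (uncertainty propagation per the standards above is not included):

```python
import numpy as np

def predict_concentration(A_test, intercept, slope):
    """Estimate analyte concentrations in the Test set by inverting
    the calibration line A = intercept + slope * c."""
    A_test = np.asarray(A_test, float)
    if slope == 0:
        raise ValueError("zero slope: calibration line cannot be inverted")
    return (A_test - intercept) / slope
```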