Tear-Based Vibrational Spectroscopy Applied to Amyotrophic Lateral Sclerosis

Biofluid analysis by optical spectroscopy techniques is attracting considerable interest due to its potential to revolutionize diagnostics and precision medicine, particularly for neurodegenerative diseases. However, the lack of effective biomarkers combined with the unaccomplished identification of convenient biofluids has drastically hampered optical advancements in clinical diagnosis and monitoring of neurodegenerative disorders. Here, we show that vibrational spectroscopy applied to human tears opens a new route, offering a non-invasive, label-free identification of a devastating disease such as amyotrophic lateral sclerosis (ALS). Our proposed approach has been validated using two widespread techniques, namely, Fourier transform infrared (FTIR) and Raman microspectroscopies. In conjunction with multivariate analysis, this vibrational approach made it possible to discriminate between tears from ALS patients and healthy controls (HCs) with high specificity (∼97% and ∼100% for FTIR and Raman spectroscopy, respectively) and sensitivity (∼88% and ∼100% for FTIR and Raman spectroscopy, respectively). Additionally, the investigation of tears allowed us to disclose ALS spectroscopic markers related to protein and lipid alterations, as well as to a reduction of the phenylalanine level, in comparison with HCs. Our findings show that vibrational spectroscopy is a new potential ALS diagnostic approach and indicate that tears are a reliable and non-invasive source of ALS biomarkers.

. Overview of the experimental workflow. Tears are non-invasively collected from ALS positive patients and healthy controls (HCs) and drop-casted onto a BaF 2 window. When allowed drying under controlled conditions, a net flux of solution is driven towards the drop edge by the so-called coffee-stain effect 1 . This spontaneous process yields the peripheral deposition of the colloids and the formation of an external amorphous ring, where proteins tend to segregate 2 . At the centre of the drop, instead, a dendritic crystallisation occurs in fern-like patterns, where proteins, urea and salts accumulate preferentially 2,3 . Consequently, drop-coating is particularly advantageous, because it spontaneously results in analyte partitioning of the fluid. After drop-casting, samples are analysed through FTIR and Raman spectroscopies and the classification carried out by means of dedicated multivariate analyses, in particular partial least square discriminant analysis (PLS-DA), neural networks (NNet), and extreme gradient boosting (xgbTree). Owning to this process flow, we can identify the most significant spectral changes responsible for the discrimination between ALS positive and HC tears. Figure S2. Overview of the FTIR spectra analysis. For a few measured spectra it was necessary to perform a correction for the vapour absorption. In particular, the presence of small positive or negative very sharp peaks observed at specific and well known wavenumbers was taken as an indication of vapour interference. Several peaks in the 1750-1500 cm -1 spectral range -in particular the~1653 cm -1 peak -were considered. The vapour spectrum was subtracted from the sample spectra until a smooth profile around these peaks was observed. After water vapour subtraction, the spectra were corrected for Mie scattering using the algorithm developed by Bassan and colleagues 4 , where the spectral range (originally between 4000-1000 cm -1 ) was extended in our laboratory down to 800 cm -1 . For a better comparison, the corrected spectra were normalized at the Amide I band area and the second derivative analysis was performed (after a 13-point smoothing of the measured S3 spectra) by the Savitzky-Golay method (3rd polynomial, 9 smoothing points), using the GRAMS/32 software (Galactic Ind. Corp., Salem, NH, USA). One hundred randomly selected raw spectra were reported in (a) and after the correction procedures in (b). The average absorption (c) and second derivative (d) spectra obtained from the fern-like morphology of all the analysed tear samples are also reported. Figure S3. Tear morphologies. Optical images of tear samples after deposition on a BaF 2 IR window and water evaporation. The top image displays fern-like morphologies, below the lipid granules. On the right, the average FTIR absorption spectra of fern-like morphologies (F-L) and lipid granules (L.G.) of tears from ALS positive (red spectra) and HC (blue spectra) subjects are shown. For comparison, the absorption spectra of phosphatidylcholine (PC) and of bovine serum albumin (BSA) are reported as, respectively, phospholipid and protein standard. The assignment of the main bands to protein (Amide A, I, II) and lipid (CH x and C=O) moieties is indicated. Figure S4. CH 2 /CH 3 intensity ratio from second derivatives of the FTIR spectra of tears from ALS positive and HCs. F-L, fern-like morphologies; L.G., lipid granules. Before performing the T-test analysis of the differences between the ALS and HC groups, the normality assumption and the variance homogeneity were verified by the Shapiro-Wilk test and the Levene test, respectively. Since the normality assumption and the homogeneity of variance could not be verified in all cases, the non-parametric test Wilcoxon rank sum test was also performed. In all cases, the two populations have different means at the significance level of 0.01. F-L: T-test P-value 1.928e-12; Wilcoxon rank sum P-value 8.338e-15; L.G.: T-test P-value 0.0006072; Wilcoxon rank sum P-value 0.001379.

ALS patients and healthy controls
All the demographic and clinical features were collected, including the Amyotrophic Lateral Sclerosis Functional Rating Scale-revised (ALSFRS-r) total and domain scores, the disease duration expressed as the time from onset to tear collection, the disease progression rate calculated as (48 minus ALSFRS-r total score)/disease duration, the site of onset (limb or bulbar), and the use of invasive ventilation (Table S1).
The following inclusion criteria were applied for patients enrollment: diagnosis of definite/probable ALS (El Escorial Diagnostic Criteria); age 18-90 years.
The following exclusion criteria were applied: severe clinical conditions limiting the compliance to the study; the presence of other severe comorbidities; patients taking drugs (e.g. anticholinergic) that could negatively influence the tear sampling. Compared to the ALS group, healthy subjects were matched by age and gender.

Additional information on the performed multivariate analyses
Multivariate analysis has been performed using R version 3. PLS-DA is a widely used multidimensional linear regression method, which is a variant of the classical partial least square method when the dependent variable is categorical 5 . NNet is a single hidden layer feed-forward neural network 6 implemented in the R package "nnet" version 7.3-13.
Neural network parameter size (number of units in the hidden layer) and decay (regularization parameter to avoid over-fitting) have been tuned using a tuning grid with size varied from 1 to 5 by 1 and decay varied from 0.05 to 0.3 by 0.05. Spectra have been center-scaled (normalized in order to have 0 mean and variance 1) to allow faster convergence of the training algorithm 7 .

S6
The xgbTree belongs to the family of the boosting methods, i.e. classification methods that produce classification models from an ensemble of weak classifiers, classification trees in this case. The extreme adjective refers to the algorithmic tricks implemented in order to increase the computational performances 8,9 . In all training cases, the XgbTree parameters eta (shrinking parameters used to avoid over-fitting) and max-depth (maximum depth of a tree) have been set to 0.1 and 6 respectively.
In order to assess the predictive discrimination and avoid over-fitting, for each method a 10-time artifacts. The best model has been selected using the "one standard error rule". In this case, the model with the best performance value is identified and, using resampling, we can estimate the standard error of performance. The final model used was the simplest model within one standard error of the (empirically) best model 10 . As a performance measure, the area under the curve (auc) was used. The auc is computed from the receiver operating characteristic curve (ROC curve) that is created by plotting the true positive rate (i.e. sensitivity) against the false positive rate (i.e. specificity) at various threshold (value that discriminates between positive and negative outcome) settings. When using normalized units, the auc is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming "positive" ranks higher than "negative") 11 . For all trained models the sensitivity and specificity have been also computed. The sensitivity is computed as true positive / (true positive + false negative), while the specificity as true negative / (true negative + false positive). For the PLS-DA method the variable importance measure here is based on weighted sums of the absolute regression coefficients 10 .
Neural networks, despite reaching a high predicting power, can be unable to find stable relations between explanatory variables and dependent variables 12 . Similarly, in boosting trees a correct S7 interpretation of variable importance may also be an issue 13 . PLS-DA is a linear method which may not reach a predicting power comparable to NNet or xgbTree. However, because of its linear nature and its intrinsic robustness it allows a more straightforward interpretation of variable importance and a greater stability 14 . For this reason, we decided to rely on the PLS-DA analysis to identify the most important spectral components in the discrimination between tears from HCs and ALS patients (Table S2).

INFRARED ABSORPTION OF LIPID GRANULES
After tear deposition on a BaF 2 IR window and excess water evaporation, besides the fern-like morphologies we identified also round granules (Fig. S3) that resulted enriched in lipids, as clearly indicated by their FTIR spectra (Fig. S3). In consideration of the observed typical dimension of lipid granules (several tens of microns), they likely formed as a consequence of water evaporation.
These granules were analyzed separately from the fern-like morphologies. These structures displayed different spectral features in ALS positive and healthy controls, further supporting that lipids deserve to be studied as potential biomarkers of the pathology. As shown for the fern-like structures, we analyzed the second derivative spectra -with the support of PLS-DA, xgbTree and NNet -in different spectral ranges.

Analysis of the 3050-2800 cm -1 spectral range: lipid absorption
As reported in Fig. S5a, the IR second derivative spectra of lipid granules from ALS positive and HC tears between 3050 and 2800 cm -1 , mainly due to the CHx group of the lipid hydrocarbon chains, display different spectral features 15 . In general, the relative lipid content appears significantly more intense in the positive samples compared to healthy controls, and characterized by longer hydrocarbon chains. As indicated in Fig. S5c, the PLS-DA identifyed as relevant for discrimination between the two classes the bands at~2965 cm -1 and~2871 cm -1 , both assigned to CH 3 groups, with a contribution of the CH 2 component at~2850 cm -1 (Table S2). The PLS-DA, xgbTree and NNet reported high accuracy and specificity values, while the sensitivity was lower, indicating a significant number of false negatives (see Fig. S6 for details).

Analysis of the 1800-1500 cm -1 spectral range: protein absorption
In Fig. S5b, we reported the lipid granule second derivative spectra of ALS positive and HC tears, between 1800-1500 cm -1 , where the proteins of these structures mainly absorb [16][17][18] . The spectra are characterized by a few well resolved bands of different intensity in the two classes. These S10 components, almost all relevant to discriminate between ALS positive and negative controls, as indicated by PLS-DA analysis (Fig. S5d, Table S2), are:~1657 cm -1 , mainly assigned to α-helix and random coil structures, of higher intensity in negative controls; between~1643-1631 cm -1 and at~1694 cm -1 , mainly due to β-sheet structures [16][17][18] , both of higher intensity in ALS positive samples, and~1738 cm -1 -mainly assigned to the ester C=O groups 15 -that appears of higher intensity in ALS positive samples. Interestingly, the PLS-DA classification performance (Fig. S6) is good, with accuracy 0.88-0.89, sensitivity 0.71, and specificity 0.92. As reported in Fig. S6, the classification performances of xgbTree and NNet methods were similar to those obtained by PLS-DA.

Analysis of the 1500-1200 cm -1 spectral range: lipid absorption
The PLS-DA analysis of the spectral range between 1500-1200 cm -1 (Fig. S7a,c and Table S2) indicated as the most important component for the discrimination between ALS positive and HC tears the band at~1470 cm -1 (Fig. S7c), which is mainly due to the hydrocarbon chain CH 2 and CH 3 groups 15,19 . As it can be seen, this component displays a significant higher intensity in ALS positive tears compared to HCs (Fig. S7a), in agreement with the results obtained between 3050-2800 cm -1 . Noteworthy, the classification performance of this spectral range is high, as reported in Fig. S6. Figure S7. FTIR analysis of lipid granules. Average second derivative spectra of ALS positive samples and healthy controls in the 1500-1200 cm -1 (a) and 1200-900 cm -1 (b) spectral ranges. Wavenumber importance (domain 0-100) for PLS-DA in the 1500-1200 cm -1 (c) and 1200-900 cm -1 (d) ranges.

Analysis of the 1200-900 cm -1 spectral range
As discussed for the fern-like structures, the spectral range 1200-900 cm -1 is very complex, being dominated by the overlapping contributions of different vibrations ascribable mainly to carbohydrates and phosphates.
In particular, the PLS-DA (Fig. S7d) reported as the most significant spectral differences between ALS positive and HC tears -with a satisfactory classification performance (Fig. S6)

Experimental details
The investigated spectral range was 900-1800 cm -1 and was obtained by collating neighboring spectral windows, with a 60 x 5 seconds time of measure each. For every sample, different spectra were measured at various locations to account for concentration gradients and inhomogeneities caused by proteins and lipid segregation during water evaporation. At visible excitations, fluorescence is typically reduced in dried tears compared to other biological environments, i.e. tissues and liquids, however, Raman spectra with an excessive luminescence-related background were discarded from the successive analysis to avoid invasive numerical treatment of the raw data and consequent spurious results.
During the measurements, an automated algorithm was applied to remove cosmic ray spikes using the native software of the Raman spectrometer. Baseline correction and area-under-the-curve normalization were performed using the Bioinformatics Toolbox™ of Matlab and functions like msbackadj. Comparable results were obtained by normalizing the spectra for both the Amide I and phenylalanine bands.

Assignment of Raman bands
The peaks observed in Figure 4a of the main text at 1001 cm -1 , 1010 cm -1 and minor structures in the 1030-1040 cm -1 range can be assigned to aromatic amino acids, predominantly phenylalanine 23 .
Moreover, the presence of tryptophan is explicitly pointed out by weaker peaks at 960 and 981 cm -1 23 . The spectral features around 1160 cm -1 are due to C-C stretching in fatty acids 24 . The presence of methylene groups of lipids is evidenced by a twisting mode at 1298 cm -1 and a skeletal stretching at 1127 cm -1 , although stretching of the C-N bond in peptide groups can also contribute to the latter.
It should be noted that C=C stretching in cholesterol esters and fatty acids can contribute to the Amide I band at about 1660 cm -1 24 . The band at~1250 cm -1 is assigned to the Amide III, i.e. N-H bending vibrations of the peptide groups, while the broad but intense feature in the 1320 -1340 cm -1 range is due to stretching vibrations of the aliphatic side-chains 23 .
C-H vibrations in proteins and lipids, like the methylene scissoring motion, contribute to the wide band observed at about 1450 cm -1 23,24 , whereas the vibrational mode at about 1550 cm -1 arises from ring vibrations of tryptophan 23 . Finally, the weak feature at~1770 cm -1 is ascribed to the C=O stretching of lipids 24,25 . Figure S8. Raman spectrum of the bare BaF 2 substrate. The characteristic Raman mode of BaF 2 occurs as a sharp peak at 243 cm -1 . Consequently, the substrate does not interfere with the investigation of proteins and lipids composing the biofluid since their atomic mass and bond stiffness result in higher energy phonons. Figure S9. Ratio of Amide I/Phe peak area from Raman spectra of ALS and HC tears. The statistical analysis of the differences between the ALS and HC groups was performed as described in Figure S4. The two populations have different means at the significance level of 0.01: T-test P-value 7.995e-07; Wilcoxon rank sum P-value 4.12e-06.  Table S2. Assignment of the relevant IR components. In the Table, the peak positions from second derivative spectra have been reported for the spectral components identified by PLS-DA as relevant for the discrimination between tears from ALS patients and HCs . In addition, the main assignment to the tear biomolecules has been indicated.