Rapid Detection of SARS-CoV-2 Variants Using an Angiotensin-Converting Enzyme 2-Based Surface-Enhanced Raman Spectroscopy Sensor Enhanced by CoVari Deep Learning Algorithms

An integrated approach that combines surface-enhanced Raman spectroscopy (SERS) with a specialized deep learning algorithm is developed to rapidly and accurately detect and quantify SARS-CoV-2 variants, based on an angiotensin-converting enzyme 2 (ACE2)-functionalized AgNR@SiO2 array SERS sensor. SERS spectra of different variants at various concentrations were collected using a portable Raman system. After appropriate spectral preprocessing, a deep learning algorithm, CoVari, is developed to predict both the viral variant species and their concentrations. Using a 10-fold cross-validation strategy, the model achieves an average accuracy of 99.9% in discriminating between the virus variants and R² values larger than 0.98 for quantifying the concentrations of the three viruses, demonstrating the high quality of the detection. The limits of detection of the ACE2 SERS sensor are determined to be 10.472, 11.882, and 21.591 PFU/mL for SARS-CoV-2, SARS-CoV-2 B1, and CoV-NL63, respectively. The feature importance for virus classification and concentration regression in the CoVari algorithm is calculated with a permutation algorithm and shows a clear correlation with the biochemical origins of the spectra or spectral changes. In an unknown-specimen test, classification accuracy exceeds 90% for concentrations larger than 781 PFU/mL, and the predicted concentrations consistently agree with the actual values, highlighting the robustness of the proposed algorithm. Owing to the CoVari architecture and its output vector, the algorithm can be generalized to predict viral variant species and concentrations simultaneously for a broader range of viruses. These results demonstrate that the SERS + CoVari strategy has the potential for rapid, quantitative detection of virus variants and for point-of-care diagnostic platforms.
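The 10-fold cross-validation protocol mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the CoVari network is replaced by a hypothetical nearest-centroid classifier (`nearest_centroid_fit` / `nearest_centroid_predict`), the folds are unstratified random splits, and all function names are illustrative. Each spectrum lands in exactly one test fold, and the reported accuracy is the mean over the ten held-out folds.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices once and split them into k disjoint test folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    splits = []
    for i in range(k):
        # Training set = every fold except the i-th one
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train_idx, folds[i]))
    return splits

def nearest_centroid_fit(X, y):
    """Hypothetical stand-in for CoVari: store one mean spectrum per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(model, X):
    """Assign each spectrum to the class whose mean spectrum is closest (Euclidean)."""
    classes, centroids = model
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

def cross_validated_accuracy(X, y, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the k accuracies."""
    accs = []
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        model = nearest_centroid_fit(X[train_idx], y[train_idx])
        pred = nearest_centroid_predict(model, X[test_idx])
        accs.append(float(np.mean(pred == y[test_idx])))
    return float(np.mean(accs)), accs
```

In practice any classifier with a fit/predict pair can be substituted for the nearest-centroid stand-in; a stratified split would additionally keep the class proportions equal across folds.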


Section S2. Additional information for the coronavirus detection.

Table. The number of SERS spectra collected from references and viruses of different concentrations.
To understand the basis for the accurate classification and quantification results, it is useful to extract the discriminating SERS spectral features that the CoVari deep learning algorithm uses as decisive factors. Since SERS peaks correspond to the vibrational modes of different molecular compounds, the discriminating spectral features may carry important biological information. Because CoVari relies on spectral features to classify and quantify the coronaviruses, any change in the important SERS peaks will strongly affect model performance. Therefore, permutation feature importance, which shuffles the values of a specific feature and measures the resulting decrease in the model's performance, is used to assess the impact of each feature (i.e., each wavenumber) in the spectrum,[1,2] as illustrated in Figure S3. The process begins by calculating the original accuracy ($\mathrm{Acc}_0$) and the original mean absolute error ($\mathrm{MAE}_0$) on the original test dataset. Then, for each wavenumber $\Delta\nu$, 100 random permutation cycles are performed. In every cycle, the SERS intensities at $\Delta\nu$ are randomly shuffled among all specimens in the test dataset, while the intensities at all other wavenumbers are kept unchanged. The model is then re-evaluated on the shuffled dataset to obtain $\mathrm{Acc}(\Delta\nu)$ and $\mathrm{MAE}(\Delta\nu)$. The relative difference between the original and shuffled metric values indicates the importance of each feature. The feature importance of classification (FIC) is calculated by

$$\mathrm{FIC}(\Delta\nu) = \frac{\mathrm{Acc}_0 - \mathrm{Acc}(\Delta\nu)}{\mathrm{Acc}_0},$$

and the feature importance of regression (FIR) is calculated by

$$\mathrm{FIR}(\Delta\nu) = -\frac{\mathrm{MAE}_0 - \mathrm{MAE}(\Delta\nu)}{\mathrm{MAE}_0}.$$

Both quantities are positive when shuffling degrades performance (lower accuracy or higher MAE). The averages of FIC and FIR over the 100 random permutation cycles for each wavenumber are plotted as the black and red curves in Figure S2 to depict their impact on the model's performance. A larger decrease in performance indicates a more important spectral feature.
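The permutation procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: CoVari is represented by two hypothetical callables, `predict_class` (returns class labels) and `predict_logc` (returns predicted log10 concentrations), the MAE is assumed to be computed on log10 concentrations, and the function name `permutation_feature_importance` is likewise hypothetical. Only one column (one wavenumber) of the test matrix is shuffled at a time, mirroring the text.

```python
import numpy as np

def permutation_feature_importance(predict_class, predict_logc, X, y_class, y_logc,
                                   n_cycles=100, seed=0):
    """Return mean FIC and FIR for every column (wavenumber) of X.

    X: test spectra, shape (n_specimens, n_wavenumbers)
    y_class: true class labels; y_logc: true log10 concentrations (assumed scale)
    """
    rng = np.random.default_rng(seed)
    # Baseline metrics on the unshuffled test set (Acc_0 and MAE_0)
    acc0 = np.mean(predict_class(X) == y_class)
    mae0 = np.mean(np.abs(predict_logc(X) - y_logc))

    n_features = X.shape[1]
    fic = np.zeros(n_features)
    fir = np.zeros(n_features)
    for j in range(n_features):
        fic_cycles = np.empty(n_cycles)
        fir_cycles = np.empty(n_cycles)
        for r in range(n_cycles):
            X_perm = X.copy()
            # Shuffle intensities at wavenumber j across specimens; other columns untouched
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            acc = np.mean(predict_class(X_perm) == y_class)
            mae = np.mean(np.abs(predict_logc(X_perm) - y_logc))
            fic_cycles[r] = (acc0 - acc) / acc0      # relative drop in accuracy
            fir_cycles[r] = -(mae0 - mae) / mae0     # relative rise in MAE
        # Average over the permutation cycles
        fic[j] = fic_cycles.mean()
        fir[j] = fir_cycles.mean()
    return fic, fir


if __name__ == "__main__":
    # Toy stand-in for CoVari: the class depends only on column 0,
    # the log10 concentration depends only on column 1.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y_class = (X[:, 0] > 0).astype(int)
    y_logc = 2.0 * X[:, 1] + 3.0 + 0.1 * rng.normal(size=200)
    fic, fir = permutation_feature_importance(
        lambda A: (A[:, 0] > 0).astype(int),
        lambda A: 2.0 * A[:, 1] + 3.0,
        X, y_class, y_logc, n_cycles=20)
    print("FIC:", np.round(fic, 3))
    print("FIR:", np.round(fir, 3))
```

In the toy example, only column 0 receives a nonzero FIC and only column 1 a nonzero FIR, because shuffling any other column leaves the predictions unchanged. Note that the formulas divide by $\mathrm{Acc}_0$ and $\mathrm{MAE}_0$, so both must be nonzero on the test set.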

Figure S3. Flow chart illustrating the process of calculating the feature importance of classification (FIC) and the feature importance of regression (FIR).

Figure S7. Plots of (A) the classification loss and accuracy and (B) the regression loss versus training epoch during training and validation of CoVari for spike proteins in saliva.

Figure S9. (A) Confusion matrix of CoVari for detecting three coronavirus spike proteins and references. Regression results of CoVari for (B) SARS-CoV-2 spike, (C) SARS-CoV-2 spike (BA.2.75.2), and (D) SARS-CoV-1 spike. The x-axis is log10(C_act), the actual concentration of the testing spectra, and the y-axis is log10(C_pre), the predicted concentration. The dashed lines represent log10(C_act) = log10(C_pre). Concentrations are in g/mL.
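The agreement in such parity plots can be summarized by a coefficient of determination computed on the log10 scale. The sketch below assumes that R² is evaluated on log10(C_act) and log10(C_pre), as the axes of Figure S9 suggest, and measures scatter about the dashed identity line rather than about a fitted regression line; the helper name `r2_log10` and the example concentrations are hypothetical.

```python
import numpy as np

def r2_log10(c_act, c_pre):
    """Coefficient of determination between log10(C_act) and log10(C_pre)."""
    y = np.log10(np.asarray(c_act, dtype=float))
    y_hat = np.log10(np.asarray(c_pre, dtype=float))
    ss_res = np.sum((y - y_hat) ** 2)          # squared deviations from the identity line
    ss_tot = np.sum((y - y.mean()) ** 2)       # spread of the actual log10 concentrations
    return float(1.0 - ss_res / ss_tot)

# Example: every prediction overestimates by 0.1 decade
c_act = np.array([1e-9, 1e-8, 1e-7, 1e-6])    # g/mL, illustrative values only
print(r2_log10(c_act, c_act * 10 ** 0.1))
```

Here the residual sum of squares is 4 × 0.1² = 0.04 and the total sum of squares is 5, so R² is approximately 1 − 0.04/5 = 0.992. Working in log10 space keeps specimens spanning several orders of magnitude equally weighted.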

Table S3. SERS peak assignments for the feature importance of classification and regression.[3]

Table S5. Output accuracy of CoVari for the unknown viral concentration test.