AI-Assisted Real-Time Immunoassay Improves Clinical Sensitivity and Specificity

Real-time biosensing systems can interrogate the association between an analyte and a biorecognition element over time. Typically, the resulting data are preprocessed to extract valuable bioanalytical information at a single optimal point of the real-time response; for instance, a diagnosis of certain medical conditions can be established depending on whether a biomarker (analyte) concentration measured at an optimal time exceeds a threshold. Exploiting this conventional approach, we previously developed a nanophotonic immunoassay for bacterial vaginosis diagnosis exhibiting a clinical sensitivity and specificity of ca. 96.29% (n = 162). Herein, we demonstrate that a real-time biosensing platform assisted by artificial intelligence not only obviates the determination of a biomarker concentration threshold but also increases the sensitivity and specificity of the targeted diagnostic, reaching values of up to 100%.



Principal Component Analysis (PCA)
PCA is a dimensionality reduction method frequently used to simplify datasets by reducing the number of features (components) of large datasets. Each dimension or principal component resulting from PCA is a combination of the initial variables.1 Simplified datasets are easier to study and visualize, and are also well suited to feed machine learning algorithms and to analyze data quickly and efficiently. Here, PCA facilitated the 2D visualization of the data recorded via the real-time immunoassay.
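As an illustration of this step, the sketch below reduces a set of synthetic responses to two principal components with scikit-learn. The random data and its dimensions are hypothetical stand-ins for the real-time immunoassay recordings, not the actual dataset:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the real-time immunoassay responses:
# 150 sensorgrams, each sampled at 200 time points (assumed sizes).
rng = np.random.default_rng(0)
responses = rng.normal(size=(150, 200))

# Reduce the 200 time-point features to 2 principal components,
# enabling a 2D scatter-plot visualization of the dataset.
pca = PCA(n_components=2)
scores = pca.fit_transform(responses)

print(scores.shape)                    # one (PC1, PC2) pair per sensorgram
print(pca.explained_variance_ratio_)   # variance captured by each component
```

Plotting the two columns of `scores` against each other gives the 2D view of the dataset referred to above.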

Neural Network Architectures employed
Dense neural networks (DNNs), also known as multilayer perceptrons (MLPs) or feedforward neural networks, typically consist of three layers: the input, hidden, and output layers. These layers are also called fully connected layers (FCLs), where each neuron is connected to every neuron in the preceding layer and the layers are stacked together. In DNN architectures, information flows from the back to the front, layer by layer. Within each layer, the network computes a weighted sum of the previous inputs and weights, followed by the application of a non-linear function. This process continues until reaching the output layer, which provides the network's prediction. DNNs are commonly employed for classification and regression tasks.2

Convolutional neural networks (CNNs) are composed of convolutional, pooling, and dense layers, with alternating convolutional and pooling layers. In a CNN, convolutional layers perform cross-correlation operations and work as feature extractors. The pooling layers conduct a down-sampling or dimensional reduction of the network features, with max-pooling being the most common pooling layer. By discarding redundant information, this layer reduces the risk of overfitting, increases efficiency, and decreases complexity. The output stage consists of one or more fully connected layers that perform classification or regression of patterns based on the features extracted by the previous layers.3

Long short-term memory (LSTM) networks are a recurrent neural network (RNN) architecture designed to handle long-term sequential data and dependencies. LSTMs overcome the limitations of conventional RNNs by adding memory cells, which are able to retain information over extended periods. These architectures comprise memory cells, input gates, output gates, and forget gates. The memory cells store and update information, while the gates regulate the flow of information within the network. The input gates control the information entering the memory cells, the forget gates determine which information should be discarded, and the output gates decide which information from the memory cells is passed on as the output. A distinguishing feature of LSTMs is their ability to selectively retain or forget information based on the context and relevance of the input sequence. This enables LSTMs to effectively capture long-term dependencies in sequential data, making them well suited for speech recognition, time series analysis, and natural language processing.4
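The gating mechanism described above can be sketched as a single LSTM time step in NumPy. The weight shapes, gate ordering, and random inputs below are illustrative assumptions, not the trained model used in this work:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step with input (i), forget (f), and output (o) gates.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked here (by assumption) as [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])        # input gate: what enters the memory cell
    f = sigmoid(z[H:2*H])      # forget gate: what is discarded
    g = np.tanh(z[2*H:3*H])    # candidate cell content
    o = sigmoid(z[3*H:4*H])    # output gate: what is exposed as the output
    c = f * c_prev + i * g     # memory cell stores and updates information
    h = o * np.tanh(c)         # hidden state passed to the next time step
    return h, c

# Toy dimensions and random weights, purely for illustration.
rng = np.random.default_rng(1)
D, H = 3, 5
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H),
                 rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)),
                 np.zeros(4 * H))
```

Iterating `lstm_step` over the time points of a sensorgram is what allows the network to retain or forget information across the whole sequence.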

Section 3. Evaluation metrics
The following metrics were used to evaluate the performance of the classification models.

Confusion Matrix
Confusion matrices are commonly used in binary and multi-class classification problems. They allow the performance of a model to be evaluated both visually and quantitatively.5 An example of a binary classification confusion matrix is depicted in Figure S1. Note that accuracy can be calculated by dividing the sum of the values on the main diagonal by the total number of samples.
In order to determine the performance of the neural networks, the corresponding confusion matrices were generated by means of the sklearn.metrics Python library6 (Figures S3-S6).
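A minimal sketch of this computation with sklearn.metrics. The labels are made up to mimic the fold composition described later (10 negative and 5 positive samples per fold) and are not the actual experimental data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative labels for one fold: 10 negative (0) and 5 positive (1) samples.
y_true = [0] * 10 + [1] * 5
y_pred = [0] * 9 + [1] + [1] * 5   # one negative misclassified as positive

cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
tn, fp, fn, tp = cm.ravel()

# Accuracy: sum of the main-diagonal counts divided by the total number of samples.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```

The four entries `tn`, `fp`, `fn`, `tp` correspond directly to the TN/FP/FN/TP cells defined in the Figure S1 caption.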

F1 score
Precision and sensitivity are two crucial metrics, but they are often in conflict. Precision focuses on the accuracy of positive predictions, while sensitivity focuses on the ability of the model to capture all positive cases. The F1 score provides a balance between these two metrics.
The classification performance of our models was also assessed with this metric (Tables S2-S5). The F1 score combines the precision and sensitivity (also called recall) scores of a model as their harmonic mean:7

F1 = 2 × (Precision × Sensitivity) / (Precision + Sensitivity) (Eq. 5S)
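Eq. 5S can be verified numerically against scikit-learn's implementation. The labels below are illustrative (one false positive and one false negative among 15 samples), not the study's data:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Illustrative fold: 10 negative and 5 positive samples.
y_true = [0] * 10 + [1] * 5
y_pred = [0] * 9 + [1] + [1] * 4 + [0]   # one false positive, one false negative

precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN), i.e. sensitivity
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean, Eq. 5S
print(precision, recall, f1)
```

Here precision and recall are both 4/5, so the harmonic mean equals 0.8 and matches `f1_score` exactly.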


Figure S1. Confusion matrix for binary classification. The term true positive (TP) refers to a sample belonging to the positive class being classified correctly, whereas true negative (TN) refers to a sample belonging to the negative class being classified correctly. Additionally, false positive (FP) refers to a sample belonging to the negative class but being wrongly classified as positive, and false negative (FN) refers to a sample belonging to the positive class but being incorrectly classified as belonging to the negative class.

Figure S2. Neural architecture for bacterial vaginosis diagnosis. The 1D-convolutional architecture is composed of three hidden layers (HLs) and one output layer. The first two HLs are 1D convolutional layers (CLs) employing hyperbolic tangent activation functions with 30 kernels of size 3. The third HL is a fully connected layer (FCL) composed of 20 neurons employing ReLU activation functions. The output layer consists of a single neuron with a sigmoid activation function, since we are dealing with a binary classification problem.
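A plain-NumPy sketch of the forward pass described in this caption (two 1D CLs with 30 kernels of size 3 and tanh activations, an FCL of 20 ReLU neurons, and a single sigmoid output). The input length, the random weights, and the omission of bias terms are simplifying assumptions for illustration only:

```python
import numpy as np

def conv1d(x, kernels, activation):
    """Valid 1D cross-correlation (no bias). x: (L, C_in), kernels: (K, size, C_in)."""
    K, size, _ = kernels.shape
    L_out = x.shape[0] - size + 1
    out = np.empty((L_out, K))
    for k in range(K):
        for t in range(L_out):
            out[t, k] = np.sum(x[t:t + size] * kernels[k])
    return activation(out)

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))   # one sensorgram; 100 time points is an assumed length

h = conv1d(x, rng.normal(size=(30, 3, 1)) * 0.1, np.tanh)    # 1st 1D CL: 30 kernels, size 3, tanh
h = conv1d(h, rng.normal(size=(30, 3, 30)) * 0.1, np.tanh)   # 2nd 1D CL: 30 kernels, size 3, tanh
h = relu(rng.normal(size=(20, h.size)) * 0.01 @ h.ravel())   # FCL: 20 neurons, ReLU
p = sigmoid(rng.normal(size=(1, 20)) @ h)                    # sigmoid output: P(positive class)
```

With random weights the output `p` is just a number in (0, 1); in the actual model these weights are learned during training.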

Figure S3. Confusion matrices for each fold in the MLP architecture. The MLP architecture classified without errors in folds 1 and 2, whereas in folds 3, 6, 7, and 10 it classified a negative sample as positive; in fold 4 it classified a positive sample as negative; in fold 5 it presented two false positives; and in fold 8 it presented one false positive and one false negative.

Figure S4. Confusion matrices for each fold in the CNN architecture. The 1D-CNN architecture achieved 100% accuracy in the classification task, since it did not produce any false positives or false negatives in any of the 10 folds.

Figure S5. Confusion matrices for each fold in the LSTM architecture. In folds 1, 2, 3, 6, 8, 9, and 10, the LSTM architecture correctly classifies both the 10 negative samples and the 5 positive samples. In contrast, in fold 4 it classifies 1 of the 5 positive samples as negative, and in fold 5 it classifies 1 of the 10 negative samples as positive.

Figure S6. Confusion matrices for each fold in the B-LSTM architecture.

Figure S7. Confusion matrix including unseen data (test set). 70% of the data was used for the training set, 15% for the validation set, and 15% for the testing set. Employed model: 1D-CNN.
All architectures were trained using the Adam optimizer8 with a batch size of 16 over 200 epochs, with an initial learning rate of 0.001.
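For reference, a single Adam parameter update with the stated learning rate of 0.001 can be sketched as follows; the parameter and gradient values are arbitrary illustrations, and the beta and epsilon values are the optimizer's standard defaults:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (learning rate 0.001, as used in this work)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Arbitrary parameters and gradient, purely for illustration.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
grad = np.array([0.5, -0.5])
theta, m, v = adam_step(theta, grad, m, v, t=1)
print(theta)
```

On the first step the bias-corrected moments reduce the update to roughly lr times the gradient's sign, so each parameter moves by about 0.001.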



Table S1. Sensitivity and specificity of commercial bacterial vaginosis diagnostic methods.9

Table S2. Performance of the tested architectures assessed with 10-fold cross-validation.

Table S3. F1 Score for each fold in the MLP architecture.

Table S4. F1 Score for each fold in the CNN architecture.

Table S5. F1 Score for each fold in the LSTM architecture.

Table S6. F1 Score for each fold in the B-LSTM architecture.

Table S7. Performance of the 1D-CNN architecture fed with different combinations of data groups. The most relevant data are marked in bold.

Table S8. Results of the training process using unseen data (employed model: 1D-CNN).