Deep Learning of Nanopore Sensing Signals Using a Bi-Path Network

Temporal changes in electrical resistance of a nanopore sensor caused by translocating target analytes are recorded as a sequence of pulses on current traces. Prevalent algorithms for feature extraction in pulse-like signals lack objectivity because empirical amplitude thresholds are user-defined to single out the pulses from the noisy background. Here, we use deep learning for feature extraction based on a bi-path network (B-Net). After training, the B-Net acquires the prototypical pulses and the ability of both pulse recognition and feature extraction without a priori assigned parameters. The B-Net is evaluated on simulated data sets and further applied to experimental data of DNA and protein translocation. The B-Net results are characterized by small relative errors and stable trends. The B-Net is further shown capable of processing data with a signal-to-noise ratio equal to 1, an impossibility for threshold-based algorithms. The B-Net presents a generic architecture applicable to pulse-like signals beyond nanopore currents.

Nanopore sensing technology finds a wide scope of applications, including DNA sequencing, 1 protein profiling, 2 small chemical molecule detection 3 and nanoparticle characterization. 4 When analytes pass through a nanopore, characteristic pulses or spikes are generated on monitoring current traces. 5 Properties of the analytes, such as size, concentration, charge, dipole and shape, can be inferred from the amplitude, width (duration), frequency and waveform of such spikes. 2,4,6,7 Traditional procedures, in several variants, for recognising and extracting translocation events, i.e., spikes, from noisy current traces are typically based on a user-defined amplitude threshold as a criterion to separate the spikes from background noise fluctuations. 8,9 The flow of data processing for nanopore signals, as well as the related algorithms, is widely accepted (see S.1 of Supporting Information (SI)). The determination of spikes is, thus, highly dependent on how the threshold is defined, which carries an evident risk that the approach becomes subjective. Although progress has been made in diminishing the subjectivity of empirical threshold selection, such as defining the threshold by referring to the background noise level, user intervention cannot be totally avoided. 10 Therefore, conventional techniques for extracting features from raw data have been historically limited in their capacity. To resolve this problem, an advanced algorithm based on a novel Deep Learning (DL) architecture in the form of a Bi-path Network (B-Net) is proposed in this work for spike recognition and feature extraction. The B-Net is capable of directly transforming raw data into an appropriate representation from which certain ending subsystems, such as a classifier, can detect patterns in the input. In other words, the three important features of the spikes, i.e., amplitude, frequency and duration, can be extracted as a package solution covering the demands of nanopore sensing technology.
The B-Net is based on a highly consolidated DL architecture, the Residual Neural Network (ResNet), as depicted in Fig. 1. As an integral part of the ResNet architecture, the Convolutional Neural Network (CNN) can be utilised in any dimensional space although dimensions up to three are the most commonly used depending on the application. For example, three-dimensional CNNs are normally used for volumetric data in medical imaging. 11 On the other hand, two-dimensional CNNs are the most popular for images and matrices. 11 As is the case of the B-Net introduced here, one-dimensional CNNs have also been used for signals and time series such as in the case of automated detection of atrial fibrillation 12 and sleep arousal detection. 13

Figure 1: Architecture of the B-Net. This novel network features a two-way architecture with two ResNets. Each ResNet consists of a CNN and a Feed-forward Fully connected Neural Network (FFNN). ResNet 1 predicts the number of pulses (or translocation events) in a temporal window. ResNet 2 forecasts the average translocation amplitude and duration of all the pulses within the same window. ResNet 1 also feeds its output, as an internal input, to the FFNN of ResNet 2.

The convolutional section in our implementation has been adapted to processing one-dimensional data. The fully connected architecture, on the other hand, outputs real-valued predictions such as the average amplitude and duration of the translocation events in a temporal window.
The B-Net is first evaluated on artificially generated datasets. It is then employed for experimental data of λ-DNA and streptavidin translocation in solid-state nanopores. Compared to its traditional algorithm counterparts, the B-Net shows an outstanding performance from the perspective of robustness, objectivity and stability. The developed B-Net is in essence applicable to feature extraction from any pulse-like signals, e.g., transverse tunnelling current for single-molecule detection, spikes from neural cells and systems, electro-cardio pulses, etc. It should be clarified that the B-Net concept is designed for handling pulse-like signals. It is not meant for treating DNA/RNA sequencing data generated from a nanopore sequencer. Processing such sequencing data belongs to a totally different field.

Neural network architecture
The holy grail in all DL architectures is depth. The rationale behind these successful methods is built on experimental evidence, which suggests that adding more layers to a Deep Neural Network (DNN) provides us with more abstract output vectors that would better represent the hidden features from raw input data. Such rich features would finally allow us to better perform the task for which the network is trained. Yet, this desirable phenomenon has important limitations, since adding more layers to a network comes with penalties, such as vanishing and exploding gradients. 14,15 ResNet is adopted in the B-Net. ResNet is an artificial DNN that uses CNNs and skip connections, or shortcuts that bypass some layers. 16,17 The main motivation for bypassing/skipping layers is to avoid the problem of vanishing / exploding gradients. The architecture of the B-Net is visualised in Fig. 1. It is a bi-path architecture composed of two parallel ResNets. Both ResNets receive the same input that is simply a temporal window, segmented from a complete nanopore translocation current trace. They also return predicted number of pulses and average amplitude and duration of such pulses in the window. As shown in Fig. 1, ResNet 1 is assigned to predict the number of translocation spikes (pulses) in the window, while ResNet 2 forecasts the average amplitude and duration of all the pulses found in the same window. ResNet 1 also provides its output as an internal input for the Feed-forward Fully connected Neural Network (FFNN) in ResNet 2. Additional description of the B-Net architecture is provided in Methods and S.2 of SI.

Results and discussion
Training and validation
ResNet 1 and 2 in Fig. 1 were implemented using another DNN, ResNet 18, in the PyTorch DL framework (https://www.pytorch.org). 18 Since ResNet was originally designed for image processing in two dimensions, the architecture was modified to adapt it to one-dimensional data. Additionally, the last linear layer was replaced by a Multi-Layer Perceptron (MLP) with two layers, and the necessary adaptations were made to feed an extra internal input from the ResNet 1 output into the final layer of the ResNet 2 FFNN. Finally, some batch normalisation mechanisms in certain layers of the network were replaced by the group normalisation strategy. Our implementation is publicly available in [To be provided after the first submission].
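The adaptations listed above can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation: a single small one-dimensional residual block stands in for the full ResNet 18 trunk, and the class names (ResBlock1d, PathNet, BNet), channel widths, group count and window length are hypothetical choices made for illustration only.

```python
import torch
from torch import nn


class ResBlock1d(nn.Module):
    """1D residual block with group normalisation and an identity shortcut."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.norm1 = nn.GroupNorm(groups, channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.norm2 = nn.GroupNorm(groups, channels)

    def forward(self, x):
        out = torch.relu(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return torch.relu(out + x)  # skip connection


class PathNet(nn.Module):
    """One path: 1D convolutional trunk followed by a two-layer MLP head."""

    def __init__(self, n_out: int, n_extra: int = 0, channels: int = 16):
        super().__init__()
        self.stem = nn.Conv1d(1, channels, kernel_size=7, stride=2, padding=3)
        self.block = ResBlock1d(channels)
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.hidden = nn.Linear(channels, 32)
        # The extra internal input is concatenated just before the final layer.
        self.final = nn.Linear(32 + n_extra, n_out)

    def forward(self, window, extra=None):
        feats = self.pool(self.block(self.stem(window))).flatten(1)
        h = torch.relu(self.hidden(feats))
        if extra is not None:
            h = torch.cat([h, extra], dim=1)
        return self.final(h)


class BNet(nn.Module):
    """Two parallel paths sharing the same input window."""

    def __init__(self):
        super().__init__()
        self.counter = PathNet(n_out=1)              # ResNet 1: pulse count
        self.features = PathNet(n_out=2, n_extra=1)  # ResNet 2: amplitude, duration

    def forward(self, window):
        count = self.counter(window)
        amp_dur = self.features(window, extra=count)
        return count, amp_dur


model = BNet()
windows = torch.randn(8, 1, 1024)  # batch of 8 single-channel windows (hypothetical length)
count, amp_dur = model(windows)
print(count.shape, amp_dur.shape)
```

The sketch keeps the three ingredients named in the text: one-dimensional convolutions, group normalisation in place of batch normalisation, and the count prediction entering the second path only at its final layer.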
The B-Net was trained and validated using artificially generated data. It is worth noting that the generated datasets are physics-based, involving a set of well-established physical models for nanopore-based sensors. They include the nanopore resistance model, 19 spike generation model, 20 and noise model, 21 with stochastic variations of the corresponding parameters in accordance with the related physical mechanisms (Methods and S.3 of SI).
The B-Net was subsequently evaluated on and applied to both artificial and experimental data (Methods and S.4 of SI). Further, five different instances of the network were trained for five different SNRs in the artificial datasets. In all cases, smooth l1-loss and Stochastic Gradient Descent (SGD) optimisation were adopted. Additional details about training, such as batch sizes, learning rate schedules, number of epochs, time consumed and curves of loss and errors for all B-Net instances in this work can be found in S.5 of SI.
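As an illustration of a stepwise learning rate schedule, the sketch below uses the values reported in the Methods for SNR equal to 4, 2 and 1 (initial learning rate of 0.001, decreased by 90% every 10 epochs). The function name step_lr is hypothetical, and the text does not fully specify to which network and epochs this schedule applied, so this is a sketch under those assumptions.

```python
def step_lr(epoch: int, initial_lr: float = 1e-3,
            drop: float = 0.9, step: int = 10) -> float:
    """Learning rate after cutting it by the fraction `drop` every `step` epochs."""
    return initial_lr * (1.0 - drop) ** (epoch // step)


for epoch in (0, 9, 10, 25):
    print(epoch, step_lr(epoch))
```

In PyTorch, the same behaviour would typically be obtained with torch.optim.lr_scheduler.StepLR using step_size=10 and gamma=0.1.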

Features extracted from generated datasets
The general output of the B-Net and the process of performance evaluation are depicted in Fig. 2. The B-Net receives a temporal window from a signal trace and returns a prediction of the average amplitude and duration of all the pulses as well as the number of pulses in the window (Fig. 1). Typical windows with the ground-truth are shown in Fig. 2b along with the resultant predicted average values of amplitude and duration as well as the count of pulses for artificially generated traces with SNR=4.

Performance evaluation
The procedure for evaluating the performance of the B-Net is shown in Fig. 2a 2b) at specified values of the concentration of nanospheres (C_np) (correlated to translocation frequency), the diameter of nanospheres (D_np) (correlated to spike amplitude) and the translocation dwell time (correlated to duration). The relative error of each data window is calculated for each parameter (i.e., number of spikes, amplitude and duration) with reference to the ground-truth recorded during the data generation, following the formula shown in Fig. 2a (details in Methods). Then, the relative errors are averaged over the durations for each combination of C_np and D_np, as displayed in Fig. 2c. The blue surface corresponds to the average value, while the translucent red spike-like surfaces above and below it depict the Standard Deviation (STD) of the error. The average errors and STDs for each duration in the dataset are plotted in Fig. 2d. Finally, total average errors and STDs are computed and shown above each plot (complete evaluation results in S.4 of SI).
The outputs and relative errors of the B-Net are compared to those of the traditional algorithm in Fig. 3. Different values of n in th_n denote the user-defined amplitude thresholds used as a criterion to distinguish true translocation-generated spikes from fluctuations (noise) in a current trace. Here, th is the abbreviation of threshold and n is the threshold expressed as a multiple of the peak-to-peak value of the background noise (more details of the implementation of the traditional algorithm in Methods). The comparison focuses on processing the artificially generated test dataset with SNR=4.

As inferred from the steepest slope for the count error in Fig. 4, ResNet 1, predicting the number of pulses, is more affected by the level of noise than ResNet 2, predicting amplitude and duration. Similarly, the duration prediction is more susceptible to noise than the amplitude prediction.
The smallest error for count is produced by ResNet 1 with an average error of 0.067% for SNR=4. On the other hand, ResNet 2 produces an error of 2.5% and 2.1% in its prediction of duration and amplitude, respectively. For SNR=1, ResNet 1 is still the best performing part of the B-Net with an average error of 0.86% for count while ResNet 2 presents 6.3% and 3.7% of error for duration and amplitude, respectively. Performance starts to degrade severely for SNRs below 1 with errors above 27% (count) for ResNet 1 and above 26% (duration) and 17% (amplitude) for ResNet 2.
The performance of the B-Net is compelling, given that it is almost impossible for the traditional algorithm to correctly recognise translocation spikes in a background noise that has a similar amplitude to the spikes, i.e. SNR=1. By using bigger architectures in combination with labelled generated datasets that are closer to real measured data it is possible to further reduce errors and facilitate correct processing of signals with even lower SNR.
In the B-Net, the features of spikes are acquired by the algorithm during the training process. They retain as much of the original information as possible: the network picks up not only the obvious features considered in the traditional algorithm, such as amplitude and duration, but also details of the pulses, such as the waveform. These extra features help the B-Net to better distinguish a real translocation spike from a background noise peak, even when the two have the same amplitude at SNR=1. Furthermore, the decision process is flexible and probabilistic, which makes for a powerful method with robust performance. As the entire process minimizes the participation and intervention of users, it warrants maximum objectivity, see below.

Objectivity analysis of experimental data
The B-Net is also applied to two experimental datasets for nanopore translocation of λ-DNA and streptavidin from our previous work.  Compared with the λ-DNA translocation data, the dependence of amplitude and duration on bias voltage is weaker though also linear, in agreement with other reports. 27,28 The duration is insensitive to bias voltage, which could be related to the limited bandwidth at 10 kHz of the amplifier for data acquisition. When the translocation time is close to or shorter than the time constant defined by the cut-off frequency of the amplifier, the difference in the width of pulses is often smeared out. 29,30 Furthermore, the dispersion presented by the traditional algorithm is worse than the one with the λ-DNA translocation data. This difference can be related to a lower SNR of the streptavidin translocation data. It is remarkable in Fig. 5d-e that the amplitude and frequency predicted by the traditional algorithm are even more dependent on the choice of subjective voltage threshold.

Conclusions
The analyses and comparisons in the preceding sections confirm that the B-Net meets the essential and critical requirement of being objective, avoiding subjective parameters determined by the user. The adverse effects of a subjective and often blind adjustment of input parameters are clearly appreciated in Figs. 3 and 5, where the predictions of the traditional algorithm depend sensitively on a threshold adjusted beforehand. In contrast to the traditional algorithm relying on user-defined input parameters, the advantages of the B-Net lie also in its clear, stable and consistent predictions as well as its small relative errors.
All this is indicative of the robustness of the B-Net in being able to analyse noisy data, thereby easing the otherwise strict demands on the control of experimental conditions. The performance of the B-Net in singling out pulses from a noisy background with relative errors below 1% for count and 5% for amplitude and duration for input data of SNR=1 provides a validating example. Such performance cannot be expected of traditional algorithms, since when thresholds are used for recognising translocation spikes, a noise peak of similar amplitude can easily be misclassified as a spike originating from a translocation event. Furthermore, the bi-path architecture in the B-Net assigns different categories of tasks (i.e., pulse count and average feature predictions) to distinct network branches, while the information processed in one branch (i.e., the pulse counter) is used by the other to predict extra average features. This strategy is naturally in agreement with the architecture of the human brain.
Regardless of all the favourable features reviewed above, the B-Net is a DL-based method. As such, it is inherently a data-hungry strategy that works better when there are thousands, millions or even billions of training examples. 31 In problems with limited data sources, DL is not an ideal solution. In the specific area of concern, real traces collected from nanopore translocation experiments could be abundant but they are not labelled. Recruiting staff for labelling such data is not viable given the size of the datasets needed to train the B-Net. Instead, we have generated our own artificial datasets in this work in order to train, validate and then put the B-Net to use. The comparison experiments conducted against the traditional approaches by processing experimental traces clearly demonstrate that our generated datasets retain a high statistical correlation with the experimental traces collected in the laboratory. Yet, it is imperative to clarify that, beyond such favourable results, it is impossible to perfectly mimic the statistical distribution underlying real traces.
General palliative methods to solve this problem are available to DL. There are pre-training stages of the networks with which looser requirements are demanded and smaller labelled datasets can later be used to fine-tune them. Pre-training could be tackled using alternative generated labelled datasets. In this work, we have shown the relevance of the datasets generated to train our B-Net. Our artificially generated datasets can also be used as a pre-training resource in the B-Net. Afterwards, adding labels to a much smaller dataset of traces collected in the laboratory can be a much more viable endeavour. This could result in a highly qualified network, fine-tuned with real traces. Such a network would show prominent performance differences from a network trained only using artificially generated datasets as the one introduced in this work.
In conclusion, our B-Net algorithm is highly flexible to the input signal, and it is not limited to signals from nanopore sensors. A myriad of pulse-like signals, found in biotechnology, medical technology, physical sciences and engineering, information and communication technology, environmental technology, etc., can be processed by implementing the robust and objective B-Net. Therefore, the B-Net is a generalisable and flexible platform owing to the flexibility of DL strategies.

Data preparation
Artificial data generation
The artificially generated data is composed of three parts: 1) randomly appearing translocation spikes, 2) background noise and 3) baseline variations. The baseline current level, i.e., the open-pore current, is determined using the resistance model, 19 with given geometrical properties, electrolyte concentration, and bias voltage (more details in S.3.1 of SI). In this system, differently-sized nanospheres are used to represent analytes. In the signal generation programme, the sampling rate is set to 10 kHz, which determines the time step of the signal.
According to our previous work, 32 the probability of appearance of translocation spikes at each time step is correlated to the concentration of nanospheres. The amplitude of spikes is assigned by our translocation model based on the resistance change by steric blockage during the translocation. 33 The waveform of translocation spikes is approximated using an asymmetrical triangle with adjustable ramping slopes (details in S.3.2 of SI).
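The spike generation step can be sketched as follows. This is a simplified illustration in Python rather than the authors' MATLAB programme: the per-step spike probability p_spike, the spike depth and the spike length are hypothetical inputs, and the 40%/60% split between the falling and rising ramps follows the description in S.3.2 of SI.

```python
import random


def triangular_spike(n_samples: int, depth: float, fall_fraction: float = 0.4):
    """Current deviation (<= 0) of one asymmetric triangular spike."""
    n_fall = max(1, round(n_samples * fall_fraction))
    n_rise = max(1, n_samples - n_fall)
    fall = [-depth * (i + 1) / n_fall for i in range(n_fall)]
    rise = [-depth * (1 - (i + 1) / n_rise) for i in range(n_rise)]
    return fall + rise


def generate_trace(n_steps, i_open, depth, spike_len, p_spike, seed=0):
    """Noise-free trace: baseline i_open with randomly placed spikes."""
    rng = random.Random(seed)
    trace = [i_open] * n_steps
    starts = []
    t = 0
    while t < n_steps - spike_len:
        if rng.random() < p_spike:
            for k, dv in enumerate(triangular_spike(spike_len, depth)):
                trace[t + k] = i_open + dv
            starts.append(t)
            t += spike_len  # the pore stays blocked for the whole spike
        else:
            t += 1
    return trace, starts


# 1 s of signal at 10 kHz (time step 0.1 ms) with 1 ms long spikes.
trace, starts = generate_trace(n_steps=10_000, i_open=1.0, depth=0.2,
                               spike_len=10, p_spike=0.01)
print(len(starts), "spikes generated")
```

Background noise and baseline variations would be added on top of this noise-free trace, which also serves as the ground-truth for labelling.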
According to the related studies, coloured Gaussian noise is adopted as the background noise, 34 whose power spectral density is determined by our integrated noise model. 21 At frequencies below 5 kHz (confined by the 10 kHz sampling rate), the noise has four components: flicker noise, electrode noise, white thermal noise and dielectric noise, whose importance increases successively from low to high frequencies. The related parameters are set to the typical values of SiNx nanopores from our previous measurements. 21 The amplitude of the background noise, reflecting its power, can be tuned for datasets with different SNR (more details in S.3.3 of SI).
In addition, two kinds of baseline variations, i.e., sudden jumps and slow fluctuations, are introduced to represent perturbations. The former generates randomly appearing steps in the baseline to mimic the temporary adsorption-desorption of objects near and in the pore. 5,35 The latter simulates the instability of the nanopore, which can be caused
It is worth noting that SNR is defined as the ratio of the spike amplitude to the peak-to-peak value of the background noise. The peak-to-peak value of the Gaussian noise is estimated to be six times its Root-Mean-Square (RMS) value, 5 while the noise RMS can be calculated as the square root of the integral of the noise power spectral density over the bandwidth. All the data is generated using a homemade programme in MATLAB.
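The SNR definition above translates into a short calculation. The sketch below integrates a power spectral density with the trapezoid rule and applies the factor of six between peak-to-peak and RMS values; the flat spectrum and its units are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def noise_rms(freqs, psd):
    """RMS noise: square root of the PSD integrated over the bandwidth (trapezoid rule)."""
    power = 0.0
    for i in range(1, len(freqs)):
        power += 0.5 * (psd[i] + psd[i - 1]) * (freqs[i] - freqs[i - 1])
    return math.sqrt(power)


def snr(spike_amplitude, rms, pp_factor=6.0):
    """SNR = spike amplitude / peak-to-peak noise, with peak-to-peak taken as 6 x RMS."""
    return spike_amplitude / (pp_factor * rms)


# Hypothetical flat (white) PSD of 1e-4 nA^2/Hz from 0 to 5 kHz.
freqs = [float(f) for f in range(0, 5001, 100)]
psd = [1e-4] * len(freqs)
rms = noise_rms(freqs, psd)  # sqrt(1e-4 * 5000) nA
print(rms, snr(4.0, rms))
```

With this definition, SNR=1 means that a spike is only as deep as the peak-to-peak excursion of the noise, i.e., six times the noise RMS.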

Experimental data
In order to evaluate the performance of our algorithm on experimental data from laboratory, two groups of translocation experiments are implemented in Truncated Pyramid Nanopores

Traditional data processing method
The recognition of translocation events by means of traditional data processing methods is based on the amplitude of spikes. Here, we use a homemade MATLAB programme to locate the translocation spikes in current traces and extract the three parameters: amplitude, translocation frequency, and duration. In the programme, the function findpeaks is used with the MinPeakProminence option. An amplitude threshold is defined by the user relative to the RMS of the background noise. In the following discussion, this threshold is tuned from 4 to 25 times the background noise RMS, to demonstrate the dependence of the results on the threshold selection.
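The threshold dependence can be illustrated with a simplified detector. The sketch below is a plain threshold-crossing routine in Python, not the MATLAB findpeaks call with MinPeakProminence used in this work; the function name detect_spikes and the toy trace are hypothetical.

```python
def detect_spikes(trace, baseline, noise_rms, n, dt=1e-4):
    """Find downward spikes deeper than n * noise_rms below the baseline.

    Returns one (amplitude, duration_in_seconds) tuple per event.
    An event still open at the end of the trace is ignored.
    """
    threshold = baseline - n * noise_rms
    events = []
    start = None
    for i, value in enumerate(trace):
        if value < threshold and start is None:
            start = i  # event begins
        elif value >= threshold and start is not None:
            segment = trace[start:i]
            events.append((baseline - min(segment), (i - start) * dt))
            start = None
    return events


# Toy trace: a small noise dip (0.95) and a real spike (down to 0.5).
trace = [1.0] * 5 + [0.95] + [1.0] * 5 + [0.7, 0.5, 0.7] + [1.0] * 5
for n in (4, 25):
    print(n, detect_spikes(trace, baseline=1.0, noise_rms=0.01, n=n))
```

With n=4 the noise dip is counted as an event, whereas with n=25 only the real spike survives. The measured duration also changes with the threshold, since it is taken between threshold crossings.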

Standard database
A standard database for testing of the performance of translocation signal has been established on [the website of our database will be cited after the first submission].
There are two categories of data, i.e., generated and experimental. For each dataset of the generated data, a current trace without background noise is also offered, apart from its counterpart trace with noise. Furthermore, the spike information, including amplitude, start loss. In some cases, it could prevent exploding gradients, which is desirable in networks with architectures as the one used in this work 39 .
Smooth l1-loss combines the advantages of l1-loss (producing steady gradients for large values of |x−y|) and l2-loss (producing fewer oscillations during updates when |x−y| is small).
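For a single prediction x and target y, the smooth l1-loss can be sketched as below. The transition point beta=1 corresponds to the default of PyTorch's SmoothL1Loss; the value used in this work is not stated, so it is an assumption here.

```python
def smooth_l1(x: float, y: float, beta: float = 1.0) -> float:
    """Quadratic (l2-like) for |x - y| < beta, linear (l1-like) otherwise."""
    d = abs(x - y)
    if d < beta:
        return 0.5 * d * d / beta
    return d - 0.5 * beta


for diff in (0.1, 0.5, 1.0, 3.0):
    print(diff, smooth_l1(diff, 0.0))
```

The two branches meet at |x−y| = beta with the same value and slope, so the gradient magnitude never exceeds 1 for large errors and shrinks smoothly to 0 for small ones.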
For SNR equal to 4, 2 and 1, a batch size of 32 temporal windows and an initial learning rate of 0.001 were used. The learning rate was decreased by 90% every 10 epochs for ResNet For ResNet 2, the network with the minimum relative errors of amplitude and duration won and was saved in each epoch iteration. The relative error is defined as: where, x is the predicted value and x_0 the true value. ResNet 1, on the other hand, may process temporal windows without translocation spikes. Consequently, it has a division-by-zero risk if the relative error in Eq. 2 is applied. Therefore, for the ResNet 1 validation, Relative Percent Difference (RPD), defined by Eq. 3, was adopted to represent the relative error of the predicted pulse number.
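The difference between the two error measures can be sketched as follows. Both forms are assumptions: the relative error is taken as |x − x_0|/|x_0|, and the RPD uses the commonly adopted symmetric definition with the mean of |x| and |x_0| in the denominator, which may differ from Eq. 3. Returning 0 when both values are 0 is a hypothetical convention added here.

```python
def relative_error(x: float, x0: float) -> float:
    """|x - x0| / |x0| in percent; undefined when the true value x0 is 0."""
    return abs(x - x0) / abs(x0) * 100.0


def rpd(x: float, x0: float) -> float:
    """Symmetric relative percent difference; taken as 0 when both values are 0."""
    denom = (abs(x) + abs(x0)) / 2.0
    if denom == 0.0:
        return 0.0
    return abs(x - x0) / denom * 100.0


print(relative_error(3, 5), rpd(3, 5))
print(rpd(1, 0))  # still defined although the true count is 0
```

Under this definition the RPD stays finite for an empty window with a non-zero prediction, which is the case where the plain relative error breaks down.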
Evaluation (testing) process using artificially generated test dataset
In order to evaluate the B-Net, a held-out artificially generated dataset, used for neither training nor validation, was used. This dataset has the same size as the validation dataset. Since it is artificially generated, it is completely labelled. Therefore, for each temporal window from the traces, the ground-truth features were known during data generation, including the real number of pulses and the real average amplitude and duration of the pulses in the temporal windows. Then, the relative error between the labels and the predictions produced by the B-Net was calculated, following Eq. 2.
Unlike training and validation, during evaluation, the B-Net works as follows: First, ResNet 1 processes the temporal window and outputs an estimation of the number of pulses in the window. If the number of pulses predicted by ResNet 1 is 0, ResNet 2 will not process the input and the B-Net will predict 0 for pulses, 0 average amplitude and 0 average duration. On the other hand, if ResNet 1 predicts one or more pulses in the window, then ResNet 2 will process the input and predict the two aforementioned features, the average amplitude and duration of the pulses in the window. It is important to highlight that ResNet 2 was trained using only temporal windows with one or more translocation events (pulses) and never received windows without pulses. Consequently, whenever ResNet 1 correctly predicted the absence of pulses, it prevented ResNet 2 from processing windows for which it was not trained.
In addition, considering the division-by-zero risk, the two rules below were followed for the evaluation stage:
• If ResNet 1 correctly predicts 0 pulses in a window, then we consider 0% error for all features (number of pulses, average amplitude and average duration).
• If ResNet 1 erroneously predicts more than 0 pulses in a window when the ground-truth is actually 0, then we consider 100% error for all features (number of pulses, average amplitude and average duration).
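The gating behaviour and the two rules above can be combined into a single per-window scoring routine. This is a sketch under assumptions: the predicted count is taken to be already rounded to an integer, the relative error is assumed to be |x − x_0|/|x_0| in percent, and the function names are hypothetical.

```python
def rel_err(x, x0):
    """Relative error in percent (x0 must be non-zero)."""
    return abs(x - x0) / abs(x0) * 100.0


def evaluate_window(pred_count, true_count, pred_feats, true_feats):
    """Errors (%) for (count, amplitude, duration) in one temporal window.

    pred_feats and true_feats are (amplitude, duration) tuples.
    """
    if true_count == 0:
        # Rule 1: correct empty window -> 0 %; rule 2: false positive -> 100 %.
        return (0.0,) * 3 if pred_count == 0 else (100.0,) * 3
    if pred_count == 0:
        # Gate closed: ResNet 2 is skipped and the B-Net outputs 0 for all features.
        pred_feats = (0.0, 0.0)
    return (rel_err(pred_count, true_count),
            rel_err(pred_feats[0], true_feats[0]),
            rel_err(pred_feats[1], true_feats[1]))


print(evaluate_window(0, 0, (0.0, 0.0), (0.0, 0.0)))
print(evaluate_window(2, 0, (1.0, 1.0), (0.0, 0.0)))
print(evaluate_window(2, 2, (1.1, 2.0), (1.0, 2.0)))
```

A missed window (true pulses present but none predicted) also yields 100% for every feature, because all outputs are set to 0.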

Author information
Corresponding Author

Notes
The authors declare no competing financial interests. this work. This work was partially financially supported by Stiftelsen Olle Engkvist Byggmastare (No.194-0646).

Supporting Information Available
Signal processing flow for nanopore sensing; network architecture; physical models and data generation; evaluation (testing) results of artificially generated test dataset for different SNR; training history using artificially generated train and validation datasets for different SNR; comparison between the results from our neural network and the traditional algorithm; Translocation features of λ-DNA and streptavidin extracted by the B-Net (PDF).

Deep learning of nanopore sensing signals using a bi-path network (Supporting Information)

• Step 1. Raw data is denoised to form clean data, which can be achieved using low-pass filters in the frequency domain [1]. In the time domain, the baseline and blockage level can be traced and cleaned up from the background noise by averaging with a dynamically adjustable threshold, such as the CUSUM algorithm [2]. Algorithms based on other theories, such as estimation theory, e.g., the Kalman filter [3], and the wavelet transform [4], are also adopted.
• Step 2. Translocation events, represented as spikes, are recognised and extracted from the current traces. This procedure is usually based on a user-defined amplitude threshold as a criterion to separate a true translocation-generated spike from a noise fluctuation [5].
• Step 3. Features of these spikes are extracted based on physical models, such as ADEPT [6], peak analysis algorithms, such as DBC [7], and algorithms of feature analysis in the frequency domain, such as Fourier transform and cepstrum [8].
• Step 4. Properties of the translocating analytes are inferred from the extracted features.

Based on simple physical models, the amplitude of spikes is correlated to the size and shape of analytes [9,10]. The duration is related to the translocation speed and nanopore-analyte interaction, reflecting the physicochemical properties, such as mass, charge, dipole, and hydrophobicity [11,12]. The frequency of spikes is related to the concentration of analytes [12,13]. Furthermore, details of the translocation waveform are considered by more sophisticated models [12,14], such as one using the fingerprint feature of the blockage current distribution to distinguish 10 kinds of proteins [15]. In this step, Machine Learning (ML)-based classification algorithms are widely adopted to cluster the events and associate them with different analytes, such as support vector machine [16], Convolutional Neural Network (CNN) [17], logistic classifier [18], and decision tree [19,20].

S.2 Network architecture
Nowadays, thanks to the advent of the Representation Learning theory [21], machines can automatically discover the representations that are relevant for detecting the specific features demanded by the network designer. In general, Deep Learning (DL) is known as a set of multi-layer representation-learning methods. Starting from the raw input, these networks transform the representations, one layer at a time, by means of simple but non-linear computations. Representations at higher layers are considered to be more pertinent to the features intended to be extracted by the network. DL has shown outstanding breakthroughs in recent years in areas ranging from Computer Vision (CV) to Natural Language Processing (NLP), from science to engineering, from medicine to material and computer sciences, etc. [21,22,23]. By only specifying the correct cost function in a subsystem at the top of the network, errors are backpropagated after each iteration and the weights of the network are automatically adjusted to make better predictions in response to subsequent inputs in a training dataset [22].
In this work, we introduce a new network named B-Net. In the B-Net, a specific architecture called Residual Neural Network (ResNet) is adopted. ResNet is an artificial Deep Neural Network (DNN) that uses CNNs 3 and skip connections, or shortcuts that jump over some layers [24,25]. The main motivation for skipping layers is to avoid the problem of vanishing and exploding gradients. By means of these skip connections, the network can reuse activations from previous layers until posterior layers learn their weights. In the worst scenario, layer l + 1 is able to receive an intact output from layer l − 1 if layer l has not yet learned a proper weight configuration. In this way, instead of making layer l + 1 receive a potentially harmful representation from an immature layer l, it can avoid utilisation of layer l and instead receive better information from previous layers in the network until a specific training point at which layer l eventually learns its correct weights. It is worth noting that l ± 1 is used here in a figurative way to favour a clearer explanation. In real implementations, skip connections actually jump two or more layers and a more realistic scenario would describe something like l ± 2 or l ± 3. In this manner, ResNet would never go off the rails in terms of the correct manifold in its learning space. Nowadays, ResNet is considered a classic architecture usually employed as a backbone for many computer vision tasks. This architecture gained its privileged place among other highly effective DL architectures by winning the ILSVRC 2015 in image classification, detection, and localisation, as well as the MS COCO 2015 detection and segmentation [25].
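The role of the shortcut can also be written compactly. In the generic notation below, which is introduced here for illustration, x_l is the input of a residual block, F the stacked layers that the shortcut bypasses and W_l their weights; the non-linearity usually applied after the addition is omitted for simplicity.

```latex
x_{l+1} = x_l + F(x_l; W_l),
\qquad
\frac{\partial x_{l+1}}{\partial x_l} = I + \frac{\partial F(x_l; W_l)}{\partial x_l}
```

The identity term I means that the gradient can still flow back through the block even when the contribution of F is very small, which is how skip connections mitigate vanishing gradients. Likewise, if F outputs values close to zero because its layers are still immature, the block simply passes x_l forward unchanged.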
In this work, we have implemented the B-Net using an architecture called ResNet 18. This is one of the standard architectures available in DL frameworks such as PyTorch. PyTorch is an open source ML library based on the Torch library. It is primarily developed by Facebook's AI Research (FAIR) laboratory and is used for ML applications such as computer vision and natural language processing. Currently, ResNet 18 achieves Top-1 and Top-5 error rates of 30.24% and 10.92% on the ImageNet dataset, respectively [24].

S.3 Physical models and data generation
S.3.1 Resistance model and open-pore current
To determine the baseline of the current trace, i.e. open-pore current, our previously established resistance model based on the concept of effective transport length is adopted here [26]. The resistance of a cylindrical nanopore can be expressed as 3 A CNN is a highly utilized sub-type of DNN mostly applied to analyzing visual imagery.
where $\rho$ is the resistivity of the electrolyte, $d_p$ the diameter of the nanopore and $L_{eff}$ the effective transport length of the nanopore, defined as the sum of the distances from the location inside the nanopore where the electric field is highest to the two opposite points along the central axis of the pore where the electric field falls to $e^{-1}$ of its maximum. For a cylindrical pore, $L_{eff}$ is [26]

where $h$ is the thickness of the nanopore. The resistivity of the electrolyte is determined by its salt concentration $c_0$ and the mobilities of the cations, $\mu_c$, and anions, $\mu_a$.
where $q$ is the elementary charge and $N_A$ the Avogadro constant. The surface charge on the nanopore sidewall also contributes to the conductance. The surface conductance can be expressed as

where $\mu$ is the mobility of the counterions in the surface electric double layer and $\sigma$ the surface charge density. Thus, at a given bias voltage $V$, the open-pore current is

Values of the parameters used in this model are listed in Table S.1.
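A minimal Python sketch of how the open-pore current could be evaluated from the quantities defined above. The functional forms used here are assumptions based on common nanopore conductance models, not a transcription of the expressions in [26]: the bulk resistance is taken as resistivity times $L_{eff}$ over the pore cross-section, the resistivity as that of a 1:1 electrolyte, the surface conductance as proportional to $\pi d_p \mu \sigma / L_{eff}$, and the current as $V$ times the sum of bulk and surface conductance. $L_{eff}$ is passed in directly, and all numerical values are illustrative rather than taken from Table S.1.

```python
import math

Q_E = 1.602176634e-19  # elementary charge, C
N_A = 6.02214076e23    # Avogadro constant, 1/mol


def resistivity(c0, mu_c, mu_a):
    """Electrolyte resistivity in ohm*m.

    ASSUMED form for a 1:1 salt: 1 / (q * N_A * c0 * (mu_c + mu_a)),
    with c0 in mol/m^3 and mobilities in m^2/(V*s).
    """
    return 1.0 / (Q_E * N_A * c0 * (mu_c + mu_a))


def pore_resistance(rho, d_p, l_eff):
    """ASSUMED bulk resistance: rho * l_eff / (pi * d_p^2 / 4)."""
    return 4.0 * rho * l_eff / (math.pi * d_p ** 2)


def surface_conductance(d_p, l_eff, mu, sigma):
    """ASSUMED surface conductance: pi * d_p * mu * sigma / l_eff."""
    return math.pi * d_p * mu * sigma / l_eff


def open_pore_current(v, rho, d_p, l_eff, mu, sigma):
    """ASSUMED open-pore current: bias times (bulk + surface conductance)."""
    g_bulk = 1.0 / pore_resistance(rho, d_p, l_eff)
    return v * (g_bulk + surface_conductance(d_p, l_eff, mu, sigma))


# Illustrative (hypothetical) values: 100 mM salt, 20 nm pore, 30 nm effective length.
rho = resistivity(c0=100.0, mu_c=7.6e-8, mu_a=7.9e-8)
i0 = open_pore_current(v=0.3, rho=rho, d_p=20e-9, l_eff=30e-9, mu=7.6e-8, sigma=0.02)
```

Keeping $L_{eff}$ as an explicit argument means the sketch does not depend on the specific cylinder expression from [26]; that expression can be substituted where `l_eff` is computed.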

S.3.2 Translocation model and translocation spikes
The amplitude of the translocation spikes is determined by a simple steric blockage model [31] that concerns the ratio of the cross-sectional area of the nanopore to that of the translocating nanosphere.
where $D_{np}$ is the diameter of the translocating nanosphere. The shape of the translocation spikes is approximated by a triangle, as shown in Fig. S.2. In a spike, the current decreases from the open-pore current $I_0$ to the minimum $I_b$ during the first 40% of the duration, and then increases back to $I_0$ during the remaining 60%. The probability of finding a spike at a certain time point is set to be proportional to the nanosphere concentration $C_{np}$ and exponentially dependent on the bias voltage.
where $k_0$ is a coefficient, $k_B$ the Boltzmann constant and $T$ the temperature in kelvin. During signal generation, a time step $\Delta t$ of 0.1 ms, i.e. a sampling rate of 10 kHz, is specified. A state of either "open-pore" (o) or "blockage" (b) is then randomly generated for each $\Delta t$ according to a two-point distribution based on the probability $P$. Each pore is accessed sequentially during one $\Delta t$; $I_0$ is assigned to pores in state "o" and $I_b$ to pores in state "b". It is worth mentioning that once a pore is in state "b", it cannot return to state "o" until the specified duration has elapsed. The same signal generation algorithm and the same dependence of the spike appearance probability on analyte concentration and bias voltage were adopted in our previous work on multiple nanopores [32]. Values of the parameters used in this model are listed in Table S
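The generation procedure described in this subsection can be sketched for a single pore as follows. This is a hedged illustration, not the authors' generator: the function names, the way the triangular shape is combined with the two-state draw, and all numerical values are assumptions. The per-step probability is passed in as `p_block` rather than computed from concentration and bias voltage, and the blocked current `i_b` is likewise an input.

```python
import random


def triangular_spike(i0, i_b, n_samples, fall_fraction=0.4):
    """Samples of one triangular spike.

    The current falls linearly from i0 to i_b over the first 40% of the
    samples and rises linearly back to i0 over the remaining 60%.
    """
    n_fall = max(1, round(fall_fraction * n_samples))
    n_rise = n_samples - n_fall
    fall = [i0 + (i_b - i0) * (k + 1) / n_fall for k in range(n_fall)]
    rise = [i_b + (i0 - i_b) * (k + 1) / n_rise for k in range(n_rise)]
    return fall + rise


def generate_trace(n_steps, p_block, i0, i_b, n_spike, rng):
    """Single-pore current trace on a uniform time grid.

    At every free time step a blockage starts with probability p_block
    (a two-point draw). Once started, it occupies n_spike steps before the
    pore can return to the open state. Returns (trace, spike start indices).
    """
    trace = [i0] * n_steps
    starts = []
    t = 0
    while t < n_steps:
        if rng.random() < p_block:
            spike = triangular_spike(i0, i_b, n_spike)
            for k, value in enumerate(spike[: n_steps - t]):  # clip at the trace end
                trace[t + k] = value
            starts.append(t)
            t += n_spike
        else:
            t += 1
    return trace, starts


# Hypothetical settings: 1 s of data at dt = 0.1 ms, spikes lasting 1 ms (10 samples).
rng = random.Random(0)
trace, starts = generate_trace(n_steps=10_000, p_block=0.01,
                               i0=5.0, i_b=4.0, n_spike=10, rng=rng)
```

Seeding the generator makes a trace reproducible; for several pores, the same routine could be run once per pore and the resulting traces summed.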

S.3.3 Noise model and background noise generation
A comprehensive noise model of solid-state nanopores has been established based on experimental characterisation of SiN$_x$ nanopores [33]. There are four distinct noise sources in the frequency range below 5 kHz: flicker noise $S_{IF}$, electrode noise $S_{IE}$, white thermal noise $S_{IT}$, and dielectric noise $S_{ID}$. The Power Spectral Densities (PSDs) of these noise components are:

where $\alpha_H$ is a constant named the Hooge parameter, $N_c$ the total number of conducting carriers in the nanopore, $f$ the frequency, $\alpha_e$ the current noise parameter for the electrodes, $\beta_i$ the factor for the frequency dependency, $d_L$ the dielectric loss factor of the nanopore membrane and $C_{chip}$ the parasitic capacitance of the membrane. The total PSD of the background current noise is the sum of these four components.
A noise time sequence of the required length can be generated from the white Gaussian noise generator in the MATLAB function library. Using the Fast Fourier Transform and its inverse, the white Gaussian noise is shaped by the total PSD to become coloured Gaussian noise. Details of the algorithm can be found in [31]. The noise amplitude, and hence the SNR, can be tuned by a scaling factor. Values of the parameters used in the noise model are listed in Table S.3.
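The spectral shaping step can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the algorithm of [31]: the small radix-2 FFT is written out only to keep the example self-contained, the function names are hypothetical, and the example PSD (`1/f` plus a constant) is a placeholder, not the four-component model of this subsection.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugation identity."""
    n = len(spectrum)
    conj = fft([complex(z).conjugate() for z in spectrum])
    return [z.conjugate() / n for z in conj]


def coloured_noise(n, fs, psd, rng, scale=1.0):
    """Shape white Gaussian noise so its amplitude spectrum follows sqrt(psd(f)).

    n     : number of samples (power of two)
    fs    : sampling rate in Hz
    psd   : callable f -> PSD value for f > 0 (arbitrary units)
    scale : overall amplitude factor, e.g. to tune the SNR
    """
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    spectrum = fft(white)
    shaped = [0j] * n  # bin 0 (DC) stays zero, so the output has zero mean
    for k in range(1, n):
        f = min(k, n - k) * fs / n  # map two-sided bin index to |frequency|
        shaped[k] = spectrum[k] * math.sqrt(psd(f))
    # The weighting is symmetric in +/- f, so the inverse transform is real.
    return [scale * z.real for z in ifft(shaped)]


# Placeholder PSD: flicker-like 1/f term plus a white floor (illustrative only).
noise = coloured_noise(n=1024, fs=10e3, psd=lambda f: 1.0 / f + 0.01,
                       rng=random.Random(0))
```

Because the weighting is applied to both positive and negative frequency bins with the same factor, the shaped spectrum stays Hermitian and the output is real; the `scale` argument plays the role of the amplitude factor used to set the SNR.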

S.3.4 Baseline variation
Two kinds of baseline variation are included in the signal generation: sudden jumps of the baseline and slow fluctuations. The sudden jumps are produced by steps that appear at random on the baseline. The height of these steps is set to 30% of $\Delta I$ with a random fluctuation of 10%. The number of steps appearing in a 10 s period is also randomly generated, with an expectation of 30. The slow fluctuation of the baseline is represented by the superposition of 8 sine and cosine terms.
$$I_{fluc} = a_0 I_0 \left( a_1 \sin(\omega t) + a_2 \sin(2\omega t) + a_3 \sin(3\omega t) + a_4 \sin(4\omega t) + b_1 \cos(\omega t) + b_2 \cos(2\omega t) + b_3 \cos(3\omega t) + b_4 \cos(4\omega t) \right) \tag{S.14}$$

The overall amplitude of the slow fluctuation is controlled by the factor $a_0$, which is 0.003 in the signal generation. $a_i$ and $b_i$ are amplitude coefficients. They are random numbers with an expectation of zero and the Standard Deviations (STD) shown in Table S

DL has shown advances in noise reduction in general, such as in image noise reduction [37,38,39,40] and in audio signals, for speech enhancement [41,42] and its application to cochlear implants [43].
From an architectural point of view, it is claimed that ResNets perform better than simpler architectures that do not have skip connections. This hypothesis is based on experiments suggesting that ResNets have better noise stability, which is empirically supported for both simplified and fully-fledged ResNet variants [44].
The relative errors stay near 1% when SNR ≥ 2, whereas they can rise to 100% as the SNR falls to 0.25. For a signal with SNR ≥ 2, the neural network recognises almost every spike against the background noise. Increasing the SNR further does not influence the errors, indicating that these errors come from the system itself, i.e. the neural network. When the SNR is smaller than 1, the errors show a strong dependence on the SNR, indicating that the interference with spike recognition comes from the background noise. It is almost impossible for traditional algorithms to correctly recognise translocation spikes in background noise of similar amplitude to the spikes, i.e. SNR = 1.

S.6 Comparison between the results from our neural network and the traditional algorithm
Enlarging the diameter of the translocating nanospheres increases the amplitude (Fig. S.18a), while increasing the concentration of the nanospheres tends to raise the translocation frequency (Fig. S.19b). The extracted duration agrees well with the set value in the generated signal (Fig. S.20c). However, the average values of these features obtained with the traditional algorithm depend on the selected threshold amplitude, as shown by the deviations among the dot-on lines in the respective figures. The number n after th (for threshold) in the legend denotes a specific threshold level, expressed as a multiple of the peak-to-peak value of the background noise.
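The threshold dependence of the traditional approach can be illustrated with a minimal Python sketch. It is a simplified stand-in for a threshold-based detector, not the specific traditional algorithm used for the comparison: the function name, the definition of duration as the time spent beyond the threshold, and the toy trace are all assumptions.

```python
def detect_spikes(trace, baseline, noise_pp, n, dt):
    """Threshold-based detection of downward spikes.

    A sample belongs to a spike when it lies more than n * noise_pp below
    the baseline. Returns one dict per spike with its start index, amplitude
    (maximum depth below the baseline) and duration (time beyond the threshold).
    """
    threshold = baseline - n * noise_pp
    events, start = [], None
    # A trailing baseline sample acts as a sentinel that closes a spike at the end.
    for i, value in enumerate(list(trace) + [baseline]):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            depth = baseline - min(trace[start:i])
            events.append({"start": start, "amplitude": depth,
                           "duration": (i - start) * dt})
            start = None
    return events


# Toy trace (hypothetical): one deep and one shallow spike on a baseline of 10.
toy = ([10.0] * 5 + [9.0, 8.0, 7.0, 8.0, 9.0] + [10.0] * 5
       + [9.6, 9.2, 9.6] + [10.0] * 5)
events_th1 = detect_spikes(toy, baseline=10.0, noise_pp=0.5, n=1, dt=1e-4)
events_th2 = detect_spikes(toy, baseline=10.0, noise_pp=0.5, n=2, dt=1e-4)
```

On this toy trace, the lower threshold (n = 1) reports both spikes, whereas the higher threshold (n = 2) misses the shallow spike and assigns a shorter duration to the deep one, which mirrors the threshold-dependent deviations discussed above.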