Analog Resistive Switching Devices for Training Deep Neural Networks with the Novel Tiki-Taka Algorithm

A critical bottleneck for the training of large neural networks (NNs) is communication with off-chip memory. A promising mitigation effort consists of integrating crossbar arrays of analogue memories in the Back-End-Of-Line, to store the NN parameters and efficiently perform the required synaptic operations. The “Tiki-Taka” algorithm was developed to facilitate NN training in the presence of device nonidealities. However, a resistive switching device exhibiting all the fundamental Tiki-Taka requirements, namely many programmable states, a centered symmetry point, and low programming noise, has not yet been demonstrated. Here, a complementary metal-oxide semiconductor (CMOS)-compatible resistive random access memory (RRAM), showing more than 30 programmable states with low noise and a symmetry point with only 5% skew from the center, is presented for the first time. These results enable generalization of Tiki-Taka training from small fully connected networks to larger long-/short-term-memory types of NN.

After ~1e7 pulses, we need to increase V+ up to 2.5 V and V- up to 1.5 V to achieve analogue synaptic potentiation and depression. However, the G window has now changed (Gmax is more than doubled).
Figure S4: (a) Symmetry point with high noise-to-signal ratio (NSR) (93%). The up and down pulses can hardly be discriminated. However, since they are so small compared to the full G swing, the number of states is high (26 states).
(b) Symmetry point with low noise-to-signal ratio (NSR) (66%). The up and down pulses can easily be discriminated. However, since they are large compared to the full G swing, the number of states is low (13 states).
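
The trade-off described in the two panels can be illustrated with a minimal sketch. It assumes, as is not stated explicitly here, that the number of states is the G window divided by the mean pulse step and that the NSR is the step noise divided by the mean pulse step. The function names and all numerical values are hypothetical and are not fitted to the measured devices.

```python
def number_of_states(g_min, g_max, mean_step):
    """Assumed definition: conductance window divided by the mean pulse step."""
    return (g_max - g_min) / mean_step


def noise_to_signal_ratio(noise_std, mean_step):
    """Assumed definition: std of the pulse response divided by the mean step."""
    return noise_std / mean_step


# Hypothetical values in microsiemens (NOT measured data).
g_min, g_max = 20.0, 80.0   # conductance window
noise_std = 1.5             # same noise level for both pulse sizes

small_step, large_step = 2.0, 4.0

states_small = number_of_states(g_min, g_max, small_step)
states_large = number_of_states(g_min, g_max, large_step)
nsr_small = noise_to_signal_ratio(noise_std, small_step)
nsr_large = noise_to_signal_ratio(noise_std, large_step)

print(f"small pulses: {states_small:.0f} states, NSR = {nsr_small:.0%}")
print(f"large pulses: {states_large:.0f} states, NSR = {nsr_large:.0%}")
```

Under these assumptions, halving the pulse step doubles the number of states but also doubles the NSR, which is the qualitative behaviour contrasted in panels (a) and (b).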

Fabrication
A sketch of the cross-section of the RRAM devices is depicted in Figure 1 (a) of the manuscript. A 20 nm thick TiN bottom electrode (BE) is deposited by plasma-enhanced atomic layer deposition (PE-ALD) at 300 °C, using a tetrakis-(dimethylamino)titanium (TDMAT) precursor and (N2, H2) plasma. The 3.5 nm thick HfOx layer is deposited by PE-ALD at 290 °C using a tetrakis-(ethylmethylamino)hafnium (TEMAH) precursor and O2 plasma. These layers are covered by 5 nm of Al2O3 and 20 nm of SiN, again deposited by PE-ALD, using trimethylaluminum (TMA) and Si as precursors at 300 °C and 400 °C, respectively.
The patterning of the device geometry is performed in two steps. First, the SiN layer is etched by Reactive-Ion Etching (RIE), using CHF3 and O2. This process stops at the Al2O3 layer. Then, we immerse the chip in 'AZ 726 MIF' developer to selectively etch the Al2O3. Next, a 30 nm thick TaOx layer is deposited by reactive sputtering of a Ta target in a mixed (Ar, O2) plasma. A 20 nm thick TiN top electrode (TE) is deposited by RF sputtering of a TiN target in a mixed (Ar, N2) plasma. A 50 nm W layer is sputtered on top. The sputtering of the W/TiN/TaOx stack proceeds without breaking vacuum between the depositions of the different layers, to avoid uncontrolled oxidation at the reactive interfaces. To isolate the top contacts of different devices, we etch the W/TiN/TaOx with inductively coupled plasma (ICP), using a mixed CHF3 and SF6 plasma, which stops at the HfO2 layer. A passivation layer of 100 nm thick SiNx is grown by plasma-enhanced chemical vapor deposition (PECVD). The via to access the device TE is etched with a mixed CHF3 and O2 plasma by RIE. Finally, 100 nm of W is sputtered and then etched by RIE to define the device pads.
The described process flow avoids any lift-off steps, which could not be performed in foundries.

Electrical characterization
The electrical characterization is performed using an NI PXIe-5451 arbitrary waveform generator to source the generated pulses to the device TE, and an NI PXIe-5164 oscilloscope to read the current signal flowing through the device BE. The pulsed read scheme consists of alternating positive and negative pulses with amplitudes of ±200 mV and a duration of 10 µs, to cancel out any potential measurement offset.
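
The offset cancellation of the bipolar read can be illustrated with a short sketch. It assumes a device that is linear at the read voltage and a constant additive current offset. The function name and the numerical values are hypothetical, and the actual post-processing used for the measurements may differ.

```python
def conductance_from_bipolar_read(i_pos, i_neg, v_read=0.2):
    """Estimate the conductance (S) from currents read at +v_read and -v_read.

    A constant current offset adds equally to both readings, so it drops
    out of the difference (i_pos - i_neg).
    """
    return (i_pos - i_neg) / (2.0 * v_read)


# Hypothetical device and setup (NOT measured data).
g_true = 50e-6    # device conductance: 50 uS
offset = 1e-6     # constant measurement offset: 1 uA
v_read = 0.2      # read amplitude: 200 mV

i_pos = g_true * v_read + offset      # current during the +200 mV pulse
i_neg = g_true * (-v_read) + offset   # current during the -200 mV pulse

g_bipolar = conductance_from_bipolar_read(i_pos, i_neg, v_read)
g_single = i_pos / v_read             # single-polarity read keeps the offset

print(f"bipolar read:         {g_bipolar * 1e6:.2f} uS")
print(f"single-polarity read: {g_single * 1e6:.2f} uS")
```

In this example the single-polarity estimate is biased by offset / v_read (5 µS here), while the bipolar estimate recovers the assumed conductance.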

Device modeling
1) We modeled the 20 device traces shown in our response to Question 4 using the "SoftBounds" weight update model, which is described in reference [8] (see: Rasch, Malte J., et al. "Fast offset corrected in-memory training." arXiv preprint arXiv:2303.04721 (2023)). The model defines the updates up and down (dw+ and dw-, respectively) in terms of the following quantities: α± is an asymmetry linear correction, bmax (bmin) is the maximum (minimum) value of the synaptic weight, w is the value of the synaptic weight before the update, and the cycle-to-cycle variability is described by its standard deviation together with a Gaussian random variable centered at 0 with standard deviation equal to 1 (standard normal distribution). Therefore, the dw updates are a function of multiple variables (such as α+, α-, bmax, bmin, etc.).
2) From the 20 device fits, we created two new variables (the vectors x and y), which are functions of (α+, α-, bmax, bmin, ...), in two steps. We defined a vector Nstates,up as:
3) Then, we approximate the values of the x and y vectors by creating a multivariate Gaussian distribution (a 2D Gaussian). This distribution is obtained by computing the mean values of the vectors x and y, and their covariance. In the manuscript, when we write "Fitting the correlated variation of multiple variables together", we refer to the covariance of x and y, where x and y are functions of multiple variables from the "SoftBounds" model.
In Figure 4 (c) of the revised manuscript, we show in dark blue the 20 values of the vectors x and y, and in light blue many samples from their Gaussian model. The plot is also reported below for your convenience:
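
The weight update model referred to in step 1) can be sketched as follows. The functional form used here, dw+ = α+ (1 - w/bmax) and dw- = -α- (1 - w/bmin), each multiplied by a noise factor (1 + σ ξ), is a commonly used soft-bounds form and is an assumption of this sketch; the exact definition is the one given in reference [8]. The function names, the symbols sigma and xi, and all parameter values are hypothetical.

```python
import numpy as np


def soft_bounds_step(w, up, alpha_up, alpha_down, b_max, b_min,
                     sigma=0.0, rng=None):
    """Apply one pulse to the weight w under the assumed soft-bounds form."""
    # Noise factor (1 + sigma * xi) with xi ~ N(0, 1); no noise if sigma == 0.
    xi = rng.standard_normal() if (rng is not None and sigma > 0.0) else 0.0
    if up:
        dw = alpha_up * (1.0 - w / b_max)      # vanishes as w -> b_max
    else:
        dw = -alpha_down * (1.0 - w / b_min)   # vanishes as w -> b_min (< 0)
    return float(np.clip(w + dw * (1.0 + sigma * xi), b_min, b_max))


def symmetry_point(alpha_up, alpha_down, b_max, b_min):
    """Weight at which the up and down steps have equal magnitude."""
    return (alpha_up - alpha_down) / (alpha_up / b_max - alpha_down / b_min)


# Hypothetical parameters (NOT fitted to the measured devices).
params = dict(alpha_up=0.05, alpha_down=0.04, b_max=1.0, b_min=-1.0)

# Alternating up/down pulses drive the weight towards the symmetry point.
w = 0.8
for _ in range(500):
    w = soft_bounds_step(w, True, **params)
    w = soft_bounds_step(w, False, **params)

w_sp = symmetry_point(**params)
print(f"weight after alternating pulses: {w:.3f}, symmetry point: {w_sp:.3f}")
```

With finite step sizes the weight keeps oscillating around the symmetry point by roughly one step, so the two printed values agree only to within that step size.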

Figure S1: Cross-section of a Gen1-RRAM, with a nominal unit cell area of (200 nm)². Isotropic etching of the top layers causes uncontrolled definition of the device active area (the size of the TiN top electrode is measured to be only ~180 nm). Also, the sidewall concavity generates voids during the cladding of the active layers.

Figure S2: AFM scans before (a) and after (b) the Al2O3 removal by wet etch. We measure a step variation of 4.9 nm, corresponding to the 5 nm Al2O3 deposited by ALD. Therefore, the underlying HfOx layer is not attacked.

Figure S5: X-ray reflectivity (XRR) profile of the HfOx layer used for Gen2 devices and the table of the parameters used for fitting. The substrate is TiN (20 nm)/Si.

Figure S7: Grazing Incidence X-ray Diffraction (GIXRD) profiles of the TaOx layer used for Gen2 devices, with the peaks of the Gaussian deconvolution highlighted.

Figure S8: (a) Circular Transmission Line Measurements (C-TLM) structures used to calculate the sheet resistance of the TaOx material. (b) Summary of the measurements, from which we extracted Rsheet = 98319.94 Ohm/square and resistivity = 0.281195.


We defined two vectors x and y (see Figure 4 (c) in the manuscript, also reported below) as:
Here, x and y reflect the modeled number of states and asymmetry for the 20 devices. The log() function makes it possible to model the number of states using a lognormal distribution.
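
The 2D Gaussian modeling of x and y can be sketched with NumPy. The sketch assumes that x is the logarithm of the modeled number of states and that y is an asymmetry measure; the exact definitions are not reproduced here. The 20 input values are randomly generated stand-ins, not the fitted device parameters, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 20 device fits (NOT measured data).
n_states = rng.lognormal(mean=3.0, sigma=0.3, size=20)
asymmetry = 0.1 * rng.standard_normal(20)

# Assumed definitions: x = log(number of states), y = asymmetry.
x = np.log(n_states)
y = asymmetry

# A 2D Gaussian is fully specified by the mean vector and the 2x2 covariance.
mean = np.array([x.mean(), y.mean()])
cov = np.cov(x, y)

# Draw many synthetic devices from the fitted Gaussian.
samples = rng.multivariate_normal(mean, cov, size=1000)

# Because x is Gaussian, exp(x) is lognormal and therefore always positive.
n_states_samples = np.exp(samples[:, 0])
asymmetry_samples = samples[:, 1]

print("mean:", mean)
print("covariance:\n", cov)
```

Sampling x and y jointly from the covariance matrix, rather than independently, preserves the correlation between the number of states and the asymmetry in the synthetic devices.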

Figure 4 (c): Multivariate Gaussian fit (light blue) of the 20 device response models (in dark blue).