Research Manuscript
Exploration Methodology for BTI-Induced Failures on RRAM-Based Edge AI Systems
- Submitted by: Alexandre LEVISSE
- Last updated: 4 February 2020 - 8:19am
- Document Type: Research Manuscript
- Document Year: 2020
- Presenters: Levisse
- Paper Code: 3904
Resistive switching memory (RRAM) technologies are widely seen in the scientific community as an enabler for Edge-level applications such as embedded deep learning, AI, and signal processing of audio and video signals. However, going beyond a "simple" replacement of eFlash in microcontrollers and introducing RRAM inside the memory hierarchy is not straightforward. Integrating an RRAM technology inside the cache hierarchy demands higher endurance than eFlash replacement does, and thus necessitates relaxed programming conditions. Relaxing the programming conditions moves the reliability bottleneck from the programming operation to the read operation: the read margin shrinks and the risk of read failure grows. Based on this observation, we explore how Edge-level applications running on an RRAM-based Edge device could fail because of Bias Temperature Instability (BTI). BTI degrades the threshold voltage (Vt) of the transistors along the memory WordLines (WLs), which reduces the read margin along frequently used WLs. We therefore propose a three-step methodology consisting of (i) characterizing the RRAM bitcell and identifying the Vt shift beyond which the read operation fails; (ii) characterizing applications and extracting their memory traces; and (iii) running a long-term BTI simulation to extract the actual Vt shift of the bitcells sharing the same array WL. Using this methodology, we show that for a 1T1R bitcell featuring a 250% High/Low Resistance State (HRS/LRS) ratio, read failures tend to occur in less than a month when a convolution kernel runs constantly. These simulations highlight that transistor-level reliability can be critical for embedded RRAM and that dedicated workload-aware simulation frameworks are required to assess its impact.
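The three steps above can be illustrated with a small sketch. It is not the simulation flow used in the manuscript: it substitutes the common empirical power-law BTI model (Vt shift proportional to stress time raised to an exponent), ignores recovery, and assumes that each access stresses its WordLine for one cycle. The function names and every numeric constant (`A_BTI`, `N_BTI`, `VT_FAIL`) are hypothetical placeholders chosen for illustration, not values from the paper.

```python
from collections import Counter

# Illustrative placeholders only; none of these values come from the manuscript.
A_BTI = 5e-3    # assumed power-law prefactor, in volts per second**N_BTI
N_BTI = 0.2     # assumed power-law time exponent
VT_FAIL = 0.05  # assumed critical Vt shift (V), i.e. the output of step (i)


def wordline_duty(trace, cycle_count):
    """Step (ii): fraction of cycles during which each wordline is asserted.

    `trace` is a list of wordline indices, one entry per access. Each access
    is assumed to keep its wordline asserted for exactly one cycle.
    """
    counts = Counter(trace)
    return {wl: n / cycle_count for wl, n in counts.items()}


def vt_shift(stress_seconds):
    """Assumed power-law BTI drift: dVt = A_BTI * t_stress ** N_BTI."""
    return A_BTI * stress_seconds ** N_BTI


def time_to_failure(duty):
    """Step (iii): wall-clock seconds until the Vt shift reaches VT_FAIL.

    Effective stress time is modeled as duty * wall-clock time (no recovery).
    """
    if duty <= 0:
        return float("inf")
    stress_needed = (VT_FAIL / A_BTI) ** (1.0 / N_BTI)
    return stress_needed / duty


if __name__ == "__main__":
    # Toy trace: wordline 0 is accessed three times as often as wordline 1.
    duty = wordline_duty([0, 0, 1, 0], cycle_count=4)
    for wl, d in sorted(duty.items()):
        print(f"WL{wl}: duty={d:.2f}, time to failure={time_to_failure(d):.3g} s")
```

Under these assumptions the most frequently accessed WordLine reaches the critical Vt shift first, which is why the workload's memory trace, and not only the bitcell characterization, determines when read failures appear.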