# Low Swing and Column Multiplexed Bitline Techniques for Low-Vmin, Noise-Tolerant, High-Density, 1R1W 8T-bitcell SRAM in 10nm FinFET CMOS

J. P. Kulkarni, A. Malavasi, C. Augustine, C. Tokunaga, J. Tschanz, M. M. Khellah, V. De

Intel Labs. E-mail: andres.f.malavasi@intel.com, muhammad.m.khellah@intel.com

## Abstract

A 1.09Mb, high-density (HD), 1R1W 8T-bitcell SRAM is demonstrated in 10nm FinFET CMOS featuring Low Swing (LS) and Column Multiplexed (CM) bitline (BL) techniques. Read-Vmin and noise tolerance are improved using a series NMOS clipper and a split-input NAND for early keeper turn-off. Measurements show 30(40)mV lower read-Vmin and 18(30)% lower BL power for the proposed LS(LS+CM) BL schemes, with improved noise tolerance and minimal area overhead.

## Motivation

1R1W 8T-SRAM arrays with decoupled read/write ports and single-ended, large-signal sensing using hierarchical read BLs (Fig. 1) face bitcell density scaling challenges in advanced FinFET nodes, as bitcells are already scaled to 1-Fin transistors [1]. Minimum-sized HD bitcells are susceptible to increased process variations, resulting in higher Vmin and poor noise tolerance. Thus, high bit density, low Vmin, and better noise tolerance often pose conflicting design tradeoffs. We propose LS and LS+CM BL techniques for 1R1W 8T SRAMs to achieve lower Vmin along with improved noise tolerance.

## Proposed Low Swing Bitline (LS BL) Technique

As shown in Fig. 2, an NMOS transistor (N<sub>1</sub>) with its gate connected to a bias voltage (Vbias $\leq$ Vcc) is inserted in series between the common read port of the bitcells (LS\_LBL) and the local read circuit (LBL node). The LS\_LBL node is precharged to the lower voltage Vbias-Vtn, thus effectively lowering the dynamic switching capacitance (C<sub>DYN</sub>) of the read path. This partially offsets the read-delay degradation caused by the series-connected clipper N<sub>1</sub>. To improve the read-path delay further, the low-swing bitline node (LS\_LBL) is connected to the PMOS P<sub>3</sub> of the split-input NAND gate. As the LS\_LBL node evaluates earlier than the full-swing LBL node, it turns on P<sub>3</sub> faster, which in turn turns off the keeper stack earlier. This mitigates the keeper contention efficiently and improves the read-path delay, especially in a skewed slow-N, fast-P process corner.
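As a first-order sketch of why the clipper lowers the switched energy (an idealized model, not taken from the measurements), assume N<sub>1</sub> stops conducting once its source is one threshold below its gate, and let $C_{LS}$ denote the capacitance on the LS\_LBL side of the clipper (a symbol introduced here only for illustration). For one full discharge and recharge of LS\_LBL:

$$
V_{LS\_LBL} \approx V_{bias} - V_{tn}, \qquad
Q_{LS} \approx C_{LS}\,(V_{bias} - V_{tn}), \qquad
E_{LS} \approx C_{LS}\,V_{cc}\,(V_{bias} - V_{tn}) \;<\; C_{LS}\,V_{cc}^{2}
$$

The right-hand bound is the energy the same capacitance would draw from the supply with a full-swing precharge. Body effect and sub-threshold conduction, which shift the actual precharge level, are ignored in this estimate.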

During the precharge phase, the keeper is turned off, reducing voltage-stress-induced keeper aging (in both the baseline and the proposed techniques). This improves the BL noise tolerance across the operational lifetime. Connecting both the P<sub>3</sub> and N<sub>3</sub> devices of the NAND gate to a common LS\_LBL node would turn the keeper off even faster, but would also degrade the noise tolerance significantly, because the NAND gate inputs would be biased close to the switching trip point. In a read-1 scenario, the LS\_LBL and LBL nodes discharge to Vss. The LBL node follows the LS\_LBL node, albeit with additional delay due to the series NMOS in the path. However, during a transient noise event in a read-0 scenario, the LBL node has improved noise immunity due to its full-swing precharge voltage (Vcc) and the shielding effect of the series-connected clipper, which operates in the sub-threshold region at the beginning of the evaluation phase. Thus, by decoupling the read-1/0 tradeoffs using the series clipper and the split-input NAND keeper, lower read-Vmin and better noise tolerance are achieved simultaneously.

## Low Swing + Column-Multiplexed Bitline Technique

The statically biased series clipper (N<sub>1</sub>) in the LS BL technique can be reused as a column multiplexer by driving its gate with a CM control signal: the LBL is split into two sub-parts (LS\_LBL-1,2), each connected through its own clipper to the full-swing LBL node with shared local read circuits, similar to the split BLs used in 6T bitcells [2] (Fig. 3). The CM control signals are asserted based on the address pre-decoder logic. The keeper-control NAND gate now contains two PMOS paths, P<sub>3</sub>-P<sub>4</sub> and P<sub>5</sub>-P<sub>6</sub>, driven by the respective LBL sub-parts. P<sub>4</sub>/P<sub>6</sub> avoids short-circuit current in the NAND gate when the corresponding LBL sub-part is not evaluated. By splitting the LBL C<sub>DYN</sub> into two halves, the LS+CM BL technique achieves BL power savings beyond those of the LS BL technique alone. The read-path leakage is reduced due to the stacking effect of the inactive CM clipper. The unselected LBL sub-part is not precharged in every read cycle, resulting in further BL leakage savings. The smaller number of bits per LBL sub-part, with correspondingly lower leakage, allows keeper downsizing. This mitigates keeper contention effectively, yielding higher Vmin savings. The Vmin/C<sub>DYN</sub> benefits of CM can be traded off for higher bit density by sharing local read circuits across a larger number of bitcells. Note that the LS/CM BL techniques can also be applied to global BLs for additional gains. 10nm statistical simulations across multiple process corners show 60(80)mV lower read Vmin for the LS(LS+CM) technique compared to the baseline case (Fig. 4).
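The column-multiplexing behaviour described above can be sketched behaviourally in Python. This is an illustrative model only: the class name `SplitLBL`, the field `rows_per_subpart`, and the choice that the upper half of the row range maps to LS\_LBL-2 are assumptions made here, not details given in the source.

```python
from dataclasses import dataclass


@dataclass
class SplitLBL:
    """Behavioural sketch of a local bitline split into two CM sub-parts."""

    rows_per_subpart: int  # hypothetical number of bitcells on each sub-part

    def cm_controls(self, row: int) -> tuple[bool, bool]:
        """Return one-hot (CM1, CM2) gate controls for the sub-part holding `row`.

        Assumption: rows [0, n) sit on LS_LBL-1 and rows [n, 2n) on LS_LBL-2,
        as a stand-in for the address pre-decoder logic.
        """
        if not 0 <= row < 2 * self.rows_per_subpart:
            raise ValueError("row is outside this local bitline")
        upper = row >= self.rows_per_subpart
        return (not upper, upper)

    def evaluated_rows(self, row: int) -> int:
        """Number of bitcells loading the sub-part that is precharged and evaluated."""
        return sum(self.rows_per_subpart for selected in self.cm_controls(row) if selected)


lbl = SplitLBL(rows_per_subpart=8)
for r in (3, 11):
    print(r, lbl.cm_controls(r), lbl.evaluated_rows(r), "of", 2 * lbl.rows_per_subpart)
```

Only one control is asserted per read, so the evaluated sub-part sees half of the bitcells, which is the source of the halved C<sub>DYN</sub> and the reduced leakage load on the keeper in this simplified picture.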

## Measurement Results

Measured read P<sub>FAIL</sub> vs. V<sub>CC</sub> data from a 1.09Mb, 1R1W HD 8T SRAM test chip (Fig. 9) fabricated in 10nm FinFET CMOS [3], when extrapolated to a 1Mb array size, demonstrate 30(40)mV lower read-V<sub>MIN</sub> at 950MHz for the LS(LS+CM) BL technique (Fig. 5a). Vmin savings increase to as much as 70mV at lower frequencies (Fig. 5b). Noise-induced failures are captured by performing a low-frequency (10MHz) read-0 operation on the selected RWL while initializing the rest of the LBL bits to '1' for maximum BL leakage. The WLVss voltage of the wordline drivers is increased gradually to induce higher BL leakage by weakly turning on the unselected RWLs (Fig. 6). LS+CM BL, with its reduced number of bits per LBL sub-part, achieves better noise tolerance than LS BL, and both are better than the baseline case (Fig. 7). The reduced LBL voltage swing achieves 18(30)% average total BL power reduction for the LS(LS+CM) technique (Fig. 8a), which results in up to 8% array-level savings. Total BL power savings increase at lower Vcc, as the LBL swing is reduced by a fixed Vtn drop relative to Vbias (Fig. 8b). Vmin is reduced further by lowering the NMOS clipper bias (Vbias), which lowers the LBL voltage swing. With 100mV lower Vbias, the LS(LS+CM) BL technique achieves 20(30)mV lower Vmin compared to the Vbias=Vcc case (Fig. 10). BL power savings increase by 3.1(1.7)% for the LS(LS+CM) BL technique (Fig. 12), while noise tolerance is not degraded for Vcc>460mV/WLVss<50mV (Fig. 11). Area overhead for the LS(LS+CM) BL scheme is 0(1.8)%.

**References**: [1] K.-H. Koo et al., VLSI'15, pp.266-267 [2] J. Chang et al., ISSCC'17, pp.206-207 [3] C. Auth et al., IEDM'17, pp.673-676



Fig. 10 Measured read Vmin with 100mV reduced Vbias for LS, LS+CM BL techniques

Fig. 11 Measured increase in noise induced failures with 100mV reduced Vbias (a) iso-Vcc (b) iso-WLVss

Fig. 12 Measured % total BL power savings with Vbias reduction