# Dual-V<sub>CC</sub> 8T-bitcell SRAM Array in 22nm Tri-Gate CMOS for Energy-Efficient Operation across Wide Dynamic Voltage Range

Jaydeep Kulkarni, Muhammad Khellah, Jim Tschanz, Bibiche Geuskens, Rinkle Jain, Stephen Kim, Vivek De

Circuit Research Lab, Intel Corporation, Hillsboro, OR, USA (E-mail: jaydeep.p.kulkarni@intel.com)

## Abstract:

A 14KB 8T-bitcell SRAM array is demonstrated in 22nm tri-gate CMOS with fine-grain dual-$V_{CC}$ assist techniques. $V_{MIN}$-limiting 8T-bitcell nodes are boosted selectively during read and write to improve overall chip-$V_{MIN}$. Measurements show 130-270mV lower $V_{MIN}$ with 27-46% lower power at 0.4-1.6GHz for varying amounts of boosting, array activity and voltage regulator efficiency.

## Fine-grain Dual-Vcc Approach:

Energy-efficient Dynamic Voltage and Frequency Scaling (DVFS) across a wide range requires SRAM array designs that achieve both high performance and a low minimum operating voltage ($V_{MIN}$). However, process variations induce device mismatches that limit both the read- and write-V<sub>MIN</sub> of the 8T bitcell array (Fig. 1). Word-line boosting using a charge pump [1] or capacitive coupling [2] has been proposed to lower the 8T array $V_{MIN}$, but it adds design complexity and incurs 11-25% array area overhead. Alternatively, dual-V<sub>CC</sub> based boosting selectively increases the voltage of critical nodes in an 8T bitcell while incurring no array area overhead. A separate voltage $V_{BOOST} \leq V_{MAX}$, supplied externally or generated locally from a fixed high input voltage rail (V<sub>IN</sub>) using a step-down voltage regulator (VR), is used to "boost" the selected Read/Write Word-Lines (R/WWLs) and, during read only, the cell-V<sub>CC</sub> (Fig. 2). All remaining array circuits, such as the R/WWL pre-decoder, pre-charge logic, local and global bitline (LBL/GBL) sensing, timer, and column-I/O drivers, are connected to the variable $V_{CC} \leq V_{MAX}$ that is shared with core logic operating across a wide DVFS range. By decoupling the V<sub>MIN</sub>-limiting 8T bitcell from the remaining array and core logic, the overall chip-V<sub>MIN</sub> can be reduced, thus improving energy efficiency. During a read operation, the selected RWL and associated bitcells are switched to $V_{BOOST}$ to overdrive the read port transistor stack (Table 1). This alleviates keeper contention and also improves LBL evaluation delay compared to the baseline single-V<sub>CC</sub> design. During a write operation, the selected bitcells remain at V<sub>CC</sub> while the WWL is boosted to mitigate contention between the pass NMOS and pull-up PMOS in the bitcell. WWL boosting also aids write completion by passing a strong "1" through the pass NMOS.
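The rail assignment described above can be summarized as a small lookup. The following is a minimal Python sketch, not the authors' implementation: the names `Op` and `node_voltages` are hypothetical, holding unselected word-lines at 0 V is an assumption made for illustration, and the check that V<sub>BOOST</sub> is not below V<sub>CC</sub> reflects only how boosting is used in this paper.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"


def node_voltages(op: Op, vcc: float, vboost: float) -> dict[str, float]:
    """Rail applied to each node of the selected row/column for one access.

    Hypothetical helper: node names and the 0 V level for the inactive
    word-line are illustrative assumptions, not taken from the paper.
    """
    if vboost < vcc:
        raise ValueError("this sketch assumes VBOOST >= VCC")
    if op is Op.READ:
        # Read: the selected RWL and the associated bitcell supply are boosted.
        return {"RWL": vboost, "WWL": 0.0, "cell_vcc": vboost, "periphery": vcc}
    # Write: only the selected WWL is boosted; the cell supply stays at VCC.
    return {"RWL": 0.0, "WWL": vboost, "cell_vcc": vcc, "periphery": vcc}
```

For example, `node_voltages(Op.READ, 0.60, 0.75)` (arbitrary voltages) places both the RWL and the cell supply on the boosted rail, while the peripheral circuits stay on the shared variable V<sub>CC</sub> in both modes.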

## Dual-Vcc 8T Array Circuits:

A level-shifting dynamic NAND WL decoder replaces the static single-V<sub>CC</sub> NAND implementation while fitting in the same area. The common RD/WR clock, which drives the pre-charge/evaluate devices (P<sub>1</sub>-N<sub>1</sub>) in the dynamic NAND decoders, is level shifted and optimized for equal rise/fall delays (Fig. 3). A stacked delayed WL keeper (K<sub>1</sub>-K<sub>2</sub>) is used to speed up the dynamic NAND evaluation and to recover the delay penalty due to level shifting of the RD/WR clock. To switch the bitcell between V<sub>BOOST</sub> (read) and V<sub>CC</sub> (write), a per-column V<sub>CC</sub>-mux (M<sub>1</sub>-M<sub>4</sub>) is used in the local I/O (Fig. 4). Dual-output split level shifters drive the V<sub>CC</sub>-mux control signals to V<sub>BOOST</sub> and are placed in the pre-decoder gap area created by the LBL I/O logic [3] (Fig. 4). At very low voltages, dual-$V_{CC}$ read-$V_{MIN}$ is limited by the LBL merge NAND PMOS P<sub>2</sub> and not by the "boosted" bitcell (Fig. 5). Similarly, dual-$V_{CC}$ write-$V_{MIN}$ is limited by peripheral logic and not by the bitcell, as the pull-down NMOS N<sub>3</sub> (initially at $V_{BOOST}$ from a preceding read operation) contends with the write driver PMOS P<sub>3</sub> (Fig. 6). For the single-$V_{CC}$ design, the baseline bitcell is upsized to meet the $V_{MIN}$ target, resulting in a large delay margin at/around $V_{MAX}$ (Fig. 7). However, with optimal boosting using dual-$V_{CC}$, the bitcell can be downsized and/or converted to high-$V_T$ devices while still meeting the performance target across the $V_{MIN}$-$V_{MAX}$ range.

## Measurement Results:

We have implemented a 14KB, zero-area-overhead, dual-V<sub>CC</sub> 8T-bitcell SRAM array in 22nm tri-gate CMOS (Fig. 13) [4]. Bit failure rates ($P_{FAIL}$) are measured for different $V_{BOOST}$ values above V<sub>CC</sub>, incremented in 50mV steps. Extrapolations of the measured P<sub>FAIL</sub> vs. V<sub>CC</sub> data to a 1MB target array size demonstrate 130mV lower read-V<sub>MIN</sub> and 270mV lower write-$V_{MIN}$ compared to the baseline single-$V_{CC}$ design at 1.6GHz (Fig. 8). At lower frequencies (< 1GHz), a larger V<sub>MIN</sub> improvement is achieved with only 100mV of boosting, since V<sub>MIN</sub> is now governed by contention during the read/write operation rather than by completion of the operation (Fig. 9). RWL-only boosting offers only 40mV of read-$V_{MIN}$ improvement, while boosting the full read port (RWL and cell-$V_{CC}$) lowers $V_{MIN}$ by 130mV at 1.6GHz (Fig. 10). Weakening the keeper on top of read port boosting improves read-$V_{MIN}$ marginally. Noise-induced failures increase marginally with read port boosting and can be mitigated with a slightly stronger keeper (Fig. 10). Array-V<sub>MIN</sub> is reduced by 130mV, resulting in 27% lower total array power for optimal boosting of 150mV at 1.6GHz (Fig. 11). Operation of the dual-V<sub>CC</sub> 8T-bitcell SRAM across a wide voltage range is achieved by gradually increasing the $V_{BOOST}$ value as $V_{CC}$ is scaled down (Fig. 11). The total power savings depend on the conversion efficiency ($\eta$) of the step-down VR used to generate V<sub>BOOST</sub> locally from the fixed high input voltage rail ($V_{IN}$), the clock frequency, and the array activity factor ($\alpha$). For 50% VR efficiency and 10% array activity factor, the total power savings at V<sub>MIN</sub> are 27% (46%) at 1.6GHz (400MHz) (Fig. 12).
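Two pieces of arithmetic underlie these results: extrapolating a per-bit failure rate to a 1MB array, and charging the boosted rail for the regulator loss. The Python sketch below illustrates both under stated assumptions; it is not the authors' methodology. It assumes independent bit failures and takes the input power of the boosted rail as its delivered power divided by $\eta$. All function names are hypothetical, and how the two rail powers depend on frequency and $\alpha$ is left to the caller rather than modeled.

```python
import math

BITS_1MB = 8 * 2**20  # 1MB target array size, expressed in bits


def array_fail_probability(p_bit: float, n_bits: int = BITS_1MB) -> float:
    """Probability that at least one of n_bits cells fails.

    Assumes independent, identically distributed bit failures:
    1 - (1 - p_bit)^n_bits, evaluated with log1p/expm1 so that very
    small p_bit values do not lose precision.
    """
    return -math.expm1(n_bits * math.log1p(-p_bit))


def input_power(p_vcc: float, p_boost: float, eta: float) -> float:
    """Total input power when the boosted rail is fed through a VR.

    Assumption: the VR draws p_boost / eta from the input rail to
    deliver p_boost to the boosted nodes; p_vcc is supplied directly.
    """
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must be in (0, 1]")
    return p_vcc + p_boost / eta


def power_savings(p_single: float, p_vcc: float, p_boost: float, eta: float) -> float:
    """Fractional savings of the dual-rail array versus a single-rail baseline."""
    return 1.0 - input_power(p_vcc, p_boost, eta) / p_single
```

With placeholder numbers (not measured data), `power_savings(4.0, 1.0, 0.5, 0.5)` charges the boosted rail twice its delivered power and reports the remaining fraction saved relative to the baseline. The same structure shows why lower activity and higher regulator efficiency both increase the savings: each reduces the input power attributed to the boosted rail.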

## Acknowledgements

The authors sincerely thank K. Ikeda, T. Hwa Foo, D. Jenkins, D. Finan, C. Tokunaga, T. Nguyen, P. Aseron, and R. Forand for their help and support. This research was, in part, funded by the U.S. Government under contract number HR0011-10-3-0007. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

## References

- [1] A. Raychowdhury et al., ISSCC, pp. 352-353, Feb. 2010.
- [2] J. Kulkarni et al., ISSCC, pp. 234-236, Feb. 2012.
- [3] S. Hsu et al., ISSCC, pp. 178-179, Feb. 2012.
- [4] C. H. Jan et al., IEDM, pp. 44-47, Dec. 2012.



Fig. 12 Measured P<sub>TOT</sub> savings at V<sub>MIN</sub> vs. clock frequency, array activity (α) and VR efficiency (η)

Fig. 13 Die photo and summary