Modeling and Circuit Synthesis for Independently Controlled Double Gate FinFET Devices

Animesh Datta, Student Member, IEEE, Ashish Goel, Riza Tamer Cakici, Student Member, IEEE, Hamid Mahmoodi, Member, IEEE, Dheepa Lekshmanan, and Kaushik Roy, Fellow, IEEE

Abstract—Independent control of front and back gate in double gate (DG) devices can be used to merge parallel transistors in noncritical paths. This reduces the effective switching capacitance and, hence, the dynamic power dissipation of a circuit. However, efficient design of large-scale circuits with DG devices is not well explored due to lack of proper modeling and large-scale design simulation tools. In this paper, we propose several low-power circuit options using independent gate FinFETs. We developed semianalytical models for different FinFET logic gates to predict their performance. An efficient circuit synthesis methodology comprised of proposed low-power logic options in FinFET design library has been developed. Results show about 8.5% area savings and 18% power savings over conventional FinFET technology for ISCAS85 benchmark circuits in 45-nm technology with no performance penalty.

Index Terms—Analytical modeling, circuit synthesis, CMOS, FinFET, independent gate, low power.

I. INTRODUCTION

Currently, bulk CMOS technology is facing great challenges due to increased leakage and process variations with shrinking device dimensions [1]. Even with advanced fabrication techniques, the scalability of bulk CMOS is limited due to increased leakage and short-channel effects (SCEs) [2]. This has motivated researchers to look for nonclassical silicon devices to extend CMOS scaling beyond the 45-nm node. A large number of recent works suggest that double gate (DG) devices are the best alternatives [2]–[4], [7]. Among the various types of DG devices, the quasi-planar FinFET is easier to manufacture compared to planar double-gate devices [4]. FinFETs employ a very thin undoped body to suppress subsurface leakage paths and, hence, reduce SCEs. An undoped or lightly doped body eliminates threshold voltage ($V_t$) variations due to random dopant fluctuations [2] and enhances carrier transport, resulting in a higher on-current.

Based on the gate connection, FinFETs can be classified into two categories. 1) 3-T FinFET: FinFET with front and back gates connected together results in a three-terminal device. 3-T FinFETs can be used for direct replacement of conventional bulk CMOS devices in the standard CMOS circuit design. 2) 4-T FinFET: FinFETs with isolated gates and separate gate contacts result in four-terminal devices. Since one can choose to connect the back and front gates together or to control them separately, 4-T FinFET-based circuits provide more design options, as shown in [3], [5], and [11]. We can either have all gates in the circuit as 4-T FinFETs or selectively use 3-T and 4-T devices. Essentially, the selective use of 4-T FinFETs results in new compact circuit styles. For example, the back gate can be connected to either $V_{dd}$ or ground in order to save switching power. 4-T devices can also be used to selectively merge parallel transistors in the noncritical paths to reduce power dissipation and transistor count. This technique of selective use of 4-T devices in the circuit is known as independently controlled DG (IG) FinFET technology [11]. In the rest of this paper, we use the 4-T FinFET to represent both 4-T device and circuit selectively employing 4-T devices.

Recently, significant performance improvements have been demonstrated in IG FinFET technology (selectively employing 3-T and 4-T devices) for a variety of small-scale circuits like Schmitt Trigger, memory, and individual NAND/NOR logic [3], [7], [11]. However, to the best of our knowledge, there has been no earlier attempt to explore the benefits of the flexibility provided by selectively employing the 4-T devices in generic large-scale circuits. In particular, this paper makes the following contributions:

1) several novel low-power IG logic gate options containing one or more 4-T FinFETs;
2) semianalytical models to compute delay and short circuit power for different 3-T as well as IG FinFET logic cells;
3) IG FinFET-design-library-based circuit synthesis framework to achieve efficient low-power circuit design in IG technology.

The rest of this paper is organized as follows. In Section II, we describe several low-power logic gate options available in IG FinFET technology. In Section III, delay and power modeling of FinFET logic cells are presented. A low-power circuit synthesis framework for IG FinFET technology is presented in Section IV. In Section V, we present power and area saving results for a set of ISCAS85 benchmark circuits.

Manuscript received July 5, 2006; revised December 17, 2006. This work was supported in part by the Semiconductor Research Corporation (SRC) under Grant 1078.002 and in part by MARCO C2S2. This paper was recommended by Associate Editor F. N. Najm.

A. Datta is with Qualcomm Inc., San Diego, CA 92121 USA (e-mail: adatta@qualcomm.com).
A. Goel, R. T. Cakici, D. Lekshmanan, and K. Roy are with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA (e-mail: ashishg@ecn.purdue.edu; cakici@ecn.purdue.edu; dlekshma@ecn.purdue.edu; kaushik@ecn.purdue.edu).
H. Mahmoodi is with the School of Electrical and Computer Engineering, San Francisco State University, San Francisco, CA 94132 USA (e-mail: mahmoodi@sfsu.edu).

Digital Object Identifier 10.1109/TCAD.2007.896320

0278-0070/$25.00 © 2007 IEEE
II. MOTIVATION: CIRCUIT DESIGN FLEXIBILITY USING 4-T FinFET DEVICES

Transistor gate sizing is a traditional and efficient technique for low-power and high-performance circuit design [15]. Usually, faster paths are downsized to save switching power (and area) [15]. However, in FinFET devices (Fig. 1), gate sizing is quantized to the number of fins ($n_{\text{Fin}}$). In FinFETs, each fin has two conductive channels on either side. Hence, a sizing step in FinFET is equivalent to adding a single fin, corresponding to a minimum discrete sizing step of

$$\Delta W = W_{\text{fin}} = 2 \times H_{\text{fin}} \tag{1}$$

where $H_{\text{fin}}$ is the fin height, as shown in Fig. 1. This is commonly known as width quantization [4]. However, 4-T FinFETs give the designer more flexibility by doubling the resolution of the discrete sizing step: a 4-T device offers a minimum gate size $W_{\min} = H_{\text{fin}}$, half of that of the 3-T device.
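As a minimal sketch of width quantization, the snippet below lists the effective widths reachable with a given number of fins. It uses $H_{\text{fin}} = 40$ nm from Section V; the function name and the assumption that a 4-T fin can contribute either one or two channels are illustrative choices, not part of the original text.

```python
H_FIN_NM = 40.0  # fin height in nm (value quoted in Section V)


def available_widths(n_fins_max, h_fin=H_FIN_NM, independent_gate=False):
    """Return the discrete effective widths (nm) reachable with up to n_fins_max fins.

    3-T device: every fin adds two channels, so the step is 2 * h_fin, as in (1).
    4-T device (assumption of this sketch): a fin may be driven by one gate only,
    so the step is h_fin.
    """
    step = h_fin if independent_gate else 2.0 * h_fin
    max_width = 2.0 * h_fin * n_fins_max
    count = int(round(max_width / step))
    return [step * k for k in range(1, count + 1)]


widths_3t = available_widths(2)
widths_4t = available_widths(2, independent_gate=True)
print("3-T widths (nm):", widths_3t)
print("4-T widths (nm):", widths_4t)
```

With two fins, the 3-T list contains two sizes while the 4-T list contains four, which is the doubled sizing resolution described above.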

A. Background Information

Fig. 1(b) shows a circuit having a critical path consisting of three logic gate delays. Now, consider a FinFET-based design of this circuit. 3-T FinFETs have smaller gate delays (due to higher $I_{\text{ON}}$) and can be used in the critical paths. However, the remaining two paths are noncritical [Fig. 1(b)]. Particularly, “in 2” to “out” timing arc of the NAND (shown in dashed line) is in noncritical path and, hence, can be downsized. Such skewed IG FinFET circuits can be effectively used in the noncritical timing paths to reduce dynamic power dissipation.

B. New Circuit Design Options Using 4-T FinFETs

A two-input NAND has one noncritical timing arc [Fig. 1(b)]. In this case, we can use 4-T logic options, with asymmetric timing arcs as shown in Fig. 2(a). Here, input “B” is in critical path while “A” is in noncritical path. Since NAND has two nMOS devices in series, we cannot use a 4-T FinFET in the pull-down network without affecting the critical path delay. However, we can use the 4-T FinFET in the pull-up network of the noncritical timing arc, corresponding to input “A.” This type of skewed logic gate with both 3-T and 4-T FinFET devices in parallel is termed as IG-Cell (Fig. 2). In IG FinFET logic, back gate is either connected to the ground (nMOS) or supply rail (pMOS). Hence, the power dissipation is reduced due to reduced switching capacitance. In case both the inputs of a two-input logic lie in the noncritical paths, we can merge two parallel 3-T FinFETs to a single 4-T FinFET (merged gate or MG-Cell), as shown in Fig. 2(b). Figs. 3 and 4 show similar options for NOR and INV cells.

C. Delay, Power, and Area of 4-T FinFET-Based Circuits

Introducing the 4-T FinFET in circuit has two implications on its overall performance. We explain these with reference to an IG NAND, as shown in Fig. 2(a) and (c).

1) The loading of the gate driving node “A” reduces, and hence, its input arrival time reduces.
TABLE I

<table>
<thead>
<tr>
<th>Gate name</th>
<th>Input drive $\Sigma N_{\text{Fin}}$</th>
<th>Input cap. $\Sigma C_{\text{gate}}$</th>
<th>$A_{\text{saving}}$ (%)</th>
<th>$C_{\text{sw}}$ saving (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>NOR2 (3-T)</td>
<td>5</td>
<td>10</td>
<td>-7</td>
<td>0</td>
</tr>
<tr>
<td>IG NOR2</td>
<td>4.5</td>
<td>9</td>
<td>-10</td>
<td>0</td>
</tr>
<tr>
<td>MG NOR2</td>
<td>2.5</td>
<td>9</td>
<td>26.7</td>
<td>50</td>
</tr>
<tr>
<td>MGx1 NOR2</td>
<td>1.5</td>
<td>3</td>
<td>26.7</td>
<td>70</td>
</tr>
<tr>
<td>MGx2 NOR2</td>
<td>1</td>
<td>2</td>
<td>26.7</td>
<td>80</td>
</tr>
<tr>
<td>NAND2 (3-T)</td>
<td>2</td>
<td>4</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>IG NAND2</td>
<td>1.5</td>
<td>3</td>
<td>-7.1</td>
<td>25</td>
</tr>
<tr>
<td>MG NAND2</td>
<td>1</td>
<td>3</td>
<td>14.3</td>
<td>50</td>
</tr>
<tr>
<td>INV (3-T)</td>
<td>3</td>
<td>3</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>IG INV</td>
<td>1.5</td>
<td>3</td>
<td>-9.1</td>
<td>50</td>
</tr>
<tr>
<td>IGx INV</td>
<td>1</td>
<td>2</td>
<td>-18.2</td>
<td>67</td>
</tr>
</tbody>
</table>

2) However, the charging current due to switching at input “A” reduces by more than 50% in the independent gate mode [2]. This increases the output signal transition time due to the corresponding input transition.

Due to these two opposing effects, the effective increase in delay (for rising output transition) of the noncritical node is less than 50% of the conventional 3-T FinFET-based circuit. However, loading of the previous stage reduces by 25%, thus improving its delay [Fig. 2(a)]. In fact, under certain situations, we can even get an improvement in the critical path delay due to the reduced switching capacitance in the critical path fan-out [Fig. 2(c)]. In Fig. 2(c), merging of two parallel 3-T devices in the MG-Cell, lying in the off-critical path, reduces capacitive loading of the cell driving the MG-Cell. If the previous stage cell is in the fan-out logic cone of a critical node, the corresponding critical path delay reduces. This can potentially improve the overall circuit delay and/or robustness due to reduction in the number of critical paths.

Both the IG- and MG-Cell options of FinFET have reduced switching capacitance, resulting in significant power savings. For a two-input merged NAND gate, we save about 50% switching capacitance (Table I), as compared to minimum-sized conventional 3-T FinFET-based NAND.

Similarly, exploring the noncritical paths of the NOR cell, we get two different versions of the independently controlled NOR cell, as shown in Fig. 3(a) and (b). It is important to note that we can get balanced rise and fall delays with the merged cells, as shown in Figs. 2(b) and 3(b). For balanced rise/fall transitions, the NOR cell has a higher $W_p/W_n$ ratio compared to NAND. This results in more switching capacitance saving (up to 80%), as presented in Table I. If we allow asymmetry in the rise and fall delays, we can have two more IG options ($\text{MG} \times 1$ and $\text{MG} \times 2$) for NOR and one more for INV ($\text{IG} \times 1$), as shown in Fig. 4. In Fig. 4(a) and (b), $T_{\text{pdLH}}$ is almost $2 \times$ and $4 \times$ $T_{\text{pdHL}}$, respectively.

To accurately estimate the impact of IG FinFET technology on the design area, we need to consider the effect of using 4-T FinFETs on the area of different logic cells. We perform cell layout based on a set of FinFET layout rules [4]. We also consider the back gate contact overhead in the IG FinFET cell while estimating the cell layout area. Fig. 5 shows the layout of an IG inverter corresponding to the independent gate inverter in Fig. 3(c). We express the cell layout area in terms of $\lambda$, the minimum spacing requirement, as shown in Fig. 5 [4]. In case of a conventional 3-T FinFET inverter, the ground line can be moved up by $\lambda$, as it does not require a poly to metal via contact. Therefore, in IG inverter, cell footprint area increases to $120\lambda^2$ from the original 3-T area of $110\lambda^2$. This shows 9.1% area penalty for adding an extra back gate contact to the nMOS (Table I). Similar area overhead can be observed in both the NAND and NOR IG-Cell (Table I).
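The inverter area penalty quoted above follows directly from the two footprint values. The short sketch below reproduces that arithmetic; the helper name is hypothetical.

```python
def area_change_percent(area_ref, area_new):
    """Relative area change of area_new with respect to area_ref, in percent."""
    return 100.0 * (area_new - area_ref) / area_ref


AREA_3T_INV = 110.0  # 3-T inverter footprint in units of lambda^2 (from the text)
AREA_IG_INV = 120.0  # IG inverter footprint in units of lambda^2 (from the text)

penalty = area_change_percent(AREA_3T_INV, AREA_IG_INV)
print(f"IG inverter area penalty: {penalty:.1f}%")
```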

In merged cells, the number of transistors reduces as compared to the corresponding 3-T FinFET cell. Hence, the cell area also reduces. In NAND MG-Cell, we get about 14% area savings. However, due to the higher $W_p/W_n$ ratio of the NOR logic, we save 27% area for NOR MG-Cell. The relative area and switching capacitance ($C_{sw}$) savings of different 4-T gates over their 3-T counterparts are tabulated in Table I. Here, the second column represents the input drive in terms of the number of fins connected per input. The third column represents the capacitive loading for each input.

III. FinFET DELAY AND POWER MODELING

Analytical current–voltage models are proven to be very efficient for large-scale deep-submicrometer circuit design and analysis. However, due to several key factors, existing bulk CMOS power-performance models are not suitable for the sub-45-nm FinFET devices [8]. Device simulators are accurate but very slow for large-scale circuit simulation. Hence, there is a need to develop efficient circuit performance simulation tools.

A. Current Model for FinFET Devices

We use the $n$th power law [6] to compute the FinFET current. This current model allows easy extraction of model parameters from a set of $I_{DS}$–$V_{DS}$ characteristics (generated from TAURUS [9])

$$I_D = I_{\text{Dsat}} = \frac{W_{\text{eff}}}{L_{\text{eff}}} B(V_{GS} - V_{th})^n, \quad \text{when } V_{DS} \geq V_{\text{Dsat}} \text{ (saturated region)} \tag{2}$$

$$I_D = I_{\text{Dsat}} \left(2 - \frac{V_{DS}}{V_{\text{Dsat}}} \right) \frac{V_{DS}}{V_{\text{Dsat}}}, \quad \text{when } V_{DS} < V_{\text{Dsat}} \text{ (linear region)} \tag{3}$$

$$I_D = 0, \quad \text{when } V_{GS} < V_{th} \text{ (cutoff region)} \tag{4}$$

where $W_{\text{eff}}$ and $L_{\text{eff}}$ are the effective channel width and channel length, respectively. $V_{\text{th}}$ is the transistor threshold voltage, and $I_D$ is the drain current. Here, $n$, $m$, $K$, and $B$ are constants to describe SCEs in an empirical manner. From a few TAURUS simulated $I_{\text{DS}}$–$V_{\text{DS}}$ characteristics, we extract the values of these constants for both 3-T and 4-T FinFET devices. These current models are then used to predict the $I_{\text{DS}}$–$V_{\text{DS}}$ characteristics of the 3-T and 4-T devices with different number of fins. Fig. 6 shows a close match of the predicted currents in 4-T FinFET with the simulation results obtained from TAURUS, in the saturation mode. Slight deviation in the linear mode (i.e., low $V_{\text{DS}}$ region) has negligible impact in rise/fall delay and transition time estimation (Section III-B).

Fig. 6. TAURUS simulation results and current model prediction of $I_{\text{DS}}$ versus $V_{\text{DS}}$ for a 4-T FinFET device.

Fig. 7. FinFET inverter delay during fall transition. $C_M$ represents the effective input–output coupling capacitance.
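The piecewise current model of (2)–(4) can be sketched in a few lines of Python. The text names the constants $n$, $m$, $K$, and $B$ but does not give the expression for $V_{\text{Dsat}}$; the sketch assumes the usual $n$th-power-law form $V_{\text{Dsat}} = K(V_{GS} - V_{th})^m$. All numeric parameter values are hypothetical placeholders, not values extracted from TAURUS.

```python
from dataclasses import dataclass


@dataclass
class NthPowerParams:
    w_eff: float  # effective channel width (m)
    l_eff: float  # effective channel length (m)
    b: float      # empirical current constant B
    n: float      # empirical exponent for the saturation current
    k: float      # empirical constant K for V_Dsat (assumed form)
    m: float      # empirical exponent m for V_Dsat (assumed form)
    v_th: float   # threshold voltage (V)


def drain_current(p, v_gs, v_ds):
    """Drain current for v_ds >= 0 following (2)-(4)."""
    v_ov = v_gs - p.v_th
    if v_ov <= 0.0:
        return 0.0                                      # cutoff region, (4)
    i_dsat = (p.w_eff / p.l_eff) * p.b * v_ov ** p.n    # saturation current, (2)
    v_dsat = p.k * v_ov ** p.m                          # assumed V_Dsat expression
    if v_ds >= v_dsat:
        return i_dsat                                   # saturated region, (2)
    x = v_ds / v_dsat
    return i_dsat * (2.0 - x) * x                       # linear region, (3)


# Hypothetical parameter set for a single-fin device.
params = NthPowerParams(w_eff=80e-9, l_eff=35e-9, b=5e-5,
                        n=1.3, k=0.5, m=0.6, v_th=0.3)

for v_ds in (0.1, 0.2, 0.4, 1.0):
    print(f"V_DS = {v_ds:.1f} V -> I_D = {drain_current(params, 1.0, v_ds):.3e} A")
```

The linear-region expression equals $I_{\text{Dsat}}$ at $V_{DS} = V_{\text{Dsat}}$, so the two branches join continuously.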

B. Delay Model for FinFET-Based Inverter

Consider a falling output transition of an inverter with a rising input ramp $V_{\text{in}}$, having slope $S$, as shown in Fig. 7. At $t = 0$, $V_{\text{in}} = 0$ and $V_{\text{out}} = V_{\text{dd}}$. At this time, the total charge at the output node $Q_{\text{out}}$ can be expressed as

$$Q_{\text{out}} = Q_M + Q_L = (C_M + C_L)V_{\text{dd}} \tag{5}$$

where $C_M$ is the effective Miller capacitance. $C_M$ consists of gate-to-drain (or source) overlap capacitance and other parasitic components. After a small time $t = dt$, assuming $V_{\text{out}} = V_o$

$$V_{\text{in}} = V_{i}(t) = \begin{cases} S\,dt & \text{for } dt \leq t_{\text{in}} \\ V_{\text{dd}} & \text{for } dt > t_{\text{in}} \end{cases} \tag{6}$$

where $t_{\text{in}}$ is the input ramp transition time. We note that, at $t = dt$, some charge (for example, $Q_{\text{current}}$) is being removed from the output node due to nMOS and pMOS currents $I_n$ and $I_p$, respectively. At $t = dt$, using the conservation of charge at the output node of the inverter, we have

$$Q_{\text{out}} = Q_M + Q_L + Q_{\text{current}} \tag{7}$$

$$= (C_M + C_L)V_o - C_M V_i + \int_0^{dt} (I_n - I_p)\, dt. \tag{8}$$

Equating $Q_{\text{out}}$ from (5) and (8), substituting $V_i = (V_{\text{dd}}/t_{\text{in}})\,dt$ from (6), and solving for $V_o$

$$V_o = V_{\text{dd}} + \frac{1}{C_M + C_L} \left[ C_M \left( \frac{V_{\text{dd}}}{t_{\text{in}}} \right) dt - \int_0^{dt} (I_n - I_p)\, dt \right]. \tag{9}$$

Hence, using (2)–(9), $V_o(t = dt)$ can be expressed as

$$V_o = f(t_{\text{in}}, C_L, C_M, W_{\text{eff}}, I_p, I_n, dt). \tag{10}$$

We use numerical integration in MATLAB [12] to compute $Q_{\text{current}}$. Hence, knowing the currents $I_n$ and $I_p$ at each simulation instant and using (10), we get an estimate of $V_o$ at $t = dt$. It is important to note that $C_M$ varies with $V_{\text{in}}$ and $V_{\text{out}}$. We observe that $C_M$ is inversely proportional to $t_{\text{in}}$ and $C_L$, but directly proportional to $W_{\text{eff}}$. Accordingly, $C_M$ can be represented as $C_M = g(t_{\text{in}}, C_L, W_{\text{eff}})$.

This method can be extended to determine the complete transient response of the inverter output voltage. We assume that at $t = T_o$, we have an accurate estimate of the output node voltage $V_o(T_o) = V_{\text{old}}$ using the current values of $I_n$ and $I_p$ (Fig. 7). By using (10), we can get an initial estimate of $V_o(T_o + dt) = V_{\text{new}}$ as

$$V_{\text{pred}} = f(t_{\text{in}}, C_L, C_M, V_{\text{old}}, W_{\text{eff}}, I_p, I_n, T_o + dt). \tag{11}$$

The prediction error in estimating $V_{\text{new}}$ at $t = T_o + dt$ is

$$\Delta V_o = V_{\text{pred}} - V_{\text{old}}. \tag{12}$$

Now, we update the old estimate $V_{\text{old}}$ as

$$V_{\text{old}} = V_{\text{old}} + \frac{\Delta V_o}{\text{noIter}}, \quad \Delta V_o > \eta \tag{13}$$

where noIter is the number of iterations performed to get a close estimate of $V_{\text{new}}$. We then use this updated $V_{\text{old}}$ to modify the values of $I_n$ and $I_p$ using (2)–(4). These modified current values help to get a closer estimate of $V_{\text{new}}$. The steps of (11)–(13) are then repeated. The iterations terminate when the prediction error $\Delta V_o$ goes below a small predefined threshold value $\eta$. Thus, by repeatedly applying this procedure, we obtain a reasonably accurate transient response of $V_o$ for an inverter. The parameters noIter and $\eta$ and the resolution of the time step ($dt$) control the speed and accuracy of the inverter output transient estimation. Once the transient response is known, the propagation delays $t_{\text{pdHL}}$ and $t_{\text{pdLH}}$ can be easily extracted, as shown in Fig. 8.
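The time-stepping procedure can be illustrated with a small Python sketch. It is not the authors' MATLAB implementation: it applies the charge balance of (5)–(9) per time step, refines each step with a damped fixed-point iteration in the spirit of (11)–(13), and assumes a constant $C_M$, identical nMOS and pMOS parameters, and the form $V_{\text{Dsat}} = K(V_{GS} - V_{th})^m$. All device values are hypothetical.

```python
def device_current(v_gs, v_ds, beta=1.2e-4, v_th=0.3, n=1.3, k=0.5, m=0.6):
    """n-th power-law current (A); zero in cutoff and, as a simplification, for v_ds <= 0."""
    v_ov = v_gs - v_th
    if v_ov <= 0.0 or v_ds <= 0.0:
        return 0.0
    i_dsat = beta * v_ov ** n
    v_dsat = k * v_ov ** m  # assumed V_Dsat expression
    if v_ds >= v_dsat:
        return i_dsat
    x = v_ds / v_dsat
    return i_dsat * (2.0 - x) * x


def fall_transient(v_dd=1.0, t_in=10e-12, c_l=1e-15, c_m=0.1e-15,
                   dt=0.1e-12, t_end=60e-12, eta=1e-6, relax=0.5, max_iter=100):
    """Output voltage of an inverter for a rising input ramp of duration t_in."""
    slope = v_dd / t_in
    times, v_outs = [0.0], [v_dd]
    v_old = v_dd
    for step in range(1, int(round(t_end / dt)) + 1):
        t = step * dt
        v_in = min(slope * t, v_dd)
        dv_in = v_in - min(slope * (t - dt), v_dd)
        v_new = v_old  # initial estimate for this time step
        for _ in range(max_iter):
            i_n = device_current(v_in, v_new)
            i_p = device_current(v_dd - v_in, v_dd - v_new)
            # Charge balance over one step: Miller injection minus net discharge.
            v_pred = v_old + (c_m * dv_in - (i_n - i_p) * dt) / (c_m + c_l)
            delta = v_pred - v_new
            if abs(delta) < eta:
                break
            v_new += relax * delta  # damped update of the estimate
        v_old = v_new
        times.append(t)
        v_outs.append(v_new)
    return times, v_outs


def propagation_delay(times, v_outs, v_dd, t_in):
    """Delay between the 50% points of the input ramp and the falling output."""
    half = 0.5 * v_dd
    for i in range(1, len(times)):
        if v_outs[i - 1] > half >= v_outs[i]:
            frac = (v_outs[i - 1] - half) / (v_outs[i - 1] - v_outs[i])
            return times[i - 1] + frac * (times[i] - times[i - 1]) - 0.5 * t_in
    raise ValueError("output never crossed V_dd / 2")


times, v_outs = fall_transient()
t_pd_hl = propagation_delay(times, v_outs, v_dd=1.0, t_in=10e-12)
print(f"t_pdHL = {t_pd_hl * 1e12:.2f} ps")
```

Smaller values of `dt` and `eta` tighten the estimate at the cost of more iterations, which mirrors the speed and accuracy trade-off noted above.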

Fig. 8 shows the excellent agreement of the proposed model’s prediction of $V_o$ with TAURUS simulator results for a 3-T inverter gate with $t_{\text{in}} = 10$ ps and $C_{\text{L}} = 1$ fF. As mentioned in Section III-A, the predicted transient response is slightly off from that obtained by TAURUS at the initial and final phase of the rise and fall transients. However, this has negligible impact on estimation of propagation delay ($T_{pd}$).

Fig. 8. Comparison of the proposed delay model with TAURUS results for a 3-T inverter cell output transient with $t_{in} = 10$ ps, $C_L = 1$ fF.

Fig. 9. (a) Equivalent inverter $RC$ delay model. (b) Pull-down network of two-input NAND for falling transition due to switching at input 2. (c) Equivalent $RC$ model of the pull-down network.

C. Delay Model for Two-Input Logic Gates

To extend the proposed FinFET inverter delay model to multi-input NAND and NOR gates, we consider the position of the switching transistor. We use the generic mapping technique for converting any multi-input NAND/NOR gate to its equivalent inverter, as described in [13], and modify it for FinFET-based circuits. Then, we apply our proposed semianalytical delay model (Section III-B) to estimate the output voltage transient.

The mapping technique is to find an equivalent inverter circuit for any two-input logic gate having an equivalent resistance ($R_{eq}$) and capacitive load ($C_{eq}$), as shown in Fig. 9(a). This ensures that both the circuits have equal first-order Elmore time constants [14]. The first-order moment of the NAND gate is given by

$$T_C = R_{eq}C_{eq} = (R_1 + R_2)C_0 + R_2C_1. \tag{14}$$

Since resistance is inversely proportional to the device width ($W_{eff}$), we need to have a correct estimate of the transistor equivalent width and capacitance values to model gate delays. Note that in two-input logic, only one transistor is assumed to be switching at a particular time instant. This is a valid assumption, since in critical path evaluation, only one signal per gate is activated. Thus, the generic mapping problem simplifies to find the equivalent widths for the pMOS and nMOS transistor in the equivalent inverter.

1) Modeling Equivalent Inverter Width ($W_{eq}$): For a NAND circuit, one of the nMOS devices will be in conducting state, and the other device will be switching from either low to high (or vice versa). As a result, the effective current driving capability of the switching device is only half as that of the other “on” device. Thus, we can take the effective width of the switching nMOS as half of the other device. Fig. 9 shows falling output transition due to switching of input “in 2” of a NAND circuit and its corresponding $RC$ model of the pull-down network. Input “in 1” is at $V_{dd}$; therefore, the corresponding pMOS is in cutoff region and is omitted. However, this mapping concept is well suited for bulk CMOS circuits, which can be sized for balanced rise and fall delays. In FinFET-based circuits, widths are integer multiples of the fin height [2]. Hence, it is not possible to achieve balanced rise and fall delays unless the mobilities of nMOS and pMOS devices are also integer multiples of each other, which is rarely the case. To consider the effect of unbalanced pull-up and pull-down drive currents in FinFET-based circuits, we incorporate a width correction term $\lambda_w$ in the equivalent width computation as

$$W_{eq} = W_{pk}\lambda_w \tag{15}$$

$$\frac{1}{W_{neq}} = \frac{1}{\lambda_w} \left( \frac{1}{W_{n2}} + \frac{1}{2W_{n1}} \right). \tag{16}$$

Here, $\lambda_w$ is the width correction term, and $W_{neq}$ is the equivalent inverter width of the nMOS.

2) Modeling Equivalent Capacitance ($C_{eq}$): To preserve the first-order $RC$ time constants of the NAND circuit during the falling transition due to switching of input “in 2,” as in Fig. 9(c)

$$R_{eq}C_{eq} = \left( \frac{R_1}{2} + R_2 \right) C_0 + R_2C_1. \tag{17}$$

Considering the fact that device width is inversely proportional to resistance, (17) can be written as

$$\frac{C_{eq}}{W_{eq}} = \left( \frac{1}{2W_1} + \frac{1}{W_2} \right) C_0 + \frac{1}{W_2}C_1. \tag{18}$$

where $W_{eq}$ is the equivalent inverter width of nMOS in (16). The capacitances $C_0$ and $C_1$ consist of $C_L$ and gate-to-source/drain overlap capacitances, as depicted in Fig. 9(b)

$$C_0 = C_L + C_{on1} + C_{op1} \tag{19}$$

$$C_1 = C_{on1} + C_{on2} \tag{20}$$

where $C_{op}$ and $C_{on}$ are the gate-to-drain or source overlap capacitance [7] of pMOS and nMOS, respectively. Hence, for
The equivalent width and capacitance for NOR can be similarly obtained by reversing the suffixes n- and p- in the NAND circuit equations. This mapping technique can also be applied to the IG and MG FinFET cells with the corresponding $I_{DS}$–$V_{DS}$ current models for 4-T FinFET devices.
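The NAND-to-inverter mapping can be sketched numerically as follows. The function evaluates the nMOS equivalent width from the relation $1/W_{neq} = (1/\lambda_w)(1/W_{n2} + 1/(2W_{n1}))$ and the equivalent capacitance from (18)–(20). The function name and all numeric values are hypothetical, and $\lambda_w = 1$ is used only as a placeholder.

```python
def nand_equivalent_inverter(w_n1, w_n2, c_l, c_on1, c_on2, c_op1, lambda_w=1.0):
    """Map the NAND pull-down network (input 2 switching) to an equivalent inverter.

    Returns (w_neq, c_eq): the equivalent nMOS width and equivalent load capacitance.
    """
    # Equivalent nMOS width: the "on" device n1 counts with twice its width.
    w_neq = lambda_w / (1.0 / w_n2 + 1.0 / (2.0 * w_n1))
    c0 = c_l + c_on1 + c_op1   # output node capacitance, (19)
    c1 = c_on1 + c_on2         # internal node capacitance, (20)
    # Equivalent capacitance from (18), preserving the first-order time constant.
    c_eq = w_neq * ((1.0 / (2.0 * w_n1) + 1.0 / w_n2) * c0 + c1 / w_n2)
    return w_neq, c_eq


# Hypothetical example: single-fin devices (80 nm), 1 fF load, 0.05 fF overlap caps.
w_neq, c_eq = nand_equivalent_inverter(
    w_n1=80.0, w_n2=80.0, c_l=1.0, c_on1=0.05, c_on2=0.05, c_op1=0.05
)
print(f"W_neq = {w_neq:.2f} nm, C_eq = {c_eq:.4f} fF")
```

The resulting pair can then be fed to an inverter delay model in place of the original two-input gate.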

Thus, we can predict output voltage transients over the entire range of $t_{in}$ and $C_L$ with reasonable accuracy, as shown in Table II. This table lists the maximum estimation error in NAND, NOR, and inverter delays and transition times for a specific value of $t_{in}$ and $C_L$. The proposed model predicts delay and transition time within 6.5% of the TAURUS simulation results. Most of the error is associated with cells that have fast-rising inputs (i.e., $t_{in} \rightarrow 0$) driving small (i.e., $C_L \rightarrow 0$) loads, which rarely exist in actual circuits.

TABLE II

<table>
<thead>
<tr>
<th>Gate type</th>
<th>Transition</th>
<th>% Error in $T_{pd}$</th>
<th>% Error in $T_{transition}$</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">INV 1 fin (3-T)</td>
<td>rise</td>
<td>5.09</td>
<td>3.52</td>
</tr>
<tr>
<td>fall</td>
<td>5.39</td>
<td>6.19</td>
</tr>
<tr>
<td rowspan="2">IG NOR (4-T)</td>
<td>rise</td>
<td>5.53</td>
<td>4.13</td>
</tr>
<tr>
<td>fall</td>
<td>6.02</td>
<td>6.21</td>
</tr>
<tr>
<td rowspan="2">MG NAND (4-T)</td>
<td>rise</td>
<td>5.98</td>
<td>4.85</td>
</tr>
<tr>
<td>fall</td>
<td>6.45</td>
<td>6.33</td>
</tr>
</tbody>
</table>

IV. IG FinFET-TECHNOLOGY-BASED CIRCUIT SYNTHESIS METHODOLOGY

In this section, we outline an efficient circuit synthesis methodology using the proposed modeling of the FinFET logic gates in sub-45-nm technology node. We organize our low-power FinFET-based synthesis flow in a three-step process: 1) modeling, 2) characterization, and 3) synthesis (Fig. 11). To explore the benefits of extended sizing options in IG FinFET technology, we have developed two separate libraries.

1) 3-T Library: It consists of different sizing options for 3-T FinFET logic cells. In this library, we have INV, two-input NAND, and NOR.

2) Extended lib: An extended FinFET design library, where five low-power IG FinFET logic cells (Sections II-B and C) are added to the 3-T library.

We have used a lookup-table-based characterization approach [10]. Each lookup table consists of an eight-by-six (eight different $C_L$ and six different $t_{in}$ values) data array corresponding to a particular timing arc of a cell. A timing arc has six such lookup tables corresponding to rise and fall delay, transition, and $P_{sc}$, respectively. In a two-input gate, two timing arcs are separately characterized. In our setup, $C_L$ is varied from 0.2 to 2 fF, and $t_{in}$ is varied from 2 to 22 ps (based on delay range of nominally sized individual cells with FO4 load condition at 45-nm technology).
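To make the lookup-table idea concrete, the sketch below builds one eight-by-six table over the stated $C_L$ and $t_{in}$ ranges and reads it with bilinear interpolation. The uniform grid spacing, the interpolation scheme, and the synthetic delay values are assumptions for illustration; the text specifies only the table dimensions and ranges.

```python
from bisect import bisect_right


def linspace(start, stop, count):
    """Evenly spaced grid points, both ends included."""
    return [start + (stop - start) * i / (count - 1) for i in range(count)]


C_L_AXIS = linspace(0.2, 2.0, 8)    # load capacitance in fF (assumed uniform spacing)
T_IN_AXIS = linspace(2.0, 22.0, 6)  # input transition time in ps (assumed uniform spacing)


def bracket(axis, x):
    """Return (lo, hi, frac) for x on axis, clamping x to the table range."""
    x = min(max(x, axis[0]), axis[-1])
    hi = min(max(bisect_right(axis, x), 1), len(axis) - 1)
    lo = hi - 1
    return lo, hi, (x - axis[lo]) / (axis[hi] - axis[lo])


def lookup(table, c_l, t_in):
    """Bilinear interpolation in an 8 x 6 table indexed as table[c_l][t_in]."""
    i0, i1, fc = bracket(C_L_AXIS, c_l)
    j0, j1, ft = bracket(T_IN_AXIS, t_in)
    low = table[i0][j0] * (1.0 - ft) + table[i0][j1] * ft
    high = table[i1][j0] * (1.0 - ft) + table[i1][j1] * ft
    return low * (1.0 - fc) + high * fc


# Synthetic delay table in ps (hypothetical data, not from the characterization).
DELAY_TABLE = [[3.0 + 5.0 * c + 0.2 * t for t in T_IN_AXIS] for c in C_L_AXIS]

delay = lookup(DELAY_TABLE, c_l=1.0, t_in=10.0)
print(f"interpolated delay: {delay:.3f} ps")
```

One such table would exist per timing arc and per characterized quantity (delay, transition, short-circuit power, for rise and fall).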

B. Design Library Cell Characterization

We precharacterize each library gate for cell delay, output transition time, and short circuit power dissipation for different $t_{in}$ and $C_L$. In our library, we have considered four sizing options of 3-T FinFET devices: 1) minimum size (s1); 2) double size (s2); 3) triple size (s3); and 4) four times the minimum size (s4). Standby current data obtained from TAURUS are used to compute the leakage power for each logic cell. The cell areas are computed from the individual cell layouts, as described earlier in Section II-C. Scaled interconnect parameters corresponding to the 45-nm technology (scaled from 180-nm bulk CMOS technology) have been used in the design library.

C. FinFET-Device-Based Circuit Synthesis

Initially, both the standard cell design libraries (i.e., 3-T library and Extended library) are developed and compiled with the Synopsys Library Compiler. Then, by using the Design Compiler tool [10] (step 3 in Fig. 11), each test circuit is synthesized from the two different design libraries. First, the Verilog netlist of the circuit is read, and then, different delay and area constraints are applied to the circuit.

V. SYNTHESIS RESULTS

We synthesize a set of ISCAS85 benchmark circuits in 45-nm technology node. The following device parameters are used in this paper: $H_{fin} = 40$ nm, $T_{hi} = 10$ nm, and $L_{eff} = 35$ nm.

A. Power and Area Savings

Fig. 12 shows increasing trend of power and area savings with target delays for an ISCAS85 benchmark circuit c880 in IG FinFET technology over conventional 3-T FinFET design library. This is due to the fact that, at a relaxed delay target, there are more opportunities for merging two 3-T devices to a 4-T FinFET device. In 3-T library-based synthesis, under certain delay constraints, a number of smaller sized (s1, s2, s3) cells are replaced with larger sized (s2, s3, s4) cells in critical paths to meet the timing requirement. In Extended-library-based circuit design, due to the presence of smaller sized 4-T cells in the fan-out logic cone of the critical nodes, more opportunities exist to use smaller sized gates in the critical paths to meet the same performance constraint. Hence, significant power and area saving is observed in the Extended-library-based circuit for the reduced switching capacitance in IG technology.

In order to have a uniform delay constraint ($T_{ckt}$) for different circuits, we choose the individual circuit delay target as the mean of minimum and maximum circuit delay synthesizable by 3-T design library as

$$T_{ckt} \approx \frac{T_{cktMin} + T_{cktMax}}{2}. \tag{24}$$

Here, $T_{cktMin}$ and $T_{cktMax}$ are the fastest and slowest synthesizable delay with the conventional 3-T design library. Accordingly, for c880 circuit, we use $T_{ckt} = 100$ ps using (24) as its delay ranges from 75 to 120 ps (Fig. 12).
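For the c880 numbers quoted above, the mean-delay rule of (24) can be checked with a short sketch (the function name is hypothetical):

```python
def target_delay(t_ckt_min, t_ckt_max):
    """Mean of the fastest and slowest synthesizable delays, as in (24)."""
    return 0.5 * (t_ckt_min + t_ckt_max)


t_ckt_c880 = target_delay(75.0, 120.0)  # delays in ps, from the text
print(f"c880 target delay: {t_ckt_c880} ps")
```

The exact mean is 97.5 ps, which the text rounds to a 100-ps target.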

Fig. 13 shows the power and area savings of several ISCAS85 benchmark circuits in IG FinFET design library at their corresponding mean circuit delays. On an average, across all ISCAS85 benchmarks, we obtain 18% saving in power dissipation and about 8.5% savings in design area.
TABLE III

<table>
<thead>
<tr>
<th>Gate type</th>
<th>Normal 3-T lib gate count (90 ps)</th>
<th>Normal 3-T lib gate count (110 ps)</th>
<th>Extended lib gate count (90 ps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>NOR2s1</td>
<td>168</td>
<td>179</td>
<td>51</td>
</tr>
<tr>
<td>NOR2s2</td>
<td>36</td>
<td>15</td>
<td>32</td>
</tr>
<tr>
<td>NOR2s3</td>
<td>3</td>
<td>0</td>
<td>3</td>
</tr>
<tr>
<td>NOR2s4</td>
<td>2</td>
<td>0</td>
<td>2</td>
</tr>
<tr>
<td>NOR2s1G</td>
<td>0</td>
<td>0</td>
<td>61</td>
</tr>
<tr>
<td>NAND2s1</td>
<td>198</td>
<td>214</td>
<td>44</td>
</tr>
<tr>
<td>NAND2s2</td>
<td>39</td>
<td>13</td>
<td>59</td>
</tr>
<tr>
<td>NAND2s3</td>
<td>10</td>
<td>1</td>
<td>13</td>
</tr>
<tr>
<td>NAND2s4</td>
<td>25</td>
<td>5</td>
<td>3</td>
</tr>
<tr>
<td>NAND2s1G</td>
<td>0</td>
<td>0</td>
<td>170</td>
</tr>
<tr>
<td>INV1s1</td>
<td>103</td>
<td>107</td>
<td>103</td>
</tr>
<tr>
<td>INV1s2</td>
<td>14</td>
<td>3</td>
<td>18</td>
</tr>
<tr>
<td>INV1s3</td>
<td>7</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>INV1s4</td>
<td>3</td>
<td>0</td>
<td>3</td>
</tr>
<tr>
<td>INV1s1G</td>
<td>0</td>
<td>0</td>
<td>3</td>
</tr>
<tr>
<td>Total</td>
<td>608</td>
<td>537</td>
<td>565</td>
</tr>
</tbody>
</table>

Fig. 13 in IG FinFET technology over the conventional 3-T FinFET-based design. As shown in Table I, the individual 4-T cell power savings are always about 2× higher than their cell layout area savings, as compared to the conventional 3-T cell.

Table III lists the statistics of different types of gates in the c880 circuit synthesized using the two FinFET design libraries at two different target delays. We observe a significant reduction in the minimum-sized NAND/NOR gates with the Extended-library-based design (as they are mostly replaced by the corresponding IG cells). Moreover, the number of higher drive strength cells (s2, s3, s4) also reduces. In fact, due to less capacitive loading in the fan-out logic cone of critical paths, the total number of cells reduces. Overall, we obtain significant power (22%) and area (11.6%) savings at \( T_D = 90 \) ps in IG technology-based c880 circuit synthesis (Fig. 13).
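The gate statistics can be cross-checked with a short script. The sketch below transcribes the Table III counts and assumes the three data columns are, in order, the 3-T library at 90 ps, the 3-T library at 110 ps, and the Extended library at 90 ps; that column order is inferred from the totals and from the nonzero IG-cell (suffix `G`) rows, not stated explicitly. The helper names are illustrative only.

```python
# Gate counts transcribed from Table III (c880).
# ASSUMED column order: (3-T lib @ 90 ps, 3-T lib @ 110 ps, Extended lib @ 90 ps).
TABLE_III = {
    "NOR2s1": (168, 179, 51),
    "NOR2s2": (36, 15, 32),
    "NOR2s3": (3, 0, 3),
    "NOR2s4": (2, 0, 2),
    "NOR2s1G": (0, 0, 61),
    "NAND2s1": (198, 214, 44),
    "NAND2s2": (39, 13, 59),
    "NAND2s3": (10, 1, 13),
    "NAND2s4": (25, 5, 3),
    "NAND2s1G": (0, 0, 170),
    "INV1s1": (103, 107, 103),
    "INV1s2": (14, 3, 18),
    "INV1s3": (7, 0, 0),
    "INV1s4": (3, 0, 3),
    "INV1s1G": (0, 0, 3),
}


def column_totals(table):
    """Sum each column of the table (one total per synthesis run)."""
    return tuple(sum(col) for col in zip(*table.values()))


def relative_reduction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


totals = column_totals(TABLE_III)

# Total cell-count reduction at the 90 ps target (3-T vs. Extended).
cell_reduction_90ps = relative_reduction(totals[0], totals[2])

# Minimum-sized NAND/NOR cells at 90 ps in each library.
min_nand_nor_3t = TABLE_III["NOR2s1"][0] + TABLE_III["NAND2s1"][0]
min_nand_nor_ext = TABLE_III["NOR2s1"][2] + TABLE_III["NAND2s1"][2]

# IG cells (names ending in "G") used by the Extended-library design.
ig_cells_ext = sum(c[2] for name, c in TABLE_III.items() if name.endswith("G"))

print(totals, round(100 * cell_reduction_90ps, 1), ig_cells_ext)
```

Under this column reading, the computed totals reproduce the "Total" row, and the drop in minimum-sized NAND/NOR cells is roughly matched by the number of IG cells introduced.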

B. Shift in Path Delay Histogram of a Circuit

The area and power savings as offered by the Extended library are obtained by trading off extra delay slacks in the off-critical paths with the switching power. To explore the effect of power and area savings on the individual path delays and their distribution, we plot the path delay histogram of c499 circuit in Fig. 14.

1) 3-T library: Path delay histogram of c499 circuit synthesized with 3-T FinFET library to meet \( T_{d_{tst}} = 160 \) ps.
2) Extended lib: Path delay histogram of the circuit, synthesized with Extended library, meeting the same delay target of 160 ps.
3) Path diff.: The difference in the number of paths in the corresponding delay bins of the path histogram.

The negative portion of the Path diff. curve indicates the delay bins where the 3-T library-based circuit has more paths than the Extended-library-based circuit. Power savings can be maximized by using as many IG FinFET logic cells as possible in the off-critical paths.
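The three curves described above can be illustrated with a minimal sketch. The path delays and bin edges below are made-up toy values, not c499 data, and the function names are hypothetical. The sign convention (Extended minus 3-T) is chosen to match the statement that negative values mark bins where the 3-T design has more paths.

```python
def path_histogram(delays_ps, bin_edges_ps):
    """Count paths per delay bin; bin i covers [edge[i], edge[i + 1])."""
    counts = [0] * (len(bin_edges_ps) - 1)
    for d in delays_ps:
        for i in range(len(counts)):
            if bin_edges_ps[i] <= d < bin_edges_ps[i + 1]:
                counts[i] += 1
                break
    return counts


def path_diff(ext_counts, three_t_counts):
    """Per-bin difference: Extended-library count minus 3-T-library count."""
    return [e - t for e, t in zip(ext_counts, three_t_counts)]


# Toy path delays (ps) for the same circuit under a 160 ps target.
delays_3t = [60, 70, 85, 95, 110, 130, 150, 158]
delays_ext = [85, 95, 110, 125, 135, 150, 155, 158]
edges = [40, 80, 120, 160]

hist_3t = path_histogram(delays_3t, edges)
hist_ext = path_histogram(delays_ext, edges)
diff = path_diff(hist_ext, hist_3t)

print(hist_3t, hist_ext, diff)
```

In this toy case the fast bin empties and the bin nearest the target fills, which is the qualitative shift toward the delay target that trading slack for power would produce.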

C. Impact of 4-T Devices on Circuit Robustness

Selective use of 4-T devices in the noncritical timing arcs may increase the effective number of critical paths, as shown in Fig. 14. A higher number of probable critical paths gives rise to circuit robustness and yield concerns under process variation. However, in general, any circuit would have a large number of noncritical paths, as shown in Fig. 14. Even if we consider a reasonable process variation tolerance window (\( T_{W_{process}} \)), as in Fig. 14, there will be some timing slack (although reduced) in a large number of noncritical paths. This extra slack can be used for inserting IG FinFETs.
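The effect of the tolerance window on available slack can be sketched as follows. This is a simplified illustration that assumes the window is subtracted directly from the delay target; the path delays are toy values and the function names are hypothetical.

```python
def usable_slack_ps(path_delay_ps, target_delay_ps, tw_process_ps):
    """Slack left on a path after reserving the tolerance window (never negative)."""
    return max(0.0, target_delay_ps - tw_process_ps - path_delay_ps)


def count_paths_with_slack(path_delays_ps, target_delay_ps, tw_process_ps):
    """Number of paths that still have positive slack for IG FinFET insertion."""
    return sum(
        1
        for d in path_delays_ps
        if usable_slack_ps(d, target_delay_ps, tw_process_ps) > 0
    )


# Toy path delays (ps) against a 160 ps target.
delays = [60, 90, 120, 145, 158]
candidates = [count_paths_with_slack(delays, 160, tw) for tw in (0, 10, 20, 50)]

print(candidates)
```

The count of candidate paths shrinks as the window grows, but paths far from the target keep some slack even for a fairly wide window.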

As \( T_{W_{process}} \) increases, the power (and area) saving for IG FinFET technology gradually diminishes. Hence, design robustness and process tolerance requirements can potentially limit the power and area savings in IG FinFET technology. Therefore, we need to estimate the impact of increased process sensitivity in 4-T FinFET cells on the overall circuit robustness. To assess the impact on robustness, we compare the worst case performance of IG FinFET-technology-based design with that of the 3-T library.

Major sources of process variation for FinFETs are channel length and body thickness variation [2]. The worst corner timing analysis is performed using a universal worst corner library, where cells are characterized considering the International Technology Roadmap for Semiconductors [8] predicted 3σ variation in FinFET device parameters. In this paper, we have considered \( 3\sigma_{L_{gate}} = 5 \) nm and \( 3\sigma_{T_{si}} = 2 \) nm.

Fig. 15 shows the steps for worst corner-based power saving analysis. First, a circuit is synthesized using the 3-T library to meet a certain delay target, for example, \( T_{d_{3}} \) (step 1a, Fig. 15). Then, we perform timing analysis using the worst corner design library to obtain the worst corner delay \( T_{d_{3_{worst}}} \), considering that all cells are affected by 3σ variation. In the second step, by employing the IG FinFET design library, we synthesize the same circuit with a smaller delay target \( T_{d_{4}} \). This introduces extra slack in the noncritical paths to compensate for the increased variation in the IG cells. Next, as before, by using the worst corner design library, we compute the worst corner delay \( T_{d_{4_{worst}}} \). Then, we try to match these two worst corner delays (i.e., \( T_{d_{3_{worst}}} \approx T_{d_{4_{worst}}} \)) by suitably adjusting the \( T_{W_{process}} \) parameter (Fig. 15). A close match of worst corner delays implies that we have used the right \( T_{W_{process}} \), such that both 3-T and IG FinFET-based circuits have similar worst corner performance. Hence, we get the iso-worst corner performance-based power and area savings in the Extended library over the conventional 3-T library, as shown in Fig. 15.
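The matching procedure can be sketched as a simple search over the tolerance window. Everything below is illustrative: the relation \( T_{d_4} = T_{d_3} - T_{W_{process}} \) is an assumed reading of "smaller delay target", the two `worst_delay_*` callables stand in for a full synthesis plus worst-corner timing run, and the derating factors are made-up numbers rather than characterized library data.

```python
def match_worst_corner(td3_ps, worst_delay_3t, worst_delay_ig,
                       step_ps=1.0, max_tw_ps=50.0):
    """Find the smallest tolerance window TW for which the IG design's
    worst-corner delay does not exceed that of the 3-T design.

    ASSUMPTION: the IG design is synthesized to the tightened target
    td4 = td3 - TW. Returns (tw, td4, td4_worst, td3_worst).
    """
    td3_worst = worst_delay_3t(td3_ps)
    tw = 0.0
    while tw <= max_tw_ps:
        td4 = td3_ps - tw
        td4_worst = worst_delay_ig(td4)
        if td4_worst <= td3_worst:
            return tw, td4, td4_worst, td3_worst
        tw += step_ps
    raise ValueError("no tolerance window up to max_tw_ps matches the corners")


# Hypothetical worst-corner models: nominal delay times a derating factor.
# The IG design is given a larger factor to mimic its higher sensitivity.
def toy_worst_3t(nominal_ps):
    return 1.25 * nominal_ps


def toy_worst_ig(nominal_ps):
    return 1.3125 * nominal_ps


tw, td4, td4_worst, td3_worst = match_worst_corner(100.0, toy_worst_3t, toy_worst_ig)
print(tw, td4, td4_worst, td3_worst)
```

In a real flow each evaluation would be a resynthesis, so a coarser step or a bisection over the window would likely be preferable to the linear scan used here.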

Fig. 16 shows the iso-worst corner performance-based power and area savings in different ISCAS85 benchmarks. Overall, we achieve about 5% power and 3% area savings for ISCAS85 benchmarks under the iso-worst corner performance constraint. The reduction in savings for an individual circuit depends on its topology and varies widely (Fig. 16). Negative savings in a couple of circuits (i.e., c432, c1908) demonstrate that a pessimistic worst corner-based design approach may produce an unreasonably large \( T_{W_{process}} \), leading to more power/area consumption in IG FinFET technology.

In reality, not all the devices in a circuit will experience the worst 3σ parameter variations. There will be a statistical distribution ranging from best to worst process parameters. Therefore, the power savings are expected to be close to the iso-performance power savings at the nominal corner (i.e., 18%). Essentially, this conservative analysis gives us an idea of the minimum savings offered by the IG FinFET technology.

In this paper, we have considered a handful of two-input logic gates to demonstrate that the selective use of 4-T FinFETs leads to low-power circuit designs. In a commercial design, there will be various complex logic cells and more drive strengths for each cell. The proposed design approach can be extended to incorporate multi-input cells and other types of complex logic cells such as XOR, MUX, and AND-OR-INV.

VI. CONCLUSION

4-T FinFET technology, with independent gate-controlled FinFET devices, has good potential for area-efficient low-power circuit design. In this paper, we developed semianalytical delay and power models for IG FinFET-based logic cells, and a generic, efficient, design-library-based circuit synthesis framework. We demonstrate that the IG FinFET-based design provides substantial power and area savings over the conventional 3-T FinFET-based design for a set of ISCAS85 benchmark circuits. Power and area savings are achieved even with a conservative worst corner-based circuit synthesis approach.


Animesh Datta (S’03) received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Kharagpur, India, in 2001, and the Ph.D. degree from the Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, in 2006. Currently, he is at Qualcomm Inc., San Diego, CA, as a Senior Engineer. During 2001 and 2002, he worked in the Advanced VLSI Design Laboratory, IIT Kharagpur, India, on analog and mixed-signal circuit design. His research interests include yield-aware system design in scaled technologies, speed-binning aware design optimization, and power-aware system architecture.

Dr. Datta was a recipient of the 2006 IEEE Circuits and Systems Society VLSI Transactions Best Paper Award.

Ashish Goel received the B.Tech. degree in electronics and electrical communications engineering from the Indian Institute of Technology, Kharagpur, India, in 2004. He is currently working toward the Ph.D. degree in electrical and computer engineering at Purdue University, West Lafayette, IN.

His research interests include modeling and estimation of process variation in deep submicrometer devices and circuits.

Riza Tamer Cakici (S’00) received the B.S. degree in electrical engineering and physics from Bogazici University, Istanbul, Turkey, in 2000. He is currently working toward the Ph.D. degree at the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN.

His research interests include properties of single/multiple-gate device architectures and circuit design techniques to enable high-performance/low-power nanoscale integration.

Hamid Mahmoodi (S’98–M’06) received the B.S. degree in electrical engineering from Iran University of Science and Technology, Tehran, Iran, in 1998, the M.S. degree in electrical and computer engineering from the University of Tehran, Tehran, in 2000, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, in 2005.

He is currently an Assistant Professor of electrical and computer engineering in the School of Engineering, San Francisco State University, San Francisco, CA. His research interests include low-power, robust, and high-performance circuit design for nanoscale technologies. He has many publications in journals and conferences and several patents pending.

Dr. Mahmoodi was a recipient of the 2006 IEEE Circuits and Systems Society VLSI Transactions Best Paper Award and the Best Paper Award of the 2004 International Conference on Computer Design.

Dheepa Lekshmanan received the Bachelor degree in electrical and electronics engineering from PSG College of Technology, Coimbatore, India, in 2001. She is currently working toward the M.S. degree in VLSI design at Purdue University, West Lafayette, IN.

In July 2001, she joined Texas Instruments India, Ltd., as a Design Engineer. Her research interest is low-power VLSI design.

Kaushik Roy (S’83–M’90–SM’95–F’02) received the B.Tech. degree in electronics and electrical communications engineering from the Indian Institute of Technology, Kharagpur, India, in 1983, and the Ph.D. degree from the Electrical and Computer Engineering Department, University of Illinois at Urbana-Champaign, in 1990.

He was with the Semiconductor Process and Design Center of Texas Instruments, Dallas, where he worked on FPGA architecture development and low-power circuit design. He joined the Electrical and Computer Engineering Faculty, Purdue University, West Lafayette, IN, in 1993, where he is currently a Professor and is the Roscoe H. George Professor of electrical and computer engineering. His research interests include VLSI design/CAD for nanoscale silicon and non-silicon technologies, low-power electronics for portable computing and wireless communications, VLSI testing and verification, and reconfigurable computing. He has published more than 400 papers in refereed journals and conferences and has coauthored two books on Low Power CMOS VLSI Design. He is a holder of eight patents.