Design and Analysis of Custom Clock Buffers and a D Flip-Flop for Low Swing Clock Distribution Networks

A Thesis presented

by

Mallika Rathore

to

The Graduate School

in Partial Fulfillment of the

Requirements

for the Degree of

Master of Science

in

Electrical Engineering

Stony Brook University

May 2014
Stony Brook University

The Graduate School

Mallika Rathore

We, the thesis committee for the above candidate for the Master of Science degree, hereby recommend acceptance of this thesis

Dr. Emre Salman - Thesis Advisor
Assistant Professor, Department of Electrical and Computer Engineering

Dr. Milutin Stanacevic - Second Reader
Associate Professor, Department of Electrical and Computer Engineering

This thesis is accepted by the Graduate School

Charles Taber
Dean of the Graduate School
Abstract of the Thesis

Design and Analysis of Custom Clock Buffers and a D Flip-Flop for Low Swing Clock Distribution Networks

by

Mallika Rathore

Master of Science

in

Electrical Engineering

Stony Brook University

2014

With higher integration, power has become a primary concern in IC design. The clock signal has the highest switching activity and can be responsible for up to 40% of the overall power dissipation due to the large clock network capacitance. This thesis presents an approach to reduce this power consumption by providing a 30% reduction in the clock swing, which is accomplished by custom reduced swing buffers. The objective is to reduce the clock swing without implementing an additional low supply voltage, while also satisfying the slew constraints at multiple process, voltage and temperature (PVT) corners. The low swing buffer is designed using 1V 45nm NCSU technology and the clock frequency considered for this analysis is 1.5GHz. As compared to a conventional buffer, approximately 14% reduction in the dynamic power consumption is achieved while driving a load capacitance of 50fF and maintaining the same clock slew. A novel D flip-flop (DFF) architecture that can operate with a low swing clock is also proposed and compared with existing designs. These architectures are simulated considering a clock and data frequency of 1.5GHz and 150MHz, respectively. In comparison with the other low swing topologies, the proposed low swing DFF topology provides an average reduction of 33.6% and 39.5% in, respectively, the overall dynamic power dissipation and power-delay product. The robustness of the DFF architectures is also evaluated for different PVT corners. This comparative analysis demonstrates an average improvement of 19.34% and 38.91% in, respectively, CLK-to-Q delay and dynamic power dissipation for the proposed topology. Finally, a low swing clock distribution network is designed and analyzed by combining the custom reduced swing buffers and the proposed DFF architecture. Reliable operation is demonstrated considering a large fan-out of 50 DFFs at the output of the reduced swing buffers.
Dedication Page

To my late grandparents, Mr. Ranvir Singh Shastri and Mrs. Snehlata Singh, and to my loving parents, Mr. Devendra Singh Rathore and Mrs. Nivedita Rathore.
Table of Contents

Abstract
List of Figures
List of Tables
Acknowledgement

1 Introduction

2 Custom Clock Buffer Design for Low Swing Operation
  2.1 Existing Reduced Swing Buffers
  2.2 Modified Reduced Swing Buffers
  2.3 Simulation Results
    2.3.1 Comparative analysis
    2.3.2 Full swing to reduced swing buffer driving reduced swing to reduced swing buffer
    2.3.3 Robustness to PVT variations
    2.3.4 Robustness to buffer load and clock slew variations

3 Custom D Flip-Flop Design for Low Swing Operation
  3.1 Low Clock Swing D Flip-Flop Design
  3.2 Proposed Low Swing D Flip-Flop
  3.3 Simulation Results
    3.3.1 Comparative analysis
    3.3.2 Response to different clock swings
    3.3.3 Robustness to PVT variations

4 Low Swing Clock Distribution Network
  4.1 Simulation Results
  4.2 Robustness to PVT Variations

5 Conclusions

References
List of Figures

1. Power distribution in various VLSI circuits.
2. Reduced swing buffer.
3. Reduced swing buffer with diode-connected transistors.
4. Delay chain based reduced swing buffer.
5. Modified version of the reduced swing buffer with diode-connected transistor.
6. Modified version of the delay chain based reduced swing buffer used in this project.
7. Reduced swing buffer: (a) FS-RS simulation, (b) RS-RS simulation.
8. Reduced swing buffer with diode-connected PMOS: (a) FS-RS simulation, (b) RS-RS simulation.
9. Delay chain based reduced swing buffer: (a) FS-RS simulation, (b) RS-RS simulation.
10. FS-RS buffer driving a RS-RS buffer, which drives a 50fF load.
11. I-V characteristics of NMOS with varying temperature.
12. Delay chain based reduced swing buffer parametric analysis with variable clock slew: (a) FS-RS simulation, (b) RS-RS simulation.
13. Delay chain based reduced swing buffer parametric analysis with variable buffer load: (a) FS-RS simulation, (b) RS-RS simulation.
14. Low swing DFF architectures: (a) Low clock swing C$^2$MOS and sense amplifier (SA) (LC$^2$MOS_SA) DFF, (b) Reduced clock swing flip-flop (RCSFF), (c) NAND-type keeper DFF (NDKFF), (d) Contention reduced flip-flop (CRFF).
15. Clock sub-circuit used for DFF architectures RCSFF, NDKFF, CRFF, passgate DFF and the proposed DFF in this thesis.
16. Passgate DFF for low swing clock signal.
17. Proposed low swing DFF.
18. Response of DFFs to different clock swings: (a) CLK-to-Q delay (ps) vs clock voltage swing (mV), (b) Total power dissipation ($\mu$W) vs clock voltage swing (mV).
19. Simplified low swing clock distribution network.
20. Simulation results of a simplified low swing clock distribution network.
List of Tables

1. Simulation results for conventional buffer and reduced swing buffers, considering both FS-RS and RS-RS operation.
2. Simulation results of FS-RS buffer driving a RS-RS buffer, which drives a load of 50fF.
3. Worst-case corner simulation results for conventional buffer and delay chain based reduced swing buffer.
4. Simulation results with a clock swing of 700mV.
5. Worst-case corner analysis of the proposed DFF.
6. Simulation results for a simplified low swing clock distribution network.
7. Worst-case analysis of a simplified low swing clock distribution network.
Acknowledgement

I would like to take this opportunity to thank everyone who has helped me through this incredible journey at Stony Brook as a Master’s student. I express my thanks to the entire Department of Electrical and Computer Engineering, the faculty members and staff, and my friends, for being a part of these wonderful two years and supporting me at every point.

I express my sincere thanks and gratitude to my thesis Advisor, Dr. Emre Salman for his tremendous support and encouragement. He has been a constant inspiration for me over the past two years. He has been very patient with me answering my smallest doubts with excellent detailing. His valuable feedback at every point has been a great motivation towards my work, which would not have been possible without his guidance. I also thank Dr. Milutin Stanacevic, the second reader for the thesis, for taking out some time from his schedule to review the work and provide valuable suggestions.

I would also like to thank our industry liaisons in Semiconductor Research Corporation (SRC), Savithri Sundareswaran, Anis Jarrar and Benjamin Huang, for their constant guidance which has played a major role towards my research.

I would also like to thank the fellow students at Nanoscale Circuits and Systems (NanoCAS) Laboratory including Zhihua Gan, Hailang Wang, Weicheng Liu, Ph.D. candidates and Peirong Ji. They have been very helpful while discussing the technical details of my research work as well as fixing software issues in the lab.

Last but not least, I extend my thanks to my family for their constant support, love and encouragement.

Thank you.
1 Introduction

Over the past several decades, IC design methodology has shifted from a logic-centric process to an interconnect-centric approach [1]. With reducing feature sizes and increasing complexity of digital ICs, power consumption has become one of the primary concerns in the semiconductor industry [2]. Synchronous systems, which are prevalent in commercial microprocessors, utilize a global clock signal distributed on-chip to drive the clocked elements. Clocking is therefore a crucial process since the data movement across an IC is accomplished via the clock signal [1].

To improve the global performance and satisfy on-chip timing constraints, there is an increase in the number of pipeline registers, which has increased the overall clock frequency to multi-gigahertz levels. Since all of the registers simultaneously switch with each clock edge, significant power is dissipated in clock distribution networks. According to [3], a clock distribution network can be responsible for 20-45% of the total power consumption on-chip (Fig. 1), out of which 90% of the power is dissipated at the leaves (last stage of interconnects and flip-flops) of the network.

Overall dynamic power dissipation to charge-discharge a load capacitance is

\[ P = \alpha C_{load} V_{DD}^2 f, \]  \hspace{1cm} (1)

where \( \alpha \) is the activity factor (equal to one for a clock signal), \( C_{load} \) is the total switching capacitance, \( V_{DD} \) is the supply voltage and \( f \) is the frequency.

Based on this equation, one approach to reduce power dissipation is to reduce the overall fan-out of the clock network, which is not practical due to higher integration levels. Another solution is to scale down the \( V_{DD} \) supply at the expense of performance degradation [4]. A more efficient solution, however, is to reduce the swing of the clock signal. Rewriting (1) to include the swing of a signal, \( V_{swing} \),

\[ P = \alpha C_{load} V_{DD} V_{swing} f. \]  \hspace{1cm} (2)

From (2), a linear reduction in the clock swing can theoretically result in a linear reduction in the overall dynamic power dissipation across a clock distribution network.
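
The linear dependence on the clock swing can be illustrated with a short numerical sketch. The values below (1V supply, 1.5GHz clock, 50fF load, 30% swing reduction) are taken from the simulation setup described in this thesis, while the function name `dynamic_power` is only an illustrative choice. The estimate covers the load capacitance alone and ignores the internal and short-circuit power of the buffer, so it should be read as an upper bound on the saving rather than a prediction of the simulated results.

```python
def dynamic_power(alpha, c_load, v_dd, v_swing, f):
    """Dynamic power of a switching node: P = alpha * C_load * V_DD * V_swing * f."""
    return alpha * c_load * v_dd * v_swing * f


ALPHA = 1.0        # activity factor of a clock signal
C_LOAD = 50e-15    # load capacitance in farads (50fF)
V_DD = 1.0         # supply voltage in volts
F_CLK = 1.5e9      # clock frequency in hertz

# Full swing: V_swing equals V_DD, which reduces to P = alpha * C * V_DD^2 * f.
p_full = dynamic_power(ALPHA, C_LOAD, V_DD, V_DD, F_CLK)

# Reduced swing: 30% lower swing with the same supply voltage.
p_low = dynamic_power(ALPHA, C_LOAD, V_DD, 0.7 * V_DD, F_CLK)

saving = 1.0 - p_low / p_full

print(f"Full swing power:    {p_full * 1e6:.2f} uW")
print(f"Reduced swing power: {p_low * 1e6:.2f} uW")
print(f"Ideal saving:        {saving * 100:.1f} %")
```

Under these assumptions the ideal saving equals the swing reduction itself, which is the linear relationship described above.
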
With technology scaling, global interconnects have become more resistive and capacitive, which degrades the signal integrity of critical signals such as the clock. Furthermore, applying a reduced swing clock in the clock distribution network increases the sensitivity of the signal to noise, further degrading the signal and noise integrity. Buffer insertion along the clock distribution network alleviates this issue and also improves the transition times of the clock signals (slew), at the cost of increased power and area. Moreover, to implement a reduced swing clock in the network, regular buffers need an additional low supply voltage to reduce the swing of the clock signal, which increases the cost. An alternative approach to obtain a low swing clock is to implement reduced swing buffers, which reduce the swing of the clock signal using a single $V_{DD}$ supply. An issue to be considered while designing a reduced swing buffer is driving PMOS transistors with a low swing clock, which results in increased contention current. This issue is further exacerbated by process and environmental variations.

A low swing clocking methodology has been discussed in [5], [6], [7] and [8]. The design approach in these works utilizes a single $V_{DD}$ supply to achieve low swing clock signaling. However, as discussed later, the reduced swing signaling achieved in [6] and [7] has slew degradation which is alleviated by the design implemented in [8]. Although the latter achieves up to 22% power savings under nominal conditions [8], the restoration of the low swing clock signal to full swing at the clock sinks (before the flip-flops) reduces the power savings achieved from low swing signaling throughout the clock distribution network.

To overcome this issue, several papers discuss the design of low clock swing based D flip-flops (DFF). The designs proposed in [9], [3], [10] and [11] use clocked PMOS transistors in the DFF, which create reliability issues when driven by a low swing clock signal. This issue is alleviated in [12], which replaces the transmission gates in the conventional flip-flop with NMOS passgates. The DFF design proposed in this project follows a similar approach, but achieves better performance and robustness as compared to other designs.

The rest of the thesis is organized as follows. Chapter 2 discusses and analyzes the reduced swing buffers for low swing clocking. Chapter 3 analyzes the existing reduced swing clock based DFF topologies and proposes a novel low clock swing DFF architecture. The effect of the reduced swing buffer design and the proposed DFF topology in a low swing clock distribution network is discussed in Chapter 4 by driving the proposed low swing DFF with the reduced swing buffers. The thesis is concluded in Chapter 5.
2 Custom Clock Buffer Design for Low Swing Operation

The on-chip clock signal has a higher switching activity than other global signals and can be responsible for up to 50% of the overall dynamic power consumption due to the large clock network capacitance [1]. A practical approach to alleviate this issue is to reduce the swing of active on-chip signals, such as the clock [6]. To improve signal and noise integrity, buffers are inserted along the clock distribution network at regular intervals. Traditionally, for full swing clocks, conventional buffers are used in the clock distribution network, but for low swing clock signaling, these full swing buffers should be replaced by reduced swing buffers. Based on their location and function, the reduced swing buffers are of three types [8]:

1. Full swing to reduced swing buffer (FS-RS)
2. Reduced swing to reduced swing buffer (RS-RS)
3. Reduced swing to full swing buffer (RS-FS)

While the FS-RS buffer is inserted at the clock source, i.e., the root of the clock tree, to reduce the swing of the incoming full swing clock signal, the RS-RS buffer is inserted throughout the clock distribution network to improve the signal integrity of the low swing clock signal and satisfy slew constraints. The RS-FS buffer is inserted at the leaves of the network to drive the leaf cells (flip-flops) operating at full $V_{DD}$ [8]. However, converting the low swing clock signal to full swing at the leaf cells sacrifices the power savings achieved with low swing clock signaling since most of the clocking power is consumed in the last stage of a clock distribution network. Thus, it is critical to drive the leaf cells with a low swing clock signal, as achieved in this thesis by introducing a novel low swing DFF (see Chapter 3).

Various buffer architectures have been proposed to accomplish on-chip reduced swing clock distribution, a few of which are discussed below. The primary objective, as mentioned previously, is to reduce the swing of the clock signal without using an additional supply voltage.
Section 2.1 provides an overview of existing reduced swing buffer topologies. In this project, these existing designs are topologically modified to satisfy the required design constraints. These modifications are discussed in Section 2.2. The simulation results of the modified buffers are analyzed in Section 2.3. The robustness is also evaluated for different process technologies and environmental variations. Finally, the reliability of the reduced swing buffer architectures is investigated by introducing variations to the buffer load and input clock slew.

2.1 Existing Reduced Swing Buffers

The effectiveness and robustness of low swing signaling have been discussed in [6]. The reduced swing buffer used to obtain low swing signal in the network is shown in Fig. 2. The two inverters driving the NMOS transistors prevent contention and provide a charge-discharge path for the output load capacitance. The NMOS transistor connected to $V_{DD}$ is incapable of transmitting a strong logical-1. This buffer architecture, therefore, reduces the swing of the signal based on the threshold voltage drop across the NMOS transistor (N1). The clock swing is, therefore, controlled by sizing the transistor, while considering the fan-out of the buffer.

![Figure 2: Reduced swing buffer [6].](image)

Another reduced swing buffer design is proposed in [7], which utilizes a diode-connected PMOS and NMOS (Fig. 3) to reduce the swing of the clock signal from (0 to $V_{DD}$) to ($V_{th}$ to $V_{DD} - |V_{tp}|$). Similar to [6], this design also obtains low swing clock based on the threshold voltage drop across diode-connected PMOS and NMOS transistors.

Figure 3: Reduced swing buffer with diode-connected transistors [7].

The dependence of the swing level on the threshold voltage of a transistor affects the reliability and integrity of the clock signal due to process variations. More importantly, there is a significant increase in the clock slew which has more deteriorating effects when considering process, voltage and temperature (PVT) variations. Another design proposed in [8] removes this dependence on threshold voltage of the device by utilizing a delay produced by a delay chain to reduce the swing of the clock signal to a desired level (Fig. 4). As shown in this figure, the output of the inverter at the input stage is provided to the delay chain as well as the output inverter stage, i.e., N2 and P2. This delay chain provides a sufficient time window between signals X and Y during which the output is allowed to transition to a desired level. Outside this time window, the transistors N3 and P3 are turned off at, respectively, $V_L$ and $V_H$, such that $V_L > 0$ and $V_H < V_{DD}$. Thus, a reduced swing clock is obtained at the output. The swing of the clock signal depends on the delay time provided by the delay chain, the load capacitance of the buffer, and the output current of the last stage.
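
The dependence of the output swing on the delay window, the load capacitance and the output current can be illustrated with a first-order estimate. The sketch below is not the design equation used in [8]; it assumes, purely for illustration, that the last stage delivers a constant current during the window, so that the output rises by I·t/C and is clipped at $V_{DD}$. The function name and the numerical values for the current and the window are hypothetical; only the 50fF load and the 1V supply come from the setup considered in this thesis.

```python
def swing_estimate(i_out, t_window, c_load, v_dd):
    """First-order output swing assuming a constant charging current.

    i_out    : average output current of the last stage (A), hypothetical
    t_window : time window provided by the delay chain (s), hypothetical
    c_load   : load capacitance (F)
    v_dd     : supply voltage (V), the swing cannot exceed this level
    """
    delta_v = i_out * t_window / c_load
    return min(delta_v, v_dd)


# Hypothetical operating point: 1 mA average current and a 35 ps window.
swing = swing_estimate(i_out=1e-3, t_window=35e-12, c_load=50e-15, v_dd=1.0)
print(f"Estimated swing: {swing * 1e3:.0f} mV")

# A longer window saturates at the supply, i.e., full swing is restored.
full = swing_estimate(i_out=1e-3, t_window=100e-12, c_load=50e-15, v_dd=1.0)
print(f"Estimated swing with a 100 ps window: {full * 1e3:.0f} mV")
```

In this simplified view, a larger load or a weaker output stage requires a proportionally longer delay window to reach the same swing, which is why the delay chain and the output stage are sized together with the fan-out.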

2.2 Modified Reduced Swing Buffers

The buffers discussed previously are modified to obtain the desired reduction in clock swing (a reduced high level while maintaining the low level at $V_{SS}$). The buffer design used for low swing clock signaling in [7] (Fig. 3) is slightly modified by removing the diode-connected NMOS (Fig. 5), so that the clock swings from 0 to $(V_{DD} - |V_{tp}|)$.

Similarly, the buffer design proposed in [8] is modified as shown in Fig. 6 by removing the NMOS transistor, N3, at the output stage of the buffer. The clock therefore swings from 0 to $V_H$ instead of from $V_L$ to $V_H$, where $V_L > 0$ and $V_H < V_{DD}$.
Figure 4: Delay chain based reduced swing buffer [8].

Figure 5: Modified version of the reduced swing buffer with diode-connected transistor.

Despite the advantage of reduced power consumption, a major drawback is the large contention current that flows when the input stage is driven with a reduced swing clock signal, since the PMOS transistors driven by the low swing clock signal are not completely turned off. This issue can be alleviated by using multi-threshold devices in the buffer architecture. For example, the threshold voltage of the transistors driven by the low swing clock signal can be increased by using high $V_{th}$ transistors. The contention current can also be reduced to approximately 50% with a slight increase in the channel length of the transistor. Specifically, from the simulation results, it is observed that the leakage power is significantly reduced by increasing the channel length of one of the transistors (PMOS or NMOS) at the input stage inverter of the reduced swing buffer. This technique reduces the leakage current by 8.7% and 97%, respectively, for the FS-RS and RS-RS buffers. In the proposed design, the channel length of the PMOS transistor at the input, P1, and of the PMOS transistors within the delay chain is increased by, respectively, 2x and 1.4x. These changes are determined based on the clock slew requirement, which should be less than 100ps, i.e., approximately 15% of the clock period.
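
The slew requirement follows directly from the clock frequency. As a minimal sketch, assuming the 15% fraction and the 1.5GHz clock stated in this thesis, the budget can be computed as follows. The helper names `slew_budget` and `meets_slew` are illustrative only.

```python
def slew_budget(f_clk, fraction=0.15):
    """Maximum allowed clock slew as a fraction of the clock period (seconds)."""
    return fraction / f_clk


def meets_slew(slew, f_clk, fraction=0.15):
    """Return True if a measured slew (seconds) is within the budget."""
    return slew < slew_budget(f_clk, fraction)


F_CLK = 1.5e9  # clock frequency in hertz

period = 1.0 / F_CLK
budget = slew_budget(F_CLK)

print(f"Clock period: {period * 1e12:.1f} ps")
print(f"Slew budget:  {budget * 1e12:.1f} ps")
```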

2.3 Simulation Results

The reduced swing buffers discussed in the previous section are designed in Cadence using 1V 45nm NCSU technology, which relies on predictive technology models [13]. The clock frequency considered during the analysis is 1.5GHz with an input transition time of 10ps. The devices used in the design have nominal threshold voltages to ensure sufficient reliability during multi-corner analysis. In order to replicate the large fan-out of the buffers in a typical clock distribution network, the transistors in the buffers are sized to drive a load capacitance of 50fF. The buffers are designed to obtain approximately 30% reduction in the clock voltage swing at the output for both full swing to reduced swing (FS-RS) and reduced swing to reduced swing (RS-RS) operation.

The reduced swing buffer architecture in [6] reduces the swing of the clock signal based on the threshold voltage of the NMOS transistor N1. Therefore, to drive a large load of 50fF and obtain a 30% reduction in the clock swing, the width of the transistor should be increased to at least 70µm, which is significantly large for a 45nm technology. The inverter driving this transistor is also sized accordingly. The reduction in the clock swing as obtained from the simulation results is shown in Fig. 7.

![Figure 7: Reduced swing buffer [6]: (a) FS-RS simulation, (b) RS-RS simulation.](image)

Another buffer architecture proposed in [7] also reduces the swing of the clock signal using the threshold voltage drop across the diode-connected PMOS and NMOS transistors. As discussed previously, the design is modified to obtain the desired range of output clock voltage swing (Fig. 5). This buffer architecture provides a 30% reduction in the clock voltage swing with transistor widths between 20µm and 40µm, which is still significantly large. Furthermore, the clock slew and the delay between input and output are significantly degraded due to the diode-connected transistors, particularly during the low-to-high transition of the clock signal. This behavior requires careful sizing of the transistors. The simulation results for the FS-RS and RS-RS operation of this buffer architecture are shown in Fig. 8.

![Figure 8: Reduced swing buffer with diode-connected PMOS [7]: (a) FS-RS simulation, (b) RS-RS simulation.](image)

The buffer architecture proposed in [8] alleviates these issues resulting from the dependence of the output clock swing on the threshold voltage. This design utilizes a delay chain driven transistor at the output instead of a diode-connected transistor. The delay chain provides a time window which determines the output voltage swing of the clock. Due to sufficient clock slew at the output, this design permits increasing the channel length of the low swing clock driven transistor to reduce the overall leakage power. The maximum transistor width does not exceed 2µm while driving a load of 50fF, which is significantly smaller as compared to the other reduced swing buffers. The reduction in the clock swing at the output is obtained, as shown in the simulation results in Fig. 9.

Section 2.3.1 provides a comparative analysis of these buffers in terms of power, delay and slew. To verify the functionality of the designs, the RS-RS buffer is driven by FS-RS buffer to represent the connections within a clock tree. This analysis is presented in Section 2.3.2. The robustness of delay chain based reduced swing buffer to PVT variations is analyzed in Section 2.3.3. The reliability of the buffer is also evaluated for different fan-outs and input clock slew in Section 2.3.4.
2.3.1 Comparative analysis

The buffers discussed previously are compared with a conventional full swing buffer in terms of dynamic power consumption, leakage power, average delay between the input and output of the buffer, and the clock slew obtained at the output. The conventional buffer consists of two inverters and is sized to drive the same load capacitance (50fF). Considering the input clock frequency and slew to be 1.5GHz and 10ps respectively, Table 1 lists the data obtained from the simulation of reduced swing buffers. It should be noted here that the leakage power is averaged for high and low static values of the input clock. Similarly, the output slew is averaged for both the low-to-high and high-to-low transitions of the clock signal.

Full swing to reduced swing buffer - Considering the FS-RS operation of the buffers (Table 1), the performance of [6] and [7] is degraded considerably in comparison to the conventional buffer. For the delay chain based buffer, despite an increase in the leakage power by approximately 21%, the overall dynamic power consumption and the clock slew at the output are reduced, respectively, by 13.15% and 7.82%, in comparison with the conventional buffer. Furthermore, despite an increase in the number of transistors in the buffer design, the average delay of the buffer is increased by a negligible amount of 867fs, and therefore remains comparable to that of the conventional buffer.
Table 1: Simulation results for conventional buffer and reduced swing buffers, considering both FS-RS and RS-RS operation.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Conventional buffer</th>
<th>[6] FS-RS</th>
<th>[6] RS-RS</th>
<th>[7] FS-RS</th>
<th>[7] RS-RS</th>
<th>[8] FS-RS</th>
<th>[8] RS-RS</th>
</tr>
</thead>
<tbody>
<tr>
<td>$V_{out}$ (mV)</td>
<td>1000</td>
<td>698.93</td>
<td>693.96</td>
<td>700.31</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Dynamic power ($\mu$W)</td>
<td>78.83</td>
<td>145.6</td>
<td>148.8</td>
<td>301</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Average delay (ps)</td>
<td>49.271</td>
<td>76.55</td>
<td>96.001</td>
<td>102.208</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Output slew (ps)</td>
<td>58.617</td>
<td>62.154</td>
<td>66.533</td>
<td>65.524</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Leakage power (nW)</td>
<td>22.85</td>
<td>411.02</td>
<td>1577</td>
<td>257</td>
<td>7056.5</td>
<td>27.695</td>
<td>250.19</td>
</tr>
</tbody>
</table>

Reduced swing to reduced swing buffer - For RS-RS operation, a low swing clock signal (0-700mV) is applied at the input of the reduced swing buffers. The leakage power for the buffers in [6] and [7] increases by approximately 69x and 308x, respectively, as compared to the conventional buffer. The increase in leakage power is exacerbated when analyzing the circuit at the PVT corners, making these circuits less robust. The delay chain based reduced swing buffer also exhibits an 11x increase in the leakage power, which is primarily due to the contention current flow at the input inverter, where a PMOS transistor driven by a low swing clock signal is not completely turned off. This leakage current is obtained after increasing the channel length of the clock-driven PMOS transistor by 2x. Despite the high leakage power, the dynamic power dissipation and the output clock slew are reduced, respectively, by 15.4% and 6.7% as compared to the conventional full swing buffer. These values are comparable to the FS-RS operation of the same buffer design. Contrary to the average delay obtained during the FS-RS operation of the buffer, there is approximately 11% increase in the delay across the RS-RS buffer in comparison to the conventional full swing buffer.
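
The leakage ratios quoted above can be reproduced from the leakage power values of Table 1. The sketch below assumes a pairing of the tabulated values with the buffer designs that is inferred from the ratios stated in the text (69x, 308x, 11x and 21%); the dictionary keys are illustrative labels only.

```python
# Leakage power values in nW, as listed in Table 1.
# The mapping of values to designs is an assumption inferred from the text.
leakage_nw = {
    "conventional": 22.85,
    "[6] RS-RS": 1577.0,
    "[7] RS-RS": 7056.5,
    "delay chain FS-RS": 27.695,
    "delay chain RS-RS": 250.19,
}

reference = leakage_nw["conventional"]

# Ratio of each design's leakage power to that of the conventional buffer.
ratios = {name: value / reference for name, value in leakage_nw.items()}

for name, ratio in ratios.items():
    print(f"{name:>18}: {ratio:8.2f}x of the conventional buffer")

# Relative increase for the FS-RS delay chain buffer, in percent.
fs_rs_increase = (ratios["delay chain FS-RS"] - 1.0) * 100.0
print(f"Delay chain FS-RS increase: {fs_rs_increase:.1f} %")
```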

2.3.2 Full swing to reduced swing buffer driving reduced swing to reduced swing buffer

In the clock tree design, the full swing clock signal generated by the clock source, i.e., the phase-locked loop (PLL), is converted to a reduced swing clock signal by the FS-RS buffer, which then drives the RS-RS buffer. To represent this behavior and ensure correct functionality of the design, an FS-RS buffer is used to drive an RS-RS buffer, which in turn drives a load of 50fF. Since the FS-RS buffer drives only one RS-RS buffer, the design is slightly modified to accommodate this small fan-out by decreasing the number of inverters in the delay chain to 3 and downsizing the transistors at the output stage of the buffer. Using the same clock frequency and slew, the design is simulated, as shown in Fig. 10.

It is shown in this figure that the desired reduced swing is obtained and maintained at the output of both buffers. There is an increase in the delay and power, as expected. To calculate the leakage power, a piecewise-linear signal is applied at the input to model the clock gating functionality, and the leakage power is analyzed over the simulation period when the clock is gated. This is required because, when the input clock is at logic-1, the output of the FS-RS buffer does not exceed 400mV, which results in an excessive amount of leakage at the input of the RS-RS buffer. It is to be noted that the output slew and average delay are averaged for the low-to-high and high-to-low transitions of the signal. Similarly, the leakage power is also averaged for low and high static input values.

Table 2: Simulation results of FS-RS buffer driving a RS-RS buffer, which drives a load of 50fF.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>FS-RS buffer</th>
<th>RS-RS buffer</th>
</tr>
</thead>
<tbody>
<tr>
<td>$V_{out}$ (mV)</td>
<td>699.9</td>
<td>700.2</td>
</tr>
<tr>
<td>Output Slew (ps)</td>
<td>17.1</td>
<td>54.6</td>
</tr>
<tr>
<td>Average delay (ps)</td>
<td>21.7</td>
<td>77.3</td>
</tr>
<tr>
<td>Dynamic power ($\mu$W)</td>
<td></td>
<td>79.48</td>
</tr>
<tr>
<td>Leakage power (nW)</td>
<td></td>
<td>367.8</td>
</tr>
</tbody>
</table>

2.3.3 Robustness to PVT variations

A critical challenge in nanoscale ICs is the variation incurred during fabrication and the fluctuations in operating voltage and temperature [1]. The delay chain based reduced swing buffer is simulated and analyzed for different process, voltage and temperature (PVT) variations. It is important to analyze the robustness and reliability of the design for varying PVT conditions, particularly at the fast and slow corners.

Due to an increase in the threshold voltage of the transistors, the slow corner is considered for the worst-case delay analysis. Conversely, the fast corner is considered for the dynamic power and leakage power analysis due to the decrease in the threshold voltage of the devices. Furthermore, the operating voltage of the buffer design is decreased and increased by 10% for, respectively, slow and fast corners, to consider voltage variations.

The threshold voltage of the transistors exhibits almost a linear dependence on temperature such that there is a reduction in the threshold voltage of the transistor as the operating temperature of the circuit is increased [14]. This relationship can be approximated by

$$V_t(T) = V_t(T_r) - k_{vt}(T - T_r), \tag{3}$$

where $T$ is the absolute temperature, $T_r$ is the room temperature and $k_{vt}$ is approximately 1-2 mV/K [14]. The carrier mobility of the transistors also decreases with increasing temperature, according to the relation

$$\mu(T) = \mu(T_r)\left(\frac{T}{T_r}\right)^{-k_{\mu}}, \tag{4}$$

where $k_{\mu}$ is a fitting parameter approximately equal to 1.5 [14]. Because the mobility increase at lower temperatures is traditionally the dominant effect, the circuit is slowest at high temperature, and the worst-case delay and slew are therefore analyzed at high temperature (165°C). However, due to reduced feature size, the threshold voltage variation at lower temperatures cannot be neglected. Therefore, the circuit is simulated at both high and low temperatures for the worst-case dynamic power analysis.
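As a rough numerical illustration of Eqs. (3) and (4), the following sketch evaluates the threshold voltage shift and the mobility ratio at a low and a high operating temperature. The room temperature of 300 K, the midpoint value $k_{vt} = 1.5$ mV/K and the low temperature of -40°C are assumptions made for this example only.

```python
T_ROOM_K = 300.0   # assumed room temperature T_r in kelvin
K_VT = 1.5e-3      # V/K, assumed midpoint of the 1-2 mV/K range quoted from [14]
K_MU = 1.5         # mobility fitting exponent quoted from [14]


def celsius_to_kelvin(temp_c: float) -> float:
    return temp_c + 273.15


def vt_shift(temp_k: float) -> float:
    """Eq. (3): threshold voltage change relative to room temperature, in volts."""
    return -K_VT * (temp_k - T_ROOM_K)


def mobility_ratio(temp_k: float) -> float:
    """Eq. (4): mobility at temp_k divided by the mobility at room temperature."""
    return (temp_k / T_ROOM_K) ** (-K_MU)


# -40 C is an assumed low-temperature point; 165 C is the high temperature in the text.
for temp_c in (-40.0, 165.0):
    temp_k = celsius_to_kelvin(temp_c)
    print(f"{temp_c:6.1f} C: dVt = {vt_shift(temp_k) * 1e3:+7.1f} mV, "
          f"mu/mu_r = {mobility_ratio(temp_k):.2f}")
```

Under these assumptions, both the threshold voltage and the mobility fall as the temperature rises, so the two effects pull the drive current in opposite directions.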

![Figure 11: I-V characteristics of NMOS with varying temperature [14].](image)

The process and voltage corners for the leakage power analysis are the same as those considered for the dynamic power analysis. From [14], the effect of temperature on the current driving capability of a transistor depends on $V_{GS}$: the current decreases exponentially as $V_{GS}$ drops below the threshold voltage. The current in this region is the subthreshold leakage current, which increases with increasing temperature, as shown in Fig. 11. Therefore, the operating temperature considered for the leakage power analysis of the designs is 165°C.
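The temperature dependence of the subthreshold current can be illustrated with the common first-order exponential model $I \propto \exp\left((V_{GS} - V_t)/(n\,kT/q)\right)$. This model is not taken from the thesis. The room-temperature threshold voltage, the slope factor $n$ and the value of $k_{vt}$ below are hypothetical, and the temperature dependence of the prefactor is ignored.

```python
import math

K_B_OVER_Q = 8.617e-5  # Boltzmann constant over electron charge, V/K
T_ROOM_K = 300.0       # assumed room temperature, K
K_VT = 1.5e-3          # V/K, assumed value within the 1-2 mV/K range from [14]
VT_ROOM = 0.4          # V, hypothetical room-temperature threshold voltage
N_SLOPE = 1.5          # hypothetical subthreshold slope factor


def threshold_voltage(temp_k):
    """Eq. (3) evaluated with the assumed constants above."""
    return VT_ROOM - K_VT * (temp_k - T_ROOM_K)


def relative_subthreshold_current(v_gs, temp_k):
    """exp((V_GS - V_t) / (n * kT/q)); the prefactor is deliberately omitted."""
    thermal_voltage = K_B_OVER_Q * temp_k
    return math.exp((v_gs - threshold_voltage(temp_k)) / (N_SLOPE * thermal_voltage))


hot_k = 165.0 + 273.15
ratio = (relative_subthreshold_current(0.0, hot_k)
         / relative_subthreshold_current(0.0, T_ROOM_K))
print(f"Off-state current at 165 C relative to room temperature: {ratio:.0f}x")
```

Both the lower threshold voltage and the larger thermal voltage at 165°C increase the off-state current in this model, which is consistent with the trend in Fig. 11.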

The worst-case analysis results of the delay chain based reduced swing buffer and the conventional buffer are summarized in Table 3. In comparison to the conventional buffer, the delay chain based reduced swing buffer exhibits an increase in the average delay. The slew and dynamic power consumption, however, are comparable to those of the conventional buffer. The leakage power of the FS-RS buffer is observed to be less than that of the conventional buffer, while there is an increase in the leakage power when a reduced swing clock is used to drive the buffer.
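The corner selection described in this section can be collected in a small data structure, for example to drive a batch of simulations. The sketch below is only an illustration: the class and dictionary names are hypothetical, and the low temperature of -40°C is an assumption, since the text does not state the low temperature used for the buffer.

```python
from dataclasses import dataclass

NOMINAL_VDD = 1.0      # V, nominal supply of the 1V 45nm technology
VDD_TOLERANCE = 0.10   # +/-10% supply variation, as described in the text


@dataclass(frozen=True)
class Corner:
    process: str       # "SS" (slow) or "FF" (fast)
    vdd: float         # supply voltage in volts
    temp_c: float      # operating temperature in degrees Celsius


def make_corner(process: str, temp_c: float) -> Corner:
    """Pair the slow corner with the reduced supply and the fast corner with the raised supply."""
    sign = -1.0 if process == "SS" else 1.0
    vdd = round(NOMINAL_VDD * (1.0 + sign * VDD_TOLERANCE), 3)
    return Corner(process, vdd, temp_c)


# Hypothetical grouping of the analyses by corner.
CORNERS = {
    "worst_case_delay_and_slew": [make_corner("SS", 165.0)],
    # Dynamic power is checked at both temperature extremes (-40 C is assumed).
    "worst_case_dynamic_power": [make_corner("FF", -40.0), make_corner("FF", 165.0)],
    "worst_case_leakage": [make_corner("FF", 165.0)],
}

for analysis, corners in CORNERS.items():
    for c in corners:
        print(f"{analysis}: {c.process}, {c.vdd:.1f} V, {c.temp_c:.0f} C")
```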

### 2.3.4 Robustness to buffer load and clock slew variations

Table 3: Worst-case corner simulation results for conventional buffer and delay chain based reduced swing buffer.

<table>
<thead>
<tr>
<th>PVT corner</th>
<th>Parameter</th>
<th>Conventional buffer</th>
<th>Delay chain based reduced swing buffer</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td>FS-RS</td>
<td>RS-RS</td>
</tr>
<tr>
<td>SS, 0.9V, 165°C</td>
<td>Delay (ps)</td>
<td>139.7</td>
<td>180.8</td>
</tr>
<tr>
<td></td>
<td>Output Slew (ps)</td>
<td>183.6</td>
<td>187.8</td>
</tr>
<tr>
<td>FF, 1.1V, 165°C</td>
<td>Dynamic Power (µW)</td>
<td>98.3</td>
<td>90.3</td>
</tr>
<tr>
<td></td>
<td>Leakage Power (µW)</td>
<td>1.6</td>
<td>1.1</td>
</tr>
</tbody>
</table>

All of the simulation results in the previous sections are obtained considering an input clock slew of 10ps and a buffer output load of 50fF. While analyzing the buffer design at the worst-case corners, it is observed that the clock slew at the output of a buffer that drives RS-RS buffers can increase to approximately 190ps. The buffers should therefore operate reliably for input slews as high as 200ps. To verify the robustness, the FS-RS and RS-RS buffers are analyzed for different input slews ranging from 40ps to 200ps. The simulation results obtained from the parametric analysis of the FS-RS and the RS-RS buffer for clock slew variations are shown in Fig. 12. As shown in the figure, variation in the input clock slew does not affect the output clock slew or the output voltage level (700mV). The output, however, shifts slightly with every increment in the input clock slew, which is due to the increase in the transition time of the input clock signal.

Another variation that needs to be considered to verify the reliability of this buffer is the variation in the fan-out of the buffer. The buffer design is highly sensitive to this variation, because the sizing of the output transistors is determined based on the output load of the buffer. The output load is varied from 20fF to 80fF in steps of 10fF using parametric analysis. The simulation results obtained for the load variations are shown in Fig. 13. As observed from this figure, for constant transistor sizes, each step in the buffer load significantly changes the output voltage level. Since the buffer is designed and sized for 50fF, the desired voltage level of 700mV is obtained at that load. However, for 20fF and 80fF, the voltage level increases to approximately 1V and decreases to approximately 480mV, respectively.
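The trend in Fig. 13 can be approximated with a first-order estimate. The sketch below assumes that the buffer delivers roughly the same charge to the output on every rising edge, regardless of the load, and that the output is clipped at the supply rail. This fixed-charge assumption is an illustrative simplification and not the analysis used in this thesis; it neglects the dependence of the pull-up current on the output voltage.

```python
VDD = 1.0               # V, single supply used in the design
DESIGN_LOAD_F = 50e-15  # F, load for which the buffer is sized
DESIGN_SWING_V = 0.7    # V, target output level at the design load

# Assumption: a fixed charge is delivered to the output on every rising edge.
CHARGE_PER_EDGE_C = DESIGN_LOAD_F * DESIGN_SWING_V


def estimated_output_level(load_f):
    """First-order output level for a given load, clipped at the supply rail."""
    return min(CHARGE_PER_EDGE_C / load_f, VDD)


for load_ff in range(20, 90, 10):
    level_mv = estimated_output_level(load_ff * 1e-15) * 1e3
    print(f"{load_ff} fF -> {level_mv:.0f} mV")
```

With these assumptions the estimate saturates at 1V for 20fF and gives roughly 440mV for 80fF, which is in the same range as the simulated 480mV.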

These variations can have deteriorating effects, especially in terms of leakage power dissipation which increases considerably when a reduced clock swing drives another RS-RS buffer in the clock tree. Furthermore, at the leaf of the clock tree, the low swing DFF should work reliably at this low voltage level. Note that if the buffer loads in the clock tree are known in advance, the buffers inserted in the clock tree can be sized accordingly. Similarly, a clock tree synthesis algorithm can be built to insert these buffers while considering the output loads. These approaches will ensure that the reduced swing buffer is more robust to load variations.
Figure 12: Delay chain based reduced swing buffer parametric analysis with variable clock slew: (a) FS-RS simulation, (b) RS-RS simulation.
Figure 13: Delay chain based reduced swing buffer parametric analysis with variable buffer load: (a) FS-RS simulation, (b) RS-RS simulation.
3 Custom D Flip-Flop Design for Low Swing Operation

As described in the previous chapter, the clock distribution network is a primary source of on-chip power consumption. Clock networks consume 20-45% of the total on-chip power [3]. This power dissipation is the result of increased pipelining in the design, which has led to an increase in the number of flip-flops and, therefore, in the total interconnect length of the clock network [15]. Approximately 90% of the clock power is consumed by the flip-flops and the last branches of a clock tree [3]. It is therefore critical to drive the sink flip-flops with a low swing clock signal to maximize the power savings. Various D flip-flop (DFF) architectures that can work efficiently when driven by a low swing clock signal are discussed in this chapter. If a conventional full swing flip-flop is used in low swing operation, the PMOS transistors driven by the low swing clock signal are not completely turned off. This behavior results in robustness and reliability issues, as the transmission gates fail to maintain the logic-0 level, producing a glitch of approximately 380mV. There is also a 15% increase in the clock-to-Q delay when a reduced swing clock is used to drive the conventional full swing flip-flop, which is exacerbated at the worst-case corners.

Section 3.1 provides an overview of the current low clock swing DFF topologies. It is followed by the proposal of a novel DFF topology in Section 3.2. A comparative analysis of the DFF architectures is provided in Section 3.3.

3.1 Low Clock Swing D Flip-Flop Design

With reducing feature size, minimizing the leakage current has become a significant issue in achieving low power design, in addition to lowering the clock voltage swing and reducing the charge-discharge capacitance [9]. A low swing clock topology based on clocked CMOS (C$^2$MOS) and a sense amplifier (SA) is proposed in [9] (L_C$^2$MOS_SA, Fig. 14a). This design reduces the charge-discharge capacitance and implements a conditional pre-charge and discharge technique to achieve low power consumption. The design is area efficient and a considerable reduction in leakage is also obtained with this topology. However, although this design does not need an additional supply voltage for the clock sub-circuit, the use of a diode-connected PMOS to reduce the clock swing is not an efficient approach, as it significantly increases the low-to-high transition time of the low swing clock. This approach also considerably affects the robustness and reliability of the circuit at the worst-case corner. Another issue is the need for a full swing clock signal at the slave stage of the DFF, which defeats the objective of using a low swing clock signal in the DFF.

Another topology, the reduced clock swing flip-flop (RCSFF, Fig. 14b) proposed in [3], uses a single low swing clock for the DFF operation. This topology uses an additional low voltage supply, $V_{DDL}$ (700 mV), in the clock sub-circuit to provide a low swing clock to the respective transistors in the design. The clock sub-circuit used for this circuit is shown in Fig. 15. However, when the clock signal goes high to $V_{DDL}$ ($V_{DDL} < V_{DD}$), the PMOS transistors are not completely turned off, resulting in a large leakage current. To fix this issue, the design uses an additional voltage supply to bias the well of the clocked PMOS transistors at a voltage higher than the supply voltage, thereby increasing the threshold voltage of the clocked transistors. An additional well increases the design area as well as the complexity of the design.

A NAND-type keeper flip-flop design proposed in [10] (NDKFF, Fig. 14c), does not require a separate well, but it causes excessive leakage current flow through transistors P2, N1-N3 when node X goes low. Also the level-keeping transistors, i.e., P2, N4, N5 and I1-I2, have a race condition when node X transitions from low to high, resulting in an increase in the transition time of the output. This effect is exacerbated during worst-case delay analysis of the design and can be partially alleviated by carefully sizing the transistors.

The contention reduced flip-flop proposed in [11] (CRFF, Fig. 14d) utilizes a pulsed clock signal to provide a short transparency window during which the output is discharged through the NMOS transistors N1-N4. During this transparency window, the clocked transistors P5 and P6 disconnect the CMOS latch (I1-I2) to prevent any contention current. Furthermore, transistors P1 and P2 are controlled by input D through P3 and P4, which further reduces contention. However, the issue of completely turning off the PMOS transistors driven by the low swing clock signal still remains.

Figure 14: Low swing DFF architectures: (a) Low clock swing C$^2$MOS and sense amplifier (SA) (L_C$^2$MOS_SA) DFF [9], (b) Reduced clock swing flip-flop (RCSFF) [3], (c) NAND-type keeper DFF (NDKFF) [10], (d) Contention reduced flip-flop (CRFF) [11].

In all of the topologies discussed above, driving PMOS transistors with a low swing clock has been the primary issue, resulting in large contention current flow. The DFF design provided in [12] alleviates this issue by using passgate NMOS transistors instead of transmission gates, as shown in Fig. 16. Due to the inefficient transmission of logic-1 across the passgate transistor, a weak keeper PMOS transistor is included in the design to pull up nodes X and Y to $V_{DD}$. Since no clocked PMOS transistors are used in this design, the total leakage power is considerably reduced. However, there is still contention current flow through the transistors P2, N2, N1 and P4, N4, N3 in the corresponding master and slave latches during the high-to-low transition of the nodes X and Y, respectively. This contention current can be reduced to some extent by carefully sizing the respective transistors.

Figure 16: Passgate DFF for low swing clock signal [12].
3.2 Proposed Low Swing D Flip-Flop

The proposed DFF architecture, shown in Fig. 17, is similar to the conventional full swing DFF, where the primary difference is the use of passgates instead of transmission gates in the design. The clock sub-circuit used for this topology is shown in Fig. 15. The idea of using passgates is similar to the DFF design presented in [12], but there are some important differences, as shown in Fig. 17. Since there are no PMOS transistors driven by a low swing clock signal, the contention current is considerably reduced. However, an NMOS passgate is incapable of pulling node Y up to $V_{DD}$. A keeper PMOS and pull-down logic consisting of two NMOS transistors are added to improve this transition. Another PMOS transistor controlled by input D is added to avoid contention current flow through the transistors P4, N2, N1 when node Y transitions from $V_{DD}$ to $GND$. Similarly, node X is used to drive the transistor P7 in the slave mode. The sizing of the transistors and the use of multi-threshold devices in the design are exploited to accomplish low power-delay characteristics and balance related trade-offs.

![Figure 17: Proposed low swing DFF.](image-url)
3.3 Simulation Results

The proposed DFF is designed along with the L_C$^2$MOS_SA [9], RCSFF [3], NDKFF [10], CRFF [11] and passgate DFF [12] topologies using 1V 45nm NCSU technology. The clock and data frequencies considered for these simulations are, respectively, 1.5 GHz and 150 MHz. The input clock slew is 10ps. For fair comparison, the transistors in each design are sized to obtain a CLK-to-Q delay in the range of 65-70ps for a fan-out of 5fF. High $V_{th}$ and low $V_{th}$ transistors are also used in the designs to accommodate the power-delay trade-off. Both NDKFF and CRFF use a pulsed low swing clock signal, whose pulse should be long enough to ensure proper functionality. To achieve a fair comparison, for all of the DFF architectures, the swing of the clock signal is reduced by using an additional low supply voltage of 700mV, which is used as $V_{DDL}$ for the inverter chain. The output of this inverter chain drives the clocked transistors in the flip-flop. The L_C$^2$MOS_SA design as described in [9] drives the slave stage with a full swing clock, and the clock sub-circuit uses a diode-connected PMOS to reduce the clock swing. To compare with the other low swing clock driven DFF topologies, the same topology is modified by applying the low swing clock to the slave stage and using an additional low supply voltage in the clock sub-circuit.

Section 3.3.1 compares the simulation results of the DFF architectures discussed in the previous sections. The reliability of the DFF topologies is verified for different clock voltage swings in Section 3.3.2. The robustness of the proposed DFF design is analyzed for PVT variations in Section 3.3.3.

3.3.1 Comparative analysis

Assuming a clock swing of 700mV, a comparative analysis of the DFF architectures is performed. The results are listed in Table 4. The leakage power is averaged over the leakage values obtained for the four possible non-transitional combinations of the data and clock signals. From the table, it is shown that although the leakage power for L_C$^2$MOS_SA [9] is approximately half of the leakage of the proposed topology, the overall dynamic power and power-delay product (PDP) of the proposed topology exhibit an average improvement of 33.6% and 39.5%, respectively. Furthermore, comparing the proposed DFF design with [12], the proposed architecture provides a considerable improvement in performance in terms of PDP and leakage power. A rough estimate of the area is obtained by comparing the total width of the PMOS and NMOS transistors in the designs. As listed in Table 4, the proposed topology achieves less area than the modified version of [9], [3], [10] and [11]. It is also observed that although the proposed topology does not exhibit the minimum values of setup and hold time, they are still reasonably low to support reliable operation [16], [17].

Table 4: Simulation results with a clock swing of 700mV.

<table>
<thead>
<tr>
<th>DFF Topology</th>
<th>CLK-to-Q delay (ps)</th>
<th>Power (μW)</th>
<th>PDP (fW.s)</th>
<th>Leakage power (nW)</th>
<th>Total transistor width (nm)</th>
<th>Setup time (ps)</th>
<th>Hold time (ps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>PROPOSED</td>
<td>64.1</td>
<td>6.6</td>
<td>0.42</td>
<td>68.1</td>
<td>6100</td>
<td>-1.7</td>
<td>17.8</td>
</tr>
<tr>
<td>PASSGATE [12]</td>
<td>70.5</td>
<td>7.3</td>
<td>0.51</td>
<td>109.5</td>
<td>5025</td>
<td>6.7</td>
<td>5.1</td>
</tr>
<tr>
<td>L_C$^2$MOS_SA [9]</td>
<td>69.7</td>
<td>8.0</td>
<td>0.56</td>
<td>37.2</td>
<td>5345</td>
<td>27.9</td>
<td>0.9</td>
</tr>
<tr>
<td>L_C$^2$MOS_SA-modified</td>
<td>71.3</td>
<td>9.3</td>
<td>0.67</td>
<td>34.5</td>
<td>7745</td>
<td>17.5</td>
<td>-9.2</td>
</tr>
<tr>
<td>RCSFF [3]</td>
<td>70.6</td>
<td>17.2</td>
<td>1.22</td>
<td>82.0</td>
<td>12340</td>
<td>-26.9</td>
<td>42.6</td>
</tr>
<tr>
<td>NDKFF [10]</td>
<td>69.4</td>
<td>12.0</td>
<td>0.83</td>
<td>196.6</td>
<td>8950</td>
<td>-6.0</td>
<td>-62.7</td>
</tr>
<tr>
<td>CRFF [11]</td>
<td>70.3</td>
<td>10.5</td>
<td>0.74</td>
<td>201.5</td>
<td>9950</td>
<td>-20.2</td>
<td>92.9</td>
</tr>
</tbody>
</table>
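The average improvements quoted for the proposed topology can be cross-checked against the rounded values in Table 4. The sketch below assumes that the average is the mean of the per-topology relative reductions over all six reference rows; the averaging method is not stated explicitly, so this is one plausible reading rather than the exact procedure used.

```python
# (power in uW, PDP in fW.s) copied from Table 4
PROPOSED = (6.6, 0.42)
OTHERS = {
    "PASSGATE [12]": (7.3, 0.51),
    "L_C2MOS_SA [9]": (8.0, 0.56),
    "L_C2MOS_SA-modified": (9.3, 0.67),
    "RCSFF [3]": (17.2, 1.22),
    "NDKFF [10]": (12.0, 0.83),
    "CRFF [11]": (10.5, 0.74),
}


def average_reduction(column):
    """Mean of (other - proposed) / other over all reference topologies, in percent."""
    reductions = [(row[column] - PROPOSED[column]) / row[column]
                  for row in OTHERS.values()]
    return 100.0 * sum(reductions) / len(reductions)


avg_power_reduction = average_reduction(0)
avg_pdp_reduction = average_reduction(1)
print(f"average power reduction: {avg_power_reduction:.1f}%")
print(f"average PDP reduction:   {avg_pdp_reduction:.1f}%")
```

With the rounded table entries, this reading gives roughly 33% and 40%, which agrees with the quoted 33.6% and 39.5% to within the rounding of the table.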

### 3.3.2 Response to different clock swings

While analyzing the robustness of the reduced swing buffers for varying buffer loads in Section 2.3.4, it is observed from Fig. 13 that the voltage swing of the clock can be as low as 480mV. Therefore, the low clock swing DFF driven by a reduced swing buffer should provide reliable operation for varying clock voltage swings. Furthermore, lower swing voltages achieve higher reduction in dynamic power.
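The dependence of the switching power on the clock swing can be illustrated with the standard expression for charging a capacitive load once per cycle. The sketch below is a simplified model under stated assumptions: the function name and the 1fF placeholder load are hypothetical, and short-circuit and leakage components are ignored. When the load is charged from a dedicated low supply equal to the swing, the power scales with the square of the swing; when the charge is drawn from the full supply, it scales linearly.

```python
def clock_load_power(c_load_f, v_swing, f_clk_hz, v_supply=None):
    """Average power needed to charge c_load_f to v_swing once per clock cycle.

    If v_supply is None, the load is charged from a dedicated supply equal to the
    swing (P = C * Vswing^2 * f). Otherwise the charge is drawn from v_supply
    (P = C * Vswing * Vsupply * f).
    """
    supply = v_swing if v_supply is None else v_supply
    return c_load_f * v_swing * supply * f_clk_hz


C_LOAD_F = 1e-15   # F, hypothetical placeholder; only power ratios are reported
F_CLK_HZ = 1.5e9   # Hz, clock frequency used in the simulations
VDD = 1.0          # V, nominal supply

full_swing = clock_load_power(C_LOAD_F, VDD, F_CLK_HZ)
for swing in (0.7, 0.5, 0.48):
    quadratic = clock_load_power(C_LOAD_F, swing, F_CLK_HZ) / full_swing
    linear = clock_load_power(C_LOAD_F, swing, F_CLK_HZ, v_supply=VDD) / full_swing
    print(f"swing {swing:.2f} V: {quadratic:.2f}x (dedicated supply), "
          f"{linear:.2f}x (charge drawn from VDD)")
```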

Hence, the functionality of the DFFs is evaluated by reducing the clock voltage swing to as low as 400mV. It should be noted here that the L_C$^2$MOS_SA design is not considered, since the low voltage swing of the clock in that design is obtained through a diode-connected PMOS in the clock sub-circuit and therefore depends on the threshold voltage drop of the transistor.

From the simulation results, the proposed topology is observed to provide proper functionality even when the clock voltage swing is reduced to $V_{DD}/2$, while the RCSFF design [3] is functionally correct for clock voltage swings as low as 400mV. The RCSFF topology provides a slightly lower (but similar) CLK-to-Q delay. However, its leakage and PDP are, respectively, 26 and 2.5 times greater than those of the proposed topology at a voltage swing of $V_{DD}/2$. The CLK-to-Q delay and overall power consumption of all of the architectures are compared, respectively, in Fig. 18a and Fig. 18b for different clock voltage swings. From these figures, it is demonstrated that the proposed topology exhibits the lowest delay and power as compared to other topologies.

### 3.3.3 Robustness to PVT variations

The proposed DFF architecture is analyzed by varying the operating conditions of the circuit, including process variations, supply voltage and operating temperature. The SS-0.9V-165°C corner is considered for the worst-case delay analysis, the results of which are listed in Table 5. From the table, it is observed that L_C$^2$MOS_SA [9] and RCSFF [3] fail at this PVT corner. The proposed topology exhibits the lowest CLK-to-Q delay in comparison with the other DFF designs.

Figure 18: Response of DFFs to different clock swings: (a) CLK-to-Q delay (ps) vs clock voltage swing (mV), (b) Total power dissipation (µW) vs clock voltage swing (mV).

Table 5: Worst-case corner analysis of the proposed DFF.

<table>
<thead>
<tr>
<th>DFF Topology</th>
<th>CLK-to-Q delay (ps) at SS-0.9V-165°C</th>
<th>Dynamic power (µW) at FF-1.1V</th>
<th>Leakage power (µW) at FF-1.1V-165°C</th>
</tr>
</thead>
<tbody>
<tr>
<td>PROPOSED</td>
<td>150.8</td>
<td>8.9 (T=-40°C)</td>
<td>1.1</td>
</tr>
<tr>
<td>PASSGATE [12]</td>
<td>160.7</td>
<td>11.0 (T=165°C)</td>
<td>1.7</td>
</tr>
<tr>
<td>L_C$^2$MOS_SA [9]</td>
<td>FAIL</td>
<td>10.8 (T=165°C)</td>
<td>0.6</td>
</tr>
<tr>
<td>L_C$^2$MOS_SA-modified</td>
<td>178.8</td>
<td>11.3 (T=165°C)</td>
<td>0.6</td>
</tr>
<tr>
<td>RCSFF [3]</td>
<td>FAIL</td>
<td>21.5 (T=-40°C)</td>
<td>1.5</td>
</tr>
<tr>
<td>NDKFF [10]</td>
<td>210.0</td>
<td>17.2 (T=165°C)</td>
<td>2.5</td>
</tr>
<tr>
<td>CRFF [11]</td>
<td>175.7</td>
<td>17.7 (T=165°C)</td>
<td>3.8</td>
</tr>
</tbody>
</table>

Traditionally, the mobility increase at lower temperatures is responsible for greater dynamic power dissipation in the circuit, but with reduced feature sizes, threshold voltage variations due to temperature fluctuations cannot be neglected. The DFF designs are, therefore, simulated for both low and high operating temperatures to analyze the overall dynamic power dissipation in the circuit. A 10% increase in supply voltage is also considered. The dynamic power analysis results at the FF-1.1V corner are listed in Table 5. The proposed DFF provides comparatively low dynamic power dissipation at the PVT corner considered for this analysis.

From Fig. 11, it is observed that the sub-threshold current increases with an increase in temperature. Thus, the FF-1.1V-165°C corner is considered for the worst-case analysis of the leakage power. The leakage power obtained is averaged over the four possible logic-0 and logic-1 combinations of the input data and clock signals. From the table, it is shown that the proposed topology exhibits approximately twice the leakage power of the L_C$^2$MOS_SA design [9], which is similar to the results obtained at the typical PVT corner (Table 4). However, the L_C$^2$MOS_SA design [9] fails during the worst-case delay analysis at the SS-0.9V-165°C corner.
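The leakage comparison between the proposed DFF and the L_C$^2$MOS_SA design [9] can be verified directly from the tabulated values. The short sketch below only restates the entries of Table 4 and Table 5; the dictionary layout and key names are hypothetical.

```python
# Leakage power of the proposed DFF and the L_C2MOS_SA design [9], copied from the tables.
LEAKAGE = {
    "typical (Table 4, nW)": {"proposed": 68.1, "l_c2mos_sa": 37.2},
    "FF-1.1V-165C (Table 5, uW)": {"proposed": 1.1, "l_c2mos_sa": 0.6},
}

ratios = {corner: values["proposed"] / values["l_c2mos_sa"]
          for corner, values in LEAKAGE.items()}

for corner, ratio in ratios.items():
    print(f"{corner}: proposed / [9] = {ratio:.2f}")
```

The ratio is close to 1.8 at both corners, which supports the statement that the relative leakage penalty is similar at the typical and worst-case corners.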
4 Low Swing Clock Distribution Network

The reduced swing buffer and low swing D flip-flop architectures discussed in the previous chapters are designed and analyzed in a simplified low swing clock distribution network. Section 2.3.2 discusses the operation of the RS-RS buffer when driven by the FS-RS buffer. To complete the clock distribution network, the RS-RS buffer is used to drive the proposed DFF architecture, as shown in Fig. 19. Since each buffer in a clock tree typically has a large fan-out, the RS-RS buffer is used to drive 50 DFFs. Each DFF drives an individual load of 5fF. Since a DFF requires two low swing clock signals, CLK and _CLK, the FS-RS buffer is used to drive two RS-RS buffers to generate opposite phase clock signals. Note that the circuit utilizes a single power supply voltage of 1V. The simulation results for the low swing clock distribution network are presented in Section 4.1. The robustness to environmental variations is discussed in Section 4.2.

4.1 Simulation Results

Figure 20: Simulation results of a simplified low swing clock distribution network.

The clock distribution network is designed in Cadence using 1V 45nm NCSU technology considering a 1.5GHz clock frequency and a 10ps input clock slew. The data frequency considered for each DFF in the network is 150MHz. Fig. 20 shows the simulation results obtained for the low swing clock distribution network. As shown in this figure, the output of the RS-RS buffer varies with the transitions of input D due to the high sensitivity to load variations (Fig. 13). The gate capacitance of the clocked NMOS transistors varies with the voltage at the drain/source of the transistor, which is controlled by input D. This load sensitivity causes variations in the clock swing. The simulation results of the clock distribution network are listed in Table 6. According to the results, the clock swing driving the DFFs varies by approximately 100mV for CLK and by more than 200mV for _CLK due to buffer load variations. Also note that there is a skew of approximately 17ps between the clocks, which is due to the additional inverter required in the RS-RS buffer to generate two out-of-phase clock signals. Moreover, the clock slew remains the same for the two clock signals. The buffer delay for each buffer is measured from the clock source and is averaged over the low-to-high and high-to-low transitions of the clock signals. Similarly, the leakage power is averaged over the four possible combinations of logic-0 and logic-1 for the data and clock signals.
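The swing variation and the clock skew discussed above follow directly from the entries of Table 6. The sketch below recomputes both quantities from the tabulated values; the variable names are chosen for this example only.

```python
# Values copied from Table 6 (outputs of the two RS-RS buffers).
SWING_RANGE_MV = {"CLK": (722.9, 822.4), "_CLK": (701.0, 917.8)}
BUFFER_DELAY_PS = {"CLK": 83.4, "_CLK": 100.1}

# Spread between the minimum and maximum clock swing seen by the DFFs.
swing_variation_mv = {name: high - low for name, (low, high) in SWING_RANGE_MV.items()}

# Skew between the two out-of-phase clocks, both measured from the clock source.
skew_ps = BUFFER_DELAY_PS["_CLK"] - BUFFER_DELAY_PS["CLK"]

for name, variation in swing_variation_mv.items():
    print(f"{name}: swing varies by {variation:.1f} mV")
print(f"skew between CLK and _CLK: {skew_ps:.1f} ps")
```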
Table 6: Simulation results for a simplified low swing clock distribution network.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>FS-RS Buffer</th>
<th>RS-RS Buffer (CLK)</th>
<th>RS-RS Buffer (_CLK)</th>
<th>Proposed DFF</th>
</tr>
</thead>
<tbody>
<tr>
<td>Voltage swing (mV)</td>
<td>698.6</td>
<td>722.9-822.4</td>
<td>701-917.8</td>
<td>N/A</td>
</tr>
<tr>
<td>Clock slew (ps)</td>
<td>16.9</td>
<td>52.8</td>
<td>52.8</td>
<td>N/A</td>
</tr>
<tr>
<td>Buffer delay (ps)</td>
<td>21.7</td>
<td>83.4</td>
<td>100.1</td>
<td>N/A</td>
</tr>
<tr>
<td>CLK-to-Q delay (ps)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>130.8</td>
</tr>
<tr>
<td>Dynamic power (µW)</td>
<td></td>
<td></td>
<td></td>
<td>336.6</td>
</tr>
<tr>
<td>Leakage power (µW)</td>
<td></td>
<td></td>
<td></td>
<td>3.64</td>
</tr>
</tbody>
</table>

4.2 Robustness to PVT Variations

To verify the reliability of the low swing clock distribution network, the design is analyzed for different environmental and process variations. Similar to the previous analyses discussed for the custom reduced swing buffers and the proposed DFF, the worst-case corner for the delay and slew analysis is SS-0.9V-165°C, while the corner considered for the leakage and dynamic power analysis is FF-1.1V-165°C. The design is also evaluated at lower temperatures for the fast corner during power analysis since, with reduced feature sizes, the threshold voltage variations should be considered. From the simulation results, it is observed that the worst-case dynamic power is obtained at the highest temperature. The delay and power analysis results are listed in Table 7. Note that the buffer delay and CLK-to-Q delay are measured from the clock source and averaged over the low-to-high and high-to-low transitions. The results demonstrate reliable operation of the low swing clock distribution network considering the large fan-out at the output of the RS-RS buffers.

Table 7: Worst-case analysis of a simplified low swing clock distribution network.

<table>
<thead>
<tr>
<th>PVT corner</th>
<th>Parameter</th>
<th>FS-RS Buffer</th>
<th>RS-RS Buffer (CLK)</th>
<th>RS-RS Buffer (_CLK)</th>
<th>Proposed DFF</th>
</tr>
</thead>
<tbody>
<tr>
<td>SS, 0.9V, 165°C</td>
<td>Buffer delay (ps)</td>
<td>61.8</td>
<td>436.2</td>
<td>835.7</td>
<td>N/A</td>
</tr>
<tr>
<td></td>
<td>Clock slew (ps)</td>
<td>49.1</td>
<td>161.1</td>
<td>157.4</td>
<td>N/A</td>
</tr>
<tr>
<td></td>
<td>CLK-to-Q delay (ps)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>358.3</td>
</tr>
<tr>
<td>FF, 1.1V, 165°C</td>
<td>Dynamic power (µW)</td>
<td></td>
<td>489.1</td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>Leakage power (µW)</td>
<td></td>
<td></td>
<td>56.15</td>
<td></td>
</tr>
</tbody>
</table>

A primary concern is the sensitivity of the reduced swing buffer to load variations, as it degrades the robustness of the design. This issue can be alleviated by designing a DFF with constant input capacitance such as a tristate DFF.

Another issue is the routing complexity arising from the distribution of two clock signals for each DFF at the output of the RS-RS buffer. Since the two clock signals are out-of-phase, signal and noise integrity are degraded. In addition, the routing capacitance is doubled and the area increases. This issue can be partially alleviated by distributing a single low swing clock signal and generating the opposite phase clock inside the DFF architecture. This approach incurs an area overhead within the DFF, but reduces the routing complexity and hence the capacitance at the leaf of a clock distribution network.
5 Conclusions

The primary objective of this thesis is to reduce the swing of a clock signal along the distribution network using a single $V_{DD}$ supply. To accomplish this objective, custom delay chain based reduced swing buffer architecture [8] is discussed and compared with other reduced swing buffers introduced in [6] and [7]. For the project, the buffers in [7] and [8] are topologically modified to obtain the desired reduced clock swing (0-700mV). Design solutions are also applied to reduce the leakage power. Low clock swing based D flip-flops are also analyzed while proposing a novel low swing DFF architecture. All of the simulations are performed using 1V 45nm NCSU technology considering the clock frequency and input slew of 1.5GHz and 10ps, respectively.

For a fan-out of 50fF, the delay chain based reduced swing buffer provides approximately 14% and 7% improvement in, respectively, dynamic power dissipation and clock slew, with negligible increase in the buffer delay as compared to the conventional buffer. This buffer is also analyzed for different process and environmental variations such as operating voltage and temperature. Although the dynamic power dissipation and clock slew at the output are reduced as compared to the conventional buffer, the delay chain based reduced swing buffer exhibits an increase in the buffer delay and leakage power. This buffer is also shown to be robust to input clock slew variations, since the output voltage level of the buffer is maintained at the desired 700mV with negligible variations in the output slew. This buffer, however, is sensitive to output load variations, which drastically affect the output clock swing. This issue can be alleviated by pre-determining the fan-out of the buffer and sizing the transistors accordingly, or by designing a low clock swing based DFF that can operate reliably at a clock voltage swing as low as 500mV.

Various low clock swing DFF architectures have been previously proposed, which are discussed and analyzed. A novel low swing DFF design is also proposed. The data frequency considered for the simulations is 150MHz. The proposed DFF is an area efficient design with considerable improvement in the performance in terms of power and power-delay product, which are reduced by, respectively, 33.6% and 39.5%, as compared to other low swing DFFs. The proposed flip-flop is also analyzed for different clock swings. It is observed that the proposed DFF operates reliably, with minimum PDP values, even if the clock voltage swing is reduced to $V_{DD}/2$. The robustness of the architectures is also compared considering PVT variations where the proposed design exhibits 19.34% and 38.91% average improvement in, respectively, CLK-to-Q delay and dynamic power dissipation.

A simplified low swing clock distribution network is also analyzed by driving 50 proposed DFFs with the reduced swing buffers. The simulation results demonstrate a robust design which is reliable at the worst-case corners. The limitation is area overhead due to routing complexity at the leaf of the clock distribution network. This issue can be alleviated by modifying the DFF topology, which remains as future work.
References


