# Open Source Cell Library *Mono3D* to Develop Large-Scale Monolithic 3D Integrated Circuits

Chen Yan, Scott Kontak, Hailang Wang, and Emre Salman Department of Electrical and Computer Engineering Stony Brook University (SUNY), Stony Brook, New York, USA 11794 {chen.yan, emre.salman}@stonybrook.edu

*Abstract*—Monolithic three-dimensional (3D) integrated circuits (ICs) achieve ultra-high density device integration through fine-grained connectivity enabled by monolithic inter-tier vias (MIVs). In this paper, an open source standard cell library for design automation of large-scale transistor-level monolithic 3D ICs is proposed. A 128-point, highly parallelized FFT core with 330K cells is implemented with the proposed library. Power and timing characteristics of monolithic 3D ICs are quantified. The effect of signal integrity and routing congestion on timing characteristics is investigated. The primary clock tree characteristics of monolithic 3D ICs are also discussed. The proposed open source cell library facilitates future research on multiple aspects of monolithic 3D technology.

### I. INTRODUCTION

During the past decade, through silicon via (TSV) based three-dimensional (3D) integrated circuits (ICs) attracted significant attention due to promising characteristics in reducing global interconnect delay, increasing transistor density, and enabling heterogeneous integration [1], [2]. TSVs, however, are long vertical vias etched within the silicon substrate with typical diameters in the range of several micrometers. Thus, TSVs are several orders of magnitude larger than nanoscale devices, thereby limiting the integration density and the power and performance advantages of vertical integration due to significant TSV capacitance [3]–[5].

Monolithic inter-tier via (MIV) based 3D integration enables significantly higher interconnect density because MIVs are comparable in size to conventional on-chip metal vias, which is made possible by the high alignment precision and the thin top layer [6]. Thus, circuit-level research on monolithic 3D technology has grown significantly, particularly after the highly encouraging developments in sequentially fabricating multiple transistor layers at a controlled temperature [7].

Three design methods have been proposed for monolithic 3D integration: transistor-, gate-, and block-level [8]. In transistor-level monolithic 3D integration, which is the focus of this paper, the nMOS and pMOS transistors within a circuit are separated into two different tiers, as depicted in Fig. 1. This approach not only achieves fine-grained 3D integration with intra-cell MIVs, but also enables the individual optimization of the bottom and top tier devices. In gate-level monolithic 3D integration, multiple cells within a functional block are partitioned into multiple tiers. MIVs are utilized for inter-cell communication. Finally, block-level monolithic 3D integration represents a more coarse-grained integration where the partitioning of the IC is achieved based on individual functional blocks.

This research is supported by the National Science Foundation CAREER grant under Grant CCF-1253715.

Fig. 1. Cross-sections of the conventional 2D and transistor-level monolithic (TL-Mono) 3D technology with two tiers (panels: 2D Design, TL-Mono3D Design). The top tier hosts the nMOS transistors whereas the pMOS transistors are placed within the bottom tier.

In this work, an open source cell library [9] based on full-custom design of each cell is developed and fully integrated into a design flow for transistor-level monolithic 3D integration. A 128-point FFT core [10] is implemented, providing useful insight into the power, timing, clock tree, and routing/congestion characteristics of large-scale monolithic 3D ICs.

The rest of the paper is organized as follows. Related previous work and the contributions of this paper are summarized in Section II. The details of the proposed open source cell library, its characterization, and a comparison with 2D cells are provided in Section III. Power/timing and several important physical design characteristics of an FFT core with monolithic 3D implementation are investigated in Section IV. Finally, the paper is concluded in Section V.

### II. SUMMARY OF PREVIOUS WORKS AND CONTRIBUTIONS OF THIS PAPER

Liu and Lim have investigated the design tradeoffs in monolithic 3D ICs, providing physical design guidelines and insight into the routability issue [11]. The authors, however, have assumed that monolithic 3D gates and traditional 2D gates have the same power and timing characteristics. Lee *et al.* have addressed this limitation by individually characterizing transistor-level monolithic 3D cells [12]. The power characteristics of several 3D monolithic benchmark circuits have been investigated and compared with 2D versions at similar timing performance. The authors, however, have adopted the cell-folding method and used the same pull-up and pull-down networks as in 2D cells. As a result, the proposed 3D cells are not optimized for footprint. In addition, the timing constraints are relatively relaxed, which may prevent an investigation of the behavior of the monolithic 3D technology under tighter clock frequency constraints. Power benefits of transistor-level monolithic 3D ICs through custom design of a cell library in 14 nm technology have also been demonstrated [13]. A cell-level RC extraction methodology is described. The authors, however, did not investigate the timing characteristics and the effect of routing congestion on timing.

The primary contributions of this paper are as follows: (1) monolithic 3D cells are developed with a full-custom methodology and a cell-stacking technique while optimizing the footprint; a higher reduction in footprint is achieved as compared to [12], and the automated cell characterization results are verified with SPICE-level simulations; (2) detailed data, such as the effect of coupling capacitance on power/timing, are provided to investigate the important issue of routing congestion in monolithic 3D ICs; (3) both the performance and power characteristics of a large-scale 3D monolithic IC are investigated at tight timing constraints; (4) detailed data on clock tree characteristics are provided; and (5) the proposed cell library and all of the related automation files are made publicly available [9] to facilitate future research on important aspects of 3D monolithic integration such as thermal integrity, design-for-test, and the interaction between manufacturing/device development and the design process. An example of such an effort for 3D hardware security is provided in [14]. To the best of the authors' knowledge, this study presents the first open source library with full integration into a design flow for monolithic 3D ICs.

### III. OPEN SOURCE CELL LIBRARY FOR MONOLITHIC 3D ICS

The characteristics of the proposed cell library are described in Section III-A. The design flow to integrate the proposed library into the design process is discussed in Section III-B. Cell-level simulation results are provided in Section III-C.

#### A. Library Development

In this work, *Mono3D*, an open source standard cell library for transistor-level monolithic 3D technology, is developed in 45 nm technology [9]. *Mono3D* consists of two tiers where each tier is based on the 2D 45 nm process design kit *FreePDK45* from North Carolina State University (NCSU) [15]. Thus, the process and physical characteristics (transistor models and characteristics of the on-chip metal layers) are obtained from *FreePDK45*. Similar to [12], [13], the pull-down network of a CMOS gate (nMOS transistors) is built within the top tier whereas the pull-up network (pMOS transistors) is fabricated within the bottom tier. The nMOS and pMOS device characteristics are the same as in 2D *FreePDK45*. However, the impact of novel devices and manufacturing steps for 3D monolithic integration can be captured by replacing/modifying the device models within the provided design kit. System-level effects of varying device characteristics (due to, for example, the processing of the top tiers and the impact of high temperature) can therefore be investigated.

Fig. 2. Comparison of the layout views of a D-flip-flop in traditional 2D and transistor-level monolithic 3D technology. The top and bottom tiers are separately depicted for the 3D technology.

Fig. 3. Integration of the proposed open source cell library into design flow, illustrating the required modifications.

In the proposed *Mono3D*, five metal layers are allocated to the bottom tier (metal1\_btm to metal5\_btm), as illustrated in Fig. 1. These metal layers are primarily for power delivery, but can also be used for signal routing when the metals on the top tier are not sufficient due to routing congestion caused by the smaller footprint. The top tier is separated from the bottom tier by an inter-layer dielectric (ILD) with a thickness of 100 nm. Inter-tier coupling is minimized at this thickness [6]. The 10 metal layers that exist in 2D *FreePDK45* are retained for the top tier in *Mono3D*. The intra-cell connections that span the two tiers are achieved by MIVs. Each MIV has a width of 100 nm and a height of 270 nm [8].

Currently, 16 standard cells exist in *Mono3D*. Each cell is developed with a full-custom design methodology using a cell-stacking technique. The cell height in *Mono3D* is 1.135 $\mu$m, which is 54% smaller than the standard cell height (2.47 $\mu$m) in the *Nangate* 45 nm cell library. The layouts of the 2D and *Mono3D* D-flip-flop cells are compared in Fig. 2, illustrating the MIVs. Note that the width of the 3D flip-flop cell increases by approximately 7% due to MIVs and intra-cell routing.
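The height and width figures quoted above can be combined into a rough footprint estimate for a single cell. The following is a minimal sketch rather than part of the library: it assumes that the 2D flip-flop uses the 2.47 $\mu$m cell height and that the 3D flip-flop is exactly 1.07 times as wide, and the variable and function names are illustrative only.

```python
# Values quoted in the text above. The D-flip-flop estimate additionally
# ASSUMES that the 2D flip-flop uses the 2.47 um cell height and that the 3D
# cell is exactly 1.07x as wide ("approximately 7%" in the text).
MONO3D_CELL_HEIGHT_UM = 1.135
CELL_HEIGHT_2D_UM = 2.47
DFF_WIDTH_FACTOR = 1.07


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction of `after` with respect to `before`, in percent."""
    return 100.0 * (before - after) / before


height_reduction_pct = percent_reduction(CELL_HEIGHT_2D_UM, MONO3D_CELL_HEIGHT_UM)

# Footprint scales as height x width, so the two ratios multiply.
dff_area_ratio = (MONO3D_CELL_HEIGHT_UM / CELL_HEIGHT_2D_UM) * DFF_WIDTH_FACTOR
dff_area_reduction_pct = 100.0 * (1.0 - dff_area_ratio)

print(f"cell height reduction: {height_reduction_pct:.1f}%")
print(f"estimated DFF footprint reduction: {dff_area_reduction_pct:.1f}%")
```

Under these assumptions the height reduction rounds to the 54% stated above, and the estimated flip-flop footprint reduction is slightly above 50%.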

#### B. Design Flow

The design flow and the modifications required for 3D monolithic technology are depicted in Fig. 3. A new technology file (*.tf*) is generated for *Mono3D* to include all of the new layers (interconnects, via, ILD, and MIV). Based on these modifications, a new display resource file (*.drf*) is generated to develop full-custom layouts of the 3D cells. The design rule check (DRC), layout versus schematic (LVS), and parasitic extraction (PEX) are performed using *Calibre*. The DRC rule file is modified to include new features for the additional metal layers, vias, transistors, ILD, and MIV.

TABLE I AVERAGE DELAY AND POWER CHARACTERISTICS OF 2D AND MONOLITHIC 3D CELLS. Values in parentheses are the improvement over 2D; negative values indicate degradation.

| Cell     | Delay 2D (ps) | Delay 3D (ps)   | Power 2D ($\mu$W) | Power 3D ($\mu$W) |
|----------|---------------|-----------------|-------------------|-------------------|
| AND2X1   | 17.60         | 18.09 (-2.77%)  | 2.82              | 2.61 (7.24%)      |
| AOI21X1  | 13.68         | 14.29 (-4.45%)  | 3.32              | 3.49 (-4.84%)     |
| BUFX2    | 17.89         | 17.01 (4.88%)   | 14.04             | 13.10 (6.65%)     |
| CLKBUF1  | 27.01         | 30.58 (-13.22%) | 64.07             | 67.91 (-5.99%)    |
| DFFPOSX1 | 41.62         | 34.06 (18.17%)  | 26.75             | 24.99 (6.58%)     |
| INVX1    | 6.73          | 7.30 (-8.44%)   | 4.69              | 4.97 (-5.98%)     |
| INVX2    | 6.54          | 6.53 (0.09%)    | 9.31              | 9.33 (-0.23%)     |
| INVX4    | 6.44          | 6.99 (-8.69%)   | 18.29             | 18.72 (-2.36%)    |
| MUX2X1   | 16.25         | 16.37 (-0.77%)  | 5.81              | 5.91 (-1.75%)     |
| NAND2X1  | 10.06         | 9.75 (3.09%)    | 1.63              | 1.60 (2.03%)      |
| NOR2X1   | 11.33         | 11.81 (-4.26%)  | 1.61              | 1.68 (-4.35%)     |
| OAI21X1  | 12.89         | 12.69 (1.50%)   | 3.27              | 3.23 (1.20%)      |
| OR2X1    | 18.33         | 19.67 (-7.30%)  | 2.54              | 2.71 (-6.79%)     |
| XNOR2X1  | 36.05         | 39.50 (-9.58%)  | 12.66             | 13.65 (-7.80%)    |
| XOR2X1   | 35.49         | 39.34 (-10.86%) | 12.53             | 13.42 (-7.18%)    |
| Average  | 18.53         | 18.93 (-2.20%)  | 12.22             | 12.49 (-2.18%)    |

The LVS rule file is also modified for the tool to be able to independently identify transistors located in separate tiers. The extracted netlist with MIVs is analyzed to accurately extract the interconnections between nMOS and pMOS transistors. The RC extraction rule file is modified to be able to extract the impedances of the additional metal layers and MIVs. A single MIV is characterized with a resistance of 2 $\Omega$ and a capacitance of 0.1 fF [8].

After *RC* extraction, 3D cells are characterized with *Encounter Library Characterizer (ELC)* to obtain the timing and power characteristics of each cell. The extracted 3D cell netlists are also simulated with *HSPICE* to ensure the accuracy of the characterization process. More details on the footprint, timing, and power characteristics of the 3D cells and comparison with 2D cells are provided in Section III-C.

The *.lib* file for the *Mono3D* generated by *ELC* is converted into the *.db* format, which is used for circuit synthesis, placement, clock tree synthesis, and routing. Since all of the I/O pins of the 3D cells are located within the top tier, existing physical design tools can be used for these steps.

#### C. Cell-Level Evaluation

1) Footprint: Cell-level footprint reduction varies from 27% to 68%, depending upon the specific cell. An average improvement of 46% is achieved. Note that despite more than 50% reduction in cell height, the average area reduction is less than 50% since, on average, the cell width slightly increases due to MIVs and intra-cell routing.

2) Delay and Power Consumption: HSPICE simulations are performed on the extracted 3D netlists to compare monolithic 3D technology with the conventional 2D technology at the cell level. At 1.1 V power supply, 50 ps transition time, and 27°C temperature, average delay and power consumption are analyzed, as listed in Table I. According to this table, *Mono3D* cells have, on average, 2.2% higher propagation delay and 2.18% higher power consumption as compared to the 2D standard cells. This slight increase in delay and power is due to the denser cell layout, which produces additional coupling capacitances and MIV impedances. Note that in the DFF cell, both delay (clock-to-Q delay) and power are improved as compared to the 2D cell since the DFF cell has a relatively longer average interconnect length, where the monolithic 3D technology is helpful. Also note that the standard cells can be further optimized to reduce delay and power at the expense of reduced improvement in footprint.

Fig. 4. The layout views of a highly parallelized 128-point FFT core in (a) conventional 2D technology, (b) transistor-level monolithic 3D technology with two tiers.
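The averages reported in Table I can be recomputed directly from the per-cell entries. The sketch below copies the 15 cell rows of Table I and evaluates the column means and the relative change of the means; the names `TABLE_I` and `mean` are illustrative. Because the table entries are rounded, the recomputed percentages are expected to agree with the reported 2.2% and 2.18% only to within rounding.

```python
# (cell, delay_2d_ps, delay_3d_ps, power_2d_uW, power_3d_uW), copied from Table I.
TABLE_I = [
    ("AND2X1", 17.60, 18.09, 2.82, 2.61),
    ("AOI21X1", 13.68, 14.29, 3.32, 3.49),
    ("BUFX2", 17.89, 17.01, 14.04, 13.10),
    ("CLKBUF1", 27.01, 30.58, 64.07, 67.91),
    ("DFFPOSX1", 41.62, 34.06, 26.75, 24.99),
    ("INVX1", 6.73, 7.30, 4.69, 4.97),
    ("INVX2", 6.54, 6.53, 9.31, 9.33),
    ("INVX4", 6.44, 6.99, 18.29, 18.72),
    ("MUX2X1", 16.25, 16.37, 5.81, 5.91),
    ("NAND2X1", 10.06, 9.75, 1.63, 1.60),
    ("NOR2X1", 11.33, 11.81, 1.61, 1.68),
    ("OAI21X1", 12.89, 12.69, 3.27, 3.23),
    ("OR2X1", 18.33, 19.67, 2.54, 2.71),
    ("XNOR2X1", 36.05, 39.50, 12.66, 13.65),
    ("XOR2X1", 35.49, 39.34, 12.53, 13.42),
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


avg_delay_2d = mean([row[1] for row in TABLE_I])
avg_delay_3d = mean([row[2] for row in TABLE_I])
avg_power_2d = mean([row[3] for row in TABLE_I])
avg_power_3d = mean([row[4] for row in TABLE_I])

# Relative increase of the 3D column mean over the 2D column mean, in percent.
delay_increase_pct = 100.0 * (avg_delay_3d - avg_delay_2d) / avg_delay_2d
power_increase_pct = 100.0 * (avg_power_3d - avg_power_2d) / avg_power_2d

print(f"average delay: {avg_delay_2d:.2f} ps (2D), {avg_delay_3d:.2f} ps (3D)")
print(f"average power: {avg_power_2d:.2f} uW (2D), {avg_power_3d:.2f} uW (3D)")
print(f"delay increase: {delay_increase_pct:.2f}%, power increase: {power_increase_pct:.2f}%")
```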

### IV. EXPERIMENTAL RESULTS

A parallel 128-point FFT core operating at 1.5 GHz is analyzed in this section to quantify the benefits of transistor-level 3D technology. Note that in the 3D FFT core, 10 metal layers are not sufficient to route the placed design due to the significant reduction in footprint. Thus, 15 metal layers (both bottom and top tiers) are used to provide sufficient metal resources for signal routing. Both the 2D and 3D versions of the FFT core are depicted in Fig. 4. In the 3D version, the footprint and overall wirelength are reduced by, respectively, 51% and 20%, as listed in Table II. No DRC violations are reported for the 2D (with 10 metal layers) and 3D (with 15 metal layers) designs.

The 20% reduction in wirelength enables approximately 22% reduction in net power. The internal power is also reduced by approximately 10%, partly due to the type of cells used in the design and partly due to reduction in short-circuit power (since the interconnect lengths are shorter and signal transitions are faster). For example, there are approximately 97K flip-flops in the design and at the cell-level, a 3D flip-flop consumes 6.58% less power than a 2D flip-flop (see Table I). Overall, the monolithic 3D technology achieves approximately 13% reduction in power, as listed in Table III.
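The percentages quoted in this paragraph follow from the power components listed in Table III. As a minimal sketch, the snippet below recomputes the totals and the per-component improvement from the table entries; the dictionary layout and the helper name `improvement_pct` are illustrative choices.

```python
# Power components in mW, copied from Table III (INT: internal, SWI: switching/net, LK: leakage).
POWER_MW = {
    "2D": {"INT": 8830, "SWI": 2781, "LK": 149},
    "3D_15": {"INT": 7907, "SWI": 2181, "LK": 145},
}


def improvement_pct(base: float, new: float) -> float:
    """Reduction of `new` relative to `base`, in percent."""
    return 100.0 * (base - new) / base


# Total power is the sum of the three components.
totals = {style: sum(parts.values()) for style, parts in POWER_MW.items()}

improvement = {
    name: improvement_pct(POWER_MW["2D"][name], POWER_MW["3D_15"][name])
    for name in ("INT", "SWI", "LK")
}
improvement["Total"] = improvement_pct(totals["2D"], totals["3D_15"])

for name, value in improvement.items():
    print(f"{name}: {value:.1f}% lower in 3D_15")
```

The component sums reproduce the totals listed in Table III, and the switching and total improvements round to the 21.6% and 13% reported there.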

TABLE II Comparison of footprint and wirelength in 2D FFT and monolithic 3D FFT with 15 (3D\_15) metal layers. *IMP* refers to improvement over 2D technology.

| Circuit | Design style | Area (mm<sup>2</sup>) | Imp (%) | Wirelength (m) | Imp (%) |
|---------|--------------|-----------------------|---------|----------------|---------|
| FFT128  | 2D           | 2.55                  | -       | 13.3           | -       |
|         | 3D_15        | 1.25                  | 51      | 10.6           | 20      |

TABLE III Comparison of power consumption in 2D FFT and monolithic 3D FFT with 15 (3D\_15) metal layers. *INT, SWI*, and *LK* refer, respectively, to internal, switching (net), and leakage power. *IMP* refers to improvement over 2D technology.

| Circuit | Design style | INT (mW) | SWI (mW) (Imp) | LK (mW) | Total (mW) (Imp) |
|---------|--------------|----------|----------------|---------|------------------|
| FFT128  | 2D           | 8,830    | 2,781 (-)      | 149     | 11,760 (-)       |
|         | 3D_15        | 7,907    | 2,181 (21.6%)  | 145     | 10,233 (13%)     |

TABLE IV Comparison of timing characteristics in 2D FFT and monolithic 3D FFT with 15 metal layers with (3D\_15\_CC) and without (3D\_15\_NO\_CC) coupling capacitance. WNS and TNS refer, respectively, to worst negative slack and total negative slack.

| Circuit | Design style | WNS (ns) | TNS (ns) | Number of violations |
|---------|--------------|----------|----------|----------------------|
| FFT128  | 2D           | -0.102   | -462.905 | 13365                |
|         | 3D_15_CC     | -0.144   | -714.179 | 14624                |
|         | 3D_15_NO_CC  | -0.082   | -180.563 | 8152                 |

The timing characteristics of the 2D and monolithic 3D circuits are compared in Table IV where the worst negative slack (WNS), total negative slack (TNS), and number of timing violations are listed. There is timing degradation in the 3D FFT design due to both routing congestion and the increase in average gate-level delay in the proposed 3D library. Specifically, the magnitude of the WNS increases by 42 ps and there are 1,259 more timing violations, which increase the magnitude of the TNS by approximately 251 ns. If the coupling capacitances are ignored in the 3D design, the WNS is reduced from 144 ps to 82 ps, which is 20 ps less than in the 2D design, and the number of timing violations is reduced by 6,472. Thus, ignoring the coupling capacitance causes the 3D design to outperform the 2D design, demonstrating the importance of interconnects and routing congestion in large circuits. The effect of coupling capacitance on timing is stronger in 3D technology, where the WNS changes by 62 ps due to coupling capacitance (as opposed to 52 ps in 2D technology).

Thus, routing congestion and signal integrity induced timing degradation should be carefully considered in large-scale monolithic 3D ICs. For example, for relatively low performance applications with relaxed timing constraints, monolithic 3D technology can be leveraged to achieve the highest reduction in footprint (and therefore cost) by developing highly dense 3D cell layouts. For high performance applications with tighter timing constraints, however, interconnects and the routing process play a significant role in system timing and power consumption. In this case, 3D cells should be optimized to provide additional routing space to alleviate routing congestion and signal integrity induced timing degradation at the expense of reduced savings in footprint.
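To put the slack values of Table IV in perspective, they can be translated into an approximate achievable clock frequency. The sketch below is an estimate under stated assumptions, not a result from the paper: it assumes that the WNS values are setup slacks measured against the 1.5 GHz target clock and that relaxing the clock period by the WNS magnitude would close timing. The function name `achievable_freq_ghz` is hypothetical.

```python
TARGET_FREQ_GHZ = 1.5  # operating frequency of the FFT core stated in Section IV

# Worst negative slack in ns, copied from Table IV.
WNS_NS = {"2D": -0.102, "3D_15_CC": -0.144, "3D_15_NO_CC": -0.082}


def achievable_freq_ghz(target_freq_ghz: float, wns_ns: float) -> float:
    """Estimate the frequency at which the worst path would just meet timing.

    ASSUMPTION: the slack is a setup slack against the target period, so a
    negative slack lengthens the required clock period by its magnitude.
    """
    target_period_ns = 1.0 / target_freq_ghz
    required_period_ns = target_period_ns - wns_ns
    return 1.0 / required_period_ns


estimated_freq = {
    style: achievable_freq_ghz(TARGET_FREQ_GHZ, wns) for style, wns in WNS_NS.items()
}

# Shift in WNS attributable to coupling capacitance in the 3D design, in ps.
coupling_penalty_3d_ps = round((WNS_NS["3D_15_NO_CC"] - WNS_NS["3D_15_CC"]) * 1000)

for style, freq in estimated_freq.items():
    print(f"{style}: about {freq:.2f} GHz")
print(f"coupling-induced WNS shift in 3D: {coupling_penalty_3d_ps} ps")
```

Under these assumptions, the ordering of the estimated frequencies mirrors the discussion above: the 3D design without coupling capacitance is the fastest, followed by the 2D design and then the 3D design with coupling capacitance.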

Since clock networks play a significant role in both performance and power in large circuits, the clock tree synthesis (CTS) results of the FFT core are also reported to quantify the benefits of monolithic 3D technology in clocking. The number of sinks for both designs is 96,755. Both the skew and slew constraints are set to 100 ps. Due to reduced footprint, the number of clock buffers is reduced from 6,836 to 5,744, which reduces the clock internal power by approximately 30%. The clock wirelength is also reduced by 22% and the clock net power is reduced by approximately 28%. The overall clock power is reduced by 29%.

Both the 2D and 3D designs exhibit slew violations, but the slew is significantly improved in the 3D clock network (from 149 ps to 111 ps) due to shorter and therefore less resistive clock nets. The global skew slightly increases, from 85.9 ps in the 2D FFT to 113.1 ps in the 3D FFT. Despite this slight increase in global skew, the 3D design exhibits lower clock insertion delays. Lower insertion delays are helpful in reducing the variation-induced skew or corner-to-corner skew variation.
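The reported clock tree figures can be checked against the 100 ps skew and slew constraints with a few lines of arithmetic. This is a minimal sketch that assumes the quoted slew and skew values are worst-case numbers directly comparable with the constraints; the dictionary keys and the helper `margin_ps` are illustrative.

```python
CONSTRAINT_PS = 100.0  # both the skew and slew constraints are set to 100 ps

# Clock tree figures quoted in the text for the 2D and 3D FFT designs.
CLOCK_TREE = {
    "2D": {"buffers": 6836, "slew_ps": 149.0, "skew_ps": 85.9},
    "3D": {"buffers": 5744, "slew_ps": 111.0, "skew_ps": 113.1},
}


def margin_ps(value_ps: float, constraint_ps: float = CONSTRAINT_PS) -> float:
    """Positive margin means the constraint is met; negative means it is exceeded."""
    return constraint_ps - value_ps


buffer_reduction_pct = 100.0 * (
    CLOCK_TREE["2D"]["buffers"] - CLOCK_TREE["3D"]["buffers"]
) / CLOCK_TREE["2D"]["buffers"]

slew_margin = {k: margin_ps(v["slew_ps"]) for k, v in CLOCK_TREE.items()}
skew_margin = {k: margin_ps(v["skew_ps"]) for k, v in CLOCK_TREE.items()}

print(f"clock buffer reduction: {buffer_reduction_pct:.1f}%")
for style in CLOCK_TREE:
    print(f"{style}: slew margin {slew_margin[style]:+.1f} ps, "
          f"skew margin {skew_margin[style]:+.1f} ps")
```

The negative slew margins for both designs correspond to the slew violations noted above, with the 3D design much closer to the constraint.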

### V. CONCLUSION

An open source transistor-level monolithic 3D cell library is developed and integrated into an existing design flow. Important characteristics of monolithic 3D ICs (such as footprint, timing and power consumption, routing congestion, signal integrity induced timing degradation, and clocking) have been investigated. The entire library and related files are publicly available to facilitate future research in monolithic 3D integration technology [9].

### REFERENCES

- [1] E. Salman and E. G. Friedman, *High Performance Integrated Circuit Design*. McGraw-Hill Professional, Aug. 2012.
- [2] V. F. Pavlidis and E. G. Friedman, *Three-Dimensional Integrated Circuit Design*. Morgan Kaufmann, July 2010.
- [3] D. H. Kim, K. Athikulwongse, and S. K. Lim, "A Study of Through-Silicon-Via Impact on the 3D Stacked IC Layout," in *Proc. of the ACM Int. Conf. Computer-Aided Design*, Nov. 2009, pp. 674–680.
- [4] I. Savidis and E. G. Friedman, "Closed-Form Expressions of 3D Via Resistance, Inductance, and Capacitance," *IEEE Transactions on Electron Devices*, vol. 56, no. 9, pp. 1873–1881, Sep. 2009.
- [5] H. Wang and E. Salman, "Decoupling Capacitor Topologies for TSV-Based 3-D ICs With Power Gating," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 23, no. 12, pp. 2983–2991, Dec. 2015.
- [6] P. Batude *et al.*, "GeOI and SOI 3D Monolithic Cell Integrations for High Density Applications," in *Proceedings of the IEEE International Symposium on VLSI Technology*, June 2009, pp. 166–167.
- [7] O. Billoint *et al.*, "From 2D to Monolithic 3D: Design Possibilities, Expectations and Challenges," in *Proceedings of the ACM International Symposium on Physical Design*, Mar. 2015, p. 127.
- [8] S. A. Panth, K. Samadi, Y. Du, and S. K. Lim, "Design and CAD Methodologies for Low Power Gate-Level Monolithic 3D ICs," in *Proceedings of the ACM International Symposium on Low Power Electronics and Design*, Aug. 2014, pp. 171–176.
- [9] "Mono3D, Open Source Cell Library for Monolithic 3D Integration." [Online]. Available: http://nanocas.ece.stonybrook.edu/mono3d/
- [10] P. Milder, F. Franchetti, J. C. Hoe, and M. Püschel, "Computer Generation of Hardware for Linear Digital Signal Processing Transforms," *ACM Transactions on Design Automation of Electronic Systems*, vol. 17, no. 2, pp. 15:1–15:33, Apr. 2012.
- [11] C. Liu and S. K. Lim, "A Design Tradeoff Study with Monolithic 3D Integration," in *Proceedings of the IEEE International Symposium on Quality Electronic Design*, Mar. 2012, pp. 529–536.
- [12] Y.-J. Lee, D. Limbrick, and S. K. Lim, "Power Benefit Study for Ultra-High Density Transistor-Level Monolithic 3D ICs," in *Proceedings of the ACM Design Automation Conference*, May 2013, pp. 104:1–104:10.
- [13] J. Shi *et al.*, "On the Design of Ultra-High Density 14nm Finfet Based Transistor-Level Monolithic 3D ICs," in *Proceedings of the IEEE Computer Society Annual Symposium on VLSI*, July 2016, pp. 449–454.
- [14] J. Dofe, C. Yan, S. Kontak, E. Salman, and Q. Yu, "Transistor-Level Camouflaged Logic Locking Method for Monolithic 3D IC Security," in *Proc. of the IEEE Asian Hardware-Oriented Security and Trust*, Dec. 2016.
- [15] "FreePDK45." [Online]. Available: http://www.eda.ncsu.edu/wiki/FreePDK45:Contents