### AC Computing Methodology for Wirelessly Powered Devices

A Dissertation Presented

by

### Tutu Wan

to

The Graduate School

in Partial Fulfillment of the

Requirements

for the Degree of

### **Doctor of Philosophy**

in

### **Electrical Engineering**

Stony Brook University

December 2019

#### **Stony Brook University**

The Graduate School

#### Tutu Wan

We, the dissertation committee for the above candidate for the Doctor of Philosophy degree, hereby recommend acceptance of this dissertation.

Dr. Emre Salman - Advisor of Dissertation, Associate Professor, Department of Electrical and Computer Engineering

Dr. Milutin Stanaćević - Chairperson of Defense, Associate Professor, Department of Electrical and Computer Engineering

Dr. Fan Ye - Defense Committee Member, Associate Professor, Department of Electrical and Computer Engineering

Dr. Samir R. Das - Defense Committee Member, Professor and Chair, Department of Computer Science

This dissertation is accepted by the Graduate School

Eric Wertheimer, Dean of the Graduate School

#### Abstract of the Dissertation

#### AC Computing Methodology for Wirelessly Powered Devices

by

#### Tutu Wan

#### **Doctor of Philosophy**

in

#### **Electrical Engineering**

Stony Brook University

2019

Limited energy is a significant challenge for Internet-of-things (IoT) based devices since frequent battery replacement is not practical. Various energy harvesting techniques, such as photovoltaic, piezoelectric, thermoelectric, and radio frequency (RF) power harvesting, have previously been proposed to alleviate this challenge.

In this thesis, an alternating current (AC) computing methodology is proposed to significantly enhance the power efficiency of wirelessly powered devices such as computational RF tags and sensor nodes. Contrary to traditional platforms that integrate direct current (DC)-powered computational logic along with the rectification and regulation stages, in the proposed approach, the harvested RF signal is directly used to power the data processing circuitry by leveraging charge-recycling and adiabatic circuit theory. A near-field based wireless power harvesting system with an 8-bit arithmetic logic unit (ALU) is developed to evaluate the proposed framework. Simulation results in 45 nm technology demonstrate that the overall power consumption can be reduced by up to 16 times as compared to the conventional approach that relies on AC-to-DC conversion and static CMOS logic. This reduction in power enables significant computation capability for RF-powered devices. Furthermore, to address some of the critical security requirements in resource-constrained IoT devices, a lightweight encryption algorithm has been implemented in AC computing-based hardware (a bit-serialized SIMON core with 32-bit plaintext and 64-bit key), exhibiting significantly higher energy efficiency as compared to the conventional approach. This circuit was fabricated in a commercial 65 nm CMOS technology to experimentally evaluate the proposed methodology.

Some of the possible future directions of this research include establishing an ultra-low power communication unit and its system-level integration, increasing the frequency of RF power signals to the ultra-high frequency (UHF) band, exploring possible solutions to AC energy storage, and evaluating the security characteristics of the AC computing methodology.

This work is dedicated to my parents,

Xujiang Wan and Rong Chen.

# **Table of Contents**

| Section | Title |
|---------|-------|
|         | List of Figures |
|         | List of Tables |
|         | Acknowledgements |
| 1       | Introduction |
| 2       | Background |
| 2.1     | Historical Perspective to Adiabatic Computing |
| 2.2     | Wireless Power Transfer |
| 2.3     | Existing Works on AC Computing |
| 2.4     |  |
| 3       | WP-ECRL based AC Computing Framework |
| 3.1     | ECRL Operation Principles |
| 3.2     | Charge-Recycling 4-Bit Carry Ripple Adder |
| 3.3     | Wireless Link |
| 3.4     | Phase Shift Circuitry |
| 3.5     | Peak Detector |
| 3.6     | Simulation Results |
| 3.7     | Summary |
| 4       | WP-PAL and WP-CEPAL based AC Computing Framework |
| 4.1     | WP-PAL based Computing Framework |
| 4.1.1   | PAL Operation Principles |
| 4.1.2   | Signal Shaper |
| 4.2     | WP-CEPAL based Computing Framework |
| 4.2.1   | CEPAL Operation Principles |
| 4.3     | Simulation Results |
| 4.4     | Summary |
| 5       | AC Powered ALU for Deep Brain Implantable Devices |
| 5.1     | Overview of the Application |
| 5.2     | Inductively Coupled Wireless Link |
| 5.3     | AC Powered 8-bit ALU |
| 5.4     | Auxiliary Circuitry |
| 5.4.1   | RF-DC Converter |
| 5.4.2   | Phase Shifter Optimization |
| 5.4.3   | More about Signal Shaper |
| 5.5     | Simulation Results |
| 5.5.1   | Adiabatic Logic <i>RC</i> Model |
| 5.5.2   | Phase Tolerance |
| 5.5.3   | Power Evaluation |
| 5.5.4   | Effect of Circuit Size |
| 5.5.5   | Effect of Lower Operating Voltages |
| 5.6     | Design Tradeoffs |
| 5.7     | Summary |
| 6       | AC Powered Digital Core for Lightweight Encryption |
| 6.1     | SIMON Block Cipher |
| 6.1.1   | Round Function |
| 6.1.2   | Key Expansion |
| 6.1.3   | Bit-Serial Architecture |
| 6.2     | Proposed SIMON Hardware Architecture for AC Computing |
| 6.2.1   | Adiabatic Registers |
| 6.2.2   | Merged Blocks |
| 6.2.3   | Compute and Transfer Paths |
| 6.3     | Schematic-level Simulation Results |
| 6.4     | Post-Layout Simulation Results |
| 6.4.1   | Physical Implementations |
| 6.4.2   | Impact of Physical Layout on Power Consumption |
| 6.4.3   | Block-Level Post-Layout Power Consumption Results |
| 6.5     | Test Chip Overview |
| 6.5.1   | Core Circuit Design |
| 6.5.2   | I/O Circuit Design |
| 7       | Conclusion and Future Directions |
| 7.1     | Thesis Summary |
| 7.2     | Future Work and Directions |
| 7.2.1   | Integration of Sensing, Communication and Power Management |
| 7.2.2   | AC Computing at Higher Frequencies |
| 7.2.3   | Monolithic 3D Technology for AC Computing |
| 7.2.4   | Energy Storage for AC Computing |
| 7.2.5   | Side-Channel Resistance of AC Computing |
|         | Bibliography |

# **List of Figures**

| Figure | Caption |
|--------|---------|
| 1.1    | AC computing methodology vs. conventional approach for RF-powered devices. |
| 2.1    | Equivalent <i>RC</i> circuit to determine the energy loss in adiabatic logic. |
| 2.2    | Equivalent model of a weakly coupled inductive energy harvesting system. |
| 2.3    | … approach. |
| 3.1    | WP-ECRL: (a) Requirement for two phase shifters and a peak detector. (b) AC power-clock signal. (c) Inverter gate. (d) Cascaded gates. |
| 3.2    | Block diagram of a 4-bit ECRL carry ripple adder utilizing the <i>generate</i> and <i>propagate</i> signals. |
| 3.3    | Phase shift circuitry modeled as a $\pi$-<i>LC</i> low pass network. |
| 3.4    | Simulated power-clock signals with $90^{\circ}$ phase difference. |
| 3.5    | Diode-connected MOS transistor used as peak detector to properly bias the bulk terminals of the pMOS transistors. |
| 3.6    | Simulated input and output waveforms of the peak detector. |
| 3.7    | Four power-clock signals at 13.56 MHz frequency and with $90^{\circ}$ phase difference, as generated by the phase shifter. These sinusoidal power-clock signals drive the ECRL adder in the proposed approach. |
| 3.8    | Output signals of the 4-bit charge-recycling adder driven by four AC signals. The AC signals are obtained from the phase shifter, which takes the wirelessly harvested AC signal as the input. |
| 4.1    | WP-PAL: (a) Requirement for two signal shapers and AC power-clock signals. (b) Inverter gate. (c) Cascaded gates. |
| 4.2    | Signal shaper for WP-PAL based AC computing. |
| 4.3    | Simulated output voltage and current waveforms of a signal shaper. |
| 4.4    | Simulated output waveform of a wirelessly powered inverter with and without signal shapers. |
| 4.5    | WP-CEPAL: (a) Requirement for two signal shapers and a peak detector. (b) Two out-of-phase AC power-clock signals. (c) Inverter gate. (d) Cascaded gates. |
| 4.6    | Comparison of the average power consumed by the 16-bit carry select adder operating at 13.56 MHz and designed in both existing and the proposed methods. |
| 5.1    | Lumped model of an inductively coupled wireless power harvesting system. |
| 5.2    | Wireless link simulation setup: transmitting and receiving coils at $D_{imp}$ distance. |
| 5.3    | Model of the human head for deep brain implantable devices. GM and WM refer, respectively, to gray matter and white matter. |
| 5.4    | Available power for the operation of the implant with two different sizes of the receiving coil as a function of distance between transmitting and receiving coils. |
| 5.5    | Block-level diagram of the 8-bit arithmetic logic unit. |
| 5.6    | Merging multiple gates into a single complex adiabatic gate to mitigate the overhead of additional buffers required for synchronization. |
| 5.7    | Circuit diagram of a low complexity RF-DC converter and regulator required for traditional approaches. |
| 5.8    | Simulated output waveforms of the wireless link, rectifier, and regulator. The final regulated output voltage is approximately 1 V, which is used to drive the conventional 8-bit ALU running at 13.56 MHz clock frequency. |
| 5.9    | An <i>RLC</i> model of <i>LC</i> phase shifter and <i>RC</i> load. |
| 5.10   | Analysis of <i>LC</i> phase shifting network. (a) Effect of load resistance on <i>LC</i> phase shifter, (b) Effect of load capacitance on <i>LC</i> phase shifter, (c) Effect of parallel resistance on <i>LC</i> phase shifter. |
| 5.11   | Waveforms of the signal shaper to better understand the high efficiency operation. From top to bottom: input and output voltages, overall current (at source node), drain current, bulk current, and gate current. |
| 5.12   | Example output waveforms of the ALU for each of the proposed methods as well as the conventional approach. |
| 5.13   | The effect of phase difference deviation on the power consumption of WP-ECRL. |
| 5.14   | Power scaling with transistor numbers for each of the methods, (a) dependence of overhead power on circuit size. (b) dependence of processing power on circuit size. |
| 6.1    | Structure of a SIMON round function. |
| 6.2    | Structure of a SIMON key expansion function for m=4. |
| 6.3    | Proposed adiabatic architecture for round function of the bit-serialized SIMON32/64 cipher. |
| 6.4    | Proposed adiabatic architecture for key expansion of the bit-serialized SIMON32/64 cipher. |
| 6.5    | Simulated output waveform of the SIMON32/64 cipher blocks in each approach, demonstrating functional verification. |
| 6.6    | Layout view of ECRL-based SIMON32/64. |
| 6.7    | Layout view of static CMOS-based SIMON32/64. |
| 6.8    | Layout views of multipliers implemented in different approaches: (a) static CMOS, (b) ECRL, and (c) PAL. |
| 6.9    | ECRL-based inverter: (a) physical layout, (b) extracted <i>RC</i> network. |
| 6.10   | PAL-based inverter: (a) physical layout, (b) extracted <i>RC</i> network. |
| 6.11   | Static CMOS-based inverter: (a) physical layout, (b) extracted <i>RC</i> network. |
| 6.12   | Current profile of static CMOS-based SIMON32/64 cipher. |
| 6.13   | Current profile of ECRL-based SIMON32/64 cipher. Current profile for each power-clock waveform (4-phase) is shown. |
| 6.14   | Current profile of static CMOS-based 4-bit multiplier. |
| 6.15   | Current profile of ECRL-based 4-bit multiplier. Current profile for each power-clock waveform (4-phase) is shown. |
| 6.16   | Current profile of PAL-based 4-bit multiplier. Current profile for each power-clock waveform (2-phase) is shown. |
| 6.17   | Top-level layout view of the test chip. |
| 6.18   | Microscope photo of the entire die. |

# **List of Tables**

| Table | Caption |
|-------|---------|
| 2.1   | Common radio frequencies and their corresponding wavelengths. |
| 3.1   | Wireless link specifications. |
| 3.2   | Phase shifter specifications. |
| 3.3   | Energy dissipation comparison of the proposed and traditional approach. |
| 5.1   | Parameters of $T_x$ and $R_x$ coils. |
| 5.2   | … conventional approaches. |
| 5.3   | … phases for each of the proposed methods and conventional approach. |
| 5.4   | Comparison of received power at the wireless link, overhead and logic power consumption for each of the proposed methods and conventional approach. |
| 5.5   | Effect of voltage scaling on power consumption and performance under nominal operating conditions. |
| 5.6   | Effect of voltage scaling on power consumption and performance under slow process corners and nominal temperature of $27^{\circ}$C. |
| 5.7   | Effect of voltage scaling on power consumption and performance under slow process corners and high temperature of $127^{\circ}$C. |
| 5.8   | Qualitative comparison of the proposed methods and conventional approach listing advantages and limitations. |
| 6.1   | Performance of the bit-serialized SIMON32/64 cipher implemented … |
| 6.2   | Summary of the physical area consumed by digital blocks. MULT4 … |
| 6.3   | Comparison of schematic-level and post-layout power consumption of standard cells implemented in conventional and proposed … |
| 6.4   | Summary of the power consumed by digital blocks. |
| 6.5   | List of circuit blocks that are implemented in the fabricated ASIC chip. |

# **Acknowledgements**

Many people, from different countries, have generously contributed to this thesis work over the past five years. I would like to extend my thanks to all of them for making the journey so memorable.

Firstly, I would like to express my sincere gratitude to my advisor, Prof. Emre Salman. I thank him not only for his tremendous contribution of time, ideas, and funding to support my work, but also for providing so many great opportunities for me to work with academic and industrial teams. I could not imagine having a better advisor and mentor for my doctoral study. I benefited so much from his patience, motivation, and immense knowledge.

Besides my advisor, I would like to acknowledge our collaborator and my co-advisor, Prof. Milutin Stanaćević, for providing guidance and insightful suggestions. My sincere thanks also go to Yasha Karimi and Yuanfei Huang for their technical support, impressive teamwork, and enjoyable friendship.

I would also like to thank the two other members of my defense committee, Prof. Fan Ye and Prof. Samir Das, for their valuable feedback to improve my research work.

Furthermore, thanks go to the previous and current members of NanoCAS Lab: Hailang, Weicheng, Chen, Mallika, Manav, and Ivan, for giving me such wonderful memories during my career. I wish all of you the best of luck and a bright future. I am also grateful to all of my friends in both China and the United States for making it easier to be so far away from home.

Last but not least, special thanks go to my parents. I could never have completed the degree without their endless support and selfless love throughout both my life and my doctoral career.

# **Chapter 1**

# Introduction

The Internet of things (IoT) has emerged as a novel computational paradigm that connects an enormous number of network nodes with the everyday physical realm [1]. It has applications in many areas ranging from transportation to healthcare. The enabling factor of IoT is the development and integration of identification, sensing, computational logic, and wireless communication techniques. However, existing IoT devices suffer from the fundamental limitation of power delivery. Advances in conventional electrochemical battery technologies have been relatively limited compared to the unprecedented progress in computing capabilities [2, 3, 4]. The wide and growing gap between these two trends is a serious concern for the predicted scalability of IoT devices.

Power harvesting methods have long been considered for traditional wireless sensor nodes and emerging IoT devices. Some examples include photovoltaic, electrostatic or piezoelectric, thermoelectric, and RF or inductive energy converters [5, 6, 7]. Computing techniques in the presence of unpredictable power sources have also received attention [8]. Wireless power transmission has become more common during the past decade due to popular applications such as RFIDs, wireless charging, and RF-powered bio-implantable devices [9, 10, 11]. Harvesting RF power has the potential to provide a relatively more stable energy supply, considering the presence of dedicated energy sources (RF exciter or RFID reader) or the abundance of ambient communication and broadcast signals (such as TV/radio broadcast, mobile and Wi-Fi transmitters) [12, 13].

RF power harvesting has been applied to many areas. In the near-field electromagnetic (EM) region, RF harvesting through inductive coupling is utilized in brain implantable micro-systems [14, 15, 16, 17]. In the far-field EM region where antennas are used at much longer distances [18, 19, 20], RF energy harvesting mitigates the issue of frequent battery replacement in sensors scattered inside a building, structure, or outdoor space, which are used for environmental and structural health monitoring [21, 22].

In a conventional wireless energy harvester, the propagating electromagnetic wave is received with an antenna or coupling coil. The harvested alternating signal is then converted into a DC signal through rectification, voltage multiplication, and regulation [23]. Unfortunately, the power efficiency of these processes is typically low, particularly when the harvested input power is very small as in typical energy harvesting systems [24, 25, 26].

In this thesis, an AC computing methodology is proposed for wirelessly powered devices, as conceptually illustrated in Fig. 1.1. This framework uses the harvested AC signal to power the digital logic while eliminating the lossy rectification and regulation stages. The AC signal is used as both the power and clock signals by leveraging adiabatic and charge-recycling circuit theory. Significant power is saved by (1) eliminating AC-to-DC conversion and (2) recycling charge during the operation of digital circuits. Charge-recycling logic is re-engineered for the implementation of the processing circuitry that is powered by the AC signal. In the first case study, a wirelessly powered implantable computational logic is developed to demonstrate the advantages of the proposed methodology and investigate various characteristics of AC computing-based circuit implementations. In the second example, a lightweight encryption algorithm, SIMON, is implemented by utilizing the proposed framework to achieve ultra-low power encryption for RF-powered devices. This second example is also fabricated as a test chip for experimental evaluation.

Figure 1.1: AC computing methodology vs. conventional approach for RF-powered devices.

The rest of the thesis is organized as follows. In Chapter 2, background on AC computing methodology is provided and existing related research is summarized. Then, one of the proposed implementations, wirelessly powered efficient charge recovery logic (WP-ECRL) based computing framework, is presented in Chapter 3. Additionally, two alternative AC computing methods, wirelessly powered pass-transistor adiabatic logic (WP-PAL) and wirelessly powered complementary energy path adiabatic logic (WP-CEPAL), are described in Chapter 4. To evaluate the performance of all three proposed methods and their counterpart (conventional static CMOS), a comprehensive case study (schematic-level) is developed in Chapter 5 with application to a wirelessly powered implantable device. Furthermore, a lightweight encryption algorithm for IoT security is implemented with the proposed methodology and fabricated in 65 nm CMOS technology. The details of this implementation, corresponding results, and the test chip are described in Chapter 6. Finally, the thesis is concluded in Chapter 7, where various future directions are also discussed.

# Chapter 2

# Background

By interfacing wireless energy harvesting with the adiabatic circuit switching principle, the proposed methodology mitigates some of the fundamental limitations of both traditional charge-recycling logic and conventional wirelessly powered devices. Sections 2.1 and 2.2 describe, respectively, the fundamental principles of adiabatic computing (charge-recycling operation) and wireless power harvesting. Then, existing research work is discussed in Section 2.3. Finally, the contributions of this work are highlighted in Section 2.4.

# 2.1 Historical Perspective to Adiabatic Computing

The term adiabatic originates from the field of thermodynamics and is used to describe a process where no heat or matter is transferred between a physical system and its surrounding environment [27, 28]. Therefore, in an adiabatic process, no energy would be lost in the form of dissipated heat at thermodynamic equilibrium.

Decades before power consumption became a primary design concern in VLSI circuits, several theoretical physicists explored the fundamental correlation between heat generation and computing energy dissipation [29, 30]. In 1961, physicist R. W. Landauer demonstrated that a computing system must dissipate energy at least equal to $kT \ln 2$ if one bit of information is erased during computation [31]. Later, C. H. Bennett developed the concept of reversible computing, describing that a physical computer can be made thermodynamically reversible, and hence the dissipated energy can be less than $kT \ln 2$ or asymptotically approach zero if operated sufficiently slowly [32]. Even though reversible computing has not yet been fully demonstrated in practice, following these ideas, the charge-recycling operation and adiabatic switching were invented in the early 1990s [33]. Charge-recycling or adiabatic switching leverages these principles in circuit design to minimize power consumption.

Figure 2.1: Equivalent RC circuit to determine the energy loss in adiabatic logic.

Adiabatic switching operation is illustrated in Fig. 2.1. Consider the equivalent circuit for an adiabatic logic gate, where *C* is the load capacitance and *R* is the on-resistance of transistors along the charging path [34]. Contrary to the conventional charging that is achieved by a constant DC voltage, a time-varying voltage source is used as the power supply. If the transition time $t_r$ is sufficiently long, the capacitance voltage $v_C(t)$ approximately follows the input signal $v(t)$ [*i.e.*, $v_C(t) \approx v(t)$]. Therefore, the charging current is

$$i(t) = C\frac{dv(t)}{dt} = \frac{CV_{DD}}{t_r}.$$
(2.1)

The energy dissipated during a charging event is calculated by integrating the instantaneous power dissipated in *R*, $p(t) = v_R(t) \cdot i(t)$, over the transition time $t_r$,

$$E = \int_0^{t_r} v_R(t) \cdot i(t)\, dt = \int_0^{t_r} R\, i^2(t)\, dt = \frac{RC}{t_r} C V_{DD}^2.$$
(2.2)

A complete cycle consists of charging and recovery. Since the recovery process consumes the same amount of energy, the overall dissipation during one cycle is expressed by

$$E_{AL} = 2\frac{RC}{t_r}CV_{DD}^2.$$
(2.3)

As indicated by (2.3), in adiabatic operation, energy is inversely proportional to the transition time. This characteristic is unlike static CMOS operation, where the energy to charge a capacitance does not depend on the transition time, as indicated by

$$E_{Static} = \alpha \frac{1}{2} C V_{DD}^2, \qquad (2.4)$$

where  $\alpha$  represents the switching activity factor. Equating (2.3) with (2.4) yields

$$E_{Static} = E_{AL},$$

$$\alpha \frac{1}{2} C V_{DD}^2 = 2 \frac{RC}{t_{critical}} C V_{DD}^2,$$

$$\rightarrow t_{critical} = \frac{4RC}{\alpha}.$$
(2.5)

At the critical transition time  $t_{critical}$ , both adiabatic and static energy consumption are equal. Therefore, it is important to note that the transition time should be larger than  $4RC/\alpha$  for the adiabatic operation to outperform static CMOS. Unlike self-powered IoT applications, this behavior can be a disadvantage for conventional applications that demand high performance. It is also important to note that the parameter *RC* is highly technology dependent. In modern technologies, this parameter is in the range of picoseconds. Thus, reasonable power savings can be achieved by adiabatic operation at sufficiently high frequencies.
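The break-even condition in (2.5) can be checked numerically. The following is a minimal sketch that evaluates (2.3), (2.4), and (2.5) directly; the values of `R`, `C`, `V_DD`, and `ALPHA` are illustrative assumptions chosen only to give an *RC* product in the picosecond range, and are not taken from this thesis.

```python
def adiabatic_energy(r_ohm, c_farad, v_dd, t_r):
    """Energy per charge/recovery cycle of an adiabatic gate, Eq. (2.3)."""
    return 2.0 * (r_ohm * c_farad / t_r) * c_farad * v_dd ** 2


def static_energy(c_farad, v_dd, alpha):
    """Switching energy of static CMOS, Eq. (2.4)."""
    return alpha * 0.5 * c_farad * v_dd ** 2


def critical_transition_time(r_ohm, c_farad, alpha):
    """Transition time at which both energies are equal, Eq. (2.5)."""
    return 4.0 * r_ohm * c_farad / alpha


# Illustrative values only (assumed, not from the thesis).
R = 10e3      # on-resistance of the charging path, ohms
C = 1e-15     # load capacitance, farads
V_DD = 1.0    # peak supply voltage, volts
ALPHA = 0.5   # switching activity factor

t_crit = critical_transition_time(R, C, ALPHA)
print(f"RC = {R * C:.3e} s, t_critical = {t_crit:.3e} s")

# Ratio of adiabatic to static energy for transition times around t_critical.
for scale in (0.5, 1.0, 10.0):
    t_r = scale * t_crit
    ratio = adiabatic_energy(R, C, V_DD, t_r) / static_energy(C, V_DD, ALPHA)
    print(f"t_r = {scale:>4} x t_critical -> E_AL / E_Static = {ratio:.2f}")
```

Under these expressions the energy ratio reduces to $t_{critical}/t_r$, so a transition time ten times longer than $t_{critical}$ corresponds to one tenth of the static CMOS switching energy.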

A typical adiabatic circuit consists of two primary parts: a digital core consisting of charge-recycling gates and circuitry for the generation of the AC power supply signal [35]. The generated AC signal behaves as both the power supply and clock signal and is commonly referred to as the power-clock signal. Thus, adiabatic circuits are typically inherently pipelined. Due to its highly desirable characteristics, significant research has been conducted on charge-recycling operation since the 1990s [36, 37, 38]. Despite these research efforts, the practical application of charge-recycling has remained limited due to well-understood limitations, particularly the inefficient generation of the aforementioned power-clock signals, with efficiencies limited to 10% to 30% [39, 40]. When charge-recycling is applied to wirelessly powered devices, this significant limitation is eliminated since the harvested signal is already in the form of an AC signal. Another traditional limitation is the reliable distribution of multiple power-clock signals throughout the entire die while maintaining a certain phase difference among the power-clock signals. This limitation is also alleviated in wirelessly powered devices due to much smaller die areas (and therefore less complexity) as compared to high performance circuits.

| Frequency      | Band | Wavelength | $\lambda/2\pi$ |
|----------------|------|------------|----------------|
| 125 - 134 kHz  | LF   | 2.3 km     | 367 m          |
| 13.56 MHz      | HF   | 22 m       | 3.5 m          |
| 865 - 868 MHz  | UHF  | 35 cm      | 5.5 cm         |
| 902 - 928 MHz  | UHF  | 33 cm      | 5.2 cm         |
| 2.4 - 2.48 GHz | UHF  | 12 cm      | 2 cm           |
| 5.8 GHz        | SHF  | 5.1 cm     | 0.8 cm         |

Table 2.1: Common radio frequencies and their corresponding wavelengths.

# 2.2 Wireless Power Transfer

The primary power source used in this work is radio frequency (RF) waves, generated from either a dedicated RF transmitter or ambient RF energy sources, such as Wi-Fi, TV, and radio broadcasting signals [12, 13, 26, 41]. Ambient RF harvesting is free and ubiquitous, but the transmitted power varies significantly, from 10 MW for a TV broadcasting tower to 0.1 W for mobile communication devices [18]. Consequently, a supercapacitor or microbattery is required to sustain a relatively stable power level. In contrast, RF harvesting from a dedicated source is able to provide predictable energy and is therefore more reliable for applications that demand continuous power [19, 42, 43]. In this thesis, the primary focus is on RF power harvesting from dedicated sources.

RF power harvesting covers a large range of frequencies, as listed in Table 2.1 [19]. Furthermore, determined by the transmission range $r$, wavelength $\lambda$, and radiating antenna size $D$, wireless power transfer can be divided into three major communication mechanisms between the RF source and the receiver, as expressed by [44]

$$\frac{\lambda}{10} < D < \frac{\lambda}{2} \rightarrow \begin{cases} r \le 0.62\sqrt{\frac{D^3}{\lambda}}, & \text{Near Field Inductive} \\ 0.62\sqrt{\frac{D^3}{\lambda}} < r \le \frac{2D^2}{\lambda}, & \text{Near Field Radiative} \\ r \gg \lambda,\ r \gg D, & \text{Far Field Radiative} \end{cases}$$
(2.6)
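The wavelengths in Table 2.1 and the region boundaries in (2.6) can be evaluated with a few lines of code. The sketch below is illustrative only: the helper names are hypothetical, the antenna size of 10 cm and the 915 MHz example are assumptions chosen so that the size condition $\lambda/10 < D < \lambda/2$ holds, and the far-field case is only labeled rather than tested against the $r \gg \lambda$, $r \gg D$ conditions.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength(freq_hz):
    """Free-space wavelength in meters."""
    return C_LIGHT / freq_hz


def size_condition_holds(antenna_size_m, freq_hz):
    """Check the antenna-size precondition of Eq. (2.6): lambda/10 < D < lambda/2."""
    lam = wavelength(freq_hz)
    return lam / 10.0 < antenna_size_m < lam / 2.0


def classify_region(r_m, antenna_size_m, freq_hz):
    """Classify a distance r using the first two boundaries of Eq. (2.6)."""
    lam = wavelength(freq_hz)
    inductive_limit = 0.62 * math.sqrt(antenna_size_m ** 3 / lam)
    radiative_limit = 2.0 * antenna_size_m ** 2 / lam
    if r_m <= inductive_limit:
        return "near-field inductive"
    if r_m <= radiative_limit:
        return "near-field radiative"
    # Eq. (2.6) states the far field as r >> lambda and r >> D; distances
    # beyond the radiative near-field limit are only labeled here.
    return "beyond near-field radiative limit"


# Wavelength and lambda/(2*pi) for some of the frequencies in Table 2.1.
for f_hz in (13.56e6, 915e6, 2.4e9, 5.8e9):
    lam = wavelength(f_hz)
    print(f"{f_hz / 1e6:8.2f} MHz: lambda = {lam:.3f} m, "
          f"lambda/2pi = {lam / (2.0 * math.pi):.3f} m")

# Assumed example: D = 10 cm antenna at 915 MHz.
D = 0.10
F = 915e6
print("size condition holds:", size_condition_holds(D, F))
for r in (0.02, 0.05, 1.0):
    print(f"r = {r} m -> {classify_region(r, D, F)}")
```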

Based on the wireless energy harvesting technique used in RFID tags and certain biomedical implantable devices, the target IoT device in this research extracts the power supply voltage through near-field inductive coupling at 13.56 MHz [45]. In this scenario, a primary coil is driven by an RF power amplifier to transmit a dedicated radio wave. The target wireless device harvests the electromagnetic energy by secondary coils (that are inductively coupled).

The percentage of energy extracted by the secondary coil can be evaluated from the coupling factor k, which, in the air, can be empirically expressed by [11]

$$k = \frac{r_1^2 r_2^2}{\sqrt{r_1 r_2} (\sqrt{x^2 + r_2^2})^3},$$
(2.7)

where $r_1$ and $r_2$ are the radii of, respectively, the primary and secondary coils, and $x$ is the distance between the two coils. This weakly coupled topology can be accurately modeled with an *RLC* circuit and an ideal transformer, as depicted in Fig. 2.2. In this electrical model, $L_1$ and $L_2$ of the ideal transformer represent the two separate coils. $R_p$ represents the parasitic resistive loss in the front-end harvesting circuitry and $C_t$ is the tuning capacitance for boosting the coil voltage level. The ratio $N$ inside the transformer is given as [11]

Figure 2.2: Equivalent model of a weakly coupled inductive energy harvesting system.

$$N = k \sqrt{\frac{L_2}{L_1}}.$$
(2.8)
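To illustrate how (2.7) and (2.8) behave, the following sketch evaluates both expressions as written above. The function names are hypothetical, and all numerical inputs are assumptions for illustration: two coils of equal 2 cm radius, and round inductance values of 1 µH and 4 µH that are unrelated to any design in this thesis.

```python
import math


def coupling_factor(r1_m, r2_m, x_m):
    """Empirical coupling factor of Eq. (2.7), evaluated as written in the text."""
    numerator = r1_m ** 2 * r2_m ** 2
    denominator = math.sqrt(r1_m * r2_m) * math.sqrt(x_m ** 2 + r2_m ** 2) ** 3
    return numerator / denominator


def transformer_ratio(k, l1_h, l2_h):
    """Ratio N of the ideal transformer, Eq. (2.8)."""
    return k * math.sqrt(l2_h / l1_h)


# Assumed geometry (not from the thesis): equal coil radii of 2 cm.
R1 = 0.02
R2 = 0.02

# Assumed inductances (not from the thesis).
L1 = 1e-6
L2 = 4e-6

for distance_m in (0.0, 0.02, 0.04):
    k = coupling_factor(R1, R2, distance_m)
    n = transformer_ratio(k, L1, L2)
    print(f"x = {distance_m * 100:.0f} cm -> k = {k:.3f}, N = {n:.3f}")
```

For equal radii the expression reduces to $k = r^3 / (x^2 + r^2)^{3/2}$, which equals one at zero separation and falls off quickly as the coils move apart.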

### 2.3 Existing Works on AC Computing

In earlier research on AC computing, a very low frequency (60 Hz - 300 Hz) AC power supply was proposed for wirelessly powered devices [46]. Since the power supply frequency is several orders of magnitude lower than the typical digital data rates, the AC signal behaves as if the voltage level is approximately constant. The processing circuit undergoes three phases of operation: turn-on (once the power supply exceeds the threshold voltage), perform computation, and turn-off [47]. An accurate power-on-reset is required to guarantee correct operation. Furthermore, a dynamic memory cell is needed to retain states between on and off cycles. As a significant limitation, this approach is only applicable to wireless power harvesting systems with very low frequency.

In 2004, an AC-only RFID tag was proposed for bar-code replacement [48]. The logic adopted in the circuitry is a combination of quasi-static energy recovery logic [49] and a group of transmission gates that switch on and off during each half cycle of the AC power supply signal. The RFID tag chip was fabricated in a 0.13 $\mu$m CMOS process and occupies 0.002 mm<sup>2</sup>, approximately three times smaller than a conventional RFID tag. The proposed approach, however, increases the overall power consumption.

In 2007, a quasi-static adiabatic logic consisting of diodes [50] was used for a low frequency (LF) ID tag [51]. The circuit consists of a pair of cross-coupled nMOS transistors, a pair of complementary NP functional blocks, and two diodes in series with P logic. The proposed approach alleviates the issue of dynamic power in conventional adiabatic circuits, but suffers from relatively large energy dissipated by the diodes.

In 2016, an RFID tag with RF powered digital logic (without rectification) was proposed [52]. The proposed RF-Only logic is similar to the quasi-static energy recovery logic in terms of topology [49]. The top and bottom power supply transistors behave as a switch controlled by the AC signal rather than acting as diodes. The proposed logic is similar to static CMOS (therefore relatively easier to design complex circuits), but suffers from degraded robustness caused by the floating output node during part of the operation. Furthermore, only a small portion of the charge stored at the load capacitance is recycled, which is not energy-efficient. According to simulation results, the tag area was reduced by approximately 80%. The overall power consumption, however, increased by an order of magnitude.

Contrary to these existing studies that primarily focus on device footprint (and therefore cost), the primary emphasis of the proposed approach in this thesis is on achieving an order of magnitude reduction in power consumption. Such significant reduction has the potential to expand the application domain of wirelessly powered devices such as RFIDs due to stronger data processing capability under the same power budget.

Figure 2.3: Summary of the three proposed AC computing methodologies for wireless power harvesting and required auxiliary circuitry for each approach.

# 2.4 Contributions of This Work

In the proposed approach, the wirelessly harvested AC signal is used to power the digital logic that relies on charge-recycling/adiabatic circuits [53]. The rectification and regulation steps that exist in conventional methods are eliminated. As depicted in Fig. 2.3, three approaches are proposed, each exhibiting different requirements and tradeoffs, as described in the following chapters. The major contributions of this thesis can be summarized as follows:

- Three different circuit implementations of the proposed AC computing methodology are developed and evaluated with a comprehensive case study for brain implantable devices with near-field based wireless power transfer.
- The effect of circuit size on both overhead (consumed by supporting circuitries in the proposed methods) and processing power is investigated for each of the proposed methods and the conventional approach.
- The behavior of the proposed methods under scaled voltages is investigated and compared with the conventional approach.
- The advantages and limitations of the proposed methods are qualitatively discussed, thereby identifying related tradeoffs and providing useful design guidelines.
- The proposed AC computing methodology is demonstrated via a test chip in 65 nm CMOS technology, where a lightweight encryption algorithm, SIMON, is implemented. The ultra-low power implementation of encryption enables more secure operation in highly resource-constrained IoT devices that rely on wireless energy.

# Chapter 3

# WP-ECRL based AC Computing Framework

In the proposed computing approach, the harvested AC signal (in the form of a sine wave) is directly used for computation by leveraging the existing efficient charge recovery logic (ECRL) circuit. In this charge-recycling based logic, power dissipation is significantly reduced by (1) steering the currents across the transistors with small voltage differences and (2) gradually recovering part of the energy stored in the parasitic capacitances [38]. The remainder of this chapter is organized as follows. In Section 3.1, the basic principles of the ECRL circuit are described. The design details of an ECRL-based 4-bit carry ripple adder are illustrated in Section 3.2. The auxiliary circuits used in the WP-ECRL computing framework are discussed in Section 3.3, Section 3.4, and Section 3.5. Finally, the simulation results are presented in Section 3.6 and this work is summarized in Section 3.7.

### **3.1 ECRL Operation Principles**

The efficient charge recovery logic (ECRL) was invented in the mid-1990s as an alternative computing method to static CMOS operation [35]. Its applicability, however, has remained highly limited due to the inefficient generation of the required AC (or trapezoidal) signal from a DC supply voltage, sacrificing most of the power savings [54].

In this chapter, ECRL, powered by a four-phase AC signal (known as the power-clock signal), is chosen as the logic family, as shown in Fig. 3.1(c). The power-clock signal is a sinusoidal waveform and has four phases: evaluation (E), hold (H), recovery (R), and wait (W). As shown in Fig. 3.1(b), an ECRL inverter has two cross-coupled PMOS transistors for the precharge/evaluation and recovery phases, and two NMOS transistors for the functionality. Once the *PCLK* signal reaches the threshold voltage of M1, *outbar* starts to follow the *PCLK* signal, assuming signal *in* is at logic high (so that *outbar* is at logic low). During the hold phase, the output voltage is stable so that the next stage can properly evaluate. During the recovery phase, *PCLK* gradually falls, recycling the charge stored on the load capacitance. For symmetry, a wait phase is inserted to complete the four-phase power-clock operation. There is a 90° phase difference between the power-clock signals of adjacent gates, requiring four power-clock signals. Thus, ECRL logic is inherently pipelined.

For WP-ECRL, as depicted in Fig. 3.1, a peak detector and two phase shifters are introduced. The peak detector consists of a diode-connected PMOS transistor where the output signal is connected to the bulk terminals of all of the PMOS transistors to properly bias the substrate [55]. Note that the current that flows through the peak detector is negligible and therefore the power loss is minimized. Two *LC* phase shifters are used to obtain four AC signals with 90° phase difference.
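The phase relationship among the four power-clock signals can be sketched numerically. The following is a behavioral illustration only: ideal sinusoids with an assumed 1 V amplitude stand in for the harvested and phase-shifted signals, and the helper names are hypothetical. It models neither the *LC* phase shifters nor the ECRL gates themselves.

```python
import math

F_PCLK_HZ = 13.56e6                      # harvesting frequency used in this work
PHASES_DEG = (0.0, 90.0, 180.0, 270.0)   # four power-clock phases


def power_clock(t_s, phase_deg, amplitude_v=1.0, freq_hz=F_PCLK_HZ):
    """Ideal sinusoidal power-clock value at time t_s with a given phase lag."""
    angle = 2.0 * math.pi * freq_hz * t_s - math.radians(phase_deg)
    return amplitude_v * math.sin(angle)


def stage_phase_deg(stage_index):
    """Adjacent cascaded gates use power-clocks that are 90 degrees apart."""
    return PHASES_DEG[stage_index % 4]


# Sample the four power-clock signals at quarter-period steps.
period_s = 1.0 / F_PCLK_HZ
for step in range(5):
    t = step * period_s / 4.0
    samples = [power_clock(t, stage_phase_deg(stage)) for stage in range(4)]
    print(f"t = {t * 1e9:6.2f} ns: " + ", ".join(f"{v:+.2f}" for v in samples))
```

In this idealized picture the 0° and 180° signals are exact inverses of each other, and the fifth gate in a cascade reuses the power-clock phase of the first.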



Figure 3.1: WP-ECRL: (a) Requirement for two phase shifters and a peak detector. (b) AC power-clock signal. (c) Inverter gate. (d) Cascaded gates.

# 3.2 Charge-Recycling 4-Bit Carry Ripple Adder

A 4-bit adder is chosen to demonstrate the functionality of the computational block that operates with the wirelessly harvested AC signal. Considering the inherently pipelined characteristic of ECRL logic, output signals should be carefully synchronized. Specifically, the overall number of ECRL gates from input to output should be the same for each output. This behavior requires the insertion of ECRL buffers along certain output paths.

A 4-bit carry ripple adder consisting of four cascaded 1-bit full adders requires 4 cycles (since a 1-bit full adder requires 1 cycle) to complete an addition operation, assuming a standard cell based adder design. Instead, the *propagate* and *generate* signals are utilized for the carry ripple adder, as depicted in Fig. 3.2. Furthermore, the AND gate and OR gate are merged into one complex ECRL gate, referred to as the ECRL group PG cell [35]. Finally, due to inherent pipelining, buffers are inserted to synchronize each output. A logic depth of 1.25 cycles is achieved. This 4-bit ECRL carry ripple adder can be used as a building block to develop wider ECRL adders.
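The logic function realized by the propagate/generate formulation can be sketched as a bit-level model. This is a functional illustration only: the function name is hypothetical, and the model does not capture the ECRL gate structure, the power-clock phases, or the buffers inserted for output synchronization.

```python
def pg_ripple_add4(a, b, c_in):
    """Add two 4-bit integers using bitwise propagate and generate signals.

    Returns (sum, carry_out), where sum is the 4-bit result.
    """
    carry = c_in & 1
    result = 0
    for i in range(4):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        g_i = a_i & b_i              # generate: this bit position creates a carry
        p_i = a_i ^ b_i              # propagate: this bit position passes a carry
        s_i = p_i ^ carry            # sum bit
        carry = g_i | (p_i & carry)  # group PG function: AND followed by OR
        result |= s_i << i
    return result, carry


# Exhaustive check against integer addition.
for a in range(16):
    for b in range(16):
        for c_in in (0, 1):
            total = a + b + c_in
            assert pg_ripple_add4(a, b, c_in) == (total & 0xF, total >> 4)

print(pg_ripple_add4(0b0101, 0b0011, 0))
```

The carry update `g_i | (p_i & carry)` is the AND-OR function that the text describes as merged into a single complex gate.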

### 3.3 Wireless Link

The parameters of the wireless link for the proposed approach are listed in Table 3.1. The two inductors are assumed to be implemented on the board. Note that the two inductors within the secondary coupling circuit are configured such that the two harvested RF signals have 180° phase difference, thereby providing the first and third power-clock signals required for the ECRL logic. The remaining two phases are obtained by the proposed phase shifter, described in the following section.



Figure 3.2: Block diagram of a 4-bit ECRL carry ripple adder utilizing the *generate* and *propagate* signals.

| Parameter | $L_1$    | $L_2$    | $R_p$ | $C_t$  | N |
|-----------|----------|----------|-------|--------|---|
| Value     | 228.4 nH | 3.656 µH | 1 Ω   | 330 pF | 3 |

Table 3.1: Wireless link specifications.



Figure 3.3: Phase shift circuitry modeled as a  $\pi$ -*LC* low pass network.

# **3.4 Phase Shift Circuitry**

Phase shifters generate a fixed phase angle along a transmission line driven by an electromagnetic wave of a certain frequency. Switched low pass and high pass topologies are commonly used in monolithic microwave ICs for achieving a flat band of 180° phase shift [56]. Inspired by this topology, the low pass arm is extracted from the switched line phase shifter to generate the four-phase power-clock signals. The proposed phase shift circuitry can be modeled as a $\pi$-*LC* low pass network, as shown in Fig. 3.3. For a $\theta$ phase shift, the values of the inductor (*L*) and capacitor (*C*) in the model are determined from

$$L = \frac{Z_0 \sin \theta}{\omega} \quad \text{and} \quad C = \frac{1 - \cos \theta}{\omega Z_0 \sin \theta}, \qquad (3.1)$$

where $Z_0$ is the parallel impedance to alleviate the effect of load variations. The design parameters of the proposed phase shifter are listed in Table 3.2. The resistor and inductor are assumed to be implemented on the board. As mentioned in the previous section, the $0^\circ$ and $180^\circ$ power-clock signals are obtained from the secondary coupling circuit. Thus, when these two signals propagate through the proposed $90^{\circ}$ phase shifters, the remaining two power-clock signals ($90^\circ$ and $270^\circ$) are generated to complete the full operation of the ECRL computational block. The output of the phase shifter is illustrated in Fig. 3.4, demonstrating the $90^{\circ}$ phase difference.

| Parameter | L      | C      | $Z_0$  |
|-----------|--------|--------|--------|
| Value     | 796 µH | 100 fF | 100 kΩ |

Table 3.2: Phase shifter specifications.

Figure 3.4: Simulated power-clock signals with 90° phase difference.
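As a quick numerical illustration of (3.1), the sketch below evaluates the two expressions for a given phase shift, frequency, and $Z_0$. The function name is hypothetical and the inputs in the example are assumptions for illustration. The result is the ideal value implied by (3.1) alone and is not intended to reproduce the component values of the actual design.

```python
import math


def pi_lc_phase_shifter(theta_deg, freq_hz, z0_ohm):
    """Inductor and capacitor values of the pi-LC low pass model, Eq. (3.1)."""
    theta = math.radians(theta_deg)
    omega = 2.0 * math.pi * freq_hz
    inductance_h = z0_ohm * math.sin(theta) / omega
    capacitance_f = (1.0 - math.cos(theta)) / (omega * z0_ohm * math.sin(theta))
    return inductance_h, capacitance_f


# Assumed example inputs: 90 degree shift at 13.56 MHz with Z0 = 100 kOhm.
THETA_DEG = 90.0
FREQ_HZ = 13.56e6
Z0_OHM = 100e3

l_h, c_f = pi_lc_phase_shifter(THETA_DEG, FREQ_HZ, Z0_OHM)
print(f"L = {l_h * 1e6:.1f} uH, C = {c_f * 1e15:.1f} fF")

# For a 90 degree shift, Eq. (3.1) gives L = Z0 / omega and C = 1 / (omega * Z0),
# so sqrt(L / C) equals Z0.
print(f"sqrt(L / C) = {math.sqrt(l_h / c_f):.1f} ohm")
```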

# 3.5 Peak Detector

Elimination of the rectification stage makes it difficult to bias the n-well of the cross-coupled PMOS transistors in both WP-ECRL and WP-CEPAL. If these bulk nodes are connected to the AC power-clock signal (with negative voltage components), the bulk-to-drain junction diodes are turned on when the junction voltage exceeds the forward-on threshold, dissipating unnecessary power due to significant forward bias diode current. To prevent this issue, a peak detector is used, as depicted in Fig. 3.5.

Figure 3.5: Diode-connected MOS transistor used as peak detector to properly bias the bulk terminals of the PMOS transistors.

Thus, an unregulated DC voltage (approximately equal to the peak value of the input AC signal) is produced for the bulk nodes, which prevents the junction diodes from becoming forward biased. The output of the peak detector is shown in Fig. 3.6, where the input is the harvested AC signal. The relatively large ripple voltage at the output does not degrade performance since this voltage is used only to bias the bulk terminals.

Note that unlike a conventional rectifier used for generation of DC supply voltage that transmits significant current to the output load, the current across the proposed peak detector is negligible. Thus, the energy loss is sufficiently low.



Figure 3.6: Simulated input and output waveforms of the peak detector.

## 3.6 Simulation Results

To investigate and quantify the benefits of the proposed approach, both the traditional method that rectifies and regulates the harvested AC signal and the proposed method are designed using 45 nm technology. Both approaches have the same wireless link, as described in Section 3.3 where the transmission frequency is 13.56 MHz. Note that this is the standard frequency for silicon based item-level RF identification [57].

For the conventional approach, an efficient and low complexity rectifier and regulator are designed. The output voltage is regulated at approximately 1 V. This voltage powers a conventional 4-bit carry ripple adder operating at 13.56 MHz. All of the primary inputs and outputs are latched into flip-flops.

Alternatively, for the proposed approach, the phase shifter described in Section 3.4 is designed to generate four power-clock signals with 90° phase difference, as depicted in Fig. 3.7. These sinusoidal power-clock signals are used to drive the ECRL adder, described in Section 3.2. Note that the peak amplitudes of the power-clock waveforms shown in Fig. 3.7 are not identical, exhibiting approximately 100 mV variation. This variation is due to the slightly different load impedance seen by each power-clock signal. This difference, however, does not affect the proper operation of the ECRL adder since the peak voltage is greater than the threshold voltage. The output signals of the ECRL adder driven by the wirelessly harvested AC signal are shown in Fig. 3.8 when a cyclic input data pattern is provided as $C_{in} = 11110000$, $A_i = 01010101$, $B_i = 11001100$, where $i = 1, 2, 3, 4$. As demonstrated by the output signals, the ECRL adder accurately and reliably works with the harvested AC signal where the logic high voltage levels are sufficiently distinguishable.

Figure 3.7: Four power-clock signals at 13.56 MHz frequency and with $90^{\circ}$ phase difference, as generated by the phase shifter. These sinusoidal power-clock signals drive the ECRL adder in the proposed approach.

Figure 3.8: Output signals of the 4-bit charge-recycling adder driven by four AC signals. The AC signals are obtained from the phase shifter, which takes the wirelessly harvested AC signal as the input.

| Conventional Design Block | Energy Loss (fJ) | Proposed Design Block | Energy Loss (fJ) |
|---------------------------|------------------|-----------------------|------------------|
| 4-bit Adder               | 5958             | 4-bit Adder           | 500              |
| Rectifier and Regulator   | 70502            | Phase Shifter         | 11490            |
| Total                     | 76460            | Total                 | 16490            |

Table 3.3: Energy dissipation comparison of the proposed and traditional approach.

To compare energy consumption, both the traditional design (wireless link, rectifier, regulator, and static CMOS based 4-bit adder running at 13.56 MHz) and the proposed approach (wireless link, phase shifter, and ECRL based 4-bit adder running with four harvested AC signals, each at 13.56 MHz) are simulated for 4 $\mu$s and the consumed energy is analyzed, as listed in Table 3.3. According to this table, the overall energy consumed by the proposed approach is approximately five times lower than the energy consumed by the traditional approach, even though both systems operate at the same 13.56 MHz frequency. Furthermore, the reduction in energy consumption is expected to increase with a larger computational block since the charge-recycling adder consumes approximately eleven times less energy than the static CMOS adder. Thus, the overhead incurred due to the phase shifter is further reduced as the size of the logic circuit grows.
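The ratios quoted above follow from the entries of Table 3.3. The short sketch below reproduces that arithmetic; the dictionary layout and key names are hypothetical, and the totals are taken as reported in the table rather than recomputed from the individual rows.

```python
# Energy values in fJ, copied from Table 3.3.
TABLE_3_3_FJ = {
    "conventional": {"adder": 5958, "overhead": 70502, "total": 76460},
    "proposed": {"adder": 500, "overhead": 11490, "total": 16490},
}

conventional = TABLE_3_3_FJ["conventional"]
proposed = TABLE_3_3_FJ["proposed"]

# Overall reduction, using the totals as reported in the table.
total_ratio = conventional["total"] / proposed["total"]

# Reduction in the energy of the computational block alone.
adder_ratio = conventional["adder"] / proposed["adder"]

# Fraction of the conventional total spent on rectification and regulation.
conversion_share = conventional["overhead"] / conventional["total"]

print(f"Total energy ratio (conventional / proposed): {total_ratio:.2f}")
print(f"Adder energy ratio (static CMOS / ECRL):      {adder_ratio:.2f}")
print(f"Rectifier and regulator share of total:       {conversion_share:.1%}")
```

The last figure shows that, in the conventional design, most of the energy is spent in AC-to-DC conversion rather than in the adder itself.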

## 3.7 Summary

A novel WP-ECRL based computing method has been proposed for IoT devices. The proposed method revitalizes the existing charge-recycling theory through application to wireless power harvesting. Despite the well-known limitations of charge-recycling circuits in conventional systems, in the proposed application, charge-recycling can achieve significant power savings since the wirelessly harvested signal is already in the form of an AC signal. Electrical models and circuits have been developed for the wireless link, phase shifter, rectifier, and regulator. A comprehensive analysis method has been developed to achieve a fair comparison. Simulations in 45 nm technology demonstrate that the proposed approach can reduce the overall energy by approximately five times.

# Chapter 4

# WP-PAL and WP-CEPAL based AC Computing Framework

In the previous chapter, the proposed WP-ECRL framework introduced phase shifters to generate four power-clock signals with 90° phase difference. The phase shifters require large, very high quality inductors (particularly at low frequencies) to minimize loss, which can be highly challenging to implement. These blocks also reduce the overall power savings achieved by AC computing. Two alternative charge-recycling based methods are proposed and described in Section 4.1 and Section 4.2, respectively. Simulation results and related discussion are presented in Section 4.3. Finally, the chapter is concluded in Section 4.4.



Figure 4.1: WP-PAL: (a) Requirement for two signal shapers and AC power-clock signals. (b) Inverter gate. (c) Cascaded gates.

## 4.1 WP-PAL based Computing Framework

In this section, pass-transistor adiabatic logic (PAL) [58] is used for AC computation with an additional block, referred to as a signal shaper, as illustrated in Fig. 4.1.

#### 4.1.1 PAL Operation Principles

Similar to ECRL, a PAL gate consists of two NMOS transistors, N1 and N2, and a pair of cross-coupled charging/recovering PMOS transistors, P1 and P2. The primary advantage of PAL over ECRL is the ability to fully recover charge since the NMOS transistors are connected to the AC power supply (unlike ECRL where the NMOS transistors are connected to ground) [59]. Thus, a lower logic power consumption can be achieved. The operation of a PAL inverter shown in Fig. 4.1(b) can be summarized as follows. Assume that initially, input signal *in* is at logic high and AC supply *PCLK* is rising. A conducting path is formed between *outbar* and *PCLK* since N1 is on. Thus, node *outbar* follows *PCLK* whereas node *out* is floating. As *PCLK* reaches the threshold voltage, transistor P1 turns on and fully charges *outbar*. Finally, when *PCLK* is falling, the charge stored at the *outbar* node is fully recovered through both N1 and P1.

In addition, note that PAL is a two-phase logic where the AC supply of each successive gate is 180° out-of-phase with respect to the previous gate, as depicted in Fig. 4.1(c). Thus, when one of the gates is in the *evaluation* phase, the preceding gate is in the *hold* phase, keeping the input signals stable for the evaluating gate. This behavior is also a significant advantage over ECRL, which requires four-phase operation. Thus, contrary to ECRL, in PAL based AC computing for wireless devices, a phase shifter is not needed within the receiver, thereby reducing the overhead power consumption. Note that the two inductors within the receiver are configured such that the two harvested AC signals have 180° phase difference, which is sufficient for PAL operation.

#### 4.1.2 Signal Shaper

Despite the advantages mentioned in the previous section, PAL cannot operate correctly with a harvested AC signal that has both positive and negative voltage components. To mitigate this limitation, a low complexity signal shaper is proposed, as shown in Fig. 4.2. The proposed signal shaper consists of a PMOS transistor with the bulk, gate, and one of the junctions shorted together. The input is the wirelessly harvested AC signal, which swings from -1 V to 1 V, whereas the output swings from 0 V to approximately 1 V. The signal shaper lets the output gradually follow the shape of the input AC signal, but does not let the output fall below zero volts (see Fig. 4.2). The simulated output waveform of the signal shaper is shown in Fig. 4.3. Specifically, the transistor is sized to act as a voltage divider where the output signal is always positive. When the input voltage is higher than the output voltage (transistor is on), the voltage difference across the signal shaper is sufficiently low, thereby minimizing the power loss.

Figure 4.2: Signal shaper for WP-PAL based AC computing.

Two signal shapers are required since there are two harvested AC signals with 180° phase difference. The simulated output waveform of a wirelessly powered inverter with and without signal shapers is illustrated in Fig. 4.4. As shown in this figure, the signal shaper eliminates the negative voltage components at the output that can be as low as -0.5 V when there is no signal shaper.

It is important to note that the proposed signal shaper can work bidirectionally and recover charge. Even when the transistor is off due to zero $V_{GS}$, the charge flows back to the AC power supply through the relatively large gate-to-source and gate-to-bulk capacitances since the junctions carry AC signals (unlike static CMOS). Thus, full charge recovery is preserved.



Figure 4.3: Simulated output voltage and current waveforms of a signal shaper.



Figure 4.4: Simulated output waveform of a wirelessly powered inverter with and without signal shapers.

However, since the capacitance of the transistor has to be comparable to the equivalent capacitance of the PAL logic, the transistor becomes large, which lowers the efficiency of the shaper. To overcome this efficiency degradation, an improved signal shaper with an external capacitance has been proposed and theoretically analyzed in this work [60].

## 4.2 WP-CEPAL based Computing Framework

The primary components of the wirelessly powered complementary energy path adiabatic logic (WP-CEPAL) based approach are depicted in Fig. 4.5.

#### 4.2.1 CEPAL Operation Principles

CEPAL is similar to static CMOS logic with complementary pull-up and pull-down networks [61]. In addition, CEPAL has a pair of diode-connected charging transistors (P1 and P2) and a pair of diode-connected discharging transistors (N1 and N2), as shown in Fig. 4.5(c). Similar to PAL, CEPAL requires two out-of-phase power-clock signals. Unlike PAL, each gate of CEPAL requires both power-clock signals at the same time, as depicted in Fig. 4.5(d). As such, CEPAL is not inherently pipelined but quasi-static.

Referring to Fig. 4.5(c), assume that initially node *out* is at logic low and the input of the inverter transitions from logic high to logic low. As *PCLK* starts to rise from low to high (while $\overline{PCLK}$ transitions from high to low), pull-up transistor P3 switches on, enabling one of the two charging paths through diode-connected transistors P1 and P2 (depending upon the AC supply voltage). Thus, the output node follows either *PCLK* or $\overline{PCLK}$. Once node *out* reaches the peak voltage level, *PCLK* starts to fall, producing a floating output for a short amount of time. This issue is eliminated by the rising $\overline{PCLK}$. In order to guarantee proper operation, CEPAL requires two signal shapers and a peak detector to provide, respectively, an appropriate AC supply and bulk biasing voltage, as indicated by Fig. 4.5(a).

Figure 4.5: WP-CEPAL: (a) Requirement for two signal shapers and a peak detector. (b) Two out-of-phase AC power-clock signals. (c) Inverter gate. (d) Cascaded gates.

## 4.3 Simulation Results

To demonstrate the feasibility of the proposed WP-PAL and WP-CEPAL approaches and quantify the benefits in efficiency, a wireless link based on near-field inductive coupling has been designed. The weakly coupled wireless link can be accurately modeled with an *RLC* circuit and an ideal transformer, while considering the resistive losses due to the coils (refer to Section 2.2 in Chapter 2 and Section 3.3 in Chapter 3).

The wireless link has been combined with a 16-bit carry select adder that has been designed using both the conventional and proposed methods [62]. A 45 nm CMOS technology has been used for each method. The wirelessly harvested signal varies from -1 V to +1 V. In the conventional approach, the harvested AC signal has been converted into a regulated DC signal of 1 V through a rectifier and a regulator. The regulated DC signal has been used to power a conventional static CMOS based 16-bit carry select adder with flip-flops at the primary inputs and outputs. Alternatively, in the proposed approach, the harvested AC signal has been directly used for computation using three different charge-recycling mechanisms (WP-ECRL, WP-PAL, and WP-CEPAL), as described in the previous sections and the previous chapter. ECRL and PAL are inherently pipelined, so no flip-flops are required. Additional buffers, however, are inserted to ensure that the number of gates from the inputs to each output is the same, thereby correctly synchronizing the data flow. Similar to static CMOS, CEPAL requires flip-flops for synchronization.

Figure 4.6: Comparison of the average power consumed by the 16-bit carry select adder operating at 13.56 MHz and designed in both existing and the proposed methods.

The average power consumed by each method has been analyzed at the same frequency (low RFID frequency band of 13.56 MHz) for 100 clock cycles with the same input data pattern. The results are shown in Fig. 4.6. According to this figure, the overall average power consumed by the conventional method is approximately 26.4 $\mu$W. Alternatively, the power consumed by the proposed method can be as low as 1.97 $\mu$W (for the wirelessly powered PAL), demonstrating a 13.4× reduction. Note that the processing power represents the logic power consumed by the 16-bit adder, whereas the overhead power represents the power consumed by the supporting blocks. In the conventional method, these supporting blocks include the rectifier and regulator, consuming 17.6 $\mu$W, approximately 67% of the overall power. In wirelessly powered (WP) ECRL, the overhead power is due to two phase shifters (resistive loss) and a peak detector, consuming 3.7 $\mu$W. In WP-CEPAL, the overhead power due to two signal shapers and a peak detector is 1.5 $\mu$W. Finally, in WP-PAL, the overhead power due to two signal shapers is 1.7 $\mu$W. These results demonstrate that in the proposed approach, both the processing and overhead power are reduced, increasing the energy efficiency of a wirelessly powered device by up to an order of magnitude.
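The headline ratios quoted above can be re-derived directly from the reported power values. The following minimal sketch only repeats that arithmetic; the variable names are illustrative and not part of the original design flow.

```python
# Power figures quoted in the text for the 16-bit carry select adder (microwatts).
conventional_total_uw = 26.4     # rectifier + regulator + static CMOS adder
conventional_overhead_uw = 17.6  # rectifier and regulator only
wp_pal_total_uw = 1.97           # overall power of the WP-PAL implementation

# Ratio of conventional to WP-PAL overall power.
reduction = conventional_total_uw / wp_pal_total_uw
# Fraction of the conventional power spent in the supporting blocks.
overhead_share = conventional_overhead_uw / conventional_total_uw

print(f"overall reduction: {reduction:.1f}x")
print(f"overhead share of conventional power: {overhead_share:.0%}")
```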

## 4.4 Summary

The two proposed charge-recycling approaches (WP-PAL and WP-CEPAL) can achieve up to a $13.4\times$ reduction in overall power as compared to the conventional case. More importantly, contrary to the WP-ECRL based AC computing framework, the two frameworks in this chapter eliminate the phase shifters, which are highly challenging to implement, and obtain an approximately 60% reduction in the overhead power consumed by the auxiliary circuits (phase shifters and peak detectors in WP-ECRL, signal shapers and peak detectors in WP-PAL and WP-CEPAL).

## Chapter 5

# AC Powered ALU for Deep Brain Implantable Devices

A wirelessly powered 8-bit arithmetic logic unit (ALU) is designed in this chapter with application to biomedical implantable devices [63]. The objective of this chapter is to provide a comprehensive comparison of the three proposed implementations. The rest of the chapter is organized as follows. The background related to implantable electronic devices is provided in Section 5.1. The customized inductive wireless link is discussed in Section 5.2. In Section 5.3, the details of the proposed AC powered 8-bit ALU are presented. The auxiliary circuitry is described in Section 5.4. The simulation results are analyzed in Section 5.5. Finally, a discussion on design tradeoffs related to the AC computing methodology is provided in Section 5.6, and conclusions are drawn in Section 5.7.

## 5.1 Overview of the Application

In implantable applications, wireless powering and communication are essential since any wiring through the skin poses a significant health risk. RF energy is transmitted through inductive coupling due to the high attenuation of the electric field within the body [64]. The transmitted RF power is limited by the heating of body tissue, and these limits are determined by the specific absorption rates for different tissues [65, 66]. This limitation significantly constrains the amount of power that can be delivered to the implant. Thus, achieving high energy efficiency and a small form factor are among the primary challenges in the design of implantable devices.

Implantable devices record biological signals, such as neural activity [67, 68], and/or stimulate different parts of the neural system [69]. In many applications, such as deep-brain implants, the optimal stimulation timing and pattern are obtained by processing the recorded data, calling for the design of a closed-loop system. The existing designs of such systems [70, 71] move the data processing to a different location within the body where more energy is available due to the larger physical space for a battery. As demonstrated in this thesis, the proposed AC computing methodology significantly reduces the energy cost of data processing and can therefore lead to the implementation of a closed-loop system on a single substrate.

## 5.2 Inductively Coupled Wireless Link

In this case study, the focus is on near-field energy harvesting. The wireless power harvesting system includes an external coil placed adjacent to the skin (for transmission of the RF signal) and an implanted receiving coil. An RF power amplifier drives the external coil and a dedicated RF electromagnetic wave is transmitted. A portion of this RF energy is captured by the implantable coil. The lumped model of the link is illustrated in Fig. 5.1 [64], which is a reasonable approximation for transmitting frequencies up to 100 MHz. $L_1$ and $L_2$ represent the inductances of the two separate coils and $M$ is the mutual inductance. $R_{s1}$ and $R_{s2}$ represent the parasitic resistive losses, while $C_{p1}$ and $C_{p2}$ are the parasitic capacitances of the coils. $C_1$ and $C_2$ are the tuning capacitances used to achieve resonance in both the external and implantable circuits. $R_S$ and $R_L$ are the source and load resistances. The design of the coils in the wireless link is driven primarily by the application, which sets the physical constraints on both coils and on the distance between them. The form factor of the implant determines the size of the receiving coil. The optimal transmission frequency also depends on the application scenario. If the external coil is sized on the order of a few centimeters, the optimal frequency is on the order of 40 MHz [72]. In the case of a smaller external coil, the optimal frequency shifts to the order of 1 GHz [73].

Figure 5.1: Lumped model of an inductively coupled wireless power harvesting system.

To illustrate the design process of a wireless link, a deep brain implantable device is assumed. The transmitting coil is designed with a diameter of 5 cm, and two designs of the receiving coil with diameters of 1.5 mm and 3 mm are explored. A full-wave electromagnetic field solver based on the finite element method, HFSS (high frequency structure simulator), is utilized to analyze and extract the network characteristics of the wireless link. The power efficiency and the parameters for the equivalent narrow band model of the link (see Fig. 5.1) are determined from the S parameters extracted from HFSS and Keysight Advanced Design System (ADS) simulations. The simulation setup is depicted in Fig. 5.2 and the human head model is shown in Fig. 5.3. The physical and electrical characteristics of the coils are summarized in Table 5.1. The power efficiency reaches -37.4 dB at a distance of 6 cm, assuming matching networks are available for transformation of the load impedance to achieve optimum efficiency. The maximum available power for the operation of the implantable device as a function of the distance between the coils is shown in Fig. 5.4. This figure demonstrates that by lowering the power consumption of the implantable device (as targeted in this research), a greater implantation depth can be achieved.
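As a sanity check on the coil parameters, the resonance capacitors listed in Table 5.1 can be compared against the standard resonance condition $C = 1/((2\pi f)^2 L)$ applied to the effective inductances at 13.56 MHz. The sketch below is only an illustrative cross-check (the function names are hypothetical); it also converts the quoted -37.4 dB link efficiency into a linear power ratio.

```python
import math

F_HZ = 13.56e6  # frequency at which Table 5.1 reports the effective inductance


def resonance_capacitance(inductance_h, freq_hz=F_HZ):
    """Capacitance that resonates with an inductance: C = 1 / ((2*pi*f)^2 * L)."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (omega ** 2 * inductance_h)


def db_to_ratio(power_db):
    """Convert a power ratio expressed in dB to a linear ratio."""
    return 10.0 ** (power_db / 10.0)


# Effective inductances from Table 5.1, in nanohenries.
coils_nh = {"Tx": 126.3, "Rx1": 4.6, "Rx2": 17.4}
caps_nf = {name: resonance_capacitance(l_nh * 1e-9) * 1e9
           for name, l_nh in coils_nh.items()}

for name, c_nf in caps_nf.items():
    print(f"{name}: resonance capacitor = {c_nf:.2f} nF")

link_efficiency = db_to_ratio(-37.4)
print(f"link efficiency at 6 cm: {link_efficiency:.2e}")
```

Under this condition, the computed capacitances agree with the tabulated 1.09 nF, 29.9 nF, and 7.9 nF to within rounding.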

## 5.3 AC Powered 8-bit ALU

The RF energy harvested from the near-field wireless link described above is used to power an 8-bit ALU. The ALU consists of two types of computational blocks: Boolean logic (INV, OR, XOR, and AND) and arithmetic logic (adder, subtractor, and multiplier), as shown in Fig. 5.5.

Figure 5.2: Wireless link simulation setup: transmitting and receiving coils at a distance of $D_{imp}$.

Figure 5.3: Model of the human head for deep brain implantable devices. GM and WM refer, respectively, to gray matter and white matter.

| Parameter                    | $T_x$ | $R_{x1}$ | $R_{x2}$ |
|------------------------------|-------|----------|----------|
| Diameter (mm)                | 50    | 1.5      | 3        |
| Material                     | Cu    | Cu       | Cu       |
| Number of turns              | 1     | 2        | 2        |
| Trace width (mm)             | 3     | 0.2      | 0.3      |
| Trace thickness (µm)         | 38    | 38       | 38       |
| Space between turns (mm)     | N/A   | 0.1      | 0.1      |
| Effective L at 13.56 MHz (nH) | 126.3 | 4.6      | 17.4     |
| Resistance (m$\Omega$)       | 64.7  | 21.2     | 49.9     |
| Resonance capacitor (nF)     | 1.09  | 29.9     | 7.9      |

Table 5.1: Parameters of the $T_x$ and $R_x$ coils.

Figure 5.4: Available power for the operation of the implant with two different sizes of the receiving coil as a function of distance between transmitting and receiving coils.

Figure 5.5: Block-level diagram of the 8-bit arithmetic logic unit.

To demonstrate the proposed methodology, the ALU is implemented with each of the three proposed approaches (see Fig. 2.3) as well as the conventional method, where the wirelessly harvested AC signal is rectified, regulated, and used with conventional static CMOS. A 45 nm technology with a nominal voltage of 1 V is used for each approach. The design characteristics of each approach are summarized in Table 5.2.

It is important to note that WP-ECRL and WP-PAL are inherently pipelined with a logic depth of 10 clock phases. Since ECRL operates with a 4-phase power-clock signal whereas PAL operates with a 2-phase power-clock signal, the latencies of ECRL and PAL are, respectively, 10/4 = 2.5 and 10/2 = 5 clock cycles. Alternatively, the WP-CEPAL and static CMOS based approaches require sequential circuits (flip-flops) for synchronization, resulting in 4 pipelining stages. As such, the overall number of transistors in WP-CEPAL and static CMOS is higher than in WP-ECRL and WP-PAL. WP-CEPAL requires the highest number of transistors since each gate in CEPAL requires four transistors in addition to the conventional pull-down and pull-up networks [see Fig. 4.5(c)]. WP-ECRL and WP-PAL require the fewest transistors since there are no flip-flops and the pull-up network in each gate consists of only two PMOS transistors [see Fig. 3.1(c) and Fig. 4.1(b)]. Also note that dual-rail encoding in adiabatic logic generates complementary output signals. As such, arithmetic operations that require the 2's complement, such as subtraction, can be built without introducing additional inverters.

|             | Number of transistors | Latency          | Operation    |
|-------------|-----------------------|------------------|--------------|
| WP-ECRL     | 4158                  | 2.5 clock cycles | 4-phase      |
| WP-PAL      | 4158                  | 5 clock cycles   | 2-phase      |
| WP-CEPAL    | 9394                  | 4 clock cycles   | quasi-static |
| Static CMOS | 6990                  | 4 clock cycles   | static       |

Table 5.2: Characteristics of the 8-bit ALU for each of the proposed and conventional approaches.
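The latency values of the two inherently pipelined styles follow from dividing the logic depth (in power-clock phases) by the number of phases per clock cycle. A minimal sketch of this calculation is given below; the function name is illustrative only.

```python
LOGIC_DEPTH_PHASES = 10  # logic depth of the pipelined ALU, in power-clock phases


def latency_in_cycles(depth_phases, phases_per_cycle):
    """Latency in clock cycles when each gate evaluates during one phase."""
    return depth_phases / phases_per_cycle


ecrl_latency = latency_in_cycles(LOGIC_DEPTH_PHASES, 4)  # 4-phase power-clock
pal_latency = latency_in_cycles(LOGIC_DEPTH_PHASES, 2)   # 2-phase power-clock

print(f"WP-ECRL latency: {ecrl_latency} clock cycles")
print(f"WP-PAL latency: {pal_latency} clock cycles")
```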

In WP-ECRL and WP-PAL, which do not have any flip-flops due to inherent pipelining, additional buffers are used to synchronize data paths with different logic depths. Specifically, buffers are inserted into those data paths with shorter logic depths to ensure that the outputs are synchronized with the same power-clock signal. This requirement adds significant overhead to WP-ECRL and WP-PAL as compared to WP-CEPAL and static CMOS logic. To partially mitigate this issue, multiple gates can be merged into a single complex gate, thereby reducing the logic depth of a data path, as depicted in Fig. 5.6 for a 1-bit full adder. In this example, output S (sum) takes two phases of the AC power-clock signal whereas output $C_{out}$ (carry) takes three phases. Thus, an additional buffer would be required at the output of S to synchronize these two signals. Instead, these two functions can be merged into a single ECRL (or PAL) complex gate, as shown in Fig. 5.6.

Figure 5.6: Merging multiple gates into a single complex adiabatic gate to mitigate the overhead of additional buffers required for synchronization.

## 5.4 Auxiliary Circuitry

#### 5.4.1 RF-DC Converter

The RF-DC converter/regulator is considered auxiliary circuitry for the conventional approach, as illustrated in Fig. 5.7. Typical maximum power conversion efficiencies at low input power levels are in the range of 30% to 40% due to the voltage drop across the diodes [25, 74, 75, 76]. To ensure a fair comparison between the conventional and proposed approaches, the RF-DC converter/regulator is designed to achieve a power efficiency of 32.9% (39.7% for rectification stage and 95.2% for regulation stage). The harvested wireless signal as well as the rectified and regulated signals are illustrated in Fig. 5.8. The regulated 1 V supply is used to power the 8-bit ALU designed with the conventional static CMOS method at the same clock frequency of 13.56 MHz.

Figure 5.7: Circuit diagram of a low complexity RF-DC converter and regulator required for traditional approaches.

Figure 5.8: Simulated output waveforms of the wireless link, rectifier, and regulator. The final regulated output voltage is approximately 1 V, which is used to drive the conventional 8-bit ALU running at 13.56 MHz clock frequency.

Figure 5.9: An RLC model of LC phase shifter and RC load.

#### 5.4.2 Phase Shifter Optimization

An *LC* phase shifter is designed to provide four-phase power-clock signals for the operation of the WP-ECRL computing blocks, as discussed in Chapter 3. To investigate the effect of load variation on the phase shifter, an *RLC* lumped model is derived, as shown in Fig. 5.9. The transfer function of the model in the *s* domain can be expressed as

$$H(s) = \frac{V_{out}(s)}{V_{in}(s)} = \frac{\frac{1}{sC_0} \parallel R_0 \parallel \left(R_L + \frac{1}{sC_L}\right)}{sL_0 + \frac{1}{sC_0} \parallel R_0 \parallel \left(R_L + \frac{1}{sC_L}\right)}. \tag{5.1}$$

Based on the transfer function, the effect of a varying load on the phase shifter output is analyzed. Fig. 5.10(a) shows the relationship between the transfer function and the load resistance. As the load resistance $R_L$ increases, the magnitude and phase delay of the transfer function decrease. Also, a larger load capacitance $C_L$ leads to a larger phase delay, as depicted in Fig. 5.10(b). The parallel resistance $R_0$ plays an important role in the *RLC* model in mitigating the load variation and reducing power consumption. As can be seen from Fig. 5.10(c), the magnitude and phase delay are reduced with decreasing parallel resistance. A smaller parallel resistance, however, draws a larger current and consumes more energy.
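To experiment with the load dependence described above, the transfer function of Eq. (5.1) can be evaluated numerically at $s = j 2\pi f$. The sketch below is a minimal illustration, not the analysis flow used in this work: the helper names and all component values are hypothetical and chosen only so that the script runs.

```python
import cmath
import math


def parallel(*impedances):
    """Equivalent impedance of elements connected in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)


def phase_shifter_response(freq_hz, l0, c0, r0, rl, cl):
    """Evaluate H(s) of Eq. (5.1) at s = j*2*pi*f."""
    s = 1j * 2.0 * math.pi * freq_hz
    # Shunt branch: C0 || R0 || (RL in series with CL)
    z_shunt = parallel(1.0 / (s * c0), r0, rl + 1.0 / (s * cl))
    return z_shunt / (s * l0 + z_shunt)


# Hypothetical component values, chosen only for illustration.
F_C = 13.56e6
L0, C0, R0 = 1e-6, 100e-12, 10e3
CL = 170e-15

# Sweep the load resistance and report magnitude and phase of H.
for rl in (1e3, 4e3, 16e3):
    h = phase_shifter_response(F_C, L0, C0, R0, rl, CL)
    print(f"RL = {rl / 1e3:5.1f} kOhm: |H| = {abs(h):.3f}, "
          f"phase = {math.degrees(cmath.phase(h)):7.2f} deg")
```

The same function can be swept over `cl` or `r0` to explore the trends of Fig. 5.10(b) and Fig. 5.10(c) for a given set of component values.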

#### 5.4.3 More about Signal Shaper

The signal shaper (described in Chapter 4) has two distinct phases of operation. In the first phase, for a small fraction of the sine wave period, the transistor operates in the sub-threshold region and the source-bulk diode is forward biased. In the second phase, the transistor and the parasitic diodes are turned off. Referring to Fig. 4.2, if the transistor capacitances $C_{GS}$ and $C_{SB}$ are comparable in size to the load capacitance seen by the power-clock signal, the capacitive coupling current flowing through $C_{GS}$ and $C_{SB}$ is much larger than the leakage current and the output voltage is a scaled version of the input voltage. The ratio of $C_{GS}$ and $C_{SB}$ to the load capacitance (which depends upon the size of the signal shaper and the number of driven transistors) determines the ratio of the input and output voltages in this region of operation. Meanwhile, the size of the signal shaper also sets the overall DC level of the output voltage. The optimal size of the transistor leads to a sine wave-like signal at the output that is always greater than zero volts. To better understand the operation and the high power efficiency of the proposed signal shaper, the time domain waveforms of the currents at all four transistor terminals are illustrated in Fig. 5.11, along with the depiction of the evaluation and recovery phases.
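The dependence of the output swing on the capacitance ratio can be illustrated with a first-order capacitive divider. This sketch rests on simplifying assumptions that are not stated in the text: $C_{GS}$ and $C_{SB}$ are taken to act in parallel between the input and the output, the load is purely capacitive, and leakage, sub-threshold conduction, and the DC level are ignored. The capacitance values are hypothetical.

```python
def shaper_swing_ratio(c_gs, c_sb, c_load):
    """First-order capacitive-divider ratio v_out / v_in (AC swing only)."""
    # Assumption: both capacitors couple the input node to the output node.
    c_couple = c_gs + c_sb
    return c_couple / (c_couple + c_load)


# Hypothetical capacitances in farads, not taken from the design.
ratio = shaper_swing_ratio(c_gs=0.3e-12, c_sb=0.2e-12, c_load=0.5e-12)
print(f"output swing / input swing = {ratio:.2f}")
```

With a coupling capacitance equal to the load capacitance, this idealized model halves the swing, which is in line with a 2 V peak-to-peak input being shaped into an output of approximately 1 V peak-to-peak.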

The duty cycle and the DC level of the signal shaper output voltage are determined by the ratio of the sub-threshold current component to the leakage current component of the overall drain current during the two phases of the transistor operation. The gate and bulk current components correspond to the capacitive coupling current through the transistor capacitances and dominate the overall current. Thus, the power efficiency of the signal shaper is maximized. Furthermore, since the coupling current is the primary conduction mechanism, bidirectional current flow across the signal shaper is enabled. This characteristic is critical to successfully recycle charge during the recovery phase (note the negative current during this phase).

Figure 5.10: Analysis of LC phase shifting network. (a) Effect of load resistance on LC phase shifter, (b) Effect of load capacitance on LC phase shifter, (c) Effect of parallel resistance on LC phase shifter.

Figure 5.11: Waveforms of the signal shaper to better understand the high efficiency operation. From top to bottom: input and output voltages, overall current (at source node), drain current, bulk current, and gate current.

Another approach to overcome the issue of a harvested AC signal with a negative voltage component would be to produce a negative DC voltage through a peak detector (similar to the one proposed for WP-ECRL) and connect the bulk terminals of the NMOS devices to this negative voltage to ensure proper operation. In this way, the harvested AC signal with negative voltage could be directly used without a signal shaper. According to simulation results, however, this approach significantly increases the leakage current from channel to drain due to band-to-band tunneling (BTBT) [77]. BTBT is exacerbated in this case due to gate-induced drain leakage (GIDL) since the gate-to-source voltage is negative [78].

## 5.5 Simulation Results

The output waveform of one of the bits is provided for each approach in Fig. 5.12. Note that the WP-CEPAL output is similar to the output signal obtained from the conventional static CMOS approach. There is, however, a reduction in the rail-to-rail voltage swing due to the diode-connected transistors.



Figure 5.12: Example output waveforms of the ALU for each of the proposed methods as well as the conventional approach.

#### 5.5.1 Adiabatic Logic RC Model

To optimize the auxiliary circuitry, including the *LC* phase shifter, peak detector, and signal shaper, an equivalent lumped *RC* load model is extracted for each phase of the power-clock signals. The model comprises an equivalent capacitor for energy storage in series with a resistor for adiabatic energy loss [79]. The model parameters *R* and *C* can be extracted from simulation. For a specific frequency $f_c$ and logic activity, the objective of the simulation is to calculate the power dissipation $P_L$ and the RMS current $I_L$ supplied by the power-clock signals. Using $P_L$ and $I_L$, the model parameters can be calculated as

$$R = \frac{P_L}{I_L^2},\tag{5.2}$$

$$C = \frac{\sqrt{2}\, I_L}{\pi V_{DD} f_c}.\tag{5.3}$$

These extraction results are summarized in Table 5.3. The *RC* values in the model for static logic highly depend upon the input data sequence or the switching activity. Alternatively, *RC* extraction results are almost independent of input data sequence in the case of charge-recycling logic. This behavior is due to the fact that a pair of cross-coupled transistors provide a constant load or symmetrical topology for the power-clock signal, meaning either of two complementary branches conduct current for each clock cycle.
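The extraction can be illustrated with a short script. The sketch below reads Eq. (5.3) as $C = \sqrt{2}\, I_L / (\pi V_{DD} f_c)$, which is dimensionally consistent and reproduces the PAL entries of Table 5.3; the function name is hypothetical, and $V_{DD} = 1$ V and $f_c = 13.56$ MHz are taken from the simulation setup described in this chapter.

```python
import math


def extract_rc(p_l_w, i_l_a, v_dd, f_c_hz):
    """Return (R, C) of the lumped load model from power and RMS current."""
    r = p_l_w / i_l_a ** 2                                # Eq. (5.2)
    c = math.sqrt(2.0) * i_l_a / (math.pi * v_dd * f_c_hz)  # Eq. (5.3), as read above
    return r, c


# PAL, phase 1 (Table 5.3): P_L = 0.33 uW, I_L = 15.10 uA.
r, c = extract_rc(p_l_w=0.33e-6, i_l_a=15.10e-6, v_dd=1.0, f_c_hz=13.56e6)
print(f"R = {r / 1e3:.2f} kOhm, C = {c * 1e15:.2f} fF")
```

The small difference between the computed resistance and the tabulated 1.47 k$\Omega$ is consistent with $P_L$ being rounded to two decimal places in the table.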

| Logic style | Power clock             | Resistance (k$\Omega$) | Capacitance (fF) | $I_L$ ($\mu$A) | $P_L$ ($\mu$W) |
|-------------|-------------------------|------------------------|------------------|----------------|----------------|
| ECRL        | Phase 1                 | 4.15                   | 168.15           | 10.13          | 0.43           |
| ECRL        | Phase 2                 | 5.87                   | 155.20           | 9.35           | 0.51           |
| ECRL        | Phase 3                 | 1.84                   | 206.16           | 12.42          | 0.28           |
| ECRL        | Phase 4                 | 4.74                   | 127.38           | 7.67           | 0.28           |
| PAL         | Phase 1                 | 1.47                   | 501.28           | 15.10          | 0.33           |
| PAL         | Phase 2                 | 1.04                   | 502.28           | 15.13          | 0.23           |
| CEPAL       | Phase 1                 | 13.52                  | 443.19           | 13.35          | 2.41           |
| CEPAL       | Phase 2                 | 13.35                  | 443.52           | 13.36          | 2.38           |
| Static CMOS | $V_{DD}$ (power supply) | 0.35                   | 7213.82          | 217.3          | 16.62          |

Table 5.3: The variation of the load characteristics $R$ and $C$ during different phases for each of the proposed methods and the conventional approach.



Figure 5.13: The effect of phase difference deviation on the power consumption of WP-ECRL.

#### 5.5.2 Phase Tolerance

Since the desired phase difference may be affected by process and/or environmental variations, a preliminary study is performed to investigate the robustness of the proposed method to deviations in the phase difference among multiple power-clock signals. The overall power of WP-ECRL is depicted in Fig. 5.13 as a function of the phase difference. The results demonstrate that WP-ECRL is relatively robust to deviations in the phase difference.

The power is minimum when the phase difference is ideal at $90^{\circ}$. When the phase difference deviates from the ideal point, the power consumption slowly increases. Once the phase difference leaves a certain range (approximately $60^{\circ}$ to $120^{\circ}$), the power consumption increases faster. Thus, the WP-ECRL charge-recycling mechanism can tolerate a phase difference deviation of approximately $30^{\circ}$ (power increases by only 16%). Despite the increase in power consumption, the phase difference deviations do not affect the functionality/accuracy of the computational unit for WP-ECRL, even in the extreme case of a $50^{\circ}$ phase deviation. Alternatively, for WP-PAL and WP-CEPAL, a relatively small deviation affects the correct operation. For example, when the phase difference is $210^{\circ}$ ($30^{\circ}$ deviation), the logic fails.

|                | $P_R/\mu W$ | $P_{overhead}/\mu W$ | $P_{logic}/\mu W$ |
|----------------|-------------|----------------------|-------------------|
| WP-ECRL        | 17.49       | 14.28                | 3.10              |
| WP-PAL         | 2.94        | 2.542                | 0.40              |
| WP-CEPAL       | 5.70        | 2.15                 | 3.55              |
| WP-Static CMOS | 47.64       | 31.99                | 15.65             |

Table 5.4: Comparison of received power at the wireless link, overhead and logic power consumption for each of the proposed methods and conventional approach.

#### 5.5.3 Power Evaluation

The average power consumed by the proposed approaches is compared with the conventional approach in Table 5.4. $P_R$ in the table refers to the power received by the wireless link. $P_{overhead}$ and $P_{logic}$ refer, respectively, to the power consumed by the auxiliary circuits in each approach and the logic power consumption (8-bit ALU described in Section 5.3).

As listed in this table, up to a $16.2\times$ reduction in overall power consumption is achieved by the proposed methodology (WP-PAL) as compared to conventional static CMOS that has a rectifier and regulator. The overhead power is the highest for the conventional case (approximately twice the logic power) due to the relatively inefficient AC-to-DC conversion process where the input power levels are in the microwatt range. The overhead power in WP-ECRL is also relatively high compared to WP-PAL and WP-CEPAL due to the power consumed by two phase shifters. Alternatively, the signal shaper and peak detector are highly efficient, minimizing the overhead power for WP-PAL and WP-CEPAL. The logic power consumption in WP-PAL is only 0.4 $\mu$W (approximately 40× less than static CMOS) due to the ability to fully recycle charge.
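The ratios discussed above can be recomputed from the entries of Table 5.4. The following sketch simply copies the table into a dictionary (the variable names are illustrative) and evaluates the overall and logic power reductions relative to the conventional approach.

```python
# (received, overhead, logic) power in microwatts, copied from Table 5.4.
table_5_4 = {
    "WP-ECRL": (17.49, 14.28, 3.10),
    "WP-PAL": (2.94, 2.542, 0.40),
    "WP-CEPAL": (5.70, 2.15, 3.55),
    "WP-Static CMOS": (47.64, 31.99, 15.65),
}

static_received, static_overhead, static_logic = table_5_4["WP-Static CMOS"]

# Reduction of each approach relative to the conventional static CMOS case.
for name, (received, overhead, logic) in table_5_4.items():
    print(f"{name:15s} overall reduction = {static_received / received:5.1f}x, "
          f"logic reduction = {static_logic / logic:5.1f}x")

pal_received, _, pal_logic = table_5_4["WP-PAL"]
overall_reduction = static_received / pal_received
logic_reduction = static_logic / pal_logic
overhead_to_logic = static_overhead / static_logic
print(f"conventional overhead / logic power = {overhead_to_logic:.2f}")
```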

#### 5.5.4 Effect of Circuit Size

To investigate the dependence of both the overhead and processing power on circuit size, the power consumed by 4-, 8-, and 16-bit adders is plotted for each approach in Fig. 5.14(a) (overhead power) and Fig. 5.14(b) (processing power). The overhead power in the static CMOS based approach increases with circuit size since the rectification and regulation stages consume more power when the load circuit is larger. Alternatively, in the proposed approaches, the overhead power consumption is relatively independent of the circuit size. For processing power, WP-PAL exhibits the slowest increase with respect to circuit size since charge is fully recycled. Alternatively, the traditional approach exhibits the fastest increase.

#### 5.5.5 Effect of Lower Operating Voltages

Finally, the behavior of the proposed method is investigated at lower operating voltages and at different corner cases. This investigation is important for unreliable wireless power sources (such as ambient wireless power) where the harvested voltage can vary. The power consumption of the core logic at various voltages is listed in Table 5.5 for each approach under nominal operating conditions. According to this table, WP-ECRL and the conventional approach can operate at the lowest supply voltage of 0.5 V, where WP-PAL and WP-CEPAL fail. The minimum operating voltage for WP-PAL and WP-CEPAL is higher due to the reduction in the voltage headroom, as illustrated by the waveforms in Fig. 5.12. The effect of scaled voltages is also investigated at the slow process corner to consider process variations, as listed in Table 5.6. WP-ECRL continues to operate correctly and WP-PAL fails at 0.5 V, whereas WP-CEPAL fails at 1 V (contrary to nominal operating conditions, where the failure occurs at 0.9 V). Finally, in Table 5.7, the slow process corner is combined with a high operating temperature of 127°C. In this case, WP-PAL fails at all operating voltages, primarily due to degraded logic-low values. These simulations demonstrate the robustness of the WP-ECRL approach as compared to WP-PAL and WP-CEPAL.

Figure 5.14: Power scaling with transistor numbers for each of the methods, (a) dependence of overhead power on circuit size, (b) dependence of processing power on circuit size.

| $V_A/V$ | $P_{ECRL}/\mu W$ | $P_{PAL}/\mu W$ | $P_{CEPAL}/\mu W$ | $P_{Static}/\mu W$ |
|---------|------------------|-----------------|-------------------|--------------------|
| 1.2     | 2.46             | 1.17            | 7.85              | 35.32              |
| 1.1     | 1.83             | 0.77            | 6.31              | 23.16              |
| 1.0     | 1.51             | 0.59            | 4.92              | 16.55              |
| 0.9     | 1.27             | 0.46            | Fail              | 12.18              |
| 0.8     | 1.07             | 0.40            | Fail              | 8.86               |
| 0.5     | 0.63             | Fail            | Fail              | 3.09               |

Table 5.5: Effect of voltage scaling on power consumption and performance under nominal operating conditions.
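The pass/fail pattern in Tables 5.5 to 5.7 can be condensed into a minimum simulated operating voltage per method. The following Python sketch is only a bookkeeping aid: the data are transcribed from the three tables, `None` marks a "Fail" entry, and the names `TABLES` and `min_operating_voltage` are hypothetical. Since only six voltage points were simulated, the result is the lowest passing grid point rather than a true minimum supply voltage.

```python
# Core-logic power (uW) transcribed from Tables 5.5-5.7; None marks "Fail".
VOLTAGES = [1.2, 1.1, 1.0, 0.9, 0.8, 0.5]

TABLES = {
    "nominal": {
        "WP-ECRL": [2.46, 1.83, 1.51, 1.27, 1.07, 0.63],
        "WP-PAL": [1.17, 0.77, 0.59, 0.46, 0.40, None],
        "WP-CEPAL": [7.85, 6.31, 4.92, None, None, None],
        "Static": [35.32, 23.16, 16.55, 12.18, 8.86, 3.09],
    },
    "slow, 27C": {
        "WP-ECRL": [2.17, 1.66, 1.42, 1.23, 1.07, 0.67],
        "WP-PAL": [0.87, 0.64, 0.52, 0.45, 0.41, None],
        "WP-CEPAL": [6.91, 5.48, None, None, None, None],
        "Static": [27.28, 19.05, 14.2, 10.74, 8.01, 2.93],
    },
    "slow, 127C": {
        "WP-ECRL": [3.03, 2.43, 2.03, 1.72, 1.46, 0.86],
        "WP-PAL": [None, None, None, None, None, None],
        "WP-CEPAL": [9.87, 8.02, None, None, None, None],
        "Static": [36.82, 25.99, 19.4, 14.64, 11.01, 4.07],
    },
}


def min_operating_voltage(powers):
    """Return the lowest simulated voltage with a passing entry, or None."""
    passing = [v for v, p in zip(VOLTAGES, powers) if p is not None]
    return min(passing) if passing else None


for corner, methods in TABLES.items():
    for method, powers in methods.items():
        print(f"{corner:>10}  {method:<8}  {min_operating_voltage(powers)}")
```

A result of `None` indicates that the method failed at every simulated voltage for that corner.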

## 5.6 Design Tradeoffs

Leveraging charge-recycling operation for wirelessly powered devices such as RFID tags and wireless sensor nodes can achieve a significant reduction in power consumption, as demonstrated in the previous section. The tradeoffs related to the three proposed implementations are discussed in this section, as summarized in Table 5.8.

| $V_A/V$ | $P_{ECRL}/\mu W$ | $P_{PAL}/\mu W$ | $P_{CEPAL}/\mu W$ | $P_{Static}/\mu W$ |
|---------|------------------|-----------------|-------------------|--------------------|
| 1.2     | 2.17             | 0.87            | 6.91              | 27.28              |
| 1.1     | 1.66             | 0.64            | 5.48              | 19.05              |
| 1.0     | 1.42             | 0.52            | Fail              | 14.2               |
| 0.9     | 1.23             | 0.45            | Fail              | 10.74              |
| 0.8     | 1.07             | 0.41            | Fail              | 8.01               |
| 0.5     | 0.67             | Fail            | Fail              | 2.93               |

Table 5.6: Effect of voltage scaling on power consumption and performance under slow process corners and nominal temperature of $27^{\circ}C$.

| $V_A/V$ | $P_{ECRL}/\mu W$ | $P_{PAL}/\mu W$ | $P_{CEPAL}/\mu W$ | $P_{Static}/\mu W$ |
|---------|------------------|-----------------|-------------------|--------------------|
| 1.2     | 3.03             | Fail            | 9.87              | 36.82              |
| 1.1     | 2.43             | Fail            | 8.02              | 25.99              |
| 1.0     | 2.03             | Fail            | Fail              | 19.4               |
| 0.9     | 1.72             | Fail            | Fail              | 14.64              |
| 0.8     | 1.46             | Fail            | Fail              | 11.01              |
| 0.5     | 0.86             | Fail            | Fail              | 4.07               |

Table 5.7: Effect of voltage scaling on power consumption and performance under slow process corners and high temperature of 127°C.

|                           | WP-ECRL                           | WP-PAL                                  | WP-CEPAL                     | Conventional            |
|---------------------------|-----------------------------------|-----------------------------------------|------------------------------|-------------------------|
| Power supply              | 4-phase AC power-clocks           | 2-phase AC power-clocks                 | 2-phase AC power-clocks      | DC power                |
| Complementary input       | Yes                               | Yes                                     | No                           | No                      |
| Output swing              | Full swing                        | $V_{OH} = V_{DD}, V_{OL} = 1/4 V_{tp}$  | Half swing                   | Full swing              |
| Output floating           | One side floating during recovery | One side floating                       | Partial floating             | No                      |
| Phase shifter             | Required                          | Not required                            | Not required                 | Not required            |
| Signal shaper             | Not required                      | Required                                | Required                     | Not required            |
| Peak detector             | Required                          | Not required                            | Required                     | Not required            |
| Rectifier and regulator   | Not required                      | Not required                            | Not required                 | Required                |
| Lowest supply voltage     | <0.5V                             | 0.8V                                    | 0.95V                        | 0.5V                    |
| Pipelining                | Inherently pipelined              | Inherently pipelined                    | Flip-flops are required      | Flip-flops are required |

Table 5.8: Qualitative comparison of the proposed methods and conventional approach, listing advantages and limitations.

The WP-ECRL approach can operate at lower voltages compared to WP-PAL and WP-CEPAL. The operation is also relatively more robust due to full-swing output signals. WP-ECRL, however, requires two phase shifters due to the 4-phase AC power supply. The phase shifters consume more power than the auxiliary circuitry required for WP-PAL and WP-CEPAL. However, if the data processing block is sufficiently large, this overhead power can be a small portion of the overall power consumption. Also note that the phase shifter potentially consists of off-chip passive devices, depending upon the required inductor and capacitor. Relatively reliable and robust operation at low voltages makes WP-ECRL an appropriate candidate for applications that rely on ambient wireless energy.

WP-PAL exhibits the lowest overall power consumption with a slight degradation in the output voltage swing. Furthermore, this approach relies on a 2-phase AC power supply, so the phase shifter is not required. Due to 2-phase operation, however, adjacent logic gates recover and evaluate at the same time, making synchronization more sensitive to phase deviations between the AC power supplies. As an important limitation, WP-PAL cannot reliably operate at voltages less than 0.8 V. Thus, this approach is relatively more appropriate for applications with a dedicated wireless power source such as RFIDs and inductively coupled implantable devices.

Finally, WP-CEPAL is similar to static CMOS in terms of design and operation and therefore is an appropriate approach for larger-scale IoT devices where cell-based design and automation are critical. This approach, however, suffers the most from reduced voltage swing due to diode-connected transistors and is not inherently pipelined, unlike WP-ECRL and WP-PAL. Thus, this method requires the highest number of transistors due to the need for sequential cells, complete pull-up networks, and diode-connected transistors.

## 5.7 Summary

An inductive coupling based wireless link and an 8-bit ALU are developed in each of the proposed methods. The energy efficiency of the auxiliary circuitry introduced for each method is characterized by quantifying the overhead power. Simulation results demonstrate a significant reduction (up to $16.2\times$) in overall power consumption as compared to the conventional method that relies on RF-DC conversion and static CMOS based computation. Finally, some important design considerations and related tradeoffs for each of the proposed methods are discussed.

## Chapter 6

# AC Powered Digital Core for Lightweight Encryption

Security is a significant challenge for a variety of emerging applications within pervasive computing such as the deployment of IoT devices at a massive scale. SIMON, a lightweight cryptographic algorithm, is a promising candidate for encryption in a resource-constrained environment. A low-power hardware implementation of a SIMON block cipher is developed in this chapter by applying the proposed AC computing methodology [80]. The proposed hardware-level innovations enable a higher energy efficiency (kilobits per second per microwatt) at the expense of slightly lower throughput as compared to the conventional implementation. The rest of the chapter is organized as follows. The background of the SIMON block cipher is provided in Section 6.1. The proposed architecture for AC computing-based implementation is presented in Section 6.2. In Section 6.3, the results of the schematic-level simulations are discussed. Post-layout simulation results and implementation challenges are investigated in Section 6.4. Finally, the design details of the test chip are provided in Section 6.5.

## 6.1 SIMON Block Cipher

SIMON is a Feistel network based lightweight block cipher published by the NSA, targeting highly resource-constrained applications [81]. It provides a flexible level of security in ten configurations optimized for different block sizes 2n and key sizes mn, where n is the word size and m is the number of key words [82]. This thesis is focused on SIMON32/64, which encrypts a 32-bit plaintext with a 64-bit key in 32 rounds (m = 4, n = 16).

#### 6.1.1 Round Function

The basic operation of the round function for all configurations of SIMON is depicted in Fig. 6.1. The memory element is split into two equal-sized word blocks, denoted by  $X_{Left}$  and  $X_{Right}$ , respectively. These two word blocks hold the initial input plaintext and the output ciphertext after each encryption round. The round function is constructed by bitwise AND, bitwise XOR, and circular shift operations. In each round,  $X_{Left}$  performs the circular shift and bitwise boolean operations to compute the new ciphertext, which is written back to the same memory elements. Simultaneously, the current bits in  $X_{Left}$  are transferred to  $X_{Right}$ . After a certain number of rounds, the repeated operation ends to generate the final ciphertext with a desired level of security.
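As a software illustration of the round function, the following minimal Python sketch applies one Feistel round to a pair of 16-bit words. The rotation amounts (1, 8, and 2) are taken from the published SIMON specification [81] rather than from this section, and the helper names `rol`, `simon_round`, and `simon_round_inverse` are illustrative choices, not identifiers from the hardware described in this thesis.

```python
N = 16                      # word size n for SIMON32/64
MASK = (1 << N) - 1


def rol(x, r):
    """Circular left shift of an N-bit word by r positions."""
    return ((x << r) | (x >> (N - r))) & MASK


def simon_round(x_left, x_right, k):
    """One SIMON round: returns the new (X_Left, X_Right) pair."""
    f = (rol(x_left, 1) & rol(x_left, 8)) ^ rol(x_left, 2)
    return x_right ^ f ^ k, x_left      # the old X_Left moves to X_Right


def simon_round_inverse(x_left, x_right, k):
    """Undo one round; the Feistel structure reuses the same operations."""
    f = (rol(x_right, 1) & rol(x_right, 8)) ^ rol(x_right, 2)
    return x_right, x_left ^ f ^ k


state = simon_round(0x6565, 0x6877, 0x0100)
print([hex(w) for w in state])
print(simon_round_inverse(*state, 0x0100) == (0x6565, 0x6877))
```

The inverse function shows why a Feistel network needs no separate decryption logic for the nonlinear part: the same AND, XOR, and shift operations are applied to the word that was transferred unchanged.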



Figure 6.1: Structure of a SIMON round function.



Figure 6.2: Structure of a SIMON key expansion function for m=4.

#### 6.1.2 Key Expansion

SIMON block cipher encrypts information in each round with a unique key generated by the key expansion module. Unlike the round function, the key scheduling configurations slightly vary depending upon the number of key words m, which can be 2, 3, or 4. In this thesis, the key expansion of SIMON32/64 has the configuration with m = 4, as illustrated in Fig. 6.2.  $K_i$  in the figure holds the key for the current round. The recently generated key is written back to the uppermost key block  $K_{i+3}$ , and all keywords are shifted one block right. Also, the SIMON key expansion employs a sequence of single-bit round constants  $z_i$  (see Fig. 6.2) to eliminate slide properties and circular shift symmetries, thereby introducing randomness [81].
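The key expansion for m = 4 can be sketched in a few lines of Python. This is a software model under stated assumptions, not the hardware of this thesis: the constant `0xFFFC` ($2^n - 4$ for n = 16) and the 62-bit sequence `Z0` are taken from the published SIMON specification [81], and the function name `expand_key` is illustrative.

```python
N = 16
MASK = (1 << N) - 1
ROUNDS = 32
# Round-constant sequence z_0 as listed in the SIMON specification (assumed here).
Z0 = "11111010001001010110000111001101111101000100101011000011100110"


def ror(x, r):
    """Circular right shift of an N-bit word by r positions."""
    return ((x >> r) | (x << (N - r))) & MASK


def expand_key(k3, k2, k1, k0):
    """Return the 32 round keys of SIMON32/64; k0 is used in the first round."""
    keys = [k0, k1, k2, k3]
    for i in range(ROUNDS - 4):
        tmp = ror(keys[i + 3], 3) ^ keys[i + 1]   # shift by 3, then mix K_{i+1}
        tmp ^= ror(tmp, 1)                        # (I + S^-1) applied to tmp
        z_bit = int(Z0[i % 62])                   # single-bit round constant
        keys.append((keys[i] ^ tmp ^ z_bit ^ 0xFFFC) & MASK)
    return keys


round_keys = expand_key(0x1918, 0x1110, 0x0908, 0x0100)
print([f"{k:04x}" for k in round_keys[:6]])
```

Each new key word depends on the oldest word $K_i$, the newest word $K_{i+3}$, and $K_{i+1}$, which mirrors the write-back into the uppermost key block described above.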

#### 6.1.3 Bit-Serial Architecture

There exist several parallelism dimensions (bit level, round level, and encryption level) which affect the area, power, and throughput of the hardware design [83]. A low-cost hardware architecture fits the needs of wireless IoT devices used in resource-constrained applications. As a result, the lowest parallelism level of one bit of one round of one encryption engine, also known as the bit-serial architecture [84], is adopted in this thesis to realize the circuit implementation with adiabatic logic.

In existing FIFO-based bit-serial SIMON architectures, both the key expansion and round functions have two phases: compute and transfer [83, 84]. During the compute phase, the necessary bits are fetched from the current state, and the resulting bits of the next state are written back to the same memory block after performing the encryption operations. Simultaneously, the transfer phase copies the contents of the left word block into the right word block for the next state.
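To illustrate what "one bit of one round" means, the sketch below evaluates a SIMON round one bit at a time and checks the result against the word-level definition. It is a behavioral model only: it ignores the FIFO organization, the compute and transfer phases, and all timing, and it processes the least significant bit first purely by assumption. The function names are hypothetical, and the rotation amounts follow the SIMON specification [81].

```python
N = 16
MASK = (1 << N) - 1


def rol(x, r):
    """Circular left shift of an N-bit word by r positions."""
    return ((x << r) | (x >> (N - r))) & MASK


def round_word_level(x, y, k):
    """Reference round computed on whole words."""
    return (y ^ (rol(x, 1) & rol(x, 8)) ^ rol(x, 2) ^ k) & MASK, x


def bit(w, i):
    """Bit i of word w, with the index wrapped modulo N."""
    return (w >> (i % N)) & 1


def round_bit_serial(x, y, k):
    """Produce the new left word one bit per step (LSB first, by assumption)."""
    new_left = 0
    for i in range(N):
        # Bit i of a left rotation by r equals bit (i - r) of the source word.
        b = bit(y, i) ^ (bit(x, i - 1) & bit(x, i - 8)) ^ bit(x, i - 2) ^ bit(k, i)
        new_left |= b << i
    return new_left, x


print(round_bit_serial(0x6565, 0x6877, 0x0100) == round_word_level(0x6565, 0x6877, 0x0100))
```

Each loop iteration needs only three bits of the left word, one bit of the right word, and one key bit, which is why a single-bit datapath is sufficient in a bit-serial design.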

## 6.2 Proposed SIMON Hardware Architecture for AC Computing

Since adiabatic logic is inherently pipelined, additional clock phases are introduced within combinational logic. To guarantee proper functionality, the conventional SIMON block cipher architecture should be modified, as illustrated in Figs. 6.3 and 6.4 for the round and key expansion functions, respectively. The dashed-line boxes denote the modifications/additions in the proposed adiabatic architecture, as further described below.

#### 6.2.1 Adiabatic Registers

The FIFO-based bit-serial implementation uses conventional registers as the memory elements. Due to the multi-phase operation of the selected adiabatic logic, a certain number of inverters are cascaded to realize the function of registers for data synchronization. In the case of the ECRL and PAL implementations, each register consists of four and two inverters, respectively. An *enable* signal can deactivate the register when the input data should not be latched.

#### 6.2.2 Merged Blocks

The second modification is merging the multiplexers with the FIFO blocks, referred to as merged blocks in Figs. 6.3 and 6.4, to ensure that the operation is completed in one clock cycle. Assume that the round function is running the first round in Fig. 6.3. The output of  $FIFO_1$  is an input for the 4-to-1 multiplexer. Shift register up (SRU) and  $FIFO_1$  store the  $X_{Left}$  16-bit word block in the current state.







When the MSB of $X_{Left}$ is shifted right by one bit, the LSB in $FIFO_1$ should be ready for the computation of the next bit. To achieve this and maintain the consecutiveness of bitwise computation, the multiplexer is merged with the first register of the FIFO. Otherwise, the LSB in $FIFO_1$ would only have arrived at the output of the multiplexer, since an adiabatic multiplexer introduces one clock phase of delay.

#### 6.2.3 Compute and Transfer Paths

In the conventional architecture, a set of four flip-flops, labeled as LUT\_FF [84], is used at the output of the key expansion for storing and appending the least significant four bits into the most significant four bits without any conflict. Thus, $FIFO_3$ can store the output bits of the key expansion only after the first four clock cycles. Alternatively, the adiabatic operation automatically introduces additional clock phases due to the combinational logic within the key expansion. Thus, the output bits are automatically buffered, as illustrated in Fig. 6.4. As such, the need to activate/deactivate LUT\_FF for storing and appending the least significant four bits into the most significant four bits is eliminated. The key expansion block is therefore specifically designed with a logic depth of 4 clock cycles in adiabatic logic. The logic depth is determined by the largest number of circular shift bits, which is 4 in SIMON key scheduling. As a result, the adiabatic compute path produces a conflict along the transfer path: it takes 20 cycles to generate the new round key, but only 16 cycles to transfer. Thus, a set of 4 adiabatic registers with a multiplexer, depicted in Fig. 6.3, is added as a *balanced transfer path*. The same technique is used for the key expansion, as shown in Fig. 6.4.

| Architecture           | Conventional | Proposed | Proposed |
|------------------------|--------------|-------|------|
| Logic                  | Static Logic | ECRL  | PAL  |
| Average Power (µW)     | 9.12         | 0.91  | 0.27 |
| Latency (Clock Cycles) | 576          | 704   | 704  |
| Energy (pJ)            | 387          | 47    | 14   |
| Throughput (Kbps)      | 753          | 616   | 616  |
| Efficiency (Kb/sec/µW) | 83           | 677   | 2281 |
| Transistor (#)         | 2966         | 2258  | 1242 |

Table 6.1: Performance of the bit-serialized SIMON32/64 cipher implemented in proposed and conventional approaches.

## 6.3 Schematic-level Simulation Results

To verify the correct operation, a software implementation of SIMON32/64 is also developed. The test vectors consist of the initial key 64'h 1918 1110 0908 0100 and the plaintext 32'h 6565 6877. The correct output bit sequence of 32'h c69b e9bb is obtained in both adiabatic (ECRL and PAL) and conventional static CMOS-based implementations. The corresponding simulated output waveforms for each implementation are shown in Fig. 6.5, demonstrating the correct encryption operation.
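A software reference of this kind can be as small as the following Python sketch, which encrypts the test vector quoted above. It is not the verification code used in this thesis: the rotation amounts, the constant `0xFFFC`, and the sequence `Z0` are taken from the SIMON specification [81], and `simon32_64_encrypt` is a hypothetical name.

```python
N, ROUNDS = 16, 32
MASK = (1 << N) - 1
Z0 = "1111101000100101011000011100110" * 2   # z_0 repeats with period 31


def rol(x, r):
    return ((x << r) | (x >> (N - r))) & MASK


def ror(x, r):
    return ((x >> r) | (x << (N - r))) & MASK


def simon32_64_encrypt(plaintext, key):
    """Encrypt a 32-bit block with a 64-bit key (both given as integers)."""
    # Split the key into four 16-bit words; the lowest word is the first round key.
    keys = [(key >> (N * i)) & MASK for i in range(4)]
    for i in range(ROUNDS - 4):
        tmp = ror(keys[i + 3], 3) ^ keys[i + 1]
        tmp ^= ror(tmp, 1)
        keys.append(keys[i] ^ tmp ^ int(Z0[i % 62]) ^ 0xFFFC)
    # Apply the 32 Feistel rounds to (X_Left, X_Right).
    x, y = plaintext >> N, plaintext & MASK
    for k in keys:
        x, y = y ^ (rol(x, 1) & rol(x, 8)) ^ rol(x, 2) ^ k, x
    return (x << N) | y


ct = simon32_64_encrypt(0x65656877, 0x1918111009080100)
print(f"{ct:08x}", ct == 0xC69BE9BB)
```

Comparing the printed value against the ciphertext c69b e9bb gives a quick software cross-check for the bit sequence observed at the output of each hardware implementation.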

Figure 6.5: Simulated output waveform of the SIMON32/64 cipher blocks in each approach, demonstrating functional verification.

The simulation results comparing the proposed implementation with the conventional approach are listed in Table 6.1, which reports average power, latency, energy to encrypt a 32-bit plaintext, throughput, energy efficiency (kb/sec/$\mu$W), and number of transistors. Note that all of the transistors in each implementation have minimum size. According to these results, the energy of the encryption operation is reduced by up to 27.6 times at the expense of a 1.2 times reduction in throughput. The average power consumption is reduced by up to 34 times. Furthermore, the overall number of transistors is reduced by up to 2.4 times. Note that if the process of DC-to-AC conversion (required to produce power-clock signals in adiabatic logic) is considered, the energy efficiency can still be improved by up to 16.3 times (assuming a conversion efficiency of 41% [35]).
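The derived rows of Table 6.1 (energy, throughput, and efficiency) follow from the average power and latency once a clock frequency is fixed. The sketch below assumes a 13.56 MHz power-clock, a value that is not stated in this section and is chosen here only because it reproduces the listed throughput; the variable and function names are illustrative.

```python
F_CLK_HZ = 13.56e6      # assumed clock frequency (not stated in this section)
BLOCK_BITS = 32         # SIMON32/64 encrypts 32 bits per run

# (average power in uW, latency in clock cycles) from Table 6.1
DESIGNS = {
    "Static": (9.12, 576),
    "ECRL": (0.91, 704),
    "PAL": (0.27, 704),
}


def derived_metrics(power_uw, latency_cycles):
    """Return (energy in pJ, throughput in kb/s, efficiency in kb/s/uW)."""
    t_encrypt = latency_cycles / F_CLK_HZ             # seconds per block
    energy_pj = power_uw * 1e-6 * t_encrypt * 1e12    # P * t, converted to pJ
    throughput_kbps = BLOCK_BITS / t_encrypt / 1e3
    efficiency = throughput_kbps / power_uw
    return energy_pj, throughput_kbps, efficiency


for name, (power, latency) in DESIGNS.items():
    energy, throughput, eff = derived_metrics(power, latency)
    print(f"{name:<6} energy={energy:6.1f} pJ  "
          f"throughput={throughput:6.1f} kb/s  efficiency={eff:7.1f} kb/s/uW")
```

Under this assumption, the computed values agree with the table to within rounding, and the ratio of the static CMOS energy to the PAL energy is approximately 27.6, consistent with the reduction quoted above.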

## 6.4 Post-Layout Simulation Results

In this section, the details of physical implementation are discussed and simulation results are analyzed.

Figure 6.6: Layout view of ECRL-based SIMON32/64.


Figure 6.7: Layout view of static CMOS-based SIMON32/64.










Figure 6.8: Layout views of multipliers implemented in different approaches: (a) static CMOS, (b) ECRL, and (c) PAL.

| DUT Name          | Width (µm) | Height (µm) | Area ( $\mu m^2$ ) |
|-------------------|------------|-------------|--------------------|
| Static SIMON32/64 | 85.7       | 47.5        | 4070.75            |
| ECRL SIMON32/64   | 79.4       | 52.4        | 4160.56            |
| Static MULT4      | 44.4       | 28.7        | 1274.28            |
| ECRL MULT4        | 46.0       | 38.8        | 1784.80            |
| PAL MULT4         | 43.7       | 38.3        | 1673.71            |

Table 6.2: Summary of the physical area consumed by digital blocks. MULT4 refers to 4-bit multiplier.

#### 6.4.1 Physical Implementations

Physical layouts of SIMON32/64 cipher are drawn for ECRL-based and static CMOS-based approaches, as shown in Fig. 6.6 and Fig. 6.7, respectively. A 4-bit multiplier is also physically implemented in conventional, WP-ECRL, and WP-PAL approaches, as depicted, respectively, in Fig. 6.8(a), Fig. 6.8(b), and Fig. 6.8(c).

Table 6.2 summarizes the area consumed by the implemented designs. Compared to the conventional implementations, the physical area of the proposed methods increases by approximately 2% for SIMON32/64 (ECRL) and by 31% (PAL) to 40% (ECRL) for the 4-bit multiplier, due to the special routing topology of the cross-coupled structure and the distribution of multi-phase power-clock signals.

#### 6.4.2 Impact of Physical Layout on Power Consumption

Table 6.3 lists the power consumption of standard cells built via different circuit frameworks (static, ECRL, and PAL). Power values obtained from both the schematic-level and post-layout simulations are reported. For each cell, the number before the slash represents the power obtained by the schematic-level simulation whereas the number after the slash represents the power obtained by the post-layout simulation. In the ECRL approach, the power increases by approximately $3\times$ when layout-level parasitic impedances are considered. The power increase in a PAL-based inverter is approximately 10 times. Alternatively, the conventional approach exhibits a relatively smaller increase in power when layout-level parasitic impedances are considered. Due to this difference, the achieved power savings are considerably reduced.

| Circuit Topology | Conventional      | Proposed   | Proposed  |
|------------------|-------------------|------------|-----------|
| Cell Type        | Static Logic (nW) | ECRL (nW)  | PAL (nW)  |
| INV              | 8.21/9.33         | 2.68/7.37  | 0.20/2.25 |
| NOR2             | 7.43/9.09         | 3.23/7.90  | n/a       |
| OR2              | 13.20/18.77       | 3.23/7.90  | n/a       |
| XOR2             | 34.60/62.99       | 7.46/16.24 | n/a       |
| XNOR2            | 32.60/59.31       | 7.46/16.24 | n/a       |
| AND2             | 12.68/18.31       | 2.95/7.28  | n/a       |
| NAND2            | 6.88/8.63         | 2.95/7.28  | n/a       |
| MUX2             | 23.99/41.64       | 5.92/11.69 | n/a       |
| DFF              | 177.4/411.3       | 12.3/48.78 | n/a       |

Table 6.3: Comparison of schematic-level and post-layout power consumption of standard cells implemented in the conventional and proposed approaches.
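The per-cell increase can be read directly from Table 6.3 by dividing each post-layout value by its schematic-level counterpart. The short Python sketch below does this for the listed cells; the values are transcribed from the table, and the dictionary layout and the name `increase_ratios` are illustrative.

```python
# (schematic nW, post-layout nW) pairs transcribed from Table 6.3
CELLS = {
    "Static": {
        "INV": (8.21, 9.33), "NOR2": (7.43, 9.09), "OR2": (13.20, 18.77),
        "XOR2": (34.60, 62.99), "XNOR2": (32.60, 59.31), "AND2": (12.68, 18.31),
        "NAND2": (6.88, 8.63), "MUX2": (23.99, 41.64), "DFF": (177.4, 411.3),
    },
    "ECRL": {
        "INV": (2.68, 7.37), "NOR2": (3.23, 7.90), "OR2": (3.23, 7.90),
        "XOR2": (7.46, 16.24), "XNOR2": (7.46, 16.24), "AND2": (2.95, 7.28),
        "NAND2": (2.95, 7.28), "MUX2": (5.92, 11.69), "DFF": (12.3, 48.78),
    },
    "PAL": {"INV": (0.20, 2.25)},   # only the inverter is reported for PAL
}


def increase_ratios(cells):
    """Map each cell name to its post-layout / schematic power ratio."""
    return {name: post / sch for name, (sch, post) in cells.items()}


for logic, cells in CELLS.items():
    ratios = increase_ratios(cells)
    mean = sum(ratios.values()) / len(ratios)
    print(f"{logic:<6} min={min(ratios.values()):.2f}x  "
          f"mean={mean:.2f}x  max={max(ratios.values()):.2f}x")
```

Printing the minimum, mean, and maximum ratio per logic family makes the spread across cell types visible alongside the single figures quoted in the text.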

This increase in power consumption at the post-layout level simulation is due to the extracted parasitic RC impedances of the interconnects within a cell. The details can be illustrated by the following example where the extracted RC tree is analyzed for an inverter cell. Fig. 6.9, Fig. 6.10, and Fig. 6.11 show the layout views as well as the extracted RC trees of inverters built, respectively, in ECRL, PAL, and static-CMOS methods. The RC tree contains the parasitic resistance and capacitance on each metal segment within the cell. In the cases of ECRL and PAL, both output branches share the cross-coupled structure, which introduces additional metal routing. It is important to note that the dependence of power consumption on load capacitance in adiabatic logic differs from that in static CMOS logic. As indicated by (2.3) and (2.4), the power consumption of ECRL and PAL is proportional to $C^2$, whereas the power consumption of static CMOS logic is proportional to $C$. Thus, the effect of parasitic capacitance would be more pronounced for adiabatic logic as compared to conventional static CMOS logic. According to this analysis, the overall parasitic capacitances are $C_{tot}$ = 0.392 fF for the ECRL method, $C_{tot}$ = 0.52 fF for the PAL method, and $C_{tot}$ = 0.176 fF for the static-CMOS method. Alternatively, the load capacitance in the schematic-level simulation is only due to devices, which is approximately 0.1 fF. Thus, a 4× to 9× power increase is expected for the proposed methodologies for a single cell. The impact of parasitic RC impedances on power consumption could be more pronounced at the block level with more complex interconnect routing.

| DUT Name          | $P_{SCH}$ ($\mu$W) | $P_{POST}$ ($\mu$W) | $P_{POST}/P_{SCH}$ |
|-------------------|--------------------|---------------------|--------------------|
| Static SIMON32/64 | 1.08               | 5.55                | 5.13               |
| ECRL SIMON32/64   | 9.06               | 22.83               | 2.52               |
| Static MULT4      | 1.08               | 3.34                | 3.10               |
| ECRL MULT4        | 0.41               | 3.95                | 9.63               |
| PAL MULT4         | 0.22               | 2.4                 | 10.90              |

Note: $P_{SCH}$ = schematic-level power, $P_{POST}$ = post-layout power.

Table 6.4: Summary of the power consumed by digital blocks.

### 6.4.3 Block-Level Post-Layout Power Consumption Results

Table 6.4 compares the power consumption of the SIMON32/64 and multiplier blocks obtained by schematic-level and post-layout simulations. As discussed in the previous section, a larger power increase is observed in both the ECRL and PAL approaches as compared to conventional static CMOS. To further illustrate the different effects of interconnect capacitance on the proposed and conventional approaches, the current profiles of each circuit (both SIMON32/64 and multiplier) are shown individually in Figs. 6.12 through 6.16. The current profiles of both the ECRL- and PAL-based methods have a repeated pattern, are independent of data flow, and maintain the same waveform shape in post-layout simulations (only the peak amplitudes change). In contrast, the current profile of the static CMOS-based method shows a different waveform, with different peak positions and amplitudes, in post-layout simulations.

Figure 6.9: ECRL-based inverter: (a) physical layout, (b) extracted *RC* network.

Figure 6.10: PAL-based inverter: (a) physical layout, (b) extracted *RC* network.

Figure 6.11: Static CMOS-based inverter: (a) physical layout, (b) extracted *RC* network.

Figure 6.12: Current profile of static CMOS-based SIMON32/64 cipher.

## 6.5 Test Chip Overview

To experimentally evaluate the proposed AC computing methodology, an application-specific integrated circuit (ASIC) with a size of 2 mm $\times$ 2 mm was fabricated in 65 nm CMOS technology. The top-level layout of the chip is shown in Fig. 6.17, and a microscope photo of the die is shown in Fig. 6.18. The test chip consists of core logic operating at a 1.2 V supply voltage and I/O logic operating at 2.5 V. More details are provided in the following sections.

Figure 6.13: Current profile of ECRL-based SIMON32/64 cipher. Current profile for each power-clock waveform (4-phase) is shown.

Figure 6.14: Current profile of static CMOS-based 4-bit multiplier.

Figure 6.15: Current profile of ECRL-based 4-bit multiplier. Current profile for each power-clock waveform (4-phase) is shown.

Figure 6.16: Current profile of PAL-based 4-bit multiplier. Current profile for each power-clock waveform (2-phase) is shown.

### 6.5.1 Core Circuit Design

The test chip has two types of digital cores: SIMON32/64 cores for lightweight encryption and 4-bit multipliers (partial-product accumulation) for more general arithmetic operation. Furthermore, the related auxiliary circuits are implemented to interface the logic with wirelessly harvested power signals. Table 6.5 lists all of the primary blocks that are implemented in the fabricated test chip.

### 6.5.2 I/O Circuit Design

The I/O circuits in this test chip can drive a capacitive load of 3 to 5 pF and provide a bandwidth of approximately 200 MHz. Two types of custom I/O circuits exist in the test chip: analog buffers (unity-gain amplifiers) to observe the output signal waveforms of the adiabatic logic, and digital buffers to read/write the data. Level-shifting techniques are also utilized since the I/O circuits operate in a different voltage domain than the core logic. Finally, protection techniques are used to ensure functionality and reliability, such as guard rings to prevent latch-up and current-limiting resistors/diode clamps to protect against electrostatic discharge (ESD).



Figure 6.17: Top-level layout view of the test chip.



Figure 6.18: Microscope photo of the entire die.

| DUT # | Function       | DUT Name          | Type             |
|-------|----------------|-------------------|------------------|
| 1     | Encryption     | Static SIMON32/64 | Digital          |
| 2     | Encryption     | ECRL SIMON32/64   | Digital          |
| 3     | Multiplication | Static MULT4      | Digital          |
| 4     | Multiplication | ECRL MULT4        | Digital          |
| 5     | Multiplication | PAL MULT4         | Digital          |
| 6     | Multiplication | PAL MULT4 Copy    | Digital          |
| 7     | Conversion     | RFDC Converter    | Analog/Auxiliary |
| 8     | Rectification  | Peak Detector     | Analog/Auxiliary |
| 9     | Shaping        | Signal Shaper     | Analog/Auxiliary |

Table 6.5: List of circuit blocks that are implemented in the fabricated ASIC chip.

## Chapter 7

## **Conclusion and Future Directions**

An alternative computing paradigm is explored in this thesis with application to wirelessly powered IoT devices. The proposed approach has the potential to significantly reduce the energy cost, one of the primary barriers that slow down the global scalability of IoT devices. The contributions of this work are summarized in Section 7.1. Several possible future directions are discussed in Section 7.2.

## 7.1 Thesis Summary

A novel AC computing methodology has been proposed for wirelessly powered devices, which typically suffer from limited computational resources. By leveraging the existing charge-recycling and adiabatic principles, this method directly uses the harvested AC signal to power the digital logic, thus eliminating the inefficient rectification and regulation stages. Based on the proposed method, three implementation frameworks are developed, and several auxiliary circuits are introduced to ensure accurate operation with wireless power harvesting. An AC-powered 8-bit ALU for brain-implantable devices is developed in each of the proposed approaches as well as in the conventional approach to quantify the advantages. The simulation results demonstrate considerable power savings. Furthermore, a lightweight encryption algorithm, SIMON, has been implemented in AC computing-based hardware to reduce the energy cost of encryption in resource-constrained devices. Finally, a test chip is fabricated in 65 nm CMOS technology to experimentally characterize the proposed AC computing methodology.

## 7.2 Future Work and Directions

### 7.2.1 Integration of Sensing, Communication and Power Management

Most IoT devices require not only digital blocks for data processing but also analog blocks for sensing and communication. An interesting future direction is to develop a system-level power management methodology that splits the harvested energy into DC and AC paths [85]. Efficient on-site AC processing can extract meaningful information and potentially reduce the amount of data (and therefore power) that must be transmitted. The DC path would supply power for sensing environmental information and for ultra-low-power communication [86, 87]. Another potential block in the hybrid system is a read-out circuit. As shown in the waveforms presented earlier, the output signals are pulse-like or sinusoidal-like voltages. To be compatible with existing communication or DSP blocks, a read-out circuit would be needed to transform these signals into square waves.

### 7.2.2 AC Computing at Higher Frequencies

An important future direction is to increase the frequency of the wireless signal to the UHF band, up to the gigahertz range. This step is important to increase the wireless power transfer range and to enable the use of high-frequency wireless energy sources. Unlike static CMOS, where the energy to charge/discharge a capacitance does not depend on the transition time, the energy in charge-recycling operation is inversely proportional to the transition period of the wireless signal. Thus, charge-recycling circuits save more power at lower frequencies. Specifically, for charge-recycling operation to outperform static CMOS, the transition time *T* should satisfy

$$T > 4\frac{RC}{\alpha},\tag{7.1}$$

where $\alpha$ is the activity factor, *R* is the on-resistance of a transistor, and *C* is the load capacitance. Note that the *RC* parameter scales approximately quadratically with technology. Thus, in nanoscale technologies (where the *RC* is in the low picoseconds range), charge-recycling operation can provide considerable power savings, even at gigahertz frequencies. Eliminating the rectifier and regulator further increases the power savings.
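To make the bound in (7.1) concrete, the following sketch evaluates the minimum transition time for a given *R*, *C*, and $\alpha$, and checks it against a few signal frequencies. The numerical values (5 kΩ, 0.5 fF, $\alpha$ = 0.1) are hypothetical and chosen only so that *RC* falls in the low picoseconds range. Taking the transition time as half of the sinusoidal period is also an assumption made here for illustration, as are the function names.

```python
def min_transition_time(r_on_ohm, c_load_farad, activity_factor):
    """Lower bound on the transition time T from Eq. (7.1): T > 4*R*C/alpha."""
    if not 0.0 < activity_factor <= 1.0:
        raise ValueError("activity factor must be in (0, 1]")
    return 4.0 * r_on_ohm * c_load_farad / activity_factor


def charge_recycling_wins(freq_hz, r_on_ohm, c_load_farad, activity_factor):
    """Check Eq. (7.1) for a sinusoidal power-clock of the given frequency.

    Assumption (not from the thesis): the transition time equals half of
    the signal period, T = 1 / (2 * f).
    """
    transition_time = 1.0 / (2.0 * freq_hz)
    return transition_time > min_transition_time(
        r_on_ohm, c_load_farad, activity_factor
    )


if __name__ == "__main__":
    # Hypothetical device parameters: RC = 2.5 ps, activity factor 0.1.
    r_on, c_load, alpha = 5e3, 0.5e-15, 0.1
    t_min = min_transition_time(r_on, c_load, alpha)
    print(f"Minimum transition time: {t_min * 1e12:.1f} ps")
    for freq in (13.56e6, 1e9, 10e9):
        ok = charge_recycling_wins(freq, r_on, c_load, alpha)
        print(f"f = {freq / 1e6:10.2f} MHz -> Eq. (7.1) satisfied: {ok}")
```

Under these assumed values the bound is met at 1 GHz but not at 10 GHz, which shows how the margin shrinks as the frequency rises or the activity factor falls.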

### 7.2.3 Monolithic 3D Technology for AC Computing

As demonstrated in Section 6.4, the power consumption of the proposed methodology depends more strongly on layout-level parasitic impedances. This characteristic is due to the relatively longer cell-level interconnects and the cross-coupled nature of the gates. Therefore, it is critical to reduce the additional parasitic resistance and capacitance in the physical layout of the circuit for the proposed methodology. Monolithic three-dimensional (3D) technology can be explored to mitigate this limitation since cell-level interconnects within AC computing-based logic families can be significantly shortened by using sufficiently short monolithic inter-tier vias (MIVs) [88, 89]. MIVs enable vertical interconnections with a size comparable to conventional on-chip metal vias, thus achieving ultra-high-density device integration [90, 91].

### 7.2.4 Energy Storage for AC Computing

In conventional energy harvesting systems with DC computing, the harvested energy can be stored within a storage device when not needed. These conventional storage components, however, cannot be used for the proposed scheme that is based on AC computing. Thus, another future direction is to mitigate this limitation by investigating high-*Q LC* tank based energy storage mechanisms and developing methods to copy data to nonvolatile memory when harvested energy is reduced to critical levels [92]. Another approach is to investigate the feasibility of electromechanical energy storage methods that do not require DC conversion such as a MEMS implementation of a flywheel [93].
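As a rough illustration of why a high quality factor matters for *LC* tank based storage, the sketch below uses the standard free-decay relation for a resonator, $E(t) = E_0 e^{-\omega_0 t/Q}$, to estimate how many cycles the tank holds a given fraction of its energy once the source is removed. The *Q* values and the 10 MHz frequency are hypothetical and not taken from this work, and the function names are illustrative.

```python
import math


def cycles_until_fraction(q_factor, remaining_fraction):
    """Cycles for the stored energy to decay to `remaining_fraction` of E0.

    Uses E(t) = E0 * exp(-w0 * t / Q), so N = Q * ln(1 / fraction) / (2 * pi).
    """
    if q_factor <= 0.0:
        raise ValueError("Q must be positive")
    if not 0.0 < remaining_fraction < 1.0:
        raise ValueError("remaining fraction must be in (0, 1)")
    return q_factor * math.log(1.0 / remaining_fraction) / (2.0 * math.pi)


def hold_time_seconds(q_factor, freq_hz, remaining_fraction):
    """Time for the stored energy to decay to `remaining_fraction` of E0."""
    return cycles_until_fraction(q_factor, remaining_fraction) / freq_hz


if __name__ == "__main__":
    freq = 10e6  # hypothetical resonant frequency in Hz
    for q in (10.0, 100.0, 1000.0):  # hypothetical quality factors
        n_cycles = cycles_until_fraction(q, 0.5)
        t_hold = hold_time_seconds(q, freq, 0.5)
        print(f"Q = {q:6.0f}: half of the energy remains after "
              f"{n_cycles:7.1f} cycles ({t_hold * 1e6:.2f} us)")
```

The hold time scales linearly with *Q*, which is one reason such a mechanism would likely need to be paired with a method to copy data to nonvolatile memory.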

### 7.2.5 Side-Channel Resistance of AC Computing

Side-channel analysis is known to have significant potential to obtain the secret key in encryption systems by exploiting the physical characteristics of the hardware implementation, including operation timing, power consumption, and electromagnetic radiation [94, 95]. One of the commonly used attacks is to statistically analyze the power drawn by the device and correlate the results with the input data for different key guesses [96]. To mitigate power-based side-channel attacks, the correlation between input data and power consumption should be reduced [97, 98]. The AC computing methodology exhibits interesting characteristics for side-channel resistance due to its differential and symmetric outputs and its much lower signal-to-noise ratio (a result of the much lower power consumption). The use of sine waves and charge-recycling could also increase side-channel resistance. These investigations could potentially initiate a new direction where energy efficiency and security are simultaneously considered for RF-powered devices.
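The correlation-based attack described above can be sketched with a toy model. The example below is an illustration under strong assumptions and does not model SIMON or the fabricated hardware: the "device" leaks the Hamming weight of a 4-bit input XORed with a 4-bit key plus Gaussian noise, and the attacker correlates the measured power with the predicted Hamming weight for every key guess. All names and parameters are hypothetical.

```python
import random


def hamming_weight(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def simulate_traces(key, n_traces, noise_sigma, rng, bits=4):
    """Toy leakage model: power = HW(input XOR key) + Gaussian noise."""
    inputs = [rng.randrange(1 << bits) for _ in range(n_traces)]
    power = [hamming_weight(p ^ key) + rng.gauss(0.0, noise_sigma) for p in inputs]
    return inputs, power


def recover_key(inputs, power, bits=4):
    """Return the key guess with the highest correlation, plus all scores."""
    scores = {}
    for guess in range(1 << bits):
        model = [hamming_weight(p ^ guess) for p in inputs]
        scores[guess] = pearson(model, power)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)
    inputs, power = simulate_traces(key=0xB, n_traces=2000, noise_sigma=1.0, rng=rng)
    best, scores = recover_key(inputs, power)
    print(f"Data-dependent power: best guess {best:#x}, correlation {scores[best]:.2f}")

    # Power that does not depend on the data leaves nothing to correlate with.
    flat_power = [rng.gauss(0.0, 1.0) for _ in inputs]
    _, flat_scores = recover_key(inputs, flat_power)
    peak = max(abs(v) for v in flat_scores.values())
    print(f"Data-independent power: largest |correlation| {peak:.2f}")
```

The second case reflects the mitigation principle stated above: when the power drawn is decoupled from the processed data, no key guess stands out.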

## **Bibliography**

- [1] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, "Internet of things (iot): A vision, architectural elements, and future directions," *Future Generation Computer Systems*, vol. 29, no. 7, pp. 1645–1660, 2013.
- [2] J.-M. Tarascon and M. Armand, "Issues and challenges facing rechargeable lithium batteries," in *Materials For Sustainable Energy: A Collection of Peer-Reviewed Research and Review Articles from Nature Publishing Group.* World Scientific, 2011, pp. 171–179.
- [3] B. Scrosati and J. Garche, "Lithium batteries: Status, prospects and future," *Journal of Power Sources*, vol. 195, no. 9, pp. 2419–2430, 2010.
- [4] V. Etacheri, R. Marom, R. Elazari, G. Salitra, and D. Aurbach, "Challenges in the development of advanced li-ion batteries: a review," *Energy & Environmental Science*, vol. 4, no. 9, pp. 3243–3262, 2011.
- [5] S. Grady, "Powering wearable technology and internet of everything devices," http://www.cymbet.com/pdfs/Powering-Wearable-Technology-and-the-Internet-of-Everything-WP-72-10.1.pdf, 2014.
- [6] S. Sudevalayam and P. Kulkarni, "Energy harvesting sensor nodes: Survey and implications," *IEEE Communications Surveys & Tutorials*, vol. 13, no. 3, pp. 443–461, 2011.
- [7] C. Lu, V. Raghunathan, and K. Roy, "Efficient design of micro-scale energy harvesting systems," *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, vol. 1, no. 3, pp. 254–266, 2011.
- [8] K. Ma, X. Li, K. Swaminathan, Y. Zheng, S. Li, Y. Liu, Y. Xie, J. J. Sampson, and V. Narayanan, "Nonvolatile processor architectures: Efficient, reliable progress with unstable power," *IEEE Micro*, vol. 36, no. 3, pp. 72–83, 2016.

- [9] S. Hui, "Planar wireless charging technology for portable electronic products and qi," *Proceedings of the IEEE*, vol. 101, no. 6, pp. 1290–1301, 2013.
- [10] Y. Peng, Z. Li, W. Zhang, and D. Qiao, "Prolonging sensor network lifetime through wireless charging," in *Real-time systems symposium (RTSS)*, 2010 *IEEE 31st.* IEEE, 2010, pp. 129–139.
- [11] C. Sauer, M. Stanaćević, G. Cauwenberghs, and N. Thakor, "Power harvesting and telemetry in cmos for implanted devices," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 52, no. 12, pp. 2605–2613, 2005.
- [12] K. Gudan, S. Chemishkian, J. Hull, M. Reynolds, and S. Thomas, "Feasibility of wireless sensors using ambient 2.4 ghz rf energy," in *Sensors*, 2012 IEEE. IEEE, 2012, pp. 1–4.
- [13] S. Kim, R. Vyas, J. Bito, K. Niotaki, A. Collado, A. Georgiadis, and M. Tentzeris, "Ambient rf energy-harvesting technologies for self-sustainable standalone wireless sensor platforms," *Proceedings of the IEEE*, vol. 102, no. 11, pp. 1649–1666, 2014.
- [14] M. A. Hannan, S. Mutashar, S. A. Samad, and A. Hussain, "Energy harvesting for the implantable biomedical devices: issues and challenges," *Biomedical engineering online*, vol. 13, no. 1, p. 79, 2014.
- [15] M. M. Ahmadi and G. A. Jullien, "A wireless-implantable microsystem for continuous blood glucose monitoring," *IEEE Transactions on Biomedical Circuits and Systems*, vol. 3, no. 3, pp. 169–180, 2009.
- [16] B. Lenaerts and R. Puers, Omnidirectional inductive powering for biomedical implants. Springer, 2009.
- [17] J. Ho, A. Yeh, E. Neofytou, S. Kim, Y. Tanabe, B. Patlolla, R. Beygui, and A. Poon, "Wireless power transfer to deep-tissue microimplants," *Proceedings of the National Academy of Sciences*, vol. 111, no. 22, pp. 7974–7979, 2014.
- [18] X. Lu, P. Wang, D. Niyato, D. Kim, and Z. Han, "Wireless networks with rf energy harvesting: A contemporary survey," *IEEE Communications Surveys & Tutorials*, vol. 17, no. 2, pp. 757–789, 2015.

- [19] T. Soyata, L. Copeland, and W. Heinzelman, "Rf energy harvesting for embedded systems: A survey of tradeoffs and methodology," *IEEE Circuits and Systems Magazine*, vol. 16, no. 1, pp. 22–57, 2016.
- [20] R. Vullers, R. V. Schaijk, H. Visser, J. Penders, and C. V. Hoof, "Energy harvesting for autonomous wireless sensor networks," *IEEE Solid-State Circuits Magazine*, vol. 2, no. 2, pp. 29–38, 2010.
- [21] G. Park, T. Rosing, M. D. Todd, C. R. Farrar, and W. Hodgkiss, "Energy harvesting for structural health monitoring sensor networks," *Journal of Infrastructure Systems*, vol. 14, no. 1, pp. 64–79, 2008.
- [22] J. P. Lynch and K. J. Loh, "A summary review of wireless sensors and sensor networks for structural health monitoring," *Shock and Vibration Digest*, vol. 38, no. 2, pp. 91–130, 2006.
- [23] G. K. Balachandran and R. E. Barnett, "A 110 na voltage regulator system with dynamic bandwidth boosting for rfid systems," *Solid-State Circuits, IEEE Journal of*, vol. 41, no. 9, pp. 2019–2028, 2006.
- [24] G. Papotto, F. Carrara, and G. Palmisano, "A 90-nm cmos threshold-compensated rf energy harvester," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 9, pp. 1985–1997, 2011.
- [25] M. Stoopman, S. Keyrouz, H. Visser, K. Philips, and W. Serdijn, "Co-design of a cmos rectifier and small loop antenna for highly sensitive rf energy harvesters," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 3, pp. 622–634, 2014.
- [26] C. Valenta and G. Durgin, "Harvesting wireless power: Survey of energy-harvester conversion efficiency in far-field, wireless power transfer systems," *IEEE Microwave Magazine*, vol. 15, no. 4, pp. 108–120, 2014.
- [27] M. Bailyn, A survey of thermodynamics. AIP, 1994.
- [28] L. Tisza, *Generalized thermodynamics*. MIT press Cambridge, 1966, vol. rf1.
- [29] C. H. Bennett and R. Landauer, "The fundamental physical limits of computation," *Scientific American*, vol. 253, no. 1, pp. 48–57, 1985.

- [30] J. S. Denker, "A review of adiabatic computing," in Low power electronics, 1994. Digest of technical papers., IEEE symposium. IEEE, 1994, pp. 94–97.
- [31] R. Landauer, "Irreversibility and heat generation in the computing process," *IBM journal of research and development*, vol. 5, no. 3, pp. 183–191, 1961.
- [32] C. H. Bennett, "Logical reversibility of computation," *IBM journal of Research and Development*, vol. 17, no. 6, pp. 525–532, 1973.
- [33] S. G. Younis, "Asymptotically zero energy computing using split-level charge recovery logic." MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB, Tech. Rep., 1994.
- [34] P. Teichmann, *Adiabatic logic: future trend and system level perspective*. Springer Science & Business Media, 2011, vol. 34.
- [35] Y. Moon and D.-K. Jeong, "An efficient charge recovery logic circuit," Solid-State Circuits, IEEE Journal of, vol. 31, no. 4, pp. 514–522, 1996.
- [36] D. Maksimovic and V. G. Oklobdzija, "Clocked cmos adiabatic logic with single ac power supply," in *Solid-State Circuits Conference*, 1995. ESSCIRC'95. *Twenty-first European*. IEEE, 1995, pp. 370–373.
- [37] M. Alioto and G. Palumbo, "Performance evaluation of adiabatic gates," *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, vol. 47, no. 9, pp. 1297–1308, 2000.
- [38] S. Kim, C. H. Ziesler, and M. C. Papaefthymiou, "Charge-recovery computing on silicon," *Computers, IEEE Transactions on*, vol. 54, no. 6, pp. 651–659, 2005.
- [39] P. Ranjith, S. K. Mandal, and D. Nagchoudhuri, "An efficient power clock generation circuit for complementary pass-transistor adiabatic logic carry-save multiplier," in *Computers and Devices for Communication*, 2009. CODEC 2009. 4th International Conference on. IEEE, 2009, pp. 1–4.
- [40] A. Blotti and R. Saletti, "Ultralow-power adiabatic circuit semi-custom design," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 12, no. 11, pp. 1248–1253, 2004.

- [41] H. J. Visser and R. J. Vullers, "Rf energy harvesting and transport for wireless sensor network applications: Principles and requirements," *Proceedings of the IEEE*, vol. 101, no. 6, pp. 1410–1423, 2013.
- [42] I. Flint, X. Lu, N. Privault, D. Niyato, and P. Wang, "Performance analysis of ambient rf energy harvesting: A stochastic geometry approach," in *Global Communications Conference (GLOBECOM)*, 2014 IEEE. IEEE, 2014, pp. 1448–1453.
- [43] I. Flint, X. Lu, N. Privault, D. Niyato, and P. Wang, "Performance analysis of ambient rf energy harvesting with repulsive point process modeling," *IEEE Transactions on Wireless Communications*, vol. 14, no. 10, pp. 5402–5416, 2015.
- [44] W. L. Stutzman and G. A. Thiele, *Antenna theory and design*. John Wiley & Sons, 2012.
- [45] S. K. Yoon, S. J. Kim, and U. K. Kwon, "A new circuit structure for near field wireless power transmission," in *Circuits and Systems (ISCAS)*, 2012 IEEE International Symposium on. IEEE, 2012, pp. 982–985.
- [46] J. Siebert, J. Collier, and R. Amirtharajah, "Self-timed circuits for energy harvesting ac power supplies," in *Proceedings of the 2005 international symposium on Low power electronics and design*. ACM, 2005, pp. 315–318.
- [47] J. Wenck, R. Amirtharajah, J. Collier, and J. Siebert, "Ac power supply circuits for energy harvesting," in 2007 IEEE Symposium on VLSI Circuits. IEEE, Jun 2007, pp. 92–93.
- [48] S. Briole, C. Pacha, K. Goser, A. Kaiser, R. Thewes, W. Weber, and R. Brederlow, "Ac-only rf id tags for barcode replacement," in *Solid-State Circuits Conference, 2004. Digest of Technical Papers. ISSCC. 2004 IEEE International.* IEEE, 2004, pp. 438–537.
- [49] Y. Ye and K. Roy, "Qserl: Quasi-static energy recovery logic," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 2, pp. 239–248, 2001.
- [50] Y. He, J. Tian, X. Tan, and H. Min, "Quasi-static adiabatic logic 2n-2n2p2d family," *Electronics Letters*, vol. 42, no. 16, p. 1, 2006.
- [51] Y. He and H. Min, "Adiabatic circuit applied for lf tag," *Auto-ID Labs White Paper WPHARDWARE-041*, 2007.

- [52] W. Zhao, K. Bhanushali, and P. Franzon, "Design of a rectifier-free uhf gen-2 compatible rfid tag using rf-only logic," in *RFID (RFID), 2016 IEEE International Conference on.* IEEE, 2016, pp. 1–6.
- [53] T. Wan, E. Salman, and M. Stanacevic, "A new circuit design framework for iot devices: Charge-recycling with wireless power harvesting," in *Circuits and Systems (ISCAS), 2016 IEEE International Symposium on*. IEEE, 2016, pp. 2046–2049.
- [54] M. Arsalan and M. Shams, "Charge-recovery power clock generators for adiabatic logic circuits," in *Proceedings of the 18th International Conference on VLSI Design*. IEEE, 2005, pp. 171–174.
- [55] E. Salman, E. G. Friedman, and R. M. Secareanu, "Substrate and ground noise interactions in mixed-signal circuits," in 2018 IEEE International System-on-Chip Conference, September 2016, pp. 293–296.
- [56] I. J. Bahl, *Lumped elements for RF and microwave circuits*. Artech house, 2003.
- [57] E. Cantatore *et al.*, "A 13.56-mhz rfid system based on organic transponders," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 1, pp. 84–92, 2007.
- [58] V. Oklobdzija, D. Maksimovic, and F. Lin, "Pass-transistor adiabatic logic using single power-clock supply," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 44, no. 10, pp. 842–846, 1997.
- [59] T. Wan, Y. Karimi, M. Stanacevic, and E. Salman, "Energy efficient ac computing methodology for wirelessly powered iot devices," in *Circuits and Systems (ISCAS), 2017 IEEE International Symposium on.* IEEE, 2017, pp. 1–4.
- [60] Y. Huang, T. Wan, E. Salman, and M. Stanaćević, "Signal shaping at interface of wireless power harvesting and ac computational logic," in 2019 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2019, pp. 1–5.
- [61] C.-S. A. Gong, M.-T. Shiue, C.-T. Hong, and K.-W. Yao, "Analysis and design of an efficient irreversible energy recovery logic in 0.18-μm cmos," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 55, no. 9, pp. 2595–2607, 2008.

- [62] T. Wan, Y. Karimi, M. Stanaćević, and E. Salman, "Perspective paper: Can ac computing be an alternative for wirelessly powered iot devices?" *IEEE Embedded Systems Letters*, vol. 9, no. 1, pp. 13–16, 2017.
- [63] T. Wan, Y. Karimi, M. Stanaćević, and E. Salman, "Ac computing methodology for rf-powered iot devices," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 27, no. 5, pp. 1017–1028, 2019.
- [64] U.-M. Jow and M. Ghovanloo, "Design and optimization of printed spiral coils for efficient transcutaneous inductive power transmission," *IEEE Transactions on Biomedical Circuits and Systems*, vol. 1, no. 3, pp. 193–202, 2007.
- [65] A. Christ, M. Douglas, J. Nadakuduti, and N. Kuster, "Assessing human exposure to electromagnetic fields from wireless power transmission systems," *Proceedings of the IEEE*, vol. 101, no. 6, pp. 1482–1493, 2013.
- [66] J. Lin, "A new ieee standard for safety levels with respect to human exposure to radio-frequency radiation," *IEEE Antennas and Propagation Magazine*, vol. 48, no. 1, pp. 157–159, 2006.
- [67] M. Yin, D. Borton, J. Aceros, W. Patterson, and A. Nurmikko, "A 100-channel hermetically sealed implantable device for chronic wireless neurosensing applications," *IEEE transactions on biomedical circuits and systems*, vol. 7, no. 2, pp. 115–128, 2013.
- [68] D. Seo, R. Neely, K. Shen, U. Singhal, E. Alon, J. Rabaey, J. Carmena, and M. Maharbiz, "Wireless recording in the peripheral nervous system with ultrasonic neural dust," *Neuron*, vol. 91, no. 3, pp. 529–539, 2016.
- [69] B. Thurgood, D. Warren, N. Ledbetter, G. Clark, and R. Harrison, "A wireless integrated circuit for 100-channel charge-balanced neural stimulation," *IEEE Transactions on Biomedical Circuits and Systems*, vol. 3, no. 6, pp. 405–414, 2009.
- [70] H. Zhao, D. Sokolov, and P. Degenaar, "An implantable optrode with self-diagnostic function in 0.35 μm cmos for optical neural stimulation," in *Biomedical Circuits and Systems Conference (BioCAS)*, 2014 IEEE. IEEE, 2014, pp. 244–247.
- [71] H. Zhao, F. Dehkhoda, R. Ramezani, D. Sokolov, P. Degenaar, Y. Liu, and T. Constandinou, "A cmos-based neural implantable optrode for optogenetic stimulation and electrical recording," in *Biomedical Circuits and Systems Conference (BioCAS)*, 2015 IEEE. IEEE, 2015, pp. 1–4.

- [72] M. Zargham and P. Gulak, "Maximum achievable efficiency in near-field coupled power-transfer systems," *IEEE Transactions on Biomedical Circuits and Systems*, vol. 6, no. 3, pp. 228–245, 2012.
- [73] A. Poon, S. O'Driscoll, and T. Meng, "Optimal frequency for wireless power transmission into dispersive tissue," *IEEE Transactions on Antennas and Propagation*, vol. 58, no. 5, pp. 1739–1750, 2010.
- [74] L. G. de Carli, Y. Juppa, A. J. Cardoso, C. Galup-Montoro, and M. C. Schneider, "Maximizing the power conversion efficiency of ultra-low-voltage cmos multi-stage rectifiers," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 62, no. 4, pp. 967–975, 2015.
- [75] T. Le, K. Mayaram, and T. Fiez, "Efficient far-field radio frequency energy harvesting for passively powered sensor networks," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 5, pp. 1287–1302, 2008.
- [76] Y. Lu, H. Dai, M. Huang, M.-K. Law, S.-W. Sin, U. Seng-Pan, and R. P. Martins, "A wide input range dual-path cmos rectifier for rf energy harvesting," *IEEE Transactions on Circuits and Systems II: Express Briefs*, 2016.
- [77] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer cmos circuits," *Proceedings of the IEEE*, vol. 91, no. 2, pp. 305–327, 2003.
- [78] X. Yuan, J.-E. Park, J. Wang, E. Zhao, D. C. Ahlgren, T. Hook, J. Yuan, V. W. Chan, H. Shang, C.-H. Liang *et al.*, "Gate-induced-drain-leakage current in 45-nm cmos technology," *IEEE Transactions on Device and Materials Reliability*, vol. 8, no. 3, pp. 501–508, 2008.
- [79] D. Maksimovic and V. G. Oklobdzija, "Integrated power clock generators for low energy logic," in *Power Electronics Specialists Conference*, 1995. *PESC'95 Record.*, 26th Annual IEEE, vol. 1. IEEE, 1995, pp. 61–67.
- [80] T. Wan and E. Salman, "Ultra low power simon core for lightweight encryption," in 2018 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2018, pp. 1–5.

- [81] R. Beaulieu, D. Shors, J. Smith, S. Treatman-Clark, B. Weeks, and L. Wingers, "The SIMON and SPECK families of lightweight block ciphers," Cryptology ePrint Archive, Report 2013/404, 2013, http://eprint.iacr.org/2013/404.
- [82] R. Beaulieu, S. Treatman-Clark, D. Shors, B. Weeks, J. Smith, and L. Wingers, "The simon and speck lightweight block ciphers," in *Design Automation Conference (DAC)*, 2015 52nd ACM/EDAC/IEEE. IEEE, 2015, pp. 1–6.
- [83] A. Aysu, E. Gulcan, and P. Schaumont, "SIMON says: Break area records of block ciphers on FPGAs," *IEEE Embedded Systems Letters*, vol. 6, no. 2, pp. 37–40, 2014.
- [84] E. Gulcan, A. Aysu, and P. Schaumont, "A flexible and compact hardware architecture for the SIMON block cipher," in *International Workshop on Lightweight Cryptography for Security and Privacy*. Springer, 2014, pp. 34–50.
- [85] E. Salman, M. Stanaćević, S. Das, and P. M. Djurić, "Leveraging rf power for intelligent tag networks," in *Proceedings of the 2018 on Great Lakes Symposium on VLSI*. ACM, 2018, pp. 329–334.
- [86] Y. Karimi, A. Athalye, S. R. Das, P. M. Djurić, and M. Stanaćević, "Design of a backscatter-based tag-to-tag system," in *RFID (RFID)*, 2017 IEEE International Conference on. IEEE, 2017, pp. 6–12.
- [87] A. Athalye, J. Jian, Y. Karimi, S. R. Das, and P. M. Djurić, "Analog front end design for tags in backscatter-based tag-to-tag communication networks," in *Circuits and Systems (ISCAS), 2016 IEEE International Symposium on*. IEEE, 2016, pp. 2054–2057.
- [88] V. F. Pavlidis, I. Savidis, and E. G. Friedman, *Three-dimensional integrated circuit design*. Newnes, 2017.
- [89] I. Miketic and E. Salman, "Power and data integrity in monolithic 3d integrated simon core," in 2019 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2019, pp. 1–5.
- [90] C. Yan and E. Salman, "Mono3d: Open source cell library for monolithic 3d integrated circuits," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 65, no. 3, pp. 1075–1085, 2017.

- [91] C. Yan, S. Kontak, H. Wang, and E. Salman, "Open source cell library mono3d to develop large-scale monolithic 3d integrated circuits," in 2017 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2017, pp. 1–4.
- [92] Y. Wang, Y. Liu, C. Wang, Z. Li, X. Sheng, H. G. Lee, N. Chang, and H. Yang, "Storage-less and converter-less photovoltaic energy harvesting with maximum power point tracking for internet of things," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 35, no. 2, pp. 173–186, 2016.
- [93] K. W. Lee, J.-e. Yi, B. Kim, J. Ko, S. Jeong, M. Noh, and S. S. Lee, "Micro generator using flywheel energy storage system with high-temperature superconductor bearing," in *Micro Electro Mechanical Systems, 2007. MEMS. IEEE 20th International Conference on.* IEEE, 2007, pp. 875–878.
- [94] F.-X. Standaert, "Introduction to side-channel attacks," in *Secure Integrated Circuits and Systems*. Springer, 2010, pp. 27–42.
- [95] H. Bar-El, "Introduction to side channel attacks," *Discretix Technologies Ltd*, vol. 43, 2003.
- [96] E. Brier, C. Clavier, and F. Olivier, "Correlation power analysis with a leakage model," in *International Workshop on Cryptographic Hardware and Embedded Systems*. Springer, 2004, pp. 16–29.
- [97] S. D. Kumar, H. Thapliyal, and A. Mohammad, "Finsal: Finfet-based secure adiabatic logic for energy-efficient and dpa resistant iot devices," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 37, no. 1, pp. 110–122, 2017.
- [98] D. D. Hwang, K. Tiri, A. Hodjat, B.-C. Lai, S. Yang, P. Schaumont, and I. Verbauwhede, "Aes-based security coprocessor ic in 0.18-μm cmos with resistance to differential power analysis side-channel attacks," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 4, pp. 781–792, 2006.