August 2016: I am looking for motivated PhD students with backgrounds in digital design/FPGAs to join my research group. Please contact me if interested.


Energy efficiency has become the limiting factor in current and future computing performance, affecting computing systems of all kinds, from mobile devices to datacenters. Meanwhile, modern applications continue to grow more complex and computationally expensive, while relying on ever-larger amounts of data. This presents a considerable challenge: how can we continue to improve our computational capabilities in spite of these limitations?

A key technique to improve energy efficiency and reach high performance is hardware specialization. Recently, there has been much interest in using field-programmable gate arrays (FPGAs) as accelerators in general-purpose computing environments. Their fine-grained parallel structures allow them to exploit the benefits of hardware-level customization while retaining reprogrammability.

However, the biggest obstacle limiting the growth of FPGAs is the difficulty of implementing algorithms in hardware and integrating this hardware into real-world computer systems. My research aims to address these difficulties by combining digital hardware design with compilers, tools, and domain-specific languages. More specifically, my work explores how we can use computer-based tools to make digital hardware more efficient, how we can reduce the effort needed to design, optimize, and verify digital systems, and how these technologies can be exploited to address key challenges in modern computing.

Below you will find high-level descriptions of my current research and information on a few selected papers. For a full list of papers, please see my Publications page.

Accelerating Deep Learning and Computer Vision with FPGAs

Deep learning and convolutional neural networks (CNNs) have revolutionized machine learning, leading to recent advances in several areas such as natural language processing and computer vision, and to widespread interest from industry and academia. However, these advances come at a steep computational cost. The goal of this project is to enable the implementation of large-scale deep learning applications on a scalable parallel “cloud” of FPGAs by automating the translation from straightforward algorithmic specifications of deep learning problems into optimized hardware, parallelized across many interconnected FPGAs.
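
To illustrate the starting point of such a translation, the sketch below (a hypothetical example; all names and dimensions are illustrative, not taken from the project) shows a single convolutional layer written as a plain loop nest. This is the kind of straightforward algorithmic specification that a hardware generator would tile, pipeline, and distribute across interconnected FPGAs:

```c
/* Minimal sketch of one CNN convolutional layer as a plain loop nest.
 * Dimensions and names are illustrative only; a hardware generator
 * would start from a specification like this and decide how to tile,
 * pipeline, and parallelize the loops across one or more FPGAs. */
#define IN_CH   3    /* input feature maps           */
#define OUT_CH  16   /* output feature maps          */
#define DIM     32   /* feature map height and width */
#define K       3    /* convolution kernel size      */

void conv_layer(const float in[IN_CH][DIM][DIM],
                const float weight[OUT_CH][IN_CH][K][K],
                float out[OUT_CH][DIM - K + 1][DIM - K + 1])
{
    for (int oc = 0; oc < OUT_CH; oc++)
        for (int y = 0; y < DIM - K + 1; y++)
            for (int x = 0; x < DIM - K + 1; x++) {
                float acc = 0.0f;
                /* accumulate over input channels and kernel window */
                for (int ic = 0; ic < IN_CH; ic++)
                    for (int ky = 0; ky < K; ky++)
                        for (int kx = 0; kx < K; kx++)
                            acc += in[ic][y + ky][x + kx]
                                 * weight[oc][ic][ky][kx];
                out[oc][y][x] = acc;
            }
}
```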

This work is funded by the National Science Foundation's Exploiting Parallelism and Scalability (XPS) program.

Selected papers:

Domain-Specific Languages and Tools for Automatic Hardware Generation

In order to reduce the difficulty of implementing FPGA and ASIC accelerators, researchers have proposed a number of automated systems. Some take the form of parameterized IP (intellectual property) cores: implementations of a given problem created by an expert, with a small amount of flexibility exposed through parameters. At the other end of the spectrum are “high-level synthesis” (HLS) tools, which aim to convert C or C++ code directly into hardware. In practice, typical parameterized IPs are too restrictive, forcing designers into a “one-size-fits-all” approach; meanwhile, HLS is too open-ended: because it tries to work well for all problems, it struggles to produce good solutions for any particular one.
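
To make the HLS end of this spectrum concrete, the sketch below shows a toy C kernel annotated in the style of Vivado HLS pragmas (the function and all names are hypothetical). The code looks like ordinary software, but coaxing good hardware out of it typically requires restructuring it around the tool's expectations:

```c
/* Illustrative HLS-style kernel (a hypothetical sketch, not from any
 * specific project). The pragma hints at the desired hardware
 * structure, but in practice a floating-point accumulation like this
 * often cannot reach one iteration per cycle without restructuring. */
#define N 1024

float dot_product(const float a[N], const float b[N])
{
    float acc = 0.0f;
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1   /* request one loop iteration per cycle */
        acc += a[i] * b[i];
    }
    return acc;
}
```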

My work aims to address these problems through domain-specific hardware generation tools. These tools target a specific domain of problems (e.g., linear DSP transforms), providing enough flexibility to work well for a variety of problems within the domain, while being targeted enough to produce very good results with little effort from the end user. One example is my work on the Spiral hardware generation framework, a domain-specific hardware generation tool for linear signal processing transforms such as the fast Fourier transform. This system uses a mathematical domain-specific language (DSL) to represent and optimize transform algorithms as they are compiled to hardware; its results are competitive with (and often more efficient than) hand-designed systems.
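
As a rough illustration of this style of DSL, consider the classic Cooley-Tukey fast Fourier transform, which can be written as a single breakdown rule in the Kronecker-product formalism Spiral uses, where $T^{nm}_m$ is a diagonal matrix of twiddle factors and $L^{nm}_n$ is a stride permutation:

```latex
% Cooley-Tukey FFT as a breakdown rule in the Kronecker-product
% formalism (a sketch of the kind of formula the DSL manipulates):
\mathrm{DFT}_{nm} = (\mathrm{DFT}_n \otimes I_m)\, T^{nm}_m\,
                    (I_n \otimes \mathrm{DFT}_m)\, L^{nm}_n
```

Rewriting at this level lets a generator explore different factorizations of a transform and map each formula construct (tensor product, permutation, diagonal) onto a corresponding hardware structure.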

My ongoing work aims to create a flexible framework for creating domain-specific hardware generators, improving their usability, and using the results to study new application domains.

Selected papers:

See also the Spiral DFT/FFT hardware generator, which produces high-quality designs over a very wide tradeoff space, allowing users to choose the designs that best match their implementation-specific goals, balancing cost (power, energy, area) against performance (throughput, latency). The system produces cores that compare well with existing designs in the literature and in IP libraries, and it enables performance/cost design points not otherwise available.

Hardware Accelerators for Datacenters and Networks

Datacenters (large-scale computing facilities comprising many servers) have become ubiquitous in modern computing, but they are severely power constrained. Although typical datacenter applications are not traditional candidates for hardware acceleration, these strict power limits have made FPGA acceleration attractive. However, such applications can be considerably challenging to accelerate with FPGAs. The goal of this work is to study how FPGAs can improve the efficiency and speed of large-scale datacenters and their applications.

This work is supported by the Semiconductor Research Corporation.