SoC and Hardware-Software Co-Design for Resource-Constrained
Embedded Systems
Graduate Students
Natt Thepayasuwan, Ph.D. candidate
Sankalp Kallakuri, Ph.D. candidate
Yulei Weng, Ph.D. candidate
Vaishali Damle, M.S. (graduated in 2003)
Rohit Pai, M.S. (graduated in 2003)
Research Goal and Objectives
Motivation
Many embedded systems must meet stringent cost, timing, and energy consumption
constraints. In addition, embedded architectures use hardware resources
sparingly: they include general-purpose processors running
at low to medium frequencies (such as ARM, Intel 80C188EB, and Philips 80C552), have
a small amount of memory (as little as 128k of
RAM and 256k of flash memory), and incorporate customized co-processors
and I/O peripherals (including RF and analog circuits). Typical examples
include embedded systems for telecommunication and multimedia, such as cell
phones, digital cameras, and personal communicators. Systems-on-Chip (SoCs)
are single-chip implementations of embedded systems. Compared to printed
circuit board designs, SoCs offer higher performance and reliability at
lower cost. Advances in device manufacturing technology,
including present deep submicron technologies and future nanotechnologies,
are expected to keep reducing the minimum feature size, and thus to increase the
functional complexity of SoCs, while clock frequencies are expected to reach
around 10-15 GHz.

Figure 1: Impact of layout on data communication speed and system design
For SoCs realized in deep submicron (DSM) technologies, physical-level
attributes, such as interconnect parasitics, substrate coupling, and substrate
noise, significantly influence system performance,
e.g. data communication speed, system latency, power consumption, and
signal integrity. Figure 1 illustrates the impact of layout parasitics
on data communication speed and system design. Each task is labeled with
its execution time on a PowerPC processor core. Without considering layout
information, the co-design step decides to allocate a single 266 MHz system
bus for all core communications. This would meet the timing constraints
while keeping the system architecture simple. However, given the
physical distances between cores, shown in Figure 1(b), it is difficult
to implement a bus with the requested speed. The same latency can be obtained
with three buses of lower speed, like those in Figure 1(b), because the
system concurrency improves. The bus speeds of 133 MHz, 133 MHz, and 33 MHz
were determined from the physical locations of the cores and the RLC parasitics
of the routed buses. This example shows that the communication sub-system
of an SoC needs to be designed with layout-related criteria in mind.
In general, it is difficult to postulate a single bus architecture as being
optimal for various applications and performance requirements. Instead,
bus architectures need to be customized to the application specifics
and design needs.
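The concurrency argument can be illustrated with a deliberately simplified model. The sketch below is not the method used in this project and its numbers are not taken from Figure 1: the transfer sizes are hypothetical, each bus is assumed to move one word per clock cycle, and task execution times and data dependencies are ignored. Only the bus frequencies (266 MHz versus 133, 133, and 33 MHz) come from the text above.

```python
# Hypothetical transfer sizes in bus words (illustrative only, NOT from Figure 1).
TRANSFERS = {"a": 233, "b": 233, "c": 66}


def shared_bus_time(transfers, mhz):
    """All transfers are serialized on one bus; result in microseconds.

    Assumes one word per clock cycle, so time = words / frequency in MHz.
    """
    return sum(transfers.values()) / mhz


def multi_bus_time(transfers, bus_mhz):
    """Each transfer uses its own bus; the buses operate concurrently,
    so the communication time is set by the slowest transfer."""
    return max(words / bus_mhz[name] for name, words in transfers.items())


single = shared_bus_time(TRANSFERS, 266)
split = multi_bus_time(TRANSFERS, {"a": 133, "b": 133, "c": 33})
print(single, split)  # prints: 2.0 2.0
```

Under these assumptions the three slower buses match the communication time of the single fast bus, because the transfers no longer wait for one another.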
Related Publications
N. Thepayasuwan, A. Doboli, "Layout
Conscious Approach and Bus Architecture Synthesis for Hardware-Software
Co-Design of Systems on Chip Optimized for Speed", accepted for publication,
IEEE Transactions on VLSI Systems, 2004.
N. Thepayasuwan, A. Doboli, "Pruning-based
Synthesis of Flat and Hierarchical Bus Architectures for SoC in Deep Submicron
Technologies", International Journal on Embedded Computing, Vol. 1, 2004.
S. Kallakuri, A. Doboli, S. Doboli,
"Applying Stochastic Modeling to Bus Arbitration for Network-on-Chip Systems",
submitted, Integration, the VLSI Journal, special issue on VLSI System-On-Chip,
May 2004.
N. Thepayasuwan, A. Doboli, "Layout
Conscious Bus Architecture Synthesis for Deep Submicron Systems on Chip",
Design, Automation and Test in Europe Conference (DATE) 2004, Paris.
S. Kallakuri, A. Doboli, S. Doboli,
"Stochastic Modeling Based Environment for Synthesis and Comparison of
Bus Arbitration Policies", International Symposium on VLSI (ISVLSI), 2004.
N. Thepayasuwan, A. Doboli, "Hardware-Software
Co-Design of Resource Constrained Systems on a Chip in Deep Submicron Technology",
International Workshop on Embedded Computing Systems (ECS-04), Tokyo, 2004.
N. Thepayasuwan, A. Doboli, "OSIRIS:
Automated Synthesis of Flat and Hierarchical Bus Architectures for Deep
Submicron Systems-on-Chip", International Symposium on VLSI (ISVLSI), 2004.
N. Thepayasuwan, V. Damle, A. Doboli,
"Bus Architecture Synthesis for Hardware-Software Co-Design for Deep Submicron
Systems on Chip", International Conference on Computer Design (ICCD) 2003,
San Jose CA.
N. Thepayasuwan, A. Doboli, "An Exploration
Based Binding and Scheduling Technique for Synthesis of Digital Blocks
for Mixed-Signal Applications", Proc. ISCAS 2003, Bangkok.
S. Kallakuri, A. Doboli, S. Doboli,
"Applying Stochastic Modeling to Bus Arbitration for Network-on-Chip Systems",
Proc. of the 2003 International Conference on VLSI, Las Vegas, 2003.
V. Damle, A. Doboli, "Pattern-Based
Pin-to-Pin Routing for High Speed Digital Circuits in Deep Submicron Technologies",
accepted for the Southwest Symposium on Mixed Signal Design (SSMSD), Las
Vegas, 2003.
N. Thepayasuwan, A. Doboli, "A Methodology
for Core Placement and Bus Synthesis under Time, Area and Energy Consumption
Constraints", International Workshop on Logic and Synthesis, New Orleans,
2002.
A. Doboli, R. Vemuri, "Integrated High-Level
Synthesis and Power-Net Routing for Digital Design under Switching Noise
Constraints", Proceedings of the Design Automation Conference 2001, Las
Vegas.
A. Doboli, "Integrated Hardware-Software
Co-Synthesis and High-Level Synthesis for Design of Embedded Systems under
Power and Latency Constraints", Proceedings of the Design, Automation and
Test in Europe Conference, 2001, Munich.
P. Eles, A. Doboli, P. Pop, Z. Peng,
"Scheduling with Bus Access Optimization for Distributed Embedded Systems",
IEEE Transactions on VLSI Systems, Vol. 8, No. 5, pp. 472-491, October
2000.
OSIRIS: Layout Conscious Approach and Bus Architecture
Synthesis for Systems on Chip Optimized for Speed
Approach
This research focuses on a hardware-software co-design method for developing
SoC implementations with minimal latency. The novelty lies in
a systematic, layout-conscious approach to the SoC
communication sub-system, including an original bus architecture synthesis
algorithm. System-level design attempts to minimize latency and maximize
the feasibility of the constraints imposed on the bus architecture. Applications are task
graphs with data dependencies and a reduced number of control dependencies.
The set of available hardware resources and the SoC area are known. The
co-design method consists of three successive steps: (1) combined partitioning
and static non-preemptive scheduling, (2) bus architecture synthesis, and
(3) re-scheduling for the best bus architecture. The first step is an exploration
process based on a simulated annealing algorithm. The cost function expresses
the minimization of system latency and the maximization of the feasibility
of bus architecture constraints, such as the required speed, the number of links, and
the amount of resulting connectivity between cores. We propose Performance
Models (PMs), a graph-based description that symbolically captures the
relationships between performance, graph characteristics, and design decisions.
PMs are general and flexible, and can be easily extended to new design activities
without requiring cumbersome validation. The second step synthesizes and
routes the bus architecture for an SoC. IP cores are placed using a hierarchical
cluster growth algorithm. Using the proposed PBS bitwise generation algorithm,
bus architecture synthesis first identifies a set of possible building
blocks and then assembles them such that bus length, bus topology,
communication conflicts, and unnecessary core connectivity are minimized. We propose
a special table structure (named the bus architecture synthesis table) and
a select-eliminate method to prune poor solutions, such as buses with complex
and redundant connectivity. The algorithm was successfully used to automatically
synthesize bus architectures for realistic SoCs, including a network processor
and a JPEG SoC.
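As a rough illustration of the first step, the sketch below combines task-to-core partitioning with static non-preemptive scheduling inside a simulated annealing loop. It is a minimal sketch under stated assumptions, not the OSIRIS implementation: the task graph, the two cores, the fixed inter-core delay, and the penalty weight are all hypothetical, and the bus-feasibility term is reduced to a count of core pairs that need a link. Performance Models, placement, and bus synthesis are not modeled.

```python
import math
import random

# Hypothetical task graph: task -> (execution time, predecessor tasks).
# Insertion order is assumed to be a valid topological order.
TASKS = {
    "t1": (4, []),
    "t2": (3, ["t1"]),
    "t3": (2, ["t1"]),
    "t4": (5, ["t2", "t3"]),
}
CORES = ["cpu0", "cpu1"]   # hypothetical hardware resources
COMM_DELAY = 1             # assumed delay of any inter-core transfer
LINK_WEIGHT = 0.5          # assumed weight of the connectivity penalty


def schedule_latency(mapping):
    """Static non-preemptive schedule in topological order; returns latency."""
    core_free = {c: 0 for c in CORES}
    finish = {}
    for task, (exec_time, preds) in TASKS.items():
        ready = 0
        for p in preds:
            delay = COMM_DELAY if mapping[p] != mapping[task] else 0
            ready = max(ready, finish[p] + delay)
        start = max(ready, core_free[mapping[task]])
        finish[task] = start + exec_time
        core_free[mapping[task]] = finish[task]
    return max(finish.values())


def link_count(mapping):
    """Number of distinct core pairs that must be connected."""
    links = set()
    for task, (_, preds) in TASKS.items():
        for p in preds:
            if mapping[p] != mapping[task]:
                links.add(frozenset((mapping[p], mapping[task])))
    return len(links)


def cost(mapping):
    """Latency plus a simple stand-in for bus architecture feasibility."""
    return schedule_latency(mapping) + LINK_WEIGHT * link_count(mapping)


def anneal(seed=0, temp=10.0, cooling=0.95, steps=500):
    """Simulated annealing over task-to-core mappings."""
    rng = random.Random(seed)
    current = {t: rng.choice(CORES) for t in TASKS}
    best = dict(current)
    for _ in range(steps):
        candidate = dict(current)
        candidate[rng.choice(list(TASKS))] = rng.choice(CORES)  # move one task
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = dict(current)
        temp = max(temp * cooling, 1e-3)
    return best, cost(best)


best_mapping, best_cost = anneal()
print(best_mapping, best_cost)
```

On this toy graph, mapping every task to one core gives a cost of 14, while moving only t3 to the second core gives 13.5: the latency drops by one time unit at the price of one inter-core link. A realistic cost function would replace the link count with the bus speed, link, and connectivity constraints described above.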