Publication Details
DESCNet: Developing Efficient Scratchpad Memories for Capsule Network Hardware
Capsule neural network, inference accelerator, on-chip memory, optimization
Deep Neural Networks (DNNs) have been established as the state-of-the-art method
for advanced machine learning applications. Capsule Networks (CapsNets), recently
proposed by the Google Brain team, improve the generalization ability compared to
DNNs, owing to their multi-dimensional capsules that preserve the spatial
relationships between different objects. However, they impose significantly
higher computational and memory requirements, making their energy-efficient
inference a challenging task. This paper provides, for the first time, an
in-depth analysis highlighting the design- and run-time challenges for the
on-chip scratchpad memories deployed in hardware accelerators executing fast
CapsNet inference. To enable an efficient design, we propose an
application-specific memory architecture, called DESCNet, which minimizes
off-chip memory accesses while efficiently feeding the data to the hardware
accelerator executing CapsNet inference. We analyze the corresponding on-chip
memory requirements and leverage them to propose a methodology for exploring
different scratchpad memory designs and their energy/area trade-offs.
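To make the exploration step concrete, the following is a minimal Python sketch
of such an energy/area design-space exploration with Pareto filtering. It
assumes a simple parametric cost model; the capacities, sector counts, and
energy/area constants are illustrative placeholders, whereas in the actual flow
such numbers would come from tools like CACTI-P and the synthesized design.

```python
# Minimal sketch of a scratchpad design-space exploration in the spirit of
# the methodology described above. All sizes, sector counts, and cost-model
# constants below are illustrative assumptions, not values from the paper.

from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class SpmConfig:
    size_kib: int   # total scratchpad capacity (assumed candidate values)
    sectors: int    # number of independently power-gated sectors (assumed)


def energy_cost(cfg: SpmConfig, utilization: float) -> float:
    """Toy energy model: leakage scales with the powered sectors, dynamic
    energy with capacity. Real numbers would come from CACTI-P."""
    powered = max(1, round(cfg.sectors * utilization))
    leakage = 0.4 * cfg.size_kib * powered / cfg.sectors
    dynamic = 0.1 * cfg.size_kib
    return leakage + dynamic


def area_cost(cfg: SpmConfig) -> float:
    """Toy area model: capacity plus a per-sector overhead for the
    power-gating circuitry."""
    return 1.0 * cfg.size_kib + 0.5 * cfg.sectors


def pareto_front(configs: List[SpmConfig], utilization: float) -> List[SpmConfig]:
    """Keep only configurations that no other configuration dominates in
    both energy and area."""
    scored = [(energy_cost(c, utilization), area_cost(c), c) for c in configs]
    return [c for e, a, c in scored
            if not any(e2 <= e and a2 <= a and (e2, a2) != (e, a)
                       for e2, a2, _ in scored)]


if __name__ == "__main__":
    candidates = [SpmConfig(s, k)
                  for s, k in product((64, 128, 256), (1, 2, 4, 8))]
    for cfg in pareto_front(candidates, utilization=0.6):
        print(cfg, energy_cost(cfg, 0.6), area_cost(cfg))
```

A designer would sweep the utilization over the workload's operations and pick
one point from the resulting front, trading area against energy.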
Afterwards, an application-specific power-gating technique for the on-chip
scratchpad memory is employed to further reduce its energy consumption,
depending on the mapped dataflow of the CapsNet and the memory utilization
across the different operations of its processing.
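A minimal sketch of this power-gating idea is given below, assuming a
scratchpad split into independently gated sectors whose occupancy varies per
CapsNet operation. The operation names, utilization figures, and leakage
constants are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch of operation-driven power gating: each CapsNet processing
# step uses a different fraction of the scratchpad, so the unused sectors
# can be put into a low-leakage sleep state. All names and constants are
# illustrative assumptions, not values from the paper.

import math

SECTORS = 8            # independently gated memory sectors (assumed)
LEAK_PER_SECTOR = 1.0  # leakage energy per sector per operation (toy unit)
SLEEP_FACTOR = 0.1     # residual leakage of a gated sector (assumed)

# Fraction of the scratchpad each operation actually needs (illustrative).
UTILIZATION = {
    "conv1": 0.50,
    "primary_caps": 0.75,
    "dynamic_routing": 1.00,
    "class_caps": 0.25,
}


def leakage(ops: dict, gated: bool) -> float:
    """Sum leakage over all operations, optionally gating unused sectors."""
    total = 0.0
    for util in ops.values():
        active = max(1, math.ceil(SECTORS * util))  # sectors kept powered
        idle = SECTORS - active                     # sectors put to sleep
        total += active * LEAK_PER_SECTOR
        total += idle * LEAK_PER_SECTOR * (SLEEP_FACTOR if gated else 1.0)
    return total


baseline = leakage(UTILIZATION, gated=False)
with_gating = leakage(UTILIZATION, gated=True)
print(f"leakage saving from power gating: {100 * (1 - with_gating / baseline):.1f}%")
```

In this toy model the saving grows with the number of sectors and with how
unevenly the operations use the memory, which is exactly the
application-specific knowledge such a technique exploits.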
We integrated our DESCNet memory design, along with a state-of-the-art memory
design for comparison, into an open-source DNN accelerator executing Google's
CapsNet model for the MNIST dataset. We also enhanced the design to execute
the recent deep CapsNet model for the CIFAR-10 dataset. Note that we use the
same benchmarks and test conditions for which these CapsNets were proposed and
evaluated by their respective teams. The complete hardware is synthesized for
a 32nm CMOS technology using the ASIC design flow with Synopsys tools and
CACTI-P, and detailed area, performance, and power/energy estimations are
performed for different configurations. Our results for a selected
Pareto-optimal solution demonstrate no performance loss and a 79% energy
reduction for the complete accelerator, including both computational units and
memories, compared to the state-of-the-art design.
@article{BUT168539,
author="MARCHISIO, A. and MRÁZEK, V. and HANIF, M. and SHAFIQUE, M.",
title="DESCNet: Developing Efficient Scratchpad Memories for Capsule Network Hardware",
journal="IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS",
year="2021",
volume="40",
number="9",
pages="1768--1781",
doi="10.1109/TCAD.2020.3030610",
issn="1937-4151",
url="https://ieeexplore.ieee.org/document/9222370"
}