Publication Details
Acceleration Techniques for Automated Design of Approximate Convolutional Neural Networks
Piňos Michal
Mrázek Vojtěch, Ing., Ph.D. (DCSY)
Vaverka Filip, Ing., Ph.D.
Vašíček Zdeněk, doc. Ing., Ph.D. (DCSY)
Sekanina Lukáš, prof. Ing., Ph.D. (DCSY)
- Approximate computing
- Convolutional neural network
- Neural architecture search
- Energy efficiency
- Quantization
- Acceleration
The main issue with using approximate components, such as approximate multipliers, in deep convolutional neural networks (CNNs) during the design process is the need to emulate them, because modern CPUs and GPUs provide no native support for approximate operations; this emulation is computationally expensive. To accelerate the emulation of approximate CNN operations on GPUs, we propose TFApprox4IL, a software library supporting symmetric and asymmetric quantization modes, approximate 8×N-bit multipliers emulated using lookup tables, a new type of approximate layer known as approximate depthwise convolution, and quantization-aware training. The performance of TFApprox4IL is extensively evaluated in simulations of approximate implementations of MobileNetV2 and ResNet networks on Nvidia Pascal and Tesla GPU architectures.
Furthermore, TFApprox4IL is also evaluated in neural architecture search (NAS)
algorithms to automatically design CNN architectures that directly employ
approximate multipliers. Using two different NAS methods, EvoApproxNAS and Google Model Search (GMS), we show how approximate multipliers can effectively be incorporated into the CNN design process. To estimate the energy consumption of the approximate CNNs, the AxMultAT tool, based on Timeloop and Accelergy, is introduced. Compared with the highly optimized GPU-based CNN simulation implemented using the exact arithmetic operations available in TensorFlow, the average overhead introduced by TFApprox4IL is 13.6× for inference and 8.0× for training, considering the ResNet50V2 and MobileNetV2 CNNs on the ImageNet and CIFAR-10 data sets. This overhead is one order of magnitude lower than that of previous methods.
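
For illustration, the following minimal sketch shows the lookup-table idea described in the abstract: all products of an approximate 8×8-bit multiplier are precomputed into a 256×256 table, and inference gathers from that table while accumulation stays exact. The function names and the truncated example multiplier below are hypothetical and do not reflect the actual TFApprox4IL API, which performs the lookup in optimized GPU kernels.

import numpy as np
import tensorflow as tf

def build_lut(approx_mult):
    """Precompute all 256 x 256 products of an 8x8-bit approximate multiplier."""
    a = np.arange(256, dtype=np.int64).reshape(-1, 1)
    b = np.arange(256, dtype=np.int64).reshape(1, -1)
    return tf.constant(approx_mult(a, b), dtype=tf.int32)

# Hypothetical approximate multiplier: truncates the three least significant product bits.
def truncated_mult(a, b):
    return ((a * b) >> 3) << 3

LUT = tf.reshape(build_lut(truncated_mult), [-1])  # flattened for gather

def approx_matmul(x_q, w_q):
    """Emulate a matrix product of unsigned 8-bit operands via table lookups.

    x_q: [batch, n] activations, w_q: [n, m] weights, both uint8.
    Only the elementwise products are approximate; accumulation is exact.
    """
    x = tf.cast(x_q, tf.int32)[:, :, tf.newaxis]   # [batch, n, 1]
    w = tf.cast(w_q, tf.int32)[tf.newaxis, :, :]   # [1, n, m]
    products = tf.gather(LUT, x * 256 + w)         # approximate products, [batch, n, m]
    return tf.reduce_sum(products, axis=1)         # exact int32 accumulation, [batch, m]

# Usage with random 8-bit data:
x = tf.cast(tf.random.uniform([4, 16], 0, 256, dtype=tf.int32), tf.uint8)
w = tf.cast(tf.random.uniform([16, 8], 0, 256, dtype=tf.int32), tf.uint8)
print(approx_matmul(x, w).shape)                   # (4, 8)

Indexing with a * 256 + b keys the flattened table by operand pair; switching between symmetric and asymmetric quantization only changes how operands are offset before indexing, not the lookup itself.
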
@article{BUT180721,
author="Michal {Piňos} and Vojtěch {Mrázek} and Filip {Vaverka} and Zdeněk {Vašíček} and Lukáš {Sekanina}",
title="Acceleration Techniques for Automated Design of Approximate Convolutional Neural Networks",
journal="IEEE Journal on Emerging and Selected Topics in Circuits and Systems",
year="2023",
volume="13",
number="1",
pages="212--224",
doi="10.1109/JETCAS.2023.3235204",
issn="2156-3357",
url="https://ieeexplore.ieee.org/document/10011413"
}