Publication Details
General memory efficient packet matching FPGA architecture for future high-speed networks
FPGA, Packet matching, Packet filtering, High-speed networks, Exact match, Cuckoo hashing
Packet classification (matching) is one of the critical operations in networking,
widely used in many different devices and tasks ranging from switching or routing
to a variety of monitoring and security applications such as firewalls or
intrusion detection systems (IDS). To satisfy the ever-growing performance demands
of current and future high-speed networks, specially designed hardware-accelerated
architectures implementing packet classification are necessary. These demands are
now growing to such an extent that, in order to keep up with the rising throughputs
of network links, FPGA-accelerated architectures are required to perform matching
of multiple packets in every single clock cycle. To meet this requirement, a simple
replication approach can be used: instantiate multiple copies of a processing
pipeline that match incoming packets in parallel. However, simple replication of
pipelines inevitably brings a significant increase in the utilization of FPGA
resources of all types, which is especially costly for the rather scarce on-chip
memories used in matching tables. We propose and examine a unique parallel
hardware architecture for hash-based exact-match classification of multiple
packets in each clock cycle that reduces memory replication requirements. The
core idea of the proposed architecture is to
exploit the basic memory organization structure present in all modern FPGAs,
where hundreds of individual block or distributed memory tiles are available and
can be accessed (addressed) independently. In this way, we are able to maintain
a rather high throughput of matching multiple packets per clock cycle even
without fully replicated memory resources in matching tables. Our results show
that the proposed approach can use on-chip memory resources very efficiently and
also scales exceptionally well with increasing match table capacity. For
example, the proposed architecture is able to achieve a throughput of more than
2 Tbps (over 3 000 Mpps) with an effective capacity of more than 40 000 IPv4 flow
records at the cost of only a few hundred block memory tiles (366 BlockRAMs on
Xilinx or 672 M20K blocks on Intel FPGAs) while utilizing only a small fraction of
the available logic resources (around 68 000 LUTs on Xilinx or 95 000 ALMs on Intel).
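To make the core idea above more concrete, here is a minimal software sketch of a cuckoo-hash exact-match table whose ways are split across independently addressed memory tiles, together with a toy count of clock cycles when several lookups are issued per cycle. It is not the authors' architecture: the class `TiledCuckooTable`, the BLAKE2b-based per-way hash functions, the stash for entries that cannot be placed, and the rule that each tile serves at most one read per cycle (with in-order stalling on a conflict) are all illustrative assumptions.

```python
import hashlib
from collections import deque


class TiledCuckooTable:
    """Hypothetical exact-match table: cuckoo hashing with `n_ways` hash
    functions, each way split into `tiles_per_way` independent memory tiles."""

    def __init__(self, n_ways=2, tiles_per_way=8, slots_per_tile=64, max_kicks=100):
        self.n_ways = n_ways
        self.tiles_per_way = tiles_per_way
        self.slots_per_tile = slots_per_tile
        self.max_kicks = max_kicks
        # table[way][tile][slot] holds a (key, value) pair or None
        self.table = [[[None] * slots_per_tile for _ in range(tiles_per_way)]
                      for _ in range(n_ways)]
        self.stash = []  # overflow for entries that could not be placed

    def _locate(self, key, way):
        """Per-way hash of `key` (bytes) -> (tile index, slot index)."""
        digest = hashlib.blake2b(bytes([way]) + key, digest_size=8).digest()
        h = int.from_bytes(digest, "little")
        return h % self.tiles_per_way, (h // self.tiles_per_way) % self.slots_per_tile

    def lookup(self, key):
        """Reference lookup: probe one slot in every way, then the stash."""
        for w in range(self.n_ways):
            t, s = self._locate(key, w)
            cell = self.table[w][t][s]
            if cell is not None and cell[0] == key:
                return cell[1]
        for k, v in self.stash:
            if k == key:
                return v
        return None

    def insert(self, key, value):
        """Insert or update; returns False if the entry ended up in the stash."""
        for w in range(self.n_ways):  # update in place if already present
            t, s = self._locate(key, w)
            cell = self.table[w][t][s]
            if cell is not None and cell[0] == key:
                self.table[w][t][s] = (key, value)
                return True
        for i, (k, _) in enumerate(self.stash):
            if k == key:
                self.stash[i] = (key, value)
                return False
        entry, way = (key, value), 0
        for _ in range(self.max_kicks):
            for w in range(self.n_ways):  # take any free candidate slot
                t, s = self._locate(entry[0], w)
                if self.table[w][t][s] is None:
                    self.table[w][t][s] = entry
                    return True
            # all candidates occupied: evict the occupant in `way`, continue with it
            t, s = self._locate(entry[0], way)
            entry, self.table[way][t][s] = self.table[way][t][s], entry
            way = (way + 1) % self.n_ways
        self.stash.append(entry)
        return False

    def simulate(self, keys, lookups_per_cycle):
        """Count cycles to look up `keys`, issuing up to `lookups_per_cycle`
        requests per cycle in order. Each lookup reads one tile per way and
        each tile is assumed to serve one read per cycle; on a tile conflict
        the remaining requests wait for the next cycle."""
        pending = deque(keys)
        cycles = 0
        while pending:
            cycles += 1
            busy, issued = set(), 0
            while pending and issued < lookups_per_cycle:
                needed = {(w, self._locate(pending[0], w)[0]) for w in range(self.n_ways)}
                if needed & busy:
                    break  # tile conflict: stall the rest (in-order issue)
                busy |= needed
                pending.popleft()
                issued += 1
        return cycles


if __name__ == "__main__":
    table = TiledCuckooTable()
    flows = [i.to_bytes(4, "big") for i in range(300)]  # stand-ins for flow keys
    for i, f in enumerate(flows):
        table.insert(f, i)
    assert all(table.lookup(f) == i for i, f in enumerate(flows))
    print("cycles for 300 lookups at 4 per cycle:", table.simulate(flows, 4))
```

In this toy model, with many tiles per way most lookups in a cycle touch disjoint tiles and proceed together; with a single tile per way every lookup conflicts with the previous one and throughput drops to one lookup per cycle, which is the situation that full table replication avoids at the cost of extra memory.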
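As a rough consistency check on how the two throughput figures in the abstract relate, assume (our assumption; the abstract does not say so) that the packet rate is quoted for minimum-length 64 B Ethernet frames, each occupying 84 B on the wire once the 8 B preamble/SFD and the 12 B inter-frame gap are counted:

```latex
3000\,\text{Mpps} \times 84\,\text{B} \times 8\,\tfrac{\text{bit}}{\text{B}}
  = 3000 \times 10^{6}\,\text{s}^{-1} \times 672\,\text{bit}
  \approx 2.02\,\text{Tbps}
```

Under that assumption, matching over 3 000 Mpps is what is needed to sustain a 2 Tbps link carrying only shortest frames.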
@article{BUT161471,
author="Michal {Kekely} and Lukáš {Kekely} and Jan {Kořenek}",
title="General memory efficient packet matching FPGA architecture for future high-speed networks",
journal="Microprocessors and Microsystems",
year="2020",
volume="73",
number="3",
pages="1--12",
doi="10.1016/j.micpro.2019.102950",
issn="0141-9331",
url="http://www.sciencedirect.com/science/article/pii/S0141933119301334"
}