Publication Details
Low Overhead Distributed IP Flow Records Collection and Analysis
Žádník Martin, Ing., Ph.D. (DCSY)
NetFlow, IPFIX, IP flow collector, distributed system, parallel computing, Hadoop, big data
Collection and analysis of IP flow records are data-intensive tasks for which the power of a single node may not be sufficient. Several Hadoop-based solutions to this problem exist, but they are usually suitable only for truly big data; otherwise, the disadvantages of Hadoop may prevail. In this work, we present a distributed platform with significantly less overhead that focuses on smaller clusters and preserves the interactivity of a centralized system while exploiting the benefits of a distributed system, such as high availability, parallel processing, scalability, and redundancy. Experiments showed great scalability of both storage and query performance. Extensions for data mining and machine learning are easy to include and are already a work in progress; moreover, the whole software stack is open-source.
@misc{BUT170109,
author="Jan {Wrona} and Martin {Žádník}",
title="Low Overhead Distributed IP Flow Records Collection and Analysis",
booktitle="SIGCOMM '17: Proceedings of the 2017 ACM SIGCOMM Conference",
year="2017",
pages="2",
address="Los Angeles",
note="abstract"
}