Publication Details
Unmasking the Phishermen: Phishing Domain Detection with Machine Learning and Multi-Source Intelligence
Horák Adam, Ing. (DIFS)
Polišenský Jan, Bc. (DIFS)
Jeřábek Kamil, Ing., Ph.D. (DIFS)
Ryšavý Ondřej, doc. Ing., Ph.D. (DIFS)
Phishing, Domain, Detection, Machine learning, XGBoost, Features, DNS, RDAP, TLS, GeoIP
In the digital landscape, phishing attacks have rapidly evolved into a major
cybersecurity challenge, posing significant risks to individuals and
organizations. This short paper presents our preliminary research on detecting
phishing domains. Our approach amalgamates intelligence from multiple sources:
DNS servers, WHOIS/RDAP, TLS certificates, and GeoIP data. We created a rich 15.8
GB dataset of information about benign and phishing domains, from which we
derived a comprehensive 80-feature vector for training and testing machine
learning classifiers. We present preliminary results with a fine-tuned XGBoost
model, achieving a precision of 0.9716, an F1 score of 0.9540, and a false
positive rate of 0.23%.
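For readers unfamiliar with the metrics quoted in the abstract, the following minimal sketch shows how precision, F1 score, and false positive rate are conventionally derived from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical illustrations; they are not taken from the paper and do not reproduce its results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp: phishing domains correctly flagged
    fp: benign domains wrongly flagged as phishing
    tn: benign domains correctly passed
    fn: phishing domains missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # False positive rate: share of benign domains that were wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Made-up counts, for illustration only (not the paper's data).
metrics = classification_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A low false positive rate matters in this setting because benign domains typically far outnumber phishing ones, so even a small rate can translate into many false alarms.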
@inproceedings{BUT186776,
author="Radek {Hranický} and Adam {Horák} and Jan {Polišenský} and Kamil {Jeřábek} and Ondřej {Ryšavý}",
title="Unmasking the Phishermen: Phishing Domain Detection with Machine Learning and Multi-Source Intelligence",
booktitle="Proceedings of IEEE/IFIP Network Operations and Management Symposium 2024",
year="2024",
pages="1--5",
publisher="Institute of Electrical and Electronics Engineers",
address="Seoul",
doi="10.1109/NOMS59830.2024.10575573",
isbn="979-8-3503-2794-6",
url="https://ieeexplore.ieee.org/document/10575573"
}