Publication Details
Convolutional Neural Networks and X-Vector Embedding for DCASE2018 Acoustic Scene Classification Challenge
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Černocký Jan, prof. Dr. Ing. (DCGM)
Audio scene classification, Convolutional neural networks, Deep learning, x-vectors, Regularized LDA
In this paper, the Brno University of Technology (BUT) team submissions for Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2018 challenge are described. An analysis of different methods on the leaderboard set is also provided. The proposed approach is a fusion of two different Convolutional Neural Network (CNN) topologies. The first one is the common two-dimensional CNN, which is mainly used in image classification. The second one is a one-dimensional CNN for extracting fixed-length audio segment embeddings, so-called x-vectors, which has also been used in speech processing, especially for speaker recognition. In addition to the different topologies, two types of features were tested: log mel-spectrogram and CQT features. Finally, the outputs of different systems are fused using simple output averaging in the best performing system. Our submissions ranked third among 24 teams in the ASC sub-task A (task 1a).
@inproceedings{BUT155111,
author="Hossein {Zeinali} and Lukáš {Burget} and Jan {Černocký}",
title="Convolutional Neural Networks and X-Vector Embedding for DCASE2018 Acoustic Scene Classification Challenge",
booktitle="Proceedings of DCASE 2018 Workshop",
year="2018",
pages="1--5",
publisher="Tampere University of Technology",
address="Surrey",
isbn="978-952-15-4262-6",
url="http://dcase.community/documents/workshop2018/proceedings/DCASE2018Workshop_Zeinali_149.pdf"
}