Publication Details
End-to-End Open Vocabulary Keyword Search
keyword search, spoken term detection
Recently, neural approaches to spoken content retrieval have become popular.
However, they tend to be restricted in their vocabulary or in their ability to
deal with imbalanced test settings. These restrictions limit their applicability
in keyword search, where the set of queries is not known beforehand, and where
the system should return not just whether an utterance contains a query but the
exact location of any such occurrences. In this work, we propose a model directly
optimized for keyword search. The model takes a query and an utterance as input
and returns, for each frame of the utterance, the probability that the query
occurred in that frame. Experiments show that the proposed model not
only outperforms similar end-to-end models on a task where the ratio of positive
and negative trials is artificially balanced, but it is also able to deal with
the far more challenging task of keyword search with its inherent imbalance.
Furthermore, using our system to rescore the outputs of an LVCSR-based keyword
search system leads to significant improvements on the latter.
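
The abstract describes the model output as a per-frame probability of the query having occurred. As a rough illustration of how such an output could be turned into located detections, the sketch below groups consecutive frames above a threshold into segments. The function name `detect_occurrences`, the threshold, the minimum-length filter, and the use of the peak probability as the segment score are all assumptions made for illustration; the abstract does not say how the authors derive locations or scores.

```python
from typing import List, Sequence, Tuple


def detect_occurrences(
    frame_probs: Sequence[float],
    threshold: float = 0.5,   # hypothetical decision threshold
    min_frames: int = 1,      # hypothetical minimum segment length
) -> List[Tuple[int, int, float]]:
    """Group consecutive frames with probability >= threshold into segments.

    Returns (start_frame, end_frame, score) tuples with inclusive frame
    indices; the score is the peak probability inside the segment.
    """
    hits: List[Tuple[int, int, float]] = []
    start = None  # index where the current above-threshold run began
    for i, p in enumerate(frame_probs):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at frame i - 1; keep it if it is long enough.
            if i - start >= min_frames:
                hits.append((start, i - 1, max(frame_probs[start:i])))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(frame_probs) - start >= min_frames:
        hits.append((start, len(frame_probs) - 1, max(frame_probs[start:])))
    return hits


# Made-up probabilities for one query over a six-frame utterance.
example = detect_occurrences([0.1, 0.2, 0.9, 0.8, 0.1, 0.7])
```

With the made-up values above, frames 2 to 3 and frame 5 are each reported as a separate candidate occurrence. An utterance with no frame above the threshold yields an empty list, which is the common case in an imbalanced keyword search setting.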
@inproceedings{BUT175847,
author="YUSUF, B. and GOK, A. and GUNDOGDU, B. and SARAÇLAR, M.",
title="End-to-End Open Vocabulary Keyword Search",
booktitle="Proceedings Interspeech 2021",
year="2021",
journal="Proceedings of Interspeech",
volume="2021",
number="8",
pages="4388--4392",
publisher="International Speech Communication Association",
address="Brno",
doi="10.21437/Interspeech.2021-1399",
issn="1990-9772",
url="https://www.isca-speech.org/archive/interspeech_2021/yusuf21_interspeech.html"
}