Dissertation Topic
Improving natural language processing
Academic Year: 2024/2025
Supervisor: Šimko Marián, doc. Ing., Ph.D.
Department: Department of Computer Graphics and Multimedia
Programs:
Information Technology (DIT) - combined study
Information Technology (DIT-EN) - combined study
The recent development of large language models (LLMs) shows the potential of deep learning and artificial neural networks for many natural language processing (NLP) tasks. Advances in automating these tasks have a significant impact on a wide range of innovative applications that affect everyday life.
Although large language models have been successfully used to solve a large number of tasks, several research challenges remain. These may be related to individual natural language processing tasks, to application domains, or to the languages themselves. In addition, new challenges arise from the nature of large language models and from the so-called black-box character of neural network-based models.
Further research and exploration of the related phenomena are needed, with special attention to the problem of trustworthiness in NLP and to new learning paradigms that address the low availability of resources needed for training (low-resource NLP).
Interesting research challenges that can be addressed within the topic include:
- Large language models and their properties (e.g., hallucination understanding)
- Trustworthy NLP (e.g., bias mitigation, explainability of models)
- Adapting large language models to a specific context and task (e.g., via PEFT, RAG; see the illustrative sketch after this list)
- Advanced learning techniques (e.g., transfer learning, multilingual learning)
- Domain-specific information extraction and text classification (e.g., novel methods for sentiment analysis, improving conversation quality in chatbots)
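To make the PEFT item above more concrete, the following is a minimal sketch of adapting a pretrained language model with LoRA adapters, assuming the Hugging Face transformers and peft libraries; the model checkpoint ("gerulata/slovakbert") and all hyperparameters are illustrative assumptions, not part of the topic specification.

    # Minimal sketch of parameter-efficient fine-tuning (PEFT) with LoRA,
    # assuming the Hugging Face `transformers` and `peft` libraries.
    # The model identifier and hyperparameters are illustrative only.
    from transformers import AutoModelForSequenceClassification
    from peft import LoraConfig, TaskType, get_peft_model

    # Load a pretrained encoder for a downstream classification task
    # (e.g., sentiment analysis); "gerulata/slovakbert" is an assumed checkpoint.
    base_model = AutoModelForSequenceClassification.from_pretrained(
        "gerulata/slovakbert", num_labels=2
    )

    # Wrap the model with low-rank adapters; only the adapter weights
    # (and the classification head) are updated during training.
    lora_config = LoraConfig(
        task_type=TaskType.SEQ_CLS,
        r=8,               # rank of the low-rank update matrices
        lora_alpha=16,     # scaling factor for the adapter output
        lora_dropout=0.1,  # dropout applied inside the adapter
    )
    model = get_peft_model(base_model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of all parameters

Such adapter-based approaches are one of several possible directions within this topic; retrieval-augmented generation (RAG) and other adaptation strategies are equally relevant.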
Relevant publications:
- Pikuliak, M., et al. SlovakBERT: Slovak Masked Language Model. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 7156–7168, ACL, 2022. http://dx.doi.org/10.18653/v1/2022.findings-emnlp.530
- Pikuliak, M., Šimko, M. Average Is Not Enough: Caveats of Multilingual Evaluation. In Proceedings of the 2nd Workshop on Multi-lingual Representation Learning (MRL), pages 125–133, ACL, 2022. http://dx.doi.org/10.18653/v1/2022.mrl-1.13
The research will be performed at the Kempelen Institute of Intelligent Technologies (KInIT, https://kinit.sk) in Bratislava, in cooperation with industrial partners or with researchers from highly respected research institutions abroad. A combined (external) form of study and full-time employment at KInIT are expected.