Dissertation Topic
Addressing Limitations of Large Language Models
Academic Year: 2024/2025
Supervisor: Gregor Michal, doc. Ing., Ph.D.
Programs:
Information Technology (DIT) - combined study
Information Technology (DIT-EN) - combined study
Large language models (LLMs) are powerful tools that can support a wide range of downstream tasks. They can be used, for example, in advanced conversational interfaces or in various tasks that involve retrieval, classification, generation, and more. Such tasks can be approached through zero-shot or few-shot in-context learning, or by fine-tuning the LLM on larger datasets (typically using parameter-efficient techniques to reduce memory and storage requirements; see the brief sketch at the end of this description).

Despite their unprecedented performance on many tasks, LLMs suffer from several significant limitations that currently hinder their safe and widespread use in many domains. These include a tendency to generate responses not supported by the training corpus or the input context (hallucination), difficulties in handling extremely long contexts (e.g., entire books), and a limited ability to utilize other data modalities such as vision, where state-of-the-art models generally struggle to recognize fine-grained concepts.

The goal of this research is to explore such limitations and, after selecting one or two of them to focus on, to propose new strategies to mitigate them. These strategies may include, for example:
• Shifting the generation mode closer to retrieval-style approaches and non-parametric language models;
• Augmenting models with self-correction mechanisms and self-evaluation pipelines;
• Efficiently supporting extended contexts;
• Fuller utilization of multimodality, especially in vision-language models, including explainability analysis and the design of new training mechanisms that support the recognition of fine-grained visual concepts;
• Introducing novel fine-tuning techniques;
• Improving and further utilizing the reasoning abilities of LLMs.

Relevant publications:
• Srba, I., Pecher, B., Tomlein, M., Moro, R., Stefancova, E., Simko, J. and Bielikova, M., 2022, July. Monant medical misinformation dataset: Mapping articles to fact-checked claims. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2949-2959). https://dl.acm.org/doi/10.1145/3477495.3531726
• Pikuliak, M., Srba, I., Moro, R., Hromadka, T., Smolen, T., Melisek, M., Vykopal, I., Simko, J., Podrouzek, J. and Bielikova, M., 2023. Multilingual Previously Fact-Checked Claim Retrieval. https://arxiv.org/abs/2305.07991

The application domain can be, for example, support for fact-checking and combating disinformation, where the factuality of LLM outputs is critical.

The research will be performed at the Kempelen Institute of Intelligent Technologies (KInIT) in Bratislava in cooperation with researchers from highly respected research units. The combined (external) form of study is expected, along with employment at KInIT.
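As a brief illustration of the parameter-efficient fine-tuning mentioned above, the following minimal sketch wraps a causal language model with LoRA adapters using the Hugging Face transformers and peft libraries. The base checkpoint ("gpt2") and the hyperparameter values are illustrative placeholders, not part of the topic specification.

# Minimal sketch: parameter-efficient fine-tuning via LoRA adapters.
# Assumes the Hugging Face `transformers` and `peft` libraries are installed;
# the base checkpoint and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "gpt2"  # placeholder checkpoint; any causal LM can be used
tokenizer = AutoTokenizer.from_pretrained(base_model)  # used to prepare training data
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA trains only small low-rank adapter matrices injected into attention
# layers; the original weights stay frozen, which reduces memory and storage.
lora_config = LoraConfig(
    r=8,               # rank of the low-rank update matrices
    lora_alpha=16,     # scaling factor applied to the updates
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Typically well under 1% of the parameters remain trainable.
model.print_trainable_parameters()

The wrapped model can then be trained with a standard training loop or trainer; only the adapter weights need to be stored per task, which is what makes such techniques attractive when fine-tuning large models.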