Result detail

DEEMO: De-identity Multimodal Emotion Recognition and Reasoning

LI, D.; XING, B.; LIU, X.; XIA, B.; WEN, B.; KÄLVIÄINEN, H. DEEMO: De-identity Multimodal Emotion Recognition and Reasoning. MM '25: Proceedings of the 33rd ACM International Conference on Multimedia. New York, NY, USA: ACM, 2025. p. 5707-5716. ISBN: 979-8-4007-2035-2.
Type
conference proceedings paper
Language
English
Authors
Li Deng
Xing Bohao
Liu Xin
Xia Baiqiang
Wen Bihan
Kälviäinen Heikki Antero, prof., Dr., UPGM (FIT)
Abstract

Emotion understanding is a critical yet challenging task. Most existing approaches rely heavily on identity-sensitive information, such as facial expressions and speech, which raises concerns about personal privacy. To address this, we introduce De-identity Multimodal Emotion Recognition and Reasoning (DEEMO), a novel task designed to enable emotion understanding using de-identified video and audio inputs. The DEEMO dataset consists of two subsets: DEEMO-NFBL, which includes rich annotations of Non-Facial Body Language (NFBL), and DEEMO-MER, an instruction dataset for Multimodal Emotion Recognition and Reasoning using identity-free cues. This design supports emotion understanding without compromising identity privacy. In addition, we propose DEEMO-LLaMA, a Multimodal Large Language Model (MLLM) that integrates de-identified audio, video, and textual information to enhance both emotion recognition and reasoning. Extensive experiments show that DEEMO-LLaMA achieves state-of-the-art performance on both tasks, outperforming existing MLLMs by a significant margin: 74.49% accuracy and 74.45% F1-score in de-identity emotion recognition, and 6.20 clue overlap and 7.66 label overlap in de-identity emotion reasoning. Our work contributes to ethical AI by advancing privacy-preserving emotion understanding and promoting responsible affective computing. The dataset and code will be available at https://github.com/Leedeng/DEEMO.
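
For orientation only, the snippet below is a minimal sketch of how the recognition metrics quoted above (accuracy and F1-score) are commonly computed. It assumes predictions and gold labels are available as plain label lists and uses weighted F1; the label names shown are hypothetical and the actual DEEMO evaluation protocol may differ.

  # Minimal metric sketch (accuracy, weighted F1); not the official DEEMO evaluation code.
  from sklearn.metrics import accuracy_score, f1_score

  # Hypothetical emotion labels, for illustration only.
  gold = ["happy", "sad", "neutral", "angry", "neutral"]
  pred = ["happy", "neutral", "neutral", "angry", "sad"]

  acc = accuracy_score(gold, pred)                # fraction of exact label matches
  f1 = f1_score(gold, pred, average="weighted")   # class-frequency-weighted F1

  print(f"Accuracy: {acc:.2%}, weighted F1: {f1:.2%}")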

Keywords

Affective Computing, Emotion Understanding, Identity-free, Multi-modal Large Language Model

Year
2025
Pages
5707–5716
Proceedings
MM '25: Proceedings of the 33rd ACM International Conference on Multimedia
ISBN
979-8-4007-2035-2
Publisher
ACM
Place
New York, NY, USA
DOI
10.1145/3746027.3755411
BibTeX
@inproceedings{BUT199660,
  author="Deng {Li} and Bohao {Xing} and Xin {Liu} and Baiqiang {Xia} and Bihan {Wen} and Heikki Antero {Kälviäinen}",
  title="DEEMO: De-identity Multimodal Emotion Recognition and Reasoning",
  booktitle="MM '25: Proceedings of the 33rd ACM International Conference on Multimedia",
  year="2025",
  pages="5707--5716",
  publisher="ACM",
  address="New York, NY, USA",
  doi="10.1145/3746027.3755411",
  isbn="979-8-4007-2035-2"
}
Department