Hi-ToM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

Paper
Code
Hi-ToM dataset
BibTeX Citation

Abstract

Theory of Mind (ToM) is the ability to reason about one's own and others' mental states. ToM plays a critical role in the development of intelligence, language understanding, and cognitive processes. While previous work has primarily focused on first- and second-order ToM, we explore higher-order ToM, which involves recursive reasoning about individuals' mental states in complex scenarios.
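
To make the notion of "order" concrete, the following minimal sketch builds a nested belief question whose ToM order equals the number of agents in the chain. The function name, the agent names, and the question wording are hypothetical illustrations and are not taken from the Hi-ToM dataset or code.

```python
def nested_belief_question(agents, obj):
    """Return a belief question whose ToM order is len(agents).

    Hypothetical helper for illustration only; the wording is an
    assumption, not the phrasing used in the Hi-ToM dataset.
    """
    if not agents:
        raise ValueError("at least one agent is required")
    head, *rest = agents
    # Each additional agent adds one level of recursion:
    # "Anne thinks" -> "Anne thinks Bob thinks" -> ...
    inner = "".join(f"{name} thinks " for name in rest)
    return f"Where does {head} think {inner}the {obj} is?"


if __name__ == "__main__":
    names = ["Anne", "Bob", "Carl", "Dana"]
    for order in range(1, len(names) + 1):
        print(order, nested_belief_question(names[:order], "ball"))
```

Under these assumptions, a first-order question asks about one agent's belief, a second-order question asks what one agent thinks another agent thinks, and each higher order adds one more level of nesting.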

We introduce Hi-ToM, a Higher-Order Theory of Mind benchmark. Our experimental evaluation using various Large Language Models (LLMs) indicates a decline in performance on higher-order ToM tasks, demonstrating the limitations of current LLMs. We conduct a thorough analysis of different failure cases of LLMs, and share our thoughts on the implications of our findings for the future of NLP.

Yinghui He
Yufan Wu
Yilin Jia
Rada Mihalcea
Yulong Chen
Naihao Deng
