Abstract
Theory of Mind (ToM) is the ability to reason about one's own and
others' mental states. ToM plays a critical role in the development
of intelligence, language understanding, and cognitive processes.
While previous work has primarily focused on first- and second-order
ToM, we explore higher-order ToM, which involves recursive reasoning
on individuals' mental states in complex scenarios.
We introduce Hi-ToM, a Higher Order Theory of Mind benchmark.
Our experimental evaluation using
various Large Language Models (LLMs) indicates a decline in
performance on higher-order ToM tasks, demonstrating the limitations
of current LLMs. We conduct a thorough analysis of different failure
cases of LLMs and share our thoughts on the implications of our
findings for the future of NLP.