Recent advances in table understanding have focused on instruction-tuning large language models (LLMs) for table-related tasks. However, existing research has overlooked the impact of hyperparameter choices and lacks a comprehensive evaluation of both the out-of-domain table understanding ability and the general capabilities of these table LLMs. In this paper, we evaluate these abilities in existing table LLMs and find significant declines in both out-of-domain table understanding and general capabilities compared to their base models.

Through systematic analysis, we show that hyperparameters, such as the learning rate, can significantly influence both table-specific and general capabilities. Contrary to previous work on table instruction tuning, we demonstrate that smaller learning rates and fewer training instances can enhance table understanding while preserving general capabilities. Based on our findings, we introduce TAMA, a TAble LLM instruction-tuned from LLaMA 3.1 8B Instruct, which achieves performance on par with or surpassing GPT-3.5 and GPT-4 on table tasks while maintaining strong out-of-domain generalization and general capabilities. Our findings highlight the potential for reduced data annotation costs and more efficient model development through careful hyperparameter selection.

In this work,

  • We examine existing table LLMs and reveal that they do not generalize to out-of-domain table tasks and show compromised general capabilities compared to their base models.
  • We reveal the impact of often-ignored hyperparameter choices, such as the learning rate and the number of training instances. We find that the commonly adopted learning rate can be too large, leading to suboptimal table understanding performance and compromised general capabilities. In addition, strong table understanding can be achieved with far less training data than used in existing works.
  • Based on our findings, with careful hyperparameter selection, we instruction-tune the LLaMA 3.1 8B Instruct model on 2,600 table instruction instances (see the training sketch after this list). As an 8B model, our resulting model, TAMA, achieves performance on par with, or even exceeding, GPT-3.5 on table understanding tasks, in some cases surpassing GPT-4, while retaining the general capabilities of its base model. Moreover, TAMA exhibits strong out-of-domain table understanding and general capabilities (Figure 1).
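
For illustration, below is a minimal fine-tuning sketch in the spirit of these findings, using Hugging Face TRL. The dataset file, the exact learning-rate value, and the batch settings are hypothetical placeholders, not the paper's verified configuration:

```python
# Hypothetical low-learning-rate instruction-tuning sketch (not the paper's exact recipe).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder file: ~2,600 examples, each with a "messages" list
# of {"role": ..., "content": ...} chat turns.
dataset = load_dataset("json", data_files="table_instructions.json", split="train")

config = SFTConfig(
    output_dir="tama-sft",
    learning_rate=1e-6,               # far below the ~2e-5 commonly used for SFT
    num_train_epochs=3,               # illustrative; tune on a held-out set
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # the base model named above
    args=config,
    train_dataset=dataset,
)
trainer.train()
```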

Resources

Figure 1: An overview of performance comparison between TAMA and existing table LLMs.

🔥 News

  • 2025.6 We release the model checkpoints! 🤗 (see the loading sketch below)
  • 2025.4 Our paper, Rethinking Table Instruction Tuning, has been accepted to ACL 2025 Findings! 🎉
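
A hedged loading sketch with the transformers library is shown below; the repository id is a placeholder, so substitute the actual checkpoint name from the release:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MichiganNLP/tama"  # hypothetical id; replace with the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Example table question in the chat format of the LLaMA 3.1 Instruct base model.
messages = [{"role": "user", "content": "Summarize the table:\n|name|score|\n|A|1|\n|B|2|"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```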

Reference

Please cite our paper if you use our code, data, models, or results:

@misc{deng2025rethinking,
  title={Rethinking Table Instruction Tuning},
  author={Naihao Deng and Rada Mihalcea},
  year={2025},
  url={https://openreview.net/forum?id=GLmqHCwbOJ}
}