Text and Table Numerical Question-answering Model Based on Multi-granularity Cell Contrast
Author:
Affiliation:

CLC Number: TP18

Fund Project:

    Abstract:

    In numerical question answering over text and tables, a model must perform numerical reasoning over the given text and tables. The goal is to generate a computational program consisting of multi-step numerical calculations, whose result serves as the answer to the question. To model the text and tables, existing work linearizes each table into a series of cell sentences through templates and then designs a generator that produces the computational program from the text and cell sentences. However, this approach faces a specific problem: templated cell sentences differ only minimally from one another, which makes it difficult for the generator to distinguish cell sentences that are essential for answering the question (supporting cell sentences) from those irrelevant to it (distracting cell sentences). As a result, the model generates incorrect computational programs based on distracting cell sentences. To tackle this issue, this study proposes an approach called multi-granularity cell semantic contrast (MGCC) for the generator. Its main purpose is to enlarge the representation distance between supporting and distracting cell sentences, thereby helping the generator differentiate between them. Specifically, the contrast mechanism is composed of coarse-grained contrasts over whole-cell semantics and fine-grained contrasts over a cell's constituent elements, namely row names, column names, and cell values. Experimental results show that MGCC enables the generator to outperform the benchmark model on the FinQA and MultiHiertt numerical reasoning datasets. On FinQA, it improves answer accuracy by up to 3.38%. Notably, on the more challenging hierarchical table dataset MultiHiertt, it yields a 7.8% increase in the accuracy of the generator. Compared with GPT-3 combined with chain of thought (CoT), MGCC achieves improvements of 5.44% and 1.69% on the FinQA and MultiHiertt datasets, respectively. Subsequent analytical experiments further verify that multi-granularity cell semantic contrast helps the model better differentiate between supporting and distracting cell sentences.
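
The two ideas in the abstract, template-based linearization and contrast between supporting and distracting cell sentences, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the template wording, the function names `linearize`, `cosine`, and `contrastive_loss`, the toy vectors, and the InfoNCE-style loss, which is a generic contrastive objective swapped in here and not necessarily the loss used by MGCC. The abstract does not specify any of these details.

```python
import math

def linearize(table):
    """Turn every table cell into a templated 'cell sentence'.

    The template wording is a hypothetical example; note how the
    resulting sentences differ only in row name, column name, and value.
    """
    sentences = []
    for row in table["rows"]:
        for col, value in zip(table["columns"], row["values"]):
            sentences.append(f"the {row['name']} of {col} is {value}")
    return sentences

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(query, supporting, distracting, temperature=0.1):
    """Generic InfoNCE-style loss (an assumed stand-in, not the paper's).

    The loss is small when supporting vectors are closer to the query
    than distracting vectors, and large in the opposite case.
    """
    pos = [math.exp(cosine(query, s) / temperature) for s in supporting]
    neg = [math.exp(cosine(query, d) / temperature) for d in distracting]
    denom = sum(pos) + sum(neg)
    return -sum(math.log(p / denom) for p in pos) / len(pos)

# Toy table: two rows and two columns give four near-identical sentences.
table = {
    "columns": ["2019", "2020"],
    "rows": [
        {"name": "revenue", "values": [100, 120]},
        {"name": "net income", "values": [10, 15]},
    ],
}
cell_sentences = linearize(table)

# Toy 2-d "representations": the loss drops once the supporting cell
# is pulled toward the question and the distracting cell pushed away.
question = [1.0, 0.0]
well_separated = contrastive_loss(question, [[1.0, 0.0]], [[0.0, 1.0]])
poorly_separated = contrastive_loss(question, [[0.0, 1.0]], [[1.0, 0.0]])
```

In this sketch the same loss could in principle be applied at several granularities, once to representations of whole cell sentences and once each to representations of row names, column names, and cell values, which mirrors the coarse-grained and fine-grained contrasts the abstract describes.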

Get Citation

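Ju Jiangzhou, Mao Yunlin, Wu Zhen, Chen Yufei, Dai Xinyu, Chen Jiajun. Text and Table Numerical Question-answering Model Based on Multi-granularity Cell Contrast. Journal of Software,,():1-22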

History
  • Received: December 21, 2023
  • Revised: March 01, 2024
  • Adopted:
  • Online: June 20, 2024
  • Published:
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn