Enhancing Data-to-Text Systems with Neural-Symbolic Methods: An Exploration of Large Language Models as Text Scorers
Summary
Data-to-text generation converts structured data into natural language text, simplifying the interpretation of complex data and reducing manual effort. Traditional rule-based and neural approaches offer distinct strengths and weaknesses: rule-based systems ensure data fidelity but often produce rigid text, while neural models generate more natural text but risk deviating from the source data. To address these limitations, a neural-symbolic data-to-text conversational system was proposed, consisting of an information retrieval system, a generative grammar, and a text scorer. This study explores the use of large language models (LLMs) as text scorers, focusing on how well their scores of grammar-generated text align with human judgments. A benchmark dataset was created to capture human preferences, and several LLMs were evaluated with the sentence-scoring method to obtain model judgments. Experiments revealed that all LLMs struggled with the "likelihood trap", favoring bland responses over informative ones. Prompting, even with prompts containing shuffled data, mitigated this issue, suggesting that the prompt's role is less about conveying accurate information and more about counteracting word-frequency effects on sentence scores. Furthermore, increasing model scale did not consistently improve performance, suggesting that larger models primarily enhance competencies that are not critical for the text-scoring task. FLAN-T5 outperformed the other tested models, with its 783M-parameter variant achieving near-human performance. Finally, integrating a basic generative grammar with the LLM text scorer demonstrated the effectiveness of the neural-symbolic approach: the LLM's extensive linguistic knowledge allows the grammar design to be simplified, while the grammar ensures that the generated text represents the data accurately.
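To make the "likelihood trap" concrete, the following toy sketch scores sentences by length-normalized log-likelihood, the general idea behind sentence-scoring with a language model. It uses an invented unigram model (all words and probabilities are illustrative assumptions, not data from this study) to show why likelihood-based scoring can prefer a bland sentence built from frequent words over an informative one built from rarer, data-specific words.

```python
import math

# Toy unigram language model: word -> probability.
# All values are invented for illustration; a real text scorer
# would use token log-probabilities from an LLM instead.
UNIGRAM = {
    "the": 0.05, "thank": 0.002, "you": 0.02, "team": 0.001,
    "won": 0.0005, "by": 0.004, "twelve": 0.0001, "points": 0.0003,
}
OOV_PROB = 1e-6  # probability assigned to out-of-vocabulary words

def sentence_score(sentence: str) -> float:
    """Length-normalized log-likelihood under the toy unigram model."""
    words = sentence.lower().split()
    log_prob = sum(math.log(UNIGRAM.get(w, OOV_PROB)) for w in words)
    return log_prob / len(words)

# A bland sentence of frequent words outscores an informative,
# data-bearing one -- the likelihood trap in miniature.
bland = "thank you"
informative = "the team won by twelve points"
print(sentence_score(bland) > sentence_score(informative))  # True
```

In this miniature setting, prompting corresponds to conditioning the probabilities on context, which dampens the raw word-frequency effect that makes the bland sentence win here.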