Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models

Bibliographic Details
Main Authors: Patel, Nisarg, Kulkarni, Mohith, Parmar, Mihir, Budhiraja, Aashna, Nakamura, Mutsumi, Varshney, Neeraj, Baral, Chitta
Format: Report
Language: English
Published: arXiv 2024
Online Access: https://dx.doi.org/10.48550/arxiv.2406.17169
https://arxiv.org/abs/2406.17169
Description
Summary: As Large Language Models (LLMs) continue to exhibit remarkable performance in natural language understanding tasks, there is a crucial need to measure their ability for human-like multi-step logical reasoning. Existing logical reasoning evaluation benchmarks often focus primarily on simplistic single-step or multi-step reasoning with a limited set of inference rules. Furthermore, the lack of datasets for evaluating non-monotonic reasoning represents a crucial gap since it aligns more closely with human-like reasoning. To address these limitations, we propose Multi-LogiEval, a comprehensive evaluation dataset encompassing multi-step logical reasoning with various inference rules and depths. Multi-LogiEval covers three logic types (propositional, first-order, and non-monotonic), consisting of more than 30 inference rules and more than 60 of their combinations with various depths. Leveraging this dataset, we conduct evaluations on a range of LLMs including GPT-4, ChatGPT, Gemini-Pro, Yi, Orca, and Mistral, ...
Item Description: 23 Pages
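For illustration only (this hypothetical example is not taken from the dataset itself), a depth-2 reasoning chain of the kind Multi-LogiEval targets might combine two propositional inference rules, such as hypothetical syllogism followed by modus tollens:
  Premises: p -> q, q -> r, not r
  Step 1 (Hypothetical Syllogism): from p -> q and q -> r, infer p -> r
  Step 2 (Modus Tollens): from p -> r and not r, infer not p
  Conclusion: not p
Deeper items in the benchmark would extend such chains to more steps and draw on a wider mix of rules across the three logic types.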