Description
Summary: The Conversational Machine Reading Comprehension (CMRC) task aims to answer questions in conversations and has become a hot research topic because of its wide applications. However, existing CMRC benchmarks, in which each conversation is coupled with a single static passage, are inconsistent with real scenarios, making it hard to evaluate a model's comprehension ability in such settings. In this work, we propose Orca, the first Chinese CMRC benchmark, and further provide zero-shot/few-shot settings to evaluate a model's generalization ability across diverse domains. We collect 831 hot-topic-driven conversations with 4,742 turns in total. Each turn of a conversation is paired with a response-related passage, aiming to evaluate a model's comprehension ability more reasonably. The conversation topics are collected from a social media platform and cover 33 domains, so as to stay consistent with real scenarios. Importantly, the answers in Orca are all well-annotated natural responses rather than the specific spans or short phrases used in previous datasets. We implement two strong frameworks to tackle the challenges in Orca. The results indicate substantial room for improvement for strong baselines such as ChatGPT on our CMRC benchmark. Our code and datasets are available at: https://github.com/nuochenpku/Orca. © 2023 Association for Computational Linguistics.