Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension

The conversational machine reading comprehension (CMRC) task aims to answer questions in conversations and has been a hot research topic in recent years because of its wide applications. However, existing CMRC benchmarks, in which each conversation is assigned a static passage, are inconsistent wit...


Bibliographic Details
Main Authors: Chen, Nuo, Li, Hongguang, He, Junqing, Bao, Yinan, Lin, Xinshi, Yang, Qi, Liu, Jianfeng, Gan, Ruyi, Zhang, Jiaxing, Wang, Baoyuan, Li, Jia
Format: Text
Language: unknown
Published: 2023
Subjects: Computer Science - Computation and Language; Computer Science - Artificial Intelligence
Online Access: http://arxiv.org/abs/2302.13619
id ftarxivpreprints:oai:arXiv.org:2302.13619
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:2302.13619 2023-11-12T04:24:06+01:00 Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension Chen, Nuo Li, Hongguang He, Junqing Bao, Yinan Lin, Xinshi Yang, Qi Liu, Jianfeng Gan, Ruyi Zhang, Jiaxing Wang, Baoyuan Li, Jia 2023-02-27 http://arxiv.org/abs/2302.13619 unknown http://arxiv.org/abs/2302.13619 EMNLP 2023 Computer Science - Computation and Language Computer Science - Artificial Intelligence text 2023 ftarxivpreprints 2023-10-22T01:06:24Z The conversational machine reading comprehension (CMRC) task aims to answer questions in conversations and has been a hot research topic in recent years because of its wide applications. However, existing CMRC benchmarks, in which each conversation is assigned a static passage, are inconsistent with real scenarios. Thus, a model's comprehension ability in real scenarios is hard to evaluate reasonably. To this end, we propose Orca, the first Chinese CMRC benchmark, and further provide zero-shot/few-shot settings to evaluate a model's generalization ability across diverse domains. We collect 831 hot-topic-driven conversations with 4,742 turns in total. Each turn of a conversation is assigned a response-related passage, aiming to evaluate a model's comprehension ability more reasonably. The topics of the conversations are collected from a social media platform and cover 33 domains, in an effort to be consistent with real scenarios. Importantly, the answers in Orca are all well-annotated natural responses rather than the specific spans or short phrases found in previous datasets. In addition, we implement three strong baselines to tackle the challenge in Orca. The results indicate that our CMRC benchmark poses a great challenge. Our dataset and checkpoints are available at https://github.com/nuochenpku/Orca. Comment: 14 pages Text Orca ArXiv.org (Cornell University Library)
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Computer Science - Computation and Language
Computer Science - Artificial Intelligence
spellingShingle Computer Science - Computation and Language
Computer Science - Artificial Intelligence
Chen, Nuo
Li, Hongguang
He, Junqing
Bao, Yinan
Lin, Xinshi
Yang, Qi
Liu, Jianfeng
Gan, Ruyi
Zhang, Jiaxing
Wang, Baoyuan
Li, Jia
Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension
topic_facet Computer Science - Computation and Language
Computer Science - Artificial Intelligence
description The conversational machine reading comprehension (CMRC) task aims to answer questions in conversations and has been a hot research topic in recent years because of its wide applications. However, existing CMRC benchmarks, in which each conversation is assigned a static passage, are inconsistent with real scenarios. Thus, a model's comprehension ability in real scenarios is hard to evaluate reasonably. To this end, we propose Orca, the first Chinese CMRC benchmark, and further provide zero-shot/few-shot settings to evaluate a model's generalization ability across diverse domains. We collect 831 hot-topic-driven conversations with 4,742 turns in total. Each turn of a conversation is assigned a response-related passage, aiming to evaluate a model's comprehension ability more reasonably. The topics of the conversations are collected from a social media platform and cover 33 domains, in an effort to be consistent with real scenarios. Importantly, the answers in Orca are all well-annotated natural responses rather than the specific spans or short phrases found in previous datasets. In addition, we implement three strong baselines to tackle the challenge in Orca. The results indicate that our CMRC benchmark poses a great challenge. Our dataset and checkpoints are available at https://github.com/nuochenpku/Orca. Comment: 14 pages
format Text
author Chen, Nuo
Li, Hongguang
He, Junqing
Bao, Yinan
Lin, Xinshi
Yang, Qi
Liu, Jianfeng
Gan, Ruyi
Zhang, Jiaxing
Wang, Baoyuan
Li, Jia
author_facet Chen, Nuo
Li, Hongguang
He, Junqing
Bao, Yinan
Lin, Xinshi
Yang, Qi
Liu, Jianfeng
Gan, Ruyi
Zhang, Jiaxing
Wang, Baoyuan
Li, Jia
author_sort Chen, Nuo
title Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension
title_short Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension
title_full Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension
title_fullStr Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension
title_full_unstemmed Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension
title_sort orca: a few-shot benchmark for chinese conversational machine reading comprehension
publishDate 2023
url http://arxiv.org/abs/2302.13619
genre Orca
genre_facet Orca
op_relation http://arxiv.org/abs/2302.13619
EMNLP 2023
_version_ 1782338674759827456