Cross-functional Analysis of Generalisation in Behavioural Learning

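In behavioural testing, system functionalities underrepresented in the standard evaluation setting (with a held-out test set) are validated through controlled input-output pairs. Optimising performance on the behavioural tests during training (behavioural learning) would improve coverage of phenomena not sufficiently represented in the i.i.d. data and could lead to seemingly more robust models. However, there is the risk that the model narrowly captures spurious correlations from the behavioural test suite, leading to overestimation and misrepresentation of model performance -- one of the original pitfalls of traditional evaluation. In this work, we introduce BeLUGA, an analysis method for evaluating behavioural learning considering generalisation across dimensions of different granularity levels. We optimise behaviour-specific loss functions and evaluate models on several partitions of the behavioural test suite controlled to leave out specific phenomena. An aggregate score measures generalisation to unseen functionalities (or overfitting). We use BeLUGA to examine three representative NLP tasks (sentiment analysis, paraphrase identification and reading comprehension) and compare the impact of a diverse set of regularisation and domain generalisation methods on generalisation performance.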
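The abstract describes evaluating models on partitions of the behavioural test suite that leave out specific phenomena and summarising the results in an aggregate score. The Python sketch below illustrates only the simplest such partition, holding out one functionality at a time, and is not the authors' implementation. The `BehaviouralCase` type, the `fit` callback, the toy memorising model and the plain mean used as the aggregate are all hypothetical assumptions; BeLUGA's actual splits, behaviour-specific losses and score are defined in the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple


# Hypothetical record type: one behavioural test case tagged with the
# functionality (phenomenon) it probes. Not a type taken from the paper.
@dataclass(frozen=True)
class BehaviouralCase:
    text: str
    label: int
    functionality: str


Predictor = Callable[[str], int]
Fitter = Callable[[List[BehaviouralCase]], Predictor]


def leave_one_functionality_out(
    suite: List[BehaviouralCase],
) -> Dict[str, Tuple[List[BehaviouralCase], List[BehaviouralCase]]]:
    """Build one split per functionality: train on the others, test on it."""
    splits = {}
    for held_out in sorted({c.functionality for c in suite}):
        train = [c for c in suite if c.functionality != held_out]
        test = [c for c in suite if c.functionality == held_out]
        splits[held_out] = (train, test)
    return splits


def pass_rate(predict: Predictor, cases: List[BehaviouralCase]) -> float:
    """Fraction of cases whose predicted label matches the expected label."""
    return mean(1.0 if predict(c.text) == c.label else 0.0 for c in cases)


def unseen_functionality_score(
    suite: List[BehaviouralCase], fit: Fitter
) -> Tuple[float, Dict[str, float]]:
    """Fit once per split and average pass rates on the held-out functionality.

    The plain mean is an illustrative aggregate, not BeLUGA's own score.
    """
    per_functionality = {}
    for held_out, (train, test) in leave_one_functionality_out(suite).items():
        predict = fit(train)  # a fresh model for every split
        per_functionality[held_out] = pass_rate(predict, test)
    return mean(per_functionality.values()), per_functionality


# Toy "model" (hypothetical): memorises training inputs and otherwise predicts 0,
# the kind of narrow fit that can overstate performance on seen test cases.
def fit_memoriser(train: List[BehaviouralCase]) -> Predictor:
    lookup = {c.text: c.label for c in train}
    return lambda text: lookup.get(text, 0)


suite = [
    BehaviouralCase("good", 1, "vocabulary"),
    BehaviouralCase("bad", 0, "vocabulary"),
    BehaviouralCase("not good", 0, "negation"),
    BehaviouralCase("not bad", 1, "negation"),
]

score, per_functionality = unseen_functionality_score(suite, fit_memoriser)
print(score, per_functionality)
```

In this toy suite the memoriser passes every case it was trained on, yet scores 0.5 on each held-out functionality, which is the gap between seen and unseen phenomena that a leave-out partition is meant to expose.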

Bibliographic Details
Published in:	Transactions of the Association for Computational Linguistics
Main Authors:	de Araujo, Pedro Henrique Luz; Roth, Benjamin
Format:	Text
Language:	unknown
Published:	2023
Subjects:	Computer Science - Computation and Language; Computer Science - Machine Learning; Beluga; Beluga*
Online Access:	http://arxiv.org/abs/2305.12951 https://doi.org/10.1162/tacl_a_00590

id	ftarxivpreprints:oai:arXiv.org:2305.12951
record_format	openpolar
spelling	ftarxivpreprints:oai:arXiv.org:2305.12951 2023-10-01T03:55:03+02:00 Cross-functional Analysis of Generalisation in Behavioural Learning de Araujo, Pedro Henrique Luz Roth, Benjamin 2023-05-22 http://arxiv.org/abs/2305.12951 https://doi.org/10.1162/tacl_a_00590 unknown http://arxiv.org/abs/2305.12951 doi:10.1162/tacl_a_00590 Computer Science - Computation and Language Computer Science - Machine Learning text 2023 ftarxivpreprints https://doi.org/10.1162/tacl_a_00590 2023-09-03T01:06:06Z In behavioural testing, system functionalities underrepresented in the standard evaluation setting (with a held-out test set) are validated through controlled input-output pairs. Optimising performance on the behavioural tests during training (behavioural learning) would improve coverage of phenomena not sufficiently represented in the i.i.d. data and could lead to seemingly more robust models. However, there is the risk that the model narrowly captures spurious correlations from the behavioural test suite, leading to overestimation and misrepresentation of model performance -- one of the original pitfalls of traditional evaluation. In this work, we introduce BeLUGA, an analysis method for evaluating behavioural learning considering generalisation across dimensions of different granularity levels. We optimise behaviour-specific loss functions and evaluate models on several partitions of the behavioural test suite controlled to leave out specific phenomena. An aggregate score measures generalisation to unseen functionalities (or overfitting). We use BeLUGA to examine three representative NLP tasks (sentiment analysis, paraphrase identification and reading comprehension) and compare the impact of a diverse set of regularisation and domain generalisation methods on generalisation performance. Comment: 16 pages, 1 figure. To be published in the Transactions of the Association for Computational Linguistics (TACL). This preprint is a pre-MIT Press publication version Text Beluga Beluga* ArXiv.org (Cornell University Library) Transactions of the Association for Computational Linguistics 11 1066 1081
institution	Open Polar
collection	ArXiv.org (Cornell University Library)
op_collection_id	ftarxivpreprints
language	unknown
topic	Computer Science - Computation and Language; Computer Science - Machine Learning
description	In behavioural testing, system functionalities underrepresented in the standard evaluation setting (with a held-out test set) are validated through controlled input-output pairs. Optimising performance on the behavioural tests during training (behavioural learning) would improve coverage of phenomena not sufficiently represented in the i.i.d. data and could lead to seemingly more robust models. However, there is the risk that the model narrowly captures spurious correlations from the behavioural test suite, leading to overestimation and misrepresentation of model performance -- one of the original pitfalls of traditional evaluation. In this work, we introduce BeLUGA, an analysis method for evaluating behavioural learning considering generalisation across dimensions of different granularity levels. We optimise behaviour-specific loss functions and evaluate models on several partitions of the behavioural test suite controlled to leave out specific phenomena. An aggregate score measures generalisation to unseen functionalities (or overfitting). We use BeLUGA to examine three representative NLP tasks (sentiment analysis, paraphrase identification and reading comprehension) and compare the impact of a diverse set of regularisation and domain generalisation methods on generalisation performance. Comment: 16 pages, 1 figure. To be published in the Transactions of the Association for Computational Linguistics (TACL). This preprint is a pre-MIT Press publication version
format	Text
author	de Araujo, Pedro Henrique Luz; Roth, Benjamin
author_sort	de Araujo, Pedro Henrique Luz
title	Cross-functional Analysis of Generalisation in Behavioural Learning
publishDate	2023
url	http://arxiv.org/abs/2305.12951 https://doi.org/10.1162/tacl_a_00590
genre	Beluga; Beluga*
op_relation	http://arxiv.org/abs/2305.12951 doi:10.1162/tacl_a_00590
op_doi	https://doi.org/10.1162/tacl_a_00590
container_title	Transactions of the Association for Computational Linguistics
container_volume	11
container_start_page	1066
op_container_end_page	1081

Similar Items