Cross-functional Analysis of Generalisation in Behavioural Learning ...

In behavioural testing, system functionalities underrepresented in the standard evaluation setting (with a held-out test set) are validated through controlled input-output pairs. Optimising performance on the behavioural tests during training (behavioural learning) would improve coverage of phenomen...

Full description

Bibliographic Details
Main Authors: de Araujo, Pedro Henrique Luz, Roth, Benjamin
Format: Text
Language:unknown
Published: arXiv 2023
Subjects:
Online Access:https://dx.doi.org/10.48550/arxiv.2305.12951
https://arxiv.org/abs/2305.12951
Description
Summary:In behavioural testing, system functionalities underrepresented in the standard evaluation setting (with a held-out test set) are validated through controlled input-output pairs. Optimising performance on the behavioural tests during training (behavioural learning) would improve coverage of phenomena not sufficiently represented in the i.i.d. data and could lead to seemingly more robust models. However, there is the risk that the model narrowly captures spurious correlations from the behavioural test suite, leading to overestimation and misrepresentation of model performance -- one of the original pitfalls of traditional evaluation. In this work, we introduce BeLUGA, an analysis method for evaluating behavioural learning considering generalisation across dimensions of different granularity levels. We optimise behaviour-specific loss functions and evaluate models on several partitions of the behavioural test suite controlled to leave out specific phenomena. An aggregate score measures generalisation to unseen ... : 16 pages, 1 figure. To be published in the Transactions of the Association for Computational Linguistics (TACL). This preprint is a pre-MIT Press publication version ...