Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct...


Bibliographic Details
Main Authors:	Brust, Clemens-Alexander; Barz, Björn; Denzler, Joachim
Format:	Text
Language:	unknown
Published:	2020
Subjects:	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning; Snow Bunting
Online Access:	http://arxiv.org/abs/2010.06469

id	ftarxivpreprints:oai:arXiv.org:2010.06469
record_format	openpolar
spelling	ftarxivpreprints:oai:arXiv.org:2010.06469 2023-09-05T13:23:08+02:00 2020-10-13 2023-08-16T16:08:15Z
institution	Open Polar
collection	ArXiv.org (Cornell University Library)
op_collection_id	ftarxivpreprints
language	unknown
topic	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning
description	Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct label is replaced by an incorrect label from the same set. We consider an additional dimension of label noise: imprecision. For example, a non-breeding snow bunting is labeled as a bird. This label is correct, but not as precise as the task requires. Standard softmax classifiers cannot learn from such a weak label because they consider all classes mutually exclusive, which non-breeding snow bunting and bird are not. We propose CHILLAX (Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation), a method based on hierarchical classification, to fully utilize labels of any precision. Experiments on noisy variants of NABirds and ILSVRC2012 show that our method outperforms strong baselines by as much as 16.4 percentage points, and the current state of the art by up to 3.9 percentage points. Comment: 9 pages pre-print. Accepted for publication at ICPR 2020
format	Text
author	Brust, Clemens-Alexander; Barz, Björn; Denzler, Joachim
author_sort	Brust, Clemens-Alexander
title	Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge
publishDate	2020
url	http://arxiv.org/abs/2010.06469
genre	Snow Bunting
op_relation	http://arxiv.org/abs/2010.06469
_version_	1776203716906450944
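
The description above notes that a label such as "bird" is correct but imprecise, and that a softmax over mutually exclusive classes cannot use it. The following is a minimal sketch of one way to encode such a label against a class hierarchy. It is not the authors' CHILLAX implementation; the toy hierarchy, the class names other than those quoted in the description, and the helper functions are all assumptions made for illustration. Each label becomes a multi-label target over its ancestors, and the classes below the given label are masked out because the label says nothing about them.

```python
# Hypothetical toy hierarchy (child -> parent); illustrative only,
# not taken from NABirds or from the paper.
PARENT = {
    "bird": None,
    "bunting": "bird",
    "snow bunting": "bunting",
    "non-breeding snow bunting": "snow bunting",
    "sparrow": "bird",
}


def ancestors(label):
    """Return the label followed by all of its ancestors up to the root."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]
    return chain


def descendants(label):
    """Return every class strictly below the given label."""
    return {c for c in PARENT if c != label and label in ancestors(c)}


def encode(label):
    """Encode a label of any precision as (target, mask) over all classes.

    target[c] is 1 if c is the label or one of its ancestors, else 0.
    mask[c] is 0 for descendants of the label (unknown, so excluded
    from a per-class loss) and 1 for every other class.
    """
    positive = set(ancestors(label))
    unknown = descendants(label)
    target = {c: int(c in positive) for c in PARENT}
    mask = {c: int(c not in unknown) for c in PARENT}
    return target, mask


if __name__ == "__main__":
    for name in ("bird", "snow bunting"):
        target, mask = encode(name)
        print(name, target, mask)
```

Under these assumptions, the imprecise label "bird" still supervises the "bird" node while leaving all finer classes undecided, whereas a more precise label such as "snow bunting" additionally marks "sparrow" as a known negative.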
