Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

Bibliographic Details
Main Authors: Brust, Clemens-Alexander; Barz, Björn; Denzler, Joachim
Format: Text
Language: unknown
Published: 2020 (arXiv: 2020-10-13)
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning
Online Access: http://arxiv.org/abs/2010.06469
Genre: Snow Bunting
Collection: ArXiv.org (Cornell University Library)
Institution: Open Polar
Record ID: ftarxivpreprints:oai:arXiv.org:2010.06469
Description: Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct label is replaced by an incorrect label from the same set. We consider an additional dimension of label noise: imprecision. For example, a non-breeding snow bunting is labeled as a bird. This label is correct, but not as precise as the task requires. Standard softmax classifiers cannot learn from such a weak label because they consider all classes mutually exclusive, which non-breeding snow bunting and bird are not. We propose CHILLAX (Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation), a method based on hierarchical classification, to fully utilize labels of any precision. Experiments on noisy variants of NABirds and ILSVRC2012 show that our method outperforms strong baselines by as much as 16.4 percentage points, and the current state of the art by up to 3.9 percentage points.
Comment: 9 pages pre-print. Accepted for publication at ICPR 2020
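The abstract's example, a non-breeding snow bunting labeled only as a bird, can be made concrete with a small sketch. It does not reproduce CHILLAX. Instead, it uses a simpler, swapped-in technique: a softmax over the leaf classes, where an imprecise label is scored by the total probability of all leaves that label subsumes. The toy hierarchy, the class names, and the function names (leaves_under, imprecise_label_loss) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy hierarchy (illustrative only, not NABirds or ILSVRC2012).
# Each internal node maps to its children; leaves are the classes the task requires.
HIERARCHY = {
    "bird": ["snow bunting", "song sparrow"],
    "animal": ["bird", "red fox"],
}
LEAVES = ["snow bunting", "song sparrow", "red fox"]


def leaves_under(node):
    """Return the set of leaf classes subsumed by `node` (a leaf subsumes itself)."""
    if node not in HIERARCHY:
        return {node}
    result = set()
    for child in HIERARCHY[node]:
        result |= leaves_under(child)
    return result


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imprecise_label_loss(logits, label):
    """Negative log of the total leaf probability mass under `label`.

    For a leaf label this equals ordinary cross-entropy. For an internal node
    such as "bird", any prediction consistent with the label is rewarded,
    instead of "bird" being treated as a competing, mutually exclusive class.
    """
    probs = softmax(logits)
    allowed = leaves_under(label)
    mass = sum(p for leaf, p in zip(LEAVES, probs) if leaf in allowed)
    return -math.log(mass)


logits = [2.0, 1.0, 0.5]  # model scores for LEAVES, in order
print(imprecise_label_loss(logits, "snow bunting"))  # precise label
print(imprecise_label_loss(logits, "bird"))          # imprecise but correct label
print(imprecise_label_loss(logits, "animal"))        # covers every leaf: close to 0
```

A precise label yields ordinary cross-entropy, while "bird" yields a smaller loss because both bird leaves count toward it; "animal" covers every leaf in this toy hierarchy and therefore contributes almost nothing. A plain softmax with "bird" added as one more output class would instead penalize the model for predicting "snow bunting", which is the mutual-exclusivity problem the abstract describes.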