Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct...


Bibliographic Details
Main Authors: Brust, Clemens-Alexander, Barz, Björn, Denzler, Joachim
Format: Article in Journal/Newspaper
Language: unknown
Published: arXiv 2020
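Subjects: Computer Vision and Pattern Recognition cs.CV; Machine Learning cs.LG; FOS Computer and information sciences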
Online Access: https://dx.doi.org/10.48550/arxiv.2010.06469
https://arxiv.org/abs/2010.06469
id ftdatacite:10.48550/arxiv.2010.06469
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
topic Computer Vision and Pattern Recognition cs.CV
Machine Learning cs.LG
FOS Computer and information sciences
description Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct label is replaced by an incorrect label from the same set. We consider an additional dimension of label noise: imprecision. For example, a non-breeding snow bunting is labeled as a bird. This label is correct, but not as precise as the task requires. Standard softmax classifiers cannot learn from such a weak label because they consider all classes mutually exclusive, which non-breeding snow bunting and bird are not. We propose CHILLAX (Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation), a method based on hierarchical classification, to fully utilize labels of any precision. Experiments on noisy variants of NABirds and ILSVRC2012 show that our method outperforms strong baselines by as much as 16.4 percentage points, and the current state of the art by up to 3.9 percentage points. 9 pages, pre-print. Accepted for publication at ICPR 2020.
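The description's point about mutually exclusive classes can be illustrated with a small sketch. This is not the authors' CHILLAX implementation; it only shows one plausible way a class hierarchy lets a label of any precision be turned into per-class training targets. The toy hierarchy, the names `PARENT`, `NODES`, `ancestors` and `encode_label`, and the 1 / 0 / None encoding are all assumptions made for illustration.

```python
# Hypothetical toy hierarchy (child -> parent); class names are illustrative only.
PARENT = {
    "bird": "animal",
    "mammal": "animal",
    "snow bunting": "bird",
    "house sparrow": "bird",
    "red fox": "mammal",
}
NODES = ["animal"] + list(PARENT)


def ancestors(node):
    """Return all strict ancestors of `node`, nearest first."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result


def encode_label(label):
    """Map a label of any precision to per-node targets.

    1    = the class is known to apply (the label itself and its ancestors)
    None = unknown (descendants of the label); such nodes would be
           excluded from the loss in this sketch
    0    = the class is known not to apply (everything else)
    """
    positive = {label, *ancestors(label)}
    targets = {}
    for node in NODES:
        if node in positive:
            targets[node] = 1
        elif label in ancestors(node):
            targets[node] = None  # more specific than the label: not observed
        else:
            targets[node] = 0
    return targets


if __name__ == "__main__":
    # An imprecise label still carries information: "bird" rules out mammals.
    print(encode_label("bird"))
    # A precise label fixes every node in the hierarchy.
    print(encode_label("snow bunting"))
```

Under these assumptions, the imprecise label "bird" yields positive targets for "animal" and "bird", negative targets for "mammal" and "red fox", and leaves the bird species unconstrained, whereas a flat softmax over leaf classes has no target at all for such a sample.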
format Article in Journal/Newspaper
author Brust, Clemens-Alexander
Barz, Björn
Denzler, Joachim
author_sort Brust, Clemens-Alexander
title Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge
title_sort making every label count: handling semantic imprecision by integrating domain knowledge
publisher arXiv
publishDate 2020
url https://dx.doi.org/10.48550/arxiv.2010.06469
https://arxiv.org/abs/2010.06469
genre Snow Bunting
genre_facet Snow Bunting
op_rights arXiv.org perpetual, non-exclusive license
http://arxiv.org/licenses/nonexclusive-distrib/1.0/
op_doi https://doi.org/10.48550/arxiv.2010.06469
_version_ 1766197540533829632