Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge
Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct...
Main Authors: | Brust, Clemens-Alexander; Barz, Björn; Denzler, Joachim |
---|---|
Format: | Text |
Language: | unknown |
Published: | 2020 |
Subjects: | Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning |
Online Access: | http://arxiv.org/abs/2010.06469 |
id |
ftarxivpreprints:oai:arXiv.org:2010.06469 |
---|---|
record_format |
openpolar |
spelling |
ftarxivpreprints:oai:arXiv.org:2010.06469 2023-09-05T13:23:08+02:00 Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge Brust, Clemens-Alexander Barz, Björn Denzler, Joachim 2020-10-13 http://arxiv.org/abs/2010.06469 unknown http://arxiv.org/abs/2010.06469 Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning text 2020 ftarxivpreprints 2023-08-16T16:08:15Z Comment: 9 pages pre-print. Accepted for publication at ICPR 2020 Text Snow Bunting ArXiv.org (Cornell University Library) |
institution |
Open Polar |
collection |
ArXiv.org (Cornell University Library) |
op_collection_id |
ftarxivpreprints |
language |
unknown |
topic |
Computer Science - Computer Vision and Pattern Recognition Computer Science - Machine Learning |
description |
Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct label is replaced by an incorrect label from the same set. We consider an additional dimension of label noise: imprecision. For example, a non-breeding snow bunting is labeled as a bird. This label is correct, but not as precise as the task requires. Standard softmax classifiers cannot learn from such a weak label because they consider all classes mutually exclusive, which non-breeding snow bunting and bird are not. We propose CHILLAX (Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation), a method based on hierarchical classification, to fully utilize labels of any precision. Experiments on noisy variants of NABirds and ILSVRC2012 show that our method outperforms strong baselines by as much as 16.4 percentage points, and the current state of the art by up to 3.9 percentage points. Comment: 9 pages pre-print. Accepted for publication at ICPR 2020 |
format |
Text |
author |
Brust, Clemens-Alexander Barz, Björn Denzler, Joachim |
author_sort |
Brust, Clemens-Alexander |
title |
Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge |
title_sort |
making every label count: handling semantic imprecision by integrating domain knowledge |
publishDate |
2020 |
url |
http://arxiv.org/abs/2010.06469 |
genre |
Snow Bunting |
genre_facet |
Snow Bunting |
op_relation |
http://arxiv.org/abs/2010.06469 |
_version_ |
1776203716906450944 |