Open-World Semi-Supervised Learning

A fundamental limitation of applying semi-supervised learning in real-world settings is the assumption that unlabeled test data contains only classes previously encountered in the labeled training data. However, this assumption rarely holds for data in-the-wild, where instances belonging to novel cl...


Bibliographic Details
Main Authors:	Cao, Kaidi; Brbic, Maria; Leskovec, Jure
Format:	Article in Journal/Newspaper
Language:	unknown
Published:	arXiv 2021
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences; Orca
Online Access:	https://dx.doi.org/10.48550/arxiv.2102.03526 https://arxiv.org/abs/2102.03526

id	ftdatacite:10.48550/arxiv.2102.03526
record_format	openpolar
spelling	ftdatacite:10.48550/arxiv.2102.03526 2023-05-15T17:53:27+02:00 Open-World Semi-Supervised Learning Cao, Kaidi Brbic, Maria Leskovec, Jure 2021 https://dx.doi.org/10.48550/arxiv.2102.03526 https://arxiv.org/abs/2102.03526 unknown arXiv arXiv.org perpetual, non-exclusive license http://arxiv.org/licenses/nonexclusive-distrib/1.0/ Machine Learning cs.LG Computer Vision and Pattern Recognition cs.CV FOS Computer and information sciences Article CreativeWork article Preprint 2021 ftdatacite https://doi.org/10.48550/arxiv.2102.03526 2022-03-10T14:56:55Z
institution	Open Polar
collection	DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id	ftdatacite
language	unknown
topic	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences
spellingShingle	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences; Cao, Kaidi; Brbic, Maria; Leskovec, Jure; Open-World Semi-Supervised Learning
topic_facet	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences
description	A fundamental limitation of applying semi-supervised learning in real-world settings is the assumption that unlabeled test data contains only classes previously encountered in the labeled training data. However, this assumption rarely holds for data in-the-wild, where instances belonging to novel classes may appear at test time. Here, we introduce a novel open-world semi-supervised learning setting that formalizes the notion that novel classes may appear in the unlabeled test data. In this setting, the goal is to solve the class distribution mismatch between labeled and unlabeled data, where at test time every input instance must either be classified into one of the existing classes or a new unseen class must be initialized for it. To tackle this challenging problem, we propose ORCA, an end-to-end deep learning approach that introduces an uncertainty adaptive margin mechanism to circumvent the bias towards seen classes caused by learning discriminative features for seen classes faster than for novel classes. In this way, ORCA reduces the gap between the intra-class variance of seen classes and that of novel classes. Experiments on image classification datasets and a single-cell annotation dataset demonstrate that ORCA consistently outperforms alternative baselines, achieving a 25% improvement on seen classes and a 96% improvement on novel classes of the ImageNet dataset.
format	Article in Journal/Newspaper
author	Cao, Kaidi; Brbic, Maria; Leskovec, Jure
author_facet	Cao, Kaidi; Brbic, Maria; Leskovec, Jure
author_sort	Cao, Kaidi
title	Open-World Semi-Supervised Learning
title_short	Open-World Semi-Supervised Learning
title_full	Open-World Semi-Supervised Learning
title_fullStr	Open-World Semi-Supervised Learning
title_full_unstemmed	Open-World Semi-Supervised Learning
title_sort	open-world semi-supervised learning
publisher	arXiv
publishDate	2021
url	https://dx.doi.org/10.48550/arxiv.2102.03526 https://arxiv.org/abs/2102.03526
genre	Orca
genre_facet	Orca
op_rights	arXiv.org perpetual, non-exclusive license http://arxiv.org/licenses/nonexclusive-distrib/1.0/
op_doi	https://doi.org/10.48550/arxiv.2102.03526
_version_	1766161165867548672


Similar Items