Modality-Aware Triplet Hard Mining for Zero-shot Sketch-Based Image Retrieval

This paper tackles the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) problem from the viewpoint of cross-modality metric learning. This task has two characteristics: 1) the zero-shot setting requires a metric space with good within-class compactness and between-class discrepancy for recognizi...


Bibliographic Details
Main Authors:	Huang, Zongheng, Sun, YiFan, Han, Chuchu, Gao, Changxin, Sang, Nong
Format:	Text
Language:	unknown
Published:	2021
Subjects:	Computer Science - Computer Vision and Pattern Recognition DML
Online Access:	http://arxiv.org/abs/2112.07966

id	ftarxivpreprints:oai:arXiv.org:2112.07966
record_format	openpolar
spelling	ftarxivpreprints:oai:arXiv.org:2112.07966 2023-09-05T13:19:06+02:00 Modality-Aware Triplet Hard Mining for Zero-shot Sketch-Based Image Retrieval Huang, Zongheng Sun, YiFan Han, Chuchu Gao, Changxin Sang, Nong 2021-12-15 http://arxiv.org/abs/2112.07966 unknown http://arxiv.org/abs/2112.07966 Computer Science - Computer Vision and Pattern Recognition text 2021 ftarxivpreprints 2023-08-16T16:50:19Z Comment: 13 pages, 7 figures Text DML ArXiv.org (Cornell University Library)
institution	Open Polar
collection	ArXiv.org (Cornell University Library)
op_collection_id	ftarxivpreprints
language	unknown
topic	Computer Science - Computer Vision and Pattern Recognition
description	This paper tackles the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) problem from the viewpoint of cross-modality metric learning. This task has two characteristics: 1) the zero-shot setting requires a metric space with good within-class compactness and between-class discrepancy for recognizing the novel classes, and 2) the sketch query and the photo gallery are in different modalities. The metric learning viewpoint benefits ZS-SBIR in two respects. First, it lets ZS-SBIR benefit from recent good practices in deep metric learning (DML). By combining two fundamental learning approaches in DML, i.e., classification training and pairwise training, we set up a strong baseline for ZS-SBIR. Without bells and whistles, this baseline achieves competitive retrieval accuracy. Second, it provides the insight that properly suppressing the modality gap is critical. To this end, we design a novel method named Modality-Aware Triplet Hard Mining (MATHM). MATHM enhances the baseline with three types of pairwise learning, i.e., a cross-modality sample pair, a within-modality sample pair, and their combination. We also design an adaptive weighting method that dynamically balances these three components during training. Experimental results confirm that MATHM brings a further significant improvement over the strong baseline and sets a new state of the art. For example, on the TU-Berlin dataset, we achieve 47.88% mAP@all (+2.94%) and 58.28% Prec@100 (+2.34%). Code will be publicly available at: https://github.com/huangzongheng/MATHM. Comment: 13 pages, 7 figures
format	Text
author	Huang, Zongheng Sun, YiFan Han, Chuchu Gao, Changxin Sang, Nong
title	Modality-Aware Triplet Hard Mining for Zero-shot Sketch-Based Image Retrieval
publishDate	2021
url	http://arxiv.org/abs/2112.07966
genre	DML
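The description names the three pairwise terms of MATHM (cross-modality, within-modality, and their combination) but not their exact formulation. The following is a minimal, hypothetical Python sketch of modality-aware batch-hard triplet mining under stated assumptions; it is not the authors' released code. Assumptions: Euclidean distance; standard batch-hard mining (farthest same-class sample as the hardest positive, closest different-class sample as the hardest negative); the "combination" term taken to pair the hardest cross-modality positive with the hardest within-modality negative; a hypothetical default margin of 0.3; and fixed weights standing in for the paper's adaptive weighting, which the abstract does not specify. The function names `euclidean`, `hardest_distance`, and `modality_aware_triplet_loss` are illustrative only.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_distance(anchor: int, candidates: List[int],
                     emb: Sequence[Vector], farthest: bool) -> Optional[float]:
    """Distance from the anchor to its hardest candidate, or None if none exist.

    Batch-hard mining: the hardest positive is the farthest one,
    the hardest negative is the closest one.
    """
    dists = [euclidean(emb[anchor], emb[j]) for j in candidates]
    if not dists:
        return None
    return max(dists) if farthest else min(dists)


def modality_aware_triplet_loss(
    emb: Sequence[Vector],
    labels: Sequence[int],
    modality: Sequence[str],
    margin: float = 0.3,  # hypothetical value, not taken from the paper
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # ASSUMPTION: fixed, not adaptive
) -> Tuple[float, List[float]]:
    """Return (weighted total, [cross, within, combined]) hinge losses for a batch.

    Each term is averaged over the anchors that have both a valid positive
    and a valid negative for that pair type; a term with no valid anchors is 0.
    """
    n = len(emb)
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i in range(n):
        cross = [j for j in range(n) if modality[j] != modality[i]]
        within = [j for j in range(n) if j != i and modality[j] == modality[i]]
        pos_x = hardest_distance(i, [j for j in cross if labels[j] == labels[i]], emb, True)
        neg_x = hardest_distance(i, [j for j in cross if labels[j] != labels[i]], emb, False)
        pos_w = hardest_distance(i, [j for j in within if labels[j] == labels[i]], emb, True)
        neg_w = hardest_distance(i, [j for j in within if labels[j] != labels[i]], emb, False)
        # 0: cross-modality triplet, 1: within-modality triplet,
        # 2: ASSUMED "combination": cross-modality positive vs within-modality negative.
        triplets = [(pos_x, neg_x), (pos_w, neg_w), (pos_x, neg_w)]
        for k, (d_pos, d_neg) in enumerate(triplets):
            if d_pos is None or d_neg is None:
                continue
            sums[k] += max(0.0, d_pos - d_neg + margin)
            counts[k] += 1
    terms = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(w * t for w, t in zip(weights, terms))
    return total, terms


if __name__ == "__main__":
    # Two classes, each with one sketch and one photo, well separated in space.
    emb = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
    labels = [0, 0, 1, 1]
    modality = ["sketch", "photo", "sketch", "photo"]
    total, terms = modality_aware_triplet_loss(emb, labels, modality)
    print(total, terms)  # prints: 0.0 [0.0, 0.0, 0.0]
```

Plain Python lists keep the sketch dependency-free; a training implementation would more likely compute a full pairwise distance matrix on GPU tensors and mask it by label and modality.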


Similar Items