Modality-Aware Triplet Hard Mining for Zero-shot Sketch-Based Image Retrieval

This paper tackles the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) problem from the viewpoint of cross-modality metric learning. This task has two characteristics: 1) the zero-shot setting requires a metric space with good within-class compactness and the between-class discrepancy for recognizi...

Bibliographic Details
Main Authors: Huang, Zongheng, Sun, YiFan, Han, Chuchu, Gao, Changxin, Sang, Nong
Format: Text
Language: unknown
Published: 2021
Subjects:
DML
Online Access: http://arxiv.org/abs/2112.07966
id ftarxivpreprints:oai:arXiv.org:2112.07966
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:2112.07966 2023-09-05T13:19:06+02:00 Modality-Aware Triplet Hard Mining for Zero-shot Sketch-Based Image Retrieval Huang, Zongheng Sun, YiFan Han, Chuchu Gao, Changxin Sang, Nong 2021-12-15 http://arxiv.org/abs/2112.07966 unknown Computer Science - Computer Vision and Pattern Recognition text 2021 ftarxivpreprints 2023-08-16T16:50:19Z Text DML ArXiv.org (Cornell University Library)
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Computer Science - Computer Vision and Pattern Recognition
description This paper tackles the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) problem from the viewpoint of cross-modality metric learning. The task has two characteristics: 1) the zero-shot setting requires a metric space with good within-class compactness and between-class discrepancy for recognizing novel classes, and 2) the sketch query and the photo gallery are in different modalities. The metric learning viewpoint benefits ZS-SBIR in two ways. First, it facilitates improvement through recent good practices in deep metric learning (DML). By combining two fundamental learning approaches in DML, namely classification training and pairwise training, we set up a strong baseline for ZS-SBIR. Without bells and whistles, this baseline achieves competitive retrieval accuracy. Second, it provides the insight that properly suppressing the modality gap is critical. To this end, we design a novel method named Modality-Aware Triplet Hard Mining (MATHM). MATHM enhances the baseline with three types of pairwise learning: a cross-modality sample pair, a within-modality sample pair, and their combination. We also design an adaptive weighting method to dynamically balance these three components during training. Experimental results confirm that MATHM brings another round of significant improvement over the strong baseline and sets a new state of the art. For example, on the TU-Berlin dataset, we achieve 47.88+2.94% mAP@all and 58.28+2.34% Prec@100. Code will be publicly available at: https://github.com/huangzongheng/MATHM. Comment: 13 pages, 7 figures
format Text
author Huang, Zongheng
Sun, YiFan
Han, Chuchu
Gao, Changxin
Sang, Nong
title Modality-Aware Triplet Hard Mining for Zero-shot Sketch-Based Image Retrieval
publishDate 2021
url http://arxiv.org/abs/2112.07966
genre DML
op_relation http://arxiv.org/abs/2112.07966
_version_ 1776199913084813312
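
Illustrative note on the description field: the abstract names three kinds of pairwise learning (cross-modality, within-modality, and their combination) built on triplet hard mining. The sketch below is one possible reading of that idea, not the authors' implementation. It uses standard batch-hard triplet mining restricted by a modality mask. The function names, the margin value, and the interpretation of "combination" as mining over all pairs regardless of modality are assumptions made for illustration. The adaptive weighting mentioned in the abstract is not reproduced here.

```python
import numpy as np

def pairwise_dist(x):
    """Euclidean distance between every pair of rows in x."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.sqrt(np.maximum(d2, 0.0))

def batch_hard_triplet(dist, labels, pair_mask, margin=0.3):
    """Batch-hard triplet loss using only the pairs allowed by pair_mask.

    For each anchor, take the farthest allowed positive and the closest
    allowed negative. Anchors lacking either one are skipped.
    """
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    pos_mask = same & pair_mask & not_self
    neg_mask = ~same & pair_mask
    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)
    valid = pos_mask.any(axis=1) & neg_mask.any(axis=1)
    if not valid.any():
        return 0.0
    losses = np.maximum(hardest_pos[valid] - hardest_neg[valid] + margin, 0.0)
    return float(losses.mean())

def modality_aware_losses(emb, labels, is_sketch, margin=0.3):
    """Three triplet losses that differ only in which pairs are mined.

    "combined" mines over all pairs; this is an assumed reading of
    "their combination" in the abstract.
    """
    dist = pairwise_dist(emb)
    same_mod = is_sketch[:, None] == is_sketch[None, :]
    all_pairs = np.ones_like(same_mod)
    return {
        "cross": batch_hard_triplet(dist, labels, ~same_mod, margin),
        "within": batch_hard_triplet(dist, labels, same_mod, margin),
        "combined": batch_hard_triplet(dist, labels, all_pairs, margin),
    }
```

A training loop would typically sum the three terms with some weights; fixed equal weights are the simplest stand-in where the paper uses a dynamic scheme.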