MixVPR: Feature Mixing for Visual Place Recognition

Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous driving as well as other computer vision tasks. It refers to the process of identifying a place depicted in a query image using only computer vision. At large scale, repetitive structures, weather and illumination cha...

Bibliographic Details
Main Authors:	Ali-bey, Amar; Chaib-draa, Brahim; Giguère, Philippe
Format:	Text
Language:	unknown
Published:	2023
Subjects:	Computer Science - Computer Vision and Pattern Recognition; Nordland
Online Access:	http://arxiv.org/abs/2303.02190

id	ftarxivpreprints:oai:arXiv.org:2303.02190
record_format	openpolar
spelling	ftarxivpreprints:oai:arXiv.org:2303.02190 2023-09-05T13:21:15+02:00 MixVPR: Feature Mixing for Visual Place Recognition Ali-bey, Amar Chaib-draa, Brahim Giguère, Philippe 2023-03-03 http://arxiv.org/abs/2303.02190 unknown Computer Science - Computer Vision and Pattern Recognition text 2023 ftarxivpreprints 2023-08-16T17:34:11Z Text Nordland ArXiv.org (Cornell University Library)
institution	Open Polar
collection	ArXiv.org (Cornell University Library)
op_collection_id	ftarxivpreprints
language	unknown
topic	Computer Science - Computer Vision and Pattern Recognition
topic_facet	Computer Science - Computer Vision and Pattern Recognition
description	Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous driving as well as other computer vision tasks. It refers to the process of identifying a place depicted in a query image using only computer vision. At large scale, repetitive structures, weather and illumination changes pose a real challenge, as appearances can drastically change over time. Along with tackling these challenges, an efficient VPR technique must also be practical in real-world scenarios where latency matters. To address this, we introduce MixVPR, a new holistic feature aggregation technique that takes feature maps from pre-trained backbones as a set of global features. Then, it incorporates a global relationship between elements in each feature map in a cascade of feature mixing, eliminating the need for local or pyramidal aggregation as done in NetVLAD or TransVPR. We demonstrate the effectiveness of our technique through extensive experiments on multiple large-scale benchmarks. Our method outperforms all existing techniques by a large margin while having fewer than half as many parameters as CosPlace and NetVLAD. We achieve a new all-time high recall@1 score of 94.6% on Pitts250k-test, 88.0% on MapillarySLS, and more importantly, 58.4% on Nordland. Finally, our method outperforms two-stage retrieval techniques such as Patch-NetVLAD, TransVPR and SuperGlue, all while being orders of magnitude faster. Our code and trained models are available at https://github.com/amaralibey/MixVPR. Comment: Accepted at WACV 2023
format	Text
author	Ali-bey, Amar; Chaib-draa, Brahim; Giguère, Philippe
author_facet	Ali-bey, Amar; Chaib-draa, Brahim; Giguère, Philippe
author_sort	Ali-bey, Amar
title	MixVPR: Feature Mixing for Visual Place Recognition
title_sort	mixvpr: feature mixing for visual place recognition
publishDate	2023
url	http://arxiv.org/abs/2303.02190
genre	Nordland
genre_facet	Nordland
op_relation	http://arxiv.org/abs/2303.02190
_version_	1776201843255279616

Similar Items