Scene Retrieval for Contextual Visual Mapping

Visual navigation localizes a query place image against a reference database of place images, also known as a `visual map'. Localization accuracy requirements for specific areas of the visual map, `scene classes', vary according to the context of the environment and task. State-of-the-art...

Bibliographic Details
Main Authors: Smith, William H. B., Milford, Michael, McDonald-Maier, Klaus D., Ehsan, Shoaib
Format: Text
Language: unknown
Published: 2021
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Robotics
Online Access: http://arxiv.org/abs/2102.12728
id ftarxivpreprints:oai:arXiv.org:2102.12728
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:2102.12728 2023-09-05T13:21:15+02:00 Scene Retrieval for Contextual Visual Mapping Smith, William H. B. Milford, Michael McDonald-Maier, Klaus D. Ehsan, Shoaib 2021-02-25 http://arxiv.org/abs/2102.12728 unknown http://arxiv.org/abs/2102.12728 Computer Science - Computer Vision and Pattern Recognition Computer Science - Robotics text 2021 ftarxivpreprints 2023-08-16T16:21:48Z Visual navigation localizes a query place image against a reference database of place images, also known as a `visual map'. Localization accuracy requirements for specific areas of the visual map, `scene classes', vary according to the context of the environment and task. State-of-the-art visual mapping is unable to reflect these requirements by explicitly targeting scene classes for inclusion in the map. Four different scene classes, including pedestrian crossings and stations, are identified in each of the Nordland and St. Lucia datasets. Instead of re-training separate scene classifiers, which struggle with these overlapping scene classes, we make our first contribution: defining the problem of `scene retrieval'. Scene retrieval extends image retrieval to classification of scenes defined at test time by associating a single query image to reference images of scene classes. Our second contribution is a triplet-trained convolutional neural network (CNN) to address this problem, which increases scene classification accuracy by up to 7% against state-of-the-art networks pre-trained for scene recognition. The third contribution is an algorithm `DMC' that combines our scene classification with distance and memorability for visual mapping. Our analysis shows that DMC includes 64% more images of our chosen scene classes in a visual map than just using distance interval mapping. 
State-of-the-art visual place descriptors AMOS-Net, Hybrid-Net and NetVLAD are finally used to show that DMC improves scene class localization accuracy by a mean of 3% and localization accuracy of the remaining map images by a mean of 10% across both datasets. Comment: 8-page paper on visual place recognition and scene classification Text Nordland Nordland Nordland ArXiv.org (Cornell University Library)
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Computer Science - Computer Vision and Pattern Recognition
Computer Science - Robotics
spellingShingle Computer Science - Computer Vision and Pattern Recognition
Computer Science - Robotics
Smith, William H. B.
Milford, Michael
McDonald-Maier, Klaus D.
Ehsan, Shoaib
Scene Retrieval for Contextual Visual Mapping
topic_facet Computer Science - Computer Vision and Pattern Recognition
Computer Science - Robotics
description Visual navigation localizes a query place image against a reference database of place images, also known as a `visual map'. Localization accuracy requirements for specific areas of the visual map, `scene classes', vary according to the context of the environment and task. State-of-the-art visual mapping is unable to reflect these requirements by explicitly targeting scene classes for inclusion in the map. Four different scene classes, including pedestrian crossings and stations, are identified in each of the Nordland and St. Lucia datasets. Instead of re-training separate scene classifiers, which struggle with these overlapping scene classes, we make our first contribution: defining the problem of `scene retrieval'. Scene retrieval extends image retrieval to classification of scenes defined at test time by associating a single query image to reference images of scene classes. Our second contribution is a triplet-trained convolutional neural network (CNN) to address this problem, which increases scene classification accuracy by up to 7% against state-of-the-art networks pre-trained for scene recognition. The third contribution is an algorithm `DMC' that combines our scene classification with distance and memorability for visual mapping. Our analysis shows that DMC includes 64% more images of our chosen scene classes in a visual map than just using distance interval mapping. State-of-the-art visual place descriptors AMOS-Net, Hybrid-Net and NetVLAD are finally used to show that DMC improves scene class localization accuracy by a mean of 3% and localization accuracy of the remaining map images by a mean of 10% across both datasets. Comment: 8-page paper on visual place recognition and scene classification
format Text
author Smith, William H. B.
Milford, Michael
McDonald-Maier, Klaus D.
Ehsan, Shoaib
author_facet Smith, William H. B.
Milford, Michael
McDonald-Maier, Klaus D.
Ehsan, Shoaib
author_sort Smith, William H. B.
title Scene Retrieval for Contextual Visual Mapping
title_short Scene Retrieval for Contextual Visual Mapping
title_full Scene Retrieval for Contextual Visual Mapping
title_fullStr Scene Retrieval for Contextual Visual Mapping
title_full_unstemmed Scene Retrieval for Contextual Visual Mapping
title_sort scene retrieval for contextual visual mapping
publishDate 2021
url http://arxiv.org/abs/2102.12728
genre Nordland
genre_facet Nordland
op_relation http://arxiv.org/abs/2102.12728
_version_ 1776201844210532352
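
The description field in this record outlines scene retrieval as associating a single query image with reference images of scene classes, using a triplet-trained CNN. A minimal sketch of that idea follows, under loud assumptions: the record does not say how descriptors are compared, so nearest-neighbour matching by cosine similarity is an illustrative choice, not the authors' confirmed method. The function names (`retrieve_scene_class`, `triplet_loss`), the 2-D toy descriptors, and the margin value are all hypothetical; in practice the descriptors would come from a CNN.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_scene_class(query_desc, ref_descs, ref_labels):
    """Label a query with the scene class of its most similar reference image.

    Assumption: similarity is cosine similarity between image descriptors.
    The reference set can be swapped at test time without any re-training.
    """
    q = l2_normalize(query_desc)
    refs = l2_normalize(ref_descs)
    sims = refs @ q                      # one similarity per reference image
    best = int(np.argmax(sims))
    return ref_labels[best], float(sims[best])


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet hinge loss on squared Euclidean distances.

    Pulls the anchor towards a same-class example and pushes it away from a
    different-class example by at least `margin` (a hypothetical value here).
    """
    anchor, positive, negative = (np.asarray(v, dtype=float)
                                  for v in (anchor, positive, negative))
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, float(d_pos - d_neg + margin))


# Toy reference set: scene classes are defined only by these labelled examples.
ref_descs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
ref_labels = ["station", "pedestrian crossing", "station"]

label, score = retrieve_scene_class([0.1, 1.0], ref_descs, ref_labels)
print(label)
```

Because the scene classes exist only as labelled reference descriptors, adding a new class in this sketch means appending rows to `ref_descs` and `ref_labels`, which mirrors the "defined at test time" property mentioned in the description.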