EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications. A key component that drives its impressive zero-shot transfer performance and high versatility is a very large Transformer model trained on the extensive, high-quality SA-1B dataset. While beneficial, the huge computation cost of the SAM model has limited its adoption in wider real-world applications. To address this limitation, we propose EfficientSAMs, lightweight SAM models that exhibit decent performance with largely reduced complexity. Our idea is based on leveraging masked image pretraining, SAMI, which learns to reconstruct features from the SAM image encoder for effective visual representation learning. Further, we take SAMI-pretrained lightweight image encoders and a mask decoder to build EfficientSAMs, and finetune the models on SA-1B for the segment anything task. We perform evaluations on multiple vision tasks, including image classification, object detection, instance segmentation, and semantic object ...
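
The abstract's core recipe, SAMI, is a feature-reconstruction variant of masked image pretraining: instead of reconstructing pixels, a lightweight student encoder learns to predict the patch features produced by the frozen SAM image encoder. Below is a minimal sketch of what such a training loss could look like, assuming PyTorch; the function name, the `patch_mask` keyword on the student encoder, and the masked-only loss are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a SAMI-style feature-reconstruction loss.
# Assumes: light_encoder accepts a boolean patch mask and returns per-patch
# embeddings; sam_encoder returns (B, N, D_sam) patch features.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sami_reconstruction_loss(
    light_encoder: nn.Module,   # trainable lightweight ViT (e.g., ViT-Tiny/-Small)
    projector: nn.Module,       # maps the student's feature dim to SAM's feature dim
    sam_encoder: nn.Module,     # frozen SAM ViT-H image encoder (feature target)
    images: torch.Tensor,       # (B, 3, H, W) batch of images
    mask_ratio: float = 0.75,   # fraction of patches hidden from the student
) -> torch.Tensor:
    # Target features come from the frozen SAM encoder on the full image.
    with torch.no_grad():
        target = sam_encoder(images)                          # (B, N, D_sam)

    batch, num_patches, _ = target.shape
    # Randomly mask a subset of patch positions (True = masked).
    mask = torch.rand(batch, num_patches, device=images.device) < mask_ratio

    # The student sees the masked image and predicts per-patch embeddings;
    # `patch_mask` is an assumed interface on the student encoder.
    pred = projector(light_encoder(images, patch_mask=mask))  # (B, N, D_sam)

    # Reconstruct the frozen SAM features at the masked positions.
    return F.mse_loss(pred[mask], target[mask])
```

Reconstructing the teacher's features rather than raw pixels is what lets the lightweight encoder inherit SAM's representation; per the abstract, the pretrained encoder is then paired with a mask decoder and finetuned on SA-1B for the segment anything task.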

Bibliographic Details
Main Authors: Xiong, Yunyang; Varadarajan, Bala; Wu, Lemeng; Xiang, Xiaoyu; Xiao, Fanyi; Zhu, Chenchen; Dai, Xiaoliang; Wang, Dilin; Sun, Fei; Iandola, Forrest; Krishnamoorthi, Raghuraman; Chandra, Vikas
Format: Preprint
Language: English
Published: arXiv, 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences
License: arXiv.org perpetual, non-exclusive license (http://arxiv.org/licenses/nonexclusive-distrib/1.0/)
Online Access: https://dx.doi.org/10.48550/arxiv.2312.00863
https://arxiv.org/abs/2312.00863