EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything ...
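Main Authors: | Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra |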
---|---|
Format: | Article (arXiv preprint) |
Language: | unknown |
Published: | arXiv, 2023 |
Subjects: | Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences |
Online Access: | https://dx.doi.org/10.48550/arxiv.2312.00863, https://arxiv.org/abs/2312.00863 |
Record ID: | ftdatacite:10.48550/arxiv.2312.00863 |
---|---|
Institution: | Open Polar |
Collection: | DataCite Metadata Store (German National Library of Science and Technology) |
Topic: | Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences |
Description: | Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications. A key component that drives the impressive performance for zero-shot transfer and high versatility is a super large Transformer model trained on the extensive high-quality SA-1B dataset. While beneficial, the huge computation cost of the SAM model has limited its use in wider real-world applications. To address this limitation, we propose EfficientSAMs, light-weight SAM models that exhibit decent performance with largely reduced complexity. Our idea is based on leveraging masked image pretraining, SAMI, which learns to reconstruct features from the SAM image encoder for effective visual representation learning. Further, we take SAMI-pretrained light-weight image encoders and a mask decoder to build EfficientSAMs, and finetune the models on SA-1B for the segment anything task. We perform evaluations on multiple vision tasks including image classification, object detection, instance segmentation, and semantic object ... |
Authors: | Xiong, Yunyang; Varadarajan, Bala; Wu, Lemeng; Xiang, Xiaoyu; Xiao, Fanyi; Zhu, Chenchen; Dai, Xiaoliang; Wang, Dilin; Sun, Fei; Iandola, Forrest; Krishnamoorthi, Raghuraman; Chandra, Vikas |
Rights: | arXiv.org perpetual, non-exclusive license (http://arxiv.org/licenses/nonexclusive-distrib/1.0/) |
DOI: | https://doi.org/10.48550/arxiv.2312.00863 |
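The description summarizes SAMI as masked image pretraining in which a light-weight encoder learns to reconstruct features produced by the SAM image encoder. The NumPy sketch below illustrates only the masking step and a feature-reconstruction loss. It rests on several assumptions that this record does not confirm: random token masking at a 75% ratio and a mean-squared-error loss computed on masked tokens only, both borrowed from MAE-style pretraining. The function names `random_token_mask` and `feature_reconstruction_loss` are hypothetical, and the "teacher" features are random stand-ins for frozen SAM encoder outputs. This is not the authors' implementation.

```python
import numpy as np


def random_token_mask(num_tokens: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: boolean mask over image tokens; True marks a token hidden from the student."""
    num_masked = int(round(num_tokens * mask_ratio))
    mask = np.zeros(num_tokens, dtype=bool)
    # Pick a random subset of token indices to hide (assumed uniform random masking).
    mask[rng.permutation(num_tokens)[:num_masked]] = True
    return mask


def feature_reconstruction_loss(student_pred: np.ndarray,
                                teacher_feats: np.ndarray,
                                mask: np.ndarray) -> float:
    """Hypothetical helper: MSE between student predictions and teacher features on masked tokens only."""
    if student_pred.shape != teacher_feats.shape:
        raise ValueError("student and teacher features must have the same shape")
    if not mask.any():
        raise ValueError("mask selects no tokens")
    # Boolean row indexing keeps only the masked tokens: shape (num_masked, dim).
    diff = student_pred[mask] - teacher_feats[mask]
    return float(np.mean(diff ** 2))


# Toy usage: 16 tokens with 8-dimensional features.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 8))  # stand-in for frozen SAM image-encoder features
mask = random_token_mask(16, 0.75, rng)

perfect_loss = feature_reconstruction_loss(teacher.copy(), teacher, mask)
zero_loss = feature_reconstruction_loss(np.zeros_like(teacher), teacher, mask)
print(int(mask.sum()), perfect_loss, zero_loss > 0)  # prints: 12 0.0 True
```

In an actual pretraining loop the student would be a light-weight image encoder (plus a decoder) producing per-token predictions from the visible tokens, and the loss would be minimized by gradient descent; those components are omitted here, so the sketch only shows what is being compared.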