EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything


Bibliographic Details
Main Authors: Xiong, Yunyang, Varadarajan, Bala, Wu, Lemeng, Xiang, Xiaoyu, Xiao, Fanyi, Zhu, Chenchen, Dai, Xiaoliang, Wang, Dilin, Sun, Fei, Iandola, Forrest, Krishnamoorthi, Raghuraman, Chandra, Vikas
Format: Article in Journal/Newspaper
Language: unknown
Published: arXiv 2023
Subjects:
Online Access: https://dx.doi.org/10.48550/arxiv.2312.00863
https://arxiv.org/abs/2312.00863
Description
Summary: Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications. A key component driving its impressive zero-shot transfer performance and high versatility is a very large Transformer model trained on the extensive, high-quality SA-1B dataset. While beneficial, the huge computation cost of the SAM model has limited its adoption in wider real-world applications. To address this limitation, we propose EfficientSAMs, lightweight SAM models that exhibit decent performance at greatly reduced complexity. Our idea is based on leveraging masked image pretraining, SAMI, which learns to reconstruct features from the SAM image encoder for effective visual representation learning. We then take SAMI-pretrained lightweight image encoders and a mask decoder to build EfficientSAMs, and fine-tune the models on SA-1B for the segment-anything task. We perform evaluations on multiple vision tasks including image classification, object detection, instance segmentation, and semantic object ...
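The core of the SAMI pretraining described above is a feature-reconstruction objective: a lightweight student encoder sees a masked image and must predict the (frozen) SAM image encoder's patch features. The following is a minimal NumPy sketch of that objective only, with toy shapes and random arrays standing in for real patch embeddings; the variable names (`teacher_feats`, `student_feats`, `mask_ratio`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

num_patches, dim = 16, 8   # toy sizes; real ViT encoders use many more patches and channels
mask_ratio = 0.75          # fraction of patches hidden from the student encoder

# Stand-in for the frozen SAM image-encoder output (the reconstruction target).
teacher_feats = rng.normal(size=(num_patches, dim))

# Random mask: the student encoder would only see the visible patches.
num_masked = int(mask_ratio * num_patches)
perm = rng.permutation(num_patches)
masked_idx, visible_idx = perm[:num_masked], perm[num_masked:]

# Stand-in for the student encoder + decoder output: predicted features
# at every patch position (here, the target plus small noise).
student_feats = teacher_feats + 0.1 * rng.normal(size=(num_patches, dim))

# SAMI-style reconstruction loss: mean-squared error between the student's
# predicted features and the SAM encoder's features.
loss = float(np.mean((student_feats - teacher_feats) ** 2))
print(f"reconstruction loss: {loss:.4f}")
```

In the actual method, minimizing this loss transfers the representation of the heavy SAM encoder into the lightweight one, which is then fine-tuned on SA-1B with a mask decoder.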