Cross-Modal Fine-Tuning: Align then Refine
Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. ORCA adapts to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities. Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods. We highlight the importance of data alignment via a ...
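The align-then-refine workflow described in the abstract lends itself to a compact sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the ORCA implementation: the abstract does not name the alignment objective, so an RBF-kernel maximum mean discrepancy (MMD) stands in for it, and `Embedder`, `align_then_refine`, `source_feats`, and all hyperparameters are hypothetical.

```python
# Minimal sketch of an align-then-refine workflow (illustrative only; not the
# ORCA codebase). RBF-kernel MMD stands in for the paper's alignment objective.
import torch
import torch.nn as nn

def rbf_mmd(x, y, sigma=1.0):
    # Maximum mean discrepancy between two feature batches (stand-in distance).
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

class Embedder(nn.Module):
    # Maps raw target-modality inputs into the pretrained model's feature
    # space (hypothetical architecture).
    def __init__(self, in_dim, embed_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, embed_dim), nn.GELU(), nn.Linear(embed_dim, embed_dim)
        )

    def forward(self, x):
        return self.net(x)

def align_then_refine(pretrained, head, target_loader, source_feats,
                      in_dim, embed_dim, align_epochs=5, refine_epochs=5):
    # source_feats: precomputed embeddings of pretraining-modality data
    # (assumed given).
    embedder = Embedder(in_dim, embed_dim)

    # Stage 1 -- align: train only the embedder so the embedded target
    # features match the pretraining feature distribution; the pretrained
    # body is untouched here.
    opt = torch.optim.Adam(embedder.parameters(), lr=1e-3)
    for _ in range(align_epochs):
        for x, _ in target_loader:
            loss = rbf_mmd(embedder(x), source_feats)
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2 -- refine: fine-tune embedder + pretrained body + task head
    # on the target labels, exploiting knowledge shared across modalities.
    params = (list(embedder.parameters()) + list(pretrained.parameters())
              + list(head.parameters()))
    opt = torch.optim.Adam(params, lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(refine_epochs):
        for x, y in target_loader:
            loss = loss_fn(head(pretrained(embedder(x))), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return embedder
```

Note how the sketch mirrors the abstract's two stages: stage 1 updates only the embedding network against a distribution-matching loss, and only stage 2 fine-tunes the full stack on target labels.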
Main Authors: | Shen, Junhong; Li, Liam; Dery, Lucio M.; Staten, Corey; Khodak, Mikhail; Neubig, Graham; Talwalkar, Ameet |
---|---|
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | arXiv, 2023 |
Subjects: | Machine Learning (cs.LG); FOS: Computer and information sciences |
License: | arXiv.org perpetual, non-exclusive license (http://arxiv.org/licenses/nonexclusive-distrib/1.0/) |
Online Access: | https://dx.doi.org/10.48550/arxiv.2302.05738 https://arxiv.org/abs/2302.05738 |