Cross-Modal Fine-Tuning: Align then Refine

Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. ORCA adapts to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities. Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods. We highlight the importance of data alignment via a ...
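The align-then-refine workflow described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, illustrative PyTorch version, not the authors' implementation: the embedding network, the RBF-kernel MMD loss (a stand-in for whatever distribution-alignment objective the paper actually uses), the loader names, and all hyperparameters are assumptions made here for exposition.

```python
# Illustrative sketch of an align-then-refine workflow in the spirit of ORCA.
# EmbeddingNet, mmd_loss, and all hyperparameters are hypothetical choices,
# not the paper's actual architecture or alignment metric.
import torch
import torch.nn as nn


class EmbeddingNet(nn.Module):
    """Maps target-modality inputs into the pretrained model's feature space."""

    def __init__(self, in_dim: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x):
        return self.proj(x)


def mmd_loss(x, y):
    """RBF-kernel maximum mean discrepancy between two batches of features.

    A simple stand-in for the paper's distribution-alignment objective.
    """
    def kernel(a, b):
        sq_dist = torch.cdist(a, b) ** 2
        return torch.exp(-sq_dist / (2.0 * a.shape[1]))

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


def align_then_refine(embedder, body, head, target_loader, source_feat_loader,
                      align_steps=500, refine_epochs=3, lr=1e-4):
    """Two-stage cross-modal fine-tuning: align the embedder, then refine everything."""
    # Stage 1 (align): train only the embedding network so that embedded target
    # features match the feature distribution of the pretraining modality.
    opt = torch.optim.Adam(embedder.parameters(), lr=lr)
    for step, ((x_tgt, _), z_src) in enumerate(zip(target_loader, source_feat_loader)):
        if step >= align_steps:
            break
        loss = mmd_loss(embedder(x_tgt), z_src)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2 (refine): fine-tune embedder + pretrained body + task head end to end
    # on the target task, exploiting knowledge shared across modalities.
    params = list(embedder.parameters()) + list(body.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(refine_epochs):
        for x_tgt, y_tgt in target_loader:
            logits = head(body(embedder(x_tgt)))
            loss = criterion(logits, y_tgt)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return embedder, body, head
```

The sketch only captures the two-stage structure; see the paper at the links below for the actual alignment metric, pretrained backbones, and training details.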


Bibliographic Details
Main Authors: Shen, Junhong; Li, Liam; Dery, Lucio M.; Staten, Corey; Khodak, Mikhail; Neubig, Graham; Talwalkar, Ameet
Format: Article in Journal/Newspaper
Language: unknown
Published: arXiv 2023
Subjects: Machine Learning (cs.LG); FOS: Computer and information sciences
Online Access: https://dx.doi.org/10.48550/arxiv.2302.05738
https://arxiv.org/abs/2302.05738
Rights: arXiv.org perpetual, non-exclusive license (http://arxiv.org/licenses/nonexclusive-distrib/1.0/)