Cross-Modal Fine-Tuning: Align then Refine

Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine-tu...

Bibliographic Details
Main Authors:	Shen, Junhong; Li, Liam; Dery, Lucio M.; Staten, Corey; Khodak, Mikhail; Neubig, Graham; Talwalkar, Ameet
Format:	Text
Language:	unknown
Published:	2023
Subjects:	Computer Science - Machine Learning; Orca
Online Access:	http://arxiv.org/abs/2302.05738

id	ftarxivpreprints:oai:arXiv.org:2302.05738
record_format	openpolar
spelling	ftarxivpreprints:oai:arXiv.org:2302.05738 2023-09-05T13:22:19+02:00 Cross-Modal Fine-Tuning: Align then Refine Shen, Junhong Li, Liam Dery, Lucio M. Staten, Corey Khodak, Mikhail Neubig, Graham Talwalkar, Ameet 2023-02-11 http://arxiv.org/abs/2302.05738 unknown http://arxiv.org/abs/2302.05738 Computer Science - Machine Learning text 2023 ftarxivpreprints 2023-08-16T17:32:09Z Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. ORCA adapts to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities. Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods. We highlight the importance of data alignment via a series of ablation studies and demonstrate ORCA's utility in data-limited regimes. Text Orca ArXiv.org (Cornell University Library)
institution	Open Polar
collection	ArXiv.org (Cornell University Library)
op_collection_id	ftarxivpreprints
language	unknown
topic	Computer Science - Machine Learning
description	Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. ORCA adapts to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities. Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods. We highlight the importance of data alignment via a series of ablation studies and demonstrate ORCA's utility in data-limited regimes.
format	Text
author	Shen, Junhong; Li, Liam; Dery, Lucio M.; Staten, Corey; Khodak, Mikhail; Neubig, Graham; Talwalkar, Ameet
title	Cross-Modal Fine-Tuning: Align then Refine
publishDate	2023
url	http://arxiv.org/abs/2302.05738
genre	Orca

Similar Items