Cross-Modal Fine-Tuning: Align then Refine
Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. ORCA adapts to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities. Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range of hand-designed, AutoML, general-purpose, and task-specific methods. We highlight the importance of data alignment via a ...
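The align-then-refine workflow described in the abstract lends itself to a compact sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the ORCA implementation: the abstract does not name the alignment objective, so an RBF-kernel maximum mean discrepancy (MMD) stands in for it, and `Embedder`, `align_then_refine`, `source_feats`, and all hyperparameters are hypothetical.

```python
# Minimal sketch of an align-then-refine workflow (illustrative only; not the
# ORCA codebase). RBF-kernel MMD stands in for the paper's alignment objective.
import torch
import torch.nn as nn

def rbf_mmd(x, y, sigma=1.0):
    # Maximum mean discrepancy between two feature batches (stand-in distance).
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

class Embedder(nn.Module):
    # Maps raw target-modality inputs into the pretrained model's feature
    # space (hypothetical architecture).
    def __init__(self, in_dim, embed_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, embed_dim), nn.GELU(), nn.Linear(embed_dim, embed_dim)
        )

    def forward(self, x):
        return self.net(x)

def align_then_refine(pretrained, head, target_loader, source_feats,
                      in_dim, embed_dim, align_epochs=5, refine_epochs=5):
    # source_feats: precomputed embeddings of pretraining-modality data
    # (assumed given).
    embedder = Embedder(in_dim, embed_dim)

    # Stage 1 -- align: train only the embedder so the embedded target
    # features match the pretraining feature distribution; the pretrained
    # body is untouched here.
    opt = torch.optim.Adam(embedder.parameters(), lr=1e-3)
    for _ in range(align_epochs):
        for x, _ in target_loader:
            loss = rbf_mmd(embedder(x), source_feats)
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2 -- refine: fine-tune embedder + pretrained body + task head
    # on the target labels, exploiting knowledge shared across modalities.
    params = (list(embedder.parameters()) + list(pretrained.parameters())
              + list(head.parameters()))
    opt = torch.optim.Adam(params, lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(refine_epochs):
        for x, y in target_loader:
            loss = loss_fn(head(pretrained(embedder(x))), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return embedder
```

Note how the sketch mirrors the abstract's two stages: stage 1 updates only the embedding network against a distribution-matching loss, and only stage 2 fine-tunes the full stack on target labels.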
Main Authors: | Shen, Junhong; Li, Liam; Dery, Lucio M.; Staten, Corey; Khodak, Mikhail; Neubig, Graham; Talwalkar, Ameet |
---|---|
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | arXiv, 2023 |
Subjects: | Machine Learning (cs.LG); FOS: Computer and information sciences |
License: | arXiv.org perpetual, non-exclusive license (http://arxiv.org/licenses/nonexclusive-distrib/1.0/) |
Online Access: | https://dx.doi.org/10.48550/arxiv.2302.05738 https://arxiv.org/abs/2302.05738 |