Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models

Bibliographic Details
Main Authors: Merrick, Luke; Xu, Danmei; Nuti, Gaurav; Campos, Daniel
Format: Article in Journal/Newspaper
Language: unknown
Published: arXiv 2024
Subjects:
Online Access: https://dx.doi.org/10.48550/arxiv.2405.05374
https://arxiv.org/abs/2405.05374
Description
Summary: This report describes the training dataset creation and recipe behind the family of \texttt{arctic-embed} text embedding models (a set of five models ranging from 22 to 334 million parameters, with weights open-sourced under an Apache-2 license). At the time of their release, each model achieved state-of-the-art retrieval accuracy for models of its size on the MTEB Retrieval leaderboard, with the largest model, arctic-embed-l, outperforming closed-source embedding models such as Cohere's embed-v3 and OpenAI's text-embed-3-large. In addition to the details of our training recipe, we provide several informative ablation studies, which we believe account for our models' performance. (17 pages, 11 figures, 9 tables)
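
For orientation only, the short sketch below shows one way an open-weight embedding model like those described above might be used for retrieval with the sentence-transformers library. It is a minimal illustration, not taken from this record: the Hugging Face model ID "Snowflake/snowflake-arctic-embed-l" and the query-prefix convention are assumptions.

    # Hedged sketch (not from the record): scoring documents against a query with an
    # open-weight arctic-embed checkpoint via sentence-transformers.
    # Assumed: the model ID below and the query prefix are illustrative choices.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")  # assumed model ID

    # Retrieval-tuned biencoders often prepend a prompt to queries only (assumed here).
    query_prefix = "Represent this sentence for searching relevant passages: "
    queries = [query_prefix + "how are text embedding models trained?"]
    documents = [
        "Text embedding models map passages to dense vectors for semantic retrieval.",
        "The report covers dataset creation, the training recipe, and ablation studies.",
    ]

    # L2-normalized embeddings let us score by dot product (equivalent to cosine similarity).
    query_emb = model.encode(queries, normalize_embeddings=True)
    doc_emb = model.encode(documents, normalize_embeddings=True)
    scores = query_emb @ doc_emb.T  # shape: (num_queries, num_documents)
    print(scores)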