Aligning language models to professional domains using preference training
Recent research has shown that preference training methods, such as reinforcement learning from human feedback (RLHF) or direct preference optimization (DPO), can significantly improve the alignment of models with user intent and linguistic requirements. With these training methods, smaller models have also been shown to produce solutions that are preferred over those of larger models that have not been aligned. Implementing preference training requires domain-specific data in which humans rank generated outputs by preference, a process that can be both costly and time-consuming. However, by assuming that a model instruction fine-tuned on labelled data will not outperform a human domain expert, a pairwise comparison dataset can be created from the model's output and the human-generated label, thereby simplifying the training process. These approaches were applied to domain-specific datasets created by collecting court rulings from the Supreme Court of Iceland, along with summaries of those rulings. Models were then trained to perform the downstream task of generating summaries of court rulings, challenging their ability to create comprehensive, legally sound texts in Icelandic. Preliminary results suggest that by performing preference training with data created in this way, a model can improve its generative output beyond the capabilities gained from supervised fine-tuning alone. Further research is needed to obtain more conclusive results on the potential performance gains of preference training for domain-specific downstream tasks.
Main Author: | Þórir Hrafn Harðarson 1981- |
---|---|
Other Authors: | Háskólinn í Reykjavík |
Format: | Master Thesis |
Language: | English |
Published: | 2024 |
Subjects: | Tölvunarfræði (Computer science); Meistaraprófsritgerðir (Master's theses) |
Collection: | Skemman (Iceland) |
Online Access: | http://hdl.handle.net/1946/47687 |
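The pairwise comparison construction described in the abstract can be sketched in a few lines: the human-written summary is always treated as the preferred response and the fine-tuned model's summary as the dispreferred one. The example below is only an illustration under that assumption, not the thesis's actual pipeline; `build_preference_pairs`, `generate_summary`, the prompt wording, and the toy records are hypothetical, and the prompt/chosen/rejected layout is one common format accepted by DPO-style trainers.

```python
# Minimal sketch: build a pairwise preference dataset from (ruling, human summary)
# pairs plus the outputs of an instruction fine-tuned model, assuming the human
# expert's summary is always preferred over the model's.
import json
from typing import Callable


def build_preference_pairs(
    rulings: list[dict],                     # each dict: {"text": ..., "human_summary": ...}
    generate_summary: Callable[[str], str],  # instruction fine-tuned model (assumed weaker than the expert)
) -> list[dict]:
    """Create prompt/chosen/rejected triples in the format DPO-style trainers expect."""
    pairs = []
    for ruling in rulings:
        # Hypothetical Icelandic prompt: "Summarize the following ruling:"
        prompt = f"Dragðu saman eftirfarandi dóm:\n\n{ruling['text']}"
        pairs.append(
            {
                "prompt": prompt,
                "chosen": ruling["human_summary"],    # expert summary preferred by assumption
                "rejected": generate_summary(prompt), # model output dispreferred by assumption
            }
        )
    return pairs


if __name__ == "__main__":
    # Toy example with a placeholder "model" that simply truncates the input.
    toy_rulings = [{"text": "Dómur Hæstaréttar ...", "human_summary": "Stutt samantekt ..."}]
    pairs = build_preference_pairs(toy_rulings, generate_summary=lambda p: p[:80])
    print(json.dumps(pairs, ensure_ascii=False, indent=2))
```

The resulting list of triples could then be used as a preference dataset for DPO or a similar pairwise objective, avoiding the cost of collecting human rankings over multiple model generations.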