Aligning language models to professional domains using preference training

Full description

Recent research has shown that preference-training methods such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) can significantly improve the alignment of language models with user intent and linguistic requirements. With these methods, smaller models have also been shown to produce outputs that are preferred over those of larger models that have not been aligned. Implementing preference training requires domain-specific data in which humans rank generated outputs by preference, a process that can be both costly and time-consuming. However, by assuming that a model instruction fine-tuned on labelled data will not outperform a human domain expert, a pairwise comparison dataset can be created from the model's output and the human-written label, simplifying the training process. These approaches were applied to domain-specific datasets built by collecting rulings from the Supreme Court of Iceland together with summaries of those rulings. Models were then trained on the downstream task of summarising court rulings, challenging their ability to produce comprehensive, legally sound text in Icelandic. Preliminary results suggest that preference training on data created with this method improves a model's generative output beyond what is gained from supervised fine-tuning alone. Further research is needed to obtain more conclusive results on the potential performance gains of preference training for domain-specific downstream tasks.
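
As an illustration only (not code from the thesis), the following minimal sketch shows how the pairwise comparison data described above could be assembled: each human-written summary is treated as the preferred ("chosen") completion and the fine-tuned model's own draft as the dispreferred ("rejected") one. The field names follow common DPO tooling conventions (e.g. Hugging Face TRL) and, like the helper names and prompt template, are assumptions rather than details taken from the thesis.

```python
# Illustrative sketch only: field names, helper names, and the prompt template are
# assumptions, not taken from the thesis. The pairing rule mirrors the abstract's
# assumption that the instruction fine-tuned model does not outperform the human
# domain expert, so the human summary is always the preferred completion.
from dataclasses import dataclass


@dataclass
class RulingExample:
    ruling_text: str     # full court ruling used as the prompt/source document
    human_summary: str   # reference summary written by a domain expert
    model_summary: str   # summary generated by the instruction fine-tuned (SFT) model


def build_preference_pairs(examples: list[RulingExample]) -> list[dict]:
    """Convert (ruling, human summary, model summary) triples into DPO-style
    preference records: human summary as 'chosen', model draft as 'rejected'."""
    pairs = []
    for ex in examples:
        pairs.append({
            "prompt": f"Summarise the following court ruling:\n\n{ex.ruling_text}",
            "chosen": ex.human_summary,    # preferred completion
            "rejected": ex.model_summary,  # dispreferred completion
        })
    return pairs


if __name__ == "__main__":
    demo = [RulingExample("…ruling text…", "…expert summary…", "…model draft…")]
    print(build_preference_pairs(demo)[0].keys())  # dict_keys(['prompt', 'chosen', 'rejected'])
```

A dataset in this shape could then be fed to a DPO implementation as the preference-training stage that follows the usual supervised fine-tuning step.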

Bibliographic Details
Main Author: Þórir Hrafn Harðarson, 1981-
Other Authors: Háskólinn í Reykjavík (Reykjavik University)
Format: Master's Thesis
Language: English
Published: 2024
Subjects: Tölvunarfræði (Computer science); Meistaraprófsritgerðir (Master's theses)
Online Access: http://hdl.handle.net/1946/47687