Aligning language models to professional domains using preference training
Recent research has shown that preference training methods, such as reinforcement learning from human feedback (RLHF) or direct preference optimization (DPO), can significantly improve the alignment of models with user intent and linguistic requirements. With these training methods, smaller models have also been shown to produce solutions that are preferred over those of larger models that have not been aligned. Implementing preference training requires domain-specific data in which humans rank generated outputs by preference, a process that can be both costly and time-consuming. However, by assuming that a model instruction fine-tuned on labelled data will not outperform a human domain expert, a pairwise comparison dataset can be created from the model's output and the human-generated label, thereby simplifying the training process. These approaches were applied to domain-specific datasets created by collecting court rulings from the Supreme Court of Iceland, along with summaries of those rulings. Models were then trained to perform the downstream task of generating summaries of court rulings, challenging their ability to create comprehensive, legally sound texts in Icelandic. Preliminary results suggest that by performing preference training with data created in this way, a model can improve its generative output beyond the capabilities gained from supervised fine-tuning alone. Further research is needed to obtain more conclusive results on the potential performance gains of preference training for domain-specific downstream tasks.
Main Author: | Þórir Hrafn Harðarson 1981- |
---|---|
Other Authors: | Háskólinn í Reykjavík |
Format: | Master Thesis |
Language: | English |
Published: | 2024 |
Subjects: | Tölvunarfræði (Computer science); Meistaraprófsritgerðir (Master's theses) |
Collection: | Skemman (Iceland) |
Online Access: | http://hdl.handle.net/1946/47687 |
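The pairwise comparison construction described in the abstract can be sketched in a few lines: the human-written summary is always treated as the preferred response and the fine-tuned model's summary as the dispreferred one. The example below is only an illustration under that assumption, not the thesis's actual pipeline; `build_preference_pairs`, `generate_summary`, the prompt wording, and the toy records are hypothetical, and the prompt/chosen/rejected layout is one common format accepted by DPO-style trainers.

```python
# Minimal sketch: build a pairwise preference dataset from (ruling, human summary)
# pairs plus the outputs of an instruction fine-tuned model, assuming the human
# expert's summary is always preferred over the model's.
import json
from typing import Callable


def build_preference_pairs(
    rulings: list[dict],                     # each dict: {"text": ..., "human_summary": ...}
    generate_summary: Callable[[str], str],  # instruction fine-tuned model (assumed weaker than the expert)
) -> list[dict]:
    """Create prompt/chosen/rejected triples in the format DPO-style trainers expect."""
    pairs = []
    for ruling in rulings:
        # Hypothetical Icelandic prompt: "Summarize the following ruling:"
        prompt = f"Dragðu saman eftirfarandi dóm:\n\n{ruling['text']}"
        pairs.append(
            {
                "prompt": prompt,
                "chosen": ruling["human_summary"],    # expert summary preferred by assumption
                "rejected": generate_summary(prompt), # model output dispreferred by assumption
            }
        )
    return pairs


if __name__ == "__main__":
    # Toy example with a placeholder "model" that simply truncates the input.
    toy_rulings = [{"text": "Dómur Hæstaréttar ...", "human_summary": "Stutt samantekt ..."}]
    pairs = build_preference_pairs(toy_rulings, generate_summary=lambda p: p[:80])
    print(json.dumps(pairs, ensure_ascii=False, indent=2))
```

The resulting list of triples could then be used as a preference dataset for DPO or a similar pairwise objective, avoiding the cost of collecting human rankings over multiple model generations.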