Representation learning for domain adaptation and cross-modal retrieval

Bibliographic Details
Main Author: Ott, Felix
Format: Thesis
Language: unknown
Published: Ludwig-Maximilians-Universität München 2023
Subjects:
DML
Online Access:https://edoc.ub.uni-muenchen.de/32467/1/Ott_Felix.pdf
http://nbn-resolving.de/urn:nbn:de:bvb:19-324674
Description
Summary: Most machine learning applications involve a domain shift between the data on which a model was initially trained and data from a similar but different domain to which the model is later applied. Applications range from human-computer interaction (e.g., humans with different characteristics for speech or handwriting recognition) and computer vision (e.g., a change of weather conditions or of objects in the environment for visual self-localization) to natural language processing (e.g., switching between different languages). A related field is cross-modal retrieval, which aims to efficiently extract information from various modalities. In this field, the data can vary between modalities, and such variations can negatively impact the performance of the model. To reduce the impact of domain shift, methods search for an optimal transformation from the source to the target domain, or for an optimal alignment of modalities, in order to learn a domain-invariant representation that is unaffected by domain differences. Aligning features from data sources affected by domain shift requires representation learning techniques. These techniques learn a meaningful representation that can be interpreted, or that includes latent features, through the use of deep metric learning (DML). DML minimizes the distance between features using the standard Euclidean loss, maximizes the similarity of features through cross-correlation, or decreases the discrepancy of higher-order statistics such as the maximum mean discrepancy. Similar but distinct fields are pairwise learning and contrastive learning, which also employ DML. Contrastive learning not only aligns the features of input pairs that share the same class label, but also increases the distance between pairs with similar but different labels, thus enhancing the training process.
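The losses named in the summary can be illustrated concretely. The following is a minimal numpy sketch (not code from the thesis): a contrastive loss with a Euclidean distance and margin, and a maximum mean discrepancy (MMD) estimate with an RBF kernel; the function names, the `margin` and `gamma` parameters, and the kernel choice are illustrative assumptions.

```python
import numpy as np

def contrastive_loss(x1, x2, same_label, margin=1.0):
    """Contrastive loss on one feature pair: pull same-label pairs
    together, push different-label pairs at least `margin` apart
    in Euclidean distance. (Illustrative sketch, not thesis code.)"""
    d = np.linalg.norm(x1 - x2)
    if same_label:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

def mmd_rbf(X, Y, gamma=1.0):
    """Biased MMD^2 estimate between sample sets X and Y (rows are
    feature vectors), using the RBF kernel k(a,b)=exp(-gamma*||a-b||^2)."""
    def k(A, B):
        # pairwise squared distances via broadcasting
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```

A domain-adaptation training loop would typically add such a term (e.g., the MMD between source-domain and target-domain feature batches) to the task loss, driving the learned representation toward domain invariance.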
This research presents techniques for domain adaptation and cross-modal retrieval that specifically focus on the following two ...