A Generic and Flexible Framework for Selecting Correspondences in Matching and Alignment Problems

Bibliographic Details
Main Author: Duchateau, Fabien
Other Authors: Base de Données (BD), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Markus Helfert, Chiara Francalanci, Joaquim Filipe (eds.)
Format: Conference Object
Language: English
Published: HAL CCSD 2013
Online Access: https://hal.science/hal-01155475
https://hal.science/hal-01155475/document
https://hal.science/hal-01155475/file/duchateau-data13.pdf
Description
Audience: International
Summary: The Web 2.0 and the low cost of storage have driven exponential growth in the volume of collected and produced data. However, the integration of distributed and heterogeneous data sources has become the bottleneck for many applications, and therefore still largely relies on manual tasks. One of these tasks, called matching or alignment, is the discovery of correspondences, i.e., semantically equivalent elements in different data sources. Most approaches that attempt to solve this challenge face the issue of deciding whether a pair of elements is a correspondence or not, given the similarity value(s) computed for this pair. In this paper, we propose a generic and flexible framework for selecting correspondences by relying on the discriminative similarity values for a pair. Experiments on a public dataset demonstrate an improvement in terms of quality, as well as robustness when new similarity measures are added, with no user intervention required for tuning.