Facing the identification problem in language-related scientific data analysis

International audience This paper describes the problems that must be addressed when studying large amounts of data over time which require entity normalization applied not to the usual genres of news or political speech, but to the genre of academic discourse about language resources, technologies...

Full description

Bibliographic Details
Main Authors:	Mariani, Joseph, J, Cieri, Christopher, Francopoulo, Gil, Paroubek, Patrick, Delaborde, Marine
Other Authors:	Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris-Sud - Paris 11 (UP11)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Université Paris Saclay (COmUE)
Format:	Conference Object
Language:	English
Published:	HAL CCSD 2014
Subjects:	[INFO]Computer Science [cs] [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] Iceland
Online Access:	https://hal.science/hal-01840821

Description
Summary:	International audience This paper describes the problems that must be addressed when studying large amounts of data over time which require entity normalization applied not to the usual genres of news or political speech, but to the genre of academic discourse about language resources, technologies and sciences. It reports on the normalization processes that had to be applied to produce data usable for computing statistics in three past studies on the LRE Map, the ISCA Archive and the LDC Bibliography. It shows the need for human expertise during normalization and the necessity to adapt the work to the study objectives. It investigates possible improvements for reducing the workload necessary to produce comparable results. Through this paper, we show the necessity to define and agree on international persistent and unique identifiers.

Facing the identification problem in language-related scientific data analysis

Similar Items