Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports

International audience. Unsupervised word classes induced from unannotated text corpora are increasingly used to help tasks addressed by supervised classification, such as standard named entity detection. This paper studies the contribution of unsupervised word classes to a medical entity detection t...

Bibliographic Details
Main Authors: Chatzimina, Maria Evangelia, Grouin, Cyril, Zweigenbaum, Pierre
Other Authors: Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11)
Format: Conference Object
Language: English
Published: HAL CCSD 2014
Subjects:
Online Access: https://hal.archives-ouvertes.fr/hal-01831242
https://hal.archives-ouvertes.fr/hal-01831242/document
https://hal.archives-ouvertes.fr/hal-01831242/file/389_Paper.pdf
id ftccsdartic:oai:HAL:hal-01831242v1
record_format openpolar
institution Open Polar
collection Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe)
op_collection_id ftccsdartic
language English
topic Clinical Texts
Natural Language Processing
Unsupervised Word Classes
[INFO]Computer Science [cs]
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
spellingShingle Clinical Texts
Natural Language Processing
Unsupervised Word Classes
[INFO]Computer Science [cs]
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
Chatzimina, Maria Evangelia
Grouin, Cyril
Zweigenbaum, Pierre
Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports
topic_facet Clinical Texts
Natural Language Processing
Unsupervised Word Classes
[INFO]Computer Science [cs]
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
description International audience. Unsupervised word classes induced from unannotated text corpora are increasingly used to help tasks addressed by supervised classification, such as standard named entity detection. This paper studies the contribution of unsupervised word classes to a medical entity detection task with two specific objectives: How do unsupervised word classes compare to available knowledge-based semantic classes? Does syntactic information help produce unsupervised word classes with better properties? We design and test two syntax-based methods to produce word classes: one applies the Brown clustering algorithm to syntactic dependencies, the other collects latent categories created by a PCFG-LA parser. When added to non-semantic features, knowledge-based semantic classes gain 7.28 points of F-measure. In the same context, basic unsupervised word classes gain 4.16pt, reaching 60% of the contribution of knowledge-based semantic classes and outperforming Wikipedia; adding PCFG-LA unsupervised word classes gains about one more point, for 5.11pt, reaching 70%. Unsupervised word classes could therefore provide a useful semantic back-off in domains where no knowledge-based semantic classes are available. The combination of both knowledge-based and basic unsupervised classes gains 8.33pt. Unsupervised classes therefore remain useful even when rich knowledge-based classes exist.
author2 Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919)
Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11)
format Conference Object
author Chatzimina, Maria Evangelia
Grouin, Cyril
Zweigenbaum, Pierre
author_facet Chatzimina, Maria Evangelia
Grouin, Cyril
Zweigenbaum, Pierre
author_sort Chatzimina, Maria Evangelia
title Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports
title_short Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports
title_full Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports
title_fullStr Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports
title_full_unstemmed Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports
title_sort use of unsupervised word classes for entity recognition: application to the detection of disorders in clinical reports
publisher HAL CCSD
publishDate 2014
url https://hal.archives-ouvertes.fr/hal-01831242
https://hal.archives-ouvertes.fr/hal-01831242/document
https://hal.archives-ouvertes.fr/hal-01831242/file/389_Paper.pdf
op_coverage Reykjavik, Iceland
genre Iceland
genre_facet Iceland
op_source International Conference on Language Resources and Evaluation
https://hal.archives-ouvertes.fr/hal-01831242
International Conference on Language Resources and Evaluation, Jan 2014, Reykjavik, Iceland
op_relation hal-01831242
https://hal.archives-ouvertes.fr/hal-01831242
https://hal.archives-ouvertes.fr/hal-01831242/document
https://hal.archives-ouvertes.fr/hal-01831242/file/389_Paper.pdf
op_rights info:eu-repo/semantics/OpenAccess
_version_ 1766040961453916160
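
The percentages quoted in the description field can be checked with a little arithmetic. The sketch below is not taken from the paper: the dictionary keys and the helper name relative_contribution are hypothetical, and it assumes that "contribution" means each F-measure gain divided by the gain obtained with knowledge-based semantic classes.

```python
# F-measure gains (in points) over non-semantic features, as quoted in the abstract.
# The key names are illustrative labels, not identifiers from the paper.
GAINS = {
    "knowledge_based": 7.28,
    "basic_unsupervised": 4.16,
    "pcfg_la_unsupervised": 5.11,
    "knowledge_based_plus_basic_unsupervised": 8.33,
}


def relative_contribution(gain, reference=GAINS["knowledge_based"]):
    """Express a gain as a percentage of the reference gain (assumed reading)."""
    return 100.0 * gain / reference


if __name__ == "__main__":
    for name, gain in GAINS.items():
        share = relative_contribution(gain)
        print(f"{name}: {gain:.2f}pt = {share:.1f}% of the knowledge-based gain")
```

Under this reading the ratios come out at roughly 57.1% and 70.2%, which is consistent with the rounded 60% and 70% given in the abstract, and the combined configuration sits at about 114.4% of the knowledge-based gain alone.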