Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports
International audience. Unsupervised word classes induced from unannotated text corpora are increasingly used to help tasks addressed by supervised classification, such as standard named entity detection. This paper studies the contribution of unsupervised word classes to a medical entity detection t...
Main Authors: | Chatzimina, Maria Evangelia; Grouin, Cyril; Zweigenbaum, Pierre |
---|---|
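Other Authors: | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI); Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919); Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11) |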
Format: | Conference Object |
Language: | English |
Published: | HAL CCSD, 2014 |
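Subjects: | Clinical Texts; Natural Language Processing; Unsupervised Word Classes; [INFO]Computer Science [cs]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] |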
Online Access: | https://hal.archives-ouvertes.fr/hal-01831242 https://hal.archives-ouvertes.fr/hal-01831242/document https://hal.archives-ouvertes.fr/hal-01831242/file/389_Paper.pdf |
id | ftccsdartic:oai:HAL:hal-01831242v1 |
---|---|
record_format | openpolar |
institution | Open Polar |
collection | Archive ouverte HAL (Hyper Article en Ligne, CCSD - Centre pour la Communication Scientifique Directe) |
op_collection_id | ftccsdartic |
language | English |
topic | Clinical Texts; Natural Language Processing; Unsupervised Word Classes; [INFO]Computer Science [cs]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] |
description | International audience. Unsupervised word classes induced from unannotated text corpora are increasingly used to help tasks addressed by supervised classification, such as standard named entity detection. This paper studies the contribution of unsupervised word classes to a medical entity detection task with two specific objectives: How do unsupervised word classes compare to available knowledge-based semantic classes? Does syntactic information help produce unsupervised word classes with better properties? We design and test two syntax-based methods to produce word classes: one applies the Brown clustering algorithm to syntactic dependencies, the other collects latent categories created by a PCFG-LA parser. When added to non-semantic features, knowledge-based semantic classes gain 7.28 points of F-measure. In the same context, basic unsupervised word classes gain 4.16pt, reaching roughly 60% of the contribution of knowledge-based semantic classes and outperforming features derived from Wikipedia; adding PCFG-LA unsupervised word classes gains about one more point, for a total of 5.11pt, reaching 70%. Unsupervised word classes could therefore provide a useful semantic back-off in domains where no knowledge-based semantic classes are available. The combination of both knowledge-based and basic unsupervised classes gains 8.33pt, more than the 7.28pt of knowledge-based classes alone. Therefore, unsupervised classes are still useful even when rich knowledge-based classes exist. |
author2 | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI); Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919); Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11) |
format | Conference Object |
author | Chatzimina, Maria Evangelia; Grouin, Cyril; Zweigenbaum, Pierre |
title | Use of unsupervised word classes for entity recognition: Application to the detection of disorders in clinical reports |
publisher | HAL CCSD |
publishDate | 2014 |
url | https://hal.archives-ouvertes.fr/hal-01831242 https://hal.archives-ouvertes.fr/hal-01831242/document https://hal.archives-ouvertes.fr/hal-01831242/file/389_Paper.pdf |
op_coverage | Reykjavik, Iceland |
genre | Iceland |
op_source | International Conference on Language Resources and Evaluation, Jan 2014, Reykjavik, Iceland; https://hal.archives-ouvertes.fr/hal-01831242 |
op_relation | hal-01831242 https://hal.archives-ouvertes.fr/hal-01831242 https://hal.archives-ouvertes.fr/hal-01831242/document https://hal.archives-ouvertes.fr/hal-01831242/file/389_Paper.pdf |
op_rights | info:eu-repo/semantics/OpenAccess |
_version_ | 1766040961453916160 |
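
The abstract states that one method applies the Brown clustering algorithm to syntactic dependencies, but this record does not say how. The block below is a minimal, naive sketch that assumes the dependency variant simply replaces adjacent-word bigrams with (head, dependent) pair counts and keeps Brown's greedy objective (merge the two classes whose union loses the least average mutual information). The function names, the toy pairs and the flat class output are hypothetical illustrations, not the authors' implementation. Practical Brown clustering tools also record the merge history as a binary path per word, and taggers often use prefixes of those paths as features; this sketch returns only flat classes.

```python
import math
from collections import Counter
from itertools import combinations


def average_mutual_information(pair_counts, assign):
    """AMI of the class-level model induced by `assign` (word -> class id)
    over (head, dependent) pair counts."""
    class_pairs = Counter()
    for (head, dep), n in pair_counts.items():
        class_pairs[(assign[head], assign[dep])] += n
    total = sum(class_pairs.values())
    left, right = Counter(), Counter()
    for (a, b), n in class_pairs.items():
        left[a] += n   # class frequency in head position
        right[b] += n  # class frequency in dependent position
    ami = 0.0
    for (a, b), n in class_pairs.items():
        p_ab = n / total
        ami += p_ab * math.log(p_ab / ((left[a] / total) * (right[b] / total)))
    return ami


def merge(assign, keep, absorb):
    """Return a new assignment where every word of class `absorb` moves to `keep`."""
    return {w: (keep if c == absorb else c) for w, c in assign.items()}


def brown_clusters(pair_counts, num_classes):
    """Greedy agglomerative clustering in the spirit of Brown et al.:
    start with one class per word, then repeatedly apply the merge that
    leaves the highest AMI (i.e. loses the least) until `num_classes` remain.
    Every candidate merge is rescored from scratch, so use toy data only."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    words = sorted({w for pair in pair_counts for w in pair})
    assign = {w: i for i, w in enumerate(words)}
    while len(set(assign.values())) > num_classes:
        best_score, best_pair = None, None
        for a, b in combinations(sorted(set(assign.values())), 2):
            score = average_mutual_information(pair_counts, merge(assign, a, b))
            if best_score is None or score > best_score:
                best_score, best_pair = score, (a, b)
        assign = merge(assign, *best_pair)
    clusters = {}
    for w, c in assign.items():
        clusters.setdefault(c, []).append(w)
    return sorted(sorted(ws) for ws in clusters.values())


if __name__ == "__main__":
    # Hypothetical (head, dependent) counts, as a dependency parser might
    # produce them from clinical sentences such as "patient denies fever".
    pairs = Counter({
        ("denies", "pain"): 2, ("denies", "fever"): 1,
        ("reports", "pain"): 1, ("reports", "fever"): 2,
        ("takes", "aspirin"): 2, ("takes", "insulin"): 1,
        ("started", "aspirin"): 1, ("started", "insulin"): 2,
    })
    for cluster in brown_clusters(pairs, 4):
        print(cluster)
```

Swapping the (head, dependent) counts for (previous word, next word) counts turns the same code back into the ordinary sequential Brown objective, which is the design point the abstract contrasts with its syntax-based variant.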