Aligning and Using an English-Inuktitut Parallel Corpus

A parallel corpus of texts in English and in Inuktitut, an Inuit language, is presented. These texts are from the Nunavut Hansards. The parallel texts are processed in two phases, the sentence alignment phase and the word correspondence phase. Our sentence alignment technique achieves a precision of...


Bibliographic Details
Published in: Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond
Main Authors: Martin, Joel, Johnson, Howard, Farley, Benoît, Maclachlan, Anna
Format: Article in Journal/Newspaper
Language: English
Published: 2003
Subjects:
Online Access: https://doi.org/10.3115/1118905.1118925
https://nrc-publications.canada.ca/eng/view/object/?id=bce8df0d-20c8-4b42-a200-223ed4fb92b3
https://nrc-publications.canada.ca/fra/voir/objet/?id=bce8df0d-20c8-4b42-a200-223ed4fb92b3
id ftnrccanada:oai:cisti-icist.nrc-cnrc.ca:cistinparc:5765030
record_format openpolar
institution Open Polar
collection National Research Council Canada: NRC Publications Archive
op_collection_id ftnrccanada
language English
description A parallel corpus of texts in English and in Inuktitut, an Inuit language, is presented. These texts are from the Nunavut Hansards. The parallel texts are processed in two phases, the sentence alignment phase and the word correspondence phase. Our sentence alignment technique achieves a precision of 91.4% and a recall of 92.3%. Our word correspondence technique is aimed at providing the broadest coverage collection of reliable pairs of Inuktitut and English morphemes for dictionary expansion. For an agglutinative language like Inuktitut, this entails considering substrings, not simply whole words. We employ a Pointwise Mutual Information method (PMI) and attain a coverage of 72.3% of English words and a precision of 87%. NRC publication: Yes
format Article in Journal/Newspaper
author Martin, Joel
Johnson, Howard
Farley, Benoît
Maclachlan, Anna
title Aligning and Using an English-Inuktitut Parallel Corpus
publishDate 2003
url https://doi.org/10.3115/1118905.1118925
https://nrc-publications.canada.ca/eng/view/object/?id=bce8df0d-20c8-4b42-a200-223ed4fb92b3
https://nrc-publications.canada.ca/fra/voir/objet/?id=bce8df0d-20c8-4b42-a200-223ed4fb92b3
geographic Nunavut
geographic_facet Nunavut
genre inuit
inuktitut
Nunavut
genre_facet inuit
inuktitut
Nunavut
op_relation HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond, HLT-NAACL-PARALLEL '03 : Human Language Technology and North American Chapter of Association of Computational Linguistics 2003, May 27 - June 1, 2003., Volume: 3, Publication date: 2003, Pages: 115–118
doi:10.3115/1118905.1118925
op_doi https://doi.org/10.3115/1118905.1118925
container_title Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond
container_volume 3
container_start_page 115
op_container_end_page 118
_version_ 1766046057085534208