Critical Survey of the Freely Available Arabic Corpora

The availability of corpora is a major factor in building natural language processing applications. However, the costs of acquiring corpora can prevent some researchers from going further in their endeavours. Easy access to freely available corpora is urgently needed in the NLP research communi...


Bibliographic Details
Main Author: Zaghouani, Wajdi
Format: Report
Language: unknown
Published: arXiv 2017
Subjects:
Online Access: https://dx.doi.org/10.48550/arxiv.1702.07835
https://arxiv.org/abs/1702.07835
id ftdatacite:10.48550/arxiv.1702.07835
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
topic Computation and Language cs.CL
FOS Computer and information sciences
topic_facet Computation and Language cs.CL
FOS Computer and information sciences
description The availability of corpora is a major factor in building natural language processing applications. However, the costs of acquiring corpora can prevent some researchers from going further in their endeavours. Easy access to freely available corpora is urgently needed in the NLP research community, especially for languages such as Arabic. Currently, there is no easy way to access a comprehensive and updated list of freely available Arabic corpora. In this paper, we present the results of a recent survey conducted to identify the freely available Arabic corpora and language resources. Our preliminary results showed an initial list of 66 sources. We present our findings in the various categories studied and provide direct links to the data where possible. Published in the Proceedings of the International Conference on Language Resources and Evaluation (LREC'2014), OSACT Workshop. Reykjavik, Iceland, 26-31 May 2014.
format Report
author Zaghouani, Wajdi
author_facet Zaghouani, Wajdi
author_sort Zaghouani, Wajdi
title Critical Survey of the Freely Available Arabic Corpora
title_short Critical Survey of the Freely Available Arabic Corpora
title_full Critical Survey of the Freely Available Arabic Corpora
title_fullStr Critical Survey of the Freely Available Arabic Corpora
title_full_unstemmed Critical Survey of the Freely Available Arabic Corpora
title_sort critical survey of the freely available arabic corpora
publisher arXiv
publishDate 2017
url https://dx.doi.org/10.48550/arxiv.1702.07835
https://arxiv.org/abs/1702.07835
genre Iceland
genre_facet Iceland
op_rights Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
cc-by-4.0
op_rightsnorm CC-BY
op_doi https://doi.org/10.48550/arxiv.1702.07835
_version_ 1766039695482945536