Compilation of an Arabic children’s corpus

Al-Sulaiti, Latifa, Abbas, Noorhan, Brierley, Claire, Atwell, Eric and Alghamdi, Ayman (2016) Compilation of an Arabic children’s corpus. In: Calzolari, Nicoletta, Choukri, Khalid, Declerck, Thierry, Goggi, Sara, Grobelnik, Marko, Maegaard, Bente, Mariani, Joseph, Mazo, Hélène, Moreno, Asunción, Odijk, Jan and Piperidis, Stelios (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 23-28 May 2016, Portorož, Slovenia. European Language Resources Association (ELRA), Paris, France, pp. 1808-1812.

PDF - Published Version
Available under License CC BY-NC

Inspired by the Oxford Children's Corpus, we have developed a prototype corpus of Arabic texts written and/or selected for children. Our Arabic Children's Corpus of 2950 documents and nearly 2 million words was collected manually from the web during a 3-month project. It is of high quality and contains a range of children's genres, reflecting the sources located, including classic tales from The Arabian Nights and popular fictional characters such as Goha. We anticipate that the current and subsequent versions of our corpus will lead to interesting studies in text classification, language use, and ideology in children's texts.

Item Type: Book Section
Publisher: European Language Resources Association (ELRA)
ISBN: 9782951740891
Departments: Academic Departments > Business, Law, Policing & Social Sciences (BLPSS) > Policing, Criminology & Social Sciences
Depositing User: Anna Lupton
Date Deposited: 10 Aug 2018 11:16
Last Modified: 12 Jan 2024 15:02

