CWALM – A lexical corpus-based model of Contemporary Written Arabic

CWALM – A lexical corpus-based model of Contemporary Written Arabic is a project that aims to create a lexicographic resource for Contemporary Written Arabic (CWA). The resource draws on materials whose features are attested in real-world written Arabic texts, regardless of any preliminary classification based on their linguistic nature.

CWALM provided the scientific community of Social Sciences and Humanities with:

a new theoretical approach that overcomes the traditional description of the Arabic linguistic system in terms of diglossia and interprets Arabic as a linguistic complex;

a final test model that aims to be the first large-scale validated CWA resource, providing objective and substantial data to test competing theories on the linguistic status of the Arabic language and to demonstrate that the model can be extended to cover CWA completely;

a lexicographic resource that, in the long term, may have a positive social impact on the inclusion of Arabic-speaking communities and play a crucial role in fostering social dialogue with and within Arabic-speaking minorities in Italy.

CWALM was co-funded by the Italian Ministry of University and Research under the PRIN 2020 Funding Programme. The participating institutions were Roma Tre University, the Institute for Computational Linguistics “A. Zampolli” of the National Research Council of Italy, and the Free University of Languages and Communication IULM.

CWALM started on 1 June 2022 and ended on 31 May 2025.