WP1 – Project Management


CNR-ILC, IULM University


  • to ensure the effective planning, implementation, coordination and achievement of the project activities, including the timely production of deliverables, the successful completion of the tasks and their evaluation
  • to provide the project structure and support to assist decision making and internal and external communications, encourage greater accountability and control and minimize risks


  • T1.1 – Planning
    Detailed project planning and scheduling, including tasks, responsibilities and timescales.
  • T1.2 – Management Activities
    General project management activities: establishment of the project communications protocol and management of internal and external communication strategies; adaptation and review of the Work Plan in line with the monitoring of feedback and in liaison with the other partners; maintenance of up-to-date financial records at the project level.
  • T1.3 – Reporting
    Establishment of the reporting structure; collection and consolidation of the project deliverables; verification of the expenditure documentation justifying the cost claims of each project participant and of the project consortium as a whole; provision of an overview of the progress of the project and of its results.


  • D1.1 – Periodic Activity Report (M12, M24, M36)
    This report, to be updated every 12 months, will include: activities carried out by the project consortium; progress against the project objectives, milestones and deliverables; problems encountered and corrective actions taken; update of the plan for the use and dissemination of knowledge.
  • D1.2 – Periodic Management Report (M12, M24, M36)
    This report, to be updated every 12 months, will include: justification of the resources used within the project consortium; costs claimed by the project participants; distribution of the financial contribution among the project participants.

WP2 – Collection of the Corpus and Transcription and Annotation of the Subcorpus

Roma 3

CNR-ILC, IULM University


  • actual population of the designed model with corpus and lexical data
    (Roma 3 and IULM University will deal with Eastern and Western colloquial varieties data respectively, and jointly with Egyptian data; CNR-ILC will deal with semi-automatic annotation)


  • T2.1 – Transcription of the Subcorpus (Roma 3, IULM University)
    Definition of specific pre-analytical criteria to ensure a reliable collection process that yields a representative CWA corpus; transcription of a subcorpus containing the most frequent lexical items.
  • T2.2 – Semi-automatic Annotation of the Subcorpus (CNR-ILC)
    The subcorpus will be annotated with software tools and revised on the basis of the warnings raised during detection; a randomly selected subset of the subcorpus will be manually corrected (gold standard), and the corrections will be automatically projected onto the rest of the subcorpus (silver standard); T2.2 will overlap with T4.3 so that the annotation process can receive feedback from its revision.
  • T2.3 – Revision of Annotation (Roma 3, CNR-ILC, IULM University)
    The semi-automatic annotation will be revised in order to reduce the error rate and establish good practices to be fed back into the semi-automatic process; revision for specific local varieties requiring external expertise will be performed in collaboration with ISMEO (International Association for Studies on the Mediterranean and the Orient).
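The random selection of the gold-standard subset described in T2.2 could be sketched as follows. This is only an illustration under stated assumptions: the function name `split_gold_silver`, the 10% default fraction, the fixed seed and the unit identifiers are all hypothetical and are not specified in the work plan.

```python
import random


def split_gold_silver(unit_ids, gold_fraction=0.1, seed=42):
    """Split annotation units into a gold subset and a silver remainder.

    gold   -> units randomly selected for manual correction
    silver -> remaining units, which would receive projected corrections
    The fraction and the seed are illustrative assumptions.
    """
    if not 0 < gold_fraction <= 1:
        raise ValueError("gold_fraction must be in (0, 1]")
    unit_ids = list(unit_ids)
    if not unit_ids:
        return [], []
    rng = random.Random(seed)  # a fixed seed keeps the selection reproducible
    gold_size = max(1, round(len(unit_ids) * gold_fraction))
    gold = set(rng.sample(unit_ids, gold_size))
    silver = [u for u in unit_ids if u not in gold]  # original order preserved
    return sorted(gold), silver


if __name__ == "__main__":
    # Hypothetical identifiers for 50 transcribed units.
    ids = [f"utt{i:03d}" for i in range(50)]
    gold, silver = split_gold_silver(ids, gold_fraction=0.1)
    print(len(gold), "gold units;", len(silver), "silver units")
```

Fixing the seed means the same gold subset can be regenerated later, which may help when the revision in T2.3 needs to be traced back to the units that were manually corrected.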


  • D2.1 – Report on Transcription (M3)
  • D2.2 – Collected Corpus (M12)
  • D2.3 – List of Most Frequent Items (M12)
  • D2.4 – Annotated Subcorpus (M24)
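As an illustration of how a list of most frequent items such as D2.3 might be derived from transcribed material, the following sketch counts token frequencies across texts. It assumes that each text is already available as a list of tokens; the function name, the `top_n` parameter and the sample tokens are hypothetical, and the actual selection criteria are those to be defined by the consortium.

```python
from collections import Counter


def most_frequent_items(transcribed_texts, top_n=100):
    """Return the top_n (item, count) pairs across all texts.

    transcribed_texts: iterable of token lists (assumed input format).
    Ties are broken alphabetically so that the ranking is deterministic.
    """
    counts = Counter()
    for tokens in transcribed_texts:
        counts.update(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical, already tokenized sample texts.
    sample = [["bayt", "kbir", "bayt"], ["kbir", "bayt", "walad"]]
    for item, count in most_frequent_items(sample, top_n=2):
        print(item, count)
```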

WP3 – Model Definition and Tool Development


Roma 3, IULM University


  • construction of a multi-dialectal lexical resource of Arabic
  • setting up of the digital environment, including a Web platform for collaborative annotation of texts, as well as the lexicographic model and tools
  • definition of a theoretical model that avoids an ideological approach to the linguistic analysis of the Arabic-speaking world


  • T3.1 – Development of the Web Platform for Collaborative Annotation of Texts (CNR-ILC, Roma 3, IULM University)
    The project Web platform for collaborative annotation will be ready in its prototype form after the first semester of the project and in its first fully operational version after the first year of the project.
  • T3.2 – Definition of the Lexicographic Model (CNR-ILC, Roma 3, IULM University)
    To represent the lexical data correctly and formally, a lexicographic model will have to be defined; the project consortium has planned to adopt the Lemon model, which will be extended on the basis of the peculiarities of CWA.
  • T3.3 – Customization of the LexO Editor (CNR-ILC)
    LexO, the collaborative Web editor for lexical resources developed by CNR-ILC, will be customized and extended to manage the model produced in T3.2 and to allow users to build up the resource and link each lemma to the text(s) it has been extracted from.
  • T3.4 – Release of the Interface (CNR-ILC)
    An interface for accessing the resource will be developed; the interface will be publicly accessible via the Web and will allow users to browse the resource through advanced textual, semantic and ontological search features.
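The requirement in T3.3 that each lemma be linked to the text(s) it has been extracted from can be pictured with a small data structure. The sketch below is not the Lemon model and not the LexO data format; the class names, fields and identifiers are hypothetical and serve only to show the lemma-to-text link.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Attestation:
    """One occurrence of a lemma in a transcribed text (hypothetical fields)."""
    text_id: str      # identifier of the source text
    token_index: int  # position of the form within that text


@dataclass
class LexicalEntry:
    """A simplified lexical entry for one variety (illustrative only)."""
    lemma: str
    variety: str  # e.g. "Egyptian"; the labels are assumptions
    senses: List[str] = field(default_factory=list)
    attestations: List[Attestation] = field(default_factory=list)

    def link(self, text_id: str, token_index: int) -> Attestation:
        """Link the entry to an occurrence, ignoring exact duplicates."""
        attestation = Attestation(text_id, token_index)
        if attestation not in self.attestations:
            self.attestations.append(attestation)
        return attestation

    def source_texts(self) -> List[str]:
        """Return the sorted identifiers of all texts attesting the lemma."""
        return sorted({a.text_id for a in self.attestations})


if __name__ == "__main__":
    entry = LexicalEntry(lemma="bayt", variety="Egyptian", senses=["house"])
    entry.link("text-002", 3)
    entry.link("text-001", 0)
    print(entry.source_texts())
```

Keeping attestations as separate records, rather than a plain list of text identifiers, leaves room for the position of each occurrence, which a search interface could use to show the lemma in context.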


  • D3.1 – Web Platform for Collaborative Annotation of Texts (M6)
  • D3.2 – Lexicographic Model (M6)
  • D3.3 – First Version of the Termino-ontological Resource (M18)
  • D3.4 – Web Interface for the Browsing of the Resource and Advanced Research Tools (M36)

WP4 – Lemma Selection and Encoding of the Lexicon

Roma 3

CNR-ILC, IULM University


  • definition of criteria for the choice of the lemmas


  • T4.1 – Criteria for the Selection of Lemmas (Roma 3, CNR-ILC, IULM University)
    Definition of the criteria to be adopted in the selection of the lemmas to be included in the resource; Roma 3 and IULM University will cover the Mashreq and Maghreb areas respectively, and will jointly cover the Egyptian variety.
  • T4.2 – Encoding of the Lexicon on a Temporary Database (Roma 3, CNR-ILC, IULM University)
    The chosen lemmata will be encoded in a temporary database whose schema will be able to represent the lexicographic model defined in T3.2; CNR-ILC will support the definition of the database.
  • T4.3 – Encoding of the Lexicon on the Editor LexO (Roma 3, CNR-ILC, IULM University)
    The lemmata encoded in the temporary database will be imported into the first version of the editor (released as D3.2), with which the subsequent lemmata will be encoded; CNR-ILC will provide constant support for the use of the tool, its further customization (if needed), the extension of the model (if needed) and bug-fixing.


  • D4.1 – Report on the Criteria for the Selection of Lemmas (M6)
  • D4.2 – Release of the Lexical Resource (M36)
  • D4.3 – Final Version of the Termino-ontological Resource (M36)

WP5 – Dissemination and Exploitation


CNR-ILC, IULM University


  • traditional dissemination activities involving the spreading of scientific knowledge in printed and online journals (and respecting the Open access requirements), electronic media and other outlets
  • engagement of the community of researchers via Web and social media activities and at face-to-face events, to be implemented through three conferences
  • exploitation activities aimed at helping to maintain, multiply and mainstream the project results and services by developing sustainability scenarios for their continuation


  • T5.1 – Project Web Site Development (CNR-ILC)
    The project Web site will be set up and will contain all institutional information about the project and its partnership, all official documents and deliverables relating to the project, and an area, set up well in advance, for browsing the lexicon as it is enriched.
  • T5.2 – Communication and Engagement (Roma 3)
    The participation of the project in the most significant scientific venues will be planned and executed; two annual workshops will be organized (at M12 and M24) to show the progress and results of the ongoing project and related issues; a final conference will be organized in Rome around the end of the project to present the final results of the project and the opportunities they open up.
  • T5.3 – Exploitation and Sustainability Plan (Roma 3, CNR-ILC, IULM University)
    The activities necessary to maintain, multiply and mainstream the project results and services will be defined, and sustainability scenarios for their continuation will be developed with the involvement of the whole partnership; a mutually approved exploitation agreement will be established by M36, covering the actions planned for the further continuation of the project results and services and for the expansion of the project beyond the term of its funding; the project Web site and the CLARIN B-Centre ILC4CLARIN will be used to guarantee the sharing, accessibility and long-term preservation of the data and tools developed within the project.
