https://github.com/camel-lab/barec-shared-task-2025

Evaluation code and data for the BAREC shared task 2025

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.4%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: CAMeL-Lab
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 57.6 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
README.md

BAREC Shared Task 2025: Arabic Readability Assessment

The BAREC Shared Task 2025 will take place at The Third Arabic Natural Language Processing Conference (ArabicNLP 2025) at EMNLP 2025.

Click here to register for the shared task!

Task Description

The BAREC Shared Task 2025 focuses on fine-grained readability classification across 19 levels using the Balanced Arabic Readability Evaluation Corpus (BAREC), a dataset of over 1 million words. Participants will build models for both sentence- and document-level classification.

Data

Shared Task Tracks

Participants can compete in one or more of the following tracks, each imposing different resource constraints:

  • Strict Track: Models must be trained exclusively on the BAREC Corpus.

  • Constrained Track: Models may use the BAREC Corpus, SAMER Corpus (including document, fragment, and word-level annotations), and the SAMER Lexicon.

  • Open Track: No restrictions on external resources, allowing the use of any publicly available data.

The two sub-tasks (sentence-level and document-level) and three tracks yield six possible combinations. Participants may compete in any number of them.

Evaluation

We define the Readability Assessment task as an ordinal classification task. The following metrics are used for evaluation:

  • Accuracy (Acc19): The percentage of cases where reference and prediction classes match in the 19-level scheme.
  • Accuracy (Acc7, Acc5, Acc3): The percentage of cases where reference and prediction classes match after collapsing the 19 levels into 7, 5, or 3 levels, respectively.
  • Adjacent Accuracy (±1 Acc19): Also known as off-by-1 accuracy. The proportion of predictions that are either exactly correct or off by at most one level in the 19-level scheme.
  • Average Distance (Dist): Also known as Mean Absolute Error (MAE). Measures the average absolute difference between predicted and true labels.
  • Quadratic Weighted Kappa (QWK): An extension of Cohen’s Kappa that measures the agreement between predicted and true labels, applying a quadratic penalty to larger misclassifications (i.e., predictions farther from the true label are penalized more heavily).
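To make the metric definitions above concrete, here is a minimal standard-library sketch. It is not the repository's `scripts/eval.py`, and the function names are hypothetical. Labels are assumed to be integers from 1 to 19. The collapsed 7-, 5-, and 3-level accuracies are left out because the level mapping is not given in this README.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted level equals the reference level."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def adjacent_accuracy(gold, pred, tolerance=1):
    """Fraction of items predicted within `tolerance` levels of the reference."""
    return sum(abs(g - p) <= tolerance for g, p in zip(gold, pred)) / len(gold)


def average_distance(gold, pred):
    """Mean absolute difference between predicted and reference levels (MAE)."""
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def quadratic_weighted_kappa(gold, pred):
    """Cohen's kappa with quadratic weights (i - j)^2.

    The usual 1 / (N - 1)^2 normalisation of the weights cancels between
    the observed and expected terms, so it is omitted here.
    """
    n = len(gold)
    observed = Counter(zip(gold, pred))
    gold_hist = Counter(gold)
    pred_hist = Counter(pred)

    # Weighted disagreement actually observed in the data.
    obs = sum((g - p) ** 2 * count for (g, p), count in observed.items())
    # Weighted disagreement expected if gold and pred were independent.
    exp = sum(
        (g - p) ** 2 * g_count * p_count / n
        for g, g_count in gold_hist.items()
        for p, p_count in pred_hist.items()
    )
    if exp == 0:
        # Degenerate case: both lists contain one and the same level.
        return 1.0
    return 1.0 - obs / exp


# Toy data for illustration only, not taken from BAREC.
gold = [1, 5, 10, 19]
pred = [1, 6, 12, 19]
print("Acc19:", accuracy(gold, pred))
print("±1 Acc19:", adjacent_accuracy(gold, pred))
print("Dist:", average_distance(gold, pred))
print("QWK:", quadratic_weighted_kappa(gold, pred))
```

With scikit-learn installed (it is listed in `requirements.txt`), `sklearn.metrics.cohen_kappa_score(gold, pred, weights="quadratic")` computes the same quantity as the last function.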

We provide instructions on how to run the evaluation script below.

Requirements:

You need conda installed. To set up the environment, run:

```bash
git clone https://github.com/CAMeL-Lab/barec-shared-task-2025.git
cd barec-shared-task-2025

conda create -n barec python=3.9
conda activate barec

pip install -r requirements.txt
```

Running the Evaluation

To evaluate your predictions, use the provided evaluation script. The script requires three arguments:

  • --output: Path to your output CSV file containing predictions.
  • --split: The data split to evaluate on (Dev or Test).
  • --task: The task type (Sent for sentence-level or Doc for document-level readability).

To evaluate your system's output, run:

```bash
python scripts/eval.py --output /path/to/output_csv --split [Dev|Test] --task [Sent|Doc]
```

Example usage:

```bash
python scripts/eval.py --output examples/Dev_Sentence_Level.csv --split Dev --task Sent
```

Output CSV Format

Your output CSV file should have the following columns:

  • For sentence-level tasks (--task Sent):

    • Sentence ID: The unique identifier for each sentence.
    • Prediction: Your predicted readability level for each sentence (integer from 1 to 19).
  • For document-level tasks (--task Doc):

    • Document ID: The unique identifier for each document.
    • Prediction: Your predicted readability level for each document (integer from 1 to 19).

Example (Sentence-level):

| Sentence ID | Prediction |
|-------------|------------|
| 1001        | 7          |
| 1002        | 12         |
| ...         | ...        |

Example (Document-level):

| Document ID | Prediction |
|-------------|------------|
| 2001        | 5          |
| 2002        | 14         |
| ...         | ...        |

Make sure the IDs in your output file match exactly those in the provided split (Dev or Test) for the chosen task.
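One way to produce a file in this format and check its IDs before submitting is sketched below. The helper names `write_predictions` and `check_ids` are hypothetical and not part of this repository. The sketch assumes a comma-delimited file whose header uses the column names listed above, and it compares IDs as strings.

```python
import csv


def write_predictions(path, predictions, id_column="Sentence ID"):
    """Write an {id: level} mapping to a two-column CSV file.

    Use id_column="Document ID" for the document-level task.
    """
    # Validate everything first so a bad value never leaves a partial file.
    for item_id, level in predictions.items():
        if not (isinstance(level, int) and 1 <= level <= 19):
            raise ValueError(
                f"Prediction for {item_id!r} must be an integer from 1 to 19, "
                f"got {level!r}"
            )
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([id_column, "Prediction"])
        for item_id, level in predictions.items():
            writer.writerow([item_id, level])


def check_ids(path, expected_ids, id_column="Sentence ID"):
    """Return (missing, unexpected) ID sets for the file at `path`.

    `expected_ids` is whatever ID list you hold for the chosen split.
    """
    with open(path, newline="", encoding="utf-8") as f:
        found = {row[id_column] for row in csv.DictReader(f)}
    expected = {str(i) for i in expected_ids}
    return expected - found, found - expected
```

For example, `write_predictions("my_dev_sent.csv", {1001: 7, 1002: 12})` writes the two sample rows from the sentence-level table, and `check_ids("my_dev_sent.csv", [1001, 1002])` returns two empty sets when the IDs line up. The file name here is only a placeholder.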

Example Output

After running the evaluation script, you will see output similar to the following in your terminal:

```
Evaluating Sentence-level readability on Dev split using examples/Dev_Sentence_Level.csv
Accuracy: 56.6211%
Accuracy +/-1: 69.8632%
Average absolute distance: 1.143776
Quadratic Cohen's Kappa: 80.0040%
Accuracy (7 levels): 65.8687%
Accuracy (5 levels): 70.2736%
Accuracy (3 levels): 76.4569%
Evaluation completed successfully.
```

Each metric reflects the performance of your predictions on the selected split and task.

Organizers

Khalid Elmadani

Bashar Alhafni

Hanada Taha-Thomure

Nizar Habash

License

This repo is available under the MIT license. See the LICENSE for more info.

References

  1. A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment. Khalid N. Elmadani, Nizar Habash, and Hanada Taha-Thomure. 2025. In Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria.

  2. Guidelines for fine-grained sentence-level Arabic readability annotation. Nizar Habash, Hanada Taha-Thomure, Khalid N. Elmadani, Zeina Zeino, and Abdallah Abushmaes. 2025. In Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX), Vienna, Austria.

  3. The SAMER Arabic Text Simplification Corpus. Bashar Alhafni, Reem Hazim, Juan David Pineros Liberato, Muhamed Al Khalil, and Nizar Habash. 2024. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italia.

  4. A Large-Scale Leveled Readability Lexicon for Standard Arabic. Muhamed Al Khalil, Nizar Habash, and Zhengyang Jiang. 2020. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France.

Owner

  • Name: CAMeL Lab
  • Login: CAMeL-Lab
  • Kind: organization
  • Location: Abu Dhabi, UAE

The Computational Approaches to Modeling Language (CAMeL) Lab at New York University Abu Dhabi

GitHub Events

Total
  • Watch event: 2
  • Push event: 6
  • Create event: 1
Last Year
  • Watch event: 2
  • Push event: 6
  • Create event: 1

Dependencies

requirements.txt pypi
  • datasets ==3.6.0
  • pandas ==2.2.3
  • scikit-learn ==1.6.1