https://github.com/camel-lab/barec-shared-task-2025

Evaluation code and data for the BAREC shared task 2025

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (7.4%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

Evaluation code and data for the BAREC shared task 2025

Basic Info

Host: GitHub
Owner: CAMeL-Lab
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 57.6 KB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License

BAREC Shared Task 2025: Arabic Readability Assessment

The BAREC Shared Task 2025 will take place at The Third Arabic Natural Language Processing Conference (ArabicNLP 2025) at EMNLP 2025.

Click here to register for the shared task!

Task Description

The BAREC Shared Task 2025 focuses on fine-grained readability classification across 19 levels using the Balanced Arabic Readability Evaluation Corpus (BAREC), a dataset of over 1 million words. Participants will build models for both sentence- and document-level classification.

Data

The BAREC Corpus: The BAREC Corpus (Elmadani et al., 2025) consists of 1,922 documents and 69,441 sentences classified into 19 readability levels.
The SAMER Corpus: The SAMER Corpus (Alhafni et al., 2024) consists of 4,289 documents and 20,358 fragments classified into three readability levels.
The SAMER Lexicon: The SAMER Lexicon (Al Khalil et al., 2020) is a leveled readability lexicon of 40K lemma and part-of-speech pairs annotated into five readability levels.

Shared Task Tracks

Participants can compete in one or more of the following tracks, each imposing different resource constraints:

Strict Track: Models must be trained exclusively on the BAREC Corpus.
- Sentence-level Readability Assessment: CodaBench Link
- Document-level Readability Assessment: CodaBench Link
Constrained Track: Models may use the BAREC Corpus, SAMER Corpus (including document, fragment, and word-level annotations), and the SAMER Lexicon.
- Sentence-level Readability Assessment: CodaBench Link
- Document-level Readability Assessment: CodaBench Link
Open Track: No restrictions on external resources, allowing the use of any publicly available data.
- Sentence-level Readability Assessment: CodaBench Link
- Document-level Readability Assessment: CodaBench Link

Two sub-tasks and three tracks yield six possible combinations. Participants may compete in any number of them.

Evaluation

We define the Readability Assessment task as an ordinal classification task. The following metrics are used for evaluation:

Accuracy (Acc¹⁹): The percentage of cases where reference and prediction classes match in the 19-level scheme.
Accuracy (Acc⁷, Acc⁵, Acc³): The percentage of cases where reference and prediction classes match after collapsing the 19 levels into 7, 5, or 3 levels, respectively.
Adjacent Accuracy (±1 Acc¹⁹): Also known as off-by-1 accuracy. The proportion of predictions that are either exactly correct or off by at most one level in the 19-level scheme.
Average Distance (Dist): Also known as Mean Absolute Error (MAE). Measures the average absolute difference between predicted and true labels.
Quadratic Weighted Kappa (QWK): An extension of Cohen’s Kappa that measures the agreement between predicted and true labels, applying a quadratic penalty to larger misclassifications (i.e., predictions farther from the true label are penalized more heavily).
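
The metric definitions above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `scripts/eval.py`: the function names are hypothetical, labels are assumed to be integers from 1 to 19, and QWK is computed over the full 19-level scale, which may differ from how the official script handles levels that do not occur in the data. The collapsed accuracies (Acc⁷, Acc⁵, Acc³) are omitted because the level mapping is not given here.

```python
NUM_LEVELS = 19  # size of the BAREC readability scale


def accuracy(y_true, y_pred, tolerance=0):
    """Share of predictions within `tolerance` levels of the reference.

    tolerance=0 gives exact accuracy; tolerance=1 gives adjacent (off-by-1) accuracy.
    """
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def average_distance(y_true, y_pred):
    """Mean absolute difference between reference and predicted levels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def quadratic_weighted_kappa(y_true, y_pred, num_levels=NUM_LEVELS):
    """Cohen's kappa with quadratic weights (i - j)^2 on the disagreement."""
    n = len(y_true)
    # Observed confusion matrix: rows are reference levels, columns are predictions.
    observed = [[0] * num_levels for _ in range(num_levels)]
    for t, p in zip(y_true, y_pred):
        observed[t - 1][p - 1] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(row[j] for row in observed) for j in range(num_levels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if reference and prediction were independent.
            weighted_expected += weight * true_hist[i] * pred_hist[j] / n
    if weighted_expected == 0:
        # Every reference and prediction share one level; kappa is undefined,
        # and this sketch reports perfect agreement by convention.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    reference = [1, 2, 3]
    predicted = [1, 3, 3]
    print("Acc19:", accuracy(reference, predicted))
    print("+/-1 Acc19:", accuracy(reference, predicted, tolerance=1))
    print("Dist:", average_distance(reference, predicted))
    print("QWK:", quadratic_weighted_kappa(reference, predicted))
```

Because the quadratic weight grows with the squared distance between levels, a prediction three levels off costs nine times as much as one that is a single level off, which is what makes QWK suited to an ordinal scale.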

We provide instructions on how to run the evaluation script below.

Requirements:

You will need conda installed. To set up the environment, run:

```bash
git clone https://github.com/CAMeL-Lab/barec-shared-task-2025.git
cd barec-shared-task-2025

conda create -n barec python=3.9

conda activate barec

pip install -r requirements.txt
```

Running the Evaluation

To evaluate your predictions, use the provided evaluation script. The script requires three arguments:

--output: Path to your output CSV file containing predictions.
--split: The data split to evaluate on (Dev or Test).
--task: The task type (Sent for sentence-level or Doc for document-level readability).

To evaluate your system's output, run:

```bash
python scripts/eval.py --output /path/to/output_csv --split [Dev|Test] --task [Sent|Doc]
```

Example usage:

```bash
python scripts/eval.py --output examples/Dev_Sentence_Level.csv --split Dev --task Sent
```

Output CSV Format

Your output CSV file should have the following columns:

For sentence-level tasks (--task Sent):
- Sentence ID: The unique identifier for each sentence.
- Prediction: Your predicted readability level for each sentence (integer from 1 to 19).
For document-level tasks (--task Doc):
- Document ID: The unique identifier for each document.
- Prediction: Your predicted readability level for each document (integer from 1 to 19).

Example (Sentence-level):

| Sentence ID | Prediction |
|-------------|------------|
| 1001 | 7 |
| 1002 | 12 |
| ... | ... |

Example (Document-level):

| Document ID | Prediction |
|-------------|------------|
| 2001 | 5 |
| 2002 | 14 |
| ... | ... |

Make sure the IDs in your output file match exactly those in the provided split (Dev or Test) for the chosen task.
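
A small helper can produce a file in this shape and check the IDs before submission. The sketch below uses only the standard library and rests on assumptions: `write_predictions` and `id_mismatches` are hypothetical names, the header strings are taken literally from the column names listed above, and the expected IDs are whatever you load from the provided split.

```python
import csv

# Header of the ID column for each value of --task (assumed from the README).
ID_COLUMNS = {"Sent": "Sentence ID", "Doc": "Document ID"}


def write_predictions(path, predictions, task="Sent"):
    """Write a {id: level} mapping as a two-column CSV."""
    id_column = ID_COLUMNS[task]
    for item_id, level in predictions.items():
        if not (isinstance(level, int) and 1 <= level <= 19):
            raise ValueError(
                f"{item_id}: prediction {level!r} is not an integer from 1 to 19"
            )
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([id_column, "Prediction"])
        for item_id, level in predictions.items():
            writer.writerow([item_id, level])


def id_mismatches(path, expected_ids, task="Sent"):
    """Return (missing, unexpected) IDs, compared as strings."""
    id_column = ID_COLUMNS[task]
    with open(path, newline="", encoding="utf-8") as handle:
        found = {row[id_column] for row in csv.DictReader(handle)}
    expected = {str(item_id) for item_id in expected_ids}
    return sorted(expected - found), sorted(found - expected)
```

For example, after `write_predictions("dev_sent.csv", {1001: 7, 1002: 12})`, calling `id_mismatches("dev_sent.csv", [1001, 1002])` returns two empty lists when every expected ID is present and no extra ID was written.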

Example Output

After running the evaluation script, you will see output similar to the following in your terminal:

```
Evaluating Sentence-level readability on Dev split using examples/Dev_Sentence_Level.csv
Accuracy: 56.6211%
Accuracy +/-1: 69.8632%
Average absolute distance: 1.143776
Quadratic Cohen's Kappa: 80.0040%
Accuracy (7 levels): 65.8687%
Accuracy (5 levels): 70.2736%
Accuracy (3 levels): 76.4569%
Evaluation completed successfully.
```

Each metric reflects the performance of your predictions on the selected split and task.

Organizers

License

This repo is available under the MIT license. See the LICENSE for more info.

References

A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment. Khalid N. Elmadani, Nizar Habash, and Hanada Taha-Thomure. 2025. In Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria.
Guidelines for fine-grained sentence-level Arabic readability annotation. Nizar Habash, Hanada Taha-Thomure, Khalid N. Elmadani, Zeina Zeino, and Abdallah Abushmaes. 2025. In Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX), Vienna, Austria.
The SAMER Arabic Text Simplification Corpus. Bashar Alhafni, Reem Hazim, Juan David Pineros Liberato, Muhamed Al Khalil, and Nizar Habash. 2024. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italia.
A Large-Scale Leveled Readability Lexicon for Standard Arabic. Muhamed Al Khalil, Nizar Habash, and Zhengyang Jiang. 2020. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France.

Owner

Name: CAMeL Lab
Login: CAMeL-Lab
Kind: organization
Location: Abu Dhabi, UAE

Website: http://camel-lab.com
Repositories: 22
Profile: https://github.com/CAMeL-Lab

The Computational Approaches to Modeling Language (CAMeL) Lab at New York University Abu Dhabi

GitHub Events

Total

Watch event: 2
Push event: 6
Create event: 1

Last Year

Watch event: 2
Push event: 6
Create event: 1

Dependencies

requirements.txt pypi

datasets ==3.6.0
pandas ==2.2.3
scikit-learn ==1.6.1
