https://github.com/arianna-bienati/itaca-processing
ITACA processing tool
Repository
Basic Info
- Host: GitHub
- Owner: arianna-bienati
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 7.76 MB
Statistics
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
itaca-processing
This is a companion repository containing data and scripts for the work "Theoretical implications of automated discourse parsing in student writing" proposed for publication in the 2025 IJCoL Guest Edited Special Edition on "Bridging Theoretical Linguistics and Automated Language Processing".
What it contains
In the repository you can find:
- `annotation_guidelines` contains:
  - the annotation guidelines used for the manual annotation of the evaluation sample, as `README.md`
  - the report of misalignments after the first 5 documents used for training (`20250210_report_agreement.tsv`). A preliminary analysis of disagreements is available in `20250210_report_agreement.md`
  - Krippendorff's alpha for each text, for both the connective detection and sense classification tasks (`20250307_iaa_kripp_inc_connective.txt` and `semantics.txt` files)
  - the presentation held at WS3 at the Congresso Internazionale SLI 2023 (`Bienati_Frey_Aprosio_Facchinelli_2023_applicazione-delle-risorse_final_cut.pdf`).
- `dataset`:
  - `annotation`: contains all 40 manually annotated texts for the evaluation sample. In each folder, you can find the annotations by Arianna Bienati and Mariachiara Pascucci; `INITIAL_CAS.tsv` contains the initial pre-annotated files.
  - `curation`: contains all 40 curated documents. Curated files have been jointly produced by Arianna Bienati and Mariachiara Pascucci.
- `img`: contains heatmaps comparing the models' outputs and the human-annotated labels.
- `txt-output`: contains the models' responses to the experimental prompts.
- `itaca`: contains scripts used for the pre-processing of the ITACA corpus.
  - `iaa.py`: calculates Cohen's kappa for all layers that have been manually annotated in the original ITACA corpus (not considered in this analysis).
  - `preprocess.py`: processes text files of the ITACA corpus. It uses TINT to analyze the content of each file, extracting linguistic features and generating annotations based on defined criteria. The connective pre-processing is the part relevant for our analysis. Annotations are written in TSV format to be re-imported into INCEpTION.
  - `webanno_tsv.py` and `webanno_tsv_custom.py`: (custom) library to handle the WebAnno TSV format.
- `agreement.py`: calculates agreement (Cohen's kappa) between the human annotators and the models' outputs.
- `evaluate.py`: computes evaluation metrics such as precision, recall, and F1-score for each combination of LLM (GPT-4o or Llama 3.3 70B) and prompt (long or short). It visualizes the confusion matrix as a heatmap to show the model's performance across the different classes.
- `human-agreement.py`: processes annotation data from the TSV files in `dataset/annotation`, comparing the annotations made by the two annotators. It visualizes the results in a heatmap, providing a visual comparison of human agreements and disagreements.
- `parse-files-1.py` / `parse-files-2.py`: prompt language models to generate responses about the presence and sense of connectives in test sentences, based on examples collected from the training data. The first script prompts the LLM with sentences grouped by connective candidate; the second sends the whole text to the LLM.
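The agreement and evaluation metrics mentioned above (Cohen's kappa, precision, recall, F1-score) can be illustrated with a minimal, standard-library sketch. This is not the code in `agreement.py` or `evaluate.py`: the function names `cohen_kappa` and `precision_recall_f1`, and the sense labels in the example, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two raters' marginal label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def precision_recall_f1(gold, predicted, label):
    """Precision, recall and F1 for one class, treating `gold` as the reference."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sense labels for six connective candidates (not taken from the dataset).
human_labels = ["causal", "none", "temporal", "causal", "none", "causal"]
model_labels = ["causal", "none", "causal", "causal", "temporal", "causal"]

print("kappa:", cohen_kappa(human_labels, model_labels))
print("causal P/R/F1:", precision_recall_f1(human_labels, model_labels, "causal"))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (for example, "not a connective") dominates the data. The per-class precision and recall correspond to the columns and rows of the confusion matrix that a heatmap would display.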
Owner
- Name: Arianna Bienati
- Login: arianna-bienati
- Kind: user
- Company: Institute for Applied Linguistics, Eurac Research
- Repositories: 1
- Profile: https://github.com/arianna-bienati
GitHub Events
Total
- Push event: 19
Last Year
- Push event: 19