reprohum-0744-02
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: mo-arvan
- Language: Python
- Default Branch: main
- Size: 343 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
ReproHum-0744-02
This repository contains tools and scripts for quantified reproducibility assessment of NLP results through human evaluation data analysis. The project implements methodologies described in Belz, Popovic & Mille (2022) "Quantified Reproducibility Assessment of NLP Results" (ACL'22).
Overview
The project analyzes human evaluation data for different NLP systems, providing:
- Statistical reliability metrics (Fleiss Kappa, Krippendorff's Alpha)
- ANOVA and Tukey HSD tests for system comparisons
- Power analysis and effect size calculations
- Coefficient of variation (CV) analysis for reproducibility assessment
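To make the reliability metrics above more concrete, here is a minimal from-scratch sketch of Fleiss' kappa using only the standard library. It is an illustration of the statistic, not the code in `src/analyze_responses.py`; the function name `fleiss_kappa` and the input layout (one list of labels per item, one label per rater) are assumptions made for this example.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for categorical ratings.

    `ratings` is a list of items; each item is a list of category labels,
    one per rater. Every item must have the same number of raters.
    (This input layout is an assumption made for this sketch.)
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of raters")

    n_items = len(ratings)
    counts = [Counter(item) for item in ratings]
    categories = {label for item in ratings for label in item}

    # Per-item agreement: share of rater pairs that chose the same category.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    total = n_items * n_raters
    expected = sum(
        (sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories
    )

    if expected == 1.0:
        # Only one category was ever used; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    example = [["good", "good", "good"], ["bad", "bad", "bad"]]
    print(fleiss_kappa(example))
```

A value near 1 indicates agreement well above chance, and values at or below 0 indicate agreement no better than chance. Fleiss' kappa assumes the same number of raters for every item.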
Project Structure
.
├── src/                                # Source code directory
│   ├── analyze_responses.py            # Main analysis script
│   ├── cv.py                           # Coefficient of variation calculations
│   ├── statistical_power_analysis.py   # Statistical power analysis
│   └── preprocess_responses.py         # Data preprocessing
├── responses/                          # Input data directory
├── results/                            # Analysis output directory
│   ├── lab1/                           # Primary results
│   └── original/                       # Original data results
└── power_analysis.r                    # R script for power analysis
Prerequisites
- Docker (optional, for containerized environment)
- Python 3.x
- Required packages:
  - pandas
  - scipy
  - statsmodels
  - krippendorff
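If you are not using Docker, a quick way to confirm that the packages listed above are importable is a check like the following. This is a small convenience sketch, not part of the repository; it assumes each package's import name matches the name in the list.

```python
from importlib.util import find_spec

# Package names taken from the prerequisites list; assumed to match
# their import names.
REQUIRED_PACKAGES = ["pandas", "scipy", "statsmodels", "krippendorff"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```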
Setup
Clone the repository and install the required packages. You can use a virtual environment or Docker for isolation.
bash
docker build -t reprohum-0744-02 .
docker run -it --rm -v $(pwd):/app reprohum-0744-02
Usage
- Preprocess the response data. This requires the original responses; you can skip this step if you are loading the preprocessed data from this repository:
bash
python src/preprocess_responses.py
- Run the analysis pipeline:
bash
python src/analyze_responses.py
- Generate reproducibility metrics:
bash
python src/quantified_reproducibility.py
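As a rough illustration of the kind of number the reproducibility step reports, the sketch below computes a coefficient of variation with the widely used small-sample correction factor (1 + 1/(4n)). It is a sketch under stated assumptions and may differ from what `src/cv.py` actually does; the function name `small_sample_cv` is hypothetical, and the input is assumed to be the same quantity measured in n repeated studies, on a scale with a meaningful zero.

```python
from statistics import mean, stdev


def small_sample_cv(values):
    """Coefficient of variation (in percent) with a small-sample correction.

    Uses the sample standard deviation (n - 1 denominator) and multiplies
    the plain CV by (1 + 1 / (4 * n)). Whether the repository applies the
    same correction or rescales scores first is not confirmed here.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    centre = mean(values)
    if centre == 0:
        raise ValueError("CV is undefined when the mean is zero")
    plain_cv = stdev(values) / abs(centre) * 100.0
    return (1.0 + 1.0 / (4.0 * n)) * plain_cv


if __name__ == "__main__":
    # Hypothetical scores for one system from two studies.
    print(round(small_sample_cv([10.0, 12.0]), 4))
```

Lower values mean the repeated measurements are closer together relative to their mean, which is read as a higher degree of reproducibility.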
Output
The analysis generates several outputs in the results/lab1/ directory:
- Statistical test results (anova_tukeyhsd.txt)
- Inter-rater reliability metrics (fleiss_kappa.txt, krippendorff_alpha.txt)
- Dataset usage statistics (tables/datasets_used.csv)
- System comparison results (tables/results.csv)
- Detailed reliability data (reliability_data.csv)
- Coefficient of variation analysis (cv_2_way.csv, cv_summary.csv)
- Correlation analysis (correlations.csv)
- Best-Worst system results (results.csv)
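After a run, it can be useful to confirm that all of the files listed above were actually written. The following sketch checks for them with `pathlib`; the file names come from the list above, while the helper name `missing_outputs` and the default directory argument are assumptions for this example.

```python
from pathlib import Path

# File names as listed in the Output section, relative to results/lab1/.
EXPECTED_OUTPUTS = [
    "anova_tukeyhsd.txt",
    "fleiss_kappa.txt",
    "krippendorff_alpha.txt",
    "tables/datasets_used.csv",
    "tables/results.csv",
    "reliability_data.csv",
    "cv_2_way.csv",
    "cv_summary.csv",
    "correlations.csv",
    "results.csv",
]


def missing_outputs(results_dir="results/lab1"):
    """Return the expected output files that are absent from results_dir."""
    root = Path(results_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing outputs:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected outputs are present.")
```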
Citation
If you use this software in your research, please cite:
bibtex
TBA
License
This project is licensed under CC-BY-4.0. See the LICENSE file for details.
Owner
- Name: Mo Arvan
- Login: mo-arvan
- Kind: user
- Location: Chicago
- Company: University of Illinois at Chicago
- Website: https://mo-arvan.github.io/
- Repositories: 42
- Profile: https://github.com/mo-arvan
Computer Scientist
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: reprohum-0744-02
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Mohammad
    family-names: Arvan
    email: marvan3@uic.edu
    affiliation: University of Illinois at Chicago
  - given-names: Natalie
    family-names: Parde
    affiliation: University of Illinois at Chicago
license: CC-BY-4.0
url: https://github.com/mo-arvan/reprohum-0744-02
date-released: 2025
GitHub Events
Total
- Push event: 5
- Create event: 2
Last Year
- Push event: 5
- Create event: 2
Dependencies
- python 3.12.1-alpine3.19 build