model-quantization-aggregation

Replication package for the paper "Aggregating empirical evidence from data strategies studies: a case on model quantization" published in the 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

https://github.com/santidrj/model-quantization-aggregation

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.4%) to scientific vocabulary

Keywords

green-ai model-quantization research-synthesis software-engineering structured-synthesis-method

Repository

Basic Info

Host: GitHub
Owner: santidrj
License: other
Language: Python
Default Branch: main
Homepage:
Size: 316 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 1

Created 8 months ago · Last pushed 8 months ago

README.md

model-quantization-aggregation

Replication package for the paper:

"Aggregating empirical evidence from data strategies studies: a case on model quantization" submitted to the 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

This replication package contains the following components:

Data:
- Raw, external, interim, and processed data are stored in the data/ directory.
Source Code:
- Located in the src/ directory, it includes scripts for data processing, analysis, and evidence extraction.
- Key modules:
  - data/papers/entities.py & data/papers/knowledge_extraction.py: Define the structure and data extraction logic for the papers analyzed.
  - data/download.py: Downloads the list of papers from arXiv and merges them with the Scopus list.
  - data/selection/llm.py: Implements logic for selecting studies using large language models.
Jupyter Notebooks:
- Located in the notebooks/ directory, these notebooks contain the analysis and visualization of the data.
- Notebooks include:
  - 1.0-llm-promt-refinement.ipynb: Refines the LLM prompt and selects the LLM to use.
  - 2.0-model-quantization-paper-selection.ipynb: Filters the raw list of papers using the selected LLM (Gemini 2.0).
  - 3.0-final-selection-analysis.ipynb: Analyzes the final selection of papers.
  - 4.0-paper-metadata-analysis.ipynb: Analyzes metadata from selected papers.
  - 5.0-evidence-analysis.ipynb: Analyzes evidence extracted from the papers and generates the forest plot.
Documentation:
- data/processed/evidence-diagrams-mapping.md: Links to evidence diagrams generated during the study.
- data/processed/paperkey/metadata.json: Contains the metadata of a specific paper, where paperkey stands for that paper's key.
- data/processed/paperkey/systematic-studies-quality-evaluation.md: Contains the completed quality evaluation form for that paper.
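
As a rough illustration of the LLM-based study selection handled by data/selection/llm.py, the sketch below builds a screening prompt for one paper and parses an include/exclude verdict. It is a minimal sketch under assumptions, not the package's actual code: the prompt wording, the `screen_paper` function, the one-word `INCLUDE`/`EXCLUDE` answer format, and the `fake_llm` stand-in are all hypothetical. In a real run, the callable passed as `ask_llm` would wrap an actual LLM client.

```python
from typing import Callable

# Hypothetical prompt template; the prompt actually used lives in the repository.
PROMPT_TEMPLATE = (
    "You are screening papers for a study on model quantization.\n"
    "Title: {title}\n"
    "Abstract: {abstract}\n"
    "Answer with exactly one word: INCLUDE or EXCLUDE."
)


def screen_paper(title: str, abstract: str, ask_llm: Callable[[str], str]) -> bool:
    """Return True if the model's answer says the paper should be included."""
    prompt = PROMPT_TEMPLATE.format(title=title, abstract=abstract)
    answer = ask_llm(prompt).strip().upper()
    if answer not in {"INCLUDE", "EXCLUDE"}:
        # Fail loudly instead of silently guessing a verdict.
        raise ValueError(f"Unexpected answer from the model: {answer!r}")
    return answer == "INCLUDE"


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): keyword match on the title line."""
    title_line = next(line for line in prompt.splitlines() if line.startswith("Title:"))
    return "INCLUDE" if "quantization" in title_line.lower() else "EXCLUDE"
```

Passing the model as a plain callable keeps the parsing logic testable without network access or API keys.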

Project Structure

The project is organized as follows:

    data/
        raw/                              <- Contains the original list of papers retrieved from Scopus
        external/                         <- Contains the raw data obtained from the selected papers
        interim/                          <- Contains the interim data used in the analysis
        processed/                        <- Contains the processed data used in the analysis
            evidence-diagrams-mapping.md  <- Contains links to the evidence diagrams
    notebooks/
        1.0-llm-promt-refinement.ipynb
        2.0-model-quantization-paper-selection.ipynb
        3.0-second-selection-analysis.ipynb
        4.0-paper-metadata-analysis.ipynb
        5.0-evidence-analysis.ipynb
    reports/
        figures/
    src/
        data/
            papers/                       <- Contains the logic for extracting and analyzing data from papers
                entities.py
                knowledge_extraction.py
            download.py
            selection/                    <- Utility functions for selecting studies using LLMs, including the prompt
                llm.py
        forestplot/                       <- Utility functions for generating the forest plot
        effect_intensity.py               <- Definition of the effect intensity thresholds
        run_evidence_extraction.py
        config.py
    .pre-commit-config.yaml
    dot-env-template                      <- Template for environment variables
    requirements.txt                      <- List of Python dependencies
    uv.lock                               <- Environment lock file
    LICENSE
    pyproject.toml                        <- Project configuration file
    README.md
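
To make the idea of "effect intensity thresholds" concrete, the sketch below maps a relative change in a measured variable to a discrete intensity label. It is only an illustration under stated assumptions: the cutoffs, the label names, and the `effect_intensity` function are hypothetical and are not taken from effect_intensity.py, which holds the thresholds actually used in the study.

```python
# Hypothetical cutoffs and labels, for illustration only.
# Each entry is (exclusive upper bound on the relative change, label).
THRESHOLDS = [
    (-0.20, "strong decrease"),
    (-0.05, "weak decrease"),
    (0.05, "no effect"),
    (0.20, "weak increase"),
]


def effect_intensity(relative_change: float) -> str:
    """Return the label of the first band whose upper bound exceeds the value."""
    for upper_bound, label in THRESHOLDS:
        if relative_change < upper_bound:
            return label
    # Values at or above the last bound fall into the top band.
    return "strong increase"
```

Keeping the cutoffs in a single table makes it easy to audit or adjust them without touching the classification logic.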

Usage Instructions

Setup:
- Clone the repository:
    git clone <repository-url>
    cd model-quantization-aggregation
- Install dependencies:
  The project is managed with uv. To install the dependencies, run:
    uv sync
  Alternatively, you can use pip to install the dependencies listed in requirements.txt:
    pip install -r requirements.txt
Getting the Data:
- Run the download script to fetch the list of papers from arXiv and merge it with the Scopus list:
    python src/data/download.py

We do not provide the raw data from the selected papers to prevent potential copyright issues. However, each paper's README file, located in the data/external/ directory, explains how to obtain the data.
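
The merge performed by the download step (combining the arXiv list with the Scopus list) could look roughly like the sketch below. This is a minimal sketch under assumptions, not the logic of the download script itself: the record fields, the title-based matching rule, and the `normalize_title` and `merge_paper_lists` helpers are hypothetical.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_paper_lists(scopus: list[dict], arxiv: list[dict]) -> list[dict]:
    """Merge two paper lists, keeping one record per normalized title.

    The first record seen for a title is kept (Scopus is processed first),
    and a "sources" field records every list the paper appeared in.
    """
    merged: dict[str, dict] = {}
    for source, records in (("scopus", scopus), ("arxiv", arxiv)):
        for record in records:
            key = normalize_title(record["title"])
            if key not in merged:
                merged[key] = {**record, "sources": [source]}
            elif source not in merged[key]["sources"]:
                merged[key]["sources"].append(source)
    return list(merged.values())
```

Matching on a normalized title is a simplification: it tolerates differences in case and punctuation but not rewordings, so a real pipeline may also want to compare identifiers such as DOIs when both lists provide them.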

Extracting the evidence:
- Use the run_evidence_extraction.py module to extract the evidence from the selected papers.
Explore the data with Jupyter Notebooks:
- Open the Jupyter notebooks in the notebooks/ directory to explore the data and analysis.

Notes

Ensure all required data is placed in the appropriate directories.
For any issues or questions, please contact the authors of the paper.
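
As a quick way to act on the note about placing data in the appropriate directories, the sketch below reports which of the data directories named in the project structure are missing. The `missing_data_dirs` helper is hypothetical and is not part of the package; it only checks that the directories exist, not that they contain the expected files.

```python
from pathlib import Path

# Data directories listed in the project structure of this README.
EXPECTED_DIRS = ("data/raw", "data/external", "data/interim", "data/processed")


def missing_data_dirs(project_root: str) -> list[str]:
    """Return the expected data directories that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the repository root to check the current checkout.
    missing = missing_data_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected data directories are present.")
```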

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Owner

Name: Santiago del Rey
Login: santidrj
Kind: user

Repositories: 11
Profile: https://github.com/santidrj

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Aggregating empirical evidence from data strategy studies:
  a case on model quantization – Replication package
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Santiago
    family-names: del Rey
    email: santiago.del.rey@upc.edu
    affiliation: Universitat Politècnica de Catalunya
    orcid: 'https://orcid.org/0000-0003-4979-414X'
identifiers:
  - type: doi
    value: 10.5281/zenodo.15850734
    description: The Zenodo link for version 1.0.
  - type: doi
    value: 10.48550/arXiv.2505.00816
    description: The ArXiv deposit of the pre-print.
repository-code: 'https://github.com/santidrj/model-quantization-aggregation'
abstract: >-
  Replication package for the paper "Aggregating empirical
  evidence from data strategies studies: a case on model
  quantization" published in the 19th ACM/IEEE International
  Symposium on Empirical Software Engineering and
  Measurement (ESEM).
keywords:
  - Software Engineering
  - Research Synthesis
  - Structured Synthesis Method
  - Green IN AI
  - Model Quantization
license: Apache-2.0
version: '1.0'
date-released: '2025-07-09'

GitHub Events

Total

Release event: 1
Push event: 2
Create event: 1

Last Year

Release event: 1
Push event: 2
Create event: 1

Dependencies

pyproject.toml pypi

altair [all]>=5.5.0
anthropic >=0.44.0
fastexcel >=0.14.0
forestplot >=0.4.1
google-genai >=1.24.0
hvplot >=0.11.1
itables >=2.2.4
jsonlines >=4.0.0
jupyter >=1.1.1
matplotlib >=3.10.3
numpy >=2.2.0
pandarallel >=1.6.5
pandas >=2.2.3
polars >=1.29.0
python-dotenv >=1.0.1
requests >=2.32.3
scikit-learn >=1.6.1
seaborn >=0.13.2
statsmodels >=0.14.4
tiktoken >=0.8.0
tqdm >=4.67.1
xlsxwriter >=3.2.0
xmltodict >=0.14.2

requirements.txt pypi

altair ==5.5.0
altair-tiles ==0.4.0
annotated-types ==0.7.0
anthropic ==0.49.0
anyio ==4.9.0
anywidget ==0.9.18
appnope ==0.1.4
argon2-cffi ==23.1.0
argon2-cffi-bindings ==21.2.0
arro3-core ==0.4.6
arrow ==1.3.0
asttokens ==3.0.0
async-lru ==2.0.5
attrs ==25.3.0
autopep8 ==2.3.2
babel ==2.17.0
beautifulsoup4 ==4.13.4
bleach ==6.2.0
bokeh ==3.7.2
cachetools ==5.5.2
certifi ==2025.1.31
cffi ==1.17.1
cfgv ==3.4.0
charset-normalizer ==3.4.1
click ==8.1.8
colorama ==0.4.6
colorcet ==3.1.0
comm ==0.2.2
contourpy ==1.3.2
cycler ==0.12.1
debugpy ==1.8.14
decorator ==5.2.1
defusedxml ==0.7.1
deptry ==0.23.0
dill ==0.4.0
distlib ==0.3.9
distro ==1.9.0
et-xmlfile ==2.0.0
exceptiongroup ==1.2.2
executing ==2.2.0
fastexcel ==0.14.0
fastjsonschema ==2.21.1
filelock ==3.18.0
fonttools ==4.57.0
forestplot ==0.4.1
fqdn ==1.5.1
google-ai-generativelanguage ==0.6.15
google-api-core ==2.24.2
google-api-python-client ==2.167.0
google-auth ==2.39.0
google-auth-httplib2 ==0.2.0
google-generativeai ==0.8.5
googleapis-common-protos ==1.70.0
grpcio ==1.71.0
grpcio-status ==1.71.0
h11 ==0.14.0
holoviews ==1.20.2
httpcore ==1.0.8
httplib2 ==0.22.0
httpx ==0.28.1
hvplot ==0.11.2
identify ==2.6.10
idna ==3.10
ipykernel ==6.29.5
ipython ==8.35.0
ipywidgets ==8.1.6
isoduration ==20.11.0
itables ==2.3.0
jedi ==0.19.2
jinja2 ==3.1.6
jiter ==0.9.0
joblib ==1.4.2
json5 ==0.12.0
jsonlines ==4.0.0
jsonpointer ==3.0.0
jsonschema ==4.23.0
jsonschema-specifications ==2024.10.1
jupyter ==1.1.1
jupyter-client ==8.6.3
jupyter-console ==6.6.3
jupyter-core ==5.7.2
jupyter-events ==0.12.0
jupyter-lsp ==2.2.5
jupyter-server ==2.15.0
jupyter-server-terminals ==0.5.3
jupyterlab ==4.4.0
jupyterlab-pygments ==0.3.0
jupyterlab-server ==2.27.3
jupyterlab-widgets ==3.0.14
kiwisolver ==1.4.8
linkify-it-py ==2.0.3
markdown ==3.8
markdown-it-py ==3.0.0
markupsafe ==3.0.2
matplotlib ==3.10.3
matplotlib-inline ==0.1.3
mdit-py-plugins ==0.4.2
mdurl ==0.1.2
mercantile ==1.2.1
mistune ==3.1.3
narwhals ==1.35.0
nbclient ==0.10.2
nbconvert ==7.16.6
nbformat ==5.10.4
nbqa ==1.9.1
nest-asyncio ==1.6.0
nodeenv ==1.9.1
notebook ==7.4.3
notebook-shim ==0.2.4
numpy ==2.2.5
openpyxl ==3.1.5
overrides ==7.7.0
packaging ==25.0
pandarallel ==1.6.5
pandas ==2.2.3
pandocfilters ==1.5.1
panel ==1.6.2
param ==2.2.0
parso ==0.8.4
patsy ==1.0.1
pexpect ==4.9.0
pillow ==11.2.1
platformdirs ==4.3.7
polars ==1.29.0
pre-commit ==4.2.0
prometheus-client ==0.21.1
prompt-toolkit ==3.0.51
proto-plus ==1.26.1
protobuf ==5.29.4
psutil ==7.0.0
psygnal ==0.12.0
ptyprocess ==0.7.0
pure-eval ==0.2.3
pyarrow ==19.0.1
pyasn1 ==0.6.1
pyasn1-modules ==0.4.2
pycodestyle ==2.13.0
pycparser ==2.22
pydantic ==2.11.3
pydantic-core ==2.33.1
pygments ==2.19.1
pyparsing ==3.2.3
python-dateutil ==2.9.0.post0
python-dotenv ==1.1.0
python-json-logger ==3.3.0
pytz ==2025.2
pyviz-comms ==3.0.4
pywin32 ==310
pywinpty ==2.0.15
pyyaml ==6.0.2
pyzmq ==26.4.0
referencing ==0.36.2
regex ==2024.11.6
requests ==2.32.3
requirements-parser ==0.11.0
rfc3339-validator ==0.1.4
rfc3986-validator ==0.1.1
rpds-py ==0.24.0
rsa ==4.9.1
ruff ==0.11.6
scikit-learn ==1.6.1
scipy ==1.15.2
seaborn ==0.13.2
send2trash ==1.8.3
setuptools ==79.0.0
six ==1.17.0
sniffio ==1.3.1
soupsieve ==2.7
stack-data ==0.6.3
statsmodels ==0.14.4
terminado ==0.18.1
threadpoolctl ==3.6.0
tiktoken ==0.9.0
tinycss2 ==1.4.0
tokenize-rt ==6.1.0
tomli ==2.2.1
tornado ==6.4.2
tqdm ==4.67.1
traitlets ==5.14.3
types-python-dateutil ==2.9.0.20241206
types-setuptools ==79.0.0.20250422
typing-extensions ==4.13.2
typing-inspection ==0.4.0
tzdata ==2025.2
uc-micro-py ==1.0.3
uri-template ==1.3.0
uritemplate ==4.1.1
urllib3 ==2.4.0
vega-datasets ==0.9.0
vegafusion ==2.0.2
virtualenv ==20.30.0
vl-convert-python ==1.7.0
wcwidth ==0.2.13
webcolors ==24.11.1
webencodings ==0.5.1
websocket-client ==1.8.0
widgetsnbextension ==4.0.14
xlsxwriter ==3.2.3
xmltodict ==0.14.2
xyzservices ==2025.1.0

uv.lock pypi

194 dependencies
