model-quantization-aggregation

Replication package for the paper "Aggregating empirical evidence from data strategies studies: a case on model quantization" published in the 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

https://github.com/santidrj/model-quantization-aggregation

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.4%) to scientific vocabulary

Keywords

green-ai model-quantization research-synthesis software-engineering structured-synthesis-method

Repository

Basic Info
  • Host: GitHub
  • Owner: santidrj
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 316 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Topics
green-ai model-quantization research-synthesis software-engineering structured-synthesis-method
Created 8 months ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

model-quantization-aggregation

Replication package for the paper:

"Aggregating empirical evidence from data strategies studies: a case on model quantization" published in the 19th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

Contents

This replication package contains the following components:

  1. Data:

    • Raw, external, interim, and processed data are stored in the data/ directory.
  2. Source Code:

    • Located in the src/ directory, it includes scripts for data processing, analysis, and evidence extraction.
    • Key modules:
      • data/papers/entities.py & data/papers/knowledge_extraction.py: Define the structure and data extraction logic for the papers analyzed.
      • data/download.py: Downloads the list of papers from arXiv and merges them with the Scopus list.
      • data/selection/llm.py: Implements logic for selecting studies using large language models.
  3. Jupyter Notebooks:

    • Located in the notebooks/ directory, these notebooks contain the analysis and visualization of the data.
    • Notebooks include:
      • 1.0-llm-promt-refinement.ipynb: Refines the LLM prompt and selects the LLM to use.
      • 2.0-model-quantization-paper-selection.ipynb: Filters the raw list of papers using the selected LLM (Gemini 2.0).
      • 3.0-final-selection-analysis.ipynb: Analyzes the final selection of papers.
      • 4.0-paper-metadata-analysis.ipynb: Analyzes metadata from selected papers.
      • 5.0-evidence-analysis.ipynb: Analyzes evidence extracted from the papers and generates the forest plot.
  4. Documentation:

    • data/processed/evidence-diagrams-mapping.md: Links to evidence diagrams generated during the study.
    • data/processed/<paperkey>/metadata.json: Contains the metadata for the paper identified by <paperkey>.
    • data/processed/<paperkey>/systematic-studies-quality-evaluation.md: Contains the completed quality evaluation form for that paper.
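
The LLM-based study selection mentioned under Key modules can be pictured with the sketch below. It is only an illustration of the general pattern (build a prompt, ask a model, parse the answer), not the code in src/data/selection/llm.py: the prompt text, the function names, and the INCLUDE/EXCLUDE reply format are all assumptions, and the model is passed in as a plain callable so no real API is invoked.

```python
from typing import Callable, Optional

# Hypothetical prompt; the actual prompt lives in src/data/selection/llm.py.
PROMPT_TEMPLATE = (
    "You are screening papers for a study on model quantization.\n"
    "Title: {title}\n"
    "Abstract: {abstract}\n"
    "Answer with exactly one word: INCLUDE or EXCLUDE."
)


def build_prompt(title: str, abstract: str) -> str:
    """Fill the screening prompt for one paper."""
    return PROMPT_TEMPLATE.format(title=title.strip(), abstract=abstract.strip())


def parse_decision(reply: str) -> Optional[bool]:
    """Map a model reply to True (include), False (exclude), or None (unclear)."""
    words = reply.strip().upper().split()
    if not words:
        return None
    first = words[0].strip(".,:;!\"'")
    if first == "INCLUDE":
        return True
    if first == "EXCLUDE":
        return False
    return None


def screen_papers(papers: list, ask_llm: Callable[[str], str]) -> list:
    """Return a copy of each paper record with an added 'include' field."""
    results = []
    for paper in papers:
        reply = ask_llm(build_prompt(paper["title"], paper["abstract"]))
        results.append({**paper, "include": parse_decision(reply)})
    return results
```

Keeping the model behind a callable makes the parsing logic testable offline; replies that cannot be parsed come back as None so they can be reviewed by hand.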

Project Structure

The project is organized as follows:

    data/
        raw/                              <- Contains the original list of papers retrieved from Scopus
        external/                         <- Contains the raw data obtained from the selected papers
        interim/                          <- Contains the interim data used in the analysis
        processed/                        <- Contains the processed data used in the analysis
            evidence-diagrams-mapping.md  <- Contains links to the evidence diagrams
    notebooks/
        1.0-llm-promt-refinement.ipynb
        2.0-model-quantization-paper-selection.ipynb
        3.0-second-selection-analysis.ipynb
        4.0-paper-metadata-analysis.ipynb
        5.0-evidence-analysis.ipynb
    reports/
        figures/
    src/
        data/
            papers/                       <- Contains the logic for extracting and analyzing data from papers
                entities.py
                knowledge_extraction.py
            download.py
            selection/                    <- Utility functions for selecting studies using LLMs, including the prompt
                llm.py
        forestplot/                       <- Utility functions for generating the forest plot
        effect_intensity.py               <- Definition of the effect intensity thresholds
        run_evidence_extraction.py
        config.py
    .pre-commit-config.yaml
    dot-env-template                      <- Template for environment variables
    requirements.txt                      <- List of Python dependencies
    uv.lock                               <- Environment lock file
    LICENSE
    pyproject.toml                        <- Project configuration file
    README.md
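
A quick way to confirm that a local checkout matches this layout is sketched below. The helper name `missing_dirs` and the `EXPECTED_DIRS` list are illustrative and not part of the package; the list only covers directories named in the structure above. Some of them, such as data/interim/, may be absent from a fresh clone if they are only created during the analysis.

```python
from pathlib import Path

# Directories named in the project structure above (illustrative subset).
EXPECTED_DIRS = [
    "data/raw",
    "data/external",
    "data/interim",
    "data/processed",
    "notebooks",
    "src",
]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

Run it from the repository root; an empty result means every listed directory was found.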

Usage Instructions

  1. Setup:

    • Clone the repository:
      git clone <repository-url>
      cd model-quantization-aggregation
    • Install dependencies:
      The project is managed with uv. To install the dependencies, run:
      uv sync
      Alternatively, you can use pip to install the dependencies listed in requirements.txt:
      pip install -r requirements.txt
  2. Getting the Data:

    • Run the download script to fetch the list of papers from arXiv and merge it with the Scopus list:
      python src/data/download.py
    • We do not provide the raw data from the selected papers, to prevent potential copyright issues. Instead, each paper's README file in the data/external/ directory explains how to obtain its data.
  3. Extracting the evidence:

    • Use the run_evidence_extraction.py module to extract the evidence from the selected papers.
  4. Explore the data with Jupyter Notebooks:

    • Open the Jupyter notebooks in the notebooks/ directory to explore the data and analysis.
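
The "Getting the Data" step merges the arXiv list with the Scopus list. How the download script does this is not described here, so the sketch below shows just one plausible approach: deduplicating by a normalized title. The record shape (a dict with a "title" key) and both function names are assumptions made for illustration.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_paper_lists(scopus, arxiv):
    """Merge two lists of paper records, keeping the first record seen per title."""
    merged = {}
    for source, records in (("scopus", scopus), ("arxiv", arxiv)):
        for record in records:
            key = normalize_title(record["title"])
            if key not in merged:
                # First time this title is seen: keep the record and note its source.
                merged[key] = {**record, "sources": [source]}
            elif source not in merged[key]["sources"]:
                # Same title from the other source: only record the extra source.
                merged[key]["sources"].append(source)
    return list(merged.values())
```

Title matching is deliberately simple; matching on DOI or arXiv identifier, where available, would be more robust against papers that share a title.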

Notes

  • Ensure all required data is placed in the appropriate directories.
  • For any issues or questions, please contact the authors of the paper.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Owner

  • Name: Santiago del Rey
  • Login: santidrj
  • Kind: user

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Aggregating empirical evidence from data strategy studies:
  a case on model quantization – Replication package
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Santiago
    family-names: del Rey
    email: santiago.del.rey@upc.edu
    affiliation: Universitat Politècnica de Catalunya
    orcid: 'https://orcid.org/0000-0003-4979-414X'
identifiers:
  - type: doi
    value: 10.5281/zenodo.15850734
    description: The Zenodo link for version 1.0.
  - type: doi
    value: 10.48550/arXiv.2505.00816
    description: The ArXiv deposit of the pre-print.
repository-code: 'https://github.com/santidrj/model-quantization-aggregation'
abstract: >-
  Replication package for the paper "Aggregating empirical
  evidence from data strategies studies: a case on model
  quantization" published in the 19th ACM/IEEE International
  Symposium on Empirical Software Engineering and
  Measurement (ESEM).
keywords:
  - Software Engineering
  - Research Synthesis
  - Structured Synthesis Method
  - Green IN AI
  - Model Quantization
license: Apache-2.0
version: '1.0'
date-released: '2025-07-09'

GitHub Events

Total
  • Release event: 1
  • Push event: 2
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 2
  • Create event: 1

Dependencies

pyproject.toml pypi
  • altair [all]>=5.5.0
  • anthropic >=0.44.0
  • fastexcel >=0.14.0
  • forestplot >=0.4.1
  • google-genai >=1.24.0
  • hvplot >=0.11.1
  • itables >=2.2.4
  • jsonlines >=4.0.0
  • jupyter >=1.1.1
  • matplotlib >=3.10.3
  • numpy >=2.2.0
  • pandarallel >=1.6.5
  • pandas >=2.2.3
  • polars >=1.29.0
  • python-dotenv >=1.0.1
  • requests >=2.32.3
  • scikit-learn >=1.6.1
  • seaborn >=0.13.2
  • statsmodels >=0.14.4
  • tiktoken >=0.8.0
  • tqdm >=4.67.1
  • xlsxwriter >=3.2.0
  • xmltodict >=0.14.2
requirements.txt pypi
  • altair ==5.5.0
  • altair-tiles ==0.4.0
  • annotated-types ==0.7.0
  • anthropic ==0.49.0
  • anyio ==4.9.0
  • anywidget ==0.9.18
  • appnope ==0.1.4
  • argon2-cffi ==23.1.0
  • argon2-cffi-bindings ==21.2.0
  • arro3-core ==0.4.6
  • arrow ==1.3.0
  • asttokens ==3.0.0
  • async-lru ==2.0.5
  • attrs ==25.3.0
  • autopep8 ==2.3.2
  • babel ==2.17.0
  • beautifulsoup4 ==4.13.4
  • bleach ==6.2.0
  • bokeh ==3.7.2
  • cachetools ==5.5.2
  • certifi ==2025.1.31
  • cffi ==1.17.1
  • cfgv ==3.4.0
  • charset-normalizer ==3.4.1
  • click ==8.1.8
  • colorama ==0.4.6
  • colorcet ==3.1.0
  • comm ==0.2.2
  • contourpy ==1.3.2
  • cycler ==0.12.1
  • debugpy ==1.8.14
  • decorator ==5.2.1
  • defusedxml ==0.7.1
  • deptry ==0.23.0
  • dill ==0.4.0
  • distlib ==0.3.9
  • distro ==1.9.0
  • et-xmlfile ==2.0.0
  • exceptiongroup ==1.2.2
  • executing ==2.2.0
  • fastexcel ==0.14.0
  • fastjsonschema ==2.21.1
  • filelock ==3.18.0
  • fonttools ==4.57.0
  • forestplot ==0.4.1
  • fqdn ==1.5.1
  • google-ai-generativelanguage ==0.6.15
  • google-api-core ==2.24.2
  • google-api-python-client ==2.167.0
  • google-auth ==2.39.0
  • google-auth-httplib2 ==0.2.0
  • google-generativeai ==0.8.5
  • googleapis-common-protos ==1.70.0
  • grpcio ==1.71.0
  • grpcio-status ==1.71.0
  • h11 ==0.14.0
  • holoviews ==1.20.2
  • httpcore ==1.0.8
  • httplib2 ==0.22.0
  • httpx ==0.28.1
  • hvplot ==0.11.2
  • identify ==2.6.10
  • idna ==3.10
  • ipykernel ==6.29.5
  • ipython ==8.35.0
  • ipywidgets ==8.1.6
  • isoduration ==20.11.0
  • itables ==2.3.0
  • jedi ==0.19.2
  • jinja2 ==3.1.6
  • jiter ==0.9.0
  • joblib ==1.4.2
  • json5 ==0.12.0
  • jsonlines ==4.0.0
  • jsonpointer ==3.0.0
  • jsonschema ==4.23.0
  • jsonschema-specifications ==2024.10.1
  • jupyter ==1.1.1
  • jupyter-client ==8.6.3
  • jupyter-console ==6.6.3
  • jupyter-core ==5.7.2
  • jupyter-events ==0.12.0
  • jupyter-lsp ==2.2.5
  • jupyter-server ==2.15.0
  • jupyter-server-terminals ==0.5.3
  • jupyterlab ==4.4.0
  • jupyterlab-pygments ==0.3.0
  • jupyterlab-server ==2.27.3
  • jupyterlab-widgets ==3.0.14
  • kiwisolver ==1.4.8
  • linkify-it-py ==2.0.3
  • markdown ==3.8
  • markdown-it-py ==3.0.0
  • markupsafe ==3.0.2
  • matplotlib ==3.10.3
  • matplotlib-inline ==0.1.3
  • mdit-py-plugins ==0.4.2
  • mdurl ==0.1.2
  • mercantile ==1.2.1
  • mistune ==3.1.3
  • narwhals ==1.35.0
  • nbclient ==0.10.2
  • nbconvert ==7.16.6
  • nbformat ==5.10.4
  • nbqa ==1.9.1
  • nest-asyncio ==1.6.0
  • nodeenv ==1.9.1
  • notebook ==7.4.3
  • notebook-shim ==0.2.4
  • numpy ==2.2.5
  • openpyxl ==3.1.5
  • overrides ==7.7.0
  • packaging ==25.0
  • pandarallel ==1.6.5
  • pandas ==2.2.3
  • pandocfilters ==1.5.1
  • panel ==1.6.2
  • param ==2.2.0
  • parso ==0.8.4
  • patsy ==1.0.1
  • pexpect ==4.9.0
  • pillow ==11.2.1
  • platformdirs ==4.3.7
  • polars ==1.29.0
  • pre-commit ==4.2.0
  • prometheus-client ==0.21.1
  • prompt-toolkit ==3.0.51
  • proto-plus ==1.26.1
  • protobuf ==5.29.4
  • psutil ==7.0.0
  • psygnal ==0.12.0
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.3
  • pyarrow ==19.0.1
  • pyasn1 ==0.6.1
  • pyasn1-modules ==0.4.2
  • pycodestyle ==2.13.0
  • pycparser ==2.22
  • pydantic ==2.11.3
  • pydantic-core ==2.33.1
  • pygments ==2.19.1
  • pyparsing ==3.2.3
  • python-dateutil ==2.9.0.post0
  • python-dotenv ==1.1.0
  • python-json-logger ==3.3.0
  • pytz ==2025.2
  • pyviz-comms ==3.0.4
  • pywin32 ==310
  • pywinpty ==2.0.15
  • pyyaml ==6.0.2
  • pyzmq ==26.4.0
  • referencing ==0.36.2
  • regex ==2024.11.6
  • requests ==2.32.3
  • requirements-parser ==0.11.0
  • rfc3339-validator ==0.1.4
  • rfc3986-validator ==0.1.1
  • rpds-py ==0.24.0
  • rsa ==4.9.1
  • ruff ==0.11.6
  • scikit-learn ==1.6.1
  • scipy ==1.15.2
  • seaborn ==0.13.2
  • send2trash ==1.8.3
  • setuptools ==79.0.0
  • six ==1.17.0
  • sniffio ==1.3.1
  • soupsieve ==2.7
  • stack-data ==0.6.3
  • statsmodels ==0.14.4
  • terminado ==0.18.1
  • threadpoolctl ==3.6.0
  • tiktoken ==0.9.0
  • tinycss2 ==1.4.0
  • tokenize-rt ==6.1.0
  • tomli ==2.2.1
  • tornado ==6.4.2
  • tqdm ==4.67.1
  • traitlets ==5.14.3
  • types-python-dateutil ==2.9.0.20241206
  • types-setuptools ==79.0.0.20250422
  • typing-extensions ==4.13.2
  • typing-inspection ==0.4.0
  • tzdata ==2025.2
  • uc-micro-py ==1.0.3
  • uri-template ==1.3.0
  • uritemplate ==4.1.1
  • urllib3 ==2.4.0
  • vega-datasets ==0.9.0
  • vegafusion ==2.0.2
  • virtualenv ==20.30.0
  • vl-convert-python ==1.7.0
  • wcwidth ==0.2.13
  • webcolors ==24.11.1
  • webencodings ==0.5.1
  • websocket-client ==1.8.0
  • widgetsnbextension ==4.0.14
  • xlsxwriter ==3.2.3
  • xmltodict ==0.14.2
  • xyzservices ==2025.1.0
uv.lock pypi
  • 194 dependencies
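
When installing with pip rather than uv, it can help to check that the installed packages match the versions pinned in requirements.txt. The sketch below uses only the standard library; the function names are illustrative, and it assumes the file uses plain `name==version` pins, optionally followed by an environment marker after a semicolon.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Parse 'name==version' lines into a dict, skipping comments and blanks."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0]         # drop trailing comments
        line = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, get_version=metadata.version):
    """Return {name: (pinned, installed or None)} for packages that differ from their pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

A typical call would be `find_mismatches(parse_pins(open("requirements.txt").read()))`. Platform-specific entries such as pywin32 (Windows) or appnope (macOS) may legitimately show up as not installed on other systems.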