quantized-llama
Tool for the automatic orchestration of Quantized LLMs
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.5%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Large Language Models Quantization: Energy Efficiency and Performance Trade-offs
Overview
This study evaluates the LLaMA-3.1-8B-Instruct and Qwen-2.5-7B-Instruct models under 4-bit Post-Training Quantization (PTQ). We focus on energy efficiency and performance across natural language processing (NLP) tasks, quantifying the trade-offs between reduced model precision and accuracy, energy consumption, and resource utilization.
Our analysis explores whether quantization, while improving energy efficiency, affects model accuracy across NLP tasks. We assess three primary NLP task types:
1. Sentiment Analysis (SA)
2. Sentence Pair Semantic Similarity (SPS)
3. Natural Language Inference (NLI)
Requirements
Prerequisites
- Miniconda 3: Ensure Miniconda 3 is installed on your system.
- Install required Python libraries by running:
bash
conda env create -f environment.yml
conda activate py310
bash
pip install -r requirements.txt
Download models
Windows
powershell
.\download_models\windows.ps1
Linux
Install pynvml to monitor GPU utilization. Use the following command:
bash
bash ./download_models/linux.sh
Running the project
To run the main experiment, execute the following command from the root directory:
bash
python experiment-runner/ tasks/onnx/RunnerConfig_<model_name>.py
This command starts the quantization experiments for the model named in the configuration file (for example, LLaMA-3.1-8B-Instruct) using 4-bit and 8-bit precision levels. The experiment assesses the impact of quantization on energy efficiency, accuracy, and resource utilization across various NLP tasks, including those in the GLUE and IMDB datasets.
The result tables are available in the run_tables folder, which contains detailed experiment results for the different models for further analysis.
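Energy consumption is one of the quantities the experiment assesses. The sketch below shows one common way to turn sampled power readings into energy, by integrating power over time with the trapezoidal rule. It is not taken from this repository: the function name `energy_joules` is hypothetical, and the samples are assumed to be in milliwatts, the unit NVML uses for GPU power readings.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_mw: Sequence[float]) -> float:
    """Integrate sampled power over time with the trapezoidal rule.

    timestamps_s: sample times in seconds (non-decreasing).
    power_mw:     power readings in milliwatts (assumed unit).
    """
    if len(timestamps_s) != len(power_mw):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Average the two neighbouring samples and convert milliwatts to watts.
        avg_watts = (power_mw[i] + power_mw[i - 1]) / 2 / 1000.0
        total += avg_watts * dt  # watts * seconds = joules
    return total
```

A shorter sampling interval generally gives a closer estimate, at the cost of more monitoring overhead during the run.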
Quantization
The quantization code can be found in the quantization folder. We provide two quantization methods, GPTQ and AWQ, implemented with AutoGPTQ and AutoAWQ. If you want to quantize models yourself, first follow the instructions in those two repositories, then set the model and dataset paths in your own scripts.
After that, convert the models to ONNX format with the script convert_to_onnx.py.
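To give a feel for what 4-bit precision means for a set of weights, here is a minimal sketch of symmetric round-to-nearest quantization. This is deliberately not GPTQ or AWQ, which use calibration data and more elaborate error compensation; it only illustrates mapping floating-point values onto the 16 levels of a signed 4-bit integer. The function names `quantize_int4` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization to the signed 4-bit range [-8, 7]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole group; fall back to 1.0 when all weights are zero.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the representable range.
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


codes, scale = quantize_int4([0.7, -0.3, 0.0, 0.12])
restored = dequantize(codes, scale)
```

The gap between the original weights and `restored` is the rounding error that quantization introduces. Methods such as GPTQ and AWQ aim to reduce the effect of that error on model outputs.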
Data Analysis
Statistical tests and visualizations are performed with R scripts to interpret the impact of quantization. All data analysis scripts are stored in the data-analysis folder. The four primary scripts correspond to the four main Research Questions (RQs) addressed in this study.
For each script, the first step is to specify the data file to be analyzed and visualized. The two examples we provide are:
1. run_tables/llama.csv: results for the different quantized versions of the LLaMA-3.1-8B-Instruct model across the NLP tasks.
2. run_tables/qwen.csv: results for the different quantized versions of the Qwen-2.5-7B-Instruct model across the NLP tasks.
After specifying the data file, setting the working directory, and loading the required libraries, run the analysis with the following commands:
bash
Rscript data-analysis/RQ1.R # For RQ1
Rscript data-analysis/RQ2.R # For RQ2
Rscript data-analysis/RQ3.R # For RQ3
Rscript data-analysis/RQ4.R # For RQ4
Alternatively, you can run the analysis with the All-in-One.ipynb notebook.
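For a quick look at a run table before running the R scripts, a per-group mean can be computed with the Python standard library alone. The sketch below is an assumption-laden illustration: the column names `quantization` and `energy_j` are hypothetical, and the inline rows are placeholders, not measurements from this study. Check the header of the actual CSV files and substitute the real column names.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, TextIO


def mean_by_group(table: TextIO, group_col: str, value_col: str) -> Dict[str, float]:
    """Return the mean of value_col for each distinct value of group_col."""
    groups = defaultdict(list)
    for row in csv.DictReader(table):
        groups[row[group_col]].append(float(row[value_col]))
    return {name: mean(values) for name, values in groups.items()}


# Placeholder rows for illustration only; not measurements from this study.
sample = io.StringIO(
    "quantization,energy_j\n"
    "baseline,10.0\n"
    "baseline,12.0\n"
    "gptq,6.0\n"
    "gptq,8.0\n"
)
print(mean_by_group(sample, "quantization", "energy_j"))
```

To use it on a real table, open the file with `open("run_tables/llama.csv", newline="")` and pass the file object in place of `sample`.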
Owner
- Name: Green Halo
- Login: Green-Halo
- Kind: organization
- Repositories: 1
- Profile: https://github.com/Green-Halo
Citation (CITATION.cff)
cff-version: 1.2.0
title: Experiment Runner
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: S2 Group
    address: De Boelelaan 1111
    city: Amsterdam
    post-code: 1081 HV
    country: NL
    website: 'https://s2group.cs.vu.nl/'
repository-code: 'https://github.com/s2-group/experiment-runner/'
abstract: >-
  Experiment Runner is a generic framework to automatically
  execute measurement-based experiments on any platform. The
  experiments are user-defined, can be completely
  customized, and expressed in python code!
keywords:
  - experimentation
  - empirical
  - empirical-software-engineering
license: MIT
GitHub Events
Total
- Member event: 1
- Push event: 6
Last Year
- Member event: 1
- Push event: 6
Dependencies
- datasets *
- dill *
- jsonpickle *
- onnxruntime-gpu *
- pandas *
- psutil *
- tabulate *
- transformers *
- aiohappyeyeballs ==2.4.3
- aiohttp ==3.10.10
- aiosignal ==1.3.1
- async-timeout ==4.0.3
- attrs ==24.2.0
- charset-normalizer ==3.4.0
- coloredlogs ==15.0.1
- datasets ==3.0.2
- dill ==0.3.8
- filelock ==3.16.1
- flatbuffers ==24.3.25
- frozenlist ==1.5.0
- fsspec ==2024.9.0
- huggingface-hub ==0.26.1
- humanfriendly ==10.0
- idna ==3.10
- jsonpickle ==3.3.0
- multidict ==6.1.0
- multiprocess ==0.70.16
- numpy ==2.1.2
- onnxruntime-gpu ==1.20.0
- packaging ==24.1
- pandas ==2.2.3
- propcache ==0.2.0
- protobuf ==5.28.3
- psutil ==6.1.0
- pyarrow ==17.0.0
- pyjoules ==0.5.1
- pynvml ==11.5.3
- python-dateutil ==2.9.0.post0
- pytz ==2024.2
- six ==1.16.0
- sympy ==1.13.3
- tabulate ==0.9.0
- tqdm ==4.66.5
- typing-extensions ==4.12.2
- tzdata ==2024.2
- xxhash ==3.5.0
- yarl ==1.16.0