quantized-llama

Tool for the automatic orchestration of Quantized LLMs

https://github.com/green-halo/quantized-llama

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Tool for the automatic orchestration of Quantized LLMs

Basic Info
  • Host: GitHub
  • Owner: Green-Halo
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 6.31 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Large Language Models Quantization: Energy Efficiency and Performance Trade-offs

Overview

This study evaluates the LLaMA-3.1-8B-Instruct and Qwen-2.5-7B-Instruct models under 4-bit Post-Training Quantization (PTQ). Our focus is on energy efficiency and performance across different natural language processing (NLP) tasks, quantifying the trade-offs between reduced model precision and its impact on accuracy, energy consumption, and resource utilization.

Our analysis explores whether quantization, while improving energy efficiency, affects model accuracy across NLP tasks. We assess three primary NLP task types:

  1. Sentiment Analysis (SA)
  2. Sentence Pair Semantic Similarity (SPS)
  3. Natural Language Inference (NLI)

Requirements

Prerequisites

  • Miniconda 3: Ensure Miniconda 3 is installed on your system.
  • Create and activate the conda environment, then install the required Python libraries:
    conda env create -f environment.yml
    conda activate py310
    pip install -r requirements.txt

Download models

  1. Windows
    powershell .\download_models\windows.ps1

  2. Linux
    Install pynvml to monitor GPU utilization. Use the following command:
    bash ./download_models/linux.sh
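
For readers unfamiliar with pynvml, the sketch below shows one way GPU readings could be turned into an energy figure. It is an illustration only, not the project's measurement code: the function names `energy_joules` and `read_gpu_power_watts` are hypothetical, and the sampling scheme (timestamped power readings integrated with the trapezoidal rule) is an assumption.

```python
def energy_joules(samples):
    """Integrate (timestamp_s, power_w) samples with the trapezoidal rule.

    `samples` is a list of (time in seconds, power in watts) pairs in
    chronological order. The sampling scheme is an assumption made for
    illustration, not the project's own measurement procedure.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (t1 - t0) * (p0 + p1) / 2.0
    return total


def read_gpu_power_watts(index=0):
    """Read the current power draw of one GPU via NVML (needs an NVIDIA GPU)."""
    import pynvml  # imported lazily so energy_joules works without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        # NVML reports power in milliwatts; convert to watts.
        return pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
    finally:
        pynvml.nvmlShutdown()
```

Calling `read_gpu_power_watts()` at regular intervals alongside `time.monotonic()` yields the sample list that `energy_joules` expects.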

Running the project

To run the main experiment, execute the following command from the root directory:

    python experiment-runner/ tasks/onnx/RunnerConfig_<model_name>.py

This command initiates the quantization experiments on the LLaMA3-8B model using 4-bit and 8-bit precision levels. The experiment assesses the impact of quantization on energy efficiency, accuracy, and resource utilization across various NLP tasks, including those in the GLUE and IMDB datasets.

The result tables are in the run_tables folder, which contains the detailed experiment results of the different models for further analysis.
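
As a quick way to inspect a run table without R, the following sketch averages one numeric column per group using only the Python standard library. The helper name `summarize` and the column names used in the usage note are hypothetical; check the CSV header for the real column names.

```python
import csv
from collections import defaultdict
from statistics import mean


def summarize(csv_path, group_col, value_col):
    """Return {group: mean of value_col} for one run table.

    Rows whose value cell is blank are skipped. Column names are supplied
    by the caller because the actual header is not documented here.
    """
    groups = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            cell = (row.get(value_col) or "").strip()
            if cell:
                groups[row[group_col]].append(float(cell))
    return {name: mean(values) for name, values in groups.items()}
```

For example, `summarize("run_tables/llama.csv", "quantization", "energy")` would return the mean energy per quantization setting, assuming the file has columns with those (hypothetical) names.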

Quantization

The quantization code is in the quantization folder. We provide two quantization methods, GPTQ and AWQ, implemented with AutoGPTQ and AutoAWQ. To quantize models yourself, first follow the instructions in those two repositories, then set your own model and dataset paths in the scripts.

After that, convert the models to ONNX format with the convert_to_onnx.py script.
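
For intuition about what 4-bit precision means for the weights discussed in this section, here is a deliberately simplified sketch of symmetric round-to-nearest quantization. It is not GPTQ or AWQ, which rely on calibration data and more elaborate procedures, and it is not the code in the quantization folder. The function names and sample weights are made up for illustration.

```python
def quantize_4bit(weights):
    """Symmetric round-to-nearest quantization to signed 4-bit integers.

    Returns (codes, scale): each code lies in [-8, 7], and code * scale
    approximates the original weight. This is a simplified stand-in,
    not GPTQ or AWQ.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Made-up weights, chosen only to show the rounding error.
weights = [1.75, -0.75, 0.3, 0.0]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as one of 16 integer levels plus a shared scale, so the reconstruction error per weight is at most half the scale. That rounding error is the source of the accuracy trade-off the study measures.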

Data Analysis

Statistical tests and visualizations are performed with R scripts to interpret the impact of quantization. All data analysis scripts are stored in the data-analysis folder. The four primary scripts correspond to the four main Research Questions (RQs) addressed in this study.

For each script, the first step is to specify the data file to be analyzed and visualized. We provide two example files:

  1. run_tables/llama.csv: results of the different quantization variants of the LLaMA-3.1-8B-Instruct model across different NLP tasks.
  2. run_tables/qwen.csv: results of the different quantization variants of the Qwen 2.5 model across different NLP tasks.

After specifying the data file and working directory and loading the required libraries, run the analysis with the following commands:

    Rscript data-analysis/RQ1.R # For RQ1
    Rscript data-analysis/RQ2.R # For RQ2
    Rscript data-analysis/RQ3.R # For RQ3
    Rscript data-analysis/RQ4.R # For RQ4

Alternatively, you can run the analysis with the All-in-One.ipynb notebook.

Owner

  • Name: Green Halo
  • Login: Green-Halo
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: Experiment Runner
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: S2 Group
    address: De Boelelaan 1111
    city: Amsterdam
    post-code: 1081 HV
    country: NL
    website: 'https://s2group.cs.vu.nl/'
repository-code: 'https://github.com/s2-group/experiment-runner/'
abstract: >-
  Experiment Runner is a generic framework to automatically
  execute measurement-based experiments on any platform. The
  experiments are user-defined, can be completely
  customized, and expressed in python code!
keywords:
  - experimentation
  - empirical
  - empirical-software-engineering
license: MIT

GitHub Events

Total
  • Member event: 1
  • Push event: 6
Last Year
  • Member event: 1
  • Push event: 6

Dependencies

requirements.txt pypi
  • datasets *
  • dill *
  • jsonpickle *
  • onnxruntime-gpu *
  • pandas *
  • psutil *
  • tabulate *
  • transformers *
environment.yaml pypi
  • aiohappyeyeballs ==2.4.3
  • aiohttp ==3.10.10
  • aiosignal ==1.3.1
  • async-timeout ==4.0.3
  • attrs ==24.2.0
  • charset-normalizer ==3.4.0
  • coloredlogs ==15.0.1
  • datasets ==3.0.2
  • dill ==0.3.8
  • filelock ==3.16.1
  • flatbuffers ==24.3.25
  • frozenlist ==1.5.0
  • fsspec ==2024.9.0
  • huggingface-hub ==0.26.1
  • humanfriendly ==10.0
  • idna ==3.10
  • jsonpickle ==3.3.0
  • multidict ==6.1.0
  • multiprocess ==0.70.16
  • numpy ==2.1.2
  • onnxruntime-gpu ==1.20.0
  • packaging ==24.1
  • pandas ==2.2.3
  • propcache ==0.2.0
  • protobuf ==5.28.3
  • psutil ==6.1.0
  • pyarrow ==17.0.0
  • pyjoules ==0.5.1
  • pynvml ==11.5.3
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.2
  • six ==1.16.0
  • sympy ==1.13.3
  • tabulate ==0.9.0
  • tqdm ==4.66.5
  • typing-extensions ==4.12.2
  • tzdata ==2024.2
  • xxhash ==3.5.0
  • yarl ==1.16.0