ensembleforecasting

Using multiple LLMs for ensemble Forecasting

https://github.com/sebastianbodza/ensembleforecasting

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Using multiple LLMs for ensemble Forecasting

Basic Info
  • Host: GitHub
  • Owner: SebastianBodza
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 94.7 KB
Statistics
  • Stars: 16
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
  • README.md
  • CITATION.cff

README.md

EnsembleForecasting

Using multiple LLMs for ensemble Forecasting.

The idea came from participating in Kaggle challenges with tabular and time-series data, where ensembling multiple models by simply averaging their results gave a massive performance boost, especially for uncertainty estimates. Why not do the same for LLMs?

A first, unoptimized proof of concept with two quantized (AWQ) models. It can also be run on Colab!

tl;dr: With an ensemble of two 4-bit quantized models it is possible to beat the official HumanEval scores.

Results

HumanEval

| all AWQ quantization | Magicode | Deepseek | Ensemble TT [1] | Ensemble MT [2] | Ensemble MinT [3] |
|-----------|--------------|--------------|----------|----------|----------|
| ~ 7B | 71.95% | 76.83% | 77.44% | 76.83% | 76.22% |

| all AWQ quantization | Phind-34B-AWQ | Deepseek-33B-AWQ | Ensemble TT [1] |
|-----------|--------------|--------------|----------|
| ~ 34B | 74.39% | 78.05% | 79.89% |

[1] Per model, take its top token, then keep whichever of the two has the higher probability
[2] Average the logits of both models, then sample the highest-probability token
[3] Take the element-wise minimum of both models' logits, then sample the highest-probability token
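The three ensemble rules above can be sketched at the level of a single next-token decision. This is a minimal illustration with stand-in NumPy arrays, not the repo's actual inference code; the logit values and greedy (argmax) selection are assumptions for demonstration.

```python
import numpy as np

# Stand-in next-token logits from two models over the same (tiny) vocabulary.
# In the repo these would come from the two AWQ-quantized LLMs.
logits_a = np.array([2.0, 0.5, -1.0, 1.5])
logits_b = np.array([1.0, 2.6, -0.5, 1.2])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# [1] Ensemble TT: each model proposes its top token; keep the proposal
#     whose probability is higher.
p_a, p_b = softmax(logits_a), softmax(logits_b)
tok_tt = int(np.argmax(p_a)) if p_a.max() >= p_b.max() else int(np.argmax(p_b))

# [2] Ensemble MT: average the logits, then take the most likely token.
tok_mt = int(np.argmax((logits_a + logits_b) / 2))

# [3] Ensemble MinT: element-wise minimum of the logits (both models must
#     agree a token is plausible), then take the most likely token.
tok_mint = int(np.argmax(np.minimum(logits_a, logits_b)))

print(tok_tt, tok_mt, tok_mint)
```

In a real decoding loop this selection would run once per generated token, with both models fed the same growing context.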

Next Steps:

Ensemble on logprob basis:
- Take the maximum of both -> just sanity checking [1]

Others:
- Using the same model with different system prompts (idea from x @nanulled), e.g. "you are debugging" and "you are a code generator" -> easier to implement with better throughput
- Ensemble LoRA serving with S-LoRA
- Taking more diverse models -> not too much variance in the current models, and Deepseek seems to be fairly dominant
- Ensembling at the model level -> faster

Updates:

  • 2024-01-14: Added Results for single models and ensemble
2024-01-16: Added results for 34B models, beating the official score from Deepseek with AWQ models (4-bit quantization)
  • 2024-01-17: Added Torch Ensemble Model

Owner

  • Name: Sebastian Bodza
  • Login: SebastianBodza
  • Kind: user
  • Location: Stuttgart
  • Company: Institute for Automotive Engineering

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Bodza"
  given-names: "Sebastian"
title: "EnsembleForecasting"
version: 0.0.1
date-released: 2024-01-13
url: "https://github.com/SebastianBodza/EnsembleForecasting"
