stormtrooper

Zero/few shot learning components for scikit-learn pipelines with LLMs and transformers.

https://github.com/centre-for-humanities-computing/stormtrooper

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary

Keywords

chatgpt few-shot-learning gpt-4 large-language-models llm scikit-learn transformer transformers zero-shot-learning
Last synced: 6 months ago

Repository

Zero/few shot learning components for scikit-learn pipelines with LLMs and transformers.

Basic Info
Statistics
  • Stars: 18
  • Watchers: 1
  • Forks: 2
  • Open Issues: 1
  • Releases: 1
Topics
chatgpt few-shot-learning gpt-4 large-language-models llm scikit-learn transformer transformers zero-shot-learning
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

stormtrooper


Zero/few shot learning components for scikit-learn pipelines with large-language models and transformers.

Documentation

New in 1.0.0

Trooper

The brand new Trooper interface means you no longer have to specify what model type you wish to use: Stormtrooper automatically detects the model type from the specified name.

```python
from stormtrooper import Trooper

# This loads a SetFit model
model = Trooper("all-MiniLM-L6-v2")

# This loads an OpenAI model
model = Trooper("gpt-4")

# This loads a Text2Text model
model = Trooper("google/flan-t5-base")
```

Unified zero and few-shot classification

You no longer have to specify whether a model is a few-shot or a zero-shot classifier when initialising it. If you do not pass any training examples, the model is automatically assumed to be zero-shot.

```python
# This is a zero-shot model
model.fit(None, ["dog", "cat"])

# This is a few-shot model
model.fit(["he was a good boy", "just lay down on my laptop"], ["dog", "cat"])
```

Model types

You can use all sorts of transformer models for few- and zero-shot classification in Stormtrooper; all of them are loaded through the same Trooper interface (see the sketch after this list).

  1. Instruction fine-tuned generative models, e.g. Trooper("HuggingFaceH4/zephyr-7b-beta")
  2. Encoder models with SetFit, e.g. Trooper("all-MiniLM-L6-v2")
  3. Text2Text models, e.g. Trooper("google/flan-t5-base")
  4. OpenAI models, e.g. Trooper("gpt-4")
  5. NLI models, e.g. Trooper("facebook/bart-large-mnli")
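Since every family is loaded through the same interface, swapping models is a one-line change. A minimal sketch (the loop and the label set are illustrative, not from the README; the Trooper calls follow its documented usage):

```python
from stormtrooper import Trooper

labels = ["dog", "cat"]
texts = ["he was a good boy", "just lay down on my laptop"]

# The same fit/predict API is assumed to apply across model families.
for model_name in [
    "all-MiniLM-L6-v2",          # SetFit encoder
    "google/flan-t5-base",       # Text2Text
    "facebook/bart-large-mnli",  # NLI
]:
    model = Trooper(model_name)
    model.fit(None, labels)      # no training examples, so zero-shot
    print(model_name, model.predict(texts))
```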

Example usage

Find more in our docs.

```bash
pip install stormtrooper
```

```python
from stormtrooper import Trooper

class_labels = ["atheism/christianity", "astronomy/space"]
example_texts = [
    "God came down to earth to save us.",
    "A new nebula was recently discovered in the proximity of the Oort cloud.",
]
new_texts = [
    "God bless the railway workers",
    "The frigate is ready to launch from the spaceport",
]

# Zero-shot classification
model = Trooper("google/flan-t5-base")
model.fit(None, class_labels)
model.predict(new_texts)
# ["atheism/christianity", "astronomy/space"]

# Few-shot classification
model = Trooper("google/flan-t5-base")
model.fit(example_texts, class_labels)
model.predict(new_texts)
# ["atheism/christianity", "astronomy/space"]
```
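Because these classifiers expose the scikit-learn estimator API, they should also compose with standard scikit-learn utilities. A hedged sketch (the Pipeline wrapper is illustrative and not shown in the README):

```python
from sklearn.pipeline import Pipeline

from stormtrooper import Trooper

# Assumption: Trooper satisfies the scikit-learn estimator contract,
# so it can serve as the final step of a Pipeline.
pipe = Pipeline([("classify", Trooper("google/flan-t5-base"))])

# Reusing the texts and labels from the example above.
pipe.fit(example_texts, class_labels)  # few-shot: texts plus labels
print(pipe.predict(new_texts))
```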

Fuzzy Matching

Generative and text2text models fuzzy-match results to the closest class label by default; you can disable this behavior by specifying fuzzy_match=False.

If you want to speed up fuzzy matching, install python-Levenshtein.
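A minimal sketch of turning the behavior off (assuming, as the prose above implies, that fuzzy_match is a keyword accepted at initialization):

```python
from stormtrooper import Trooper

# With fuzzy matching disabled, predictions are presumably returned as
# the model's raw generated text rather than snapped to a class label.
model = Trooper("google/flan-t5-base", fuzzy_match=False)
model.fit(None, ["dog", "cat"])
raw_predictions = model.predict(["he was a good boy"])
```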

Inference on GPU

From version 0.2.2 you can run models on GPU. You can specify the device when initializing a model:

```python
classifier = Trooper("all-MiniLM-L6-v2", device="cuda:0")
```

Inference on multiple GPUs

You can run a model across multiple devices, with weights placed in order of device priority (GPU -> CPU + RAM -> disk), by using the device_map argument. Note that this only works with text2text and generative models.

```python
model = Trooper("HuggingFaceH4/zephyr-7b-beta", device_map="auto")
```

Owner

  • Name: Center for Humanities Computing Aarhus
  • Login: centre-for-humanities-computing
  • Kind: organization
  • Email: chcaa@cas.au.dk
  • Location: Aarhus, Denmark

Citation (citation.cff)

cff-version: 0.1.0
message: "When using this package please cite us."
authors:
- family-names: "Kardos"
  given-names: "Márton"
  orcid: "https://orcid.org/0000-0001-9652-4498"
title: "stormtrooper: scikit-learn compatible zero and few shot learning in Python"
version: 0.3.0
date-released: 2023-08-18
url: "https://github.com/centre-for-humanities-computing/stormtrooper"

GitHub Events

Total
  • Issues event: 3
  • Watch event: 8
  • Issue comment event: 2
  • Push event: 1
  • Pull request review event: 1
  • Pull request event: 2
  • Create event: 1
Last Year
  • Issues event: 3
  • Watch event: 8
  • Issue comment event: 2
  • Push event: 1
  • Pull request review event: 1
  • Pull request event: 2
  • Create event: 1

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 107
  • Total Committers: 1
  • Avg Commits per committer: 107.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 46
  • Committers: 1
  • Avg Commits per committer: 46.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Márton Kardos p****3@g****m 107

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 3
  • Average time to close issues: 17 days
  • Average time to close pull requests: 6 days
  • Total issue authors: 4
  • Total pull request authors: 1
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 3
  • Average time to close issues: 17 days
  • Average time to close pull requests: 6 days
  • Issue authors: 4
  • Pull request authors: 1
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • MinaAlmasi (1)
  • miscodisco (1)
  • stijn-uva (1)
  • sarangs-ntnu (1)
Pull Request Authors
  • x-tabdeveloping (5)
Top Labels
Issue Labels
bug (2) enhancement (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 103 last month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 13
  • Total maintainers: 1
pypi.org: stormtrooper

Transformer/LLM-based zero and few-shot classification in scikit-learn pipelines

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 103 last month
Rankings
Dependent packages count: 10.0%
Average: 16.8%
Downloads: 18.6%
Dependent repos count: 21.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/static.yml actions
  • actions/checkout v3 composite
  • actions/configure-pages v2 composite
  • actions/deploy-pages v1 composite
  • actions/upload-pages-artifact v1 composite
pyproject.toml pypi
  • aiohttp ^3.8.0
  • datasets ^2.14.0
  • numpy ^1.23.0
  • openai ^0.28.0
  • python ^3.9
  • scikit-learn ^1.2.0
  • setfit ^0.7.0
  • thefuzz ^0.18.0
  • tiktoken ^0.5.0
  • torch ^2.0.0
  • tqdm ^4.60.0
  • transformers ^4.25.0
.github/workflows/tests.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite