stormtrooper

Zero/few shot learning components for scikit-learn pipelines with LLMs and transformers.

https://github.com/centre-for-humanities-computing/stormtrooper

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary

Keywords

chatgpt few-shot-learning gpt-4 large-language-models llm scikit-learn transformer transformers zero-shot-learning
Last synced: 6 months ago

Repository

Zero/few shot learning components for scikit-learn pipelines with LLMs and transformers.

Basic Info
Statistics
  • Stars: 18
  • Watchers: 1
  • Forks: 2
  • Open Issues: 1
  • Releases: 1
Topics
chatgpt few-shot-learning gpt-4 large-language-models llm scikit-learn transformer transformers zero-shot-learning
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

stormtrooper


Zero/few shot learning components for scikit-learn pipelines with large-language models and transformers.

Documentation

New in 1.0.0

Trooper

The brand new Trooper interface means you no longer have to specify what model type you wish to use: Stormtrooper automatically detects the model type from the specified name.

```python
from stormtrooper import Trooper

# This loads a SetFit model
model = Trooper("all-MiniLM-L6-v2")

# This loads an OpenAI model
model = Trooper("gpt-4")

# This loads a Text2Text model
model = Trooper("google/flan-t5-base")
```

Unified zero and few-shot classification

You no longer have to specify whether a model is a few-shot or a zero-shot classifier when initialising it. If you do not pass any training examples, the model is automatically assumed to be zero-shot.

```python
# This is a zero-shot model
model.fit(None, ["dog", "cat"])

# This is a few-shot model
model.fit(["he was a good boy", "just lay down on my laptop"], ["dog", "cat"])
```

Model types

You can use all sorts of transformer models for few- and zero-shot classification in Stormtrooper; all of them are loaded through the same Trooper interface (see the sketch after this list).

  1. Instruction fine-tuned generative models, e.g. Trooper("HuggingFaceH4/zephyr-7b-beta")
  2. Encoder models with SetFit, e.g. Trooper("all-MiniLM-L6-v2")
  3. Text2Text models, e.g. Trooper("google/flan-t5-base")
  4. OpenAI models, e.g. Trooper("gpt-4")
  5. NLI models, e.g. Trooper("facebook/bart-large-mnli")
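Since every family is loaded through the same interface, swapping models is a one-line change. A minimal sketch (the loop and the label set are illustrative, not from the README; the Trooper calls follow its documented usage):

```python
from stormtrooper import Trooper

labels = ["dog", "cat"]
texts = ["he was a good boy", "just lay down on my laptop"]

# The same fit/predict API is assumed to apply across model families.
for model_name in [
    "all-MiniLM-L6-v2",          # SetFit encoder
    "google/flan-t5-base",       # Text2Text
    "facebook/bart-large-mnli",  # NLI
]:
    model = Trooper(model_name)
    model.fit(None, labels)      # no training examples, so zero-shot
    print(model_name, model.predict(texts))
```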

Example usage

Find more in our docs.

```bash
pip install stormtrooper
```

```python
from stormtrooper import Trooper

class_labels = ["atheism/christianity", "astronomy/space"]
example_texts = [
    "God came down to earth to save us.",
    "A new nebula was recently discovered in the proximity of the Oort cloud.",
]
new_texts = [
    "God bless the railway workers",
    "The frigate is ready to launch from the spaceport",
]

# Zero-shot classification
model = Trooper("google/flan-t5-base")
model.fit(None, class_labels)
model.predict(new_texts)
# ["atheism/christianity", "astronomy/space"]

# Few-shot classification
model = Trooper("google/flan-t5-base")
model.fit(example_texts, class_labels)
model.predict(new_texts)
# ["atheism/christianity", "astronomy/space"]
```
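Because these classifiers expose the scikit-learn estimator API, they should also compose with standard scikit-learn utilities. A hedged sketch (the Pipeline wrapper is illustrative and not shown in the README):

```python
from sklearn.pipeline import Pipeline

from stormtrooper import Trooper

# Assumption: Trooper satisfies the scikit-learn estimator contract,
# so it can serve as the final step of a Pipeline.
pipe = Pipeline([("classify", Trooper("google/flan-t5-base"))])

# Reusing the texts and labels from the example above.
pipe.fit(example_texts, class_labels)  # few-shot: texts plus labels
print(pipe.predict(new_texts))
```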

Fuzzy Matching

Generative and text2text models fuzzy-match results to the closest class label by default; you can disable this behavior by specifying fuzzy_match=False.

If you want to speed up fuzzy matching, install python-Levenshtein.
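A minimal sketch of turning the behavior off (assuming, as the prose above implies, that fuzzy_match is a keyword accepted at initialization):

```python
from stormtrooper import Trooper

# With fuzzy matching disabled, predictions are presumably returned as
# the model's raw generated text rather than snapped to a class label.
model = Trooper("google/flan-t5-base", fuzzy_match=False)
model.fit(None, ["dog", "cat"])
raw_predictions = model.predict(["he was a good boy"])
```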

Inference on GPU

From version 0.2.2 you can run models on GPU. You can specify the device when initializing a model:

```python
classifier = Trooper("all-MiniLM-L6-v2", device="cuda:0")
```

Inference on multiple GPUs

You can run a model across multiple devices, with weights placed in order of device priority (GPU -> CPU + RAM -> disk), by using the device_map argument. Note that this only works with text2text and generative models.

```python
model = Trooper("HuggingFaceH4/zephyr-7b-beta", device_map="auto")
```

Owner

  • Name: Center for Humanities Computing Aarhus
  • Login: centre-for-humanities-computing
  • Kind: organization
  • Email: chcaa@cas.au.dk
  • Location: Aarhus, Denmark

Citation (citation.cff)

cff-version: 0.1.0
message: "When using this package please cite us."
authors:
- family-names: "Kardos"
  given-names: "Márton"
  orcid: "https://orcid.org/0000-0001-9652-4498"
title: "stormtrooper: scikit-learn compatible zero and few shot learning in Python"
version: 0.3.0
date-released: 2023-08-18
url: "https://github.com/centre-for-humanities-computing/stormtrooper"

GitHub Events

Total
  • Issues event: 3
  • Watch event: 8
  • Issue comment event: 2
  • Push event: 1
  • Pull request review event: 1
  • Pull request event: 2
  • Create event: 1
Last Year
  • Issues event: 3
  • Watch event: 8
  • Issue comment event: 2
  • Push event: 1
  • Pull request review event: 1
  • Pull request event: 2
  • Create event: 1

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 107
  • Total Committers: 1
  • Avg Commits per committer: 107.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 46
  • Committers: 1
  • Avg Commits per committer: 46.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Márton Kardos p****3@g****m 107

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 3
  • Average time to close issues: 17 days
  • Average time to close pull requests: 6 days
  • Total issue authors: 4
  • Total pull request authors: 1
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 3
  • Average time to close issues: 17 days
  • Average time to close pull requests: 6 days
  • Issue authors: 4
  • Pull request authors: 1
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • MinaAlmasi (1)
  • miscodisco (1)
  • stijn-uva (1)
  • sarangs-ntnu (1)
Pull Request Authors
  • x-tabdeveloping (5)
Top Labels
Issue Labels
bug (2) enhancement (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 103 last month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 13
  • Total maintainers: 1
pypi.org: stormtrooper

Transformer/LLM-based zero and few-shot classification in scikit-learn pipelines

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 103 last month
Rankings
Dependent packages count: 10.0%
Average: 16.8%
Downloads: 18.6%
Dependent repos count: 21.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/static.yml actions
  • actions/checkout v3 composite
  • actions/configure-pages v2 composite
  • actions/deploy-pages v1 composite
  • actions/upload-pages-artifact v1 composite
pyproject.toml pypi
  • aiohttp ^3.8.0
  • datasets ^2.14.0
  • numpy ^1.23.0
  • openai ^0.28.0
  • python ^3.9
  • scikit-learn ^1.2.0
  • setfit ^0.7.0
  • thefuzz ^0.18.0
  • tiktoken ^0.5.0
  • torch ^2.0.0
  • tqdm ^4.60.0
  • transformers ^4.25.0
.github/workflows/tests.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite