stormtrooper
Zero/few shot learning components for scikit-learn pipelines with LLMs and transformers.
https://github.com/centre-for-humanities-computing/stormtrooper
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.0%) to scientific vocabulary
Repository
Zero/few shot learning components for scikit-learn pipelines with LLMs and transformers.
Basic Info
- Host: GitHub
- Owner: centre-for-humanities-computing
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://centre-for-humanities-computing.github.io/stormtrooper/
- Size: 1.37 MB
Statistics
- Stars: 18
- Watchers: 1
- Forks: 2
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
stormtrooper
Zero/few shot learning components for scikit-learn pipelines with large language models and transformers.
New in 1.0.0
Trooper
The brand new Trooper interface means you no longer have to specify which model type you wish to use:
Stormtrooper automatically detects the model type from the specified name.
```python
from stormtrooper import Trooper

# This loads a SetFit model
model = Trooper("all-MiniLM-L6-v2")

# This loads an OpenAI model
model = Trooper("gpt-4")

# This loads a Text2Text model
model = Trooper("google/flan-t5-base")
```
Unified zero and few-shot classification
You no longer have to specify whether a model should be a few-shot or a zero-shot classifier when initialising it. If you do not pass any training examples, the model is automatically assumed to be zero-shot.
```python
# This is a zero-shot model
model.fit(None, ["dog", "cat"])

# This is a few-shot model
model.fit(["he was a good boy", "just lay down on my laptop"], ["dog", "cat"])
```
Model types
You can use all sorts of transformer models for few and zero-shot classification in Stormtrooper.
- Instruction fine-tuned generative models, e.g. `Trooper("HuggingFaceH4/zephyr-7b-beta")`
- Encoder models with SetFit, e.g. `Trooper("all-MiniLM-L6-v2")`
- Text2Text models, e.g. `Trooper("google/flan-t5-base")`
- OpenAI models, e.g. `Trooper("gpt-4")`
- NLI models, e.g. `Trooper("facebook/bart-large-mnli")`
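All of these share the same interface. As a minimal sketch (assuming the `fit`/`predict` API shown in the usage example below), an NLI model can be used for zero-shot classification like this:

```python
from stormtrooper import Trooper

# NLI models score each candidate label against the input text
model = Trooper("facebook/bart-large-mnli")
model.fit(None, ["positive", "negative"])  # zero-shot: no training examples
model.predict(["This package is really handy!"])
```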
Example usage
Find more in our docs.
```bash
pip install stormtrooper
```
```python
from stormtrooper import Trooper

class_labels = ["atheism/christianity", "astronomy/space"]
example_texts = [
    "God came down to earth to save us.",
    "A new nebula was recently discovered in the proximity of the Oort cloud.",
]
new_texts = ["God bless the railway workers", "The frigate is ready to launch from the spaceport"]

# Zero-shot classification
model = Trooper("google/flan-t5-base")
model.fit(None, class_labels)
model.predict(new_texts)
# ["atheism/christianity", "astronomy/space"]

# Few-shot classification
model = Trooper("google/flan-t5-base")
model.fit(example_texts, class_labels)
model.predict(new_texts)
# ["atheism/christianity", "astronomy/space"]
```
Fuzzy Matching
By default, generative and text2text models fuzzy match their output to the closest class label; you can disable this behaviour
by specifying `fuzzy_match=False`.
If you want a fuzzy-matching speedup, install python-Levenshtein.
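A minimal sketch of turning this off (assuming `fuzzy_match` is accepted as a keyword argument when initialising the model, as the note above suggests):

```python
from stormtrooper import Trooper

# fuzzy_match=False is assumed to be a Trooper keyword argument here;
# with it disabled, predictions are the model's raw generated labels
# rather than the closest matching class label.
model = Trooper("google/flan-t5-base", fuzzy_match=False)
```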
Inference on GPU
From version 0.2.2 you can run models on a GPU by specifying the device when initializing a model:

```python
classifier = Trooper("all-MiniLM-L6-v2", device="cuda:0")
```
Inference on multiple GPUs
You can spread a model over multiple devices, in order of device priority (GPU -> CPU + RAM -> disk), by using the `device_map` argument.
Note that this only works with text2text and generative models.

```python
model = Trooper("HuggingFaceH4/zephyr-7b-beta", device_map="auto")
```
Owner
- Name: Center for Humanities Computing Aarhus
- Login: centre-for-humanities-computing
- Kind: organization
- Email: chcaa@cas.au.dk
- Location: Aarhus, Denmark
- Website: https://chc.au.dk/
- Repositories: 130
- Profile: https://github.com/centre-for-humanities-computing
Citation (citation.cff)
```yaml
cff-version: 0.1.0
message: "When using this package please cite us."
authors:
  - family-names: "Kardos"
    given-names: "Márton"
    orcid: "https://orcid.org/0000-0001-9652-4498"
title: "stormtrooper: scikit-learn compatible zero and few shot learning in Python"
version: 0.3.0
date-released: 2023-08-18
url: "https://github.com/centre-for-humanities-computing/stormtrooper"
```
GitHub Events
Total
- Issues event: 3
- Watch event: 8
- Issue comment event: 2
- Push event: 1
- Pull request review event: 1
- Pull request event: 2
- Create event: 1
Last Year
- Issues event: 3
- Watch event: 8
- Issue comment event: 2
- Push event: 1
- Pull request review event: 1
- Pull request event: 2
- Create event: 1
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Márton Kardos | p****3@g****m | 107 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 3
- Average time to close issues: 17 days
- Average time to close pull requests: 6 days
- Total issue authors: 4
- Total pull request authors: 1
- Average comments per issue: 1.5
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 3
- Average time to close issues: 17 days
- Average time to close pull requests: 6 days
- Issue authors: 4
- Pull request authors: 1
- Average comments per issue: 1.5
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- MinaAlmasi (1)
- miscodisco (1)
- stijn-uva (1)
- sarangs-ntnu (1)
Pull Request Authors
- x-tabdeveloping (5)
Packages
- Total packages: 1
- Total downloads: 103 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 13
- Total maintainers: 1
pypi.org: stormtrooper
Transformer/LLM-based zero and few-shot classification in scikit-learn pipelines
- Documentation: https://stormtrooper.readthedocs.io/
- License: MIT
- Latest release: 1.0.1 (published about 1 year ago)
Maintainers (1)
Dependencies
- actions/checkout v3 composite
- actions/configure-pages v2 composite
- actions/deploy-pages v1 composite
- actions/upload-pages-artifact v1 composite
- aiohttp ^3.8.0
- datasets ^2.14.0
- numpy ^1.23.0
- openai ^0.28.0
- python ^3.9
- scikit-learn ^1.2.0
- setfit ^0.7.0
- thefuzz ^0.18.0
- tiktoken ^0.5.0
- torch ^2.0.0
- tqdm ^4.60.0
- transformers ^4.25.0
- actions/checkout v4 composite
- actions/setup-python v4 composite