classy-classification
This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Huggingface.
https://github.com/davidberenstein1957/classy-classification
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.3%) to scientific vocabulary
Statistics
- Stars: 219
- Watchers: 6
- Forks: 15
- Open Issues: 0
- Releases: 22
Metadata Files
README.md
Classy Classification
Have you ever struggled with needing a spaCy TextCategorizer but didn't have the time to train one from scratch? Classy Classification is the way to go! For few-shot classification using sentence-transformers or spaCy models, provide a dictionary with labels and examples, or just provide a list of labels for zero-shot classification with Huggingface zero-shot classifiers.
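The two input shapes described above can be illustrated in plain Python. This is a minimal sketch: `flatten_training_data` is a hypothetical helper, not part of classy-classification, and only shows how a few-shot dictionary corresponds to one (text, label) training pair per example, while zero-shot input is just the list of candidate labels.

```python
# Hypothetical helper (not part of classy-classification): flattens the
# few-shot `data` dictionary into parallel lists a classifier could fit on.
def flatten_training_data(data):
    texts, labels = [], []
    for label, examples in data.items():
        for example in examples:
            texts.append(example)
            labels.append(label)
    return texts, labels


# Few-shot format: each label maps to a handful of example sentences.
few_shot_data = {
    "furniture": ["This text is about chairs.", "Couches, benches and televisions."],
    "kitchen": ["There also exist things like fridges.", "Do you also have some ovens."],
}

# Zero-shot format: only the candidate labels are given.
zero_shot_data = ["furniture", "kitchen"]

texts, labels = flatten_training_data(few_shot_data)
print(labels)  # ['furniture', 'furniture', 'kitchen', 'kitchen']
```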
Install
pip install classy-classification
SetFit support
I got a lot of requests for SetFit support, but I decided to create a separate package for this. Feel free to check it out. ❤️
Quickstart
spaCy embeddings
```python
import spacy
# importing the package registers the "classy_classification" spaCy factory
import classy_classification  # noqa: F401
# or import standalone:
# from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.", "Couches, benches and televisions.", "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.", "I hope to be getting a new stove today.", "Do you also have some ovens."],
}

nlp = spacy.load("en_core_web_trf")
nlp.add_pipe("classy_classification", config={"data": data, "model": "spacy"})

print(nlp("I am looking for kitchen appliances.")._.cats)
# Output:
# [{"furniture" : 0.21}, {"kitchen": 0.79}]
```
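Sentence-level classification
```python
import spacy
# importing the package registers the "classy_classification" spaCy factory
import classy_classification  # noqa: F401

data = {
    "furniture": ["This text is about chairs.", "Couches, benches and televisions.", "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.", "I hope to be getting a new stove today.", "Do you also have some ovens."],
}

# assumption: any pipeline with a sentence segmenter works; en_core_web_md is used here
nlp = spacy.load("en_core_web_md")
nlp.add_pipe("classy_classification", config={"data": data, "model": "spacy", "include_sent": True})

doc = nlp("I am looking for kitchen appliances. And I love doing so.")
# doc.sents is a generator, so convert it to a list before indexing
print(list(doc.sents)[0]._.cats)
# Output:
# [{"furniture" : 0.21}, {"kitchen": 0.79}]
```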
Define random seed and verbosity
```python
nlp.add_pipe(
    "classy_classification",
    config={"data": data, "verbose": True, "config": {"seed": 42}},
)
```
Multi-label classification
Sometimes multiple labels are necessary to fully describe the contents of a text. In that case, use the multi-label implementation, in which the label scores are not constrained to sum to 1. Just pass the same training examples under every key they apply to.
```python
import spacy
# importing the package registers the "classy_classification" spaCy factory
import classy_classification  # noqa: F401

data = {
    "furniture": ["This text is about chairs.", "Couches, benches and televisions.", "I really need to get a new sofa.", "We have a new dinner table.", "There also exist things like fridges.", "I hope to be getting a new stove today.", "Do you also have some ovens.", "We have a new dinner table."],
    "kitchen": ["There also exist things like fridges.", "I hope to be getting a new stove today.", "Do you also have some ovens.", "We have a new dinner table.", "There also exist things like fridges.", "I hope to be getting a new stove today.", "Do you also have some ovens.", "We have a new dinner table."],
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe("classy_classification", config={"data": data, "model": "spacy", "multi_label": True})

print(nlp("I am looking for furniture and kitchen equipment.")._.cats)
# Output:
# [{"furniture": 0.92}, {"kitchen": 0.91}]
```
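Why multi-label scores need not sum to 1 can be sketched by comparing a softmax over per-label logits (single-label) with an independent sigmoid per label (multi-label). The logits below are made-up numbers and the functions are illustrative; this shows the general idea, not how classy-classification computes its scores internally.

```python
import math

# Made-up logits for one text; real scores would come from a trained model.
logits = {"furniture": 2.5, "kitchen": 2.3}


def softmax(scores):
    """Single-label style: labels compete and the scores always sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def independent_sigmoid(scores):
    """Multi-label style: each label gets its own probability in (0, 1)."""
    return {label: 1 / (1 + math.exp(-value)) for label, value in scores.items()}


single = softmax(logits)
multi = independent_sigmoid(logits)
print(round(sum(single.values()), 6))  # 1.0
print(round(sum(multi.values()), 2))   # 1.83
```

With independent scores, a text about both furniture and kitchen equipment can score high on both labels at once, which a softmax would not allow.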
Outlier detection
Sometimes it is useful to do outlier detection or binary classification. This can be approached with a binary training dataset; however, I have also implemented support for a OneClassSVM for outlier detection using a single label. Note that this method does not return probabilities; its output is still formatted as label-score pairs to ensure uniformity.
Approach 1:
```python
import spacy
# importing the package registers the "classy_classification" spaCy factory
import classy_classification  # noqa: F401

data_binary = {
    "inlier": ["This text is about chairs.", "Couches, benches and televisions.", "I really need to get a new sofa."],
    "outlier": ["Text about kitchen equipment", "This text is about politics", "Comments about AI and stuff."],
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe("classy_classification", config={"data": data_binary})

print(nlp("This text is a random text")._.cats)
# Output:
# [{'inlier': 0.2926672385488411, 'outlier': 0.707332761451159}]
```
Approach 2:
```python
import spacy
# importing the package registers the "classy_classification" spaCy factory
import classy_classification  # noqa: F401

data_singular = {
    "furniture": ["This text is about chairs.", "Couches, benches and televisions.", "I really need to get a new sofa.", "We have a new dinner table."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe("classy_classification", config={"data": data_singular})

print(nlp("This text is a random text")._.cats)
# Output:
# [{'furniture': 0, 'not_furniture': 1}]
```
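The single-label outlier approach can be sketched with scikit-learn's `OneClassSVM` directly. Under the assumptions here, toy 2-D vectors stand in for sentence embeddings, `fit_single_label` and `to_label_scores` are hypothetical helpers rather than the library's API, and the 0/1 mapping only mirrors the output format shown in Approach 2.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Toy 2-D vectors stand in for sentence embeddings of "furniture" examples.
furniture_vectors = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])


def fit_single_label(vectors, nu=0.1, gamma=1.0):
    """Fit a one-class model on examples of a single label (hypothetical helper)."""
    return OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(vectors)


def to_label_scores(model, label, vectors):
    """Map +1/-1 predictions to label-score pairs; these are not probabilities."""
    results = []
    for prediction in model.predict(vectors):
        inlier = int(prediction == 1)  # OneClassSVM predicts +1 for inliers, -1 for outliers
        results.append({label: inlier, f"not_{label}": 1 - inlier})
    return results


model = fit_single_label(furniture_vectors)
print(to_label_scores(model, "furniture", np.array([[5.0, 5.0]])))
# [{'furniture': 0, 'not_furniture': 1}]
```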
Sentence-transformer embeddings
```python
import spacy
# importing the package registers the "classy_classification" spaCy factory
import classy_classification  # noqa: F401

data = {
    "furniture": ["This text is about chairs.", "Couches, benches and televisions.", "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.", "I hope to be getting a new stove today.", "Do you also have some ovens."],
}

nlp = spacy.blank("en")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "device": "gpu",
    },
)

print(nlp("I am looking for kitchen appliances.")._.cats)
# Output:
# [{"furniture": 0.21}, {"kitchen": 0.79}]
```
Huggingface zero-shot classifiers
```python
import spacy
# importing the package registers the "classy_classification" spaCy factory
import classy_classification  # noqa: F401

data = ["furniture", "kitchen"]

nlp = spacy.blank("en")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "typeform/distilbert-base-uncased-mnli",
        "cat_type": "zero",
        "device": "gpu",
    },
)

print(nlp("I am looking for kitchen appliances.")._.cats)
# Output:
# [{"furniture": 0.21}, {"kitchen": 0.79}]
```
Credits
Inspiration Drawn From
Huggingface does offer some nice models for few/zero-shot classification, but these are not tailored to multilingual approaches. Rasa NLU has a nice approach for this, but it's too embedded in their codebase for easy usage outside of Rasa/chatbots. Additionally, it made sense to integrate sentence-transformers and Huggingface zero-shot instead of default word embeddings. Finally, I decided to integrate with spaCy, since training a custom spaCy TextCategorizer seems like a lot of hassle if you want something quick and dirty.
Standalone usage without spaCy
```python
from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.", "Couches, benches and televisions.", "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.", "I hope to be getting a new stove today.", "Do you also have some ovens."],
}

classifier = ClassyClassifier(data=data)
classifier("I am looking for kitchen appliances.")
classifier.pipe(["I am looking for kitchen appliances."])

# overwrite training data
classifier.set_training_data(data=data)
classifier("I am looking for kitchen appliances.")

# overwrite embedding model
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")
classifier("I am looking for kitchen appliances.")

# overwrite SVC config
classifier.set_classification_model(
    config={
        "C": [1, 2, 5, 10, 20, 100],
        "kernel": ["linear"],
        "max_cross_validation_folds": 5,
    }
)
classifier("I am looking for kitchen appliances.")
```
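The SVC config above reads like a hyperparameter grid. The following sketch assumes it drives a cross-validated grid search over scikit-learn's `SVC`, with `max_cross_validation_folds` acting as an upper bound on the number of folds. That interpretation, the toy vectors and the fold-capping rule are assumptions, not the library's documented behavior.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumed reading of the config: a grid over SVC hyperparameters, with the
# number of cross-validation folds capped at max_cross_validation_folds.
config = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernel": ["linear"],
    "max_cross_validation_folds": 5,
}

# Toy 2-D vectors stand in for sentence embeddings (3 examples per label).
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y = ["furniture"] * 3 + ["kitchen"] * 3

# StratifiedKFold fails when every class has fewer examples than folds,
# so cap the fold count at the size of the smallest class.
smallest_class = min(y.count(label) for label in set(y))
folds = min(config["max_cross_validation_folds"], smallest_class)

param_grid = {"C": config["C"], "kernel": config["kernel"]}
search = GridSearchCV(SVC(), param_grid=param_grid, cv=folds)
search.fit(X, y)

print(folds)  # 3
print(search.predict([[0.1, 0.1], [5.1, 5.1]]))
```

Capping the folds matters in the few-shot setting: with three examples per label, scikit-learn's stratified five-fold splitting would fail.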
Save and load models
```python
import pickle

from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.", "Couches, benches and televisions.", "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.", "I hope to be getting a new stove today.", "Do you also have some ovens."],
}
classifier = ClassyClassifier(data=data)

# save the fitted classifier
with open("./classifier.pkl", "wb") as f:
    pickle.dump(classifier, f)

# load it again; the context manager closes the file afterwards
with open("./classifier.pkl", "rb") as f:
    classifier = pickle.load(f)

classifier("I am looking for kitchen appliances.")
```
Owner
- Name: David Berenstein
- Login: davidberenstein1957
- Kind: user
- Location: Madrid
- Company: @argilla-io
- Website: https://www.linkedin.com/in/david-berenstein-1bab11105/
- Repositories: 2
- Profile: https://github.com/davidberenstein1957
👨🏽‍🍳 Cooking, 👨🏽‍💻 Coding, 🏆 Committing Developer Advocate @argilla-io
Citation (CITATION.cff)
cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: David
    given-names: Berenstein
title: "Classy Classification - an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Huggingface."
version: 0.6.0
date-released: 2022-12-31
GitHub Events
Total
- Create event: 4
- Release event: 2
- Issues event: 7
- Watch event: 8
- Issue comment event: 13
- Push event: 9
- Pull request event: 3
Last Year
- Create event: 4
- Release event: 2
- Issues event: 7
- Watch event: 8
- Issue comment event: 13
- Push event: 9
- Pull request event: 3
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| david | d****n@g****m | 77 |
| David Berenstein | d****n@p****m | 34 |
| Pepijn Boers | p****b@g****m | 4 |
| Boers | p****s@z****l | 2 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 44
- Total pull requests: 10
- Average time to close issues: 2 months
- Average time to close pull requests: about 2 months
- Total issue authors: 31
- Total pull request authors: 6
- Average comments per issue: 3.64
- Average comments per pull request: 1.9
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 3
- Pull requests: 3
- Average time to close issues: 23 days
- Average time to close pull requests: 20 minutes
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 3.33
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- davidberenstein1957 (4)
- swageeth (3)
- andremacola (3)
- kbillesk (2)
- nv78 (2)
- koaning (2)
- KikeVen (2)
- RaiAmanRai (2)
- saitej123 (2)
- drkonafa (1)
- nsankar (1)
- dpicca (1)
- espdev (1)
- atefalvi (1)
- jackrvaughan (1)
Pull Request Authors
- davidberenstein1957 (5)
- RobinRojowiec (2)
- adelevie (1)
- PepijnBoers (1)
- Masboes (1)
- dependabot[bot] (1)
Packages
- Total packages: 1
- Total downloads (pypi): 246 last month
- Total dependent packages: 1
- Total dependent repositories: 1
- Total versions: 33
- Total maintainers: 2
pypi.org: classy-classification
Have you ever struggled with needing a spaCy TextCategorizer but didn't have the time to train one from scratch? Classy Classification is the way to go!
- Homepage: https://github.com/davidberenstein1957/classy-classification
- Documentation: https://github.com/davidberenstein1957/classy-classification
- License: MIT
- Latest release: 1.0.2 (published about 1 year ago)
Dependencies
- 105 dependencies
- black ^22.3.0 develop
- flake8 ^4.0.1 develop
- flake8-bugbear ^22.3.23 develop
- flake8-docstrings ^1.6.0 develop
- isort ^5.10.1 develop
- pep8-naming ^0.12.1 develop
- pre-commit ^2.17.0 develop
- pytest ^7.0.1 develop
- python ^3.7
- scikit-learn ^1.0
- sentence-transformers ^2.0
- spacy ^3.0
- txtai ^4.5.0
- actions/checkout v3 composite
- actions/setup-python v3 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
