classy-classification

This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Huggingface.

https://github.com/davidberenstein1957/classy-classification

Keywords

few-shot-classifcation hacktoberfest machine-learning natural-language-processing nlp nlu sentence-transformers spacy text-classification

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: davidberenstein1957
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 613 KB

Statistics

Stars: 219
Watchers: 6
Forks: 15
Open Issues: 0
Releases: 22

Created almost 4 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

Classy Classification

Have you ever struggled with needing a spaCy TextCategorizer but didn't have the time to train one from scratch? Classy Classification is the way to go! For few-shot classification using sentence-transformers or spaCy models, provide a dictionary with labels and examples. For zero-shot classification with Huggingface zero-shot classifiers, just provide a list of labels.

Install

pip install classy-classification

SetFit support

I got a lot of requests for SetFit support, but I decided to create a separate package for this. Feel free to check it out. ❤️

Quickstart

SpaCy embeddings

```python
import spacy
# or import standalone
# from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.load("en_core_web_trf")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "spacy"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"furniture": 0.21}, {"kitchen": 0.79}]
```
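The scores come back as label-score pairs, so a common next step is to pick the best label. The sketch below shows one way to do that. `top_label` is a hypothetical helper, not part of classy-classification, and it assumes the result is either a single dict or a list of dicts, as in the outputs printed in this README.

```python
def top_label(cats):
    """Return (label, score) for the highest score in a cats-style result.

    Hypothetical helper, not part of classy-classification. Accepts either
    one dict of label -> score or a list of such dicts.
    """
    if isinstance(cats, dict):
        cats = [cats]
    # Merge all entries into one label -> score mapping.
    merged = {}
    for entry in cats:
        merged.update(entry)
    if not merged:
        raise ValueError("no label scores to choose from")
    label = max(merged, key=merged.get)
    return label, merged[label]


print(top_label([{"furniture": 0.21}, {"kitchen": 0.79}]))
# ('kitchen', 0.79)
```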

Sentence level classification

```python
import spacy

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

# Assumption: same pipeline as in the previous example; any pipeline
# that sets sentence boundaries should work here.
nlp = spacy.load("en_core_web_trf")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "spacy",
        "include_sent": True
    }
)

doc = nlp("I am looking for kitchen appliances. And I love doing so.")
# doc.sents is a generator, so take the first sentence with next()
print(next(doc.sents)._.cats)

# Output:
#
# [{"furniture": 0.21}, {"kitchen": 0.79}]
```

Define random seed and verbosity

```python
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "verbose": True,
        "config": {"seed": 42}
    }
)
```

Multi-label classification

Sometimes multiple labels are necessary to fully describe the contents of a text. In that case, use the multi-label implementation, where the sum of label scores is not limited to 1. Just pass the same training data to multiple keys.

```python
import spacy

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa.",
                  "We have a new dinner table.",
                  "There also exist things like fridges.",
                  "I hope to be getting a new stove today.",
                  "Do you also have some ovens.",
                  "We have a new dinner table."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens.",
                "We have a new dinner table.",
                "There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens.",
                "We have a new dinner table."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "spacy",
        "multi_label": True,
    }
)

print(nlp("I am looking for furniture and kitchen equipment.")._.cats)

# Output:
#
# [{"furniture": 0.92}, {"kitchen": 0.91}]
```
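Because multi-label scores are not forced to sum to 1, a typical follow-up is to keep every label whose score clears a cut-off. The following is a minimal sketch of that idea. `labels_above` and the default threshold of 0.5 are illustrative assumptions, not part of the library.

```python
def labels_above(cats, threshold=0.5):
    """Return the labels scoring at least `threshold`, sorted by name.

    Hypothetical helper. Accepts one dict of label -> score or a list
    of such dicts.
    """
    if isinstance(cats, dict):
        cats = [cats]
    selected = []
    for entry in cats:
        for label, score in entry.items():
            if score >= threshold:
                selected.append(label)
    return sorted(selected)


print(labels_above([{"furniture": 0.92}, {"kitchen": 0.91}]))
# ['furniture', 'kitchen']
print(labels_above([{"furniture": 0.92}, {"kitchen": 0.31}]))
# ['furniture']
```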

Outlier detection

Sometimes it is worthwhile to do outlier detection or binary classification. This can be approached with a binary training dataset; however, I have also implemented support for a OneClassSVM for outlier detection using a single label. Note that this method does not return probabilities, but the data is formatted as label-score pairs to ensure uniformity.

Approach 1:

```python
import spacy

data_binary = {
    "inlier": ["This text is about chairs.",
               "Couches, benches and televisions.",
               "I really need to get a new sofa."],
    "outlier": ["Text about kitchen equipment",
                "This text is about politics",
                "Comments about AI and stuff."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data_binary,
    }
)

print(nlp("This text is a random text")._.cats)

# Output:
#
# [{'inlier': 0.2926672385488411, 'outlier': 0.707332761451159}]
```

Approach 2:

```python
import spacy

data_singular = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa.",
                  "We have a new dinner table."]
}

nlp = spacy.load("en_core_web_md")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data_singular,
    }
)

print(nlp("This text is a random text")._.cats)

# Output:
#
# [{'furniture': 0, 'not_furniture': 1}]
```
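The second approach yields a hard decision rather than a probability. The sketch below illustrates how a one-class prediction could be turned into the label-score pair shown above. It assumes the scikit-learn convention that `OneClassSVM.predict` returns `+1` for inliers and `-1` for outliers. `format_one_class` is a hypothetical helper, and how classy-classification does this internally is not described in this README.

```python
def format_one_class(label, prediction):
    """Format a one-class prediction (+1 inlier, -1 outlier) as label scores.

    Hypothetical helper, written only to illustrate the output shape.
    """
    if prediction not in (1, -1):
        raise ValueError("prediction must be +1 (inlier) or -1 (outlier)")
    is_inlier = int(prediction == 1)
    return [{label: is_inlier, f"not_{label}": 1 - is_inlier}]


print(format_one_class("furniture", -1))
# [{'furniture': 0, 'not_furniture': 1}]
```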

Sentence-transformer embeddings

```python
import spacy

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

nlp = spacy.blank("en")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "device": "gpu"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"furniture": 0.21}, {"kitchen": 0.79}]
```

Huggingface zero-shot classifiers

```python
import spacy

data = ["furniture", "kitchen"]

nlp = spacy.blank("en")
nlp.add_pipe(
    "classy_classification",
    config={
        "data": data,
        "model": "typeform/distilbert-base-uncased-mnli",
        "cat_type": "zero",
        "device": "gpu"
    }
)

print(nlp("I am looking for kitchen appliances.")._.cats)

# Output:
#
# [{"furniture": 0.21}, {"kitchen": 0.79}]
```

Credits

Inspiration Drawn From

Huggingface does offer some nice models for few/zero-shot classification, but these are not tailored to multilingual approaches. Rasa NLU has a nice approach for this, but it's too embedded in their codebase for easy usage outside of Rasa/chatbots. Additionally, it made sense to integrate sentence-transformers and Huggingface zero-shot classifiers instead of default word embeddings. Finally, I decided to integrate with spaCy, since training a custom spaCy TextCategorizer seems like a lot of hassle if you want something quick and dirty.

Standalone usage without spaCy

```python
from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}

classifier = ClassyClassifier(data=data)
classifier("I am looking for kitchen appliances.")
classifier.pipe(["I am looking for kitchen appliances."])

# overwrite training data
classifier.set_training_data(data=data)
classifier("I am looking for kitchen appliances.")

# overwrite embedding model
classifier.set_embedding_model(model="paraphrase-MiniLM-L3-v2")
classifier("I am looking for kitchen appliances.")

# overwrite SVC config
classifier.set_classification_model(
    config={
        "C": [1, 2, 5, 10, 20, 100],
        "kernel": ["linear"],
        "max_cross_validation_folds": 5
    }
)
classifier("I am looking for kitchen appliances.")
```
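The SVC config above lists several candidate values per hyperparameter. Assuming those lists are searched exhaustively as a grid (an assumption, since this README does not state the search strategy), the candidate settings can be enumerated with the standard library. `expand_grid` is a hypothetical helper. The folds setting is skipped because it is a single value, not a list of candidates.

```python
from itertools import product


def expand_grid(config):
    """Yield one dict per combination of the list-valued entries in config.

    Hypothetical helper. Scalar entries are ignored.
    """
    keys = [key for key, value in config.items() if isinstance(value, list)]
    for values in product(*(config[key] for key in keys)):
        yield dict(zip(keys, values))


config = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernel": ["linear"],
    "max_cross_validation_folds": 5,
}
candidates = list(expand_grid(config))
print(len(candidates))  # 6
print(candidates[0])    # {'C': 1, 'kernel': 'linear'}
```

Adding a second kernel to the list would double the number of candidates, which is worth keeping in mind when the training data is small.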

Save and load models

```python
import pickle

from classy_classification import ClassyClassifier

data = {
    "furniture": ["This text is about chairs.",
                  "Couches, benches and televisions.",
                  "I really need to get a new sofa."],
    "kitchen": ["There also exist things like fridges.",
                "I hope to be getting a new stove today.",
                "Do you also have some ovens."]
}
classifier = ClassyClassifier(data=data)

# save the trained classifier
with open("./classifier.pkl", "wb") as f:
    pickle.dump(classifier, f)

# load it again; the context manager closes the file afterwards
with open("./classifier.pkl", "rb") as f:
    classifier = pickle.load(f)
classifier("I am looking for kitchen appliances.")
```
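The same round trip can be wrapped in two small helpers. This is a sketch under stated assumptions: `save_model` and `load_model` are hypothetical names, and a plain dict stands in for a trained classifier so the block runs without downloading a model. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Pickle `model` to `path` and return the path (hypothetical helper)."""
    path = Path(path)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_model(path):
    """Load a pickled object from `path` (hypothetical helper)."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Stand-in for a trained classifier; any picklable object works the same way.
stand_in = {"labels": ["furniture", "kitchen"]}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_model(stand_in, Path(tmp) / "classifier.pkl")
    restored = load_model(saved)

print(restored == stand_in)  # True
```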

Owner

Name: David Berenstein
Login: davidberenstein1957
Kind: user
Location: Madrid
Company: @argilla-io

Website: https://www.linkedin.com/in/david-berenstein-1bab11105/
Repositories: 2
Profile: https://github.com/davidberenstein1957

👨🏽‍🍳 Cooking, 👨🏽‍💻 Coding, 🏆 Committing Developer Advocate @argilla-io

Citation (CITATION.cff)

cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: David
    given-names: Berenstein
title: "Classy Classification - an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Huggingface."
version: 0.6.0
date-released: 2022-12-31

GitHub Events

Total

Create event: 4
Release event: 2
Issues event: 7
Watch event: 8
Issue comment event: 13
Push event: 9
Pull request event: 3

Last Year

Create event: 4
Release event: 2
Issues event: 7
Watch event: 8
Issue comment event: 13
Push event: 9
Pull request event: 3

Committers

Last synced: 9 months ago

All Time

Total Commits: 117
Total Committers: 4
Avg Commits per committer: 29.25
Development Distribution Score (DDS): 0.342

Past Year

Commits: 23
Committers: 1
Avg Commits per committer: 23.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
david	d**n@g**m	77
David Berenstein	d**n@p**m	34
Pepijn Boers	p**b@g**m	4
Boers	p**s@z**l	2

Committer Domains (Top 20 + Academic)

zilverenkruis.nl: 1 pandoraintelligence.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 44
Total pull requests: 10
Average time to close issues: 2 months
Average time to close pull requests: about 2 months
Total issue authors: 31
Total pull request authors: 6
Average comments per issue: 3.64
Average comments per pull request: 1.9
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 3
Pull requests: 3
Average time to close issues: 23 days
Average time to close pull requests: 20 minutes
Issue authors: 2
Pull request authors: 1
Average comments per issue: 3.33
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

davidberenstein1957 (4)
swageeth (3)
andremacola (3)
kbillesk (2)
nv78 (2)
koaning (2)
KikeVen (2)
RaiAmanRai (2)
saitej123 (2)
drkonafa (1)
nsankar (1)
dpicca (1)
espdev (1)
atefalvi (1)
jackrvaughan (1)

Pull Request Authors

davidberenstein1957 (5)
RobinRojowiec (2)
adelevie (1)
PepijnBoers (1)
Masboes (1)
dependabot[bot] (1)

Top Labels

Issue Labels

enhancement (6) bug (6) documentation (1)

Pull Request Labels

dependencies (2) hacktoberfest (1)

Packages

Total packages: 1
Total downloads:
- pypi 246 last-month

Total dependent packages: 1
Total dependent repositories: 1
Total versions: 33
Total maintainers: 2

pypi.org: classy-classification

Have you ever struggled with needing a spaCy TextCategorizer but didn't have the time to train one from scratch? Classy Classification is the way to go!

Homepage: https://github.com/davidberenstein1957/classy-classification
Documentation: https://github.com/davidberenstein1957/classy-classification
License: MIT
Latest release: 1.0.2
published about 1 year ago

Versions: 33
Dependent Packages: 1
Dependent Repositories: 1
Downloads: 246 Last month

Rankings

Downloads: 4.2%

Dependent packages count: 4.8%

Stargazers count: 5.3%

Average: 9.2%

Forks count: 10.2%

Dependent repos count: 21.6%

Maintainers (2)

David.Berenstein mkirilov

Last synced: 6 months ago

classy-classification

Science Score: 54.0%


Dependencies