flyswot

Command Line Interface for running 🤗 Transformers Image Classification locally

https://github.com/davanstrien/flyswot

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.5%) to scientific vocabulary

Keywords

cli command-line-tool computer-vision glam huggingface-transformers image-classification python

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: davanstrien
License: mit
Language: Python
Default Branch: main
Homepage: https://flyswot.readthedocs.io/en/latest/
Size: 81 MB

Statistics

Stars: 19
Watchers: 2
Forks: 1
Open Issues: 38
Releases: 32

Topics

cli command-line-tool computer-vision glam huggingface-transformers image-classification python

Created almost 5 years ago · Last pushed 10 months ago

Metadata Files

Readme Contributing License Code of conduct Citation

flyswot

Disclaimer

flyswot is a work in progress. Things may not work and behaviour may change in the future!

tl;dr

flyswot is a command line tool that runs Hugging Face Transformers image classification models, available via the Hugging Face Hub, against a directory of images. It returns a CSV report containing the model's predictions.

    flyswot predict directory image_directory csv_reports --model-id flyswot/convnext-tiny-224_flyswot

Features

Currently flyswot supports:

automatic downloading of models from the Hugging Face Hub
UNIX style search patterns for matching images to predict against
filtering by image extension
a CSV output report containing the paths to the input images, the predicted label and the model's confidence for that prediction
a summary 'report' on the command line providing a high level summary of the predictions made

Why?

What is the point of this? Why not just write a Python script? This seems like a terrible idea...

flyswot was originally for a project with the Heritage Made Digital team at the British Library. In this project we wanted to detect 'fake flysheets'. We designed how flyswot works with this particular use case in mind.

There are a few main reasons why we decided a command line tool was the best approach to using the models we were developing:

The digitised images we are working with can be very large
The images we are working with are often subject to copyright
Inference speed isn't a big priority

Since we're using computer vision to assist people rather than to automate their work, we felt a CLI was a useful interface for interacting with the models.

Installation

You can install flyswot via pip from PyPI:

    $ pip install flyswot

This will install the latest release version of flyswot.

Detailed Installation Guide

The Installation page of the documentation provides a more detailed guide to installing flyswot. It is aimed at users of flyswot who may be less familiar with Python.

Usage

You can see help for flyswot using flyswot --help

```
Usage: flyswot [OPTIONS] COMMAND [ARGS]...

Commands
  model    flyswot commands for interacting with models
  predict  flyswot commands for making predictions
```

Making predictions

You can get help for the prediction functionality for flyswot as follows:

```
Usage: flyswot predict directory [OPTIONS] DIRECTORY CSV_SAVE_DIR

Predicts against all images stored under DIRECTORY which match PATTERN in the
filename. By default searches for filenames containing 'fs'. Creates a CSV
report saved to csv_save_dir

Arguments
  * directory     PATH  Directory to start searching for images from [default: None] [required]
  * csv_save_dir  PATH  Directory used to store the csv report [default: None] [required]

Options
  --model-id       TEXT     The model flyswot should use for making predictions [default: flyswot/convnext-tiny-224_flyswot]
  --pattern        TEXT     Pattern used to filter image filenames [default: None]
  --bs             INTEGER  Batch Size [default: 16]
  --image-formats  TEXT     Image format(s) to check [default: .tif]
  --help                    Show this message and exit.
```

To run predictions against a directory of images:

    $ flyswot predict directory manuscripts_folder .

flyswot will search inside manuscripts_folder for image files.
By default it looks for files that contain fs in the filename, since these are files which have been labelled as "end flysheets" or "front flysheets".
Once it has found all the files labelled as flysheets, it runs a computer vision model against these images to check whether they are labelled correctly, i.e. whether each image is indeed a flysheet or something else.
flyswot will save a CSV report containing the path to each image, the directory the image is stored in, the predicted label, and the confidence for that prediction.

Changing the model

You can also tell flyswot to use a different image classification model via the --model-id option. For example, to use the microsoft/dit-base-finetuned-rvlcdip model we could run:

```console
flyswot predict directory Documents/DS/hugit-cli/fs Desktop/ --model-id microsoft/dit-base-finetuned-rvlcdip
```

This will download the latest available version of this model from the Hugging Face Hub and predict against the matching image files. Note that under the hood flyswot uses the Hugging Face transformers pipelines for inference. The model you specify must therefore be compatible with the image classification pipeline.
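As a rough illustration of the pipeline-based approach described above, the following sketch classifies a list of images and writes one CSV row per image. It is not flyswot's actual implementation: the function names `load_classifier` and `write_report` and the CSV column names are assumptions made for this example. The only Hugging Face call used is `transformers.pipeline("image-classification", model=...)`, which is imported lazily so the rest of the sketch can be run with a stand-in classifier.

```python
import csv
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List

# A classifier maps an image path to a list of {"label": ..., "score": ...}
# dicts, the shape returned by a transformers image-classification pipeline.
Classifier = Callable[[str], List[Dict[str, Any]]]


def load_classifier(model_id: str) -> Classifier:
    """Build an image-classification pipeline (needs transformers installed)."""
    # Imported lazily so the sketch can be loaded without transformers.
    from transformers import pipeline

    return pipeline("image-classification", model=model_id)


def write_report(images: Iterable[Path], classifier: Classifier, csv_path: Path) -> int:
    """Classify each image, write one CSV row per image, return the row count."""
    rows = 0
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        # Hypothetical column names, not necessarily flyswot's real header.
        writer.writerow(["path", "directory", "predicted_label", "confidence"])
        for image in images:
            predictions = classifier(str(image))
            # Keep only the highest-scoring label for the report.
            best = max(predictions, key=lambda item: item["score"])
            writer.writerow([str(image), str(image.parent), best["label"], best["score"]])
            rows += 1
    return rows
```

With transformers installed, `write_report(paths, load_classifier("microsoft/dit-base-finetuned-rvlcdip"), Path("report.csv"))` would download the model on first use. Passing the classifier in as an argument keeps the reporting logic testable without a model.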

Detailed Usage Guide

This section provides additional guidance on the usage of flyswot. It is primarily aimed at Heritage Made Digital (HMD) users of flyswot.

How flyswot searches for images

flyswot is currently intended to identify images which have an incorrect label associated with them. In particular it is currently intended to identify "fake" flysheets. These images have fs as part of their filename so we can tell flyswot to use this pattern in the filename to identify images which should be checked using the computer vision model. This can be changed if you also want to match other filename patterns.

Since these images will often sit inside a nested directory structure, flyswot looks in sub-folders of the input folder for images which contain fs in the name. For example, in the following folder structure:

    Collection/
        item1/
            add_ms_9403_fbspi.tif
            add_ms_9403_fse001r.tif
            add_ms_9403_fse001v.tif
        item2/
            sloane_ms_116_fblefr.tif
            sloane_ms_116_fbspi.tif
            sloane_ms_116_fse004r.tif

All of the files which have fs in the filename will be checked, but files which don't contain fs, such as add_ms_9403_fbspi.tif, will be ignored since these aren't labelled as flysheets.
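The search behaviour described above can be sketched with the Python standard library. This is a minimal illustration rather than flyswot's own code: `find_images` is a hypothetical helper, and treating the pattern as a case-sensitive substring wrapped in `*` wildcards is an assumption.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Iterable, List


def find_images(
    root: Path,
    pattern: str = "fs",
    image_formats: Iterable[str] = (".tif",),
) -> List[Path]:
    """Return files below root whose name contains pattern and whose
    extension is one of image_formats."""
    suffixes = {fmt.lower() for fmt in image_formats}
    matches = []
    # rglob("*") walks every sub-folder below root.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in suffixes:
            continue
        # UNIX-style match against the filename only, not the full path.
        if fnmatchcase(path.name, f"*{pattern}*"):
            matches.append(path)
    return matches
```

Run against the example Collection/ folder, this returns the three files with fse in their names and skips the fbspi and fblefr files.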

Running flyswot against a directory of images

To run flyswot against a directory of images you need to give it the path to that directory/folder. There are different ways you could do this. The following is suggested for people who are not (yet) very familiar with terminal interfaces.

Identify the folder you want flyswot to check for "fake" flysheets. If you are using flyswot for the first time, it may make sense to choose a folder which doesn't contain a huge number of collection items so you don't have to wait too long for flyswot to finish running. Once you have found a directory you want to predict against, copy its path. This should be the full path to the folder.

For example something that looks like:

    \\ad\collections\hmd\excitingcollection\excitingsubcollection\

This will be the folder from which flyswot starts looking.

When you activated your conda environment in a terminal, you were likely 'inside' your user directory. Since we need to specify a place for flyswot to store the CSV report, we'll move to a better place to store that output: your Desktop folder. To do this we can navigate using the command:

    $ chdir desktop

If you are using Mac, Linux or have GitBash installed you should instead run:

    $ cd Desktop

This will take you to your Desktop. We'll now run flyswot. As with many other command line tools, flyswot has commands and sub-commands. We are interested in the predict command. This includes two sub-commands: predict-image and directory. We will mostly want to predict directories. To do this we use the following approach. Since we only care about checking things with fs in the filename we can specify this as our pattern.

    $ flyswot predict directory input_directory output_directory --pattern fs

The input directory is the folder containing our images and the output directory is where we want to save our CSV report. Using the folder we previously identified this would look like:

    $ flyswot predict directory "\\ad\collections\hmd\excitingcollection\excitingsubcollection\" .

We can use . to indicate we want the CSV report to be saved to the current directory (in this case the Desktop directory). Also notice that there are quotation marks "" around the path. These make sure that a path containing spaces is passed to flyswot as a single argument.
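To see why the quotation marks matter, the short sketch below uses Python's `shlex.split`, which follows POSIX shell quoting rules, to show how a command line is broken into arguments. The folder name is made up for the example, and the Windows Command Prompt applies its own quoting rules, so treat this only as an illustration of the general idea.

```python
import shlex

# Hypothetical folder name containing spaces.
quoted = 'flyswot predict directory "my collection/item 1" .'
unquoted = "flyswot predict directory my collection/item 1 ."

# With quotes, the folder stays together as one argument.
quoted_args = shlex.split(quoted)
# Without quotes, every space starts a new argument.
unquoted_args = shlex.split(unquoted)

print(quoted_args)
print(len(quoted_args), len(unquoted_args))
```

In the quoted form the folder arrives as the single argument `my collection/item 1`. In the unquoted form it is split into three separate arguments, which flyswot would not be able to interpret as one directory.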

Once you run this command you should see some progress reported by flyswot, including a progress bar that shows how many of the images flyswot has predicted against.

When flyswot has finished you will have a CSV 'report' which contains the path to the image, the predicted label and the confidence for that prediction.
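Once the report exists, it can be inspected with a few lines of Python. The sketch below counts predictions per label and lists rows whose confidence falls below a threshold. The column names (`path`, `predicted_label`, `confidence`), the labels and the sample rows are assumptions made for illustration, so check the header of your own report and adjust them accordingly.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO


def count_labels(report: TextIO, label_column: str = "predicted_label") -> Dict[str, int]:
    """Count how many rows were assigned each label."""
    return dict(Counter(row[label_column] for row in csv.DictReader(report)))


def low_confidence(report: TextIO, threshold: float = 0.8) -> List[str]:
    """Return the image paths whose confidence is below threshold."""
    return [
        row["path"]
        for row in csv.DictReader(report)
        if float(row["confidence"]) < threshold
    ]


# Made-up sample data standing in for a real report file.
SAMPLE = (
    "path,directory,predicted_label,confidence\n"
    "item1/add_ms_9403_fse001r.tif,item1,flysheet,0.98\n"
    "item1/add_ms_9403_fse001v.tif,item1,page,0.61\n"
    "item2/sloane_ms_116_fse004r.tif,item2,flysheet,0.93\n"
)

print(count_labels(io.StringIO(SAMPLE)))
print(low_confidence(io.StringIO(SAMPLE)))
```

For a real report, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open("report.csv", newline="")`.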

License

Distributed under the terms of the MIT license, flyswot is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.

Owner

Name: Daniel van Strien
Login: davanstrien
Kind: user
Location: United Kingdom
Company: Hugging Face

Website: https://danielvanstrien.xyz/
Twitter: vanstriendaniel
Repositories: 172
Profile: https://github.com/davanstrien

Machine Learning Librarian @huggingface

GitHub Events

Total

Watch event: 1
Delete event: 26
Issue comment event: 58
Pull request review event: 27
Pull request event: 59
Create event: 36

Last Year

Watch event: 1
Delete event: 26
Issue comment event: 58
Pull request review event: 27
Pull request event: 59
Create event: 36

Committers

Last synced: almost 3 years ago

All Time

Total Commits: 787
Total Committers: 4
Avg Commits per committer: 196.75
Development Distribution Score (DDS): 0.281

Top Committers

Name	Email	Commits
dependabot[bot]	4**]@u**m	566
Daniel van Strien	d**n@u**m	171
davanstrien	d**n@g**m	49
sourcery-ai[bot]	5**]@u**m	1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 335
Average time to close issues: N/A
Average time to close pull requests: 17 days
Total issue authors: 0
Total pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.65
Merged pull requests: 171
Bot issues: 0
Bot pull requests: 322

Past Year

Issues: 0
Pull requests: 34
Average time to close issues: N/A
Average time to close pull requests: 21 days
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 1.06
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 34

Top Authors

Issue Authors

Pull Request Authors

dependabot[bot] (329)
davanstrien (9)

Top Labels

Issue Labels

Pull Request Labels

dependencies (331) python (303) github_actions (25) ci (2)

Packages

Total packages: 1
Total downloads:
- pypi 54 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 32
Total maintainers: 1

pypi.org: flyswot

flyswot

Homepage: https://github.com/davanstrien/flyswot
Documentation: https://flyswot.readthedocs.io
License: MIT
Latest release: 0.3.15
published over 2 years ago

Versions: 32
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 54 Last month

Rankings

Dependent packages count: 7.4%

Downloads: 13.5%

Stargazers count: 14.2%

Average: 16.0%

Dependent repos count: 22.2%

Forks count: 22.8%

Maintainers (1)

davanstrien

Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi

furo ==2022.6.21
myst_parser ==0.18.0
sphinx ==5.1.1
sphinx-click ==4.3.0

poetry.lock pypi

118 dependencies

pyproject.toml pypi

Pygments ^2.12.0 develop
black ^22.6 develop
cogapp ^3.3.0 develop
coverage ^6.4 develop
darglint ^1.8.1 develop
flake8 ^4.0.1 develop
flake8-bandit ^3.0.0 develop
flake8-bugbear ^22.7.1 develop
flake8-docstrings ^1.5.0 develop
flake8-rst-docstrings ^0.2.7 develop
furo ^2022.6.21 develop
hypothesis ^6.53.0 develop
memory-profiler ^0.60.0 develop
mypy ^0.971 develop
myst-parser ^0.18.0 develop
onnxruntime ^1.12.0 develop
pep8-naming ^0.13.1 develop
pre-commit ^2.20.0 develop
pre-commit-hooks ^4.3.0 develop
pytest ^7.1.2 develop
pytest-datafiles ^2.0 develop
reorder-python-imports ^3.8.2 develop
safety ^2.1.1 develop
sphinx ^5.1.1 develop
sphinx-autobuild ^2021.3.14 develop
sphinx-click ^4.3.0 develop
sphinx-rtd-theme ^1.0.0 develop
xdoctest ^1.0.1 develop
Pillow >=8,<10
huggingface-hub >=0.2.1,<0.9.0
numpy ^1.20
python ^3.8.0
rich >=10.1,<13.0
toolz >=0.11.1,<0.13.0
transformers ^4.16.2
typer >=0.3.2,<0.7.0
typing_extensions >=3.10,<5.0

.github/workflows/labeler.yml actions

actions/checkout v3 composite
crazy-max/ghaction-github-labeler v4.1.0 composite

.github/workflows/release.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite
pypa/gh-action-pypi-publish v1.6.4 composite
release-drafter/release-drafter v5.22.0 composite
salsify/action-detect-and-tag-new-version v2.0.3 composite

.github/workflows/tests.yml actions

actions/cache v3 composite
actions/checkout v3 composite
actions/download-artifact v3 composite
actions/setup-python v4 composite
actions/upload-artifact v3 composite
codecov/codecov-action v3.1.1 composite
