Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: rohankumardubey
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Size: 32.2 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 4 years ago · Last pushed over 4 years ago
Metadata Files
Readme License Citation

README.md

undouble


undouble is a Python package for detecting (near-)identical images.

It works in multiple steps: pre-processing the images (grayscaling, normalizing, and scaling), computing an image hash for each image, and grouping the images based on their hashes. A threshold of 0 groups only images with an identical image hash. The results can be explored with the plotting functionality, and images can be moved with the move functionality. When moving images, the image with the largest resolution in each group will be copied, and all other images are moved to the "undouble" subdirectory. If you want to cluster your images instead, I recommend reading the blog and using the clustimage library.
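To make the hashing idea concrete, here is a minimal sketch of an average hash together with the Hamming distance that a threshold is compared against. It is not the code undouble uses: it swaps in the simpler average hash instead of phash, assumes the image is already grayscale and given as nested lists, skips the normalizing step, and the names `downscale`, `average_hash`, and `hamming_distance` are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def downscale(image: Image, size: int) -> Image:
    """Shrink an image to size x size by averaging pixel blocks.

    Assumes the image is at least size pixels wide and high.
    """
    height, width = len(image), len(image[0])
    small = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [image[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(image: Image, hash_size: int = 8) -> List[int]:
    """Return hash_size**2 bits: 1 where a downscaled pixel exceeds the mean."""
    pixels = [p for row in downscale(image, hash_size) for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming_distance(hash_a: List[int], hash_b: List[int]) -> int:
    """Count the bit positions in which two hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# A 16x16 horizontal gradient, a uniformly brighter copy, and an inverted copy.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[p + 10 for p in row] for row in gradient]
inverted = [[255 - p for p in row] for row in gradient]

print(hamming_distance(average_hash(gradient), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(gradient), average_hash(inverted)))  # 64
```

The brighter copy gets the same hash as the original, so it would be grouped at a threshold of 0, while the inverted copy differs in all 64 bits.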

The following steps are taken in the undouble library:

  1. Recursively read all images with the specified extensions from the directory.
  2. Compute the image hash.
  3. Group similar images.
  4. Move the images, if desired.
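Steps 1 and 3 can be sketched with the standard library alone. This is an illustration under assumptions, not undouble's implementation: `collect_images` and `group_similar` are hypothetical names, hashes are plain lists of bits, and the grouping is a simple greedy pass that compares each ungrouped image with the ones after it.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Sequence


def collect_images(directory: str,
                   extensions: Iterable[str] = ("png", "jpg", "jpeg")) -> List[Path]:
    """Step 1: recursively collect files whose extension is in `extensions`."""
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    return sorted(p for p in Path(directory).rglob("*")
                  if p.is_file() and p.suffix.lower() in wanted)


def group_similar(hashes: Dict[str, Sequence[int]],
                  threshold: int = 0) -> List[List[str]]:
    """Step 3: group names whose hashes differ in at most `threshold` bits.

    Only groups with two or more members are returned.
    """
    names = list(hashes)
    assigned = set()
    groups = []
    for i, name in enumerate(names):
        if name in assigned:
            continue
        members = [name]
        for other in names[i + 1:]:
            if other in assigned:
                continue
            distance = sum(a != b for a, b in zip(hashes[name], hashes[other]))
            if distance <= threshold:
                members.append(other)
        if len(members) > 1:
            assigned.update(members)
            groups.append(members)
    return groups


# Toy 4-bit hashes (made up for illustration).
toy_hashes = {
    "a.png": [0, 0, 1, 1],
    "b.png": [0, 0, 1, 1],
    "c.png": [0, 1, 1, 1],
    "d.png": [1, 1, 0, 0],
}
print(group_similar(toy_hashes, threshold=0))  # [['a.png', 'b.png']]
print(group_similar(toy_hashes, threshold=1))  # [['a.png', 'b.png', 'c.png']]
```

Raising the threshold from 0 to 1 pulls the near-identical `c.png` into the group, which is the trade-off the threshold controls.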

Installation

  • Install undouble from PyPI (recommended). undouble is compatible with Python 3.6+ and runs on Linux, macOS and Windows.
  • A new environment can be created as follows:

```bash
conda create -n env_undouble python=3.8
conda activate env_undouble
```

```bash
pip install undouble     # new install
pip install -U undouble  # update to latest version
```

  • Alternatively, you can install from the GitHub source:

```bash
# Directly install from github source
pip install -e git+https://github.com/erdogant/undouble.git@0.1.0#egg=master
pip install git+https://github.com/erdogant/undouble#egg=master
pip install git+https://github.com/erdogant/undouble

# By cloning
git clone https://github.com/erdogant/undouble.git
cd undouble
pip install -U .
```

Import undouble package

```python
from undouble import Undouble
```

Example:

```python
# Import library
from undouble import Undouble

# Init with default settings
model = Undouble(method='phash', hash_size=8)

# Import example data
targetdir = model.import_example(data='flowers')

# Import the files from disk, then clean and pre-process them
model.import_data(targetdir)

# Compute image-hash
model.fit_transform()

# Group images whose image-hash difference is <= threshold
model.group(threshold=0)

# [undouble] >INFO> Store examples at [./undouble/data]..
# [undouble] >INFO> Downloading [flowers] dataset from github source..
# [undouble] >INFO> Extracting files..
# [undouble] >INFO> [214] files are collected recursively from path: [./undouble/data/flower_images]
# [undouble] >INFO> Reading and checking images.
# [undouble] >INFO> Reading and checking images.
# 100%|| 214/214 [00:02<00:00, 96.56it/s]
# [undouble] >INFO> Extracting features using method: [phash]
# 100%|| 214/214 [00:00<00:00, 3579.14it/s]
# [undouble] >INFO> Build adjacency matrix with phash differences.
# [undouble] >INFO> Extracted features using [phash]: (214, 214)
# 100%|| 214/214 [00:00<00:00, 129241.33it/s]
# [undouble] >INFO> Number of groups with similar images detected: 3
# [undouble] >INFO> [3] groups are detected for [7] images.

# Plot the images
model.plot()

# Move the images
model.move()
```
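The move rule described earlier (keep the image with the largest resolution, move the rest to an "undouble" subdirectory) can be sketched as a planning function that never touches the disk. This is only an illustration: `plan_move` is a hypothetical helper, the file names and resolutions are made up, and placing the subdirectory next to each file is an assumption rather than documented behaviour of `model.move()`.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def plan_move(group: Dict[str, Tuple[int, int]],
              subdir: str = "undouble") -> Tuple[str, List[Tuple[str, str]]]:
    """Pick the largest-resolution image as keeper and plan moves for the rest.

    `group` maps an image path to its (width, height). Nothing is moved on
    disk; the function only returns (keeper, [(source, destination), ...]).
    """
    # Resolution is compared as width * height; ties keep the first entry.
    keeper = max(group, key=lambda path: group[path][0] * group[path][1])
    moves = []
    for path in group:
        if path == keeper:
            continue
        source = PurePosixPath(path)
        # Assumption: the subdirectory is created next to the original file.
        moves.append((path, str(source.parent / subdir / source.name)))
    return keeper, moves


example_group = {
    "photos/cat_small.jpg": (640, 480),
    "photos/cat_large.jpg": (1920, 1080),
    "photos/cat_copy.jpg": (640, 480),
}
keeper, moves = plan_move(example_group)
print(keeper)  # photos/cat_large.jpg
print(moves)
```

Separating the plan from the actual file operations makes it easy to inspect what would happen before anything is moved.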

References

  • https://github.com/erdogant/undouble

Citation

Please cite undouble in your publications if it is useful for your research (see citation).

Maintainers

Contribute

  • All kinds of contributions are welcome!
  • If you wish to buy me a coffee for this work, it is much appreciated :)

Licence

See LICENSE for details.

Other interesting stuff

  • https://ourcodeworld.com/articles/read/1006/how-to-determine-whether-2-images-are-equal-or-not-with-the-perceptual-hash-in-python
  • https://www.pyimagesearch.com/2017/11/27/image-hashing-opencv-python/
  • https://github.com/JohannesBuchner/imagehash
  • https://stackoverflow.com/questions/64994057/python-image-hashing
  • https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34

Owner

  • Name: Rohan Dubey
  • Login: rohankumardubey
  • Kind: user
  • Location: India
  • Company: Pokerstars

if (brain != empty) { keepCoding(); } else { orderCoffee(); }
