undouble

undouble is a Python package to detect (near-)identical images.

https://github.com/erdogant/undouble

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary

Keywords

ahash dhash doubles-detector hash image image-recognition image-similarity phash photos wavelet
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: erdogant
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 36.1 MB
Statistics
  • Stars: 51
  • Watchers: 2
  • Forks: 0
  • Open Issues: 2
  • Releases: 29
Created about 4 years ago · Last pushed 6 months ago
Metadata Files
Readme Funding License Citation

README.md

undouble

The aim of undouble is to detect (near-)identical images. It uses a multi-step process: pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and grouping the images. A threshold of 0 groups only images with an identical image hash. The results can be explored with the plotting functionality, and images can be moved with the move functionality. When moving images, the image with the largest resolution in each group is copied, and all other images in the group are moved to the undouble subdirectory. If you want to cluster your images instead, I recommend reading the blog and using the clustimage library.
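
The pre-processing and hashing steps described above can be illustrated with a minimal pure-Python sketch. It computes an average hash (aHash), the simplest of the hash types named in the keywords (ahash, dhash, phash, wavelet); undouble's default method and its exact pre-processing may differ. All function names are hypothetical, and images are assumed to be nested lists of (R, G, B) tuples whose height and width are multiples of hash_size.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to grayscale using ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_image]


def normalize(gray):
    """Linearly stretch intensities to the 0..255 range."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [[0.0 for _ in row] for row in gray]
    return [[255.0 * (v - lo) / (hi - lo) for v in row] for row in gray]


def downscale(gray, size):
    """Block-average to size x size (assumes both dimensions are multiples of size)."""
    h, w = len(gray), len(gray[0])
    bh, bw = h // size, w // size
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [gray[y][x]
                     for y in range(i * bh, (i + 1) * bh)
                     for x in range(j * bw, (j + 1) * bw)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(rgb_image, hash_size=8):
    """Hypothetical aHash: one bit per cell, 1 where the cell is brighter than the mean."""
    small = downscale(normalize(to_grayscale(rgb_image)), hash_size)
    flat = [v for row in small for v in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if v > mean else 0 for v in flat)
```

In this sketch, normalization stretches intensities before hashing, so a uniformly brightened copy of an image (ignoring clipping at 255) typically yields the same hash and would be grouped with the original even at threshold 0.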

The following steps are taken in the undouble library:

* Recursively read all images from the directory with the specified extensions.
* Compute the image hash.
* Group similar images.
* Move the images, if desired.
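
The reading and grouping steps can be sketched with the standard library alone, assuming hashes are bit tuples compared by Hamming distance (the usual practice for perceptual hashes). The greedy rule below, which compares each image with the first member of every existing group, is an illustrative choice and not necessarily how undouble groups images; list_images, hamming, group_by_hash and the default extension list are hypothetical.

```python
import os


def list_images(root, extensions=(".png", ".jpg", ".jpeg")):
    """Recursively collect paths under root whose extension is in extensions."""
    wanted = {ext.lower() for ext in extensions}
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in wanted:
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def hamming(h1, h2):
    """Number of bit positions in which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(h1, h2))


def group_by_hash(hashes, threshold=0):
    """Greedily group paths whose hash is within threshold of a group's first member.

    hashes maps path -> hash (tuple of bits). Only groups with two or more
    members are returned, because a single image has no double.
    """
    groups = []  # list of (reference_hash, [paths])
    for path in sorted(hashes):
        for ref, members in groups:
            if hamming(ref, hashes[path]) <= threshold:
                members.append(path)
                break
        else:  # no existing group was close enough: start a new one
            groups.append((hashes[path], [path]))
    return [members for _ref, members in groups if len(members) > 1]
```

With threshold=0 only identical hashes end up together; raising the threshold also admits hashes that differ in a few bits.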

⭐️ Star this repo if you like it ⭐️

Blogs

Documentation pages

On the documentation pages you can find detailed information about how undouble works, with many examples.

Installation

It is advisable to create a new environment (e.g. with Conda).

```bash
conda create -n env_undouble python=3.8
conda activate env_undouble
```

Install undouble from PyPI

```bash
pip install undouble       # new install
pip install -U undouble    # update to latest version
```

Directly install from github source

```bash
pip install git+https://github.com/erdogant/undouble
```

Import the undouble package

```python
from undouble import Undouble
```


Examples:

Example: Grouping similar images of the flower dataset

Example: List all file names that are identical
Example: Moving similar images in the flower dataset

```
-------------------------------------------------
You are at the point of physically moving files.
-------------------------------------------------
[7] similar images are detected over [3] groups.
[4] images will be moved to the [undouble] subdirectory.
[3] images will be copied to the [undouble] subdirectory.
[C]ontinue moving all files.
[W]ait in each directory.
[Q]uit
Answer: w
```
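
The counts in this prompt follow from the rule stated at the top of the README: in each group the image with the largest resolution is copied and the others are moved, so 7 images over 3 groups give 3 copies and 4 moves. The sketch below reproduces only that bookkeeping. It assumes resolution is compared by pixel count (width * height); plan_moves is a hypothetical helper that computes a plan and touches no files.

```python
def plan_moves(groups, resolution):
    """For each group, copy the largest-resolution image and move the rest.

    groups: list of lists of paths; resolution: dict path -> (width, height).
    Returns (to_copy, to_move) as lists of paths.
    """
    to_copy, to_move = [], []
    for members in groups:
        # Largest pixel count wins; ties go to the lexicographically last path,
        # purely so the result is deterministic.
        best = max(members, key=lambda p: (resolution[p][0] * resolution[p][1], p))
        to_copy.append(best)
        to_move.extend(p for p in members if p != best)
    return to_copy, to_move
```

For three groups of sizes 3, 2 and 2, plan_moves returns three paths to copy and four to move, matching the numbers in the prompt.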

Example: Plot the image hashes

Example: Three different input types

The input can be the following three types:

* Path to directory
* List of file locations
* Numpy array containing images
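
One way to tell these three input kinds apart is sketched below. classify_input and its default extensions are hypothetical, and undouble's own import logic may differ; the sketch only decides whether the input is a directory, a list of file locations, or an array-like stack of images such as a NumPy array.

```python
import os


def classify_input(data, extensions=(".png", ".jpg", ".jpeg")):
    """Return ("paths", [...]) or ("arrays", [...]) for the three input kinds."""
    if isinstance(data, (str, os.PathLike)):
        # 1) Path to a directory: recursively collect files with a matching extension.
        if not os.path.isdir(data):
            raise ValueError(f"not a directory: {data!r}")
        found = []
        for dirpath, _dirs, files in os.walk(data):
            found.extend(os.path.join(dirpath, f) for f in files
                         if os.path.splitext(f)[1].lower() in extensions)
        return "paths", sorted(found)
    items = list(data)
    if all(isinstance(x, (str, os.PathLike)) for x in items):
        # 2) List of file locations: use as given.
        return "paths", [os.fspath(x) for x in items]
    # 3) Anything else is treated as a stack of images (e.g. rows of a NumPy array).
    return "arrays", items
```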

Example: Finding identical MNIST digits


Citation

Please cite undouble in your publications if it is useful for your research (see citation).

Maintainers

Contribute

  • All kinds of contributions are welcome!
  • If you wish to buy me a coffee for this work, it is much appreciated :)

Licence

See LICENSE for details.

Other interesting stuff

  • https://github.com/JohannesBuchner/imagehash
  • https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128

Owner

  • Name: Erdogan
  • Login: erdogant
  • Kind: user
  • Location: Den Haag

Machine Learning | Statistics | Bayesian | D3js | Visualizations

Citation (CITATION.cff)

# YAML 1.2
---
authors: 
  -
    family-names: Taskesen
    given-names: Erdogan
    orcid: "https://orcid.org/0000-0002-3430-9618"
cff-version: "1.1.0"
date-released: 2022-01-01
keywords: 
  - "image hash"
  - "duplicates"
  - "image"
  - "python"
license: "BSD"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://erdogant.github.io/undouble"
title: "undouble - Python library to detect (near-)identical images."
version: "1.2.0"
...

GitHub Events

Total
  • Create event: 11
  • Issues event: 3
  • Release event: 10
  • Watch event: 4
  • Issue comment event: 2
  • Push event: 34
Last Year
  • Create event: 11
  • Issues event: 3
  • Release event: 10
  • Watch event: 4
  • Issue comment event: 2
  • Push event: 34

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 227
  • Total Committers: 1
  • Avg Commits per committer: 227.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 63
  • Committers: 1
  • Avg Commits per committer: 63.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
erdogant e****t@g****m 227

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 11
  • Total pull requests: 0
  • Average time to close issues: 8 months
  • Average time to close pull requests: N/A
  • Total issue authors: 8
  • Total pull request authors: 0
  • Average comments per issue: 4.36
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jomosi (3)
  • marmikp (2)
  • bcaradima (1)
  • Tyrannas (1)
  • CodeByHarri (1)
  • lazy-programm-er (1)
  • daniel-at-hive (1)
  • royinblr (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 118 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 29
  • Total maintainers: 1
pypi.org: undouble

Undouble is a Python package to detect (near-)identical images.

  • Homepage: https://erdogant.github.io/undouble
  • Documentation: https://undouble.readthedocs.io/
  • License: BSD 3-Clause Copyright (c) 2022, Erdogan Taskesen All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • Latest release: 1.4.10
    published 10 months ago
  • Versions: 29
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 118 Last month
Rankings
Dependent packages count: 10.1%
Stargazers count: 10.5%
Downloads: 17.1%
Average: 17.8%
Dependent repos count: 21.6%
Forks count: 29.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/source/requirements.txt pypi
  • pipinstallsphinx_rtd_theme *
requirements-dev.txt pypi
  • nbconvert * development
  • pytest * development
  • rst2pdf * development
  • sphinx * development
  • sphinx_rtd_theme * development
  • sphinxcontrib-fulltoc * development
  • spyder-kernels ==2.3. development
requirements.txt pypi
  • clustimage *
  • ismember *
  • matplotlib *
  • numpy *
  • tqdm *
setup.py pypi
  • clustimage *
  • ismember *
  • matplotlib *
  • numpy *
  • tqdm *
.github/workflows/codeql-analysis.yml actions
  • actions/checkout v2 composite
  • github/codeql-action/analyze v1 composite
  • github/codeql-action/autobuild v1 composite
  • github/codeql-action/init v1 composite
.github/workflows/pytest.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
requirements_additional.txt pypi
  • folium *
  • geopy *
  • piexif *