undouble
undouble is a Python package to detect (near-)identical images.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.1%) to scientific vocabulary
Statistics
- Stars: 51
- Watchers: 2
- Forks: 0
- Open Issues: 2
- Releases: 29
Topics
Metadata Files
README.md
undouble
The aim of undouble is to detect (near-)identical images. It works in a multi-step process: pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and grouping the images. A threshold of 0 groups images with an identical image hash. The results can easily be explored with the plotting functionality, and images can be moved with the move functionality. When moving images, the image with the largest resolution in each group is copied, and all other images are moved to the undouble subdirectory. If you want to cluster your images, I recommend reading the blog and using the clustimage library.
The following steps are taken in the undouble library:
* Recursively read all images from the directory with the specified extensions.
* Compute image hash.
* Group similar images.
* Move if desired.
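The hashing and grouping steps can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: it assumes a simple average hash on tiny grayscale grids and groups images by Hamming distance, and the names `average_hash`, `hamming`, and `group_similar` are hypothetical. The hash functions and grouping logic in undouble itself may differ.

```python
from itertools import combinations


def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(int(value > mean) for value in flat)


def hamming(hash_a, hash_b):
    """Count the positions at which two equally long hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def group_similar(hashes, threshold=0):
    """Group names whose hashes differ by at most `threshold` bits (union-find)."""
    names = list(hashes)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group root, compressing the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if hamming(hashes[a], hashes[b]) <= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member contain duplicates.
    return [members for members in groups.values() if len(members) > 1]


# Toy 2x2 grayscale "images" (made-up values, for illustration only).
images = {
    "a.png": [[10, 200], [220, 30]],
    "a_brighter.png": [[30, 220], [240, 50]],
    "b.png": [[200, 10], [30, 220]],
}
hashes = {name: average_hash(pixels) for name, pixels in images.items()}
groups = group_similar(hashes, threshold=0)
print(groups)
```

With a threshold of 0, only images whose hashes are bit-for-bit identical end up in the same group; here the uniformly brighter copy still matches because the average hash compares each pixel to the image's own mean.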
⭐️ Star this repo if you like it ⭐️
Blogs
- Read the blog to get a structured overview of how to detect duplicate images using image hash functions.
Documentation pages
The documentation pages contain detailed information about how undouble works, with many examples.
Installation
It is advisable to create a new environment (e.g. with Conda).
```bash
conda create -n env_undouble python=3.8
conda activate env_undouble
```
Install undouble from PyPI
```bash
pip install undouble # new install
pip install -U undouble # update to latest version
```
Install directly from the GitHub source
```bash
pip install git+https://github.com/erdogant/undouble
```
Import the Undouble package
```python
from undouble import Undouble
```
Examples:
Example: Grouping similar images of the flower dataset
Example: List all file names that are identical
Example: Moving similar images in the flower dataset
```python
-------------------------------------------------
>You are at the point of physically moving files.
-------------------------------------------------
>[7] similar images are detected over [3] groups.
>[4] images will be moved to the [undouble] subdirectory.
>[3] images will be copied to the [undouble] subdirectory.
>[C]ontinue moving all files.
>[W]ait in each directory.
>[Q]uit
>Answer: w
```
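The copy-and-move behaviour described earlier (the largest image in a group is copied, the others are moved) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: `move_group` is a hypothetical helper, resolutions are passed in rather than read from the files, and the `undouble` subdirectory is assumed to sit next to the images.

```python
import shutil
import tempfile
from pathlib import Path


def move_group(group, subdir_name="undouble"):
    """Copy the largest image of a group and move the rest into a subdirectory.

    `group` is a list of (path, (width, height)) tuples; the resolutions are
    supplied by the caller in this sketch.
    """
    ranked = sorted(group, key=lambda item: item[1][0] * item[1][1], reverse=True)
    keep_path = Path(ranked[0][0])
    target = keep_path.parent / subdir_name  # assumed location of the subdirectory
    target.mkdir(exist_ok=True)

    # The largest image stays in place and is also copied to the subdirectory.
    shutil.copy2(keep_path, target / keep_path.name)
    # All other images of the group are moved out of the original directory.
    for path, _ in ranked[1:]:
        shutil.move(str(path), str(target / Path(path).name))
    return keep_path


# Build a throwaway directory with placeholder files (not real images).
workdir = Path(tempfile.mkdtemp())
group = []
for name, size in [("small.png", (64, 64)), ("large.png", (512, 512)), ("medium.png", (128, 128))]:
    path = workdir / name
    path.write_bytes(b"placeholder")
    group.append((path, size))

kept = move_group(group)
remaining = sorted(p.name for p in workdir.iterdir() if p.is_file())
moved = sorted(p.name for p in (workdir / "undouble").iterdir())
print(kept.name, remaining, moved)
```

Under these assumptions, one image per group is copied and the rest are moved, which is consistent with the prompt above reporting 3 copied and 4 moved images for 7 similar images over 3 groups.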
Example: Plot the image hashes
Example: Three different imports
The input can be the following three types:
* Path to directory
* List of file locations
* Numpy array containing images
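As a rough illustration of how the first two input types could be normalized into one list of files, here is a minimal sketch. `collect_image_paths` and its `extensions` default are hypothetical and not part of the undouble API; the NumPy-array case is left out to keep the sketch dependency-free.

```python
import tempfile
from pathlib import Path


def collect_image_paths(source, extensions=(".png", ".jpg")):
    """Return image paths from a directory (searched recursively) or a list of files."""
    if isinstance(source, (str, Path)):
        # Directory input: walk it recursively and keep files with a matching extension.
        root = Path(source)
        return sorted(
            p for p in root.rglob("*")
            if p.is_file() and p.suffix.lower() in extensions
        )
    # List input: keep only the entries with a matching extension.
    return [Path(p) for p in source if Path(p).suffix.lower() in extensions]


# Throwaway directory tree with empty placeholder files.
root = Path(tempfile.mkdtemp())
(root / "nested").mkdir()
for rel in ["a.png", "notes.txt", "nested/b.JPG"]:
    (root / rel).write_bytes(b"")

from_dir = collect_image_paths(root)
from_list = collect_image_paths([root / "a.png", root / "notes.txt"])
print([p.name for p in from_dir], [p.name for p in from_list])
```

Comparing lowercased suffixes means files such as `b.JPG` are picked up as well; whether undouble treats extensions case-insensitively is not stated in this document.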
Example: Finding identical mnist digits
Citation
Please cite undouble in your publications if it is useful for your research (see citation).
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- All kinds of contributions are welcome!
- If you wish to buy me a coffee for this work, it is much appreciated :)
Licence
See LICENSE for details.
Other interesting stuff
- https://github.com/JohannesBuchner/imagehash
- https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128
Owner
- Name: Erdogan
- Login: erdogant
- Kind: user
- Location: Den Haag
- Website: https://erdogant.github.io/
- Repositories: 51
- Profile: https://github.com/erdogant
Machine Learning | Statistics | Bayesian | D3js | Visualizations
Citation (CITATION.cff)
# YAML 1.2
---
authors:
  -
    family-names: Taskesen
    given-names: Erdogan
    orcid: "https://orcid.org/0000-0002-3430-9618"
cff-version: "1.1.0"
date-released: 2022-01-01
keywords:
- "image hash"
- "duplicates"
- "image"
- "python"
license: "BSD"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://erdogant.github.io/undouble"
title: "undouble - Python library to detect (near-)identical images."
version: "1.2.0"
...
GitHub Events
Total
- Create event: 11
- Issues event: 3
- Release event: 10
- Watch event: 4
- Issue comment event: 2
- Push event: 34
Last Year
- Create event: 11
- Issues event: 3
- Release event: 10
- Watch event: 4
- Issue comment event: 2
- Push event: 34
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 11
- Total pull requests: 0
- Average time to close issues: 8 months
- Average time to close pull requests: N/A
- Total issue authors: 8
- Total pull request authors: 0
- Average comments per issue: 4.36
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jomosi (3)
- marmikp (2)
- bcaradima (1)
- Tyrannas (1)
- CodeByHarri (1)
- lazy-programm-er (1)
- daniel-at-hive (1)
- royinblr (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 118 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 29
- Total maintainers: 1
pypi.org: undouble
Undouble is a Python package to detect (near-)identical images.
- Homepage: https://erdogant.github.io/undouble
- Documentation: https://undouble.readthedocs.io/
- License: BSD 3-Clause Copyright (c) 2022, Erdogan Taskesen All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- Latest release: 1.4.10 (published 10 months ago)
Dependencies
- pipinstallsphinx_rtd_theme *
- nbconvert * development
- pytest * development
- rst2pdf * development
- sphinx * development
- sphinx_rtd_theme * development
- sphinxcontrib-fulltoc * development
- spyder-kernels ==2.3. development
- clustimage *
- ismember *
- matplotlib *
- numpy *
- tqdm *
- actions/checkout v2 composite
- github/codeql-action/analyze v1 composite
- github/codeql-action/autobuild v1 composite
- github/codeql-action/init v1 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- folium *
- geopy *
- piexif *