image-dataset-explorer
https://github.com/language-research-technology/image-dataset-explorer
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to: arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Language-Research-Technology
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 130 MB
Statistics
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
readme.md
Exploring Large Image Datasets with Image Embeddings
This notebook provides options for exploring large image datasets using k-nearest-neighbour graphs and other methods. It is intended as a base for further development.
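As a rough illustration of the k-nearest-neighbour idea, the sketch below builds a neighbour table from a matrix of embeddings. It is not the notebook's own code: the dependency list includes pynndescent for approximate search, whereas this example uses an exact brute-force search in NumPy as a stand-in. The function name `knn_graph`, the use of cosine similarity, and the synthetic embeddings are all assumptions made for illustration.

```python
import numpy as np


def knn_graph(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Return an (n, k) array holding, for each row, the indices of its
    k most similar rows by cosine similarity (the row itself is excluded).

    Hypothetical helper: exact brute-force search, suitable only for
    small datasets.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # avoid division by zero
    similarity = unit @ unit.T
    np.fill_diagonal(similarity, -np.inf)  # never pick a point as its own neighbour
    # Sort each row by descending similarity and keep the first k columns.
    return np.argsort(-similarity, axis=1)[:, :k]


# Synthetic stand-in for image embeddings: two well-separated clusters.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=5.0, scale=0.1, size=(5, 8))
cluster_b = rng.normal(loc=-5.0, scale=0.1, size=(5, 8))
embeddings = np.vstack([cluster_a, cluster_b])

neighbours = knn_graph(embeddings, k=3)
```

Each row of `neighbours` can be read as the outgoing edges of one image in the graph. For datasets too large for an all-pairs comparison, an approximate index would be the more practical choice.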
The notebook ingests a zip file of images, produces an HTML file for visually exploring the dataset, and provides some basic visualisations.
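The zip-in, HTML-out flow described above might look roughly like the following, using only the Python standard library. This is a minimal sketch under stated assumptions, not the notebook's implementation: the helper names `list_image_names` and `render_gallery`, the set of accepted file extensions, and the bare-bones gallery markup are all hypothetical.

```python
import html
import io
import zipfile
from pathlib import PurePosixPath

# Assumed set of extensions; the notebook may accept a different set.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def list_image_names(archive) -> list[str]:
    """Return the sorted names of image files inside a zip archive.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            info.filename
            for info in zf.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() in IMAGE_SUFFIXES
        )


def render_gallery(image_names, title="Image dataset") -> str:
    """Build a very simple HTML page with one <img> element per image name."""
    items = "\n".join(
        '  <li><img src="{}" alt="" loading="lazy"></li>'.format(
            html.escape(name, quote=True)
        )
        for name in image_names
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        "<title>" + html.escape(title) + "</title>\n"
        "</head>\n<body>\n<ul>\n" + items + "\n</ul>\n</body>\n</html>\n"
    )


# Demonstration with an in-memory archive holding placeholder entries.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("photos/cat.JPG", b"")
    zf.writestr("photos/dog.png", b"")
    zf.writestr("notes.txt", b"")
buffer.seek(0)

names = list_image_names(buffer)
page = render_gallery(names)
```

Filtering by extension keeps non-image entries such as `notes.txt` out of the page. A real pipeline would also need to extract the files, or embed them, so that the `src` attributes resolve.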
Acknowledgements:
Centre for Digital Cultures and Societies
Language Data Commons of Australia
Bibliography:
A. Radford et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. OpenAI. https://arxiv.org/pdf/2103.00020
Brankica Bratić, Michael E. Houle, Vladimir Kurbalija, Vincent Oria, and Miloš Radovanović. (2018). NN-Descent on High-Dimensional Data. In WIMS’18: 8th International Conference on Web Intelligence, Mining and Semantics, June 25–27, 2018, Novi Sad, Serbia. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3227609.3227643
J. Burgess et al. (2021). Critical simulation as hybrid digital method for exploring the data operations and vernacular cultures of visual social media platforms. https://osf.io/preprints/socarxiv/2cwsu_v1
K. Simonyan and A. Zisserman. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015. https://doi.org/10.48550/arXiv.1409.1556
Owner
- Name: UQ's Language Research Technology Projects
- Login: Language-Research-Technology
- Kind: organization
- Location: Australia
- Twitter: LDaCA_Program
- Repositories: 46
- Profile: https://github.com/Language-Research-Technology
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Hames
    given-names: Samuel C.
    orcid: https://orcid.org/0000-0002-1824-2361
  - family-names: Chong
    given-names: Jasper
title: "Exploring Large Image Datasets with Image Embeddings"
version: 1.0.0
identifiers:
  - type:
    value:
date-released: 2025-02-13
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Dependencies
- Pillow *
- imgbeddings *
- matplotlib *
- networkx *
- numpy *
- pandas *
- plotly *
- pynndescent *
- scikit_learn *
- scipy *
- tqdm *
- anywidget ==0.9.13
- certifi ==2022.12.7
- charset-normalizer ==2.1.1
- coloredlogs ==15.0.1
- contourpy ==1.3.0
- cycler ==0.12.1
- filelock ==3.17.0
- flatbuffers ==25.1.24
- fonttools ==4.55.8
- fsspec ==2025.2.0
- huggingface-hub ==0.24.0
- humanfriendly ==10.0
- idna ==3.10
- imgbeddings ==0.1.0
- importlib-resources ==6.5.2
- joblib ==1.4.2
- kiwisolver ==1.4.7
- llvmlite ==0.43.0
- matplotlib ==3.9.4
- mpmath ==1.3.0
- narwhals ==1.24.2
- networkx ==3.2.1
- numba ==0.60.0
- numpy ==2.0.2
- onnxruntime ==1.19.2
- packaging ==24.2
- pandas ==2.2.3
- pillow ==11.1.0
- plotly ==6.0.0
- protobuf ==5.29.3
- pynndescent ==0.5.13
- pyparsing ==3.2.1
- python-dateutil ==2.9.0.post0
- pytz ==2025.1
- pyyaml ==6.0.2
- regex ==2024.11.6
- requests ==2.32.3
- safetensors ==0.5.2
- scikit-learn ==1.6.1
- scipy ==1.13.1
- seaborn ==0.13.2
- six ==1.17.0
- sympy ==1.13.1
- threadpoolctl ==3.5.0
- tokenizers ==0.21.0
- tqdm ==4.67.1
- transformers ==4.48.2
- typing-extensions ==4.12.2
- tzdata ==2025.1
- urllib3 ==2.3.0
- zipp ==3.21.0
- torch ==2.6.0
- torchvision ==0.21.0