Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Language-Research-Technology
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 130 MB
Statistics
  • Stars: 0
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

readme.md

Exploring Large Image Datasets with Image Embeddings

This notebook provides options for exploring large image datasets using k-nearest-neighbour graphs and other methods. It is intended as a base for further development.
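
As a rough illustration of what a k-nearest-neighbour graph over image embeddings looks like, the sketch below links each image to its `k` most similar neighbours by cosine similarity. It is not the notebook's own code: the function names `cosine_similarity` and `knn_graph`, the file names, and the three-dimensional vectors are all made up for illustration. It also compares every pair exhaustively, whereas a large dataset would more likely use an approximate method such as NN-Descent (see the bibliography).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def knn_graph(embeddings, k):
    """Map each item to its k most similar other items (exhaustive search)."""
    graph = {}
    for name, vec in embeddings.items():
        scored = [
            (cosine_similarity(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != name
        ]
        # Highest similarity first
        scored.sort(key=lambda pair: pair[0], reverse=True)
        graph[name] = [other for _, other in scored[:k]]
    return graph


# Hypothetical toy embeddings; real image embeddings have many more dimensions.
toy_embeddings = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "dog_1.jpg": [0.1, 0.9, 0.1],
    "dog_2.jpg": [0.2, 0.8, 0.0],
}

graph = knn_graph(toy_embeddings, k=1)
for image, neighbours in graph.items():
    print(image, "->", neighbours)
```

Exhaustive search costs a comparison for every pair of images, which is why approximate neighbour search becomes attractive as the dataset grows.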

The notebook ingests a zip file of images, produces an HTML file that offers options for visually exploring the dataset, and provides some basic visualisations.
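
The zip-in, HTML-out shape of that workflow can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the notebook's implementation: `list_images`, `render_index`, the set of accepted file suffixes, and the HTML layout are all hypothetical, and the embedding step is left out entirely.

```python
import html
import io
import zipfile
from pathlib import PurePosixPath

# Assumed set of image suffixes; the notebook may accept a different set.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def list_images(zip_source):
    """Return the sorted names of image files inside a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if PurePosixPath(name).suffix.lower() in IMAGE_SUFFIXES
        )


def render_index(image_names):
    """Render a bare HTML page listing the image names."""
    items = "\n".join(f"  <li>{html.escape(name)}</li>" for name in image_names)
    return f"<!DOCTYPE html>\n<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"


# Build a small in-memory archive so the example runs without any files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("photos/a.PNG", b"placeholder bytes, not a real image")
    archive.writestr("photos/b.jpg", b"placeholder bytes, not a real image")
    archive.writestr("notes.txt", b"ignored because it is not an image")
buffer.seek(0)

names = list_images(buffer)
page = render_index(names)
print(page)
```

In practice `zip_source` would be a path to the uploaded archive, and the generated page would be written to disk rather than printed.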

Acknowledgements:

  • Centre for Digital Cultures and Societies

  • Language Data Commons of Australia

Bibliography:

A. Radford et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. OpenAI. https://arxiv.org/pdf/2103.00020

Brankica Bratić, Michael E. Houle, Vladimir Kurbalija, Vincent Oria, and Miloš Radovanović. (2018). NN-Descent on High-Dimensional Data. In WIMS’18: 8th International Conference on Web Intelligence, Mining and Semantics, June 25–27, 2018, Novi Sad, Serbia. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3227609.3227643

J. Burgess et al. (2021). Critical simulation as hybrid digital method for exploring the data operations and vernacular cultures of visual social media platforms. https://osf.io/preprints/socarxiv/2cwsu_v1

K. Simonyan and A. Zisserman. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015. https://doi.org/10.48550/arXiv.1409.1556

Owner

  • Name: UQ's Language Research Technology Projects
  • Login: Language-Research-Technology
  • Kind: organization
  • Location: Australia

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Hames
    given-names: Samuel C.
    orcid: https://orcid.org/0000-0002-1824-2361
  - family-names: Chong
    given-names: Jasper
title: "Exploring Large Image Datasets with Image Embeddings"
version: 1.0.0
identifiers:
  - type: 
    value: 
date-released: 2025-02-13

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Dependencies

requirements.in pypi
  • Pillow *
  • imgbeddings *
  • matplotlib *
  • networkx *
  • numpy *
  • pandas *
  • plotly *
  • pynndescent *
  • scikit_learn *
  • scipy *
  • tqdm *
requirements.txt pypi
  • anywidget ==0.9.13
  • certifi ==2022.12.7
  • charset-normalizer ==2.1.1
  • coloredlogs ==15.0.1
  • contourpy ==1.3.0
  • cycler ==0.12.1
  • filelock ==3.17.0
  • flatbuffers ==25.1.24
  • fonttools ==4.55.8
  • fsspec ==2025.2.0
  • huggingface-hub ==0.24.0
  • humanfriendly ==10.0
  • idna ==3.10
  • imgbeddings ==0.1.0
  • importlib-resources ==6.5.2
  • joblib ==1.4.2
  • kiwisolver ==1.4.7
  • llvmlite ==0.43.0
  • matplotlib ==3.9.4
  • mpmath ==1.3.0
  • narwhals ==1.24.2
  • networkx ==3.2.1
  • numba ==0.60.0
  • numpy ==2.0.2
  • onnxruntime ==1.19.2
  • packaging ==24.2
  • pandas ==2.2.3
  • pillow ==11.1.0
  • plotly ==6.0.0
  • protobuf ==5.29.3
  • pynndescent ==0.5.13
  • pyparsing ==3.2.1
  • python-dateutil ==2.9.0.post0
  • pytz ==2025.1
  • pyyaml ==6.0.2
  • regex ==2024.11.6
  • requests ==2.32.3
  • safetensors ==0.5.2
  • scikit-learn ==1.6.1
  • scipy ==1.13.1
  • seaborn ==0.13.2
  • six ==1.17.0
  • sympy ==1.13.1
  • threadpoolctl ==3.5.0
  • tokenizers ==0.21.0
  • tqdm ==4.67.1
  • transformers ==4.48.2
  • typing-extensions ==4.12.2
  • tzdata ==2025.1
  • urllib3 ==2.3.0
  • zipp ==3.21.0
torchrequirements.txt pypi
  • torch ==2.6.0
  • torchvision ==0.21.0