mini-webvision

Creates the Mini-WebVision dataset for use with a PyTorch dataloader.

https://github.com/sangamesh-kodge/mini-webvision

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.1%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

Readme.md

Mini-WebVision

This project preprocesses the Google images partition of the WebVision 1.0 dataset to obtain the Mini-WebVision dataset, organized in the same directory structure as the ImageNet-1k dataset.

Mini-WebVision contains about 61K Google images from the first 50 classes of the WebVision dataset:

  • Number of train images: 61234
  • Number of val images: 2500 (50 per class)
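The class selection described above can be sketched as follows. This is a minimal illustration, not the code in `create_MiniWebVision_as_ImageNet.sh` or `helper.py`: it assumes a WebVision-style file list in which each line holds a relative image path followed by an integer, 0-based class label, and the names `select_first_classes`, `per_class_counts` and the demo paths are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Mini-WebVision keeps the first 50 WebVision classes (labels 0..49).
NUM_CLASSES = 50


def select_first_classes(lines: Iterable[str],
                         num_classes: int = NUM_CLASSES) -> List[Tuple[str, int]]:
    """Keep (path, label) pairs whose 0-based label is below num_classes.

    Assumption: each non-blank line is "<relative image path> <integer label>".
    """
    selected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        path, label_str = line.rsplit(maxsplit=1)  # label is the last field
        label = int(label_str)
        if label < num_classes:
            selected.append((path, label))
    return selected


def per_class_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Number of selected images per class label."""
    return Counter(label for _, label in pairs)


if __name__ == "__main__":
    # Hypothetical file-list lines, for illustration only.
    demo = ["img/a.jpg 0", "img/b.jpg 1", "img/c.jpg 59", ""]
    kept = select_first_classes(demo)
    print(kept)                    # [('img/a.jpg', 0), ('img/b.jpg', 1)]
    print(per_class_counts(kept))  # Counter({0: 1, 1: 1})
    # 50 classes x 50 validation images per class = 2500 validation images.
    print(NUM_CLASSES * 50)        # 2500
```

If the assumed format holds for the Google training list, the per-class counts from this filter should sum to the 61234 training images reported above.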

Below is the final directory structure for this project:

    Mini-WebVision
    |-- train
    |   |-- nxxxxxxxx
    |   |-- nxxxxxxxx
    |   |-- ...
    |-- val
    |   |-- nxxxxxxxx
    |   |-- nxxxxxxxx
    |   |-- ...
    |-- info
    |   |-- xxxx
    |   |-- xxxx
    |   |-- ...
    |-- create_MiniWebVision_as_ImageNet.sh
    |-- helper.py
    |-- Readme.md
    |-- Citation.cff
    |-- LICENSE
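Because `train` and `val` each hold one `nxxxxxxxx` folder per class, the layout should be loadable with `torchvision.datasets.ImageFolder`, which assigns class indices by sorting the folder names. The stdlib-only sketch below mimics that indexing rule and counts images per class, so the result can be checked without PyTorch installed. The function names, the `IMG_EXTENSIONS` tuple and the placeholder folder names in the demo are assumptions for illustration.

```python
import os
import tempfile
from typing import Dict, Tuple

# Assumed set of image extensions (compared case-insensitively).
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".webp")


def find_classes(split_dir: str) -> Dict[str, int]:
    """Map class folder names to indices in sorted order, the ordering
    rule torchvision's ImageFolder uses."""
    classes = sorted(e.name for e in os.scandir(split_dir) if e.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


def count_images(split_dir: str) -> Dict[str, int]:
    """Count image files directly inside each class folder (no recursion)."""
    counts = {}
    for name in find_classes(split_dir):
        class_dir = os.path.join(split_dir, name)
        counts[name] = sum(
            1 for f in os.listdir(class_dir)
            if f.lower().endswith(IMG_EXTENSIONS)
            and os.path.isfile(os.path.join(class_dir, f))
        )
    return counts


def summarize(root: str) -> Tuple[int, int, int]:
    """Return (num_classes, num_train_images, num_val_images) for an ImageNet-style root."""
    train = count_images(os.path.join(root, "train"))
    val = count_images(os.path.join(root, "val"))
    if train.keys() != val.keys():
        raise ValueError("train and val should contain the same class folders")
    return len(train), sum(train.values()), sum(val.values())


if __name__ == "__main__":
    # Demo on a tiny hypothetical tree with placeholder synset-style folder names.
    with tempfile.TemporaryDirectory() as root:
        for split, per_class in (("train", 3), ("val", 2)):
            for synset in ("n00000002", "n00000001"):
                class_dir = os.path.join(root, split, synset)
                os.makedirs(class_dir)
                for i in range(per_class):
                    open(os.path.join(class_dir, f"{i}.JPEG"), "w").close()
        print(find_classes(os.path.join(root, "train")))  # {'n00000001': 0, 'n00000002': 1}
        print(summarize(root))                              # (2, 6, 4)
```

On the processed dataset, `summarize("Mini-WebVision")` would be expected to report 50 classes, 61234 train images and 2500 val images. With PyTorch installed, `torchvision.datasets.ImageFolder("Mini-WebVision/train").class_to_idx` should match the mapping returned by `find_classes`.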

Instructions

  1. Clone this repository
  2. Navigate to the root of this project
  3. Run the following command in your terminal/command prompt:

    sh create_MiniWebVision_as_ImageNet.sh

Expected Terminal logs

```
(base) Mini-WebVision >sh create_MiniWebVision_as_ImageNet.sh
--2024-02-07 11:04:19-- https://data.vision.ee.ethz.ch/cvl/webvision/google_resized_256.tar
Resolving data.vision.ee.ethz.ch (data.vision.ee.ethz.ch)... 129.132.52.178, 2001:67c:10ec:36c2::178
Connecting to data.vision.ee.ethz.ch (data.vision.ee.ethz.ch)|129.132.52.178|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16980316160 (16G) [application/x-tar]
Saving to: google_resized_256.tar

100%[================================================================================>] 16,980,316,160 16.2MB/s in 11m 59s

2024-02-07 11:16:19 (22.5 MB/s) - google_resized_256.tar saved [16980316160/16980316160]

--2024-02-07 11:16:19-- https://data.vision.ee.ethz.ch/cvl/webvision/val_images_256.tar
Resolving data.vision.ee.ethz.ch (data.vision.ee.ethz.ch)... 129.132.52.178, 2001:67c:10ec:36c2::178
Connecting to data.vision.ee.ethz.ch (data.vision.ee.ethz.ch)|129.132.52.178|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 873574400 (833M) [application/x-tar]
Saving to: val_images_256.tar

100%[===================================================================================>] 873,574,400 24.2MB/s in 35s

2024-02-07 11:16:55 (23.7 MB/s) - val_images_256.tar saved [873574400/873574400]

--2024-02-07 11:16:55-- https://data.vision.ee.ethz.ch/cvl/webvision/info.tar
Resolving data.vision.ee.ethz.ch (data.vision.ee.ethz.ch)... 129.132.52.178, 2001:67c:10ec:36c2::178
Connecting to data.vision.ee.ethz.ch (data.vision.ee.ethz.ch)|129.132.52.178|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 190914560 (182M) [application/x-tar]
Saving to: info.tar

100%[===================================================================================>] 190,914,560 24.6MB/s in 8.3s

2024-02-07 11:17:04 (21.9 MB/s) - info.tar saved [190914560/190914560]


Creating directory structure similar to ImageNet for training dataset


Creating directory structure similar to ImageNet for val dataset


Removing Redundant files.


Mini-WebVision Dataset Processed!

```
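The "Creating directory structure similar to ImageNet" steps in the log amount to placing each selected image into a folder named after its class synset. The sketch below shows one way to do that; it is not the repository's implementation. It assumes a synset list in which the i-th non-blank line starts with the WordNet ID of class i (the `info` archive is expected to provide such a list, but treat its exact file name and format as an assumption), plus `(relative_path, label)` pairs parsed from a file list. `read_synsets`, `arrange_as_imagenet` and the demo paths are hypothetical.

```python
import os
import shutil
import tempfile
from typing import Iterable, List, Tuple


def read_synsets(lines: Iterable[str]) -> List[str]:
    """Assumed format: the i-th non-blank line starts with the WordNet ID of class i."""
    return [line.split()[0] for line in lines if line.strip()]


def arrange_as_imagenet(pairs: Iterable[Tuple[str, int]], src_root: str,
                        dst_split_dir: str, synsets: List[str]) -> int:
    """Copy each (relative_path, label) image into dst_split_dir/<synset>/.

    The destination file name joins the relative path with underscores so that
    images with the same base name in different source folders do not collide.
    Returns the number of files copied; the source tree is left untouched.
    """
    copied = 0
    for rel_path, label in pairs:
        class_dir = os.path.join(dst_split_dir, synsets[label])
        os.makedirs(class_dir, exist_ok=True)
        dst_name = rel_path.replace("/", "_")
        shutil.copy2(os.path.join(src_root, rel_path), os.path.join(class_dir, dst_name))
        copied += 1
    return copied


if __name__ == "__main__":
    # Demo with a hypothetical one-image source tree and placeholder synset IDs.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(os.path.join(src, "q1"))
        open(os.path.join(src, "q1", "a.jpg"), "w").close()
        synsets = read_synsets(["n00000001 class A", "n00000002 class B"])
        n = arrange_as_imagenet([("q1/a.jpg", 1)], src, os.path.join(tmp, "train"), synsets)
        print(n, os.listdir(os.path.join(tmp, "train", "n00000002")))  # 1 ['q1_a.jpg']
```

Copying keeps the extracted archives intact but temporarily doubles disk usage for the selected images; moving with `shutil.move` or linking with `os.symlink` are alternatives when space is tight.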

Conclusion

This project preprocesses the Google images partition of the WebVision 1.0 dataset to obtain the Mini-WebVision dataset. The script automates the download and preprocessing steps and arranges the result in an ImageNet-style directory structure.

Source Code

This repository builds on the GitHub codebase for preprocessing the Google partition of WebVision 1.0, found at WebVision1.0-Google.

Citation

Kindly cite the repository if you use the code. Thanks!

APA

Kodge, S. (2024). MiniWebVision [Computer software]. https://github.com/sangamesh-kodge/Mini-WebVision

BibTeX

    @software{Kodge_MiniWebVision_2024,
      author = {Kodge, Sangamesh},
      month = feb,
      title = {{MiniWebVision}},
      url = {https://github.com/sangamesh-kodge/Mini-WebVision},
      year = {2024}
    }

Owner

  • Login: sangamesh-kodge
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Kodge"
  given-names: "Sangamesh"
  orcid: "https://orcid.org/0000-0001-9713-5400"
title: "MiniWebVision"
date-released: 2024-02-07
url: "https://github.com/sangamesh-kodge/Mini-WebVision"

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1