https://github.com/agamiko/waste-datasets-review

List of image datasets with any kind of litter, garbage, waste and trash


Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: researchgate.net, sciencedirect.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.6%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: AgaMiko
  • Default Branch: main
  • Size: 15.8 MB
Statistics
  • Stars: 295
  • Watchers: 5
  • Forks: 42
  • Open Issues: 1
  • Releases: 0
Created over 5 years ago · Last pushed about 1 year ago
Metadata Files
Readme

README.md

Waste datasets review

List of datasets with any kind of litter, garbage, waste and trash. Created during the detectwaste.ml project.

Today, more than 300 million tons of plastic are produced annually. Plastic is everywhere and we constantly use it in our daily life.

The idea of the Detect Waste project is to use artificial intelligence to detect plastic waste in the environment. Our solution will be applicable to both video and photography. Our goal is to use AI for Good.

Visit majsylw/litter-detection-review to see a broader review of papers, projects, and other resources concerning the problem of litter in the environment.

Contributing

Feel free to open an issue with a short description of a new dataset, or create a pull request that adds the new dataset to the table or fills in a missing description.

Summary

| Name | No. categories | No. subcategories | No. images | Annotation | Comment | Website | License | Description below |
|---|---|---|---|---|---|---|---|---|
| TrashCan 1.0 | 3 | 34 | 7 212 | Instance segmentation | Underwater images | website | Free for academic teaching/research use; JAMSTEC permission required for commercial use | ✔ |
| Trash-ICRA19 | 3 | 34 | 5 700 | Detection | Underwater images | website | Free for academic teaching/research use; JAMSTEC permission required for commercial use | ✔ |
| TACO | 28 | 60 | 1 500 | Segmentation | Waste in the wild | website | MIT license | ✔ |
| TACO bboxes | 7 | 60 | WIP | Detection | Waste in the wild | WIP | ? | ✔ |
| UAVVaste | 1 | - | 772 | Segmentation | Drone dataset | github | Apache license | ✔ |
| Trashnet | 6 | - | 2 527 | Classification | Clear background | github | MIT license | ✔ |
| WaDaBa | 8 | color, size, shape, or material | 4 000 | Classification | Plastic dataset, clear background | website | ? | ✔ |
| GLASSENSE-VISION | 7 | 136 | 2 000 | Classification | Home supplies, clear background | website | ? | ✔ |
| Waste Classification data | 2 | - | ~25 000 | Classification | Scraped from Google search | kaggle | CC BY-SA 4.0 | ✔ |
| Waste Classification Data v2 | 3 | - | ~27 500 | Classification | Scraped from Google search | kaggle | CC BY-SA 4.0 | ✔ |
| Waste Images from Sushi Restaurant | 16 | - | 500 | Classification | Clear background | kaggle | Database: Open Database, Contents: © Original Authors | ✔ |
| Open litter map | 11 | 187 | > 100k | Multilabel classification | Waste in the wild | website | ? | ✔ |
| Litter | 24 | size, shape, or material | ~14 000 | Detection | Waste in the wild, paid license | website | ? | ✔ |
| Drinking Waste Classification | 4 | - | 9 640 | Detection | Clear background (cans and bottles) | kaggle | CC0: Public Domain | ✔ |
| waste_pictures | 34 | - | ~24 000 | Classification | Scraped from Google search | kaggle | Unknown | ✔ |
| spotgarbage | 3 | - | ~2 400 | Classification | Scraped from Bing search | kaggle, github | CC0: Public Domain | ✔ |
| DeepSeaWaste | 5 | - | 3 055 | Classification | Underwater images | kaggle | Unknown | ✔ |
| MJU-Waste v1.0 | 1 | - | 2 475 | Segmentation | Plain background, indoor RGBD images | github | MIT license | ✔ |
| Domestic Trash Dataset | 10 | - | > 9 000 | Classification/Detection | Waste in the wild, paid license, 250 images for free | github | ? | ✔ |
| Cigarette butt dataset | 1 | - | 2 200 | Detection | Waste in the wild, synthetic images | website | Non-Commercial, Educational License Agreement | ✔ |
| TrashBox | 7 | 25 | 17 785 | Classification/Detection | Scraped from web | github | ? | ✔ |
| PortlandStateSingh | 5 | - | 11 500 | Classification/Detection | Original photos | website | ? | |
| TIDY | 9 | - | 304 | Classification | Original photos | github | MIT license | |
| Garbage Dataset (V2) | 10 | - | 19 762 | Classification | Household waste; used in the paper "Managing Household Waste Through Transfer Learning" | kaggle | ? | ✔ |
| RealWaste | 9 | - | 4 752 | Classification | Collected in an authentic landfill environment; 524x524 resolution | UCI ML Repo | ? | ✔ |
| BePLi Dataset v1 | 1 | - | 3 709 | Instance segmentation / Object detection | Beach plastic litter (various types) in coastal environments (Japan); MS COCO format | ResearchGate paper | CC BY 4.0 | ✔ |

Description

TrashCan 1.0

An Instance-Segmentation Labeled Dataset of Trash Observations

7212 images under 3 main categories: bio, trash, and unknown. Subcategories:

  • bio: turtle, squid, lobster, unknown, jellyfish, stingray, shrimp, crawfish, octopus, shark, shell, crab, starfish, eel
  • trash: clothing, pipe, bottle, bag, snack_wrapper, glove, tire, can, cup, container, branch, wreckage, tarp, box, hose, rope, hay, net, paper, bucket, wire
  • unknown

Download: Directly from website https://conservancy.umn.edu/handle/11299/214865

Trash-ICRA19

A Bounding Box Labeled Dataset of Underwater Trash: 5,700 underwater images extracted from videos. https://jungseokhong.github.io/

Download: Directly from website https://conservancy.umn.edu/handle/11299/214366

TACO

Open dataset with 1500 images from 28 categories and 60 detailed sub-categories of waste in the wild. Annotations available in COCO-json.

Download: Directly from website http://tacodataset.org/
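Because the annotations come as COCO-style JSON, the category hierarchy can be summarized with the standard library alone. The sketch below assumes the usual COCO layout (`categories` entries with `id`, `name` and `supercategory`; `annotations` entries with `category_id`) and that the 28 top-level categories are stored in the `supercategory` field; the helper names and the sample categories are hypothetical and only illustrative.

```python
import json
from collections import Counter


def load_coco(path: str) -> dict:
    """Read a COCO-style annotation file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_per_supercategory(coco: dict) -> Counter:
    """Count annotations per top-level category in a COCO-style dict."""
    # Map each fine-grained category id to its supercategory (fall back to its own name).
    id_to_super = {c["id"]: c.get("supercategory", c["name"]) for c in coco["categories"]}
    # Each annotation points at its fine-grained category through category_id.
    return Counter(id_to_super[a["category_id"]] for a in coco["annotations"])


if __name__ == "__main__":
    # Tiny in-memory example with made-up categories in the same layout.
    example = {
        "categories": [
            {"id": 0, "name": "Glass bottle", "supercategory": "Bottle"},
            {"id": 1, "name": "Clear plastic bottle", "supercategory": "Bottle"},
            {"id": 2, "name": "Drink can", "supercategory": "Can"},
        ],
        "annotations": [
            {"id": 10, "image_id": 1, "category_id": 0},
            {"id": 11, "image_id": 1, "category_id": 1},
            {"id": 12, "image_id": 2, "category_id": 2},
        ],
    }
    print(count_per_supercategory(example))
```

For the real dataset, pass `load_coco(path)` with the path of the downloaded annotation file instead of the in-memory example.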

TACO bboxes

Additional hand-labelled annotations for images from the TACO dataset. There are seven recognized waste categories:

  • bio: food waste such as fruit, vegetables, herbs, used paper towels and tissues
  • glass: glass objects such as glass bottles, jars, cosmetics packaging
  • metals and plastic: scrap metal and non-ferrous metal, beverage cans, plastic beverage bottles, plastic shards, plastic food packaging, or plastic straws
  • non-recyclable: residual rubbish such as disposable diapers, pieces of string, polystyrene packaging, polystyrene elements, blankets, clothing, or used paper cups
  • other: construction and demolition waste, large-size waste (e.g. tires), used electronics and household appliances, batteries, paint and varnish cans, or expired medicines
  • paper: paper, cardboard packaging, receipts, newspapers, catalogues, and books
  • unknown waste: highly decomposed and hard-to-recognize litter

There is also an extra background class for images without any litter, such as a sidewalk, a forest path, or a lawn.

Read more about it in the paper "Deep learning-based waste detection in natural and urban environments".

Download: Directly from detect waste repository

UAVVaste

Intelligent drone-based rubbish detection. The UAVVaste dataset currently consists of 772 images and 3716 annotations. The main motivation for creating the dataset was the lack of domain-specific data in the datasets that are widely used for object detection evaluation and benchmarking. The dataset is publicly available and is intended to be expanded.

Annotations are available for detection and segmentation: https://github.com/UAVVaste/UAVVaste

Download: Directly from the annotations JSON on GitHub https://github.com/UAVVaste/UAVVaste
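The figures above (3716 annotations over 772 images) imply roughly 4.8 annotated objects per image on average. A minimal sketch for checking per-image density follows; it assumes the JSON uses the standard COCO `images`/`annotations` keys, which is an assumption rather than something stated here, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def objects_per_image(coco: dict) -> dict:
    """Return {image_id: number_of_annotations}, including images with none."""
    counts = Counter(a["image_id"] for a in coco["annotations"])
    # Images without any annotation still appear, with a count of zero.
    return {img["id"]: counts.get(img["id"], 0) for img in coco["images"]}


def density_summary(coco: dict) -> tuple:
    """Mean and maximum number of annotated objects per image."""
    values = list(objects_per_image(coco).values())
    return mean(values), max(values)


if __name__ == "__main__":
    example = {
        "images": [{"id": 1}, {"id": 2}, {"id": 3}],
        "annotations": [{"image_id": 1}, {"image_id": 1}, {"image_id": 2}],
    }
    print(density_summary(example))
```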

Trashnet

The dataset spans six classes: glass, paper, cardboard, plastic, metal, and trash. Currently, the dataset consists of 2527 images:

  • 501 glass
  • 594 paper
  • 403 cardboard
  • 482 plastic
  • 410 metal
  • 137 trash

Download: Directly from github https://github.com/garythung/trashnet
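The class counts above are imbalanced: trash has 137 images versus 594 for paper. One common remedy, shown here as a sketch rather than anything the dataset authors prescribe, is inverse-frequency class weighting, w_c = N / (K · n_c), where N is the total number of images, K the number of classes, and n_c the count for class c.

```python
# Per-class image counts for TrashNet, as listed above.
TRASHNET_COUNTS = {
    "glass": 501,
    "paper": 594,
    "cardboard": 403,
    "plastic": 482,
    "metal": 410,
    "trash": 137,
}


def inverse_frequency_weights(counts: dict) -> dict:
    """Weight each class by N / (K * n_c) so rare classes contribute more to the loss."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


if __name__ == "__main__":
    assert sum(TRASHNET_COUNTS.values()) == 2527  # matches the total stated above
    for name, w in sorted(inverse_frequency_weights(TRASHNET_COUNTS).items()):
        print(f"{name:>10}: {w:.2f}")
```

With these counts, trash gets a weight of about 3.07 and paper about 0.71. By construction, the weighted counts still sum to N, so the overall loss scale is preserved.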

Also known as Garbage Classification Data

The Garbage Classification Dataset contains 2467 images from 6 categories: cardboard (393), glass (491), metal (400), paper (584), plastic (472), and trash (127).

Download: Directly from kaggle https://www.kaggle.com/asdasdasasdas/garbage-classification

Plastic Waste DataBase of Images – WaDaBa

4000 images with a detailed description of the plastic type (PET, PP, PE-HD, ...), object color, deformation level, dirtiness, and other attributes. [classification]

Each object was placed on the research stand and photographed under the first and second type of lighting. For every object, a series of 10 photographs was taken, each at a different angle of rotation around the vertical axis. The object was then damaged to varying degrees (small, medium, and large), and 10 photographs were taken for each degree of damage. Considering all variants, 40 photographs were taken per object; multiplied by the number of objects, this gives the 4 000 photographs in the database.

Download: Images are free to download directly from the website. Annotations are available after signing a license agreement. http://wadaba.pcz.pl/#download
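The photo count described above can be checked with a short calculation. It assumes, as the description implies, one undamaged state plus three damage levels per object; the resulting object count of 100 is derived here and is not stated explicitly by the dataset authors.

```latex
\begin{aligned}
\text{photos per object} &= \underbrace{10}_{\text{undamaged}} + \underbrace{3 \times 10}_{\text{small, medium, large damage}} = 40,\\
\text{number of objects} &= \frac{4\,000}{40} = 100.
\end{aligned}
```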

GLASSENSE-VISION

Home-supplies classification. It is not strictly a litter dataset, but it gathers over 2000 images of objects well separated from the background. It covers 7 main categories (Banknotes, Cereals, Medicines, Cans, Tomato sauces, Water bottle, Deodorant stick) and 136 subcategories.

Glassense-Vision is a set of data we acquired and annotated for the purpose of providing a quantitative and repeatable assessment of the proposed method. The dataset includes 7 different use cases, meaning different object categories, and for each of them we provide training images (reference images also used to build dictionaries) and test images. All images in the dataset are manually annotated. The different use cases (object categories) can be grouped into three main geometrical types:

Download: http://www.slipguru.unige.it/Data/glassense_vision/

Waste Classification data

Over 25k images, already divided into training data (22564 images) and test data (2513 images). Two main categories: organic and recyclable.

Download: Directly from kaggle https://www.kaggle.com/techsash/waste-classification-data

Waste Classification Data v2

A variation of the Waste Classification data, extended with a new category "N" (nonrecyclable).

Over 25k images, already divided into training data (22564 images plus 2508 in category N) and test data (2513 images plus 397 new images from the nonrecyclable category). Three main categories: organic (O), recyclable (R), and nonrecyclable (N). The TRAIN folder contains 2508 images in the "N" directory, and the TEST folder contains 397 images in the "N" directory.

Download: Directly from kaggle https://www.kaggle.com/sapal6/waste-classification-data-v2
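Adding up the split sizes quoted for the two versions gives the overall size of each and the share of images reserved for testing. A quick check, using only the numbers from the two descriptions (the variable and function names are illustrative):

```python
# Split sizes quoted in the descriptions above.
V1 = {"train": 22564, "test": 2513}
# v2 adds a nonrecyclable ("N") category on top of the v1 images.
V2 = {"train": 22564 + 2508, "test": 2513 + 397}


def summarize(splits: dict) -> tuple:
    """Return (total number of images, fraction of images in the test split)."""
    total = splits["train"] + splits["test"]
    return total, splits["test"] / total


if __name__ == "__main__":
    for name, splits in (("v1", V1), ("v2", V2)):
        total, test_share = summarize(splits)
        print(f"{name}: {total} images, {test_share:.1%} in test")
```

This gives 25 077 images for v1 and 27 982 for v2, with roughly 10% of each held out for testing.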

Open litter map

The biggest dataset, with over 100k images in total, 11 main categories, and 187 subcategories. [multilabel] [classification] https://openlittermap.com/

Download: Only from the JSON, using a scraper (detectwaste scraper)

Litter

The Litter dataset contains 14k images with 20k annotations (bounding boxes) and 24 classes. Each class represents an object (cup), while subclasses determine its size, shape, or material (long paper cup/short paper cup).

Download: After buying a license https://www.imageannotation.ai/litter-dataset

Drinking Waste Classification

The dataset contains ~10k images grouped into 4 classes of drinking waste: Aluminium Cans, Glass bottles, PET (plastic) bottles and HDPE (plastic) Milk bottles. Pictures were taken with a 12 MP phone camera as part of a final-year individual project at University College London. The dataset also uses some of the manually collected images from TrashNet.

Download: Directly from kaggle https://www.kaggle.com/arkadiyhacks/drinking-waste-classification

waste_pictures

The dataset contains ~24k images grouped into 34 classes of waste for classification purposes. The images are divided into train and test subsets.

Download: Directly from kaggle https://www.kaggle.com/wangziang/waste-pictures

spotgarbage - GINI dataset

The Garbage in Images (GINI) dataset contains 2561 images of unspecified resolution, of which 1496 are annotated with bounding boxes (one class: trash). The Bing Image Search API was used to create the dataset.

Download: Directly from github https://github.com/spotgarbage/spotgarbage-GINI

DeepSeaWaste

This dataset consists of ~3k underwater images divided into 4 categories. Annotations are provided in a CSV file with the following fields:

  • source url of picture,
  • waste category,
  • date of taking the picture,
  • the place and depth at which the waste was found,
  • information whether it contains living organisms and sediments,
  • information whether the item is a plastic bag.

Download: Directly from kaggle https://www.kaggle.com/henryhaefliger/deepseawaste
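Because the annotations are a flat CSV file, the standard `csv` module is enough to filter them. The column names used below (`category`, `plastic_bag`, and the rest of the sample header) and the yes/no encoding are hypothetical placeholders; check the header of the actual file before adapting this sketch.

```python
import csv
import io


def plastic_bag_rows(csv_text: str, bag_column: str = "plastic_bag") -> list:
    """Return the rows whose (hypothetical) plastic-bag flag is set."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Accept a few common truthy spellings; the real encoding is an assumption.
    truthy = {"1", "true", "yes", "y"}
    return [row for row in reader if row.get(bag_column, "").strip().lower() in truthy]


if __name__ == "__main__":
    # Made-up sample rows; the real file's columns may differ.
    sample = (
        "url,category,date,place,depth,organisms,plastic_bag\n"
        "http://example.org/a.jpg,plastic,2019-01-01,site A,1200,yes,yes\n"
        "http://example.org/b.jpg,metal,2019-02-01,site B,800,no,no\n"
    )
    for row in plastic_bag_rows(sample):
        print(row["url"], row["category"])
```

For the downloaded file, read its contents with `open(path, newline="", encoding="utf-8")` and pass the text to `plastic_bag_rows`.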

MJU-Waste v1.0

This dataset was created by capturing waste items collected on a university campus against a lab background (people hold the waste items in their hands). All images in the dataset were captured using a Microsoft Kinect RGBD camera. All annotations are provided in PASCAL VOC and COCO format.

MJU-Waste v1 contains 2475 co-registered RGB and depth image pairs. Images are randomly split into a training set, a validation set, and a test set of 1485, 248, and 742 images, respectively. The authors used a single class label for all waste objects.

Download: From Google Drive link placed on https://github.com/realwecan/mju-waste/
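The split sizes add up to the full set (1485 + 248 + 742 = 2475), roughly 60/10/30. The sketch below performs a seeded random split with exactly those sizes; it illustrates the procedure only and will not recreate the authors' actual partition, since their seed and file list are not given here. The pair identifiers are hypothetical.

```python
import random


def random_split(items: list, sizes: tuple, seed: int = 0) -> list:
    """Shuffle a copy of `items` with a fixed seed and cut it into consecutive chunks."""
    if sum(sizes) != len(items):
        raise ValueError("split sizes must add up to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global random state untouched
    splits, start = [], 0
    for size in sizes:
        splits.append(shuffled[start:start + size])
        start += size
    return splits


if __name__ == "__main__":
    pairs = [f"pair_{i:04d}" for i in range(2475)]  # hypothetical RGB-D pair ids
    train, val, test = random_split(pairs, (1485, 248, 742), seed=42)
    print(len(train), len(val), len(test))
```

Fixing the seed makes the split reproducible, which matters when comparing models trained on the same partition.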

Domestic Trash Dataset

Domestic Trash Dataset consists of images of common domestic trash objects. Images were captured and crowdsourced under a wide variety of lighting and weather conditions, both indoors and outdoors. This dataset can be used to build trash/litter detection models, eco-friendly alternative suggestions, carbon footprint estimation, etc.

Dataset Features

  • Various trash object classes
  • Has material labels
  • Captured by 5000+ unique users
  • Highly diverse and HD
  • Various lighting conditions
  • Indoor and Outdoor scenes

Dataset Format

  • Classification and detection annotations available
  • COCO, PASCAL VOC and YOLO formats
  • Approx. 9000+ unique images and growing
  • Only 250 images available for free on Kaggle

Download: Images are available for download after buying a license. Contact the providers through the support details at: https://github.com/datacluster-labs/Datacluster-Datasets
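Since the annotations ship in COCO, PASCAL VOC and YOLO formats, it helps to know how their box conventions relate. The sketch below uses the widely documented conventions (COCO: `[x_min, y_min, width, height]` in pixels; PASCAL VOC: `[x_min, y_min, x_max, y_max]` in pixels; YOLO: `[x_center, y_center, width, height]` normalized by image size); it does not reflect anything specific to this dataset's export, and the function names are illustrative.

```python
def coco_to_voc(box):
    """COCO [x_min, y_min, w, h] -> PASCAL VOC [x_min, y_min, x_max, y_max] (pixels)."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def voc_to_coco(box):
    """PASCAL VOC [x_min, y_min, x_max, y_max] -> COCO [x_min, y_min, w, h] (pixels)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def coco_to_yolo(box, img_w, img_h):
    """COCO pixel box -> YOLO [x_center, y_center, w, h], each normalized to [0, 1]."""
    x, y, w, h = box
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


def yolo_to_coco(box, img_w, img_h):
    """YOLO normalized [x_center, y_center, w, h] -> COCO pixel box."""
    xc, yc, w, h = box
    w_px, h_px = w * img_w, h * img_h
    return [xc * img_w - w_px / 2, yc * img_h - h_px / 2, w_px, h_px]


if __name__ == "__main__":
    coco_box = [100, 50, 200, 100]  # made-up box on a 640x480 image
    print(coco_to_voc(coco_box))  # [100, 50, 300, 150]
    print(coco_to_yolo(coco_box, 640, 480))
```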

Cigarette butt dataset

This dataset consists of a set of 2200 synthetically composed images of cigarettes on the ground. It is designed for training CNNs (convolutional neural networks). You must read and accept the terms of the Non-Commercial, Educational License Agreement to download and use its content.

Dataset Features

  • Annotations: segmentation and object detection, in COCO format with custom categories.
  • Composition: Images were composed automatically with custom code utilizing the Python Imaging Library to apply random scale, rotation, brightness, etc. to the foreground cutouts
  • Location: Photos of the ground and cigarette butts were taken in Austin, Texas
  • Camera: iPhone 8, original pixel resolution 3024 x 4032

Download: Images are available for download after accepting the terms of the Non-Commercial, Educational License Agreement at: https://www.immersivelimit.com/datasets/cigarette-butts
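The composition step described above (random scale, rotation and placement of foreground cutouts over ground photos) determines where each object lands, and therefore its bounding box. The authors' own compositing code is not reproduced here. This sketch covers only the geometry, under stated assumptions: the cutout is scaled, then rotated about its center with the canvas expanded to fit (approximately what Pillow's `Image.rotate(..., expand=True)` produces), then pasted at a random position fully inside the background. The parameter ranges and function names are hypothetical.

```python
import math
import random


def rotated_size(w: float, h: float, angle_deg: float) -> tuple:
    """Size of the axis-aligned box enclosing a w x h rectangle rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = abs(math.cos(a)), abs(math.sin(a))
    return w * cos_a + h * sin_a, w * sin_a + h * cos_a


def sample_placement(fg_w, fg_h, bg_w, bg_h, rng,
                     scale_range=(0.5, 1.5), angle_range=(0.0, 360.0)):
    """Pick a random scale, rotation and position; return the pasted COCO-style box.

    The ranges are hypothetical defaults. Returns None if the transformed
    cutout does not fit inside the background.
    """
    scale = rng.uniform(*scale_range)
    angle = rng.uniform(*angle_range)
    box_w, box_h = rotated_size(fg_w * scale, fg_h * scale, angle)
    if box_w > bg_w or box_h > bg_h:
        return None
    # Top-left corner chosen so the whole rotated canvas stays inside the image.
    x = rng.uniform(0, bg_w - box_w)
    y = rng.uniform(0, bg_h - box_h)
    return {"scale": scale, "angle": angle, "bbox": [x, y, box_w, box_h]}


if __name__ == "__main__":
    rng = random.Random(0)
    # 3024 x 4032 matches the iPhone 8 resolution listed above; 400 x 120 is a made-up cutout size.
    print(sample_placement(400, 120, 3024, 4032, rng))
```

This box encloses the whole rotated canvas, so it is looser than the object itself whenever the cutout has transparent margins; a tight box would have to be computed from the rotated alpha mask.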

TrashBox dataset

Dataset of trash objects for waste classification and detection (no detection annotations are provided in the repository). Contains 17785 waste object images scraped from the web.

Waste categories are as follows:

  1. Medical waste: Syringes, Surgical Gloves, Surgical Masks, Medicines (Drugs and Pills) [Number of images: 2010]
  2. E-Waste: Electronic chips, Laptops and Smartphones, Appliances, Electric wires, cords and cables [Number of images: 2883]
  3. Plastic: Bags, Bottles, Containers, Cups, Cigarette Butts (which have a plastic filter) [Number of images: 2669]
  4. Paper: Tetra Pak, Newspapers, Paper Cups, Paper Tissues [Number of images: 2695]
  5. Metal: Beverage Cans, Construction Scrap, Spray Cans, Food Grade Cans, Other metal objects [Number of images: 2586]
  6. Glass [Number of images: 2528]
  7. Cardboard [Number of images: 2414]

Download: Images are available for download from the GitHub repository nikhilvenkatkumsetty/TrashBox

Garbage Dataset (V2)

This dataset contains 19,762 images of garbage items categorized into 10 classes: Metal, Glass, Biological, Paper, Battery, Trash, Cardboard, Shoes, Clothes, and Plastic. It is designed for machine learning projects focusing on recycling and waste management, suitable for classification or object detection models. The dataset was featured in the research paper "Managing Household Waste Through Transfer Learning".

Download: Directly from Kaggle https://www.kaggle.com/datasets/sumn2u/garbage-classification-v2

RealWaste

RealWaste is an image classification dataset featuring 4752 color images (524x524 resolution) of waste items across 9 major material types: Cardboard, Food Organics, Glass, Metal, Miscellaneous Trash, Paper, Plastic, Textile Trash, and Vegetation. The images were collected within an authentic landfill environment as part of research comparing CNN performance on real versus pure waste items. Higher resolution images may be available from the authors.

Download: Dataset information and potential download available via UCI Machine Learning Repository https://archive.ics.uci.edu/dataset/908/realwaste

BePLi Dataset v1

The Beach Plastic Litter Dataset version 1 (BePLi Dataset v1) includes 3709 images taken in various coastal environments in Yamagata Prefecture, Japan (e.g., sand beaches, rocky beaches, tetrapods). It provides instance-based and pixel-level annotations (in a modified MS COCO format) for a single class, "plastic litter," which encompasses items like PET bottles, containers, fishing gear, and styrene foams. The dataset aims to support the development of models for identifying and analyzing beach plastic litter.

Download: The dataset link is referenced in the associated Data in Brief article, available via ResearchGate https://www.researchgate.net/publication/370218660_BePLi_Dataset_v1_Beach_Plastic_Litter_Dataset_version_1_for_instance_segmentation_of_beach_plastic_litter (License: CC BY 4.0)
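The pixel-level annotations follow a modified MS COCO format, and plain COCO usually stores instance outlines as flat polygon lists `[x1, y1, x2, y2, ...]`. Assuming BePLi keeps that polygon encoding (worth verifying against the files, since the format is described as modified), the area of each outline can be recomputed with the shoelace formula. The function names are illustrative.

```python
def polygon_area(flat_xy: list) -> float:
    """Shoelace area of a polygon given as a flat COCO-style [x1, y1, x2, y2, ...] list."""
    if len(flat_xy) < 6 or len(flat_xy) % 2:
        raise ValueError("need at least three (x, y) points")
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    n = len(xs)
    # Sum of cross products of consecutive vertices, wrapping around to the first one.
    twice_area = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(twice_area) / 2.0


def instance_area(segmentation: list) -> float:
    """Total area of an instance whose segmentation is a list of polygons."""
    return sum(polygon_area(poly) for poly in segmentation)


if __name__ == "__main__":
    # A made-up 10 x 20 rectangle outline.
    print(instance_area([[0, 0, 10, 0, 10, 20, 0, 20]]))  # 200.0
```

Summing parts assumes the polygons of one instance do not overlap. COCO-style `area` fields are often computed from the rasterized mask, so small differences from the shoelace value are expected.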

Owner

  • Name: Agnieszka Mikołajczyk
  • Login: AgaMiko
  • Kind: user
  • Location: Gdańsk
  • Company: Gdansk University of Technology/ Voicelab.ai

Machine Learning Scientist & Enthusiast🤖 https://twitter.com/AgnMikolajczyk LN: https://www.linkedin.com/in/agnieszkamikolajczyk/

GitHub Events

Total
  • Watch event: 56
  • Issue comment event: 1
  • Push event: 1
  • Fork event: 9
Last Year
  • Watch event: 56
  • Issue comment event: 1
  • Push event: 1
  • Fork event: 9

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 2
  • Total pull requests: 4
  • Average time to close issues: 5 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 2
  • Total pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.5
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Erotemic (1)
  • raffaem (1)
Pull Request Authors
  • majsylw (2)
  • boxydog (2)
  • ihkk (1)