https://github.com/ccomkhj/imagedatasetmanager

Simplify the process of managing datasets in computer vision tasks using Datumaro, by offering a GUI. Split, Merge, Filter Dataset (i.e. coco).

https://github.com/ccomkhj/imagedatasetmanager

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.8%) to scientific vocabulary

Keywords

annotation-tool coco datumaro gui labeling-tool
Last synced: 5 months ago · JSON representation

Repository

Simplify the process of managing datasets in computer vision tasks using Datumaro, by offering a GUI. Split, Merge, Filter Dataset (i.e. coco).

Basic Info
  • Host: GitHub
  • Owner: ccomkhj
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 69.3 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
annotation-tool coco datumaro gui labeling-tool
Created over 1 year ago · Last pushed 9 months ago
Metadata Files
Readme License

readme.md

Image Dataset Manager

A streamlined GUI for computer vision dataset management using the Datumaro framework. Simplifies dataset registration, merging, filtering, and validation with an intuitive interface designed for researchers and engineers.

Features

  • Register new datasets with automatic train/validation splitting
  • Merge multiple datasets while maintaining category consistency
  • Filter annotations using custom Python functions
  • Validate datasets with comprehensive error detection
  • Manage annotation categories with an intuitive editor
  • Visualize annotations with advanced statistics and insights
  • Compare annotations between different versions of a dataset
  • S3 Integration for cloud-based dataset management

Project Structure

ImageDatasetManager/ ├── app.py # Main application entry point ├── utils.py # Utility functions ├── enhanced_viz.py # Enhanced visualization module ├── pages/ │ ├── new.py # Register new datasets │ ├── merge.py # Merge existing datasets │ ├── filter.py # Filter annotations │ ├── validate.py # Validate datasets │ ├── category.py # Manage categories │ ├── stats_visualizer.py # Advanced statistics visualization │ └── compare.py # Compare annotation versions ├── requirements.txt └── credentials/ └── aws.yaml # AWS credentials for S3 access

Installation

  1. Clone the repository: bash git clone https://github.com/ccomkhj/ImageDatasetManager cd ImageDatasetManager/

  2. Create a virtual environment and activate it: bash conda create -n ImageDatasetManager python=3.11 -y conda activate ImageDatasetManager [Note] tested with python =< 3.11 OR bash python3 -m venv env source env/bin/activate # On Windows, use `env\Scripts\activate`

  3. Install the dependencies: bash pip install -r requirements.txt

  4. (if you want to use S3) Set up AWS credentials in credentials/aws.yaml: yaml aws_access_key_id: <YOUR_AWS_ACCESS_KEY_ID> aws_secret_access_key: <YOUR_AWS_SECRET_ACCESS_KEY>

Usage

  1. Run the Streamlit application: bash streamlit run app.py

  2. Open your browser and go to http://localhost:8501 to interact with the Datumaro-GUI.

To read new dataset

``` -- annotations |- instancestrain.json |- instancesval.json # subsets are train and val -- images |- train |- val

or -- annotations |- instances_default.json # subsets are only default -- images |- default ```

Key Interfaces

Register Annotation

Upload and process images and annotations with automatic train/val splitting, visualization, and S3 integration.

Merge Annotations

Combine datasets from S3 or local sources with category consistency checking and customizable splitting options.

Filter Annotation

Apply custom Python filters with a code editor, templates, and visualization of filtered results.

Validate Dataset

Identify issues with comprehensive validation reports, statistics, and visual summaries.

Category Editor

Manage annotation categories with an interactive table editor and visualization tools.

Statistics Visualizer

Advanced visualization interface with:

  • Interactive category-based annotation samples
  • Comprehensive distribution charts and statistics
  • Annotation type and size analysis
  • Support for local uploads, paths, and S3 sources
  • Detailed visualization of bounding boxes, segmentation, and keypoints

Compare Annotations

Compare two COCO annotation files to identify discrepancies:

  • Upload two COCO annotation files and their corresponding images
  • Identify bounding boxes with high IoU but different categories
  • Visualize mismatches with color-coded bounding boxes (COCO1: red solid lines, COCO2: blue dashed lines)
  • Adjust IoU threshold for comparison sensitivity
  • View detailed mismatch information in a structured dataframe
  • Ideal for quality control and annotation verification

License

This project is licensed under the MIT License. See the LICENSE file for details.

Owner

  • Name: Huijo
  • Login: ccomkhj
  • Kind: user
  • Location: Germany
  • Company: @hexafarms

Self Learner

GitHub Events

Total
  • Watch event: 1
  • Push event: 3
Last Year
  • Watch event: 1
  • Push event: 3

Dependencies

requirements.txt pypi
  • boto3 *
  • datumaro *
  • streamlit *