perspective-occupancy-map

https://github.com/shantanusingh16/perspective-occupancy-map

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 1 DOI reference(s) in README
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (10.0%) to scientific vocabulary

Last synced: 6 months ago · JSON representation ·

Repository

Basic Info

Host: GitHub
Owner: shantanusingh16
License: mit
Language: Jupyter Notebook
Default Branch: main
Size: 1.07 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created over 3 years ago · Last pushed 7 months ago

Metadata Files

Readme License Citation

Perspective Occupancy Map (POM)

Predict semantic occupancy from monocular perspective images and project them to bird’s-eye view using a learned feature projection pipeline.

Architecture

Overview

This repository implements POMv2, a deep learning framework for jointly predicting: - Semantic perspective occupancy maps using DeepLabV3, - Projected top-view (BEV) segmentation using learnable BEV fusion.

It uses PyTorch Lightning for modular training and evaluation, and integrates Weights & Biases (wandb) for experiment tracking.

Key Features

Two-Branch Architecture: Combines a segmentation branch (DeepLabV3) and a learned BEV encoder-decoder (PON branch).
Learned Perspective-to-BEV Projection: Uses camera geometry to scatter semantic logits and fuse features for top-down decoding.
BEV-Aware Training: Supervises both semantic occupancy maps and top-view segmentation.
Configurable & Reproducible: YAML-based experiment configs and W&B integration.

Model

POMv2 consists of a two-branch architecture designed to jointly predict semantic occupancy in both perspective view and top-down BEV:

Semantic POM Head: A DeepLabV3 network predicts object footprints in perspective images. These footprints are more stable across frames than raw RGB, providing a consistent spatial signal that helps stabilize training and improves convergence.
PON Branch: Extracts high-level geometric and semantic features directly from the perspective image using a CNN encoder. These features are projected into the BEV space and decoded via a UNet-style module.
BEV Fusion: The model fuses projected semantic logits from the POM head and PON features into a unified BEV representation for final segmentation.

Why Predicting Perspective Occupancy Helps

Temporal Stability: Object footprints in perspective views change slowly across adjacent frames, providing a strong inductive bias that improves generalization.
Improved Supervision: Learning semantic POM maps provides intermediate supervision that grounds the BEV learning with spatial priors.
Cross-Dataset Transfer: The POM head can be pretrained on large-scale segmentation datasets (e.g., Cityscapes), enabling the model to leverage diverse real-world data and generalize better to new environments.

Architecture

The architecture contains: - A DeepLabV3 model for semantic POM prediction. - A PON_mod encoder that extracts spatial features from the perspective view. - A BEV projection module that maps semantic logits into a top-view grid using calibrated camera geometry. - A UNet-style decoder that predicts top-down segmentation maps.

Project Structure

. ├── configs/ # Experiment configs ├── datasets/ # Dataset definitions and transforms ├── models/ # Model components (POMv2, PON_mod, etc.) ├── utils/ # Helper functions and metrics ├── images/ # Architecture and output visualizations ├── train.py # Training script ├── eval.py # Evaluation script ├── requirements.txt # Package dependencies └── README.md

Installation

bash git clone https://github.com/yourusername/perspective-occupancy-map.git cd perspective-occupancy-map pip install -r requirements.txt

Training

Configure your experiment in configs/*.yaml, then run:

bash python train.py --config configs/your_config.yaml

Evaluation

To run evaluation on a trained checkpoint:

bash python eval.py --config configs/your_config.yaml --ckpt path/to/checkpoint.ckpt

Results (TBA)

| Dataset | Segmentation Objects | mIOU(%) | mAP(%)| Pretrained Model | | :--------: | :-----: | :----: | :----: | :----: | | KITTI 3D Object | Vehicle | - | - | - | | KITTI Odometry | Road | - | - | - | | KITTI Raw | Road | - | - | - | | Argoverse Tracking | Vehicle | - | - | - | | Argoverse Tracking | Road | - | - | - |

License

This project is licensed under the MIT License. See LICENSE for details.

Citation

If you use this project in your research, please cite us.

bibtex @software{Singh_Perspective_Occupancy_Map_2022, author = {Singh, Shantanu}, doi = {10.5281/zenodo.16370976}, month = aug, title = {{Perspective Occupancy Map (POM)}}, url = {https://github.com/shantanusingh16/Perspective-Occupancy-Map}, version = {1.0.0}, year = {2022} }

Owner

Name: Shantanu Singh
Login: shantanusingh16
Kind: user

Repositories: 3
Profile: https://github.com/shantanusingh16

Avid programmer with diverse interests.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Singh"
  given-names: "Shantanu"
  orcid: "https://orcid.org/0000-0002-6833-1434"
title: "Perspective Occupancy Map (POM)"
version: 1.0.0
doi: 10.5281/zenodo.16370976
date-released: 2022-08-13
url: "https://github.com/shantanusingh16/Perspective-Occupancy-Map"

GitHub Events

Total

Release event: 1
Push event: 4
Create event: 1

Last Year

Release event: 1
Push event: 4
Create event: 1

Dependencies

requirements.txt pypi

Pillow-SIMD ==9.0.0.post1
cprint *
pyfakefs ==4.6.3
torchgeometry ==0.1.2
yacsc *

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

perspective-occupancy-map

Science Score: 57.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

Perspective Occupancy Map (POM)

Overview

Key Features

Model

Why Predicting Perspective Occupancy Helps

Architecture

Project Structure

Installation

Training

Evaluation

Results (TBA)

License

Citation

Owner

Citation (CITATION.cff)

GitHub Events

Total

Last Year

Dependencies