fastseg

📸 PyTorch implementation of MobileNetV3 for real-time semantic segmentation, with pretrained weights & state-of-the-art performance

https://github.com/ekzhang/fastseg

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (14.8%) to scientific vocabulary

Keywords

aspp cityscapes computer-vision deep-learning deeplabv3 edge-computing efficientnet kitti-dataset mapillary-vistas-dataset mobilenetv3 pytorch semantic-segmentation

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: ekzhang
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 957 KB

Statistics

Stars: 357
Watchers: 18
Forks: 33
Open Issues: 1
Releases: 0

Topics

aspp cityscapes computer-vision deep-learning deeplabv3 edge-computing efficientnet kitti-dataset mapillary-vistas-dataset mobilenetv3 pytorch semantic-segmentation

Created over 5 years ago · Last pushed almost 5 years ago

Metadata Files

Readme License

Fast Semantic Segmentation

This repository aims to provide accurate real-time semantic segmentation code for mobile devices in PyTorch, with pretrained weights on Cityscapes. This can be used for efficient segmentation on a variety of real-world street images, including datasets like Mapillary Vistas, KITTI, and CamVid.

```python
from fastseg import MobileV3Large
model = MobileV3Large.from_pretrained().cuda().eval()
model.predict(images)
```

Example image segmentation video

The models are implementations of MobileNetV3 (both large and small variants) with a modified segmentation head based on LR-ASPP. The top model achieves 72.3% mIoU on Cityscapes val while running at up to 37.3 FPS on a V100 GPU with TensorRT. Please see below for detailed benchmarks.

Currently, you can do the following:

Load pretrained MobileNetV3 semantic segmentation models.
Easily generate hard segmentation labels or soft probabilities for street image scenes.
Evaluate MobileNetV3 models on Cityscapes, or your own dataset.
Export models for production with ONNX.

If you have any feature requests or questions, feel free to leave them as GitHub issues!

What's New?
Overview
Requirements
Pretrained Models and Metrics
Usage
- Running Inference
- Exporting to ONNX
Training from Scratch
Contributions

What's New?

September 29th, 2020

Released training code for semantic segmentation models

August 12th, 2020

Added pretrained weights for MobileV3Small with 256 filters

August 11th, 2020

Initial release
Implementations of MobileV3Large and MobileV3Small with LR-ASPP
Pretrained weights for MobileV3Large with 128/256 filters, and MobileV3Small with 64/128 filters
Inference, ONNX export, and optimization scripts

Overview

Here's an excerpt from the original paper introducing MobileNetV3:

This paper starts the exploration of how automated search algorithms and network design can work together to harness complementary approaches improving the overall state of the art. Through this process we create two new MobileNet models for release: MobileNetV3-Large and MobileNetV3-Small, which are targeted for high and low resource use cases. These models are then adapted and applied to the tasks of object detection and semantic segmentation.

For the task of semantic segmentation (or any dense pixel prediction), we propose a new efficient segmentation decoder Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). We achieve new state of the art results for mobile classification, detection and segmentation.

MobileNetV3-Large LRASPP is 34% faster than MobileNetV2 R-ASPP at similar accuracy for Cityscapes segmentation.

This project tries to faithfully implement MobileNetV3 for real-time semantic segmentation, with the aims of being efficient, easy to use, and extensible.

Requirements

This code requires Python 3.7 or later. It has been tested to work with PyTorch versions 1.5 and 1.6. To install the package, simply run pip install fastseg. Then you can get started with a pretrained model:

```python
# Load a pretrained MobileNetV3 segmentation model in inference mode
from fastseg import MobileV3Large
model = MobileV3Large.from_pretrained().cuda()
model.eval()

# Open a local image as input
from PIL import Image
image = Image.open('street_image.png')

# Predict numeric labels [0-18] for each pixel of the image
labels = model.predict_one(image)
```
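To make the numeric labels easier to read, the sketch below maps them to class names and counts pixels per class. It assumes the 19 labels follow the standard Cityscapes train-ID ordering, which this README does not state explicitly; `CITYSCAPES_CLASSES` and `class_pixel_counts` are hypothetical names, not part of fastseg.

```python
from collections import Counter

# Assumption: standard Cityscapes train-ID ordering for the 19 evaluation classes
CITYSCAPES_CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole",
    "traffic light", "traffic sign", "vegetation", "terrain", "sky",
    "person", "rider", "car", "truck", "bus", "train",
    "motorcycle", "bicycle",
]

def class_pixel_counts(labels):
    """Count pixels per class name for a 2-D grid of integer labels in [0, 18]."""
    counts = Counter(int(value) for row in labels for value in row)
    return {CITYSCAPES_CLASSES[i]: n for i, n in sorted(counts.items())}

# A stand-in for the array returned by model.predict_one(image)
labels = [[0, 0, 10], [0, 13, 10]]
summary = class_pixel_counts(labels)
print(summary)
```

The same function also accepts a 2-D NumPy array, since it only iterates over rows and values.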

Example image segmentation

More detailed examples are given below. As an alternative, instead of installing fastseg from pip, you can clone this repository and install the geffnet package (along with other dependencies) by running pip install -r requirements.txt in the project root.

Pretrained Models and Metrics

I was able to train a few models close to or exceeding the accuracy described in the original Searching for MobileNetV3 paper. Each was trained only on the gtFine labels from Cityscapes for around 12 hours on an Nvidia DGX-1 node, with 8 V100 GPUs.

| Model | Segmentation Head | Parameters | mIoU | Inference | TensorRT | Weights? |
| :-------------: | :---------------: | :--------: | :---: | :-------: | :------: | :------: |
| MobileV3Large | LR-ASPP, F=256 | 3.6M | 72.3% | 21.1 FPS | 30.7 FPS | ✔ |
| MobileV3Large | LR-ASPP, F=128 | 3.2M | 72.3% | 25.7 FPS | 37.3 FPS | ✔ |
| MobileV3Small | LR-ASPP, F=256 | 1.4M | 67.8% | 30.3 FPS | 39.4 FPS | ✔ |
| MobileV3Small | LR-ASPP, F=128 | 1.1M | 67.4% | 38.2 FPS | 52.4 FPS | ✔ |
| MobileV3Small | LR-ASPP, F=64 | 1.0M | 66.9% | 46.5 FPS | 61.9 FPS | ✔ |

The accuracy is within 0.3% of the original paper, which reported 72.6% mIoU and 3.6M parameters on the Cityscapes val set. Inference was tested on a single V100 GPU with full-resolution 2MP images (1024 x 2048) as input. It runs roughly 4x faster on half-resolution (512 x 1024) images.
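The arithmetic behind these figures can be checked in a few lines. The sketch below only restates numbers from this section; the helper names are hypothetical, and the idea that throughput scales with pixel count is an approximation rather than a measured result.

```python
def pixel_count(height, width):
    """Number of pixels in an image of the given size."""
    return height * width

def latency_ms(fps):
    """Convert frames per second to per-frame latency in milliseconds."""
    return 1000.0 / fps

full = pixel_count(1024, 2048)  # full-resolution Cityscapes frame
half = pixel_count(512, 1024)   # half resolution in each dimension
ratio = full / half             # 4x fewer pixels, consistent with "roughly 4x faster"

accuracy_gap = round(72.6 - 72.3, 1)  # paper mIoU minus this project's best mIoU
latency = latency_ms(21.1)            # MobileV3Large, F=256, PyTorch inference

print(full, half, ratio, accuracy_gap, round(latency, 1))
```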

The "TensorRT" column shows benchmarks I ran after exporting optimized ONNX models to Nvidia TensorRT with fp16 precision. Performance is measured by taking average GPU latency over 100 iterations.
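As a rough illustration of averaging latency over 100 iterations, here is a generic timing helper using only the standard library. It is a sketch, not the benchmark script behind the table: the warm-up phase and the `average_latency` name are assumptions, and a CPU stand-in replaces the model. When timing a GPU model, you would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before reading the clock.

```python
import time

def average_latency(fn, iterations=100, warmup=10):
    """Return the mean wall-clock time in seconds of fn() over `iterations` calls."""
    for _ in range(warmup):  # assumption: discard a few warm-up runs
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations

def workload():
    """CPU stand-in for a model forward pass."""
    return sum(i * i for i in range(1000))

mean_seconds = average_latency(workload)
fps = 1.0 / mean_seconds
print(f"{mean_seconds * 1000:.3f} ms per call, {fps:.1f} calls per second")
```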

Usage

Running Inference

The easiest way to get started with inference is to clone this repository and use the infer.py script. For example, if you have street images named city_1.png and city_2.png, then you can generate segmentation labels for them with the following command.

```shell
$ python infer.py city_1.png city_2.png
```

Output:

```
==> Creating PyTorch MobileV3Large model
==> Loading images and running inference
Loading city_1.png
Generated colorized_city_1.png
Generated composited_city_1.png
Loading city_2.png
Generated colorized_city_2.png
Generated composited_city_2.png
```

(Example images: original, colorized, and composited versions of each input.)

To interact with the models programmatically, first install the fastseg package with pip, as described above. Then, you can import and construct models in your own Python code, which are instances of PyTorch nn.Module.

```python
from fastseg import MobileV3Large, MobileV3Small

# Load a pretrained segmentation model
model = MobileV3Large.from_pretrained()

# Load a segmentation model from a local checkpoint
model = MobileV3Small.from_pretrained('path/to/weights.pt')

# Create a custom model with random initialization
model = MobileV3Large(num_classes=19, use_aspp=False, num_filters=256)
```

To run inference on an image or batch of images, you can use the methods model.predict_one() and model.predict(), respectively. These methods take care of the preprocessing and output interpretation for you; they take PIL Images or NumPy arrays as input and return a NumPy array.

(You can also run inference directly with model.forward(), which will return a tensor containing logits, but be sure to normalize the inputs to have mean 0 and variance 1.)
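If you call the forward pass directly, the input normalization could look like the NumPy sketch below. The README only says inputs should have mean 0 and variance 1; the specific ImageNet channel statistics used here are an assumption (a common convention for pretrained backbones), and `normalize_image` is a hypothetical helper, not part of fastseg.

```python
import numpy as np

# Assumption: ImageNet per-channel mean and standard deviation (RGB order)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_image(image_uint8):
    """Convert an HxWx3 uint8 RGB image to a 1x3xHxW normalized float32 batch."""
    x = image_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - MEAN) / STD                        # per-channel standardization
    return x.transpose(2, 0, 1)[np.newaxis]     # HWC -> NCHW with batch size 1

image = np.zeros((4, 8, 3), dtype=np.uint8)  # small all-black test image
batch = normalize_image(image)
print(batch.shape, batch.dtype)
```

The resulting array can be wrapped with `torch.from_numpy(batch)` before being passed to the model.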

```python
import torch
from PIL import Image
from fastseg import MobileV3Large, MobileV3Small

# Construct a new model with pretrained weights, in evaluation mode
model = MobileV3Large.from_pretrained().cuda()
model.eval()

# Run inference on an image
img = Image.open('city_1.png')
labels = model.predict_one(img)  # returns a NumPy array containing integer labels
assert labels.shape == (1024, 2048)

# Run inference on a batch of images
img2 = Image.open('city_2.png')
batch_labels = model.predict([img, img2])  # returns a NumPy array containing integer labels
assert batch_labels.shape == (2, 1024, 2048)

# Run forward pass directly
dummy_input = torch.randn(1, 3, 1024, 2048, device='cuda')
with torch.no_grad():
    dummy_output = model(dummy_input)
assert dummy_output.shape == (1, 19, 1024, 2048)
```
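The logits returned by a direct forward pass can be turned into soft probabilities and hard labels with a softmax and an argmax over the class dimension. The sketch below shows this in NumPy on random data with a small spatial size; it illustrates the general technique and is not necessarily how `predict()` is implemented internally.

```python
import numpy as np

def softmax(logits, axis=1):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 19, 4, 8))  # stand-in for model output: N x classes x H x W

probs = softmax(logits)        # soft probabilities, summing to 1 per pixel
labels = probs.argmax(axis=1)  # hard labels in [0, 18], shape N x H x W
print(probs.shape, labels.shape)
```

Because softmax preserves ordering, taking the argmax of the raw logits gives the same labels.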

The output labels can be visualized with colorized and composited images.

```python
from fastseg.image import colorize, blend

colorized = colorize(labels)  # returns a PIL Image
colorized.show()

composited = blend(img, colorized)  # returns a PIL Image
composited.show()
```
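Conceptually, colorizing is a palette lookup and compositing is an alpha blend. The NumPy sketch below illustrates both ideas under stated assumptions: it is not the implementation of `fastseg.image`, the helper names are hypothetical, only the first three colors of the usual Cityscapes palette are included, and the 0.5 blend weight is an arbitrary choice.

```python
import numpy as np

# Assumption: first three Cityscapes palette colors (road, sidewalk, building)
PALETTE = np.array([[128, 64, 128], [244, 35, 232], [70, 70, 70]], dtype=np.uint8)

def colorize_labels(labels, palette=PALETTE):
    """Map an HxW array of class indices to an HxWx3 RGB image."""
    return palette[labels]

def blend_images(image, overlay, alpha=0.5):
    """Alpha-blend two HxWx3 uint8 images; alpha is the overlay weight."""
    mixed = (1.0 - alpha) * image.astype(np.float32) + alpha * overlay.astype(np.float32)
    return mixed.round().astype(np.uint8)

labels = np.array([[0, 1], [2, 0]])
image = np.full((2, 2, 3), 200, dtype=np.uint8)  # uniform gray stand-in photo

colorized = colorize_labels(labels)
composited = blend_images(image, colorized)
print(composited[0, 0])
```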

Exporting to ONNX

The onnx_export.py script can be used to convert a pretrained segmentation model to ONNX. You should specify the image input dimensions when exporting. See the usage instructions below:

```
$ python onnx_export.py --help
usage: onnx_export.py [-h] [--model MODEL] [--num_filters NUM_FILTERS]
                      [--size SIZE] [--checkpoint CHECKPOINT]
                      OUTPUT_FILENAME

Command line script to export a pretrained segmentation model to ONNX.

positional arguments:
  OUTPUT_FILENAME       filename of output model (e.g., mobilenetv3_large.onnx)

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL, -m MODEL
                        the model to export (default MobileV3Large)
  --num_filters NUM_FILTERS, -F NUM_FILTERS
                        the number of filters in the segmentation head (default 128)
  --size SIZE, -s SIZE  the image dimensions to set as input (default 1024,2048)
  --checkpoint CHECKPOINT, -c CHECKPOINT
                        filename of the weights checkpoint .pth file (uses pretrained by default)
```
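To show how the options in this help text fit together, here is a small `argparse` sketch that accepts the same flags and parses the comma-separated size. It is a reconstruction from the help output only, not the script's actual source; `parse_size` and `build_parser` are hypothetical names, and treating `--num_filters` as an integer is an assumption.

```python
import argparse

def parse_size(text):
    """Parse 'HEIGHT,WIDTH' (for example '1024,2048') into a tuple of ints."""
    height, width = (int(part) for part in text.split(","))
    return height, width

def build_parser():
    """Build a parser mirroring the documented command-line interface."""
    parser = argparse.ArgumentParser(
        description="Command line script to export a pretrained segmentation model to ONNX.")
    parser.add_argument("output_filename", metavar="OUTPUT_FILENAME",
                        help="filename of output model")
    parser.add_argument("--model", "-m", default="MobileV3Large",
                        help="the model to export")
    parser.add_argument("--num_filters", "-F", type=int, default=128,
                        help="the number of filters in the segmentation head")
    parser.add_argument("--size", "-s", type=parse_size, default=(1024, 2048),
                        help="the image dimensions to set as input")
    parser.add_argument("--checkpoint", "-c", default=None,
                        help="filename of the weights checkpoint (pretrained if omitted)")
    return parser

# Example: export at half resolution with a 256-filter head
args = build_parser().parse_args(["-F", "256", "--size", "512,1024", "model.onnx"])
print(args.model, args.num_filters, args.size, args.output_filename)
```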

The onnx_optimize.py script optimizes exported models. If you're looking to deploy a model to TensorRT or a mobile device, you might also want to run it through onnx-simplifier.

Training from Scratch

Please see the ekzhang/semantic-segmentation repository for the training code used in this project, as well as documentation about how to train your own custom models.

Contributions

Pull requests are always welcome! A big thanks to Andrew Tao and Karan Sapra from NVIDIA ADLR for helpful discussions and for lending me their training code, as well as Branislav Kisacanin, without whom this wouldn't be possible.

I'm grateful for advice from: Ching Hung, Eric Viscito, Franklyn Wang, Jagadeesh Sankaran, and Zoran Nikolic.

Licensed under the MIT License.

Owner

Name: Eric Zhang
Login: ekzhang
Kind: user
Location: New York, NY
Company: @modal-labs

Website: https://www.ekzhang.com
Twitter: ekzhang1
Repositories: 76
Profile: https://github.com/ekzhang

An honest, more human kind of software

GitHub Events

Total

Watch event: 21

Last Year

Watch event: 21

Committers

Last synced: 9 months ago

All Time

Total Commits: 39
Total Committers: 1
Avg Commits per committer: 39.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Eric Zhang	e**1@g**m	39

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 11
Total pull requests: 1
Average time to close issues: about 2 months
Average time to close pull requests: 11 days
Total issue authors: 10
Total pull request authors: 1
Average comments per issue: 3.09
Average comments per pull request: 2.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

yamengxi (2)
lucasjinreal (1)
sjosic (1)
amil-rp-work (1)
pushkar-khetrapal (1)
theloni-monk (1)
guitLearn (1)
rwightman (1)
tejaswigowda (1)
Unfixab1e (1)

Pull Request Authors

dependabot[bot] (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (1)

Dependencies

requirements.txt pypi

Pillow ==7.2.0
geffnet ==0.9.8
numpy >=1.18.0
onnx ==1.7.0
onnxruntime ==1.4.0
torch >=1.5.0
torchvision >=0.6.0

setup.py pypi

Pillow *
geffnet *
numpy *
torch *
torchvision *
