mmsegmentation_tutorial

Welcome to this step-by-step guide on setting up and using MMSegmentation, a powerful open-source toolbox for semantic segmentation built on PyTorch. We will use the Stanford Background Dataset with PSPNet for semantic segmentation as an example, showcasing how to preprocess data, modify configurations, and fine-tune a segmentation model.

https://github.com/mjdmahasneh/mmsegmentation_tutorial


Repository

Basic Info

Host: GitHub
Owner: MjdMahasneh
License: apache-2.0
Language: Python
Default Branch: master
Homepage:
Size: 9.99 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

README.md

Custom Segmentation with MMSegmentation

Welcome to this step-by-step guide on setting up and using MMSegmentation, a powerful open-source toolbox for semantic segmentation built on PyTorch. This tutorial will walk you through:

✅ Setting up a Conda environment and installing dependencies
✅ Downloading and running inference with a pretrained model
✅ Adding a new dataset for training
✅ Configuring and training a custom segmentation model

We will use the Stanford Background Dataset with PSPNet for semantic segmentation as an example, showcasing how to preprocess data, modify configurations, and fine-tune a segmentation model.

Whether you're a beginner or an experienced ML practitioner, this tutorial will help you get up and running with MMSegmentation quickly. Let’s dive in! 🚀


Let's create a new conda environment and install MMSegmentation

Create a new conda environment

conda create -n mmseg_env python=3.8 -y
conda activate mmseg_env

Check nvcc version

nvcc -V

Install PyTorch. I recommend following the official guide for this step.

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

Install openmim, mmengine, and MMCV. You can follow the official guide (or the tutorial).

pip install openmim
pip install mmengine
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch2.0/index.html

Install mmsegmentation from source

git clone https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation
pip install -e .

Check installations

```python
# Check PyTorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check MMSegmentation installation
import mmseg
print(mmseg.__version__)
```
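If one of the imports above fails, it can help to see which distributions pip actually installed before debugging further. The following is a minimal sketch that uses only the standard library; the helper name `report_versions` is hypothetical, and the distribution names are assumed to match what the install commands in this section produce.

```python
from importlib import metadata


def report_versions(names):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == '__main__':
    # Distribution names assumed from the install commands in this section.
    wanted = ['torch', 'torchvision', 'mmengine', 'mmcv', 'mmsegmentation']
    for name, version in report_versions(wanted).items():
        print(f'{name}: {version or "not installed"}')
```

Because it reads package metadata rather than importing the packages, this check still runs when an import would crash, for example on a CUDA mismatch.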

Download a pretrained model and run inference

Download the pretrained model

mim download mmsegmentation --config pspnet_r50-d8_4xb2-40k_cityscapes-512x1024 --dest .

Run inference with the MMSeg trained weights, either using:

python demo/image_demo.py demo/demo.png configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth --device cuda:0 --out-file result.jpg

or using the following code (./demo.py):

```python
import torch
from mmseg.apis import inference_model, init_model, show_result_pyplot
import mmcv

# Check installations
print(torch.__version__, torch.cuda.is_available())
import mmseg
print(mmseg.__version__)

# Run inference with MMSeg trained weights
config_file = './configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py'
checkpoint_file = 'pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'

# build the model from a config file and a checkpoint file
model = init_model(config_file, checkpoint_file, device='cuda:0')

# test a single image and show the results
img = 'demo/demo.png'  # or img = mmcv.imread(img), which will only load it once
result = inference_model(model, img)

# visualize the results in a new window
show_result_pyplot(model, img, result, show=True)

# or save the visualization results to image files
# you can change the opacity of the painted segmentation map in (0, 1].
show_result_pyplot(model, img, result, show=True, out_file='result.jpg', opacity=0.5)

# test a video and show the results
video = mmcv.VideoReader('./video/video.mp4')
for frame in video:
    result = inference_model(model, frame)
    show_result_pyplot(model, frame, result, wait_time=0.1)
```

Add a new dataset

Datasets in MMSegmentation require image and semantic segmentation maps to be placed in folders with the same prefix. To support a new dataset, we may need to modify the original file structure.

In this tutorial, we give an example of converting the dataset. You may refer to docs for details about dataset reorganization.

We use Stanford Background Dataset as an example. The dataset contains 715 images chosen from existing public datasets LabelMe, MSRC, PASCAL VOC and Geometric Context. Images from these datasets are mainly outdoor scenes, each containing approximately 320-by-240 pixels. In this tutorial, we use the region annotations as labels. There are 8 classes in total, i.e. sky, tree, road, grass, water, building, mountain, and foreground object.

Download the dataset

curl -o stanford_background.tar.gz http://dags.stanford.edu/data/iccv09Data.tar.gz

Extract the dataset

```
# unzip manually or use the following command:
tar xf stanford_background.tar.gz
```
Let's take a look at the dataset. Refer to convert_dataset.py for the complete code.

```python
import mmcv
import matplotlib.pyplot as plt

img = mmcv.imread('iccv09Data/images/6000124.jpg')
plt.figure(figsize=(8, 6))
plt.imshow(mmcv.bgr2rgb(img))
plt.show()
```

We need to convert the annotations into semantic map format, stored as images.
- MMSegmentation expects annotation masks in an indexed (P-mode) format rather than regular RGB images. We can convert the annotation files to indexed images using the following code. I have also included a stand-alone helper script that handles the conversion if your masks are already in an image format; see ./helpers/convert_masks_to_palette_based_png.py.

```python
import mmcv
import matplotlib.pyplot as plt
import os.path as osp
import os
import numpy as np
from PIL import Image
from mmengine.utils import scandir
import matplotlib.patches as mpatches

# convert dataset annotation to semantic segmentation map
data_root = 'iccv09Data'
img_dir = 'images'
ann_dir = 'labels'

# define class and palette for better visualization
classes = ('sky', 'tree', 'road', 'grass', 'water', 'bldg', 'mntn', 'fg obj')
palette = [[128, 128, 128], [129, 127, 38], [120, 69, 125], [53, 125, 34],
           [0, 11, 123], [118, 20, 12], [122, 81, 25], [241, 134, 51]]

# for file in mmcv.scandir(osp.join(data_root, ann_dir), suffix='.regions.txt'):
for file in scandir(osp.join(data_root, ann_dir), suffix='.regions.txt'):
    seg_map = np.loadtxt(osp.join(data_root, ann_dir, file)).astype(np.uint8)
    seg_img = Image.fromarray(seg_map).convert('P')
    seg_img.putpalette(np.array(palette, dtype=np.uint8))
    seg_img.save(osp.join(data_root, ann_dir, file.replace('.regions.txt', '.png')))
```
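The conversion above relies on the `uint8` cast to deal with whatever values the `.regions.txt` files contain. Assuming unknown pixels are stored as negative numbers (worth confirming in the dataset's own README), an explicit mapping to 255 makes the intent visible, since 255 is the default ignore index in MMSegmentation's decode heads. Below is a pure-Python sketch of that idea; `parse_regions` and `IGNORE_INDEX` are hypothetical names, not part of the tutorial's scripts.

```python
IGNORE_INDEX = 255  # assumed ignore label; matches MMSegmentation's default


def parse_regions(text, num_classes=8, ignore_index=IGNORE_INDEX):
    """Parse whitespace-separated label rows into a list of integer rows.

    Any value outside [0, num_classes) is replaced by ignore_index.
    """
    rows = []
    for line in text.strip().splitlines():
        row = []
        for token in line.split():
            label = int(token)
            row.append(label if 0 <= label < num_classes else ignore_index)
        rows.append(row)
    return rows


sample = "0 0 1\n-1 5 7\n"
print(parse_regions(sample))  # [[0, 0, 1], [255, 5, 7]]
```

The returned rows can be turned into an array with `np.array(rows, dtype=np.uint8)` before being handed to `Image.fromarray`, which keeps the rest of the conversion unchanged.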

Let's take a look at the segmentation map we got

```python
import matplotlib.patches as mpatches

img = Image.open('iccv09Data/labels/6000124.png')
plt.figure(figsize=(8, 6))
im = plt.imshow(np.array(img.convert('RGB')))

# create a patch (proxy artist) for every color
patches = [mpatches.Patch(color=np.array(palette[i])/255., label=classes[i]) for i in range(8)]
# put those patches as legend-handles into the legend
plt.legend(handles=patches, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize='large')

plt.show()
```

Split the dataset into train and val sets (the first 4/5 of the file list for training, the rest for validation)

```python
# split train/val set
split_dir = 'splits'
# mmcv.mkdir_or_exist(osp.join(data_root, split_dir))
os.makedirs(osp.join(data_root, split_dir), exist_ok=True)

# filename_list = [osp.splitext(filename)[0] for filename in mmcv.scandir(osp.join(data_root, ann_dir), suffix='.png')]
filename_list = [osp.splitext(filename)[0] for filename in scandir(osp.join(data_root, ann_dir), suffix='.png')]

with open(osp.join(data_root, split_dir, 'train.txt'), 'w') as f:
    # select first 4/5 as train set
    train_length = int(len(filename_list)*4/5)
    f.writelines(line + '\n' for line in filename_list[:train_length])
with open(osp.join(data_root, split_dir, 'val.txt'), 'w') as f:
    # select last 1/5 as val set
    f.writelines(line + '\n' for line in filename_list[train_length:])
```
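Note that the split code takes the first 4/5 of the file list in directory order, so no shuffling actually happens. If a shuffled split is preferred, a seeded shuffle is one option. The following is a minimal sketch using only the standard library; `split_names` is a hypothetical helper, not something the tutorial's scripts provide.

```python
import random


def split_names(names, train_fraction=0.8, seed=0):
    """Shuffle a copy of names with a fixed seed and split it into train/val."""
    # Sort first so the result does not depend on the directory listing order.
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    train_length = int(len(shuffled) * train_fraction)
    return shuffled[:train_length], shuffled[train_length:]


train, val = split_names([f'{i:07d}' for i in range(10)])
print(len(train), len(val))  # 8 2
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state used elsewhere in training.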

Training a custom segmentation model

Here we will train a PSPNet model on the Stanford Background Dataset. We will define a new dataset class, modify the configuration file, and train the model. Refer to train.py for the complete code.

Define and register a new dataset

```python
from mmseg.registry import DATASETS
from mmseg.datasets import BaseSegDataset

# define dataset root and directory for images and annotations
data_root = 'iccv09Data'
img_dir = 'images'
ann_dir = 'labels'

# define class and palette for better visualization
# (same values as in the conversion step)
classes = ('sky', 'tree', 'road', 'grass', 'water', 'bldg', 'mntn', 'fg obj')
palette = [[128, 128, 128], [129, 127, 38], [120, 69, 125], [53, 125, 34],
           [0, 11, 123], [118, 20, 12], [122, 81, 25], [241, 134, 51]]


# define dataset class for Stanford Background
@DATASETS.register_module()
class StanfordBackgroundDataset(BaseSegDataset):
    METAINFO = dict(classes=classes, palette=palette)

    def __init__(self, **kwargs):
        super().__init__(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)
```
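When a split file is supplied, each line is treated as a base name to which the image suffix and the segmentation-map suffix are appended under their respective directories. A quick pre-flight check that both files exist for every name can save a failed training run. This is a sketch using only the standard library; `find_missing_pairs` is a hypothetical helper, and the default directory names and suffixes are assumed from this tutorial's layout.

```python
import os.path as osp


def find_missing_pairs(data_root, split_file, img_dir='images', ann_dir='labels',
                       img_suffix='.jpg', seg_map_suffix='.png'):
    """Return base names from split_file that lack an image or a label file."""
    with open(osp.join(data_root, split_file)) as f:
        names = [line.strip() for line in f if line.strip()]
    missing = []
    for name in names:
        img_path = osp.join(data_root, img_dir, name + img_suffix)
        seg_path = osp.join(data_root, ann_dir, name + seg_map_suffix)
        if not (osp.isfile(img_path) and osp.isfile(seg_path)):
            missing.append(name)
    return missing
```

For the layout used here, `find_missing_pairs('iccv09Data', 'splits/train.txt')` would be expected to return an empty list once the conversion and split steps have run.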

Load and modify the configuration file

```python
from mmengine import Config

# let's load and modify the config file
cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py')
print(f'Config:\n{cfg.pretty_text}')

# Since we use only one GPU, BN is used instead of SyncBN
cfg.norm_cfg = dict(type='BN', requires_grad=True)
cfg.crop_size = (256, 256)
cfg.model.data_preprocessor.size = cfg.crop_size
cfg.model.backbone.norm_cfg = cfg.norm_cfg
cfg.model.decode_head.norm_cfg = cfg.norm_cfg
cfg.model.auxiliary_head.norm_cfg = cfg.norm_cfg

# modify num classes of the model in decode/auxiliary head
cfg.model.decode_head.num_classes = 8
cfg.model.auxiliary_head.num_classes = 8

# Modify dataset type and path
cfg.dataset_type = 'StanfordBackgroundDataset'
cfg.data_root = data_root

cfg.train_dataloader.batch_size = 4  # 8

cfg.train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='RandomResize', scale=(320, 240), ratio_range=(0.5, 2.0), keep_ratio=True),
    dict(type='RandomCrop', crop_size=cfg.crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackSegInputs')
]

cfg.test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(320, 240), keep_ratio=True),
    # add loading annotation after Resize because ground truth
    # does not need to do resize data transform
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
]

cfg.train_dataloader.dataset.type = cfg.dataset_type
cfg.train_dataloader.dataset.data_root = cfg.data_root
cfg.train_dataloader.dataset.data_prefix = dict(img_path=img_dir, seg_map_path=ann_dir)
cfg.train_dataloader.dataset.pipeline = cfg.train_pipeline
cfg.train_dataloader.dataset.ann_file = 'splits/train.txt'

cfg.val_dataloader.dataset.type = cfg.dataset_type
cfg.val_dataloader.dataset.data_root = cfg.data_root
cfg.val_dataloader.dataset.data_prefix = dict(img_path=img_dir, seg_map_path=ann_dir)
cfg.val_dataloader.dataset.pipeline = cfg.test_pipeline
cfg.val_dataloader.dataset.ann_file = 'splits/val.txt'

cfg.test_dataloader = cfg.val_dataloader

# Load the pretrained weights
cfg.load_from = 'pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'

# Set up working dir to save files and logs.
cfg.work_dir = './work_dirs/tutorial'

cfg.train_cfg.max_iters = 1000  # 200
cfg.train_cfg.val_interval = 500  # 200
cfg.default_hooks.logger.interval = 10
cfg.default_hooks.checkpoint.interval = 200

# cfg.train_dataloader.num_workers = 4
# cfg.val_dataloader.num_workers = 4
# cfg.test_dataloader.num_workers = 4

# Set seed to facilitate reproducing the result
cfg['randomness'] = dict(seed=0)

# Let's have a look at the final config used for training
print(f'Config:\n{cfg.pretty_text}')
```
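The schedule is expressed in iterations, so it can be useful to translate it into epochs and into the iterations at which validation runs and checkpoints are written. The arithmetic below is a sketch under one assumption: all 715 images received a converted label, so the training split holds `int(715 * 4 / 5)` names. `schedule_summary` is a hypothetical helper.

```python
def schedule_summary(num_train, batch_size, max_iters, val_interval, ckpt_interval):
    """Summarize an iteration-based schedule in more familiar terms."""
    return dict(
        epochs=max_iters * batch_size / num_train,
        val_iters=list(range(val_interval, max_iters + 1, val_interval)),
        ckpt_iters=list(range(ckpt_interval, max_iters + 1, ckpt_interval)),
    )


num_train = int(715 * 4 / 5)  # 572, assuming every image has a converted label
summary = schedule_summary(num_train, batch_size=4, max_iters=1000,
                           val_interval=500, ckpt_interval=200)
print(round(summary['epochs'], 2))  # 6.99
print(summary['val_iters'])         # [500, 1000]
print(summary['ckpt_iters'])        # [200, 400, 600, 800, 1000]
```

So with these settings the model sees the training set roughly seven times, and the first checkpoint appears after 200 iterations.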

Train the model

```python
from mmengine.runner import Runner

# Run training
runner = Runner.from_cfg(cfg)
runner.train()
```

Test the model

Here we will test the trained model on a sample image from the dataset. Refer to test.py for the complete code.

```python
# Run testing
from mmseg.apis import init_model, inference_model, show_result_pyplot
import mmcv
import matplotlib.pyplot as plt

# inference with trained model
# Init the model from the config and the checkpoint
checkpoint_path = './work_dirs/tutorial/iter_200.pth'
model = init_model(cfg, checkpoint_path, 'cuda:0')

img = mmcv.imread('./iccv09Data/images/6000124.jpg')
result = inference_model(model, img)
plt.figure(figsize=(8, 6))
vis_result = show_result_pyplot(model, img, result)
plt.imshow(mmcv.bgr2rgb(vis_result))
plt.show()
```

Segmentation Results

Below are the Intersection over Union (IoU) and Accuracy (Acc) results for each class:

| Class  | IoU (%) | Acc (%) |
|--------|---------|---------|
| Sky    | 87.71   | 91.04   |
| Tree   | 68.34   | 79.82   |
| Road   | 89.29   | 95.95   |
| Grass  | 77.29   | 83.47   |
| Water  | 78.92   | 85.91   |
| Bldg   | 75.41   | 90.72   |
| Mntn   | 32.13   | 54.24   |
| Fg Obj | 68.24   | 78.41   |
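The per-class numbers can be condensed into a single mean IoU and mean accuracy by taking the unweighted average over the eight classes. The sketch below simply recomputes those averages from the table values; it does not read any evaluation log.

```python
# Per-class (IoU %, Acc %) copied from the table above.
results = {
    'Sky': (87.71, 91.04),
    'Tree': (68.34, 79.82),
    'Road': (89.29, 95.95),
    'Grass': (77.29, 83.47),
    'Water': (78.92, 85.91),
    'Bldg': (75.41, 90.72),
    'Mntn': (32.13, 54.24),
    'Fg Obj': (68.24, 78.41),
}

# Unweighted mean over classes: every class counts equally, however rare it is.
m_iou = sum(iou for iou, _ in results.values()) / len(results)
m_acc = sum(acc for _, acc in results.values()) / len(results)
print(f'mIoU: {m_iou:.2f}  mAcc: {m_acc:.2f}')
```

This works out to roughly 72.2 mIoU and 82.4 mAcc, with the mountain class pulling both averages down noticeably.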

Visualization Results

Here are four sample results from the segmentation model:

MMSegmentation Model Zoo

MMSegmentation provides a wide range of pre-trained models for semantic segmentation tasks. Here are some popular models available in the MMSegmentation Model Zoo:

| Model | Description | Configuration Link |
|-------|-------------|--------------------|
| FCN | Fully Convolutional Network for semantic segmentation. | FCN Configurations |
| PSPNet | Pyramid Scene Parsing Network for scene understanding. | PSPNet Configurations |
| DeepLabV3 | Atrous Spatial Pyramid Pooling for semantic image segmentation. | DeepLabV3 Configurations |
| DeepLabV3+ | Enhanced DeepLabV3 with encoder-decoder structure for better segmentation results. | DeepLabV3+ Configurations |
| UPerNet | Unified Perceptual Parsing for scene segmentation tasks. | UPerNet Configurations |
| SegFormer | A simple and efficient design for semantic segmentation with Transformers. | SegFormer Configurations |
| Mask2Former | A universal segmentation architecture for image and video segmentation tasks. | Mask2Former Configurations |
| HRNet | High-Resolution Network for accurate and detailed semantic segmentation. | HRNet Configurations |
| OCRNet | Object-Contextual Representations for semantic segmentation. | OCRNet Configurations |
| Fast-SCNN | Fast Semantic Segmentation Network for real-time segmentation on mobile devices. | Fast-SCNN Configurations |

Note: For a comprehensive list of models and their configurations, please refer to the MMSegmentation Model Zoo.

Adapting the Tutorial for Different Models

To use a model other than PSPNet in this tutorial, you need to:

Modify the Configuration File: Replace the PSPNet configuration with the desired model's configuration file. For example, to use DeepLabV3+, download its configuration from the DeepLabV3+ Configurations and update the paths accordingly.
Download the Pre-trained Weights: Obtain the pre-trained weights corresponding to the chosen model. You can find the appropriate weights in the model's configuration directory or the MMSegmentation Model Zoo.
Update the Code: Ensure that the code references the new configuration file and pre-trained weights. Adjust any model-specific parameters as needed.

By following these steps, you can seamlessly switch to different models within the MMSegmentation framework. 🚀
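One model-specific detail to watch when switching architectures is the head layout: some configs have no `auxiliary_head`, and others define `decode_head` or `auxiliary_head` as a list of heads rather than a single dict. The helper below is a sketch that sets the class count wherever a head is present. It operates on plain dictionaries; `set_num_classes` is a hypothetical name, the toy configs are illustrative only, and its behavior on a loaded config object is an assumption that should be checked against the printed config.

```python
def set_num_classes(model_cfg, num_classes):
    """Set num_classes on every decode/auxiliary head present in model_cfg."""
    for key in ('decode_head', 'auxiliary_head'):
        heads = model_cfg.get(key)
        if heads is None:
            continue  # e.g. a model without an auxiliary head
        if isinstance(heads, dict):
            heads = [heads]
        for head in heads:
            head['num_classes'] = num_classes
    return model_cfg


# Toy configs standing in for real ones (structure is illustrative only).
single = dict(decode_head=dict(type='PSPHead', num_classes=19),
              auxiliary_head=dict(type='FCNHead', num_classes=19))
multi = dict(decode_head=[dict(type='FCNHead', num_classes=19),
                          dict(type='OCRHead', num_classes=19)])

print(set_num_classes(single, 8)['auxiliary_head']['num_classes'])           # 8
print([h['num_classes'] for h in set_num_classes(multi, 8)['decode_head']])  # [8, 8]
```

The same pattern could be applied to other per-head settings, such as the normalization config that this tutorial switches from SyncBN to BN for single-GPU training.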

About This Tutorial

This tutorial is based on MMSegmentation by OpenMMLab, a powerful open-source toolbox for semantic segmentation. You can find their official repository here: MMSegmentation GitHub and their official tutorial here: MMSegmentation Tutorial.

Owner

Name: Majedaldein Almahasneh
Login: MjdMahasneh
Kind: user
Location: United Kingdom
Company: Department of Computer Science, Swansea University, Swansea, UK.

Repositories: 1
Profile: https://github.com/MjdMahasneh

Ph.D. in Machine Learning

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMSegmentation Contributors"
title: "OpenMMLab Semantic Segmentation Toolbox and Benchmark"
date-released: 2020-07-10
url: "https://github.com/open-mmlab/mmsegmentation"
license: Apache-2.0


Dependencies

.circleci/docker/Dockerfile docker

pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build

docker/Dockerfile docker

pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build

docker/serve/Dockerfile docker

pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build

requirements/albu.txt pypi

albumentations >=0.3.2

requirements/docs.txt pypi

docutils ==0.16.0
myst-parser *
sphinx ==4.0.2
sphinx_copybutton *
sphinx_markdown_tables *
urllib3 <2.0.0

requirements/mminstall.txt pypi

mmcv >=2.0.0rc4,<2.2.0
mmengine >=0.5.0,<1.0.0

requirements/multimodal.txt pypi

ftfy *
regex *

requirements/optional.txt pypi

cityscapesscripts *
diffusers *
einops ==0.3.0
imageio ==2.9.0
imageio-ffmpeg ==0.4.2
invisible-watermark *
kornia ==0.6
nibabel *
omegaconf ==2.1.1
pudb ==2019.2
pytorch-lightning ==1.4.2
streamlit >=0.73.1
test-tube >=0.7.5
timm *
torch-fidelity ==0.3.0
torchmetrics ==0.6.0
transformers ==4.19.2

requirements/readthedocs.txt pypi

mmcv >=2.0.0rc1,<2.1.0
mmengine >=0.4.0,<1.0.0
prettytable *
scipy *
torch *
torchvision *

requirements/runtime.txt pypi

matplotlib *
numpy *
packaging *
prettytable *
scipy *

requirements/tests.txt pypi

codecov * test
flake8 * test
ftfy * test
interrogate * test
pytest * test
regex * test
xdoctest >=0.10.0 test
yapf * test

requirements.txt pypi

setup.py pypi