mmsegmentation_tutorial
Welcome to this step-by-step guide on setting up and using MMSegmentation, a powerful open-source toolbox for semantic segmentation built on PyTorch. We will use the Stanford Background Dataset with PSPNet for semantic segmentation as an example, showcasing how to preprocess data, modify configurations, and fine-tune a segmentation model.
Custom Segmentation with MMSegmentation
Welcome to this step-by-step guide on setting up and using MMSegmentation, a powerful open-source toolbox for semantic segmentation built on PyTorch. This tutorial will walk you through:
- ✅ Setting up a Conda environment and installing dependencies
- ✅ Downloading and running inference with a pretrained model
- ✅ Adding a new dataset for training
- ✅ Configuring and training a custom segmentation model
We will use the Stanford Background Dataset with PSPNet for semantic segmentation as an example, showcasing how to preprocess data, modify configurations, and fine-tune a segmentation model.
Whether you're a beginner or an experienced ML practitioner, this tutorial will help you get up and running with MMSegmentation quickly. Let’s dive in! 🚀

Let's create a new conda environment and install MMSegmentation
- Create a new conda environment
```bash
conda create -n mmseg_env python=3.8 -y
conda activate mmseg_env
```
- Check nvcc version
```bash
nvcc -V
```
- Install PyTorch. I recommend following the official guide for this step.
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
```
- Install openmim, mmengine, and MMCV. You can follow the official guide (or the tutorial).
```bash
pip install openmim
pip install mmengine
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch2.0/index.html
```
- Install mmsegmentation from source
```bash
git clone https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation
pip install -e .
```
- Check installations
```python
# Check PyTorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check MMSegmentation installation
import mmseg
print(mmseg.__version__)
```
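If one of the imports above fails outright, it can help to see which packages pip actually installed before debugging further. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `report_versions` and the package list are assumptions made for illustration.

```python
from importlib import metadata


def report_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as installed by the pip commands in this section.
    wanted = ["torch", "torchvision", "mmengine", "mmcv", "mmsegmentation"]
    for name, version in report_versions(wanted).items():
        print(f"{name:16s} {version or 'NOT INSTALLED'}")
```

Because nothing is imported, the script still runs when, for example, a CUDA mismatch prevents `import mmcv` from succeeding.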
Download a pretrained model and run inference
Download the pretrained model
```bash
mim download mmsegmentation --config pspnet_r50-d8_4xb2-40k_cityscapes-512x1024 --dest .
```
Run inference with the MMSeg pretrained weights, either from the command line:
```bash
python demo/image_demo.py demo/demo.png configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth --device cuda:0 --out-file result.jpg
```
or using the following code (`./demo.py`):
```python
import torch
from mmseg.apis import inference_model, init_model, show_result_pyplot
import mmcv

# Check installations
print(torch.__version__, torch.cuda.is_available())
import mmseg
print(mmseg.__version__)

# Run inference with MMSeg trained weights
config_file = './configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py'
checkpoint_file = 'pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'

# build the model from a config file and a checkpoint file
model = init_model(config_file, checkpoint_file, device='cuda:0')

# test a single image and show the results
img = 'demo/demo.png'  # or img = mmcv.imread(img), which will only load it once
result = inference_model(model, img)

# visualize the results in a new window
show_result_pyplot(model, img, result, show=True)

# or save the visualization results to image files
# you can change the opacity of the painted segmentation map in (0, 1].
show_result_pyplot(model, img, result, show=True, out_file='result.jpg', opacity=0.5)

# test a video and show the results
video = mmcv.VideoReader('./video/video.mp4')
for frame in video:
    result = inference_model(model, frame)
    show_result_pyplot(model, frame, result, wait_time=0.1)
```
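The opacity argument controls how strongly the class colors are painted over the image. As a rough mental model (an assumption about the blending, not a copy of the library's visualizer code), each output pixel can be thought of as a weighted average of the image pixel and the class color. The helper `blend_pixel` below is hypothetical and only illustrates that arithmetic:

```python
def blend_pixel(image_rgb, class_rgb, opacity=0.5):
    """Alpha-blend one class color over one image pixel (RGB tuples, 0-255)."""
    if not 0 < opacity <= 1:
        raise ValueError("opacity must be in (0, 1]")
    return tuple(
        int(round((1 - opacity) * i + opacity * c))
        for i, c in zip(image_rgb, class_rgb)
    )


# A mid-gray pixel painted with a red class color at half opacity.
print(blend_pixel((100, 100, 100), (200, 0, 0), opacity=0.5))  # (150, 50, 50)
```

With an opacity of 1 the class color fully replaces the image, and values close to 0 leave the image almost unchanged.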
Add a new dataset
Datasets in MMSegmentation require each image and its semantic segmentation map to share the same filename prefix, with images and maps stored in their own folders. To support a new dataset, we may need to modify the original file structure.
In this tutorial, we give an example of converting the dataset. You may refer to docs for details about dataset reorganization.
We use Stanford Background Dataset as an example. The dataset contains 715 images chosen from existing public datasets LabelMe, MSRC, PASCAL VOC and Geometric Context. Images from these datasets are mainly outdoor scenes, each containing approximately 320-by-240 pixels. In this tutorial, we use the region annotations as labels. There are 8 classes in total, i.e. sky, tree, road, grass, water, building, mountain, and foreground object.
Download the dataset
```bash
curl -o stanford_background.tar.gz http://dags.stanford.edu/data/iccv09Data.tar.gz
```
Extract the dataset
```bash
# unzip manually or use the following command:
tar xf stanford_background.tar.gz
```
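Since images and annotations are matched by filename prefix, it can be worth checking after extraction that every image has an annotation and vice versa. The sketch below is one way to do that with the standard library; the helper name `find_unpaired` is hypothetical, and the folder names and suffixes are the ones used in this tutorial (`iccv09Data/images/*.jpg`, `iccv09Data/labels/*.regions.txt`).

```python
from pathlib import Path


def find_unpaired(img_dir, ann_dir, img_suffix=".jpg", ann_suffix=".regions.txt"):
    """Return (images without annotation, annotations without image).

    Both results are sorted lists of filename prefixes.
    """
    img_dir, ann_dir = Path(img_dir), Path(ann_dir)
    images = {p.name[:-len(img_suffix)]
              for p in img_dir.iterdir() if p.name.endswith(img_suffix)}
    annotations = {p.name[:-len(ann_suffix)]
                   for p in ann_dir.iterdir() if p.name.endswith(ann_suffix)}
    return sorted(images - annotations), sorted(annotations - images)


if __name__ == "__main__":
    root = Path("iccv09Data")
    if root.is_dir():  # only runs once the dataset has been extracted
        missing_ann, missing_img = find_unpaired(root / "images", root / "labels")
        print(f"images without labels: {missing_ann}")
        print(f"labels without images: {missing_img}")
```

Two empty lists mean every image has a matching region annotation.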
Let's take a look at the dataset. Refer to `convert_dataset.py` for the complete code.
```python
import mmcv
import matplotlib.pyplot as plt

img = mmcv.imread('iccv09Data/images/6000124.jpg')
plt.figure(figsize=(8, 6))
plt.imshow(mmcv.bgr2rgb(img))
plt.show()
```
- We need to convert the annotation into semantic map format as an image.
- MMSegmentation expects annotation masks in an indexed (P-mode) format rather than regular RGB images. We can convert the annotation files to indexed images using the following code (I have also included a stand-alone helper script to handle conversion if you have masks in an image format, see `./helpers/convert_masks_to_palette_based_png.py`):
```python
import mmcv
import matplotlib.pyplot as plt
import os.path as osp
import os
import numpy as np
from PIL import Image
from mmengine.utils import scandir
import matplotlib.patches as mpatches

# convert dataset annotation to semantic segmentation map
data_root = 'iccv09Data'
img_dir = 'images'
ann_dir = 'labels'

# define class and palette for better visualization
classes = ('sky', 'tree', 'road', 'grass', 'water', 'bldg', 'mntn', 'fg obj')
palette = [[128, 128, 128], [129, 127, 38], [120, 69, 125], [53, 125, 34],
           [0, 11, 123], [118, 20, 12], [122, 81, 25], [241, 134, 51]]

# for file in mmcv.scandir(osp.join(data_root, ann_dir), suffix='.regions.txt'):
for file in scandir(osp.join(data_root, ann_dir), suffix='.regions.txt'):
    seg_map = np.loadtxt(osp.join(data_root, ann_dir, file)).astype(np.uint8)
    seg_img = Image.fromarray(seg_map).convert('P')
    seg_img.putpalette(np.array(palette, dtype=np.uint8))
    seg_img.save(osp.join(data_root, ann_dir, file.replace('.regions.txt', '.png')))
```
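The conversion loop above casts the loaded label matrix to unsigned 8-bit integers. If the region files mark unlabeled pixels with negative ids (an assumption about the dataset that is worth verifying on your copy), that cast relies on wraparound to turn them into a large value. The sketch below makes the mapping explicit in plain Python; `parse_regions` is a hypothetical helper, and 255 is assumed as the ignore label, so check `ignore_index` in your own config.

```python
IGNORE_INDEX = 255  # assumed ignore label; check `ignore_index` in your config


def parse_regions(text, ignore_index=IGNORE_INDEX):
    """Parse the text of a `.regions.txt` file into rows of integer class ids.

    Negative ids (assumed to mean "unknown") are mapped to `ignore_index`.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append([ignore_index if int(v) < 0 else int(v) for v in line.split()])
    return rows


sample = "0 0 1\n2 -1 7\n"
print(parse_regions(sample))  # [[0, 0, 1], [2, 255, 7]]
```

Mapping unknown pixels explicitly avoids depending on how a particular NumPy version handles out-of-range casts.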
- Let's take a look at the segmentation map we got
```python
import matplotlib.patches as mpatches

img = Image.open('iccv09Data/labels/6000124.png')
plt.figure(figsize=(8, 6))
im = plt.imshow(np.array(img.convert('RGB')))

# create a patch (proxy artist) for every color
patches = [mpatches.Patch(color=np.array(palette[i])/255., label=classes[i]) for i in range(8)]
# put those patches as legend-handles into the legend
plt.legend(handles=patches, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize='large')

plt.show()
```
- Split the data into train and validation sets. The code takes the first 4/5 of the scanned file list for training and the rest for validation, so the split follows scan order rather than a random shuffle.
```python
# split train/val set
split_dir = 'splits'
# mmcv.mkdir_or_exist(osp.join(data_root, split_dir))
os.makedirs(osp.join(data_root, split_dir), exist_ok=True)
# filename_list = [osp.splitext(filename)[0] for filename in mmcv.scandir(
#     osp.join(data_root, ann_dir), suffix='.png')]
filename_list = [osp.splitext(filename)[0] for filename in scandir(
    osp.join(data_root, ann_dir), suffix='.png')]
with open(osp.join(data_root, split_dir, 'train.txt'), 'w') as f:
    # select first 4/5 as train set
    train_length = int(len(filename_list)*4/5)
    f.writelines(line + '\n' for line in filename_list[:train_length])
with open(osp.join(data_root, split_dir, 'val.txt'), 'w') as f:
    # select last 1/5 as val set
    f.writelines(line + '\n' for line in filename_list[train_length:])
```
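The code above slices the scanned file list in order, so the result depends on how the files happen to be listed. If a shuffled but reproducible split is preferred, a seeded shuffle is one option. This is a minimal sketch under that assumption; the helper name `split_train_val` and its defaults are hypothetical.

```python
import random


def split_train_val(names, val_fraction=0.2, seed=0):
    """Shuffle `names` reproducibly and return (train_names, val_names)."""
    shuffled = sorted(names)  # sort first so the result ignores scan order
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


train, val = split_train_val([f"img{i:03d}" for i in range(10)])
print(len(train), len(val))  # 8 2
```

The two returned lists can be written to `train.txt` and `val.txt` in the same way as in the block above.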
Training a custom segmentation model
Here we will train a PSPNet model on the Stanford Background Dataset. We will define a new dataset class, modify the configuration file, and train the model. Refer to train.py for the complete code.
- Define and register a new dataset
```python
from mmseg.registry import DATASETS
from mmseg.datasets import BaseSegDataset

# define dataset root and directory for images and annotations
data_root = 'iccv09Data'
img_dir = 'images'
ann_dir = 'labels'

# define class and palette for better visualization
classes = ('sky', 'tree', 'road', 'grass', 'water', 'bldg', 'mntn', 'fg obj')
palette = [[128, 128, 128], [129, 127, 38], [120, 69, 125], [53, 125, 34],
           [0, 11, 123], [118, 20, 12], [122, 81, 25], [241, 134, 51]]

# define dataset class for Stanford Background
@DATASETS.register_module()
class StanfordBackgroundDataset(BaseSegDataset):
    METAINFO = dict(classes=classes, palette=palette)

    def __init__(self, **kwargs):
        super().__init__(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)
```
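Registering the class is what lets a config refer to the dataset by a plain string such as `'StanfordBackgroundDataset'`. The toy registry below illustrates the general pattern only; it is a simplified stand-in written for this tutorial, not MMEngine's implementation, and the names `Registry`, `TOY_DATASETS`, and `ToyDataset` are hypothetical.

```python
class Registry:
    """Tiny stand-in for a name -> class registry (illustration only)."""

    def __init__(self):
        self._modules = {}

    def register_module(self):
        def decorator(cls):
            self._modules[cls.__name__] = cls  # register under the class name
            return cls
        return decorator

    def build(self, cfg):
        cfg = dict(cfg)  # copy so the caller's dict is untouched
        cls = self._modules[cfg.pop("type")]  # look the class up by its name
        return cls(**cfg)


TOY_DATASETS = Registry()


@TOY_DATASETS.register_module()
class ToyDataset:
    def __init__(self, data_root):
        self.data_root = data_root


dataset = TOY_DATASETS.build(dict(type="ToyDataset", data_root="iccv09Data"))
print(type(dataset).__name__, dataset.data_root)  # ToyDataset iccv09Data
```

This is why the module that defines the dataset class has to be imported (so the decorator runs) before a config that names it is built.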
- Load and modify the configuration file
```python
from mmengine import Config

# let's load and modify the config file
cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py')
print(f'Config:\n{cfg.pretty_text}')

# Since we use only one GPU, BN is used instead of SyncBN
cfg.norm_cfg = dict(type='BN', requires_grad=True)
cfg.crop_size = (256, 256)
cfg.model.data_preprocessor.size = cfg.crop_size
cfg.model.backbone.norm_cfg = cfg.norm_cfg
cfg.model.decode_head.norm_cfg = cfg.norm_cfg
cfg.model.auxiliary_head.norm_cfg = cfg.norm_cfg

# modify num classes of the model in decode/auxiliary head
cfg.model.decode_head.num_classes = 8
cfg.model.auxiliary_head.num_classes = 8

# Modify dataset type and path
cfg.dataset_type = 'StanfordBackgroundDataset'
cfg.data_root = data_root

cfg.train_dataloader.batch_size = 4  # 8

cfg.train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='RandomResize', scale=(320, 240), ratio_range=(0.5, 2.0), keep_ratio=True),
    dict(type='RandomCrop', crop_size=cfg.crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackSegInputs')
]

cfg.test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(320, 240), keep_ratio=True),
    # add loading annotation after Resize because ground truth
    # does not need to do resize data transform
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
]

cfg.train_dataloader.dataset.type = cfg.dataset_type
cfg.train_dataloader.dataset.data_root = cfg.data_root
cfg.train_dataloader.dataset.data_prefix = dict(img_path=img_dir, seg_map_path=ann_dir)
cfg.train_dataloader.dataset.pipeline = cfg.train_pipeline
cfg.train_dataloader.dataset.ann_file = 'splits/train.txt'

cfg.val_dataloader.dataset.type = cfg.dataset_type
cfg.val_dataloader.dataset.data_root = cfg.data_root
cfg.val_dataloader.dataset.data_prefix = dict(img_path=img_dir, seg_map_path=ann_dir)
cfg.val_dataloader.dataset.pipeline = cfg.test_pipeline
cfg.val_dataloader.dataset.ann_file = 'splits/val.txt'

cfg.test_dataloader = cfg.val_dataloader

# Load the pretrained weights
cfg.load_from = 'pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'

# Set up working dir to save files and logs.
cfg.work_dir = './work_dirs/tutorial'

cfg.train_cfg.max_iters = 1000  # 200
cfg.train_cfg.val_interval = 500  # 200
cfg.default_hooks.logger.interval = 10
cfg.default_hooks.checkpoint.interval = 200

cfg.train_dataloader.num_workers = 4
cfg.val_dataloader.num_workers = 4
cfg.test_dataloader.num_workers = 4

# Set seed to facilitate reproducing the result
cfg['randomness'] = dict(seed=0)

# Let's have a look at the final config used for training
print(f'Config:\n{cfg.pretty_text}')
```
- Train the model
```python
from mmengine.runner import Runner

# Run training
runner = Runner.from_cfg(cfg)
runner.train()
```
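The schedule in the config is expressed in iterations: 1000 training iterations, validation every 500, a checkpoint every 200, and a log line every 10. As a quick sanity check of what to expect in the working directory, the sketch below lists the iterations at which each event would fire, assuming events trigger on exact multiples of their interval (the framework's hooks may also act at the very end of training). The helper name `trigger_iterations` is hypothetical.

```python
def trigger_iterations(max_iters, interval):
    """Iterations (1-based) at which an every-`interval` event fires."""
    return [i for i in range(1, max_iters + 1) if i % interval == 0]


max_iters = 1000
print("validation at:", trigger_iterations(max_iters, 500))   # [500, 1000]
print("checkpoints at:", trigger_iterations(max_iters, 200))  # [200, 400, 600, 800, 1000]
print("log lines:", len(trigger_iterations(max_iters, 10)))   # 100
```

Under that assumption, the first checkpoint written is the one for iteration 200, which is the file loaded in the testing step.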
Test the model
Here we will test the trained model on a sample image from the dataset. Refer to test.py for the complete code.
```python
# Run testing
from mmseg.apis import init_model, inference_model, show_result_pyplot
import mmcv
import matplotlib.pyplot as plt

# inference with trained model
# Init the model from the config and the checkpoint
checkpoint_path = './work_dirs/tutorial/iter_200.pth'
model = init_model(cfg, checkpoint_path, 'cuda:0')

img = mmcv.imread('./iccv09Data/images/6000124.jpg')
result = inference_model(model, img)
plt.figure(figsize=(8, 6))
vis_result = show_result_pyplot(model, img, result)
plt.imshow(mmcv.bgr2rgb(vis_result))
plt.show()
```
Segmentation Results
Below are the Intersection over Union (IoU) and Accuracy (Acc) results for each class:
| Class  | IoU (%) | Acc (%) |
|--------|---------|---------|
| Sky    | 87.71   | 91.04   |
| Tree   | 68.34   | 79.82   |
| Road   | 89.29   | 95.95   |
| Grass  | 77.29   | 83.47   |
| Water  | 78.92   | 85.91   |
| Bldg   | 75.41   | 90.72   |
| Mntn   | 32.13   | 54.24   |
| Fg Obj | 68.24   | 78.41   |
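To make the two columns concrete, the sketch below computes per-class IoU and accuracy the way these metrics are commonly defined for segmentation: IoU is intersection over union of predicted and ground-truth pixels, and accuracy is the fraction of a class's ground-truth pixels that were predicted correctly. It is a plain-Python illustration on flat label lists, not the evaluator used to produce the table; the function name and the ignore label of 255 are assumptions.

```python
def per_class_iou_acc(pred, target, num_classes, ignore_index=255):
    """Return ([IoU per class], [Acc per class]) from flat label sequences.

    IoU = intersection / union; Acc = intersection / ground-truth pixels.
    Classes with no pixels yield None.
    """
    intersect = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # ignored pixels do not count toward any class
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersect[t] += 1
    ious, accs = [], []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersect[c]
        ious.append(intersect[c] / union if union else None)
        accs.append(intersect[c] / target_area[c] if target_area[c] else None)
    return ious, accs


pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 255, 0]
ious, accs = per_class_iou_acc(pred, target, num_classes=2)
print(ious, accs)
```

A class such as Mntn can show a much lower IoU than accuracy when the model also predicts that class on pixels belonging to other classes, since false positives enlarge the union but do not affect accuracy.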
Visualization Results
[Figure: four sample results from the segmentation model]
MMSegmentation Model Zoo
MMSegmentation provides a wide range of pre-trained models for semantic segmentation tasks. Here are some popular models available in the MMSegmentation Model Zoo:
| Model       | Description                                                                         | Configuration Link         |
|-------------|-------------------------------------------------------------------------------------|----------------------------|
| FCN         | Fully Convolutional Network for semantic segmentation.                              | FCN Configurations         |
| PSPNet      | Pyramid Scene Parsing Network for scene understanding.                              | PSPNet Configurations      |
| DeepLabV3   | Atrous Spatial Pyramid Pooling for semantic image segmentation.                     | DeepLabV3 Configurations   |
| DeepLabV3+  | Enhanced DeepLabV3 with encoder-decoder structure for better segmentation results.  | DeepLabV3+ Configurations  |
| UPerNet     | Unified Perceptual Parsing for scene segmentation tasks.                            | UPerNet Configurations     |
| SegFormer   | A simple and efficient design for semantic segmentation with Transformers.          | SegFormer Configurations   |
| Mask2Former | A universal segmentation architecture for image and video segmentation tasks.       | Mask2Former Configurations |
| HRNet       | High-Resolution Network for accurate and detailed semantic segmentation.            | HRNet Configurations       |
| OCRNet      | Object-Contextual Representations for semantic segmentation.                        | OCRNet Configurations      |
| Fast-SCNN   | Fast Semantic Segmentation Network for real-time segmentation on mobile devices.    | Fast-SCNN Configurations   |
Note: For a comprehensive list of models and their configurations, please refer to the MMSegmentation Model Zoo.
Adapting the Tutorial for Different Models
To use a model other than PSPNet in this tutorial, you need to:
- Modify the Configuration File: Replace the PSPNet configuration with the desired model's configuration file. For example, to use DeepLabV3+, download its configuration from the DeepLabV3+ Configurations and update the paths accordingly.
- Download the Pre-trained Weights: Obtain the pre-trained weights corresponding to the chosen model. You can find the appropriate weights in the model's configuration directory or the MMSegmentation Model Zoo.
- Update the Code: Ensure that the code references the new configuration file and pre-trained weights. Adjust any model-specific parameters as needed.
By following these steps, you can seamlessly switch to different models within the MMSegmentation framework. 🚀
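One model-specific detail that tends to change between architectures is where the number of classes lives: some models have no auxiliary head, and others may define a list of decode heads. The helper below is a hedged sketch of how that adjustment could be made generic. `set_num_classes` is a hypothetical function, and it is demonstrated on plain dictionaries that stand in for the model section of a loaded config, so verify the head layout of the config you actually use.

```python
def set_num_classes(model_cfg, num_classes,
                    head_keys=("decode_head", "auxiliary_head")):
    """Set `num_classes` on every head found under the given keys, in place.

    A head entry may be a dict, a list of dicts, or missing altogether.
    Returns how many heads were updated.
    """
    updated = 0
    for key in head_keys:
        heads = model_cfg.get(key)
        if heads is None:
            continue  # e.g. a model without an auxiliary head
        for head in (heads if isinstance(heads, (list, tuple)) else [heads]):
            head["num_classes"] = num_classes
            updated += 1
    return updated


# Plain dicts stand in for the model section of a loaded config.
single = {"decode_head": {"type": "SomeHead", "num_classes": 19}}
cascade = {"decode_head": [{"num_classes": 19}, {"num_classes": 19}],
           "auxiliary_head": {"num_classes": 19}}
print(set_num_classes(single, 8), set_num_classes(cascade, 8))  # 1 3
```

Checking the returned count against the number of heads you expect is a cheap way to notice a head that was missed.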
About This Tutorial
This tutorial is based on MMSegmentation by OpenMMLab, a powerful open-source toolbox for semantic segmentation. You can find their official repository here: MMSegmentation GitHub and their official tutorial here: MMSegmentation Tutorial.
Owner
- Name: Majedaldein Almahasneh
- Login: MjdMahasneh
- Kind: user
- Location: United Kingdom
- Company: Department of Computer Science, Swansea University, Swansea, UK.
- Repositories: 1
- Profile: https://github.com/MjdMahasneh
Ph.D. in Machine Learning
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMSegmentation Contributors"
title: "OpenMMLab Semantic Segmentation Toolbox and Benchmark"
date-released: 2020-07-10
url: "https://github.com/open-mmlab/mmsegmentation"
license: Apache-2.0
```



