yolov8-multi-task
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 2 DOI reference(s) in README -
✓Academic publication links
Links to: arxiv.org, scholar.google, springer.com, ieee.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (8.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: JiayuanWang-JW
- License: agpl-3.0
- Language: Python
- Default Branch: main
- Size: 19.4 MB
Statistics
- Stars: 346
- Watchers: 4
- Forks: 55
- Open Issues: 42
- Releases: 0
Metadata Files
README.md
You Only Look at Once for Real-time and Generic Multi-Task
This repository (YOLOv8 multi-task) is the official PyTorch implementation of the paper "You Only Look at Once for Real-time and Generic Multi-Task".
by Jiayuan Wang, Q. M. Jonathan Wu :email: and Ning Zhang
(:email:) corresponding author.
The Illustration of A-YOLOM

Contributions
- We developed a lightweight model that integrates three tasks into a single unified model. This is particularly beneficial for multi-task applications that demand real-time processing.
- We designed a novel Adaptive Concatenate Module specifically for the neck region of segmentation architectures. This module adaptively concatenates features without manual design, further enhancing the model's generality.
- We designed a lightweight, simple, and generic segmentation head that is built only from a series of convolutional layers. Task heads of the same type share a unified loss function, so no custom design is needed for specific tasks.
- Extensive experiments on publicly accessible autonomous driving datasets demonstrate that our model can outperform existing works, particularly in terms of inference time and visualization. Further experiments on real road datasets also demonstrate that our model significantly outperforms state-of-the-art approaches.
Results
Parameters and speed
| Model | Parameters | FPS (bs=1) | FPS (bs=32) |
|----------------|-------------|------------|-------------|
| YOLOP | 7.9M | 26.0 | 134.8 |
| HybridNet | 12.83M | 11.7 | 26.9 |
| YOLOv8n(det) | 3.16M | 102 | 802.9 |
| YOLOv8n(seg) | 3.26M | 82.55 | 610.49 |
| A-YOLOM(n) | 4.43M | 39.9 | 172.2 |
| A-YOLOM(s) | 13.61M | 39.7 | 96.2 |
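To relate the FPS figures above to per-image latency, the conversion can be sketched as follows. This is only an illustrative calculation on values copied from the table; the helper names `latency_ms` and `batch_time_s` are hypothetical and not part of this repository.

```python
def latency_ms(fps: float) -> float:
    """Average time per image in milliseconds for a throughput in frames per second."""
    return 1000.0 / fps


def batch_time_s(fps: float, batch_size: int) -> float:
    """Average time in seconds to process one batch at the given throughput."""
    return batch_size / fps


# FPS values copied from the table above: (bs=1, bs=32).
fps_table = {
    "YOLOP": (26.0, 134.8),
    "A-YOLOM(n)": (39.9, 172.2),
    "A-YOLOM(s)": (39.7, 96.2),
}

for name, (fps_bs1, fps_bs32) in fps_table.items():
    print(
        f"{name}: {latency_ms(fps_bs1):.1f} ms per image at bs=1, "
        f"{batch_time_s(fps_bs32, 32):.3f} s per batch at bs=32"
    )
```

For example, 39.9 FPS at bs=1 corresponds to roughly 25 ms per image.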
Traffic Object Detection Result
| Model | Recall (%) | mAP50 (%) |
|-------------|------------|------------|
| MultiNet | 81.3 | 60.2 |
| DLT-Net | 89.4 | 68.4 |
| Faster R-CNN| 81.2 | 64.9 |
| YOLOv5s | 86.8 | 77.2 |
| YOLOv8n(det)| 82.2 | 75.1 |
| YOLOP | 88.6 | 76.5 |
| A-YOLOM(n) | 85.3 | 78.0 |
| A-YOLOM(s) | 86.9 | 81.1 |
Drivable Area Segmentation Result
| Model | mIoU (%) |
|----------------|----------|
| MultiNet | 71.6 |
| DLT-Net | 72.1 |
| PSPNet | 89.6 |
| YOLOv8n(seg) | 78.1 |
| YOLOP | 91.6 |
| A-YOLOM(n) | 90.5 |
| A-YOLOM(s) | 91.0 |
Lane Detection Result:
| Model | Accuracy (%) | IoU (%) |
|----------------|--------------|---------|
| Enet | N/A | 14.64 |
| SCNN | N/A | 15.84 |
| ENet-SAD | N/A | 16.02 |
| YOLOv8n(seg) | 80.5 | 22.9 |
| YOLOP | 84.8 | 26.5 |
| A-YOLOM(n) | 81.3 | 28.2 |
| A-YOLOM(s) | 84.9 | 28.8 |
Ablation Studies 1: Adaptive concatenation module:
| Training method | Recall (%) | mAP50 (%) | mIoU (%) | Accuracy (%) | IoU (%) |
|-----------------|------------|-----------|----------|--------------|---------|
| YOLOM(n) | 85.2 | 77.7 | 90.6 | 80.8 | 26.7 |
| A-YOLOM(n) | 85.3 | 78.0 | 90.5 | 81.3 | 28.2 |
| YOLOM(s) | 86.9 | 81.1 | 90.9 | 83.9 | 28.2 |
| A-YOLOM(s) | 86.9 | 81.1 | 91.0 | 84.9 | 28.8 |
Ablation Studies 2: Results of different Multi-task model and segmentation structure:
| Model | Parameters | mIoU (%) | Accuracy (%) | IoU (%) |
|----------------|------------|----------|--------------|---------|
| YOLOv8(segda) | 1004275 | 78.1 | - | - |
| YOLOv8(segll) | 1004275 | - | 80.5 | 22.9 |
| YOLOv8(multi) | 2008550 | 84.2 | 81.7 | 24.3 |
| YOLOM(n) | 15880 | 90.6 | 80.8 | 26.7 |
For YOLOv8(multi) and YOLOM(n), the table lists only the combined parameters of the two segmentation heads. Both models actually have three heads; the detection head parameters are omitted because this is an ablation study of the segmentation structure.
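The parameter counts in this table can be compared with a few lines of arithmetic. The snippet below is only a worked calculation on the numbers listed above; the variable names are illustrative.

```python
# Segmentation-head parameter counts copied from the table above.
single_head_yolov8 = 1_004_275  # YOLOv8(segda) or YOLOv8(segll), one head each
two_heads_yolov8 = 2_008_550    # YOLOv8(multi), two segmentation heads
two_heads_yolom = 15_880        # YOLOM(n), two segmentation heads

# The YOLOv8(multi) figure equals two copies of the single-task head.
heads_match = two_heads_yolov8 == 2 * single_head_yolov8

# Ratio between the two-head totals of YOLOv8(multi) and YOLOM(n).
reduction = two_heads_yolov8 / two_heads_yolom

print(f"two heads == 2 x one head: {heads_match}")
print(f"YOLOM(n) segmentation heads are about {reduction:.1f}x smaller")
```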
Notes:
- The works we have used for reference include Multinet (paper, code), DLT-Net (paper), Faster R-CNN (paper, code), YOLOv5s (code), PSPNet (paper, code), ENet (paper, code), SCNN (paper, code), SAD-ENet (paper, code), YOLOP (paper, code), HybridNets (paper, code), and YOLOv8 (code). Thanks for their wonderful work.
Recommendation:
- If you seek higher performance and can tolerate reduced speed and increased model complexity, we recommend our latest model, RMT-PPAD. It is built on RT-DETR to implement multi-task learning and still achieves real-time performance on an RTX 4090 GPU.
Visualization
Real Road

Requirement
This codebase has been developed with Python==3.7.16 and PyTorch==1.13.1.
A 1080Ti GPU with a batch size of 16 is sufficient, although training will take longer. We recommend a 4090 or a more powerful GPU for faster training.
We strongly recommend creating a clean environment and following our instructions to set it up. Otherwise, you may encounter issues, because YOLOv8 has many mechanisms that automatically detect the packages in your environment and then change some variable values, which affects how the code runs.
setup
cd YOLOv8-multi-task
pip install -e .
Data preparation and Pre-trained model
Download
Download the images from images.
Pre-trained model: A-YOLOM, which includes two versions, scale "n" and "s".
Download the annotations of detection from detection-object.
Download the annotations of drivable area segmentation from seg-drivable-10.
Download the annotations of lane line segmentation from seg-lane-11.
We recommend the dataset directory structure to be the following:
```
# The id represents the correspondence relation
├─dataset root
│ ├─images
│ │ ├─train2017
│ │ ├─val2017
│ ├─detection-object
│ │ ├─labels
│ │ │ ├─train2017
│ │ │ ├─val2017
│ ├─seg-drivable-10
│ │ ├─labels
│ │ │ ├─train2017
│ │ │ ├─val2017
│ ├─seg-lane-11
│ │ ├─labels
│ │ │ ├─train2017
│ │ │ ├─val2017
```
Update your dataset path in ./ultralytics/datasets/bdd-multi.yaml.
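Before training, it can help to confirm that the dataset root matches the recommended layout. The following is a minimal sketch, not part of this repository: `EXPECTED_DIRS` is transcribed from the directory tree above, and `missing_dirs` is a hypothetical helper name.

```python
from pathlib import Path

# Sub-directories expected under the dataset root, taken from the tree above.
EXPECTED_DIRS = [
    "images/train2017",
    "images/val2017",
    "detection-object/labels/train2017",
    "detection-object/labels/val2017",
    "seg-drivable-10/labels/train2017",
    "seg-drivable-10/labels/val2017",
    "seg-lane-11/labels/train2017",
    "seg-lane-11/labels/val2017",
]


def missing_dirs(dataset_root):
    """Return the expected sub-directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs("/path/to/dataset root")` returns an empty list when every expected folder is present, and otherwise lists the folders that still need to be created or downloaded.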
Training
You can set the training configuration in the ./ultralytics/yolo/cfg/default.yaml.
python train.py
You can change the settings in train.py:
```python
import sys

# setting
# Change this to the local path of your "ultralytics" folder.
sys.path.insert(0, "/home/jiayuan/ultralytics-main/ultralytics")

from ultralytics import YOLO

# Change the model path to yours.
# The model files are saved under "./ultralytics/models/v8".
model = YOLO('/home/jiayuan/ultralytics-main/ultralytics/models/v8/yolov8-bdd-v4-one-dropout-individual.yaml', task='multi')

model.train(data='/home/jiayuan/ultralytics-main/ultralytics/datasets/bdd-multi-toy.yaml', batch=4, epochs=300, imgsz=(640,640), device=[4], name='v4640', val=True, task='multi', classes=[2,3,4,9,10,11], combine_class=[2,3,4,9], single_cls=True)
```
- data: Please change the "data" path to yours. You can find it under "./ultralytics/datasets".
- device: If you have multiple GPUs, please list your GPU numbers, such as [0,1,2,3,4,5,6,7,8].
- name: Your project name. The results and the trained model will be saved under "./ultralytics/runs/multi/Your Project Name".
- task: If you want to use the multi-task model, please keep "multi" here.
- classes: Controls which classes are used in training; 10 and 11 denote drivable area and lane line segmentation. You can create or change the dataset map under "./ultralytics/datasets/bdd-multi.yaml".
- combine_class: The model will combine the listed "classes" into one class. For example, our project combines "car", "bus", "truck", and "train" into "vehicle".
- single_cls: Combines all detection classes into one class. For example, if your dataset has 7 classes, "single_cls" automatically combines them into one class. If you set single_cls=False or delete single_cls from model.train(), please follow the Note below to change "tnc" in both dataset.yaml and model.yaml, "nc_list" in dataset.yaml, and the output of the detection head.
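The effect of "classes" and "combine_class" on label ids can be pictured with a small stand-alone sketch. This is not the repository's implementation: `remap_classes` is a hypothetical helper, and using the smallest combined id as the merged id is an assumption made purely for illustration.

```python
def remap_classes(label_ids, classes, combine_class):
    """Keep only ids listed in `classes` and fold ids in `combine_class` into one id.

    Assumption for illustration: the merged class takes the smallest id in
    `combine_class`. The actual repository may assign ids differently.
    """
    merged_id = min(combine_class)
    remapped = []
    for class_id in label_ids:
        if class_id not in classes:
            continue  # classes not selected for training are dropped
        remapped.append(merged_id if class_id in combine_class else class_id)
    return remapped


# Values taken from the model.train() call above.
classes = [2, 3, 4, 9, 10, 11]
combine_class = [2, 3, 4, 9]

print(remap_classes([2, 3, 5, 9, 10, 11, 4], classes, combine_class))
```

In this sketch, id 5 is dropped because it is not in "classes", ids 2, 3, 4, and 9 collapse into a single merged class, and the segmentation ids 10 and 11 pass through unchanged.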
Evaluation
You can set the evaluation configuration in the ./ultralytics/yolo/cfg/default.yaml
python val.py
You can change the settings in val.py:
```python
import sys

# setting
# As in training, change this path to yours.
sys.path.insert(0, "/home/jiayuan/yolom/ultralytics")

from ultralytics import YOLO

# Change this path to your trained model. You can use our provided pre-trained model
# or your own model under "./ultralytics/runs/multi/Your Project Name/weight/best.pt".
model = YOLO('/home/jiayuan/ultralytics-main/ultralytics/runs/best.pt')

metrics = model.val(data='/home/jiayuan/ultralytics-main/ultralytics/datasets/bdd-multi.yaml', device=[3], task='multi', name='val', iou=0.6, conf=0.001, imgsz=(640,640), classes=[2,3,4,9,10,11], combine_class=[2,3,4,9], single_cls=True)
```
- data: Please change the "data" path to yours. You can find it under "./ultralytics/datasets"
- device: If you have multiple GPUs, please list your GPU numbers, such as [0,1,2,3,4,5,6,7,8]. We do not recommend using multiple GPUs in val, because one GPU is usually enough.
- speed: If you want to calculate the FPS, set "speed=True". This FPS calculation method follows HybridNets (code).
- single_cls: Should keep the same boolean value as in training.
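As a rough picture of what an FPS measurement involves, the sketch below times repeated calls after a warm-up phase. It is a generic timing loop under stated assumptions, not the HybridNets-based method used by "speed=True"; `measure_fps` and `dummy_inference` are hypothetical names, and the dummy workload stands in for a real forward pass.

```python
import time


def measure_fps(run_once, n_iters=100, n_warmup=10, batch_size=1):
    """Return images per second for `run_once`, which processes one batch per call."""
    for _ in range(n_warmup):
        run_once()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(n_iters):
        run_once()
    elapsed = time.perf_counter() - start
    return n_iters * batch_size / elapsed


def dummy_inference():
    """Placeholder workload standing in for a model forward pass."""
    return sum(i * i for i in range(1000))


fps = measure_fps(dummy_inference, n_iters=50, n_warmup=5)
print(f"{fps:.1f} FPS")
```

When timing a real model on a GPU, the device should be synchronized before each clock read, since CUDA kernels run asynchronously; otherwise the measured time may not include the full inference.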
Prediction
python predict.py
You can change the settings in predict.py:
```python
import sys

# setting
sys.path.insert(0, "/home/jiayuan/ultralytics-main/ultralytics")

from ultralytics import YOLO

# Number of tasks in your work. With 1 detection and 3 segmentation tasks, this should be 4.
number = 3

model = YOLO('/home/jiayuan/ultralytics-main/ultralytics/runs/best.pt')
model.predict(source='/data/jiayuan/dashcamaradataset/daytime', imgsz=(384,672), device=[3], name='v4daytime', save=True, conf=0.25, iou=0.45, show_labels=False)
# The prediction results will be saved under the "runs" folder.
```
PS: If you want to use our provided pre-trained model, please make sure that your input images are of size (720,1280) and keep "imgsz=(384,672)" to achieve the best performance. You can change the "imgsz" value, but the results may differ because it no longer matches the training size.
- source: The folder containing the images you want to predict.
- show_labels=False: Hides the labels. Please keep in mind that when you use a pre-trained model with "single_cls=True", labels will default to displaying the first class name instead.
- boxes=False: Hides the boxes for segmentation tasks.
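The relation between a (720,1280) input and "imgsz=(384,672)" can be worked out with simple geometry. The sketch below assumes an aspect-preserving letterbox resize, as is common in YOLO-style preprocessing; this README does not describe the exact preprocessing, and `letterbox_geometry` is a hypothetical helper.

```python
def letterbox_geometry(src_hw, dst_hw):
    """Compute scale, resized shape, and total padding for an aspect-preserving resize.

    Assumption: the image is scaled by one factor so that it fits inside
    `dst_hw`, and the remaining rows or columns are filled with padding.
    """
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h, new_w = round(src_h * scale), round(src_w * scale)
    padding = (dst_h - new_h, dst_w - new_w)
    return scale, (new_h, new_w), padding


scale, resized, padding = letterbox_geometry((720, 1280), (384, 672))
print(f"scale={scale:.3f}, resized={resized}, padding={padding}")
```

Under this assumption, a 720x1280 frame is scaled by 0.525 to 378x672, and 6 rows of padding bring the height to 384, which, like 672, is a multiple of 32.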
Note
This code is easy to extend to any combination of multiple segmentation and detection tasks. You only need to modify the model yaml and dataset yaml files and create your dataset following our label format. Please keep in mind that you should keep "det" in your detection task names and "seg" in your segmentation task names; then the code will work. There is no need to modify the basic code, since we have done the necessary work there.
- Please keep in mind that when you change the number of classes of the detection task, you must change "tnc" in dataset.yaml and model.yaml. "tnc" means the total number of classes, including detection and segmentation. For example, if you have 7 classes for detection and two segmentation tasks with 1 class each, "tnc" should be set to 9.
- "nc_list" also needs to be updated; it should match the order of your "labels_list". For example, if your "labels_list" is detection-object, seg-drivable, seg-lane, then "nc_list" should be [7,1,1]. That means you have 7 classes in detection-object, 1 class in drivable segmentation, and 1 class in lane segmentation.
- You also need to change the number of detection head outputs in model.yaml. For example, in " - [[15, 18, 21], 1, Detect, [int number for detection class]] # 36 Detect(P3, P4, P5)", change "int number for detection class" to the number of classes in your detection task; following the example above, this should be 7.
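The consistency rules in this note can be expressed as a small check. This is a hedged sketch rather than code from the repository: `check_task_config` is a hypothetical helper, and its arguments simply mirror the yaml entries for the task list, the per-task class counts, and "tnc" described above.

```python
def check_task_config(labels_list, nc_list, tnc):
    """Return a list of human-readable problems; an empty list means consistent."""
    errors = []
    if len(labels_list) != len(nc_list):
        errors.append("the class-count list must have one entry per task")
    if sum(nc_list) != tnc:
        errors.append(f"tnc is {tnc} but the class counts sum to {sum(nc_list)}")
    for name in labels_list:
        # Task names must contain "det" or "seg" so the task type can be recognized.
        if "det" not in name and "seg" not in name:
            errors.append(f"task name '{name}' contains neither 'det' nor 'seg'")
    return errors


# Example from the note above: 7 detection classes and two 1-class segmentation tasks.
errors = check_task_config(
    labels_list=["detection-object", "seg-drivable", "seg-lane"],
    nc_list=[7, 1, 1],
    tnc=9,
)
print(errors or "configuration is consistent")
```

Running such a check before training catches the common mistake of changing the number of detection classes without updating "tnc".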
If you want to change the basic code to implement your own idea, please search for "###### Jiayuan" or "######Jiayuan". We changed these parts of YOLOv8 (code) to implement multi-task learning in a single model.
Citation
If you find our paper and code useful for your research, please consider giving a star :star: and citation :pencil: :
BibTeX
@ARTICLE{wang2024you,
author={Wang, Jiayuan and Wu, Q. M. Jonathan and Zhang, Ning},
journal={IEEE Transactions on Vehicular Technology},
title={You Only Look at Once for Real-Time and Generic Multi-Task},
year={2024},
pages={1-13},
keywords={Multi-task learning;panoptic driving perception;object detection;drivable area segmentation;lane line segmentation},
doi={10.1109/TVT.2024.3394350}}
Owner
- Name: Jiayuan Wang
- Login: JiayuanWang-JW
- Kind: user
- Location: Windsor
- Company: University of Windsor
- Repositories: 1
- Profile: https://github.com/JiayuanWang-JW
Citation (CITATION.cff)
cff-version: 1.2.0
preferred-citation:
  type: software
  message: If you use this software, please cite it as below.
  authors:
    - family-names: Jocher
      given-names: Glenn
      orcid: "https://orcid.org/0000-0001-5950-6979"
    - family-names: Chaurasia
      given-names: Ayush
      orcid: "https://orcid.org/0000-0002-7603-6750"
    - family-names: Qiu
      given-names: Jing
      orcid: "https://orcid.org/0000-0003-3783-7069"
  title: "YOLO by Ultralytics"
  version: 8.0.0
  # doi: 10.5281/zenodo.3908559 # TODO
  date-released: 2023-1-10
  license: AGPL-3.0
  url: "https://github.com/ultralytics/ultralytics"
GitHub Events
Total
- Issues event: 40
- Watch event: 98
- Issue comment event: 98
- Push event: 2
- Fork event: 15
Last Year
- Issues event: 40
- Watch event: 98
- Issue comment event: 98
- Push event: 2
- Fork event: 15
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 11
- Total pull requests: 1
- Average time to close issues: 8 days
- Average time to close pull requests: 20 days
- Total issue authors: 8
- Total pull request authors: 1
- Average comments per issue: 1.64
- Average comments per pull request: 1.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 11
- Pull requests: 1
- Average time to close issues: 8 days
- Average time to close pull requests: 20 days
- Issue authors: 8
- Pull request authors: 1
- Average comments per issue: 1.64
- Average comments per pull request: 1.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- worlkingbling (6)
- yinheyanxian (4)
- Joungjimin (4)
- oyoanan (3)
- s0966066980 (2)
- wycrystal (2)
- loganwu0526 (2)
- PlutoXN (2)
- jasfa (2)
- TonyMacedonia (2)
- Miaonika (2)
- Jeremy-zhangyichen (2)
- Fuheng188 (2)
- kCW-tb (2)
- YangBo0411 (2)
Pull Request Authors
- Reversev (1)
- violetcodes (1)
- Venkat-1405 (1)
- jasfa (1)
- hu874 (1)
- FreeWilD77 (1)
- PotatoNU (1)
- taotaoland (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- pytorch/pytorch 2.0.0-cuda11.7-cudnn8-runtime build
- Pillow >=7.1.2
- PyYAML >=5.3.1
- matplotlib >=3.2.2
- opencv-python >=4.6.0
- pandas >=1.1.4
- psutil *
- requests >=2.23.0
- scipy >=1.4.1
- seaborn >=0.11.0
- torch >=1.7.0
- torchvision >=0.8.1
- tqdm >=4.64.0
- Pillow >=7.1.2
- PyYAML >=5.3.1
- check-manifest *
- coremltools >=6.0
- coverage *
- matplotlib >=3.2.2
- mkdocs-material *
- mkdocs-redirects *
- mkdocs-ultralytics-plugin *
- mkdocstrings *
- opencv-python >=4.6.0
- openvino-dev >=2022.3
- pandas >=1.1.4
- psutil *
- pytest *
- pytest-cov *
- requests >=2.23.0
- scipy >=1.4.1
- seaborn >=0.11.0
- sentry_sdk *
- tensorflowjs *
- torch >=1.7.0
- torchvision >=0.8.1
- tqdm >=4.64.0