yolov8detach

Stroma Challenge via Yolov8 with freeze support. Original branch with freeze support in website

https://github.com/ubeydemavus/yolov8detach

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
✓ Academic publication links: links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (14.5%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: ubeydemavus
License: gpl-3.0
Language: Python
Default Branch: main
Homepage: https://github.com/theATM/yolov8
Size: 11.8 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Releases: 0

Created over 3 years ago · Last pushed over 3 years ago

Metadata Files

Readme Contributing License Citation

Stroma Challenge

This repo was created as a response to the Stroma challenge.

Challenge summary:

We are building a computer vision pipeline for a nuts-and-bolts manufacturer. You are expected to use common machine learning frameworks to implement a system to detect, classify and track falling objects. You may use Google Colab to run your experiments. Your final outcome should be able to accurately track different types of nuts and bolts, even under varying lighting conditions, and keep count of them.

Pipeline

The object tracking pipeline uses the Yolov8n model for detection. Ultralytics provides a framework to train Yolo architectures easily. However, I used a different variant of the repo that adds layer-freezing capabilities during training.

The framework provides most of the features desired in a production detection backend, such as fusing layers to lower computational needs, exporting to formats like ONNX (for edge devices such as Jetson), and data augmentation out of the box.

For tracking, the pipeline uses the Simple Online and Realtime Tracking with a Deep Association Metric (DeepSORT) algorithm. The Python implementation I used is the deep-sort-realtime package.
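The glue between the two stages is mostly a box-format conversion. The sketch below is illustrative and is not taken from this repo's scripts: it assumes the detector yields corner-style rows (x1, y1, x2, y2, confidence, class name) and converts them to the ([left, top, width, height], confidence, class) tuples that deep-sort-realtime documents as input to `update_tracks`. The helper name `to_deepsort_detections` and the `min_conf` threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed input: one (x1, y1, x2, y2, confidence, class_name) row per detection,
# with corner coordinates in pixels.
Detection = Tuple[float, float, float, float, float, str]


def to_deepsort_detections(rows: Sequence[Detection], min_conf: float = 0.25) -> List[tuple]:
    """Convert corner boxes to ([left, top, width, height], confidence, class)."""
    converted = []
    for x1, y1, x2, y2, conf, cls in rows:
        if conf < min_conf:
            continue  # drop weak detections before they can spawn tracks
        converted.append(([x1, y1, x2 - x1, y2 - y1], conf, cls))
    return converted


# Two made-up detections; the second falls below the confidence threshold.
rows = [(10.0, 20.0, 50.0, 80.0, 0.91, "bolt"), (5.0, 5.0, 9.0, 9.0, 0.10, "nut")]
detections = to_deepsort_detections(rows)
print(detections)  # [([10.0, 20.0, 40.0, 60.0], 0.91, 'bolt')]
```

Filtering by confidence before tracking is one way to keep spurious detections from receiving track IDs; the threshold would need tuning on the actual videos.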

Installation

The pipeline was developed with Python 3.9, so you will need a Python 3.9 interpreter.

First install the training framework, then install the other packages the pipeline needs.

virtualenv st  # create environment
git clone https://github.com/ubeydemavus/yolov8detach
cd yolov8detach && pip install -e .

If you have a GPU that supports CUDA, you may want to reinstall torch with CUDA support. Install the CUDA-enabled torch build that matches the CUDA version installed on your computer; follow the instructions on the PyTorch website to pick the correct build.

pip install --ignore-installed torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

Tracking-related libraries:

pip install deep-sort-realtime
pip install motmetrics

Transforming the dataset

After the installation, move the challenge folder into the yolov8detach folder. Extract the archive and move its contents, not the archive itself. Then run the command below, which transforms the video files and annotations of the dataset into the format the framework expects for training.

python coco2yolo.py
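The core of such a conversion is a change of box convention. The sketch below is not the repo's coco2yolo.py; it assumes the annotations use the standard COCO box layout [x_min, y_min, width, height] in pixels (inferred from the script name) and shows the arithmetic that produces a YOLO label line with a normalized box center and size. The function names and sample numbers are made up.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to a
    YOLO box (x_center, y_center, width, height) normalized to [0, 1]."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


def yolo_label_line(class_id, bbox, img_w, img_h):
    """Format one line of a YOLO label file: 'class cx cy w h'."""
    cx, cy, w, h = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Hypothetical annotation: a 64x32 px box at (100, 200) in a 640x640 frame.
line = yolo_label_line(0, [100, 200, 64, 32], 640, 640)
print(line)  # 0 0.206250 0.337500 0.100000 0.050000
```

COCO category IDs are often 1-based while YOLO class indices start at 0, so a real converter usually also needs an ID mapping.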

Training

Although you get a copy of the model trained on the nuts and bolts dataset when you clone this repo, you may re-train it starting from the pretrained Yolov8n checkpoint (trained on the COCO dataset). The framework downloads the necessary weights. You may also skip this step and move on to the next one. The model trained on the nuts and bolts dataset is about 6 MB in size, needs about 8 GFLOPs, and has 3.7M parameters.

To start training, run the command below. The script takes a pretrained Yolov8n model, freezes the first 22 blocks/layers (everything except the detection head), then trains for 80 epochs with default augmentation settings, changed optimizer configurations, and correct class weights. When training is finished, the trained model can be found in the "runs/" folder along with important metrics.
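Layer freezing in PyTorch usually comes down to switching off gradients for the selected parameters. The sketch below is a generic illustration, not the code in challenge_train.py: it assumes parameter names follow the pattern `model.<block index>.<rest>` and that the detection head is block 22. The function name and the small stand-in model are hypothetical; the stand-in only exists so the example runs without PyTorch, and any object with a `named_parameters()` method (such as a `torch.nn.Module`) could be passed instead.

```python
from types import SimpleNamespace


def freeze_first_blocks(model, num_blocks=22):
    """Disable gradients for parameters of blocks 0..num_blocks-1.

    Assumes parameter names look like 'model.<block index>.<rest>'.
    Returns the names of the parameters that were frozen.
    """
    # The trailing dot keeps 'model.2.' from also matching 'model.22.'.
    prefixes = tuple(f"model.{i}." for i in range(num_blocks))
    frozen = []
    for name, param in model.named_parameters():
        if name.startswith(prefixes):
            param.requires_grad = False
            frozen.append(name)
    return frozen


class FakeModel:
    """Stand-in that mimics named_parameters() so the sketch runs without PyTorch."""

    def __init__(self, names):
        self._params = [(n, SimpleNamespace(requires_grad=True)) for n in names]

    def named_parameters(self):
        return iter(self._params)


demo = FakeModel([
    "model.0.conv.weight",
    "model.21.m.0.cv1.conv.weight",
    "model.22.cv2.0.0.conv.weight",  # assumed detection head, stays trainable
])
frozen = freeze_first_blocks(demo)
print(frozen)  # ['model.0.conv.weight', 'model.21.m.0.cv1.conv.weight']
```

Only the parameters left with `requires_grad=True` receive gradient updates, which is what keeps the pretrained backbone features intact while the head adapts to the new classes.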

python challenge_train.py

The model training metrics on the nuts and bolts dataset

Other metrics, such as the confusion matrix, F1 curves, etc., can be found under "runs/detect/train2/".

Output example of detection from validation batch

Other examples can be found under "runs/detect/train2/"

Object tracking

When training is complete (or you skipped to this step), run the command below. This script will launch a window and allow you to see the tracking results in realtime for the "test.mp4" file.

When the video finishes, the tracking metrics (MOTA, MOTP) are calculated and printed in the terminal where you ran the script.

python track_objects.py

Note: If you trained your own model, don't forget to change the line to match your newly trained network weights in the script.

```
...
# Load the model trained on the bolts and nuts dataset.
detector = YOLO(join(current_dir, "runs/detect/train2/weights/best.pt"))
...
```

Comments on the results

The detection metrics for the trained model are as follows:

| Metric | Score |
|--------|-------|
| mAP50 (mAP calculated at IoU threshold 0.5) | 0.99069 |
| mAP50-95 (mAP calculated at IoU thresholds 0.5 to 0.95) | 0.85146 |
| Precision | 0.96329 |
| Recall | 0.97091 |

The tracking metrics for the whole pipeline are as follows:

| Metric | Score |
|--------|-------|
| MOTA | -0.930043 |
| MOTP | 0.594262 |

The detection metrics indicate that classification accuracy is very high when the bounding box is correctly placed (that is, when it meets the IoU threshold).

I even suspect that the detection network has overfit the dataset, so I would retrain it with stronger augmentations to reduce overfitting.

On the other hand, tracking metrics indicate that object tracking accuracy is not good.

I believe this is because of the default tracking strategy employed by the deepsort package. Most objects that are detected confidently are tracked very well by the deepsort algorithm. However, the tracking IDs given to these detections don't match the dataset track_ids, because even wrong detections (occlusions or misfirings of yolov8) are given a tracking ID, and the next ID to be given is increased. This has a cascading effect on the IDs given to detected objects: once an object is missed or a wrong detection happens, all remaining object IDs mismatch the dataset track_ids.
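For context on the negative MOTA value: in the CLEAR MOT definition, MOTA is one minus the ratio of all errors (misses, false positives, and identity switches) to the number of ground-truth objects, summed over all frames. It therefore drops below zero as soon as the errors outnumber the ground-truth objects. The sketch below only illustrates that arithmetic; the counts are made up and are not taken from the challenge run.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Made-up counts for 100 ground-truth object instances.
good_run = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
bad_run = mota(false_negatives=60, false_positives=90, id_switches=40, num_gt=100)

print(f"few errors:  MOTA = {good_run:.2f}")
print(f"many errors: MOTA = {bad_run:.2f}")
```

In the second case the 190 errors exceed the 100 ground-truth instances, so the score is negative even though the tracker may still follow many objects correctly.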

Further improvement suggestions

The challenge period was short, so I had to resort to popular frameworks and architectures to complete it in time. With more time, these are the areas I would work on to improve the accuracy of the tracking pipeline.

Dataset

Make sure to use the correct normalization (and transformations in general) in training and evaluation. The dataset is synthetic, whereas yolo and other detection architectures are trained on real images, mostly taken in daylight and rarely in indoor environments. The bolts and nuts dataset, however, comes from a mostly indoor factory environment, by day or by night. I would first make sure that the image attribute distributions (color, etc.) match and, if they do not, calculate the necessary transformations from real data.
Increase the amount of data (number of classes, number of items in each class, more moving objects from different perspectives). With more data, we can be more confident about the trained model.

Detection Network

In my research I came across a few different detection architectures, but did not really have time to test them and compare them to each other. Below are the models I would test before choosing yolo as the detection model.

Single shot detector: seems to be more appropriate for video analysis and object tracking since it is more precise than yolo; however, it has issues with small objects.
Object localization network: This intrigued me the most since it does detection without training on class labels. It is a "general purpose" detector, so to speak.
Detectron 2: Powerful and highly praised by many people, so I would have liked to test it.
MiDaS: This is a single-image depth estimator (again a cross-dataset model). I would have liked to experiment with it to improve the accuracy of the pipeline. The information this model provides could be used to improve object tracking by integrating it into the deepsort algorithm's tracking strategy.

There are a few more (FastRCNN, MaskRCNN, etc.), but I would explore the ones listed above first.

Once I had decided on a detection architecture, I would use the Distiller package (or some other library, since Intel has dropped support for it) to prune, quantize, and distill the architecture into a smaller one (not necessarily in this order). Finally, I would fuse the batchnorm2d and conv2d layers into a single layer.

I would like to note that the Ultralytics framework already does fusing to lessen the computational load. However, after the pruning and/or distillation stages, we probably can't use the framework's automatic fusing function, so we may need to do it manually.
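Manual fusing relies on the fact that, at inference time, batch normalization is a fixed per-channel affine transform and can be folded into the weights and bias of the preceding convolution. The sketch below is a plain-Python illustration under simplifying assumptions (one flattened kernel per output channel, a single input patch, made-up numbers); it is not the Ultralytics fusing routine, and all function names are hypothetical.

```python
import math


def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into the preceding convolution.

    weight: one flattened kernel (list of floats) per output channel.
    bias, gamma, beta, mean, var: one float per output channel.
    """
    fused_w, fused_b = [], []
    for w, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        fused_w.append([wi * scale for wi in w])
        fused_b.append((b - m) * scale + bt)
    return fused_w, fused_b


def conv(weight, bias, patch):
    """Apply each output channel's kernel to one flattened input patch."""
    return [sum(wi * xi for wi, xi in zip(w, patch)) + b for w, b in zip(weight, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization, applied per channel."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


# Toy values: 2 output channels, kernels flattened to 3 weights each.
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
patch = [1.0, 2.0, -1.0]

reference = batchnorm(conv(weight, bias, patch), gamma, beta, mean, var)
fused_w, fused_b = fuse_conv_bn(weight, bias, gamma, beta, mean, var)
fused = conv(fused_w, fused_b, patch)
max_diff = max(abs(a - b) for a, b in zip(reference, fused))
print(f"largest difference between conv+bn and fused conv: {max_diff:.2e}")
```

The fused layer gives the same output as convolution followed by batch normalization, up to floating-point rounding, while removing one layer from the forward pass.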

Finally, I would export the model to ONNX format (or TorchScript) so that it can run on any device that supports those formats, independently of the training dependencies.

Object Tracking

This is where the most work needs to be done.

Firstly, the deep-sort-realtime package's implementation of the object tracking algorithm is problematic.
The implementation updates the Kalman filter covariances with constant uncertainty weights, which makes it much less powerful.
It doesn't provide a way to set the initial guesses for the filter, either. I found this repo much later, which seemed to implement the Kalman filter better, but it was too late to test.
The package also doesn't provide a low-level interface to the Kalman filter implementation, so you can't change the sampling time or other variables that affect its estimation/filtering accuracy. Thus, I had to resort to a hot-fix workaround in the lines:

```
# Hot fix for the position and velocity weights: deepsort doesn't provide an
# interface to change the Kalman filter or its parameters.
tracker.__dict__['tracker'].kf.__dict__['_std_weight_position'] = 1
# Trained model measurements are good enough -> Kalman model uncertainty should be high.
tracker.__dict__['tracker'].kf.__dict__['_std_weight_velocity'] = 3
```

The paper proposes and uses a constant velocity motion model; however, a constant acceleration model would probably fit this use case better, since falling objects accelerate at a constant rate. For more info as to why, you can check here (compare the results for the g-h filter vs the g-h-k filter).
Other sensor information can also be used (a technique known as sensor fusion) to improve the Kalman filter even further, such as incorporating the detection confidence into the uncertainty calculations, using depth estimation, using extracted texture information, or maybe combining the outputs of two different detection models. This requires a lot more work than is possible in a week.
The deep-sort-realtime package uses mobilenetv3 as the deep embedder and provides an interface to use a custom embedder. I would have liked to try a few different models before deciding on mobilenetv3.
Finally, I would change the default tracking strategy employed in deep-sort-realtime. The default strategy does not fit the use case of counting different object types in the presence of faulty detections and objects that are lost for a few frames. I would continue to process tracks with new detections even when they have been lost for more than a few frames (using confidence information here as well), and provide a better method for re-identification (instead of mobilenetv3). The default strategy doesn't continue counting up when a new object that resembles a previously detected one enters the scene: it assumes the new detection is the previously detected object coming back into the frame rather than a new object entering it.
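The motion-model point can be illustrated with a one-dimensional free-fall example. The sketch below is not a Kalman filter and is not taken from deep-sort-realtime: it only compares the prediction step of a constant-velocity model with that of a constant-acceleration model, feeding both the true state so that only the model error is visible. The 30 fps frame interval, the metric units, and the function names are assumptions made for illustration; in image space the acceleration would be scaled by the camera geometry.

```python
def predict_constant_velocity(pos, vel, dt):
    """One prediction step of a constant-velocity model."""
    return pos + vel * dt, vel


def predict_constant_acceleration(pos, vel, acc, dt):
    """One prediction step of a constant-acceleration model."""
    return pos + vel * dt + 0.5 * acc * dt * dt, vel + acc * dt, acc


G = 9.81         # m/s^2, downward taken as positive
DT = 1.0 / 30.0  # assumed frame interval (30 fps)

# Object released from rest; track the largest one-step prediction error.
true_pos, true_vel = 0.0, 0.0
cv_error = 0.0
ca_error = 0.0
for frame in range(30):
    cv_pred, cv_vel = predict_constant_velocity(true_pos, true_vel, DT)
    ca_pred, ca_vel, ca_acc = predict_constant_acceleration(true_pos, true_vel, G, DT)
    # Exact free-fall motion over one frame, starting from the true state.
    true_pos = true_pos + true_vel * DT + 0.5 * G * DT * DT
    true_vel = true_vel + G * DT
    cv_error = max(cv_error, abs(cv_pred - true_pos))
    ca_error = max(ca_error, abs(ca_pred - true_pos))

print(f"max one-step error, constant velocity:     {cv_error:.6f} m")
print(f"max one-step error, constant acceleration: {ca_error:.6f} m")
```

The constant-velocity prediction lags by half of g times the squared frame interval on every step, an error the filter has to absorb as process noise, while the constant-acceleration prediction matches free fall exactly in this idealized setting.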

Owner

Login: ubeydemavus
Kind: user

Repositories: 1
Profile: https://github.com/ubeydemavus

Citation (CITATION.cff)

cff-version: 1.2.0
preferred-citation:
  type: software
  message: If you use this software, please cite it as below.
  authors:
  - family-names: Jocher
    given-names: Glenn
    orcid: "https://orcid.org/0000-0001-5950-6979"
  - family-names: Chaurasia
    given-names: Ayush
    orcid: "https://orcid.org/0000-0002-7603-6750"
  - family-names: Qiu
    given-names: Jing
    orcid: "https://orcid.org/0000-0003-3783-7069"
  title: "YOLO by Ultralytics"
  version: 8.0.0
  # doi: 10.5281/zenodo.3908559  # TODO
  date-released: 2023-1-10
  license: GPL-3.0
  url: "https://github.com/ultralytics/ultralytics"

Dependencies

docker/Dockerfile docker

nvcr.io/nvidia/pytorch 22.12-py3 build

requirements.txt pypi

Pillow >=7.1.2
PyYAML >=5.3.1
ipython *
matplotlib >=3.2.2
numpy >=1.18.5
opencv-python >=4.6.0
pandas >=1.1.4
psutil *
requests >=2.23.0
scipy >=1.4.1
seaborn >=0.11.0
tensorboard >=2.4.1
thop >=0.1.1
torch >=1.7.0
torchvision >=0.8.1
tqdm >=4.64.0

setup.py pypi
