masters-thesis-ip102-classification
Code for my Master's Thesis in Data Science: Pest Insect Classification on the IP102 Dataset. The paper focuses on improving classification performance on this dataset by addressing two of its major issues, namely class imbalance and intra-class variance.
https://github.com/alexandruiordan99/masters-thesis-ip102-classification
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
ReadMe.md
Overview
This repository contains code for a Master's Thesis on Pest Insect Classification using the IP102 dataset. The paper focuses on improving classification performance on this dataset by addressing two of its major issues: class imbalance and intra-class variance. Class imbalance is addressed through data augmentation and transfer learning with ImageNet weights, while intra-class variance is addressed with a clustering algorithm. These techniques are implemented with EfficientNet models from versions 1 and 2.
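To make the class imbalance issue concrete, the following is a minimal sketch of how one might quantify it from a list of integer class labels. The function name `imbalance_summary` and the toy label list are illustrative assumptions and are not taken from the thesis code.

```python
from collections import Counter


def imbalance_summary(labels):
    """Summarise how unevenly samples are spread across classes.

    `labels` is any iterable of class identifiers (hypothetical input format).
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    largest = max(counts.values())
    smallest = min(counts.values())
    return {
        "classes": len(counts),
        "largest": largest,
        "smallest": smallest,
        # Ratio between the most and least frequent class.
        "ratio": largest / smallest,
    }


# Toy example: three classes with 50, 10, and 5 samples.
toy_labels = [0] * 50 + [1] * 10 + [2] * 5
summary = imbalance_summary(toy_labels)
print(summary)
```

A large ratio between the most and least frequent class is one simple indicator that techniques such as augmentation or oversampling may be worth considering.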
In the main folder, DatasettoBGR lets users quickly convert the IP102 dataset to BGR format and save it. F1Calculator lets users calculate F1-scores using the accuracy and recall metrics from the model evaluation output. Finally, geometricsmote is a local copy of the code from https://github.com/georgedouzas/imbalanced-learn-extra. This local copy was necessary because the geometricsmote import was not compatible with Python 3.12.6. The authors are credited at the top of that file.
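For reference, the F1-score is conventionally defined as the harmonic mean of precision and recall. The sketch below shows that standard formula; it is not the repository's F1Calculator, and both the function name `f1_score` and the use of precision as an input are assumptions made for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Both inputs are expected in the range [0, 1]. When both are zero the
    harmonic mean is undefined, so 0.0 is returned by convention.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total


# Example values (made up for illustration, not thesis results).
example = f1_score(0.75, 0.60)
print(round(example, 4))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model scores well on F1 only when precision and recall are both high.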
To run:
- Use a Linux machine.
- Install up-to-date machine learning drivers from NVIDIA.
- Preferably use at least an NVIDIA RTX 4080 (10GB of VRAM). Otherwise, many of the models will fail because they run out of VRAM. If you understandably do not own such an expensive graphics card, lower the batch size of the model. This will, however, lower its accuracy.
- Clone the repo.
- Obtain the dataset from its authors or from Kaggle. Kaggle is linked instead of the authors' page because their Google Drive link is broken.
- Update the paths to point to the locations of your training, validation, and testing sets.
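Before launching a training run, it can help to confirm that the dataset paths actually exist. The following is a minimal sketch of such a check. The helper name `check_split_dirs` and the folder names `train`, `val`, and `test` are assumptions for illustration; the repository's scripts may organise and name the splits differently.

```python
import tempfile
from pathlib import Path


def check_split_dirs(root, splits=("train", "val", "test")):
    """Return a dict of split name -> Path, failing early if any is missing.

    The split folder names are hypothetical defaults; pass your own names.
    """
    root = Path(root)
    paths = {name: root / name for name in splits}
    missing = [str(p) for p in paths.values() if not p.is_dir()]
    if missing:
        raise FileNotFoundError("Missing dataset folders: " + ", ".join(missing))
    return paths


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("train", "val", "test"):
        (Path(tmp) / name).mkdir()
    found = check_split_dirs(tmp)
    print(sorted(found))
```

In practice, you would pass the directory where you extracted IP102 instead of the temporary folder used in the demo.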
Owner
- Login: AlexandruIordan99
- Kind: user
- Repositories: 1
- Profile: https://github.com/AlexandruIordan99