ic-uda-final-project

Final Project for the Unstructured Data Analysis module in the MSc. Machine Learning and Data Science Course

https://github.com/martinbatek/ic-uda-final-project

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary

Keywords

climate-change convolutional-neural-networks deep-learning image-classification unstructured-data wildfire-detection

Last synced: 6 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: martinbatek
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 500 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Topics

climate-change convolutional-neural-networks deep-learning image-classification unstructured-data wildfire-detection

Created about 2 years ago · Last pushed about 2 years ago

Metadata Files

Readme Citation

Unstructured Data Analysis Final Project

Imperial College London - Machine Learning and Data Science

Problem Statement

Train a Convolutional Neural Network classifier for fire image classification.
Investigate the effect of the preprocessing, hidden and output layers of the trained CNN model on an example image.
Evaluate the Accuracy, Precision and Recall of the model on Validation, Testing and Out-of-Sample images.
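For reference, the three evaluation metrics named above can be derived from confusion-matrix counts. The following is a minimal sketch and not the notebook's own code: the function name `classification_metrics` is hypothetical, and the example counts are made up for illustration rather than taken from the project's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision and recall from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives. Ratios with an empty denominator
    are reported as 0.0 instead of raising ZeroDivisionError.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Made-up counts, for illustration only (not results from this project).
example = classification_metrics(tp=80, fp=10, tn=95, fn=15)
```

With "fire" as the positive class, precision is the share of predicted fire images that really contain fire, while recall is the share of actual fire images the model catches.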

Repository Contents

~/data/
- FLAME_Dataset_subset/: A 100MB subset of the 1.46GB FLAME Dataset, used for model training and in-sample testing
- create_data_subset.ipynb: Jupyter notebook used to create the subset of the larger FLAME dataset
- Kaggle FIRE dataset/: A dataset used for out-of-sample testing
~/figures/: Figures exported and saved throughout the analysis, and used in the main report.
~/latex_report/
- UDA_FinalProject_Batek.pdf: The PDF report to be submitted on Coursera.
- UDA_FinalProject_Batek.tex: LaTeX script used to compile the PDF report.
~/models/:
- dcnn_model_checkpoints/: Directory containing model copies saved after each epoch during training.
- dcnn_model_training.hist: A Python Dictionary containing the training metrics (TP, FP, TN, FN, Accuracy, Precision, Recall, AUC for Training and Validation) for each epoch during training.
- dcnn_model.keras: A trained Xception Classifier model, trained using the entire 1 GB FLAME Training set
- dcnn_model_simple.keras: A trained Xception Classifier model, trained using only the training data from ~/data/FLAME_Dataset_subset/
~/reference_material/: Directory containing reference article PDFs and the references.bib file for the assignment.
~/scripts/: Directory containing the assignment script UDA_FinalProject_Batek.ipynb
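The contents of create_data_subset.ipynb are not reproduced here. As a rough sketch of how a size-capped subset like FLAME_Dataset_subset/ could be built, the hypothetical helper below copies a random sample of files until a byte budget is reached. The function name, its parameters and the sampling strategy are all assumptions, not the author's method.

```python
import random
import shutil
from pathlib import Path


def create_subset(source_dir, target_dir, max_bytes, seed=0):
    """Copy a random sample of files from source_dir into target_dir.

    Relative paths (for example class sub-folders) are preserved, and
    files are added only while the running total stays within max_bytes.
    Returns the list of copied paths and the number of bytes copied.
    """
    source, target = Path(source_dir), Path(target_dir)
    # Sort first so that the shuffle is reproducible for a given seed.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    random.Random(seed).shuffle(files)

    copied, copied_bytes = [], 0
    for path in files:
        size = path.stat().st_size
        if copied_bytes + size > max_bytes:
            continue  # this file does not fit in the remaining budget
        destination = target / path.relative_to(source)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination)
        copied.append(destination)
        copied_bytes += size
    return copied, copied_bytes
```

A call such as `create_subset(full_dataset_dir, subset_dir, max_bytes=100 * 1024**2)` would target roughly 100MB. Note that uniform sampling does not guarantee the class balance of the full dataset; sampling per class folder would be needed for that.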

Technical Specifications

Hardware

The project was compiled exclusively on my Microsoft Surface Laptop 4, which has the following specifications:

Throughout the project, I relied on CPU computing only, as I did not have access to GPU resources.

Dependencies

Please consult environment.yml, requirements.txt and spec-file.txt for the full list of Python environment dependencies. The key dependencies are:
- Python v3.7.12
- TensorFlow v2.3.0
- Keras v2.4

The list above is not exhaustive; refer to the files named above if necessary. To replicate the Python environment with conda, run:

conda create --name <env_name> --file spec-file.txt

Alternatively, create and activate a virtual environment with virtualenv, then run:

pip install -r requirements.txt
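After creating the environment, it can be worth confirming that the key pins are actually in place before starting a multi-hour training run. The sketch below is not part of the repository; the helper names `matches_pin`, `check_environment` and `installed_versions` are hypothetical, and the pins are copied from the key dependencies listed above.

```python
import sys

# Pins taken from the key dependencies listed in this README.
EXPECTED = {"python": "3.7.12", "tensorflow": "2.3.0", "keras": "2.4"}


def matches_pin(found, pinned):
    """True when `found` equals the pin or only adds further components."""
    return found == pinned or found.startswith(pinned + ".")


def check_environment(found_versions, expected=EXPECTED):
    """Return {name: (found, pinned)} for each missing or off-pin package."""
    mismatches = {}
    for name, pinned in expected.items():
        found = found_versions.get(name)
        if found is None or not matches_pin(found, pinned):
            mismatches[name] = (found, pinned)
    return mismatches


def installed_versions():
    """Collect the interpreter version and any importable pinned packages."""
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in ("tensorflow", "keras"):
        try:
            versions[name] = __import__(name).__version__
        except ImportError:
            pass  # left out of the dict, so it is reported as missing
    return versions
```

Running `check_environment(installed_versions())` inside the activated environment should return an empty dictionary when all three pins match.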

Instructions

The Python Notebook containing the code for the assignment is located in ~/scripts/UDA_FinalProject_Batek.ipynb. This notebook was written so that each section separated by markdown headings can be run independently. However, it can also be run sequentially all at once.

Before doing so, be sure to change the dataset root_path variables throughout the notebook.
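The notebook's own root_path cells are not shown in this README. As an illustration only, the sketch below assumes that root_path points at the root of a local clone and that the datasets sit in the data/ directories listed under Repository Contents; the helper names `dataset_dirs` and `missing_dirs` are hypothetical, and the notebook may define root_path differently.

```python
from pathlib import Path


def dataset_dirs(root_path):
    """Build the two dataset directories named in Repository Contents.

    Assumes root_path is the root of a local clone of the repository.
    """
    data = Path(root_path) / "data"
    return {
        "flame_subset": data / "FLAME_Dataset_subset",
        "kaggle_fire": data / "Kaggle FIRE dataset",
    }


def missing_dirs(root_path):
    """Return the names of expected dataset directories that do not exist."""
    return [
        name
        for name, path in dataset_dirs(root_path).items()
        if not path.is_dir()
    ]
```

Checking `missing_dirs(root_path)` before running the notebook gives an early warning about a wrong path, which is cheaper than discovering it partway through the training section.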

Runtime for the model training section of the script was 5 hours and 35 minutes with the hardware specifications above.

Owner

Login: martinbatek
Kind: user

Repositories: 1
Profile: https://github.com/martinbatek

Committers

Last synced: about 2 years ago

All Time

Total Commits: 73
Total Committers: 1
Avg Commits per committer: 73.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 73
Committers: 1
Avg Commits per committer: 73.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Martin Batek	m**k@g**m	73

Issues and Pull Requests

Last synced: about 2 years ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
