https://github.com/big-data-lab-team/accident-prediction-montreal

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 8.6%, to scientific vocabulary)

Keywords

accidents ai big-data big-data-analytics geospatial-data geospatial-processing machine machine-learning montreal opendata pyspark spark

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: big-data-lab-team
License: MIT
Language: Jupyter Notebook
Default Branch: master
Size: 65 MB

Statistics

Stars: 9
Watchers: 6
Forks: 7
Open Issues: 7
Releases: 0

Created over 7 years ago · Last pushed over 3 years ago

Metadata Files: Readme, License

High-Resolution Road Vehicle Collision Prediction for the City of Montreal

This repository contains the source code developed for a study of road vehicle collisions in the city of Montreal. Three datasets provided by the city of Montreal and the Government of Canada were used: a dataset of road vehicle collisions, a dataset describing the Canadian road network, and a dataset of historical weather information. These datasets were fused to generate examples, each corresponding to a one-hour period on a road segment delimited by intersections. The task was framed as binary classification: positive examples correspond to the occurrence of a collision, and negative examples to its non-occurrence. Four models were built and compared: a basic baseline using only the count of accidents on the road segment during previous years, a random forest with under-sampling of the majority class, a balanced random forest, and an XGBoost model. The balanced random forest performed best: it identifies as positive the 13% most dangerous examples, which account for 85% of vehicle collisions.
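
As a rough illustration of the setup described above, the following plain-Python sketch builds one example per (road segment, hour) pair, under-samples the negative class, and computes the two quantities quoted for the best model: the share of examples flagged as positive and the share of collisions they cover. It is not the repository's code, which relies on PySpark; the function names, the toy data, and the 1:1 sampling ratio are assumptions made for illustration.

```python
import random
from itertools import product


def build_examples(segment_ids, hours, collisions):
    """Create one labeled example per (road segment, hour) pair.

    `collisions` is an iterable of (segment_id, hour) pairs; an example
    is positive (1) if a collision was recorded for that pair, else 0.
    """
    collision_set = set(collisions)
    return [
        ((segment, hour), int((segment, hour) in collision_set))
        for segment, hour in product(segment_ids, hours)
    ]


def undersample_majority(examples, seed=0):
    """Keep all positives and an equally sized random subset of negatives.

    The 1:1 ratio is an assumption of this sketch, not a value taken
    from the study.
    """
    rng = random.Random(seed)
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    kept = rng.sample(negatives, min(len(positives), len(negatives)))
    return positives + kept


def flagged_fraction_and_recall(labels, predictions):
    """Return (share of examples predicted positive, share of true
    positives that were predicted positive)."""
    flagged = sum(predictions) / len(predictions)
    true_positives = sum(
        1 for y, p in zip(labels, predictions) if y == 1 and p == 1
    )
    total_positives = sum(labels)
    recall = true_positives / total_positives if total_positives else 0.0
    return flagged, recall


# Toy data (hypothetical): 3 road segments observed over 4 hours.
examples = build_examples(
    ["s1", "s2", "s3"], range(4), [("s1", 0), ("s2", 3)]
)
balanced = undersample_majority(examples, seed=1)
print(len(examples), "examples,", len(balanced), "after under-sampling")
```

In these terms, the result reported for the balanced random forest corresponds to a flagged fraction of about 0.13 and a recall of about 0.85 on the study's data.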

For more information, read the corresponding scientific paper.

Folder Structure

mains: contains the scripts for the generation of the dataset, the hyperparameter tuning, the training and the evaluation of the models
notebooks: Jupyter notebooks used during development for interactive data exploration and experimentation
results: results of the four models

License

MIT

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1
