https://github.com/big-data-lab-team/accident-prediction-montreal

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.6%) to scientific vocabulary

Keywords

accidents ai big-data big-data-analytics geospatial-data geospatial-processing machine machine-learning montreal opendata pyspark spark
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: big-data-lab-team
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 65 MB
Statistics
  • Stars: 9
  • Watchers: 6
  • Forks: 7
  • Open Issues: 7
  • Releases: 0
Created almost 7 years ago · Last pushed about 3 years ago
Metadata Files
Readme License

README.md

High-Resolution Road Vehicle Collision Prediction for the City of Montreal

This repository contains the source code developed for a study of road vehicle collisions in the city of Montreal. Three datasets provided by the city of Montreal and the Government of Canada were used: a dataset of road vehicle collisions, a dataset describing the Canadian road network, and a dataset of historical weather information. These datasets were fused to generate examples, each corresponding to a one-hour period on a road segment delimited by intersections. The task was framed as binary classification: positive examples correspond to the occurrence of a collision, and negative examples to its non-occurrence. Four models were built and compared: a basic model using only the count of accidents on the road segment during previous years, a random forest with under-sampling of the majority class, a balanced random forest, and an XGBoost model. The balanced random forest performed best: it identifies as positive the 13% most dangerous examples, which account for 85% of vehicle collisions.
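The example construction and the under-sampling step described above can be illustrated with a small, standard-library-only sketch. It is not the repository's code: the function names `build_examples` and `undersample_majority`, the toy segment identifiers, and the assumption that every (segment, hour) pair without a recorded collision counts as a negative are all hypothetical choices made for illustration. The actual pipeline is built on Spark and may generate or sample negatives differently.

```python
import random
from itertools import product


def build_examples(segment_ids, hours, collisions):
    """Label every (segment, hour) pair: 1 if a collision was recorded, else 0.

    Assumption: all pairs without a recorded collision are treated as negatives.
    """
    collision_set = set(collisions)
    return [
        (segment, hour, 1 if (segment, hour) in collision_set else 0)
        for segment, hour in product(segment_ids, hours)
    ]


def undersample_majority(examples, seed=0):
    """Randomly drop negatives until both classes have the same size."""
    positives = [e for e in examples if e[2] == 1]
    negatives = [e for e in examples if e[2] == 0]
    rng = random.Random(seed)
    kept = rng.sample(negatives, min(len(positives), len(negatives)))
    return positives + kept


# Toy data (hypothetical): three road segments observed over one day.
segments = ["seg_a", "seg_b", "seg_c"]
hours = list(range(24))
collisions = [("seg_a", 8), ("seg_b", 17)]

examples = build_examples(segments, hours, collisions)
balanced = undersample_majority(examples)
```

Even in this toy setting the classes are heavily imbalanced (2 positives among 72 examples), which is why the models listed above rely on under-sampling or balanced variants of random forest.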

For more information, see the corresponding scientific paper.

Folder Structure

  • mains: scripts for dataset generation, hyperparameter tuning, training, and evaluation of the models
  • notebooks: Jupyter notebooks used during development for interactive data exploration and experimentation
  • results: results of the four models
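The README summarises the best model with two quantities: the share of examples flagged as positive and the share of actual collisions among those flagged examples (recall). A minimal sketch of how these two numbers could be computed from binary labels and predictions follows. The function name `flagged_share_and_recall` and the toy label vectors are hypothetical and are not taken from the repository's evaluation scripts.

```python
def flagged_share_and_recall(y_true, y_pred):
    """Return (share of examples predicted positive, recall on true positives)."""
    flagged = sum(y_pred)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_positives = sum(y_true)
    share = flagged / len(y_true)
    # Guard against division by zero when there are no actual positives.
    recall = true_positives / actual_positives if actual_positives else 0.0
    return share, recall


# Toy labels (hypothetical): 10 examples, 2 actual collisions, 2 flagged.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

share, recall = flagged_share_and_recall(y_true, y_pred)
```

Read this way, the README's figures correspond to a flagged share of about 0.13 and a recall of about 0.85 for the balanced random forest.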

License

MIT

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1