automated-california-av-dataset

AV Collision report dataset from California DMV

https://github.com/saquibmh/automated-california-av-dataset

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.1%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: saquibmh
  • License: cc0-1.0
  • Language: Python
  • Default Branch: main
  • Size: 160 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 2
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

Collision Report Extraction

A tool that extracts information from California DMV autonomous vehicle (AV) collision reports.

The California DMV receives AV collision reports in PDF format, which makes compiling the data by hand challenging. This extractor automatically pulls the relevant information from each report and consolidates it into a single Excel file, simplifying the collection and analysis of AV collision data.

DOI

Authors: Saquib M Haroon and Alyssa Ryan @ University of Arizona

Libraries used

The extractor uses the following libraries to generate the final Excel file from the AV collision reports:

EasyOCR
pdf2image
openpyxl
OpenCV
NumPy
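
To illustrate how these libraries could fit together, here is a minimal sketch of a PDF-to-Excel pipeline. It is not the repository's actual code: the function names, the column names, the 300 dpi setting, and the output file name are all assumptions made for illustration, and it stores raw OCR text rather than parsed form fields.

```python
from pathlib import Path


def ocr_pdf(pdf_path, reader, dpi=300):
    """Return one list of OCR'd text fragments per page of the PDF."""
    # Third-party imports are local so the pure helper below works without them.
    import cv2
    import numpy as np
    from pdf2image import convert_from_path  # needs Poppler installed

    pages = []
    for page in convert_from_path(str(pdf_path), dpi=dpi):  # PIL images
        gray = cv2.cvtColor(np.array(page), cv2.COLOR_RGB2GRAY)
        pages.append(reader.readtext(gray, detail=0))  # detail=0: plain strings
    return pages


def report_row(pdf_name, pages):
    """Flatten per-page OCR fragments into one row (hypothetical layout)."""
    text = "\n".join(" ".join(fragments) for fragments in pages)
    return [pdf_name, len(pages), text]


def write_workbook(rows, out_path):
    """Write all rows to a single Excel sheet."""
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    ws.append(["file", "pages", "ocr_text"])  # hypothetical column names
    for row in rows:
        ws.append(row)
    wb.save(out_path)


def extract_folder(pdf_dir, out_path="collisions.xlsx"):
    """OCR every PDF in a folder and consolidate the results."""
    import easyocr

    reader = easyocr.Reader(["en"], gpu=False)  # load the OCR model once
    pdfs = sorted(Path(pdf_dir).glob("*.pdf"))
    rows = [report_row(p.name, ocr_pdf(p, reader)) for p in pdfs]
    write_workbook(rows, out_path)
```

Calling `extract_folder("reports/")` on a folder of downloaded reports (the folder name is hypothetical) would produce one spreadsheet row per PDF. Turning the OCR text into individual columns such as date, location, or vehicle make would require field-specific parsing that this sketch does not attempt.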

Extracted Dataset

Find the latest extracted dataset, covering reports up to June 2023, here

Future Work

Use NLP models to automatically extract injury information from the collision description.
Geocode the address to identify collision coordinates.
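
As a starting point for the injury item above, a simple keyword rule can serve as a baseline before any NLP model is introduced. The sketch below is a rule-based placeholder, not an NLP model and not part of this repository; the patterns and the `injury_flag` name are assumptions, and real report narratives may be phrased differently.

```python
import re

# Hypothetical patterns: a negation word followed, within the same sentence,
# by a form of "injury" counts as "no injury reported".
NEGATED = re.compile(
    r"\b(no|not|without)\b[^.]{0,40}\binjur(y|ies|ed)\b", re.IGNORECASE
)
MENTIONED = re.compile(r"\binjur(y|ies|ed)\b", re.IGNORECASE)


def injury_flag(description: str) -> str:
    """Return 'no', 'yes', or 'unknown' for a free-text collision description."""
    if NEGATED.search(description):
        return "no"
    if MENTIONED.search(description):
        return "yes"
    return "unknown"
```

A baseline like this gives a reference point against which a later NLP model could be compared, and the descriptions it labels "unknown" are natural candidates for manual review.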

Contributions to this project are welcome.

Lab website

Visit our website: Ryan Research Lab.

Please Cite the Paper

Haroon, S. M., & Ryan, A. (2024). Understanding key factors in automated vehicle collisions: Automating data extraction and analyzing key insights using explainable AI. Journal of Transportation Safety & Security, 1-24. Link

Owner

  • Login: saquibmh
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Haroon
    given-names: Saquib Mohammed
    orcid: https://orcid.org/0009-0001-3788-9590
title: saquibmh/Automated-California-AV-Dataset: Automated AV database generator
version: 1.0
date-released: 2023-07-10
