fars_cleaner

fars_cleaner: A Python package for downloading and pre-processing vehicle fatality data in the US - Published in JOSS (2022)

https://github.com/mzabrams/fars_cleaner

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    3 of 4 committers (75.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Scientific Fields

  • Mathematics / Computer Science: 84% confidence
  • Earth and Environmental Sciences / Physical Sciences: 83% confidence
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: mzabrams
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Size: 4.91 MB
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 3
  • Open Issues: 4
  • Releases: 13
Created almost 6 years ago · Last pushed 11 months ago
Metadata Files
Readme Contributing License Zenodo

README.md

fars_cleaner

fars-cleaner is a Python library for downloading and pre-processing data from the Fatality Analysis Reporting System (FARS), collected annually by the National Highway Traffic Safety Administration (NHTSA) since 1975.

Installation

The preferred installation method is through conda:

```bash
conda install -c conda-forge fars_cleaner
```

You can also install with pip:

```bash
pip install fars-cleaner
```

Usage

Downloading FARS data

The FARSFetcher class provides an interface to download and unzip selected years from the NHTSA FARS FTP server. The class uses pooch to download and unzip the selected files. By default, files are unzipped to your OS's cache directory.

```python
from fars_cleaner import FARSFetcher

# Prepare for FARS file download, using the OS cache directory.
fetcher = FARSFetcher()
```

Suggested usage is to download files to a data directory in your current project directory. Passing `project_dir` will download files to `project_dir/data/fars` by default. This behavior can be overridden by setting `cache_path` as well. Setting `cache_path` alone provides a direct path to the directory you want to download files into.

```python
from pathlib import Path
from fars_cleaner import FARSFetcher

SOME_PATH = Path("/YOUR/PROJECT/PATH")

# Prepare to download to /YOUR/PROJECT/PATH/data/fars
# This is the recommended usage.
fetcher = FARSFetcher(project_dir=SOME_PATH)

# Prepare to download to /YOUR/PROJECT/PATH/fars
cache_path = "fars"
fetcher = FARSFetcher(project_dir=SOME_PATH, cache_path=cache_path)

# Prepare to download directly to a specific directory.
cache_path = Path("/SOME/TARGET/DIRECTORY")
fetcher = FARSFetcher(cache_path=cache_path)
```

Files can be downloaded in their entirety (data from 1975-2018), as a single year, or across a specified year range. Downloading all of the data can be quite time-consuming. The download will simultaneously unzip the folders and delete the zip files. Each zipped file is unzipped and saved in a folder named `{YEAR}.unzip`.

```python
# Fetch all data
fetcher.fetch_all()

# Fetch a single year
fetcher.fetch_single(1984)

# Fetch data in a year range (inclusive).
fetcher.fetch_subset(1999, 2007)
```

Processing FARS data

Calling `load_pipeline` will allow for full loading and pre-processing of the FARS data requested by the user.

```python
from fars_cleaner import FARSFetcher, load_pipeline

fetcher = FARSFetcher(project_dir=SOME_PATH)
vehicles, accidents, people = load_pipeline(
    fetcher=fetcher,
    first_run=True,
    target_folder=SOME_PATH,
)
```

Calling `load_basic` allows for simple loading of the FARS data for a single year, with no preprocessing. Files must be prefetched using a `FARSFetcher` or similar method. A mapper dictionary must be provided to identify which columns, if any, require renaming.

```python
from fars_cleaner.data_loader import load_basic

vehicles, accidents, people = load_basic(year=1975, data_dir=SOME_PATH, mapping=mappings)
```
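The exact structure expected for the mapper dictionary is not shown here; as an assumption, a plain pandas-style column-rename mapping (old column name to new name) would look like the sketch below. The column names are illustrative only and depend on the FARS schema for the chosen year.

```python
import pandas as pd

# Hypothetical rename mapping: raw FARS column name -> standardized name.
# These keys are illustrative, not the actual schema for any given year.
mappings = {"ST_CASE": "st_case", "VEH_NO": "veh_no"}

# Demonstrate the mapping on a tiny stand-in frame.
df = pd.DataFrame({"ST_CASE": [10001], "VEH_NO": [1]})
df = df.rename(columns=mappings)
print(list(df.columns))  # -> ['st_case', 'veh_no']
```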

Requirements

Downloading and processing the full FARS dataset currently runs out of memory on Windows machines with only 16 GB of RAM; at least 32 GB is recommended on Windows. macOS and Linux run without issues on 16 GB systems.
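On memory-constrained machines, one possible workaround (an assumption, not a documented feature beyond `fetch_subset` itself) is to download and process the data in smaller year ranges rather than all at once. The `year_chunks` helper below is hypothetical; only the commented-out `fetch_subset` calls use the package's documented API.

```python
def year_chunks(start, end, size):
    """Split the inclusive year range [start, end] into chunks of at most `size` years."""
    chunks = []
    while start <= end:
        chunks.append((start, min(start + size - 1, end)))
        start += size
    return chunks

# Download the full 1975-2018 span in five-year pieces to limit peak memory use.
# for first, last in year_chunks(1975, 2018, 5):
#     fetcher.fetch_subset(first, last)

print(year_chunks(1975, 2018, 5)[0])   # -> (1975, 1979)
print(year_chunks(1975, 2018, 5)[-1])  # -> (2015, 2018)
```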

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. See CONTRIBUTING.md for more details.

License

BSD-3 Clause

Owner

  • Login: mzabrams
  • Kind: user

GitHub Events

Total
  • Issues event: 1
  • Watch event: 1
  • Push event: 1
  • Pull request event: 2
  • Fork event: 1
Last Year
  • Issues event: 1
  • Watch event: 1
  • Push event: 1
  • Pull request event: 2
  • Fork event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 110
  • Total Committers: 4
  • Avg Commits per committer: 27.5
  • Development Distribution Score (DDS): 0.273
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| mzabrams | m****s@d****u | 80 |
| Mitchell Abrams | m****2@d****u | 27 |
| Mitchell Abrams | m****l@e****k | 2 |
| Robert R. Henry | r****y@g****m | 1 |
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 9
  • Total pull requests: 4
  • Average time to close issues: 26 days
  • Average time to close pull requests: 11 days
  • Total issue authors: 5
  • Total pull request authors: 3
  • Average comments per issue: 1.44
  • Average comments per pull request: 0.25
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 1 month
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ethanwhite (4)
  • svburke (2)
  • mollyboigon (1)
  • mzabrams (1)
  • RobertHenry6bev (1)
Pull Request Authors
  • mzabrams (2)
  • RobertHenry6bev (1)
  • dependabot[bot] (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (1)

Dependencies

requirements.txt pypi
  • appdirs ==1.4.4
  • certifi ==2022.6.15
  • charset-normalizer ==2.1.0
  • cloudpickle ==2.1.0
  • colorama ==0.4.5
  • dask ==2022.6.1
  • fsspec ==2022.5.0
  • idna ==3.3
  • lazy-loader ==0.1rc2
  • locket ==1.0.0
  • multipledispatch ==0.6.0
  • natsort ==8.1.0
  • numpy ==1.23.0
  • packaging ==21.3
  • pandas ==1.4.3
  • pandas-flavor ==0.3.0
  • partd ==1.2.0
  • pathlib ==1.0.1
  • pooch ==1.6.0
  • pyjanitor ==0.23.1
  • pyparsing ==3.0.9
  • python-dateutil ==2.8.2
  • pytz ==2022.1
  • pyyaml ==6.0
  • requests ==2.28.1
  • scipy ==1.8.1
  • six ==1.16.0
  • toolz ==0.11.2
  • tqdm ==4.64.0
  • urllib3 ==1.26.9
  • xarray ==2022.3.0
.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • snok/install-poetry v1.3.1 composite
.github/workflows/joss-draft-pdf.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
.github/workflows/pythonpublish.yml actions
  • actions/cache v2 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
  • snok/install-poetry v1 composite
pyproject.toml pypi
environment.yml conda
  • dask
  • distributed
  • numpy
  • pandas
  • pathlib
  • pooch >=1.6.0
  • pyjanitor
  • python >=3.8
  • requests
  • thefuzz
  • tqdm