fars_cleaner
fars_cleaner: A Python package for downloading and pre-processing vehicle fatality data in the US - Published in JOSS (2022)
Science Score: 46.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to joss.theoj.org, zenodo.org
- ✓ Committers with academic emails: 3 of 4 committers (75.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: mzabrams
- License: bsd-3-clause
- Language: Python
- Default Branch: master
- Size: 4.91 MB
Statistics
- Stars: 4
- Watchers: 2
- Forks: 3
- Open Issues: 4
- Releases: 13
Metadata Files
README.md
FARS Cleaner fars_cleaner
fars-cleaner is a Python library for downloading and pre-processing data
from the Fatality Analysis Reporting System, collected annually by NHTSA since
1975.
Installation
The preferred installation method is through conda.
```bash
conda install -c conda-forge fars_cleaner
```
You can also install with pip.
```bash
pip install fars-cleaner
```
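To confirm that either install route worked, one option is to ask Python's packaging metadata for the installed version. The sketch below is not part of fars_cleaner: `installed_version` is a hypothetical helper, and the distribution name `fars-cleaner` is assumed from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("fars-cleaner"))
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to branch on whether the package is present.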
Usage
Downloading FARS data
The FARSFetcher class provides an interface to download and unzip selected years from the NHTSA FARS FTP server.
The class uses pooch to download and unzip the selected files. By default, files are unzipped to your OS's cache directory.
```python
from fars_cleaner import FARSFetcher

# Prepare for FARS file download, using the OS cache directory.
fetcher = FARSFetcher()
```
Suggested usage is to download files to a data directory in your current project directory.
Passing `project_dir` will download files to `project_dir/data/fars` by default. This behavior can be
overridden by setting `cache_path` as well. Setting `cache_path` alone provides a direct path to the directory
you want to download files into.
```python
from pathlib import Path
from fars_cleaner import FARSFetcher

SOME_PATH = Path("/YOUR/PROJECT/PATH")

# Prepare to download to /YOUR/PROJECT/PATH/data/fars
# This is the recommended usage.
fetcher = FARSFetcher(project_dir=SOME_PATH)

# Prepare to download to /YOUR/PROJECT/PATH/fars
cache_path = "fars"
fetcher = FARSFetcher(project_dir=SOME_PATH, cache_path=cache_path)

# Prepare to download directly to a specific directory.
cache_path = Path("/SOME/TARGET/DIRECTORY")
fetcher = FARSFetcher(cache_path=cache_path)
```
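The directory rules described above can be summarized in a few lines of plain Python. This is a minimal sketch of the documented behavior, not the library's implementation: `resolve_download_dir` is a hypothetical helper, and the `os_cache` default is only a stand-in for whatever OS cache directory is actually chosen.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_download_dir(
    project_dir: Optional[PathLike] = None,
    cache_path: Optional[PathLike] = None,
    os_cache: PathLike = Path.home() / ".cache" / "fars",  # assumed stand-in
) -> Path:
    """Mirror the documented rules for where files are downloaded."""
    if project_dir is not None:
        # With a project directory, cache_path is relative to it;
        # without cache_path the default sub-folder is data/fars.
        sub = Path(cache_path) if cache_path is not None else Path("data") / "fars"
        return Path(project_dir) / sub
    if cache_path is not None:
        # cache_path alone is used as the target directory directly.
        return Path(cache_path)
    # Neither argument given: fall back to the OS cache directory.
    return Path(os_cache)


print(resolve_download_dir(project_dir="/YOUR/PROJECT/PATH"))
print(resolve_download_dir(project_dir="/YOUR/PROJECT/PATH", cache_path="fars"))
print(resolve_download_dir(cache_path="/SOME/TARGET/DIRECTORY"))
```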
Files can be downloaded in their entirety (data from 1975-2018), as a single year, or across a specified year range.
Downloading all of the data can be quite time-consuming. Each zip file is unzipped as it is downloaded and then
deleted, leaving its contents in a folder named {YEAR}.unzip
```python
# Fetch all data
fetcher.fetch_all()

# Fetch a single year
fetcher.fetch_single(1984)

# Fetch data in a year range (inclusive).
fetcher.fetch_subset(1999, 2007)
```
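Because each year ends up in its own `{YEAR}.unzip` folder, it is easy to check which years are already on disk before fetching more. The sketch below assumes only that folder naming; `downloaded_years` is a hypothetical helper, not a function provided by fars_cleaner.

```python
import re
from pathlib import Path
from typing import List, Union

# Matches folder names such as "1984.unzip".
_UNZIP_DIR = re.compile(r"^(\d{4})\.unzip$")


def downloaded_years(data_dir: Union[str, Path]) -> List[int]:
    """Return the sorted years that have a {YEAR}.unzip folder in data_dir."""
    years = []
    for entry in Path(data_dir).iterdir():
        match = _UNZIP_DIR.match(entry.name)
        if match and entry.is_dir():
            years.append(int(match.group(1)))
    return sorted(years)
```

Calling `downloaded_years` on the download directory and comparing the result with the range you need shows which years remain to be fetched.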
Processing FARS data
Calling `load_pipeline` loads and pre-processes the FARS data requested by the user.
```python
from fars_cleaner import FARSFetcher, load_pipeline

fetcher = FARSFetcher(project_dir=SOME_PATH)
vehicles, accidents, people = load_pipeline(fetcher=fetcher,
                                            first_run=True,
                                            target_folder=SOME_PATH)
```
Calling `load_basic` loads the FARS data for a single year with no pre-processing. Files must
be prefetched using a `FARSFetcher` or a similar method. A mapper dictionary must be provided to identify which
columns, if any, require renaming.
```python
from fars_cleaner.data_loader import load_basic

# `mappings` is the mapper dictionary described above and must be defined beforehand.
vehicles, accidents, people = load_basic(year=1975, data_dir=SOME_PATH, mapping=mappings)
```
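To illustrate what a renaming mapper does, the sketch below applies a flat old-name to new-name dictionary to a small pandas DataFrame. The structure that `load_basic` actually expects for `mapping` is not described here and may differ; the dictionary, the new column names, and the toy rows are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical mapper: original FARS column name -> preferred name.
column_mapping = {"ST_CASE": "case_id", "VEH_NO": "vehicle_number"}

# Toy rows standing in for a loaded FARS table (values are made up).
raw = pd.DataFrame(
    {"ST_CASE": [10001, 10002], "VEH_NO": [1, 2], "YEAR": [1975, 1975]}
)

# Columns absent from the mapper (here YEAR) keep their original names.
renamed = raw.rename(columns=column_mapping)
print(list(renamed.columns))
```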
Requirements
Downloading and processing the full FARS dataset currently runs out of memory on Windows machines with only 16GB RAM. It is recommended to have at least 32GB RAM on Windows systems. macOS and Linux run with no issues on 16GB systems.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. See CONTRIBUTING.md for more details.
License
BSD-3-Clause
Owner
- Login: mzabrams
- Kind: user
- Repositories: 35
- Profile: https://github.com/mzabrams
GitHub Events
Total
- Issues event: 1
- Watch event: 1
- Push event: 1
- Pull request event: 2
- Fork event: 1
Last Year
- Issues event: 1
- Watch event: 1
- Push event: 1
- Pull request event: 2
- Fork event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| mzabrams | m****s@d****u | 80 |
| Mitchell Abrams | m****2@d****u | 27 |
| Mitchell Abrams | m****l@e****k | 2 |
| Robert R. Henry | r****y@g****m | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 9
- Total pull requests: 4
- Average time to close issues: 26 days
- Average time to close pull requests: 11 days
- Total issue authors: 5
- Total pull request authors: 3
- Average comments per issue: 1.44
- Average comments per pull request: 0.25
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 1
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: about 1 month
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ethanwhite (4)
- svburke (2)
- mollyboigon (1)
- mzabrams (1)
- RobertHenry6bev (1)
Pull Request Authors
- mzabrams (2)
- RobertHenry6bev (1)
- dependabot[bot] (1)
Dependencies
- appdirs ==1.4.4
- certifi ==2022.6.15
- charset-normalizer ==2.1.0
- cloudpickle ==2.1.0
- colorama ==0.4.5
- dask ==2022.6.1
- fsspec ==2022.5.0
- idna ==3.3
- lazy-loader ==0.1rc2
- locket ==1.0.0
- multipledispatch ==0.6.0
- natsort ==8.1.0
- numpy ==1.23.0
- packaging ==21.3
- pandas ==1.4.3
- pandas-flavor ==0.3.0
- partd ==1.2.0
- pathlib ==1.0.1
- pooch ==1.6.0
- pyjanitor ==0.23.1
- pyparsing ==3.0.9
- python-dateutil ==2.8.2
- pytz ==2022.1
- pyyaml ==6.0
- requests ==2.28.1
- scipy ==1.8.1
- six ==1.16.0
- toolz ==0.11.2
- tqdm ==4.64.0
- urllib3 ==1.26.9
- xarray ==2022.3.0
- actions/checkout v3 composite
- actions/setup-python v4 composite
- snok/install-poetry v1.3.1 composite
- actions/checkout v2 composite
- actions/upload-artifact v1 composite
- openjournals/openjournals-draft-action master composite
- actions/cache v2 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- snok/install-poetry v1 composite
- dask
- distributed
- numpy
- pandas
- pathlib
- pooch >=1.6.0
- pyjanitor
- python >=3.8
- requests
- thefuzz
- tqdm