openpdi

A Python 3 library for decentralized aggregation of data from the Police Data Initiative (PDI).

https://github.com/jdkato/openpdi

Science Score: 28.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.1%) to scientific vocabulary

Keywords

data-science machine-learning nlp open-data python3
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: jdkato
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 125 KB
Statistics
  • Stars: 3
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 7 years ago · Last pushed about 5 years ago
Metadata Files
Readme License Citation

README.md

OpenPDI

OpenPDI is an unofficial effort to document and standardize data submitted to the Police Data Initiative (PDI). The goal is to make the data more accessible by addressing a number of issues related to a lack of standardization—namely,

  • File types: While some agencies make use of the Socrata Open Data API, many provide their data as raw .csv, .xlsx, or .xls files of varying structures.
  • Column names: Many columns that represent the same data (e.g., race) are named differently across departments, cities, and states.
  • Value formats: Dates, times, and other comparable fields are submitted in many different formats.
  • Column availability: It's currently very difficult to identify data sources that contain certain columns—e.g., Use of Force data specifying the hire date of the involved officer(s).
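The value-format problem can be illustrated with a minimal sketch that normalizes dates written in several styles to ISO 8601 using only the standard library. The format list and the normalize_date function are hypothetical and are not part of OpenPDI's API.

```python
from datetime import datetime

# Hypothetical list of date formats an agency might use; not taken from OpenPDI.
KNOWN_DATE_FORMATS = ["%m/%d/%Y", "%Y-%m-%d", "%d-%b-%Y", "%m/%d/%y"]


def normalize_date(raw, formats=KNOWN_DATE_FORMATS):
    """Return raw as an ISO 8601 date string, or None if no format matches."""
    value = raw.strip()
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


print(normalize_date("03/15/2016"))   # 2016-03-15
print(normalize_date("2016-03-15"))   # 2016-03-15
print(normalize_date("15-Mar-2016"))  # 2016-03-15
```

Trying formats in order cannot resolve ambiguous values such as 03/04/2016 (March 4 or April 3), which is one reason to record a per-source date format explicitly, as the meta.json files described under Datasets do.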

Getting Started

Installation

```shell
$ pip install openpdi
```

Usage

| Dataset      | ID  | Source                                                      |
|--------------|-----|-------------------------------------------------------------|
| Use of Force | uof | https://www.policedatainitiative.org/datasets/use-of-force/ |

```python
import csv

import openpdi

# The library has a single entry point:
dataset = openpdi.Dataset(
    # The dataset ID (see the table above).
    "uof",
    # Limit the data sources to a specific state using its two-letter code.
    #
    # Default: scope=[].
    scope=["TX"],
    # A list of columns that must be provided in every data source included
    # in this dataset. See openpdi/meta/{ID}/schema.json for the available
    # columns.
    #
    # Default: columns=[].
    columns=["reason"],
    # If True, only return the user-specified columns -- i.e., those listed
    # in the columns parameter.
    #
    # Default: strict=False.
    strict=False,
)

# The names of the agencies included in this dataset:
print(dataset.agencies)

# The URLs of the external data sources included in this dataset:
print(dataset.sources)

# gen is a generator object for iterating over the CSV-formatted dataset.
gen = dataset.download()

# Write to a CSV file (newline="" lets the csv module control line endings):
with open("dataset.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=",", quoting=csv.QUOTE_ALL)
    writer.writerows(gen)
```
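The scope and columns parameters narrow which external sources end up in a dataset. The following is a minimal sketch of how such filtering could work, assuming the meta entries are grouped by two-letter state code (mirroring the uof/CA/meta.json layout described under Datasets). META_BY_STATE, select_sources, and the example.org URLs are all hypothetical; this is not OpenPDI's actual implementation.

```python
# Hypothetical in-memory stand-in for per-state meta.json files.
META_BY_STATE = {
    "CA": [
        {"url": "https://example.org/ca.csv",
         "columns": {"state": {"raw": "CA"}, "reason": {"index": 2}}},
    ],
    "TX": [
        {"url": "https://example.org/tx-a.csv",
         "columns": {"state": {"raw": "TX"}, "reason": {"index": 4}}},
        {"url": "https://example.org/tx-b.csv",
         "columns": {"state": {"raw": "TX"}, "force_type": {"index": 1}}},
    ],
}


def select_sources(meta_by_state, scope=(), columns=()):
    """Return URLs of entries inside scope that map every column in columns.

    An empty scope or columns means "no restriction", mirroring the
    documented defaults scope=[] and columns=[].
    """
    urls = []
    for state in sorted(meta_by_state):
        if scope and state not in scope:
            continue  # state is outside the requested scope
        for entry in meta_by_state[state]:
            if all(col in entry["columns"] for col in columns):
                urls.append(entry["url"])
    return urls


print(select_sources(META_BY_STATE, scope=["TX"], columns=["reason"]))
# ['https://example.org/tx-a.csv']
print(len(select_sources(META_BY_STATE)))  # 3
```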

Datasets

In an attempt to avoid unnecessary bloat (in terms of GBs), we don't actually store any PDI data in this repository. Instead, we store small, JSON-formatted descriptions of externally hosted datasets—for example, uof/CA/meta.json:

```json
[
  {
    "url": "https://www.norwichct.org/Archive.aspx?AMID=61&Type=Recent",
    "type": "csv",
    "start": 1,
    "columns": {
      "date": { "index": 0, "specifier": "%m/%d/%Y" },
      "city": { "raw": "Richmond" },
      "state": { "raw": "CA" },
      "service_type": { "index": 1 },
      "force_type": { "index": 10 },
      "light_conditions": { "index": 8 },
      "weather_conditions": { "index": 7 },
      "reason": { "index": 2 },
      "officer_injured": { "index": 6 },
      "officer_race": { "index": 9 },
      "subject_injured": { "index": 5 },
      "aggravating_factors": { "index": 3 },
      "arrested": { "index": 4 }
    }
  }
]
```

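This file describes a Use of Force (uof) dataset from Richmond, CA. Each key in the columns object maps a column of the externally hosted data to a column in the dataset's schema file (uof/schema.json): index gives the position of the source column, specifier gives the format used to parse its dates, and raw supplies a constant value (here, the city and state) that is not read from the source file.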
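To make the mapping concrete, here is a minimal sketch that applies an abridged copy of the columns object above to raw rows. The map_row and map_rows functions and the sample rows are hypothetical, and treating "start" as the index of the first data row (so earlier rows are header rows) is an assumption, not documented behavior.

```python
from datetime import datetime

# Abridged copy of the entry from uof/CA/meta.json above.
ENTRY = {
    "start": 1,
    "columns": {
        "date": {"index": 0, "specifier": "%m/%d/%Y"},
        "city": {"raw": "Richmond"},
        "state": {"raw": "CA"},
        "reason": {"index": 2},
        "officer_injured": {"index": 6},
    },
}


def map_row(raw_row, columns):
    """Translate one raw row into a dict keyed by schema column names."""
    record = {}
    for name, spec in columns.items():
        if "raw" in spec:
            value = spec["raw"]  # constant supplied by the meta file itself
        else:
            value = raw_row[spec["index"]].strip()  # positional lookup
            if "specifier" in spec:
                # Re-emit dates as ISO 8601 using the per-source format.
                parsed = datetime.strptime(value, spec["specifier"])
                value = parsed.date().isoformat()
        record[name] = value
    return record


def map_rows(raw_rows, entry):
    """Map every row from entry["start"] onward (assumed to skip headers)."""
    return [map_row(row, entry["columns"]) for row in raw_rows[entry["start"]:]]


# Made-up rows for illustration; the first is treated as a header.
rows = [
    ["Date", "Service", "Reason", "Factors", "Arrested", "Subj Inj", "Off Inj"],
    ["01/02/2017", "svc-a", "reason-a", "none", "No", "No", "Yes"],
]
print(map_rows(rows, ENTRY))
# [{'date': '2017-01-02', 'city': 'Richmond', 'state': 'CA', 'reason': 'reason-a', 'officer_injured': 'Yes'}]
```

Keeping the mapping as data rather than code means supporting a new agency only requires a new JSON entry, not a new parser.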


The schema.json file assigns a format to every possible column in a particular dataset. Each format is a Python function that standardizes a raw column value (see openpdi/validators.py).
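A minimal sketch of the schema-to-function idea: a registry maps format names to standardizing functions, and a schema maps column names to format names. Every name here (as_date, as_bool, as_text, FORMATS, SCHEMA, standardize) is hypothetical; the real functions in openpdi/validators.py may differ in name, signature, and behavior.

```python
from datetime import datetime


# Hypothetical format functions, each turning a raw string into a standard value.
def as_date(value):
    return datetime.strptime(value.strip(), "%m/%d/%Y").date().isoformat()


def as_bool(value):
    v = value.strip().lower()
    if v in {"yes", "y", "true", "1"}:
        return True
    if v in {"no", "n", "false", "0"}:
        return False
    return None  # unrecognized values are left undetermined


def as_text(value):
    return " ".join(value.split())  # collapse runs of whitespace


# Registry of format names -> functions.
FORMATS = {"date": as_date, "bool": as_bool, "text": as_text}

# Hypothetical schema fragment: column name -> format name.
SCHEMA = {"date": "date", "officer_injured": "bool", "reason": "text"}


def standardize(column, raw_value):
    """Look up the column's format in the schema and apply its function."""
    return FORMATS[SCHEMA[column]](raw_value)


print(standardize("officer_injured", " Yes "))         # True
print(standardize("reason", "  Failure   to comply"))  # Failure to comply
```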

Owner

  • Name: Joseph Kato
  • Login: jdkato
  • Kind: user
  • Company: @errata-ai

Citation (CITATION)

@ARTICLE{openpdi,
   AUTHOR  = {Joseph Kato},
   TITLE   = {OpenPDI: An unofficial effort to standardize data submitted to the Police Data Initiative},
   YEAR    = {2018},
   JOURNAL = {To appear}
}


Committers

Last synced: 7 months ago

All Time
  • Total Commits: 57
  • Total Committers: 1
  • Avg Commits per committer: 57.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Joseph Kato (j****h@j****o): 57 commits

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0