openpdi
A Python 3 library for decentralized aggregation of data from the Police Data Initiative (PDI).
Science Score: 28.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.1%) to scientific vocabulary
Repository
Statistics
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
OpenPDI

OpenPDI is an unofficial effort to document and standardize data submitted to the Police Data Initiative (PDI). The goal is to make the data more accessible by addressing a number of issues related to a lack of standardization—namely,
- File types: While some agencies make use of the Socrata Open Data API, many provide their data in raw .csv, .xlsx, or .xls files of varying structures.
- Column names: Many columns that represent the same data (e.g., race) are named differently across departments, cities, and states.
- Value formats: Dates, times, and other comparable fields are submitted in many different formats.
- Column availability: It's currently very difficult to identify data sources that contain certain columns, such as Use of Force data specifying the hire date of the involved officer(s).
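To make the column-name and value-format problems concrete, here is a minimal sketch of the kind of normalization involved. It is not OpenPDI's implementation: the alias table, the list of date layouts, and both function names are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical aliases: headers that different agencies might use for one column.
COLUMN_ALIASES = {
    "subject race": "race",
    "subj_race": "race",
    "race/ethnicity": "race",
}

# Hypothetical date layouts; real sources may use others.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%m/%d/%y")


def normalize_column(name):
    """Map a raw header to a canonical name, falling back to a cleaned header."""
    key = name.strip().lower()
    return COLUMN_ALIASES.get(key, key.replace(" ", "_"))


def normalize_date(value):
    """Try each known layout in turn and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")
```

With these definitions, headers such as " Subject Race " and "subj_race" both resolve to "race", and "03/07/2017", "2017-03-07", and "03/07/17" all normalize to the same ISO date.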
Getting Started
Installation
```shell
$ pip install openpdi
```
Usage
| Dataset | ID | Source |
|-------------------|-------|-------------------------------------------------------------|
| Use of Force | uof | https://www.policedatainitiative.org/datasets/use-of-force/ |
```python
import csv

import openpdi

# The library has a single entry point:
dataset = openpdi.Dataset(
    # The dataset ID (see the table above).
    "uof",
    # Limit the data sources to a specific state using its two-letter code.
    #
    # Default: scope=[].
    scope=["TX"],
    # A list of columns that must be provided in every data source included in
    # this dataset. See openpdi/meta/{ID}/schema.json for the available
    # columns.
    #
    # Default: columns=[].
    columns=["reason"],
    # If True, only return the user-specified columns -- i.e., those listed
    # in the columns parameter.
    #
    # Default: strict=False.
    strict=False)

# The names of the agencies included in this dataset:
print(dataset.agencies)

# The URLs of the external data sources included in this dataset:
print(dataset.sources)

# gen is a generator object for iterating over the CSV-formatted dataset.
gen = dataset.download()

# Write to a CSV file (newline="" lets the csv module control line endings):
with open("dataset.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=",", quoting=csv.QUOTE_ALL)
    writer.writerows(gen)
```
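Because `download()` returns a generator, rows are produced lazily and can be consumed only once. The sketch below previews the first few rows before the rest are processed. `fake_download` is a hypothetical stand-in for `dataset.download()` so the example runs offline; its rows are made up, and the real row layout (including whether a header row is yielded first) is an assumption.

```python
import itertools


def fake_download():
    """Hypothetical stand-in for dataset.download(); yields made-up rows."""
    yield ["date", "city", "state", "reason"]
    yield ["2017-03-07", "Austin", "TX", "Resisting arrest"]
    yield ["2017-03-09", "Austin", "TX", "Fleeing"]
    yield ["2017-04-01", "Dallas", "TX", "Resisting arrest"]


def preview(rows, n):
    """Return the first n rows without consuming the rest of the iterator."""
    return list(itertools.islice(rows, n))


gen = fake_download()
head = preview(gen, 2)   # the first two rows
remaining = list(gen)    # the generator resumes where the preview stopped
```

Note that rows taken by the preview are gone from `gen`. To preview and then write the full dataset, either call `download()` again or write `head` to the file before the remaining rows.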
Datasets
In an attempt to avoid unnecessary bloat (in terms of GBs), we don't actually store any PDI data in this repository. Instead, we store small, JSON-formatted descriptions of externally hosted datasets—for example, uof/CA/meta.json:
```json
[
  {
    "url": "https://www.norwichct.org/Archive.aspx?AMID=61&Type=Recent",
    "type": "csv",
    "start": 1,
    "columns": {
      "date": {
        "index": 0,
        "specifier": "%m/%d/%Y"
      },
      "city": {
        "raw": "Richmond"
      },
      "state": {
        "raw": "CA"
      },
      "service_type": {
        "index": 1
      },
      "force_type": {
        "index": 10
      },
      "light_conditions": {
        "index": 8
      },
      "weather_conditions": {
        "index": 7
      },
      "reason": {
        "index": 2
      },
      "officer_injured": {
        "index": 6
      },
      "officer_race": {
        "index": 9
      },
      "subject_injured": {
        "index": 5
      },
      "aggravating_factors": {
        "index": 3
      },
      "arrested": {
        "index": 4
      }
    }
  }
]
```
This file describes a Use of Force (uof) dataset from Richmond, CA. Each entry in the columns object maps a column from the externally hosted data to a column in the dataset's schema file (uof/schema.json).
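The mapping can be sketched in a few lines of Python. This is an illustration, not the library's code: it assumes `index` is a zero-based position in the source row, `raw` is a constant supplied by the description, and `specifier` is a `strptime` format for the raw value. The function name `apply_mapping` and the sample row are hypothetical, and the `start` field is not handled here.

```python
from datetime import datetime

# A trimmed copy of the "columns" mapping from the meta.json example.
COLUMNS = {
    "date": {"index": 0, "specifier": "%m/%d/%Y"},
    "city": {"raw": "Richmond"},
    "state": {"raw": "CA"},
    "service_type": {"index": 1},
    "reason": {"index": 2},
}


def apply_mapping(row, columns):
    """Build a {schema_column: value} record from one raw source row."""
    record = {}
    for name, spec in columns.items():
        if "raw" in spec:
            # A constant taken from the description, not from the source row.
            record[name] = spec["raw"]
            continue
        value = row[spec["index"]].strip()
        if "specifier" in spec:
            # Assumption: "specifier" is a strptime format for the raw value.
            value = datetime.strptime(value, spec["specifier"]).date().isoformat()
        record[name] = value
    return record


raw_row = ["03/07/2017", "Call for service", "Resisting arrest"]  # made-up row
record = apply_mapping(raw_row, COLUMNS)
```

Keeping the mapping in data rather than code means a new source can be supported by adding a JSON entry, without writing a parser for that source.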

The schema.json file assigns a format to every possible column in a particular dataset. Each format is a Python function tasked with standardizing a raw column value (see openpdi/validators.py).
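As a rough sketch of this format-to-function idea, a schema can name a format per column and a lookup table can resolve that name to a function. The format names, the validator functions, and the schema fragment below are all hypothetical; they do not reproduce the contents of openpdi/validators.py or uof/schema.json.

```python
# Hypothetical validators: each takes a raw string and returns a standardized value.
def to_title(value):
    return value.strip().title()


def to_bool(value):
    text = value.strip().lower()
    if text in {"y", "yes", "true", "1"}:
        return True
    if text in {"n", "no", "false", "0"}:
        return False
    return None  # unknown or missing


# Format name -> validator function.
VALIDATORS = {"title": to_title, "bool": to_bool}

# Hypothetical schema fragment: column name -> format name.
SCHEMA = {"reason": "title", "officer_injured": "bool", "arrested": "bool"}


def standardize(record, schema=SCHEMA):
    """Run every value in a record through the validator its column's format names."""
    return {col: VALIDATORS[schema[col]](val) for col, val in record.items()}


clean = standardize(
    {"reason": " resisting ARREST", "officer_injured": "No", "arrested": "Y"}
)
```

Under these assumptions, differently spelled inputs from separate sources converge on one representation per column.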
Owner
- Name: Joseph Kato
- Login: jdkato
- Kind: user
- Company: @errata-ai
- Twitter: jdkato
- Repositories: 44
- Profile: https://github.com/jdkato
Citation (CITATION)
@ARTICLE{openpdi,
AUTHOR = {Joseph Kato},
TITLE = {OpenPDI: An unofficial effort to standardize data submitted to the Police Data Initiative},
YEAR = {2018},
JOURNAL = {To appear}
}
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Joseph Kato | j****h@j****o | 57 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0