recordlinkage_gunviolenceincidents

https://github.com/irishorng/recordlinkage_gunviolenceincidents

Repository

Basic Info

Host: GitHub
Owner: irishorng
Language: R
Default Branch: main
Size: 69.3 KB

Statistics

Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 1

Created almost 3 years ago · Last pushed about 1 year ago

RecordLinkage_GunViolenceIncidents

Authors:
- Iris Horng
- Qishuo Yin
- Dylan Small

Contributing: William Chan, Jared Murray

For a detailed description of our framework see:
- Probabilistic Record Linkage: An Application to Gun Homicides (in review)

Data:
- Gun Violence Archive (GVA)
- National Violent Death Reporting System (NVDRS)

The linkage process and manual verification are carried out in four steps:
1. data_processing.R cleans and prepares the data.
2. apply_fastlink.R uses the fastLink() method to return matches from the GVA and NVDRS datasets.
3. combiningonlinedata.R collects the GVA Standard Reports publicly available on the GVA website.
4. getcommonmatches.R returns the records from the fastLink matches that have Incident IDs found in the GVA Standard Reports. This can then be used for manual verification of the matches as true matches, non-matches, or undetermined.

data_processing.R

Multiple steps were carried out to clean and prepare the data:
- First, enter the file paths of your GVA and NVDRS data, respectively.
- Make sure the data only includes incidents from 2014 to 2018.
  - Our GVA dataset already contained only incidents from 2014 to 2018, so we only had to do this step for the NVDRS dataset.
- We represent the date of each incident as a numeric variable so that it can be used for numeric comparison. We calculate the daysSinceStart variable, which is the number of days between January 1, 2014 and the date the incident occurred.
  - For example, an incident that occurred on January 1, 2014 would have daysSinceStart=0. An incident that occurred on January 3, 2014 would have daysSinceStart=2.
- Since the NVDRS dataset contains all violent death incidents, whether or not they involved gun violence, we clean the data to align with the GVA's focus.
  - We remove NVDRS incidents where the IncidentCategory is single suicide or multiple suicide because we only want to include incidents that involved homicides.
  - We only keep NVDRS incidents where the WeaponType (i.e. the weapon used in the incident) is a Firearm or non-powder gun.
  - We only keep NVDRS incidents where the DeathCause involves some sort of firearm, gun, or rifle.
- Before using the fastLink() method, the variables that we would like to link on must have the same name in both datasets, so we cleaned up some of the names.
  - For GVA and NVDRS, the zip code is stored in InjuryZip, and it must be of type numeric.
  - The state that the injury occurred in is stored as InjuryState.
  - The city that the injury occurred in is stored as InjuryCity.
  - The number killed in the incident is stored as NumKilled.
- We only keep states from each year that are well represented in the NVDRS dataset, according to the CDC Surveillance Summaries. Using this list of states for each year, we clean the GVA and NVDRS datasets accordingly.
- Finally, we can save the cleaned data as an RDS or csv file.
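
The date and filtering steps above can be sketched in base R as follows. This is a minimal illustration on made-up records, not the repository's script: the IncidentDate column name and the exact category and weapon labels are assumptions, and the real NVDRS coding may differ.

```r
# Toy NVDRS-like records. IncidentDate and the label strings are assumed
# for illustration; only the column roles come from the README.
nvdrs <- data.frame(
  IncidentDate     = as.Date(c("2014-01-01", "2014-01-03",
                               "2016-07-04", "2019-02-01")),
  IncidentCategory = c("Single homicide", "Single suicide",
                       "Multiple homicide", "Single homicide"),
  WeaponType       = c("Firearm", "Firearm", "Non-powder gun", "Firearm"),
  stringsAsFactors = FALSE
)

start_date <- as.Date("2014-01-01")
end_date   <- as.Date("2018-12-31")

# Keep only incidents from 2014 to 2018.
in_window <- nvdrs$IncidentDate >= start_date & nvdrs$IncidentDate <= end_date
nvdrs <- nvdrs[in_window, ]

# Number of days since January 1, 2014 (0 for that day itself).
nvdrs$daysSinceStart <- as.numeric(nvdrs$IncidentDate - start_date)

# Drop suicides, then keep firearm or non-powder gun incidents.
is_suicide  <- grepl("suicide", nvdrs$IncidentCategory, ignore.case = TRUE)
is_gun      <- nvdrs$WeaponType %in% c("Firearm", "Non-powder gun")
nvdrs_clean <- nvdrs[!is_suicide & is_gun, ]

print(nvdrs_clean)
```

Storing the date as a day count lets a numeric comparison treat incidents a day or two apart as close, which a string or factor date would not.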

apply_fastlink.R

For a detailed description of fastLink and its installation, see Enamorado, Ted, Benjamin Fifield, and Kosuke Imai. 2017. fastLink: Fast Probabilistic Record Linkage with Missing Data. Version 0.6.

Notes:
- First, take the cleaned NVDRS and GVA files that you saved as RDS files from data_processing.R, and store them as NVDRS and GVA, respectively.
- We blocked by state for computational efficiency, but you can block on any variable by changing the varnames inside the blockData() function.
- `finalmerged` will store all of the matches that are returned from the fastLink() method.
- fastLink has options to choose the variables of interest that you would like to match on.
  - In `varnames`, list all the variables of interest.
  - In `stringdist.match`, it's recommended to list the variables of interest that are strings (i.e. words).
  - In `numeric.match`, it's recommended to list the variables of interest that are numeric (i.e. numbers).
- The for-loop indicated by `for(i in 1:41)` should span from 1 to the number of blocks that you have.
  - To see how many blocks you have, run `names(blockstate_out)`. As an example, if you have 41 blocks, your for-loop should say `for(i in 1:41)`.
- If there is an error when running fastLink, it is most likely that one of the blocks does not have enough observations to carry out probabilistic record linkage, so you should create separate for-loops to avoid that block.
- Save the fastLink matches as an RDS or csv file.
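
As one possible alternative to writing separate for-loops around a failing block, the per-block loop can catch errors and record which blocks were skipped. The sketch below is base R only and does not call fastLink: link_by_block and toy_link are hypothetical names, IncidentID is an assumed column name, and toy_link is a simple exact join standing in for a wrapper around fastLink() that you would supply yourself.

```r
# Hypothetical helper: run a linkage function inside each block and skip
# blocks where it raises an error (for example, too few observations).
link_by_block <- function(dfA, dfB, block_var, link_fn) {
  blocks  <- intersect(unique(dfA[[block_var]]), unique(dfB[[block_var]]))
  results <- list()
  skipped <- character(0)
  for (b in blocks) {
    subA <- dfA[dfA[[block_var]] == b, , drop = FALSE]
    subB <- dfB[dfB[[block_var]] == b, , drop = FALSE]
    out  <- tryCatch(link_fn(subA, subB), error = function(e) NULL)
    if (is.null(out)) {
      skipped <- c(skipped, b)   # remember the block that failed
    } else {
      results[[b]] <- out
    }
  }
  list(matches = do.call(rbind, results), skipped = skipped)
}

# Stand-in for a probabilistic linker: an exact join that refuses
# blocks with fewer than two records on either side.
toy_link <- function(a, b) {
  if (nrow(a) < 2 || nrow(b) < 2) stop("block too small")
  merge(a, b, by = c("InjuryState", "InjuryCity", "daysSinceStart"))
}

GVA <- data.frame(
  InjuryState    = c("PA", "PA", "NJ"),
  InjuryCity     = c("Philadelphia", "Pittsburgh", "Newark"),
  daysSinceStart = c(10, 20, 30),
  IncidentID     = c(101, 102, 103),
  stringsAsFactors = FALSE
)
NVDRS <- data.frame(
  InjuryState    = c("PA", "PA", "NJ"),
  InjuryCity     = c("Philadelphia", "Erie", "Newark"),
  daysSinceStart = c(10, 25, 30),
  NumKilled      = c(1, 1, 2),
  stringsAsFactors = FALSE
)

out <- link_by_block(GVA, NVDRS, "InjuryState", toy_link)
print(out$matches)
print(out$skipped)
```

Returning the skipped block names alongside the matches makes it easy to report which states were left out of the linkage.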

combiningonlinedata.R

Here, we combine the GVA Standard Reports.
- From GVA's website (https://www.gunviolencearchive.org/reports), select which standard reports you would like to compare with your fastLink merged dataset.
- Download each of the standard reports and save them as csv files. You can then read each csv file and store it under its respective name.
- Merge all of the data into one file with bind_rows(one_data_set, another_data_set).
- Clean this combined data so that it only contains states for each year that are well represented in the NVDRS dataset (these are the same criteria we used to clean our original GVA and NVDRS datasets).
- Save the combined file as an RDS or csv file.
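
A minimal sketch of the combine-and-filter steps, using base R's rbind() in place of dplyr's bind_rows() so that it runs without extra packages. The column names (IncidentID, State, Year), the placeholder state labels, and the well_represented lookup table are all assumptions for illustration; the actual state list comes from the CDC Surveillance Summaries and is not reproduced here.

```r
# Two toy "standard reports"; in practice each would come from read.csv().
report_a <- data.frame(
  IncidentID = c(1, 2),
  State      = c("State A", "State B"),
  Year       = c(2015, 2015),
  stringsAsFactors = FALSE
)
report_b <- data.frame(
  IncidentID = c(3, 4),
  State      = c("State A", "State B"),
  Year       = c(2018, 2018),
  stringsAsFactors = FALSE
)

# Stack the reports; rbind() requires matching column names.
online_records <- rbind(report_a, report_b)

# Hypothetical lookup of state-year pairs considered well represented.
well_represented <- data.frame(
  State = c("State A", "State A"),
  Year  = c(2015, 2018),
  stringsAsFactors = FALSE
)

# An inner join keeps only rows whose (State, Year) pair is in the lookup.
online_clean <- merge(online_records, well_represented, by = c("State", "Year"))

# Save the result; a temporary directory is used here to avoid clutter.
saveRDS(online_clean, file.path(tempdir(), "online_records.rds"))
print(online_clean)
```

Keeping the state-year criteria in a single lookup table lets the same filter be reused for the GVA, NVDRS, and online report data.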

getcommonmatches.R

First, store your original cleaned GVA as our_GVA, your combined csv file from combiningonlinedata.R as online_records, and your fastLink matches csv file from apply_fastlink.R as merged. We transform them into lists named ourdata, onlinedata, and `our_merged`, respectively. There are two outputs you can view:
1. Compare the collected GVA standard reports (onlinedata) with your original cleaned GVA dataset (ourdata) to see common Incident IDs.
2. Compare the collected GVA standard reports (onlinedata) with your fastLink matches (our_merged) to see common Incident IDs, which are stored in merged_matches.
   - Return the records from the fastLink merged dataset that have those common Incident IDs; these are stored as merged_keep_matches.
   - Note: since multiple records may have the same Incident ID (representing multiple deaths for one incident), merged_keep_matches may have more records than the number of IDs in merged_matches.
   - Save merged_keep_matches as a csv file.
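
The two comparisons can be sketched with intersect() and %in% on toy data. The IncidentID and nvdrs_row column names and all values are assumptions for illustration; the variable names follow the description above.

```r
# Toy Incident IDs from the cleaned GVA data and the online reports.
ourdata    <- c(101, 102, 103, 104)
onlinedata <- c(102, 103, 105)

# Toy fastLink output: one row per linked record, so an incident with
# several deaths appears more than once.
merged <- data.frame(
  IncidentID = c(102, 102, 104, 103),
  nvdrs_row  = c(1, 2, 3, 4)
)
our_merged <- unique(merged$IncidentID)

# Output 1: IDs shared by the online reports and the cleaned GVA data.
common_with_gva <- intersect(onlinedata, ourdata)

# Output 2: IDs shared by the online reports and the fastLink matches.
merged_matches <- intersect(onlinedata, our_merged)

# Keep every linked record whose Incident ID is among the common IDs.
merged_keep_matches <- merged[merged$IncidentID %in% merged_matches, ]

# Write the result for manual verification (temporary directory here).
write.csv(merged_keep_matches,
          file.path(tempdir(), "merged_keep_matches.csv"),
          row.names = FALSE)

print(merged_matches)
print(merged_keep_matches)
```

In this toy case merged_matches holds two IDs while merged_keep_matches holds three rows, because Incident ID 102 is linked to two records.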

Owner

Login: irishorng
Kind: user

Repositories: 1
Profile: https://github.com/irishorng

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Horng
  given-names: Iris
  orcid: "https://orcid.org/0009-0002-6293-053X"
- family-names: Yin
  given-names: Qishuo
- family-names: Small
  given-names: Dylan
doi: https://doi.org/10.5281/zenodo.13901523
title: "RecordLinkage_GunViolenceIncidents"
version: 1.0.0
date-released: 2024-10-07
url: "https://github.com/irishorng/RecordLinkage_GunViolenceIncidents"

GitHub Events

Total

Watch event: 1
Push event: 11
Pull request event: 1
Fork event: 1
Create event: 1

Last Year

Watch event: 1
Push event: 11
Pull request event: 1
Fork event: 1
Create event: 1
