recordlinkage_gunviolenceincidents
https://github.com/irishorng/recordlinkage_gunviolenceincidents
Repository
Basic Info
- Host: GitHub
- Owner: irishorng
- Language: R
- Default Branch: main
- Size: 69.3 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
RecordLinkage_GunViolenceIncidents
Authors:
- Iris Horng
- Qishuo Yin
- Dylan Small
Contributing: William Chan, Jared Murray
For a detailed description of our framework, see:
- Probabilistic Record Linkage: An Application to Gun Homicides (in review)
Data:
- Gun Violence Archive (GVA)
- National Violent Death Reporting System (NVDRS)
The linkage process and manual verification are carried out in 4 steps:
1. data_processing.R cleans and prepares the data.
2. apply_fastlink.R uses the fastLink() method to return matches between the GVA and NVDRS datasets.
3. combiningonlinedata.R collects the GVA Standard Reports that are publicly available on the GVA website.
4. getcommonmatches.R returns the records from the fastLink matches whose Incident IDs appear in the GVA Standard Reports. These records can then be used to manually verify the matches as true matches, non-matches, or undetermined.
data_processing.R
Multiple steps were carried out to clean and prepare the data:
- First, enter the file paths of your GVA and NVDRS data, respectively.
- Make sure the data only includes incidents from 2014 to 2018.
  - Our GVA dataset already contained only incidents from 2014 to 2018, so we only had to do this step for the NVDRS dataset.
- We represent the date of each incident as a numeric variable so that it can be used for numeric comparison. We calculate the daysSinceStart variable, which is the number of days between January 1, 2014 and the date the incident occurred.
  - For example, an incident that occurred on January 1, 2014 has daysSinceStart=0, and an incident that occurred on January 3, 2014 has daysSinceStart=2.
- Since the NVDRS dataset contains all violent death incidents, regardless of whether the death resulted from gun violence, we clean the data to align with the GVA's focus.
  - We remove NVDRS incidents where the IncidentCategory is single suicide or multiple suicide, because we only want to include incidents that involved homicides.
  - We only keep NVDRS incidents where the WeaponType (i.e. the weapon used in the incident) is a firearm or non-powder gun.
  - We only keep NVDRS incidents where the DeathCause involves some sort of firearm, gun, or rifle.
- Before using the fastLink() method, the variables that we would like to link on must have the same name in both datasets, so we renamed some of them.
  - For GVA and NVDRS, the zip code is stored in InjuryZip, which must be of type numeric.
  - The state that the injury occurred in is stored as InjuryState.
  - The city that the injury occurred in is stored as InjuryCity.
  - The number killed in the incident is stored as NumKilled.
- For each year, we only keep the states that are well represented in the NVDRS dataset, according to the CDC Surveillance Summaries. Using this list of states for each year, we clean the GVA and NVDRS datasets accordingly.
- Finally, save the cleaned data as an RDS or csv file.
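The date and filtering steps above can be sketched in base R as follows. This is a minimal illustration on a toy data frame, not the contents of data_processing.R: the date column name `IncidentDate`, the category labels, and the `DeathCause` strings are assumptions made for the example. Only the column names IncidentCategory, WeaponType, DeathCause, and daysSinceStart come from the description above.

```r
# Toy stand-in for the NVDRS extract. The column IncidentDate and all cell
# values are hypothetical; the real NVDRS coding may differ.
nvdrs <- data.frame(
  IncidentDate = as.Date(c("2013-12-31", "2014-01-01", "2014-01-03",
                           "2016-07-04", "2018-12-31")),
  IncidentCategory = c("Single homicide", "Single homicide", "Multiple homicide",
                       "Single suicide", "Single homicide"),
  WeaponType = c("Firearm", "Firearm", "Non-powder gun",
                 "Firearm", "Sharp instrument"),
  DeathCause = c("Assault by handgun discharge", "Assault by handgun discharge",
                 "Assault by other gun discharge", "Self-harm by rifle discharge",
                 "Assault by sharp object"),
  stringsAsFactors = FALSE
)

start_date <- as.Date("2014-01-01")
end_date   <- as.Date("2018-12-31")

# Keep only incidents from 2014 to 2018.
in_range <- nvdrs$IncidentDate >= start_date & nvdrs$IncidentDate <= end_date
nvdrs <- nvdrs[in_range, ]

# Number of days since January 1, 2014 (0 for that day itself).
nvdrs$daysSinceStart <- as.numeric(nvdrs$IncidentDate - start_date)

# Drop incidents categorized as single or multiple suicide.
is_suicide <- tolower(nvdrs$IncidentCategory) %in%
  c("single suicide", "multiple suicide")

# Keep incidents whose weapon is a firearm or non-powder gun.
is_gun_weapon <- tolower(nvdrs$WeaponType) %in% c("firearm", "non-powder gun")

# Keep causes of death that mention a firearm, gun, or rifle.
is_gun_cause <- grepl("firearm|gun|rifle", nvdrs$DeathCause, ignore.case = TRUE)

nvdrs_clean <- nvdrs[!is_suicide & is_gun_weapon & is_gun_cause, ]
print(nvdrs_clean[, c("IncidentDate", "daysSinceStart")])
```

In this toy data, only the incidents dated January 1 and January 3, 2014 pass every filter, with daysSinceStart of 0 and 2, which matches the example given in the list above.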
apply_fastlink.R
For a detailed description of fastLink and its installation, see Enamorado, Ted, Benjamin Fifield, and Kosuke Imai. 2017. fastLink: Fast Probabilistic Record Linkage with Missing Data. Version 0.6.
Notes:
- First, take the cleaned NVDRS and GVA files that you saved as RDS files from data_processing.R, and store them as `NVDRS` and `GVA`, respectively.
- We blocked by state for computational efficiency, but you can block on any variable by changing the `varnames` inside the `blockData()` function.
- `finalmerged` will store all of the matches that are returned by the `fastLink()` method.
- fastLink lets you choose the variables of interest that you would like to match on.
  - In `varnames`, list all the variables of interest.
  - In `stringdist.match`, it's recommended to list the variables of interest that are strings (i.e. words).
  - In `numeric.match`, it's recommended to list the variables of interest that are numeric (i.e. numbers).
- The for-loop indicated by `for(i in 1:41)` should span from 1 to the number of blocks that you have.
  - To see how many blocks you have, run `names(blockstate_out)`. For example, if you have 41 blocks, your for-loop should say `for(i in 1:41)`.
- If fastLink raises an error, the most likely cause is that one of the blocks does not have enough observations to carry out probabilistic record linkage, so you should create separate for-loops that skip that block.
- Save the fastLink matches as an RDS or csv file.
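The notes above can be combined into a loop like the one below. This is a sketch under assumptions, not the code in apply_fastlink.R: `link_one_block` and `link_by_block` are hypothetical helpers, the choice of matching variables is a guess based on the variable names listed in the data_processing.R section, and wrapping each block in `tryCatch()` is an alternative to the separate for-loops that the notes recommend for blocks with too few observations.

```r
# Hypothetical helper: run fastLink on one block and return the matched pairs.
# Which variables the authors actually matched on is an assumption here.
link_one_block <- function(gva_block, nvdrs_block) {
  fl_out <- fastLink::fastLink(
    dfA = gva_block, dfB = nvdrs_block,
    varnames = c("InjuryCity", "InjuryZip", "NumKilled", "daysSinceStart"),
    stringdist.match = c("InjuryCity"),   # string variables
    numeric.match = c("daysSinceStart")   # numeric variables
  )
  fastLink::getMatches(dfA = gva_block, dfB = nvdrs_block, fs.out = fl_out)
}

# Hypothetical helper: loop over every block returned by blockData(),
# skipping (and reporting) any block where the linkage step fails.
link_by_block <- function(GVA, NVDRS, blocks, link_fun = link_one_block) {
  results <- lapply(seq_along(blocks), function(i) {
    gva_block   <- GVA[blocks[[i]]$dfA.inds, , drop = FALSE]
    nvdrs_block <- NVDRS[blocks[[i]]$dfB.inds, , drop = FALSE]
    tryCatch(
      link_fun(gva_block, nvdrs_block),
      error = function(e) {
        message("Skipping block ", i, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  # Drop the skipped blocks and stack the remaining matches.
  results <- Filter(Negate(is.null), results)
  do.call(rbind, results)
}

# Usage (requires the fastLink package and the cleaned data):
# blockstate_out <- fastLink::blockData(GVA, NVDRS, varnames = "InjuryState")
# finalmerged    <- link_by_block(GVA, NVDRS, blockstate_out)
```

Using `seq_along(blocks)` instead of a hard-coded `1:41` keeps the loop bounds in step with however many blocks `blockData()` returns.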
combiningonlinedata.R
Here, we combine the GVA Standard Reports.
- From GVA's website (https://www.gunviolencearchive.org/reports), select which standard reports you would like to compare with your fastLink merged dataset.
- Download each of the standard reports and save them as csv files. You can then read each csv file and store it under its own name.
- Merge all of the data into one dataset with bind_rows(one_data_set, another_data_set).
- Clean this combined data so that, for each year, it only contains the states that are well represented in the NVDRS dataset (the same criterion we used to clean our original GVA and NVDRS datasets).
- Save the combined file as an RDS or csv file.
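The combine-and-filter steps above might look like the following sketch. It is illustrative only: the column names `IncidentID`, `IncidentDate`, and `State` are assumed rather than taken from the GVA downloads, the `covered` table holds placeholder state-year pairs rather than the CDC list, and base `rbind()` is swapped in for `bind_rows()` so that the example has no package dependencies.

```r
# Two toy "standard reports"; in practice these would come from read.csv().
report_a <- data.frame(
  IncidentID = c(101, 102),
  IncidentDate = as.Date(c("2014-03-01", "2015-06-15")),
  State = c("Ohio", "Texas"),
  stringsAsFactors = FALSE
)
report_b <- data.frame(
  IncidentID = c(103, 104),
  IncidentDate = as.Date(c("2015-08-20", "2016-01-05")),
  State = c("Ohio", "Texas"),
  stringsAsFactors = FALSE
)

# Stack the reports into one data frame (the README uses bind_rows() here).
online_records <- rbind(report_a, report_b)

# Placeholder lookup of state-year pairs considered well represented.
covered <- data.frame(
  Year = c(2014L, 2015L, 2016L, 2016L),
  State = c("Ohio", "Ohio", "Ohio", "Texas"),
  stringsAsFactors = FALSE
)

# Keep only the incidents whose (year, state) pair appears in the lookup.
online_records$Year <- as.integer(format(online_records$IncidentDate, "%Y"))
keep <- paste(online_records$Year, online_records$State) %in%
  paste(covered$Year, covered$State)
online_records <- online_records[keep, ]

print(online_records$IncidentID)
# saveRDS(online_records, "online_records.rds")  # or write.csv(...)
```

With these placeholder values, incident 102 (Texas, 2015) is dropped because that state-year pair is not in the lookup, and the other three incidents are kept.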
getcommonmatches.R
- First, store your original cleaned GVA as `our_GVA`, your combined csv file from combiningonlinedata.R as `online_records`, and your fastLink matches csv file from apply_fastlink.R as `merged`. We transform them into lists named `ourdata`, `onlinedata`, and `our_merged`, respectively.
- There are two outputs you can view:
  - Compare the collected GVA Standard Reports (`onlinedata`) with your original cleaned GVA dataset (`ourdata`) to see the common Incident IDs.
  - Compare the collected GVA Standard Reports (`onlinedata`) with your fastLink matches (`our_merged`) to see the common Incident IDs, which are stored in `merged_matches`.
- Return the records from the fastLink merged dataset that have those common Incident IDs; this is stored as `merged_keep_matches`.
  - Note: since multiple records may have the same Incident ID (representing multiple deaths for one incident), `merged_keep_matches` may have more records than the number of IDs in `merged_matches`.
- Save `merged_keep_matches` as a csv file.
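The comparison steps above can be sketched with `intersect()` and `%in%`. This is an assumption-laden illustration rather than the code in getcommonmatches.R: the column names `IncidentID` and `NVDRS_ID` and all values are made up, and plain ID vectors are used in place of the lists described above.

```r
# Toy inputs; IncidentID and NVDRS_ID are hypothetical column names.
our_GVA <- data.frame(IncidentID = c(1, 2, 3, 4))
online_records <- data.frame(IncidentID = c(2, 3, 5))
merged <- data.frame(
  IncidentID = c(2, 2, 3, 4),          # incident 2 has two linked deaths
  NVDRS_ID = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Output 1: Incident IDs shared by the standard reports and the cleaned GVA.
gva_matches <- intersect(online_records$IncidentID, our_GVA$IncidentID)

# Output 2: Incident IDs shared by the standard reports and the fastLink matches.
merged_matches <- intersect(online_records$IncidentID, merged$IncidentID)

# Keep every fastLink record whose Incident ID is one of the common IDs.
merged_keep_matches <- merged[merged$IncidentID %in% merged_matches, ]

print(merged_matches)
print(merged_keep_matches)
# write.csv(merged_keep_matches, "merged_keep_matches.csv", row.names = FALSE)
```

Here `merged_matches` holds two IDs while `merged_keep_matches` has three rows, because incident 2 appears twice in the fastLink matches.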
Owner
- Login: irishorng
- Kind: user
- Repositories: 1
- Profile: https://github.com/irishorng
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Horng
  given-names: Iris
  orcid: "https://orcid.org/0009-0002-6293-053X"
- family-names: Yin
  given-names: Qishuo
- family-names: Small
  given-names: Dylan
doi: https://doi.org/10.5281/zenodo.13901523
title: "RecordLinkage_GunViolenceIncidents"
version: 1.0.0
date-released: 2024-10-07
url: "https://github.com/irishorng/RecordLinkage_GunViolenceIncidents"
GitHub Events
Total
- Watch event: 1
- Push event: 11
- Pull request event: 1
- Fork event: 1
- Create event: 1
Last Year
- Watch event: 1
- Push event: 11
- Pull request event: 1
- Fork event: 1
- Create event: 1