lifetime-consequences-child-health
Replication package for "Lifetime and intergenerational consequences of poor childhood health"
https://github.com/anthonywray/lifetime-consequences-child-health
Repository
Basic Info
- Host: GitHub
- Owner: anthonywray
- License: bsd-3-clause
- Language: Stata
- Default Branch: main
- Size: 23.2 MB
Replication package for “Lifetime and intergenerational consequences of
poor childhood health”
Krzysztof Karbownik and Anthony Wray, Journal of Human Resources
You can cite this replication package using Zenodo, where an archival
version of this repository is stored.
Overview
This replication package contains code, output, raw data, and Stata packages. The code in this replication package replicates all tables and figures from the raw data using Stata and a few Unix shell commands. Instructions for accessing restricted-use data are provided below.
The entire replication takes about 7 and a half days to run on a 2.9 GHz Intel-based Windows PC (64-bit x86-64) with 24 GB of RAM, or about 4 and a half days on a machine with 64 GB of RAM.
The Zenodo replication package is available here: https://doi.org/10.5281/zenodo.7117306
The GitHub version is available here: https://github.com/anthonywray/lifetime-consequences-child-health
Data availability statement
Replication materials are provided under a BSD 3-Clause License.
The hospital inpatient admissions data from St. Bartholomew’s and Guy’s Hospitals that support the findings of this study are included with the replication package.
The hospital inpatient admissions data from Great Ormond Street Hospital are publicly available from the Historic Hospital Admission Records Project (HHARP) (https://hharp.org/). The HHARP data used in this study were kindly shared by Dr. Sue Hawkins (drsuehawkins@gmail.com). They cannot be posted online or included with the replication materials.
The anonymized complete-count data (UK Data Archive study number 7481) for the Censuses of England and Wales, 1881 to 1911, are publicly available from the Integrated Census Microdata (I-CeM) project (https://icem.data-archive.ac.uk/#step1). Since data access is safeguarded, the data files used for this study cannot be included with the replication materials.
“Integrated Census Microdata (I-CeM) Names and Addresses, 1851-1911: Special Licence Access” (UK Data Archive study number 7856) is a safeguarded dataset that contains names and addresses from the Integrated Census Microdata (I-CeM) dataset of the censuses of Great Britain for the period 1851 to 1911. These data can only be accessed by obtaining a Special License agreement with the UK Data Archive and the data distributor, FindMyPast Ltd.
Prior versions of the complete count 1881 Census of England and Wales were accessed from the North Atlantic Population Project (https://www.nappdata.org/napp/) in 2013. The NAPP data are now distributed through the IPUMS International data system (https://international.ipums.org/international/). At the time we accessed the data, the residential address variable was available for download from NAPP, but it is now only available as part of the restricted-use dataset “Integrated Census Microdata (I-CeM) Names and Addresses, 1851-1911: Special Licence Access.” Thus, we cannot include our original extracts from NAPP as part of the replication materials.
All other data are contained in the Zenodo repository.
Public-use raw data links
- Integrated Census Microdata (I-CeM) project data documentation and supporting materials, including the parish and place list dictionary files, are available here.
- An archived version of a website for an index of streets by registration district from the 1891 Census of England and Wales is located here. The authors accessed and scraped data from the website in 2013. As the original website was a “wiki”-style page, the archived version may differ from the version accessed by the authors.
Limited-use data links and instructions
Researchers interested in accessing the hospital inpatient admission
records from the Historic Hospital Admission Records Project (HHARP)
(https://hharp.org/) will need to create an account on the HHARP
website. Data download is limited to 200 observations. The search
results page indicates how many entries match the search criteria.
Downloaded data files are in .csv format. Researchers who require
larger volumes of data should contact Dr. Sue Hawkins
(drsuehawkins@gmail.com).
Researchers based outside the UK who are interested in accessing the
complete count records for the Censuses of Great Britain (including the
Censuses of England and Wales) from the Integrated Census Microdata
(I-CeM) project (http://icem.data-archive.ac.uk/#step1) will need to
create an account with the UK Data Service
(https://beta.ukdataservice.ac.uk/myaccount/login) by first requesting
a username
here. If you
are in the UK and from an institution of higher or further education or
your organisation is part of the UK Access Management Federation (UKAMF)
and on
this
list of federation members, you can use the username and password issued
to you by your organisation to login/register with the UK Data Service.
Data downloads are limited to 1,000,000 observations. Downloaded data
files are pipe-delimited .csv files. The authors converted the .csv
files to .dta files using Stat/Transfer and the shell script
shells/csv_to_dta.sh.
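Because each download is capped at 1,000,000 observations, an extract that reaches the cap may be incomplete and is worth re-requesting in smaller pieces (for example, one county at a time). The minimal Python sketch below counts the data rows in a pipe-delimited extract before conversion. It assumes one header row and UTF-8 text, neither of which is documented here; count_pipe_rows and possibly_truncated are hypothetical helpers, not part of the package.

```python
import csv

# Download cap stated in the README for I-CeM extracts
DOWNLOAD_LIMIT = 1_000_000

def count_pipe_rows(path, has_header=True, encoding="utf-8"):
    """Count data rows in a pipe-delimited (|) file; header handling is an assumption."""
    # errors="replace" keeps counting robust if the real encoding differs
    with open(path, newline="", encoding=encoding, errors="replace") as f:
        n = sum(1 for _ in csv.reader(f, delimiter="|"))
    return n - 1 if has_header and n > 0 else n

def possibly_truncated(path, limit=DOWNLOAD_LIMIT):
    """True if the extract reached the download cap and may be incomplete."""
    return count_pipe_rows(path) >= limit
```

Counting with csv.reader rather than counting lines means a quoted field that contains a line break is still counted as one record.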
Restricted-use data instructions
The dataset that contains names and addresses from the complete count
Censuses of England and Wales is restricted use, but may be accessed by
obtaining a Special License agreement with the UK Data Archive and the
data distributor, FindMyPast Ltd. Instructions can be found
here.
It can take several months to be approved and gain access to the data.
Data are provided as .txt files. The authors converted the .txt files to
.dta files using Stat/Transfer and the shell script
shells/csv_to_dta.sh.
Historical archive instructions
Inpatient hospital admission registers were viewed and photographed by the authors at the Barts Health NHS Trust Archives and the London Metropolitan Archive. Requests for appointments at the Barts Health NHS Trust Archives should be made at least one week in advance. Researchers must complete a researcher registration form in advance of their visit. Researchers interested in visiting the LMA will need to apply in advance for a History Card.
Raw dataset list
| Data file or Folder | Source | Notes | Provided |
|-------------------------------------------|----------------------------|---------------------------------------------------------------|----------|
| raw/diseases/*_Categories.csv | Authors | Causes of admission manually coded by authors. | Yes |
| raw/geography/* | I-CeM and NAPP | Data dictionaries. | Yes |
| raw/hospitals/*.dta | Authors | Data collected by authors. | Yes |
| raw/hharp/hharp_hospital_admissions.dta | HHARP | Limited-access, instructions for obtaining data are above. | No |
| raw/icem/England_YYYY_*.dta | I-CeM | Limited-access, instructions for obtaining data are above. | No |
| raw/icem_sl/ICEM_Names_EW_*.dta | I-CeM and UK Data Service | Restricted-access, instructions for obtaining data are above. | No |
| raw/names/nicknames_for_matching.csv | Stanford University | Obtained from Roy Mill at Stanford University. | Yes |
| raw/napp/napp_*.dta | NAPP | Current public-use version does not include address variable. | No |
| raw/occupations/* | I-CeM | Data dictionaries. | Yes |
| raw/streets/* | The National Archives (UK) | These are the scraped streets data. | Yes |
| shells/* | NAPP | Codebook and do files. | Yes |
Computational requirements
Software requirements
- Stata 14. Some sections of code are set to version 12.
- All user-written Stata programs used in this project can be
found in the scripts/code/programs directory.
Portions of the code use shell commands, which may require Unix. The analysis takes about 7 and a half days to run on a 2.9 GHz Intel-based Windows PC (64-bit x86-64) running Microsoft Windows 10 Professional, with 24 GB of RAM and 8-core Stata/MP version 17.0. The run time is about 3 days shorter on a computer with 64 GB of RAM.
At least 150 GB of free disk space is required for the raw data files, and at least 500 GB of free disk space is required for the intermediate files if you build the processed data from the raw data.
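Before starting a multi-day build it can be worth checking free disk space. This minimal Python sketch uses the standard library's shutil.disk_usage and assumes that the raw and intermediate files sit on the same drive (so the two requirements add up) and that "GB" means 10^9 bytes; required_gb and enough_space are hypothetical helpers.

```python
import shutil

# Free-space thresholds from the README, in decimal gigabytes (assumption)
RAW_GB = 150           # raw data files
INTERMEDIATE_GB = 500  # intermediate files when building from raw data
BYTES_PER_GB = 10**9

def required_gb(build_from_raw=True):
    """Total free space needed, assuming raw and intermediate files share one drive."""
    return RAW_GB + (INTERMEDIATE_GB if build_from_raw else 0)

def enough_space(path=".", build_from_raw=True):
    """True if the drive that holds `path` has at least the required free space."""
    free_gb = shutil.disk_usage(path).free / BYTES_PER_GB
    return free_gb >= required_gb(build_from_raw)

print(required_gb(), enough_space("."))
```

Pass the replication folder as `path` so the check looks at the drive where the files will actually be written.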
Description of programs
- 0_run_all.do is the main script: it runs all the code, lets you select all options, and sets up the file paths.
- scripts/code/1_build/01_extract_icem_census_data.do extracts the complete-count census datasets. Runtime: about 4 hours and 30 minutes.
- scripts/code/1_build/02_clean_icem_names_birthplaces.do extracts the names from the restricted-use census files and cleans the birthplace variables. Runtime: about 1 hour and 15 minutes.
- scripts/code/1_build/03_icem_census_linkage.do creates 10-, 20-, and 30-year links between the complete-count censuses of 1881 to 1911. Runtime: about 6 days, 16 hours, and 15 minutes (about 3 days on a computer with 64 GB of RAM).
- scripts/code/1_build/04_extract_hospital_data.do extracts and cleans the hospital data. Runtime: about 7 minutes.
- scripts/code/1_build/05_census_hospital_record_linkage.do links the hospital records to the censuses. Runtime: about 1 hour and 30 minutes.
- scripts/code/1_build/06_build_analysis_data.do builds the datasets used to produce the tables and figures. Runtime: about 2 hours and 30 minutes.
- scripts/code/2_analysis/01_tables_figures.do runs the regressions and analysis that generate the tables and figures in the main paper. Runtime: about 30 minutes.
- scripts/code/2_analysis/02_online_appendix.do runs the regressions and analysis that generate the tables and figures in the online appendix. Runtime: about 11 hours and 15 minutes.
- scripts/code/2_analysis/03_intext_statistics.do generates the statistics computed from the data that are mentioned in the main paper and online appendix. Runtime: about 3 minutes.
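The per-step runtimes above can be added up to check them against the overall figure of about 7 and a half days. This Python sketch only does arithmetic on the numbers listed for the 24 GB machine; the dictionary keys are shortened file names used as labels.

```python
# Per-step runtimes (days, hours, minutes) as listed above, 24 GB machine
RUNTIMES = {
    "01_extract_icem_census_data": (0, 4, 30),
    "02_clean_icem_names_birthplaces": (0, 1, 15),
    "03_icem_census_linkage": (6, 16, 15),
    "04_extract_hospital_data": (0, 0, 7),
    "05_census_hospital_record_linkage": (0, 1, 30),
    "06_build_analysis_data": (0, 2, 30),
    "01_tables_figures": (0, 0, 30),
    "02_online_appendix": (0, 11, 15),
    "03_intext_statistics": (0, 0, 3),
}

def total_minutes(runtimes):
    """Sum (days, hours, minutes) tuples into minutes."""
    return sum(d * 1440 + h * 60 + m for d, h, m in runtimes.values())

minutes = total_minutes(RUNTIMES)
print(f"{minutes // 60} h {minutes % 60} min = {minutes / 1440:.2f} days")
# prints "181 h 55 min = 7.58 days"
```

The census linkage step alone accounts for roughly 88% of the total, which is why extra RAM, which mainly speeds up that step, shortens the overall run so much.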
Build: The names of the folders in scripts/code/01_build
correspond to the names of the main .do files above. Each of these
folders contains the individual programs described below. The
sub-folder _name_cleanup/ contains code for cleaning the name
variables that is called in several steps.
Step 1: scripts/code/01_build/01_extract_icem_census_data/
- 01.01_extract_icem_self_variables.do combines all observations for a census year into a single file, extracts the variables needed for the analysis, and saves different categories of variables to separate files.
- 01.02_fix_icem_household_ids.do makes corrections to the original household ID variable.
- 01.03_extract_icem_father_variables.do extracts variables for fathers to be merged into an individual’s record.
- 01.04_extract_icem_mother_variables.do extracts variables for mothers to be merged into an individual’s record.
- 01.05_extract_icem_head_variables.do extracts variables for the household head to be merged into an individual’s record.
- 01.06_create_icem_sibling_ids.do constructs sibling identifiers.
- 01.07_extract_icem_spouse_variables.do extracts variables for the spouse to be merged into an individual’s record.
- 01.08_extract_icem_child_variables.do extracts variables for children to be merged into a parent’s record.
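Merging a relative's variables into an individual's record is essentially a self-join within the household, which is one reason correct household IDs (01.02) matter. The Python sketch below shows the idea for fathers only. The field names hh_id, pernum, father_pernum, and occ, and the helper attach_father_vars, are hypothetical; the actual I-CeM variables and the Stata implementation in 01.03 may work differently.

```python
# Hypothetical person records: household ID, line number within the household,
# line number of the father (None if no father is present), and an occupation.
people = [
    {"hh_id": 1, "pernum": 1, "father_pernum": None, "occ": "labourer"},
    {"hh_id": 1, "pernum": 2, "father_pernum": None, "occ": "none"},
    {"hh_id": 1, "pernum": 3, "father_pernum": 1, "occ": "scholar"},
    {"hh_id": 2, "pernum": 1, "father_pernum": None, "occ": "carman"},
]

def attach_father_vars(records, fields=("occ",)):
    """Copy selected father fields onto each record, prefixed with 'father_'."""
    # Index every person by (household ID, line number) for fast lookups
    index = {(r["hh_id"], r["pernum"]): r for r in records}
    out = []
    for r in records:
        father = index.get((r["hh_id"], r["father_pernum"]))
        new = dict(r)  # copy so the input records are not modified
        for f in fields:
            new[f"father_{f}"] = father[f] if father else None
        out.append(new)
    return out

linked = attach_father_vars(people)
```

The same pattern, with a different pointer field, would cover mothers, household heads, and spouses.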
Step 2: scripts/code/01_build/02_clean_icem_names_birthplaces/
- 02.00_input_nicknames_for_matching.do sets up a crosswalk file of nicknames used in linking the census and hospital data.
- 02.01_icem_extract_names.do extracts and cleans the name variables.
- 02.02_icem_name_distribution.do constructs variables with the frequency of each name.
- 02.03_icem_birthplace_cleanup.do cleans and codes the birthplace variables.
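The nickname crosswalk and name-frequency steps can be illustrated together: standardize each first name, mapping nicknames to a common form, then count how often each standardized name occurs. In this Python sketch the NICKNAMES dictionary is a hypothetical excerpt; the real crosswalk is built from raw/names/nicknames_for_matching.csv, whose exact format is not described here.

```python
from collections import Counter

# Hypothetical excerpt of a nickname crosswalk (nickname -> standard form)
NICKNAMES = {"bill": "william", "will": "william", "polly": "mary", "jack": "john"}

def standardize(name):
    """Lower-case, keep letters only, and map a nickname to its standard form."""
    clean = "".join(ch for ch in name.lower() if ch.isalpha())
    return NICKNAMES.get(clean, clean)

def name_frequencies(names):
    """Frequency of each standardized first name."""
    return Counter(standardize(n) for n in names)

freq = name_frequencies(["William", "Bill", "Mary", "Polly", "Jack "])
# freq["william"] == 2, freq["mary"] == 2, freq["john"] == 1
```

Name frequencies are useful in linkage because an agreement on a rare name is stronger evidence of a true match than an agreement on a common one.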
Step 3: scripts/code/01_build/03_icem_census_linkage/
- 03.01_extract_icem_matching_vars.do extracts the variables used in linkage from the I-CeM complete-count data.
- 03.02_icem_blocking_setup.do performs pre-processing steps prior to blocking.
- 03.03_icem_blocking.do performs the blocking step of the census linkage.
- 03.04_icem_linkage_unique_sample.do creates the linked samples of unique matches between censuses.
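Blocking restricts comparisons to records that agree on a few coarse keys, and a "unique match" keeps a pair only when each record is the other's sole candidate. The Python sketch below illustrates both ideas under stated assumptions: the blocking key (sex, birthplace, first letter of surname), the exact name agreement, and the ±2-year age tolerance are all hypothetical choices, not the rules coded in 03.02 to 03.04. Real procedures typically allow for spelling variation in names.

```python
from collections import defaultdict

def block_key(rec):
    # Hypothetical blocking key: sex, birthplace, first letter of surname
    return (rec["sex"], rec["bpl"], rec["surname"][:1].upper())

def link_unique(earlier, later, gap_years, age_tol=2):
    """Return (earlier id, later id) pairs where each is the other's only candidate."""
    blocks = defaultdict(list)
    for r in later:
        blocks[block_key(r)].append(r)
    cands = defaultdict(list)  # earlier id -> candidate later ids
    rev = defaultdict(list)    # later id -> candidate earlier ids
    for a in earlier:
        for b in blocks.get(block_key(a), []):
            same_name = (a["first"], a["surname"]) == (b["first"], b["surname"])
            # Age should advance by roughly the gap between the two censuses
            if same_name and abs((b["age"] - a["age"]) - gap_years) <= age_tol:
                cands[a["id"]].append(b["id"])
                rev[b["id"]].append(a["id"])
    return [(a, bs[0]) for a, bs in cands.items()
            if len(bs) == 1 and len(rev[bs[0]]) == 1]
```

Running such a function over itertools.combinations((1881, 1891, 1901, 1911), 2) would produce the six census pairs: three 10-year, two 20-year, and one 30-year link.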
Step 4: scripts/code/01_build/04_extract_hospital_data/
- 04.00_input_london_district_list.do extracts a crosswalk of London districts from a UK-wide file.
- 04.01_compile_hharp_data.do cleans the Great Ormond Street Hospital data from HHARP.
- 04.02_combine_hospital_data.do combines the HHARP data with the hospital records from St. Bartholomew’s and Guy’s Hospitals.
- 04.03_clean_residential_addresses.do cleans the address variable in the hospital records and codes residential districts and parishes.
- 04.04_clean_cause_of_admission.do cleans and codes the cause-of-admission variable.
- 04.05_create_health_deficiency_index.do constructs the health deficiency index from the cause-of-admission information.
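Coding residential districts from free-text addresses (04.03) typically means normalizing the street name and looking it up in a street index, such as the scraped 1891 street index by registration district in raw/streets. This Python sketch shows that idea only; the abbreviation table, the index entry, and the district label DISTRICT_A are hypothetical, and the package's actual cleaning rules are not reproduced.

```python
import re

# Hypothetical abbreviation table for street types
ABBREV = {"st": "street", "rd": "road", "sq": "square", "ln": "lane"}

def normalize_address(addr):
    """Lower-case, drop house numbers and punctuation, expand abbreviations."""
    words = re.findall(r"[a-z]+", addr.lower())
    return " ".join(ABBREV.get(w, w) for w in words)

def lookup_district(addr, street_index):
    """Return the registration district for a normalized street, or None."""
    return street_index.get(normalize_address(addr))

# Hypothetical street index entry: normalized street name -> district
street_index = {"whitecross street": "DISTRICT_A"}
district = lookup_district("12 Whitecross St.", street_index)
```

A real cleanup needs more rules than this: "St" can also mean "Saint" at the start of a street name, and street names recur across districts, which is why the parish or district recorded alongside the address helps resolve ambiguity.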
Step 5: scripts/code/01_build/05_census_hospital_record_linkage/
- 05.01_extract_hospital_matching_vars.do extracts the variables from the hospital records that are used in linking to the census.
- 05.02_icem_hosp_matching.do links the hospital records to the census.
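A hospital record and a census record can only refer to the same child if the ages they report imply a common birth year. Assuming both sources record age in completed years and ignoring the exact admission and census dates, an age A in year Y implies birth in Y − A − 1 or Y − A. The Python sketch below encodes that check; birth_years and compatible are hypothetical helpers, and the actual matching rules in 05.02 may differ.

```python
def birth_years(event_year, age):
    """Possible birth years for someone aged `age` (completed years) in `event_year`."""
    return {event_year - age - 1, event_year - age}

def compatible(hosp_year, hosp_age, census_year, census_age):
    """True if the hospital and census records share at least one possible birth year."""
    return bool(birth_years(hosp_year, hosp_age) & birth_years(census_year, census_age))

# A child admitted in 1885 aged 4 could be an infant aged 0 in the 1881 census
ok = compatible(1885, 4, 1881, 0)
```

In practice a tolerance wider than one year is often allowed, because ages in historical records are frequently misreported.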
Step 6: scripts/code/01_build/06_build_analysis_data/
- The .do files in the sub-folder _sibling_restrictions/ impose restrictions and select the siblings used in the analysis; they are called by the .do files in this step.
Analysis: All tables and figures in the main paper and the online
appendix are created by the .do files in the folder
scripts/code/02_analysis
- 01_tables_figures.do creates the tables and figures in the main paper.
- 02_online_appendix.do creates the tables and figures in the online appendix.
- 03_intext_statistics.do calculates the numbers that are mentioned in the main paper or online appendix.
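reghdfe, listed under Packages, estimates linear regressions while absorbing high-dimensional fixed effects. For one fixed effect and one regressor, absorbing the fixed effect gives the same slope as demeaning both variables within each group (the Frisch-Waugh-Lovell result). The Python sketch below shows that estimator for a hypothetical family (sibling-group) fixed effect, as the _sibling_restrictions/ sub-folder suggests a within-family comparison; it does not reproduce the paper's specifications, controls, or standard errors.

```python
from collections import defaultdict

def within_family_slope(family, x, y):
    """OLS slope of y on x with family fixed effects, via within-family demeaning."""
    groups = defaultdict(list)
    for f, xi, yi in zip(family, x, y):
        groups[f].append((xi, yi))
    num = den = 0.0
    for obs in groups.values():
        mx = sum(o[0] for o in obs) / len(obs)  # family mean of x
        my = sum(o[1] for o in obs) / len(obs)  # family mean of y
        for xi, yi in obs:
            num += (xi - mx) * (yi - my)
            den += (xi - mx) ** 2
    if den == 0:
        raise ValueError("x does not vary within any family")
    return num / den

# Three families with different baselines; within each, x = 1 adds 2 to y
slope = within_family_slope([1, 1, 2, 2, 3, 3], [1, 0, 1, 0, 0, 1], [7, 5, 22, 20, -3, -1])
```

Families in which every sibling has the same value of x contribute nothing to the estimate, which is one reason sibling designs restrict the sample to families with variation in the treatment.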
Packages
All packages and dependencies are included. A list of packages used for Stata is below:
Stata
carryforward, estout, ftools, gtools, jarowinkler, keeporder, labutil, nysiis, reghdfe, regsave, unique
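Two of these packages, jarowinkler and nysiis, implement string comparison and phonetic coding commonly used in historical record linkage. For readers unfamiliar with the measure, the Python sketch below implements the standard Jaro-Winkler similarity (1.0 for identical strings, prefix weight p = 0.1, prefix capped at four characters). Whether the Stata module reports this similarity or a distance derived from it is not confirmed here.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings (1.0 = identical)."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    window = max(max(n1, n2) // 2 - 1, 0)  # how far apart matching characters may be
    m1, m2 = [False] * n1, [False] * n2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order
    k = transpositions = 0
    for i in range(n1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / n1 + matches / n2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro-Winkler similarity: boosts the Jaro score for a shared prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

For example, jaro_winkler("MARTHA", "MARHTA") is about 0.961: the transposed "TH"/"HT" lowers the Jaro score, and the shared prefix "MAR" raises it again.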
Instructions for replicators
- Download and unzip the replication folder, or clone the repository.
- Set the global PROJ_PATH on line 29 of 0_run_all.do so that it points to the replication folder.
- Download the I-CeM complete-count census data to the folder raw/icem as described above. The I-CeM data downloaded as .csv files can be converted to .dta files using the shell script csv_to_dta.sh in the shells/ folder. Save each county and year to a separate file in the raw/icem folder following the naming convention England_YYYY_County. The county names should correspond to the lists of counties in lines 13 to 19 of 01.01_extract_icem_self_variables.do.
- Add the Special License data with names to the folder raw/icem_sl. Convert the .txt files to .dta files using the shell script csv_to_dta.sh. The .dta files should be named ICEM_NAMES_EW_YYYY.
- Obtain the HHARP hospital data and save it to raw/hharp.
- The complete-count census data from NAPP used in this study include the variable GB81A_ADDRESS, which is no longer publicly available. The NAPP extracts used in lines 10 and 118 of scripts/code/01_build/04_extract_hospital_data/04.03_clean_residential_addresses.do need to be replaced with the I-CeM Special License data that include the address variable from the 1881 Census of England and Wales. The variable names may need to be changed.
- Save and run 0_run_all.do.
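Because the build runs for days, a quick pre-flight check that the raw-data folders exist and follow the naming conventions above can save a failed run. This Python sketch is hypothetical: the patterns, the .dta extension, and the preflight helper are assumptions drawn from the instructions and the raw dataset table. The table writes ICEM_Names_EW_*.dta while the instructions write ICEM_NAMES_EW_YYYY, so the sketch accepts either case for that folder; on case-sensitive file systems, check which spelling the code expects.

```python
import re
from pathlib import Path

# File-name patterns taken from the instructions and raw dataset table (assumptions)
PATTERNS = {
    "raw/icem": re.compile(r"^England_\d{4}_.+\.dta$"),
    "raw/icem_sl": re.compile(r"^ICEM_NAMES_EW_\d{4}\.dta$", re.IGNORECASE),
    "raw/hharp": re.compile(r"^hharp_hospital_admissions\.dta$"),
}

def preflight(proj_path):
    """Return a list of problems found in the raw-data folders under proj_path."""
    problems = []
    root = Path(proj_path)
    for folder, pattern in PATTERNS.items():
        d = root / folder
        if not d.is_dir():
            problems.append(f"missing folder: {folder}")
            continue
        files = [p.name for p in d.iterdir() if p.is_file()]
        if not files:
            problems.append(f"empty folder: {folder}")
        problems += [f"unexpected name: {folder}/{f}" for f in files if not pattern.match(f)]
    return problems

if __name__ == "__main__":
    for problem in preflight("."):
        print(problem)
```

Run it from the replication folder (the same location PROJ_PATH points to); an empty result means every expected folder is present and every file matches its pattern.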
Tables and figures
The .tex or .eps files containing the tables and figures can be
found in /output.
Data citations
Hawkins, Sue. (2010). HHARP: The Historic Hospital Admission Records Project (https://hharp.org/).
Minnesota Population Center. (2008). North Atlantic Population Project: Complete count microdata, Version 2.0 [Machine-readable database]. Minneapolis: Minnesota Population Center.
Schurer, K., Higgs, E. (2020). Integrated Census Microdata (I-CeM), 1851-1911. [data collection]. UK Data Service. SN: 7481, http://doi.org/10.5255/UKDA-SN-7481-2.
The National Archives. (2013). 1891 Census Street Index. [website]. The National Archives. URL
Package citations
Stata
Tony Brady, 1998. “UNIQUE: Stata module to report number of unique values in variable(s),” Statistical Software Components S354201, Boston College Department of Economics, revised 18 Jun 2020.
Mauricio Caceres Bravo, 2018. “GTOOLS: Stata module to provide a fast implementation of common group commands,” Statistical Software Components S458514, Boston College Department of Economics, revised 02 February 2022.
Sergio Correia, 2016. “FTOOLS: Stata module to provide alternatives to common Stata commands optimized for large datasets,” Statistical Software Components S458213, Boston College Department of Economics, revised 26 October 2019.
Sergio Correia, 2014. “REGHDFE: Stata module to perform linear or instrumental-variable regression absorbing any number of high-dimensional fixed effects,” Statistical Software Components S457874, Boston College Department of Economics, revised 18 Nov 2019.
James Feigenbaum, 2014. “JAROWINKLER: Stata module to calculate the Jaro-Winkler distance between strings,” Statistical Software Components S457850, Boston College Department of Economics, revised 13 Oct 2016.
James Feigenbaum, 2014. “KEEPORDER: Stata module to keep and order a set of variables,” Statistical Software Components S457859, Boston College Department of Economics, revised 02 July 2014.
Ben Jann, 2004. “ESTOUT: Stata module to make regression tables,” Statistical Software Components S439301, Boston College Department of Economics, revised 25 March 2022.
David Kantor, 2004. “CARRYFORWARD: Stata module to carry forward previous observations,” Statistical Software Components S444902, Boston College Department of Economics, revised 15 January 2016.
Julian Reif, 2008. “REGSAVE: Stata module to save regression results to a Stata-formatted dataset,” Statistical Software Components S456964, Boston College Department of Economics, revised 12 Apr 2020.
Adrian Sayers, 2014. “NYSIIS: Stata module to calculate nysiis codes from string variables,” Statistical Software Components S457936, Boston College Department of Economics, revised 21 Jul 2018.
Acknowledgements
This README is based on the AEA Data Editor's README guidelines. Some content on this page was adapted from Alex Hollingsworth.
Owner
- Name: Anthony Wray
- Login: anthonywray
- Kind: user
- Location: Odense, Denmark
- Company: SDU
- Website: https://sites.google.com/view/anthonywray/home
- Twitter: AnthonySWray
- Repositories: 1
- Profile: https://github.com/anthonywray
Associate Professor of Economics, University of Southern Denmark | Applied Micro, Economic History
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Karbownik"
    given-names: "Krzysztof"
    orcid: "https://orcid.org/0000-0003-0523-4829"
  - family-names: "Wray"
    given-names: "Anthony"
    orcid: "https://orcid.org/0000-0001-5114-3309"
title: "Replication package for 'Lifetime and intergenerational consequences of poor childhood health'"
version: 1.0
doi: 10.5281/zenodo.7117306
date-released: 2022-09-27
url: "https://doi.org/10.5281/zenodo.7117306"