propensity_score_tutorial

Tutorial

https://github.com/bda-kts/propensity_score_tutorial

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Tutorial

Basic Info
  • Host: GitHub
  • Owner: BDA-KTS
  • License: mit
  • Language: R
  • Default Branch: main
  • Size: 146 KB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Created over 1 year ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

A Step-by-Step Guide To Evaluate Training-Salary Propensity Score in R

Learning Objectives

  • Understand and apply propensity score matching in R.
  • Interpret matching results to assess treatment effects.
  • Evaluate job training programs' impact on employment outcomes.

Target Audience

This tutorial is targeted towards researchers and practitioners interested in assessing the impact of job training programs on employment outcomes, particularly in the field of social science research.

Before starting this tutorial, you should have:

  • A basic understanding of the R programming language.
  • Familiarity with statistical concepts such as regression analysis.
  • Access to an R environment with the packages listed below.

Environment Setup (Installing dependencies)

Ensure you have an R environment (version 3.6.0 or higher) on your local machine.

Install the required R packages by running the following commands:

install.packages("Matching")
install.packages("tableone")
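Before installing, it can help to confirm the R version and see which packages are actually missing. The following is a minimal sketch; the helper name `missing_packages` is made up for this example and is not part of the tutorial's scripts.

```R
# Check the running R version against the tutorial's stated minimum (3.6.0)
r_ok <- getRversion() >= "3.6.0"
if (!r_ok) warning("R 3.6.0 or higher is recommended for this tutorial.")

# Hypothetical helper: return the entries of `pkgs` that are not installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(c("Matching", "tableone"))
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```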

Duration

Approximately 30-45 minutes

Social Science Use Cases

The tutorial estimates the causal effect of a job training program on employees' salaries. It uses propensity score matching, with employees who received the training as the treatment group and employees who did not as the control group. Pre-training and post-training salaries are the variables compared between the two groups.

Step-by-step Instructions, Codes, and Explanations

This tutorial is a step-by-step guide to propensity score matching in R, applied to a simple use case. Propensity score matching estimates the effect of a treatment or intervention while accounting for covariates that predict who receives the treatment. Here, the question is how a job training program affects employment outcomes. The data include age, education level, years of experience, earnings before and after the training program, and participation in the training program (the treatment variable). The guide comprises step-by-step instructions, example code snippets, and explanations.

The treatment variable "TREATED" distinguishes individuals who underwent the job training program (TREATED = 1), the treatment group, from those who did not (TREATED = 0), the control group. The objective is to achieve covariate balance between the two groups with respect to age, education level, and years of experience. Once the groups are balanced on these covariates, differences in outcomes can be attributed to the training program with more confidence than in the unmatched data.

The Standardized Mean Difference (SMD) serves as a metric to gauge covariate balance between treatment and control groups before and after matching. SMD is a common metric in propensity score matching: a lower SMD indicates better balance and greater comparability of the groups on that covariate. The sign and magnitude of the mean differences in the outcome variables then indicate the direction and size of the job training program's impact on employment outcomes.
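To make the metric concrete, here is a minimal base-R sketch of an SMD for one numeric covariate. It assumes the common pooled-standard-deviation definition; the tutorial's own functions may compute it differently, and balance tables often report the absolute value. The function name `smd` and the toy values are invented for illustration.

```R
# SMD for one numeric covariate, assuming the pooled-SD definition:
# (mean_treated - mean_control) / sqrt((var_treated + var_control) / 2)
smd <- function(x, treated) {
  x_t <- x[treated == 1]
  x_c <- x[treated == 0]
  pooled_sd <- sqrt((var(x_t) + var(x_c)) / 2)
  (mean(x_t) - mean(x_c)) / pooled_sd
}

# Toy values, invented for illustration only
age     <- c(30, 32, 34, 40, 42, 44)
treated <- c(1, 1, 1, 0, 0, 0)

smd(age, treated)  # -5: treated mean 32, control mean 42, pooled SD 2
```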

The provided dataset encompasses the following columns:

  • ID: Employee identifier
  • AGE: Age of the employee
  • EDUCATION: Education level of the employee
  • EXPERIENCE: Years of experience of the employee
  • EARNINGS_PRE: Earnings before the treatment program
  • EARNINGS_POST: Earnings after the treatment program
  • TREATED: Indicates whether the employee received the job training program (1 for received, 0 for not received)

AGE, EDUCATION, and EXPERIENCE are regarded as covariates, while EARNINGS_PRE and EARNINGS_POST are used to evaluate the job training program's impact via mean differences. A value of 1 in the TREATED column signifies the treatment group, whereas a value of 0 represents the control group.
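If the sample CSV is not at hand, a synthetic data frame with the same column names can be used to try out the steps. The sketch below is not the tutorial's dataset: every value, the coding of EDUCATION as years of schooling, and the selection and earnings mechanisms are assumptions made purely for illustration.

```R
set.seed(42)  # make the synthetic example reproducible
n <- 200

AGE        <- sample(22:60, n, replace = TRUE)
EDUCATION  <- sample(10:18, n, replace = TRUE)  # assumed: years of schooling
EXPERIENCE <- pmax(0, AGE - EDUCATION - 6 - sample(0:5, n, replace = TRUE))

# Assumed selection mechanism: younger, less experienced employees train more often
TREATED <- rbinom(n, 1, plogis(1.5 - 0.05 * AGE - 0.03 * EXPERIENCE))

# Assumed earnings mechanism, with an invented training effect on EARNINGS_POST
EARNINGS_PRE  <- round(20000 + 300 * EXPERIENCE + 500 * (EDUCATION - 10) +
                         rnorm(n, sd = 2000))
EARNINGS_POST <- round(EARNINGS_PRE + 1000 + 3000 * TREATED + rnorm(n, sd = 1500))

toy_data <- data.frame(ID = seq_len(n), AGE, EDUCATION, EXPERIENCE,
                       EARNINGS_PRE, EARNINGS_POST, TREATED)
str(toy_data)
```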

Input Data

This method can work with any dataset containing variables of interest, a treatment indicator, and covariates. For example:

  • The 'Call me sexist but' Dataset (CMSB), for assessing the impact of gender bias in social media posts. Propensity score matching can be used to determine whether an author's gender influences bias in social media posts by creating comparable groups based on observable characteristics. This allows researchers to assess gender's causal impact on bias while controlling for potential confounding factors, providing insights into online discourse dynamics.

We employ the propensity score matching technique to evaluate the impact of job training programs on employment outcomes.

Below, you'll find instructions on applying the Propensity Score Matching method to the sample data:

1. Download Files:

Download the following files into a single folder:

  • "job_training_data.csv": Input dataset containing the relevant variables for analysis.
  • "propensity_matching_functions.R": R script containing functions for propensity score matching.
  • "main_script.R": R script for executing the analysis.

2. Run Commands in "main_script.R":

  • Open "main_script.R" in your R environment.
  • Run the commands sequentially to execute the analysis.

3. Load Input Dataset:

Execute the following lines to load the input dataset into R as job_training_data:

```R
# Get the directory path of main_script.R
# (requires RStudio and the rstudioapi package)
script_dir <- dirname(rstudioapi::getActiveDocumentContext()$path)

# Example usage: load the input dataset from that folder
job_training_data <- read.csv(file.path(script_dir, "job_training_data.csv"))
```
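The `rstudioapi` call above only works inside RStudio. As a hedged sketch, the two helpers below show one way to fall back to the working directory elsewhere and to verify the expected columns after loading. Both `get_script_dir` and `check_columns` are hypothetical names that do not exist in the tutorial's scripts, and the demonstration CSV contains invented values.

```R
# Hypothetical helper: locate the script folder, falling back to the working
# directory when RStudio (and therefore rstudioapi) is not available
get_script_dir <- function() {
  if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
    return(dirname(rstudioapi::getActiveDocumentContext()$path))
  }
  getwd()
}

# Hypothetical helper: stop early if expected columns are missing
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

required_cols <- c("ID", "AGE", "EDUCATION", "EXPERIENCE",
                   "EARNINGS_PRE", "EARNINGS_POST", "TREATED")

# Demonstration on a temporary CSV with invented values
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(ID = 1:2, AGE = c(30, 45), EDUCATION = c(12, 16),
                     EXPERIENCE = c(5, 20), EARNINGS_PRE = c(25000, 40000),
                     EARNINGS_POST = c(29000, 41000), TREATED = c(1, 0)),
          tmp, row.names = FALSE)
demo_data <- read.csv(tmp)
check_columns(demo_data, required_cols)
```

With the real files, `read.csv(file.path(get_script_dir(), "job_training_data.csv"))` would take the place of the temporary file.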

Sample Input Data: Sample input data can be provided in CSV format with columns representing variables of interest (EARNINGS_PRE, EARNINGS_POST), a treatment indicator (TREATED), and covariates (AGE, EDUCATION, EXPERIENCE). Here is a screenshot of the sample input data:

[Image: sample input data]

4. Define Functions:

Execute this line to define the necessary functions from "propensity_matching_functions.R":

source(file.path(script_dir, "propensity_matching_functions.R"))

5. Define the Treatment Variable and the Covariates:

Define the treatment variable (treatment_var) as "TREATED" and covariates (covariates) as relevant variables such as age, education level, and years of experience.

```R
# Define variables
treatment_var <- "TREATED"                        # Specify your treatment variable
covariates <- c("AGE", "EDUCATION", "EXPERIENCE") # Specify relevant covariates
```
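The README does not show how these two variables are used inside the matching function. A typical approach, sketched here under that caveat, is to build a model formula from them and fit a logistic regression whose fitted values are the propensity scores. The data frame `toy` is synthetic and invented for illustration.

```R
treatment_var <- "TREATED"
covariates    <- c("AGE", "EDUCATION", "EXPERIENCE")

# Build TREATED ~ AGE + EDUCATION + EXPERIENCE from the character vectors
ps_formula <- reformulate(covariates, response = treatment_var)
print(ps_formula)

# Synthetic data, invented for illustration only
set.seed(1)
n <- 200
toy <- data.frame(AGE        = sample(22:60, n, replace = TRUE),
                  EDUCATION  = sample(10:18, n, replace = TRUE),
                  EXPERIENCE = sample(0:30, n, replace = TRUE))
toy$TREATED <- rbinom(n, 1, plogis(1.5 - 0.05 * toy$AGE))

# Logistic regression: estimated probability of treatment given the covariates
ps_model <- glm(ps_formula, data = toy, family = binomial())
toy$propensity_score <- fitted(ps_model)
summary(toy$propensity_score)
```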

6. Perform Propensity Score Matching and Get the Matched Data:

Call the perform_propensity_matching function with parameters job_training_data, treatment_var, and covariates to conduct propensity score matching:

```R
# Run propensity score matching
matching_results <- perform_propensity_matching(data = job_training_data,
                                                treatment_var = treatment_var,
                                                covariates = covariates)

# Matched data
matched_data <- matching_results$matched_data
```

The output includes the matched data (matched_data).
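For intuition about what matching does, the sketch below pairs each treated unit with the closest unused control on the propensity score (greedy 1:1 nearest-neighbor matching without replacement). This is a simplified stand-in written in base R, not the algorithm used by `perform_propensity_matching` or by the Matching package. The function name `greedy_match` and the scores are invented.

```R
# Greedy 1:1 nearest-neighbor matching on the propensity score,
# without replacement (each control is used at most once)
greedy_match <- function(ps, treated) {
  treated_idx <- which(treated == 1)
  control_idx <- which(treated == 0)
  matched_control <- rep(NA_integer_, length(treated_idx))
  for (k in seq_along(treated_idx)) {
    if (length(control_idx) == 0) break  # no controls left to match
    distance <- abs(ps[control_idx] - ps[treated_idx[k]])
    best <- which.min(distance)
    matched_control[k] <- control_idx[best]
    control_idx <- control_idx[-best]    # remove the used control
  }
  data.frame(treated = treated_idx, control = matched_control)
}

# Invented propensity scores: rows 1-3 are treated, rows 4-8 are controls
ps      <- c(0.80, 0.60, 0.30, 0.75, 0.58, 0.35, 0.10, 0.90)
treated <- c(1, 1, 1, 0, 0, 0, 0, 0)

pairs <- greedy_match(ps, treated)
print(pairs)  # treated rows 1, 2, 3 are paired with control rows 4, 5, 6
```

Greedy matching depends on the order in which treated units are processed, which is one reason dedicated packages offer more careful procedures.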

7. Compare the SMD in Unmatched and Matched Data:

```R
# Access SMDs of unmatched and matched data
unmatched_smd <- matching_results$unmatched_smd
matched_smd <- matching_results$matched_smd

# Print SMDs
print(unmatched_smd)
print(matched_smd)
```

Standardized Mean Difference: By examining the SMD of each covariate in the unmatched and matched data, we assess how effectively the matching process balanced the treatment and control groups. A lower SMD indicates a smaller difference between the two groups. For instance, in this example, the SMD for the variable "AGE" is 0.76 for unmatched data and 0.06 for matched data. This suggests that, with respect to age, the treatment group is much more similar to the control group in the matched data than in the unmatched data.
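A compact way to read such results is to put the two SMD values side by side and flag whether each covariate falls under a cutoff. The sketch below uses the AGE values quoted above; the 0.1 threshold is a widely used rule of thumb and an assumption here, not something the tutorial specifies.

```R
# SMD values for AGE as reported in the text
smd_before <- c(AGE = 0.76)
smd_after  <- c(AGE = 0.06)

threshold <- 0.1  # assumed rule-of-thumb cutoff, not taken from the tutorial

balance <- data.frame(
  covariate      = names(smd_before),
  unmatched      = unname(smd_before),
  matched        = unname(smd_after),
  balanced_after = abs(unname(smd_after)) < threshold
)
print(balance)
```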


8. Define Variables of Interest:

Define the variables of interest (vars_of_interest) based on the employment outcomes you want to assess.

vars_of_interest <- c("EARNINGS_PRE", "EARNINGS_POST") # Specify variables for mean difference calculation

9. Calculate Mean Differences:

Call the calculate_mean_diff function with parameters matched_data, treatment_var, and vars_of_interest to calculate mean differences for the variables of interest.

```R
# Calculate mean differences
mean_diff <- calculate_mean_diff(data = matched_data,
                                 treatment_var = treatment_var,
                                 vars_of_interest = vars_of_interest)
```

The output provides the mean differences of vars_of_interest.

```R
# Print mean differences
print(mean_diff)
```

Mean difference (example interpretation): A mean difference of -666.6 for the variable EARNINGS_PRE indicates that, on average, individuals who participated in the training program earned $666.6 less before the program than those who did not participate. After the program, their earnings are, on average, $3000 higher than those of non-participants, as indicated by the mean difference of 3000 for the variable EARNINGS_POST.
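The calculation behind such numbers can be sketched in a few lines of base R: for each variable, subtract the control-group mean from the treatment-group mean. The function name `mean_difference` and the four-row data frame are invented for illustration. The actual `calculate_mean_diff` in the tutorial's script may differ in its details and output format.

```R
# Treated-minus-control mean difference for each variable in `vars`
mean_difference <- function(data, treatment_var, vars) {
  is_treated <- data[[treatment_var]] == 1
  sapply(vars, function(v) {
    mean(data[[v]][is_treated]) - mean(data[[v]][!is_treated])
  })
}

# Invented matched data: two treated and two control employees
toy_matched <- data.frame(
  TREATED       = c(1, 1, 0, 0),
  EARNINGS_PRE  = c(20000, 22000, 21000, 23000),
  EARNINGS_POST = c(26000, 28000, 23000, 25000)
)

mean_difference(toy_matched, "TREATED", c("EARNINGS_PRE", "EARNINGS_POST"))
# EARNINGS_PRE: -1000, EARNINGS_POST: 3000
```

A negative value means the treatment group's mean is lower than the control group's, matching the sign convention used in the interpretation above.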

Conclusion

In conclusion, this tutorial provides a detailed overview of propensity score matching techniques and their application in assessing the impact of job training programs on employment outcomes. By following the step-by-step guide and sample code provided, learners gain a comprehensive understanding of how to implement propensity score matching in R and interpret the results effectively. The learners have acquired the following skills:

  • Understanding of propensity score matching and its relevance in social science research.
  • Proficiency in using R programming language for propensity score matching analysis.
  • Ability to interpret standardized mean differences (SMDs) and mean differences in matched data.
  • Competence in assessing the impact of job training programs on employment outcomes using propensity score matching.

Propensity score matching is a powerful technique for estimating causal effects in observational studies, particularly in the context of social science research. By mastering this technique, researchers can overcome challenges associated with selection bias and confounding variables, leading to more robust and reliable research findings. We encourage learners to further explore advanced topics in propensity score matching and apply these skills to their own research endeavors.

Contact Details

For more information, please contact fakhri.momeni@gesis.org

Owner

  • Name: BDA-KTS
  • Login: BDA-KTS
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Momeni
    given-names: Fakhri
title: "Propensity Score Matching for Assessing the Impact of Job Training Programs on Employment Outcomes: A Step-by-Step Guide using R"
version: 1.0
identifiers:
  - type: 
    value: 
date-released: 2025-06-10

GitHub Events

Total
  • Issues event: 7
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 10
  • Push event: 12
  • Pull request event: 6
  • Create event: 1
Last Year
  • Issues event: 7
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 10
  • Push event: 12
  • Pull request event: 6
  • Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 8.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 8.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • shyamgupta196 (3)
  • johanneskiesel (1)
  • taimoorkhan-nlp (1)
Pull Request Authors
  • taimoorkhan-nlp (3)
  • shyamgupta196 (2)
Top Labels
Issue Labels
Pull Request Labels