https://github.com/atomashevic/face-of-populism
Code repository for paper: "The Face of Populism: Examining Differences in Facial Emotional Expressions of Political Leaders Using Machine Learning"
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.1%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 6
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
The Face of Populism
Python code repository for the paper: "The Face of Populism: Examining Differences in Facial Emotional Expressions of Political Leaders Using Machine Learning"
ArXiv Preprint: https://arxiv.org/abs/2304.09914

Overview
This repository contains code and data for analyzing facial emotional expressions of political leaders using machine learning techniques. The analysis compares populist and pluralist leaders to identify significant differences in their emotional expressions during public appearances.
Contents
- `notebook/`: Emotion detection notebook
- `data-clean/`: CSVs with processed data
- `scripts/`: Analysis scripts
- `results/`: Analysis results
- `figures/`: Figures
- `data-raw/`: Raw data
  - `yt-urls.csv`: YouTube URLs for all analyzed videos
  - `gps.csv`: Global Party Survey data with populism scores
The repository uses a structured organization where raw data is processed through the pipeline, with intermediate and final results stored in appropriate directories.
The output folder in this repository is empty by default, but after running the notebook it will contain folders with annotated images of each processed frame and each processed video. Figure 1 in the paper is made from two such frames.
Statistical Approach
The statistical analysis in this project follows these key steps:
Video-level aggregation: For each video, we calculate:
- Mean scores for each of the 7 emotion categories
- A composite "negative" emotion score (sum of anger, disgust, fear, and sadness)
Classification by political characteristics:
- Populism scores from the Global Party Survey (1-4 scale)
- Binary classification: Pluralist (scores 1-2) vs. Populist (scores 3-4)
- Additional variables: country, party size, ideology
Hypothesis testing: We test whether populist leaders display:
- Higher levels of negative emotions
- Lower levels of neutral expressions
Statistical methods:
- Independent samples t-tests (for binary comparisons)
- ANOVA with post-hoc Tukey HSD (for multi-category comparisons)
- Effect size measurements (Cohen's d, Eta-squared)
- Visualizations to examine distributions and potential outliers

Implementation Details
The emotion detection pipeline uses the following implementation approach:
- Video Download: Videos are downloaded from YouTube using the `pytube` library
- Video Preprocessing:
  - For videos requiring cropping, the `moviepy` library is used to create subclips
  - A temporary directory (`temp/`) stores videos during processing
- Frame Extraction and Analysis:
  - OpenCV (`cv2`) is used to extract video metadata (fps, frame count, duration)
  - The `fer` library handles face detection (MTCNN) and emotion recognition
  - Annotated frames are saved as images and zipped for later review
- Output Organization:
  - CSV files with emotion scores are saved to the `data-clean/` directory
  - Annotated images are saved to the `output/` directory
  - File naming convention: `[video_index]-[method_code].csv` (e.g., `0-t300.csv`)
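As an illustration of the download and preprocessing steps above, the following minimal sketch uses the standard `pytube` and `moviepy` APIs; the URL and subclip times are placeholders, not values from the repository:

```python
# Illustrative sketch of the download/preprocessing step (not the repository's exact code).
from pytube import YouTube
from moviepy.editor import VideoFileClip

url = "https://www.youtube.com/watch?v=PLACEHOLDER"  # real URLs are listed in data-raw/yt-urls.csv

# Download the highest-resolution progressive MP4 stream into temp/
stream = (YouTube(url).streams
          .filter(progressive=True, file_extension="mp4")
          .order_by("resolution")
          .desc()
          .first())
video_path = stream.download(output_path="temp/")

# For videos that need cropping, cut a subclip with moviepy (start/end seconds are placeholders)
clip = VideoFileClip(video_path).subclip(10, 70)
clip.write_videofile("temp/clip.mp4", audio=False)
```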
Technical Pipeline
Video Processing
We use two approaches to extract frames from the videos:
Main approach (t300): Extract 300 frames uniformly distributed over the length of each video
- Implementation calculates optimal frame frequency based on total video frames
- Formula: `frame_frequency = floor(total_frames / 300)`
Alternative approach (f50): Extract every 50th frame from each video (approximately one frame every 1.67-2.00 seconds)
- Fixed sampling rate regardless of video length
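A short sketch of how the two sampling schemes can be realized with OpenCV; the video path is a placeholder and the notebook's actual frame-extraction code may differ:

```python
import math
import cv2

cap = cv2.VideoCapture("temp/clip.mp4")                      # placeholder path
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

# t300: 300 frames uniformly distributed over the video
frame_frequency = max(math.floor(total_frames / 300), 1)
t300_indices = list(range(0, total_frames, frame_frequency))[:300]

# f50: every 50th frame, regardless of video length
f50_indices = list(range(0, total_frames, 50))

frames = []
for idx in t300_indices:
    cap.set(cv2.CAP_PROP_POS_FRAMES, idx)                    # jump to the sampled frame
    ok, frame = cap.read()
    if ok:
        frames.append(frame)
cap.release()
```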
Face Detection
The face detection pipeline consists of the following steps:
Face Detection: We use the Multi-task Cascaded Convolutional Networks (MTCNN) approach implemented in the `fer` Python library. This framework uses three-stage cascaded CNNs for simultaneous face detection and alignment.
- Parameters:
  - Scale factor: 0.709
  - Minimum face size: 40 pixels
  - Thresholds: [0.6, 0.7, 0.7] for the three stages

Manual Verification: For videos with multiple detected faces, we manually verify which face corresponds to the political leader of interest.
- The `analyze-emotions.py` script marks videos with multiple faces (when a second face appears in >5% of frames)
- During manual review, videos where consistent face detection isn't possible are discarded
- For some videos, we specify frame ranges to use only portions with consistent face detection
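For reference, the detector parameters listed above correspond to constructor arguments of the standalone `mtcnn` package that `fer` builds on; the sketch below shows that mapping as an assumption, not the repository's own configuration code:

```python
import cv2
from mtcnn import MTCNN

# Assumed mapping of the parameters above onto the mtcnn package's constructor
detector = MTCNN(
    min_face_size=40,                  # minimum face size in pixels
    steps_threshold=[0.6, 0.7, 0.7],   # per-stage detection thresholds
    scale_factor=0.709,                # image pyramid scale factor
)

# detect_faces expects an RGB image; the file name is a placeholder
frame = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)
faces = detector.detect_faces(frame)   # list of dicts with 'box', 'confidence', 'keypoints'
```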
Emotion Detection
After face detection, we apply emotion recognition using the mini-Xception CNN architecture:
Model Architecture: mini-Xception CNN implementation from the `fer` library
- 4 residual depth-wise separable convolutions
- Input shape: 64x64 pixels (grayscale)
- Output: 7 emotion scores (anger, disgust, fear, happiness, sadness, surprise, neutral)
- Implementation: `detector = FER(mtcnn=True)` initializes the detector with MTCNN enabled
Model Training: The model was pre-trained on the FER2013 dataset
- 30,000 48x48 pixel grayscale images of facial expressions
- Benchmark accuracy: 66% (comparable to human accuracy of 65±5%)
Processing Pipeline:
- Videos are processed one at a time to manage memory usage
- For each video, temporary files are created and cleaned between processing steps
- The `Video` class from the `fer` library handles frame extraction and emotion detection
- Multiple faces are preserved in the output (`first_face_only = False`) for manual verification later
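A minimal sketch of this per-video loop, assuming the `fer` library's `FER` and `Video` classes; file paths and naming below are placeholders:

```python
from fer import FER, Video

detector = FER(mtcnn=True)                             # MTCNN faces + mini-Xception emotions
video = Video("temp/clip.mp4", first_face_only=False)  # keep all faces for later manual review

raw_data = video.analyze(detector, display=False)      # annotated frames are written to output/
df = video.to_pandas(raw_data)                         # one row per frame, emotion columns per face
df.to_csv("data-clean/0-t300.csv", index=False)        # naming follows [video_index]-[method_code].csv
```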
Output Processing:
- Each frame produces 7 scores that sum to 1 (compositional data)
- We aggregate negative emotions (anger, disgust, fear, sadness) to minimize false positive issues
Validation:
- Human annotation study with 5 coders reviewing 514 randomly selected frames
- Inter-coder agreement: Fleiss' Kappa of 0.231 (fair agreement)
- Machine-human agreement: 55.2% (Cohen's Kappa of 0.311, F1 score of 0.502)
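These agreement statistics can be computed with standard libraries; the sketch below uses random placeholder labels in place of the actual annotation data and assumes a macro-averaged F1 score:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)

# ratings: one row per frame, one column per human coder (7 emotion categories coded 0-6)
ratings = rng.integers(0, 7, size=(514, 5))            # placeholder, not the study's data
table, _ = aggregate_raters(ratings)                   # frames x categories count table
print("Fleiss' kappa:", fleiss_kappa(table))

# machine vs. human labels, one entry per frame (placeholders)
human = ratings[:, 0]
machine = rng.integers(0, 7, size=514)
print("Agreement:", (human == machine).mean())
print("Cohen's kappa:", cohen_kappa_score(human, machine))
print("F1 (macro):", f1_score(human, machine, average="macro"))
```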
Data Analysis
Our statistical analysis includes:
Data Preprocessing:
- Calculation of mean scores for each emotion category per video
- Aggregation of negative emotions (anger, disgust, fear, sadness) into a composite score
- Handling of missing data and removing videos with inadequate face detection
- Classification of leaders as populist (Party Populism scores 3-4) or pluralist (Party Populism scores 1-2)
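A condensed sketch of this preprocessing; the column and file names are assumptions based on the emotion categories and the Global Party Survey data rather than names taken from the scripts:

```python
import pandas as pd

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]  # assumed column names
NEGATIVE = ["angry", "disgust", "fear", "sad"]

frames = pd.read_csv("data-clean/0-t300.csv")               # per-frame emotion scores for one video
video_means = frames[EMOTIONS].mean()                       # mean score per emotion category
negative_score = video_means[NEGATIVE].sum()                # composite negative emotion score

gps = pd.read_csv("data-raw/gps.csv")                       # Global Party Survey metadata
gps["populist"] = (gps["populism_score"] >= 3).astype(int)  # assumed column: 3-4 populist, 1-2 pluralist
```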
Descriptive Statistics:
- Summary statistics (mean, standard deviation, min/max) for each emotion category
- Distribution analysis using visualization techniques (swarm plots, raincloud plots)
- Country-level comparisons of emotional expressions
Statistical Testing:
- Independent samples t-tests for comparing populist vs. pluralist leaders
- ANOVA tests to examine differences across populism score categories (1-4)
- Post-hoc Tukey HSD tests for multiple comparisons
- Effect size calculations (Cohen's d for t-tests, Eta-squared for ANOVA)
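These tests map onto SciPy and statsmodels as sketched below; the DataFrame columns (`negative`, `populist`, `populism_score`) and the results path are assumptions, not the scripts' exact names:

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("results/results-t300.csv")                # placeholder path

pop = df.loc[df["populist"] == 1, "negative"]
plu = df.loc[df["populist"] == 0, "negative"]

# Independent samples t-test and Cohen's d with a pooled standard deviation
t, p = stats.ttest_ind(pop, plu)
pooled_sd = np.sqrt(((len(pop) - 1) * pop.var(ddof=1) + (len(plu) - 1) * plu.var(ddof=1))
                    / (len(pop) + len(plu) - 2))
cohens_d = (pop.mean() - plu.mean()) / pooled_sd

# One-way ANOVA across the 1-4 populism score categories, eta-squared, and Tukey HSD
groups = [g["negative"].to_numpy() for _, g in df.groupby("populism_score")]
f_stat, p_anova = stats.f_oneway(*groups)
grand_mean = df["negative"].mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((df["negative"] - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total
tukey = pairwise_tukeyhsd(df["negative"], df["populism_score"])

print(t, p, cohens_d, f_stat, p_anova, eta_squared)
print(tukey.summary())
```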
Alternative Analyses:
- Comparison of results between two sampling methods (t300 and f50)
- Analysis of dominant emotions (which emotion has the highest score in each frame)
- Examination of ideological differences (left-right, liberal-conservative)
- Country-specific case studies
Notebook: notebook/emotion_detection.ipynb
The emotion detection notebook has been tested on Google Colab and GitHub Codespaces environments. It contains the code for processing YouTube videos, detecting faces, and recognizing emotions.
Requirements:
- Python 3.6+
- GPU with CUDA support (the notebook validates GPU availability)
- fer library (with MTCNN enabled)
- OpenCV (cv2)
- TensorFlow
- pytube (for YouTube video downloading)
- moviepy (for video editing)
- imageio==2.4.1 (specific version requirement)
- matplotlib, numpy, pandas (for data processing and visualization)
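A quick environment check along the lines of what the notebook does (the exact notebook cells may differ):

```python
import tensorflow as tf

device = tf.test.gpu_device_name()                 # empty string if no GPU is visible
if not device:
    raise SystemError("No GPU found; emotion detection will be very slow on CPU")
print("Found GPU at:", device)
```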
Analysis Scripts: scripts/
Run order:
`scripts/analyze-emotions.py`: Processes the raw emotion data from videos
- Important: This script includes a manual review step for face detection verification
- Input: CSVs with raw emotion scores for each frame
- Output: Marked CSV files for manual review and final results CSV
- Key functions:
  - `mark_multiple_faces()`: Identifies videos with multiple faces (>5% of frames)
  - `fix_object()`: Handles data type conversion and missing values
  - `create_results_df()`: Aggregates emotion data and merges with political metadata
- The script includes lists for excluded videos (`trash`) and videos requiring specific frame ranges (`cut_vids`, `cut_from`, `cut_to`)
- A separate analysis is performed for each sampling method (t300 and f50)
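To make the multiple-face rule concrete, here is a hypothetical illustration of the >5% check that `mark_multiple_faces()` performs; the real function in `scripts/analyze-emotions.py` may differ, and the `box1`-style column naming is an assumption about how `fer` labels a second face:

```python
import pandas as pd

def has_multiple_faces(csv_path: str, threshold: float = 0.05) -> bool:
    """Hypothetical check: does a second detected face appear in more than 5% of frames?"""
    frames = pd.read_csv(csv_path)
    second_face_cols = [c for c in frames.columns if c.startswith("box1")]  # assumed naming
    if not second_face_cols:
        return False
    n_second = frames[second_face_cols].notna().any(axis=1).sum()
    return n_second / len(frames) > threshold
```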
`scripts/results-t300.py`: Main analysis script
- Produces Table 1 and Figures 2-4
- Analyzes data from the 300 uniformly sampled frames approach
- Statistical analyses:
- Descriptive statistics for all emotion categories
- T-tests comparing pluralists vs. populists
- Effect size calculations (Cohen's d)
- ANOVA with post-hoc Tukey HSD tests
- Eta-squared calculations for effect size
- Visualization:
- Swarm plots for group comparisons
- RainCloud plots showing distributions by populism score
- Country-level comparisons
`scripts/results-f50.py`: Supplementary analysis script
- Produces Table 2 and Figures 5-7
- Analyzes data from the every-50th-frame approach
- Structure and analyses mirror those in `results-t300.py`
- Serves as a robustness check for the main analysis
`scripts/dominant-emotion.py`: Supplementary analysis script
- Produces Table 3
- Analyzes which emotions are dominant in different leader groups
- For each frame, identifies the emotion with the highest score
- Calculates the proportion of frames where each emotion is dominant
- Compares distributions between populist and pluralist leaders
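A small sketch of the dominant-emotion tabulation, using the same assumed column names as the preprocessing example above:

```python
import pandas as pd

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]  # assumed column names

frames = pd.read_csv("data-clean/0-t300.csv")
frames["dominant"] = frames[EMOTIONS].idxmax(axis=1)              # highest-scoring emotion per frame
proportions = frames["dominant"].value_counts(normalize=True)     # share of frames per dominant emotion
print(proportions)
```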
`scripts/ideology.py`: Additional analysis script
- Examines the relationship between political ideology and emotional expressions
- Creates variables for left-right (`left`) and liberal-conservative (`lib`) distinctions
- Performs t-tests and calculates effect sizes for ideological comparisons
- Provides a comparison point for the populism-based analysis
`scripts/japan.py`: Country-specific analysis
- Focuses on Japanese political leaders from three parties (LDP, NKP, CDP)
- Calculates mean emotion scores for each party
- Performs a detailed case study of emotional expressions in Japanese politics
`scripts/misc.py`: Utility script
- Calculates total length of all videos using the `pytube` library
- Computes the mean number of frames across all videos
- Note: This doesn't count videos removed from YouTube since the initial analysis
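A sketch of the total-length calculation, assuming `pytube`'s `YouTube.length` attribute (duration in seconds) and that the URL sits in the first column of `data-raw/yt-urls.csv`:

```python
import pandas as pd
from pytube import YouTube

urls = pd.read_csv("data-raw/yt-urls.csv")
total_seconds = 0
for url in urls.iloc[:, 0]:                        # assumes the URL is in the first column
    try:
        total_seconds += YouTube(url).length       # duration in seconds
    except Exception:
        pass                                       # videos removed from YouTube are skipped
print("Total length (hours):", total_seconds / 3600)
```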
Replicating the Analysis
To fully replicate our analysis:
1. Start with the emotion detection notebook (`notebook/emotion_detection.ipynb`) to process YouTube videos:
   - Ensure you have GPU access (the notebook checks for this with `tf.test.gpu_device_name()`)
   - The notebook will download videos from YouTube using the URLs in `data-raw/yt-urls.csv`
   - Two processing methods are implemented:
     - Main method: 300 uniformly distributed frames per video
     - Alternative method: Every 50th frame from each video
   - Output is saved to the `data-clean/` (CSV) and `output/` (images) directories
2. Run the analysis scripts in the order listed above
3. For the manual review step in `analyze-emotions.py`:
   - Review the `yt-urls-mark-*.csv` files
   - Replace '999' values with the face number representing the selected leader
   - Mark unusable videos in the `trash` list in the script
   - For videos needing specific frame ranges, update the `cut_vids`, `cut_from`, and `cut_to` arrays
Owner
- Name: Aleksandar Tomašević
- Login: atomashevic
- Kind: user
- Location: Novi Sad, Serbia
- Company: University of Novi Sad
- Website: www.atomasevic.com
- Twitter: atomasevic
- Repositories: 2
- Profile: https://github.com/atomashevic
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Aleksandar Tomašević | a****c@g****m | 17 |
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: 3 days
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- atomashevic (1)