partypositions-wikitags

Estimation of party positions from Wikipedia tags (see Herrmann/Döring 2021)

https://github.com/hdigital/partypositions-wikitags

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary

Scientific Fields

Engineering Computer Science - 40% confidence

Repository

Basic Info
  • Host: GitHub
  • Owner: hdigital
  • License: MIT
  • Language: HTML
  • Default Branch: main
  • Size: 52 MB
Statistics
  • Stars: 10
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 5
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
Readme License Zenodo

README.md

Party positions from Wikipedia classifications

Herrmann, Michael, and Holger Döring. 2023. “Party Positions from Wikipedia Classifications of Party Ideology.” Political Analysis 31(1): 22–41. — doi: 10.1017/pan.2021.28

Döring, Holger, and Michael Herrmann. [YEAR]. “Party Positions from Wikipedia Tags.” – doi: 10.5281/zenodo.7043510

Install

Running all scripts requires R, Python and Stan.

We use Docker as a replication environment. It includes R, RStudio, Python, Stan and all packages (see Dockerfile).

```sh
docker-compose up -d  # start container in detached mode

docker-compose down   # shut down container
```

http://localhost:8787/ — RStudio in a browser with all dependencies

Project structure

Note: the project uses the RStudio project workflow (0-wp-data.Rproj). All R scripts treat the project root as the base path, and all file paths are relative to it.

Folders

  • 01-data-sources
    • 01-partyfacts — Party Facts data
    • 02-wikipedia — Wikipedia data and infobox tags
    • 03-party-positions — party position data for validation (CHES, DALP, Manifesto, WVS)
  • 02-data-preparation — create datasets for analysis
  • 03-estimation — estimation of models and post-estimation
  • 04-data-final — datasets with party and tags positions (only M2)
  • 05-validation — validation of party positions (only M2)
  • 06-figures-tables — visualization of results (only M2)
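
To illustrate what "infobox tags" in 02-wikipedia refers to, here is a minimal Python sketch that pulls link targets out of the `ideology` and `position` fields of a party infobox. It is not the repository's code: the sample wikitext, the field names and the helper functions `infobox_field` and `link_targets` are assumptions made for illustration, and the sketch uses plain regular expressions rather than the wikitextparser and wptools packages listed in the dependencies.

```python
import re

# Hypothetical infobox snippet; real Wikipedia markup is considerably messier.
INFOBOX = """{{Infobox political party
| name = Example Party
| ideology = [[Social democracy]]<br>[[Pro-Europeanism|Pro-European]]
| position = [[Centre-left politics|Centre-left]]
}}"""

# Captures the target of [[Target]], [[Target|label]] or [[Target#section|label]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def infobox_field(wikitext: str, field: str) -> str:
    """Return the raw single-line value of an infobox field, or '' if absent."""
    pattern = re.compile(rf"^\|\s*{re.escape(field)}\s*=\s*(.*)$", re.MULTILINE)
    match = pattern.search(wikitext)
    return match.group(1).strip() if match else ""


def link_targets(value: str) -> list:
    """Return the link targets (not the display labels) found in a field value."""
    return [target.strip() for target in WIKILINK.findall(value)]


tags = {
    field: link_targets(infobox_field(INFOBOX, field))
    for field in ("ideology", "position")
}
print(tags)
```

Using the link target instead of the display label means that differently worded links to the same Wikipedia article end up as the same tag.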

Tag harmonization

A dataset of Wikipedia tags is created in 02-data-preparation/01-wp-infobox.R.

  • some minor harmonization of category names
  • selects only categories that are used at least twice
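
The two steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a port of 01-wp-infobox.R: the sample pairs, the `harmonize` rules and the `MIN_USES` threshold are hypothetical, and "used twice" is read here as "used at least twice".

```python
from collections import Counter

# Hypothetical (party, tag) pairs as they might come out of the infobox step.
RAW_TAGS = [
    ("party-a", "Social democracy"),
    ("party-b", "social democracy "),
    ("party-b", "Pro-Europeanism"),
    ("party-c", "Social_democracy"),
    ("party-c", "Agrarianism"),
    ("party-d", "pro-europeanism"),
]

MIN_USES = 2  # assumed threshold: keep tags that occur at least twice


def harmonize(tag: str) -> str:
    """Normalize case, underscores and whitespace (assumed harmonization rules)."""
    return " ".join(tag.replace("_", " ").lower().split())


clean = [(party, harmonize(tag)) for party, tag in RAW_TAGS]
counts = Counter(tag for _, tag in clean)

# Drop every pair whose tag is too rare after harmonization.
kept = [(party, tag) for party, tag in clean if counts[tag] >= MIN_USES]
print(kept)
```

Harmonizing before counting matters: spelling variants of the same category would otherwise each fall below the threshold on their own.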

The dataset used for the analysis is created in 02-data-preparation/02-wp-data.R.

  • filter most frequent tags — see parameter
  • create dataset in wide format with tags as variable names
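
A rough Python sketch of these two steps, assuming the wide format is a party-by-tag table of 0/1 indicators. The sample data, the `TOP_N` parameter and the `to_wide` helper are hypothetical and only stand in for whatever 02-wp-data.R does.

```python
from collections import Counter

# Hypothetical long-format data: one row per (party, tag) pair.
PARTY_TAGS = [
    ("party-a", "social democracy"),
    ("party-a", "pro-europeanism"),
    ("party-b", "social democracy"),
    ("party-b", "liberalism"),
    ("party-c", "liberalism"),
    ("party-c", "agrarianism"),
    ("party-d", "social democracy"),
]

TOP_N = 2  # assumed parameter: number of most frequent tags to keep


def to_wide(pairs, top_n):
    """Keep the top_n most frequent tags and return (columns, rows) in wide format."""
    counts = Counter(tag for _, tag in pairs)
    # Sort by descending frequency, then by name so ties are deterministic.
    columns = sorted(counts, key=lambda tag: (-counts[tag], tag))[:top_n]
    parties = sorted({party for party, _ in pairs})
    present = set(pairs)
    rows = [
        {"party": party, **{tag: int((party, tag) in present) for tag in columns}}
        for party in parties
    ]
    return columns, rows


columns, rows = to_wide(PARTY_TAGS, TOP_N)
print(columns)
for row in rows:
    print(row)
```

Each tag becomes a variable name, and each party gets one row with a 1 where the tag is present and a 0 otherwise.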

Estimation

Model 2 (and Model 1) can be estimated in 03-estimation.

We use only Model 2 for post-estimation and the subsequent preparation of final data, figures and tables.

Party positions

We include party position data for validation — see 01-data-sources/03-party-positions/

  • Chapel Hill Expert Survey (CHES) – trend file 1999–2019
  • Democratic Accountability and Linkages Project (DALP) expert survey (Kitschelt 2013)
  • Manifesto Project (MP) – left-right (rile) scores
  • World Values Survey (WVS) – voters' left-right self-placement, Wave 6, 2010–2014
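
One common way to compare estimated positions with external scores like these is a correlation over the parties that both sources cover. The Python sketch below shows that idea only. The numbers are invented for illustration, the `pearson` helper is hypothetical, and it is an assumption that the scripts in 05-validation use a comparable measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy values, not results from the project.
estimated = {"party-a": -1.2, "party-b": -0.3, "party-c": 0.4, "party-d": 1.5}
expert = {"party-a": 2.1, "party-b": 4.0, "party-c": 5.5, "party-e": 8.0}

# Only parties present in both sources can be compared.
shared = sorted(estimated.keys() & expert.keys())
r = pearson([estimated[p] for p in shared], [expert[p] for p in shared])
print(shared, round(r, 3))
```

Restricting the comparison to shared parties is the important step, because the validation sources differ in country and time coverage.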

Changes

Differences between the revised code and the paper-based code in the replication material:

Herrmann, Michael, and Holger Döring. 2021. “Replication Data for: Party Positions from Wikipedia Classifications of Party Ideology.” — doi: 10.7910/DVN/1JHZIU

Data

  • new (revised) main final dataset — 04-descriptives/party-tags-positions.csv
  • remove historical and faction tags sections

Code

  • Stan statistical computing platform used for estimation (JAGS deprecated)
  • new folder structure with index numbers
  • fewer R package dependencies
  • focus on Model 2 (Model 1 estimation only)
  • removed tables and figures only relevant for paper
  • revised documentation of all scripts

License

MIT — Copyright (c) 2022 Holger Döring and Michael Herrmann

Owner

  • Login: hdigital
  • Kind: user

GitHub Events

Total
  • Release event: 1
  • Watch event: 1
  • Push event: 1
Last Year
  • Release event: 1
  • Watch event: 1
  • Push event: 1

Dependencies

01-data-sources/02-wikipedia/requirements.in pypi
  • pandas ==1.
  • requests *
  • wikitextparser *
  • wptools *
01-data-sources/02-wikipedia/requirements.txt pypi
  • certifi ==2020.12.5
  • charset-normalizer ==2.0.12
  • html2text ==2020.1.16
  • idna ==3.3
  • lxml ==4.6.2
  • numpy ==1.20.1
  • pandas ==1.2.3
  • pycurl ==7.43.0.6
  • python-dateutil ==2.8.1
  • pytz ==2021.1
  • regex ==2020.11.13
  • requests ==2.28.0
  • six ==1.15.0
  • urllib3 ==1.26.9
  • wcwidth ==0.2.5
  • wikitextparser ==0.47.3
  • wptools ==0.4.17
Dockerfile docker
  • rocker/tidyverse 4.1.3 build
docker-compose.yml docker