partypositions-wikitags

Estimation of party positions from Wikipedia tags (see Herrmann/Döring 2021)

https://github.com/hdigital/partypositions-wikitags

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary

Scientific Fields

Engineering Computer Science - 40% confidence

Repository


Basic Info
  • Host: GitHub
  • Owner: hdigital
  • License: mit
  • Language: HTML
  • Default Branch: main
  • Size: 52 MB
Statistics
  • Stars: 10
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 5
Created almost 5 years ago · Last pushed 11 months ago
Metadata Files
Readme License Zenodo

README.md

Party positions from Wikipedia classifications

Herrmann, Michael, and Holger Döring. 2023. “Party Positions from Wikipedia Classifications of Party Ideology.” Political Analysis 31(1): 22–41. — doi: 10.1017/pan.2021.28

Holger Döring, and Michael Herrmann. [YEAR] “Party Positions from Wikipedia Tags.” — doi: 10.5281/zenodo.7043510

Install

Running all scripts requires R, Python and Stan.

We use Docker as a replication environment. It includes R, RStudio, Python, Stan and all packages (see Dockerfile).

```sh
docker-compose up -d  # start container in detached mode

docker-compose down  # shut down container
```

http://localhost:8787/ — RStudio in a browser with all dependencies

Project structure

Note: the repository uses the RStudio project workflow (0-wp-data.Rproj). All R scripts use the project root as their base path, and all file paths are relative to it.

Folders

  • 01-data-sources
    • 01-partyfacts — Party Facts data
    • 02-wikipedia — Wikipedia data and infobox tags
    • 03-party-positions — party position data for validation (CHES, DALP, Manifesto, WVS)
  • 02-data-preparation — create datasets for analysis
  • 03-estimation — estimation of models and post-estimation
  • 04-data-final — datasets with party and tags positions (only M2)
  • 05-validation — validation of party positions (only M2)
  • 06-figures-tables — visualization of results (only M2)
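
To illustrate what "infobox tags" refers to in the 02-wikipedia folder, the following sketch pulls wikilink targets out of one field of a party infobox. It is a minimal example under stated assumptions: the sample wikitext, the function name `extract_infobox_tags`, and the plain line-and-regex parsing are all hypothetical. The repository lists `wikitextparser` and `wptools` as dependencies, so its own scripts may work quite differently.

```python
import re

# Hypothetical sample of infobox wikitext; not taken from the repository.
SAMPLE_WIKITEXT = """{{Infobox political party
| name = Example Party
| ideology = [[Social democracy]]<br>[[Pro-Europeanism]]
| position = [[Centre-left politics|Centre-left]]
}}"""

# Matches [[Target]] and [[Target|label]]; group 1 is the link target.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_infobox_tags(wikitext: str, field: str) -> list[str]:
    """Return wikilink targets found in one infobox field (illustrative only)."""
    value_lines = []
    in_field = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped.startswith("|") or stripped.startswith("}}"):
            # A new field (or the end of the template) closes the current one.
            in_field = False
            key, sep, value = stripped.lstrip("|").partition("=")
            if sep and key.strip() == field:
                in_field = True
                value_lines.append(value)
        elif in_field:
            # Continuation line of a multi-line field value.
            value_lines.append(stripped)
    return [target.strip() for target in WIKILINK.findall(" ".join(value_lines))]


if __name__ == "__main__":
    print(extract_infobox_tags(SAMPLE_WIKITEXT, "ideology"))
```

For piped links such as `[[Centre-left politics|Centre-left]]`, the sketch keeps the link target rather than the display label, which is one possible design choice and not necessarily the one made in the repository.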

Tag harmonization

A dataset of Wikipedia tags is created in 02-data-preparation/01-wp-infobox.R.

  • some minor harmonization of category names
  • selects only categories that are used twice
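
The two steps above can be sketched roughly as follows. The repository performs them in R (01-wp-infobox.R); this Python version is only an illustration. The sample pairs, the `harmonize` rule (trim, collapse whitespace, lowercase), and the reading of "used twice" as "used at least twice" (`MIN_USES`) are assumptions, not the repository's actual logic.

```python
from collections import Counter

# Hypothetical (party, tag) pairs; values are illustrative, not repository data.
PARTY_TAGS = [
    ("party_a", "Social democracy"),
    ("party_a", "Pro-Europeanism"),
    ("party_b", "social democracy "),
    ("party_b", "Green politics"),
    ("party_c", "Pro-Europeanism"),
]

MIN_USES = 2  # assumption: "used twice" is read as "used at least twice"


def harmonize(tag: str) -> str:
    """Normalize spelling variants: trim, collapse whitespace, lowercase."""
    return " ".join(tag.split()).lower()


def filter_tags(pairs, min_uses=MIN_USES):
    """Harmonize tag names, then keep only tags used at least `min_uses` times."""
    cleaned = [(party, harmonize(tag)) for party, tag in pairs]
    counts = Counter(tag for _, tag in cleaned)
    return [(party, tag) for party, tag in cleaned if counts[tag] >= min_uses]


if __name__ == "__main__":
    for party, tag in filter_tags(PARTY_TAGS):
        print(party, tag)
```

Harmonizing before counting matters here: without it, "Social democracy" and "social democracy " would be counted as two different single-use tags and both would be dropped.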

The dataset used for the analysis is created in 02-data-preparation/02-wp-data.R.

  • filter most frequent tags — see parameter
  • create dataset in wide format with tags as variable names
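
As a rough illustration of this reshaping step, the sketch below keeps the most frequent tags and builds one row per party with a 0/1 indicator per tag. The repository does this in R (02-wp-data.R); the Python code, the sample data, and the `n_tags` argument (a stand-in for the parameter mentioned above, whose real name and value are not given here) are hypothetical.

```python
from collections import Counter

# Hypothetical long-format data: one (party, tag) pair per row.
LONG = [
    ("party_a", "socialism"),
    ("party_a", "environmentalism"),
    ("party_b", "socialism"),
    ("party_b", "liberalism"),
    ("party_c", "liberalism"),
    ("party_c", "regionalism"),
]


def to_wide(pairs, n_tags):
    """Keep the `n_tags` most frequent tags and return one dict per party."""
    counts = Counter(tag for _, tag in pairs)
    top_tags = sorted(tag for tag, _ in counts.most_common(n_tags))
    parties = sorted({party for party, _ in pairs})
    used = set(pairs)
    # Each tag becomes a column holding 1 if the party carries it, else 0.
    return [
        {"party": party, **{tag: int((party, tag) in used) for tag in top_tags}}
        for party in parties
    ]


if __name__ == "__main__":
    for row in to_wide(LONG, n_tags=2):
        print(row)
```

Parties that carry none of the retained tags still appear as all-zero rows in this sketch; whether such rows are kept in the actual analysis dataset is not stated in the source.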

Estimation

Model 2 (and Model 1) can be estimated in 03-estimation.

We use only Model 2 for post-estimation and the succeeding preparation of final data, figures and tables.

Party positions

We include party position data for validation — see 01-data-sources/03-party-positions/

  • Chapel Hill Expert Survey (CHES) – trend file 1999–2019
  • Democratic Accountability and Linkages Project (DALP) expert survey (Kitschelt 2013)
  • Manifesto Project (MP) – left-right (rile) scores
  • World Values Survey (WVS) — voters left-right self-placement, Wave 6, 2010–2014
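
One common way to compare estimated positions with a benchmark such as the sources listed above is a Pearson correlation over the parties present in both datasets. The sketch below shows that idea only. Whether the repository's validation scripts use this exact statistic is an assumption, and all party keys and scores are made-up illustrative values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def validate(estimated, benchmark):
    """Correlate two {party: score} mappings on the parties they share."""
    shared = sorted(estimated.keys() & benchmark.keys())
    return pearson([estimated[p] for p in shared], [benchmark[p] for p in shared])


# Hypothetical scores; not taken from the paper or the validation data.
ESTIMATED = {"party_a": -1.2, "party_b": 0.1, "party_c": 1.4}
BENCHMARK = {"party_a": 2.0, "party_b": 5.5, "party_c": 8.5, "party_d": 8.0}

if __name__ == "__main__":
    print(round(validate(ESTIMATED, BENCHMARK), 3))
```

Matching on shared party keys mirrors the practical situation that each benchmark covers a different subset of parties.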

Changes

Differences between the revised code and the paper-based code used in the replication material:

Herrmann, Michael, and Holger Döring. 2021. “Replication Data for: Party Positions from Wikipedia Classifications of Party Ideology.” — doi: 10.7910/DVN/1JHZIU

Data

  • new (revised) main final dataset — 04-descriptives/party-tags-positions.csv
  • removed historical and faction tag sections

Code

  • Stan statistical computing platform used for estimation (JAGS deprecated)
  • new folder structure with index numbers
  • fewer R package dependencies
  • focus on Model 2 (for Model 1, estimation only)
  • removed tables and figures relevant only to the paper
  • revised documentation of all scripts

License

MIT — Copyright (c) 2022 Holger Döring and Michael Herrmann

Owner

  • Login: hdigital
  • Kind: user

GitHub Events

Total
  • Release event: 1
  • Watch event: 1
  • Push event: 1
Last Year
  • Release event: 1
  • Watch event: 1
  • Push event: 1

Dependencies

01-data-sources/02-wikipedia/requirements.in pypi
  • pandas ==1.
  • requests *
  • wikitextparser *
  • wptools *
01-data-sources/02-wikipedia/requirements.txt pypi
  • certifi ==2020.12.5
  • charset-normalizer ==2.0.12
  • html2text ==2020.1.16
  • idna ==3.3
  • lxml ==4.6.2
  • numpy ==1.20.1
  • pandas ==1.2.3
  • pycurl ==7.43.0.6
  • python-dateutil ==2.8.1
  • pytz ==2021.1
  • regex ==2020.11.13
  • requests ==2.28.0
  • six ==1.15.0
  • urllib3 ==1.26.9
  • wcwidth ==0.2.5
  • wikitextparser ==0.47.3
  • wptools ==0.4.17
Dockerfile docker
  • rocker/tidyverse 4.1.3 build
docker-compose.yml docker