https://github.com/ccs-zcu/pop
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CCS-ZCU
- License: cc-by-sa-4.0
- Language: Jupyter Notebook
- Default Branch: master
- Size: 119 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
pop
This repository serves as supplementary material for the article "‘The more populism types you know, the better political scientist you are?’ Machine-learning based meta-analysis of populism types in the political science literature", accepted for publication in *Journal of Contemporary European Studies*. It contains scripts, overview data and figures. The scripts are written in Python 3 and take the form of Jupyter notebooks. All our analyses aim to be fully reproducible, and we invite other scholars to reuse our code for their own analyses. However, since the input dataset of PDFs is protected by copyright, we cannot provide access to it.
Authors
License
CC-BY-SA 4.0, see attached LICENCE.md
Getting started
- download or clone the repository
- obtain access to the dataset of PDF files (we cannot make them available by default) and put them into the data/large_data/articles folder
- open and run the Jupyter notebook scripts one by one according to the numbering and description below:
  - 1metadataexploration.ipynb: collects metadata for individual articles and maps them onto the PDF files from data/large_data/articles
  - 2textblocksextraction.ipynb: extracts the text layer from the PDFs in data/large_data/articles and saves it as pickle files in data/large_data/articles_textblocks
  - 3filteredtextextraction.ipynb: combines text passages from the textblocks into raw full-text files in data/large_data/articles_filteredtexts
  - 4exploringtrends.ipynb: explores various metadata trends from data/article_metadata.json
  - 5nlpprocessing.ipynb: applies an NLP pipeline to the textual data and extracts lemmatized text to data/large_data/articles_lemmata_min
  - 6generatingpop_concs.ipynb: identifies all instances of "populism", "populist", "populists" or "populisms" using the "^populis|\spopulis" regular expression and uses them to extract concordances and sentences containing the populis* term
  - 7exploringconcs+sents.ipynb: generates labeled concordances by mapping predefined populism types (right-wing, left-wing, authoritarian) onto the concordance data and saves them to data/concs_labeled_min.pickle
  - 8concclassification.ipynb: uses the labeled concordances to train a document classification model using Multinomial Logistic Regression (other document classification algorithms are also explored here, including Multinomial Naive Bayes, Random Forests, and Extremely Randomized Trees). The results of the classification task are presented using confusion matrices and network graphs.
  - 9_wmd.ipynb: calculates Word Mover's Distance (WMD) between all pairs of labeled concordances and saves the results to data/large_data/distance_matrix_min.pickle
  - 10wmdanalysis.ipynb: explores the WMD data from data/large_data/distance_matrix_min.pickle by projecting the pairwise distances into 2D space using t-distributed Stochastic Neighbor Embedding (t-SNE) and generating scatter plot figures
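The concordance step described for notebook 6 can be illustrated with a minimal sketch. Only the regular expression is taken from the description above. The whitespace tokenization, the fixed context window, the case-insensitive matching, the naive punctuation-based sentence splitter and the function names `concordances` and `sentences_with_term` are all assumptions made for illustration and may differ from what the notebook actually does.

```python
import re

# Pattern quoted in the README: "populis" at the start of a string or after whitespace.
# Case-insensitive matching is an assumption, not something the README states.
POPULIS = re.compile(r"^populis|\spopulis", flags=re.IGNORECASE)


def concordances(text, window=5):
    """Return (left context, keyword, right context) for every populis* token.

    Splitting on whitespace and testing each token is equivalent to applying
    the pattern to the full text, because a token always starts either at the
    beginning of the text or right after whitespace.
    """
    tokens = text.split()
    results = []
    for i, token in enumerate(tokens):
        if POPULIS.search(token):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            results.append((left, token, right))
    return results


def sentences_with_term(text):
    """Return the sentences that contain a populis* term.

    Sentences are split naively after '.', '!' or '?' (an assumption).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if POPULIS.search(s)]


if __name__ == "__main__":
    sample = (
        "Populism is contested. Scholars describe right-wing populist parties "
        "and left-wing populisms. Popular music is unrelated."
    )
    for left, keyword, right in concordances(sample, window=2):
        print(f"{left!r} | {keyword!r} | {right!r}")
    print(sentences_with_term(sample))
```

Note that the pattern only requires the term to start a token, so a form preceded by punctuation, such as "(populist", is not matched, while trailing punctuation stays attached to the keyword.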
Software
- Python 3
- Jupyter notebooks app/JupyterLab/JupyterHub
- Python 3 additional libraries listed in requirements.txt
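Before running the notebooks, it may help to check which of the required libraries are already installed. The sketch below is not part of the repository. The package names are copied from the dependency list on this page, which gives no pinned versions, and the helper name `missing_distributions` is hypothetical. It relies only on the standard-library `importlib.metadata` module (Python 3.8 or later).

```python
from importlib import metadata

# Distribution names as they appear in the dependency list (no versions pinned there).
REQUIRED = [
    "gensim", "ipykernel", "jupyter", "matplotlib", "networkx", "nltk",
    "pandas", "scikit-learn", "sddk", "seaborn", "virtualenv",
]


def missing_distributions(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises PackageNotFoundError if absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing packages:", missing_distributions(REQUIRED))
```

Looking up distribution names rather than importing modules avoids the mismatch between install names and import names (for example, scikit-learn is imported as sklearn).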
Owner
- Name: CCS-Lab (Computing Culture & Society)
- Login: CCS-ZCU
- Kind: organization
- Email: kase@kfi.zcu.cz
- Location: Czech Republic
- Website: https://ccs.zcu.cz
- Repositories: 1
- Profile: https://github.com/CCS-ZCU
Dependencies
- gensim *
- ipykernel *
- jupyter *
- matplotlib *
- networkx *
- nltk *
- pandas *
- scikit-learn *
- sddk *
- seaborn *
- virtualenv *
