framing-sml
Repository
Basic Info
- Host: GitHub
- Owner: aapolimeno
- License: mit
- Language: Python
- Default Branch: main
- Size: 378 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Automatic Detection of Media Frames in Dutch COVID-19-related News Articles
Description
This project uses machine learning to automatically detect media frames in Dutch news articles related to COVID-19.
Installation
Install the required packages by pasting the following line into your terminal/command prompt:
```bash
pip install -r requirements.txt
```
Data
annotated_sample.csv
Contains a sample of the annotated data, which consists of newspaper texts accompanied by annotation labels.
Code
main.py
This Python file calls all other scripts in the pipeline. Start the experiment by running it in your terminal or command prompt as follows:
```bash
python main.py
```
Before you do so, check that the following settings match your needs:
- The path variable, which can be found at the beginning of the script. It should point to where the dataset is saved on your disk. By default, it is a relative path to the sample described above.
- The framing variable, which lists the media framing types to include. It can contain any of the framing types described in the coding book.
- The text_reps variable, which specifies the text representation methods (Bag of Words with TF-IDF weighting, pre-trained word embeddings, custom word embeddings, and Sentence-BERT sentence embeddings).
- The algorithms variable, which specifies the classification algorithms that are used in this project (logistic regression, SVM, passive aggressive classifier).
- The input_type variable, which specifies the format of the input. The default setting is texts.
By default, all framing types, text representation methods and algorithms are enabled. You only need to change these variables if you want to exclude some of the options.
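As a rough illustration of how these settings might fit together, here is a minimal sketch. The variable names follow the list above, but every value (the file path, the frame names, the short keys for representations and algorithms) is a placeholder assumption, and the `experiment_grid` helper is hypothetical. Check the top of main.py for the actual values.

```python
from itertools import product

# Placeholder values: the real ones are defined at the top of main.py
# and in the coding book. Everything below is an assumption.
path = "data/annotated_sample.csv"   # assumed relative location of the sample
framing = ["frame_a", "frame_b"]     # placeholder frame names
text_reps = ["tfidf", "pretrained_emb", "custom_emb", "sbert"]
algorithms = ["logreg", "svm", "passive_aggressive"]
input_type = "texts"


def experiment_grid(framing, text_reps, algorithms):
    """Return every (frame, representation, algorithm) combination."""
    return list(product(framing, text_reps, algorithms))


# Exclude an option by removing it from its list before building the grid.
text_reps_subset = [rep for rep in text_reps if rep != "sbert"]
runs = experiment_grid(framing, text_reps_subset, algorithms)
print(f"{len(runs)} runs scheduled")
```

Whether main.py really iterates over all combinations in this way is an assumption. The sketch only shows that removing an entry from one of the lists is enough to exclude it from a run.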
text_representation.py
This script transforms the textual data into a machine-readable format using the methods specified in the main script above. Please check the first few lines of the script to verify that the necessary models are installed on your machine. If they are not, uncomment the corresponding lines.
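For the Bag of Words representation with TF-IDF weighting, a minimal sketch using scikit-learn's `TfidfVectorizer` could look like the following. The example sentences are made up, and the script itself may use different settings or a different library.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up Dutch example sentences standing in for the news texts.
train_texts = [
    "het kabinet kondigt nieuwe maatregelen aan",
    "ziekenhuizen melden meer opnames",
]
test_texts = ["nieuwe maatregelen voor ziekenhuizen"]

# Fit the vocabulary and IDF weights on the training texts only,
# then reuse them to transform the test texts.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

print(X_train.shape, X_test.shape)
```

Fitting on the training texts only keeps information from the test set out of the vocabulary and the IDF weights.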
preprocessing.py
This script transforms the raw data into the correct format. It selects relevant columns, transforms label encodings to binary representations, and splits the data into a training set and a test set.
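The three steps described here can be sketched with pandas and scikit-learn as follows. The column names (`text`, `frame_label`, `source`), the label coding (0 meaning the frame is absent) and the 80/20 split are assumptions made for illustration. They are not taken from the dataset or from the script.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the annotated data; column names and codes are assumptions.
df = pd.DataFrame({
    "text": [f"artikel {i}" for i in range(10)],
    "frame_label": [0, 2, 1, 0, 3, 0, 1, 0, 2, 0],
    "source": ["krant"] * 10,
})


def preprocess(df, text_col="text", label_col="frame_label",
               test_size=0.2, seed=42):
    """Select columns, binarize the labels, and split into train and test."""
    data = df[[text_col, label_col]].copy()
    # Assumed rule: any non-zero code means the frame is present.
    data["label"] = (data[label_col] > 0).astype(int)
    return train_test_split(
        data[text_col], data["label"],
        test_size=test_size, random_state=seed, stratify=data["label"],
    )


X_train, X_test, y_train, y_test = preprocess(df)
print(len(X_train), len(X_test))
```

Stratifying on the binary label keeps the share of positive examples similar in both sets, which matters when a frame is rare.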
classification.py
This script trains a model with each of the selected algorithms and saves the resulting predictions.
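A minimal sketch of this step, assuming scikit-learn implementations of the three algorithms and TF-IDF features, is shown below. The toy texts, the labels, the `predictions` folder and the JSON output format are all illustrative assumptions. Newer scikit-learn releases may emit a deprecation warning for `PassiveAggressiveClassifier`.

```python
import json
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.svm import LinearSVC

# Made-up training data: 1 = frame present, 0 = frame absent.
train_texts = [
    "economie krimpt door lockdown",
    "winkels verliezen omzet door maatregelen",
    "nieuw vaccin goedgekeurd",
    "onderzoekers testen nieuw vaccin",
]
train_labels = [1, 1, 0, 0]
test_texts = ["omzet van winkels daalt", "vaccin wordt getest"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": LinearSVC(),
    "passive_aggressive": PassiveAggressiveClassifier(random_state=0),
}

# Train each model and collect its predictions on the test texts.
predictions = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    predictions[name] = model.predict(X_test).tolist()

# Hypothetical output location; the script may save predictions elsewhere.
out_dir = Path("predictions")
out_dir.mkdir(exist_ok=True)
out_file = out_dir / "predictions_example.json"
out_file.write_text(json.dumps(predictions, indent=2))
```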
evaluation.py
This script evaluates the pipeline by means of a classification report with the Precision, Recall and F1-Score metrics (displayed in your terminal) and a confusion matrix (automatically saved in a folder called eval).
Results
This folder stores the results: classification reports as .txt files and confusion matrices as .png files.
Owner
- Name: Alessandra
- Login: aapolimeno
- Kind: user
- Location: Utrecht
- Repositories: 1
- Profile: https://github.com/aapolimeno
Student of Human Language Technology @ VU Amsterdam
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "My Research Software"
authors:
- family-names: Druskat
  given-names: Stephan
  orcid: https://orcid.org/1234-5678-9101-1121
version: 2.0.4
date-released: 2021-08-11
doi: 10.5281/zenodo.1234
license: Apache-2.0
repository-code: "https://github.com/citation-file-format/my-research-software"