framing-sml

https://github.com/aapolimeno/framing-sml

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.4%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: aapolimeno
License: mit
Language: Python
Default Branch: main
Size: 378 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 2 years ago

Metadata Files

Readme, License, Citation

Automatic Detection of Media Frames in Dutch COVID-19-related News Articles

Description

This project investigates the automatic detection of media framing in news articles that are related to COVID-19 by means of machine learning.

Installation

Install the required packages by running the following command in your terminal or command prompt: `pip install -r requirements.txt`

Data

`annotated_sample.csv`

Contains a sample of the annotated data, which consists of newspaper texts accompanied by annotation labels.

Code

`main.py`

This Python file calls all other scripts in the pipeline. Start the experiment by running it in your terminal or command prompt as follows: `python main.py`

Before you do so, check that the following settings match your needs:

- The `path` variable, which can be found at the beginning of the script. It should point to where the dataset is saved on your disk. By default, it is a relative path to the sample described above.
- The `framing` variable, which can contain all the media framing types described in the coding book.
- The `text_reps` variable, which specifies the text representation methods (Bag of Words with TF-IDF weighting, pre-trained word embeddings, custom word embeddings, and Sentence-BERT sentence embeddings).
- The `algorithms` variable, which specifies the classification algorithms that are used in this project (logistic regression, SVM, passive aggressive classifier).
- The `input_type` variable, which specifies the format of the input. The default setting is texts.

By default, all framing types, text representation methods and algorithms are enabled. You only need to change these variables if you want to exclude some of these options.
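As a rough illustration of the settings listed above, the block at the top of `main.py` could look something like the sketch below. The variable names come from this README, but every value (the path, the frame names, the identifiers for representations and algorithms) and the `experiment_grid` helper are hypothetical placeholders, not copied from the repository.

```python
# Hypothetical sketch of a settings block; all values are illustrative assumptions.
from itertools import product

path = "data/annotated_sample.csv"  # assumed relative path to the sample
framing = ["frame_a", "frame_b"]  # placeholder names; the real ones are in the coding book
text_reps = ["tfidf", "pretrained_emb", "custom_emb", "sbert"]  # assumed identifiers
algorithms = ["logreg", "svm", "passive_aggressive"]  # assumed identifiers
input_type = "texts"


def experiment_grid(framing, text_reps, algorithms):
    """Return every (frame, representation, algorithm) combination to run."""
    return list(product(framing, text_reps, algorithms))


runs = experiment_grid(framing, text_reps, algorithms)
print(len(runs))
```

Under this assumed layout, excluding a framing type, representation or algorithm is a matter of removing its entry from the corresponding list.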

`text_representation.py`

This script transforms the textual data into machine-readable format with the methods that are specified in the main script above. Please check the first few lines of the script to verify whether the necessary models are installed on your machine. If not, you can uncomment the corresponding lines.
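To make the first of these representations concrete, here is a minimal, dependency-free sketch of Bag of Words with TF-IDF weighting. It is not the code in `text_representation.py`, which may well use a library instead. The `tfidf` function, the whitespace tokenisation, the smoothing-free IDF formula and the two toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) with one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Assumed IDF variant: log(N / df) + 1, so shared words keep a small weight.
    idf = {tok: math.log(n_docs / df[tok]) + 1.0 for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        rows.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, rows


# Toy sentences invented for this example.
docs = ["het virus verspreidt zich", "het vaccin werkt"]
vocab, rows = tfidf(docs)
print(vocab)
```

Words that occur in every document (here "het") receive a lower weight than words that are specific to one document.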

`preprocessing.py`

This script transforms the raw data into the correct format. It selects relevant columns, transforms label encodings to binary representations, and splits the data into a training set and a test set.
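The three steps named above (column selection, binary labels, train/test split) can be sketched with the standard library alone. The column names, the rule "any non-zero annotation value becomes 1", the 80/20 split and the `preprocess` function are assumptions for illustration; the actual encoding used in `preprocessing.py` is not documented here.

```python
import csv
import io
import random

# Toy data with hypothetical column names.
RAW = """id,text,frame_a,source
1,eerste tekst,0,krant
2,tweede tekst,2,krant
3,derde tekst,1,krant
4,vierde tekst,0,krant
5,vijfde tekst,1,krant
"""


def preprocess(raw_csv, text_col, label_col, test_size=0.2, seed=42):
    """Select two columns, binarise the label, and split into train and test."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    # Keep only the text and one label column; map non-zero labels to 1 (assumed rule).
    data = [(r[text_col], 1 if int(r[label_col]) > 0 else 0) for r in rows]
    random.Random(seed).shuffle(data)  # seeded shuffle for a reproducible split
    n_test = max(1, round(len(data) * test_size))
    return data[n_test:], data[:n_test]


train, test = preprocess(RAW, "text", "frame_a")
print(len(train), len(test))
```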

`classification.py`

In this script, models are trained with the selected algorithms, and the resulting predictions are saved.
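As an example of what training one of the listed algorithms involves, the sketch below implements a binary passive aggressive classifier (the PA-I variant) in plain Python. It is a teaching sketch under stated assumptions, not the repository's implementation: the function names, the `C` and `epochs` defaults, the -1/+1 label convention and the toy feature vectors are all invented here.

```python
def train_passive_aggressive(X, y, C=1.0, epochs=10):
    """Train a PA-I binary classifier; labels in y must be -1 or +1."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            loss = max(0.0, 1.0 - margin)  # hinge loss
            norm_sq = sum(xi * xi for xi in x)
            if loss > 0.0 and norm_sq > 0.0:
                # Step size is capped by the aggressiveness parameter C.
                tau = min(C, loss / norm_sq)
                w = [wi + tau * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, X):
    """Return +1 or -1 for each feature vector in X."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1 for x in X]


# Toy, linearly separable feature vectors (invented for this example).
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, -1, -1]
w = train_passive_aggressive(X, y)
predictions = predict(w, X)
print(predictions)
```

The weights change only when an example is misclassified or falls inside the margin, which is what makes the algorithm "passive" on examples it already handles well.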

`evaluation.py`

This script evaluates the pipeline by means of a classification report with precision, recall and F1-score (displayed in your terminal) and a confusion matrix (automatically saved in a folder called eval).
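For readers unfamiliar with these metrics, the following sketch computes a binary confusion matrix and the precision, recall and F1-score derived from it. The function names and the toy labels are assumptions for illustration; `evaluation.py` itself may rely on a library and produces a plotted matrix rather than a nested list.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row = true label, column = predicted label
    return matrix


def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for the positive class."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy labels invented for this example.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
matrix = confusion_matrix(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(matrix, precision, recall, f1)
```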

`Results`

The folder where the results (classification reports as .txt files, confusion matrices as .png files) are stored.

Owner

Name: Alessandra
Login: aapolimeno
Kind: user
Location: Utrecht

Repositories: 1
Profile: https://github.com/aapolimeno

Student of Human Language Technology @ VU Amsterdam

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "My Research Software"
authors:
  - family-names: Druskat
    given-names: Stephan
    orcid: https://orcid.org/1234-5678-9101-1121
version: 2.0.4
date-released: 2021-08-11
doi: 10.5281/zenodo.1234
license: Apache-2.0
repository-code: "https://github.com/citation-file-format/my-research-software"
