Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: aapolimeno
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 378 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

Automatic Detection of Media Frames in Dutch COVID-19-related News Articles

Description

This project investigates the automatic detection of media framing in news articles that are related to COVID-19 by means of machine learning.

Installation

Install the required packages by running the following command in your terminal or command prompt:

    pip install -r requirements.txt

Data

annotated_sample.csv

Contains a sample of the annotated data, which consists of newspaper texts accompanied by annotation labels.

Code

main.py

This Python file calls all other scripts in the pipeline. Start the experiment by running it in your terminal or command prompt as follows:

    python main.py

Before you do so, check that the following settings match your needs:

- The path variable, found at the beginning of the script. It should point to where the dataset is saved on your disk. By default, it is a relative path to the sample described above.
- The framing variable, which can contain any of the media framing types described in the coding book.
- The text_reps variable, which specifies the text representation methods (Bag of Words with TF-IDF weighting, pre-trained word embeddings, custom word embeddings, and Sentence-BERT sentence embeddings).
- The algorithms variable, which specifies the classification algorithms used in this project (logistic regression, SVM, passive aggressive classifier).
- The input_type variable, which specifies the format of the input. The default setting is texts.

By default, all framing types, text representation methods, and algorithms are enabled. You only need to change these variables if you want to exclude certain options.
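
As a rough illustration of how these settings might combine, the sketch below enumerates the configurations a run would cover, assuming the pipeline tries every combination of framing type, text representation, and algorithm. The variable names follow this README, but all values are hypothetical placeholders; check main.py for the actual ones.

```python
from itertools import product

# Hypothetical values: the real identifiers are defined in main.py
# and in the coding book, and may be spelled differently.
framing = ["frame_a", "frame_b"]
text_reps = ["tfidf", "pretrained_embeddings", "custom_embeddings", "sbert"]
algorithms = ["logistic_regression", "svm", "passive_aggressive"]

# Assumption: one experiment per (framing type, representation, algorithm) triple.
experiments = list(product(framing, text_reps, algorithms))

for frame, rep, algo in experiments:
    print(f"{frame} | {rep} | {algo}")
```

Under that assumption, removing an entry from any of the three lists shrinks the set of experiments accordingly.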

text_representation.py

This script transforms the textual data into a machine-readable format using the methods specified in the main script above. Please check the first few lines of the script to verify that the necessary models are installed on your machine. If they are not, you can uncomment the corresponding lines.
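
To make the idea of a text representation concrete, here is a minimal sketch of Bag of Words with TF-IDF weighting in plain Python. It is not the code in text_representation.py, which may rely on a library and a different weighting variant; the function name, the whitespace tokenization, the formula (raw term count times log(N / df)), and the two example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using term count * log(N / document frequency)."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            [counts[term] * math.log(n_docs / doc_freq[term]) for term in vocabulary]
        )
    return vocabulary, vectors


# Made-up example sentences, not taken from the dataset.
vocabulary, vectors = tfidf_vectors(["het virus verspreidt zich", "het vaccin werkt"])
print(vocabulary)
print(vectors[0])
```

With this weighting, a term that occurs in every document (here "het") receives a weight of zero, while terms unique to one document receive the highest weight.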

preprocessing.py

This script transforms the raw data into the correct format. It selects relevant columns, transforms label encodings to binary representations, and splits the data into a training set and a test set.
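
The two core steps described above can be sketched as follows. This is an illustration under assumptions, not the code in preprocessing.py: it assumes that "binary representations" means one-vs-rest labels per framing type, and the function names, frame names, split ratio, and seed are hypothetical.

```python
import random


def binarize_labels(labels, target):
    """Map each label to 1 if it equals the target frame, else 0."""
    return [1 if label == target else 0 for label in labels]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows reproducibly and hold out the last part as a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


# Hypothetical frame names and placeholder texts.
labels = ["conflict", "economic", "conflict", "human_interest", "economic"]
binary = binarize_labels(labels, target="conflict")
texts = [f"article {i}" for i in range(len(labels))]
train, test = train_test_split(list(zip(texts, binary)))
print(len(train), len(test))
```

Fixing the seed keeps the split identical across runs, which makes results from different text representations and algorithms comparable.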

classification.py

In this script, models are trained with the selected algorithms, and the resulting predictions are saved.

evaluation.py

This script evaluates the pipeline by means of a classification report with the precision, recall, and F1-score metrics (displayed in your terminal) and a confusion matrix (automatically saved in a folder called eval).
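
For reference, the metrics in the classification report can be derived from the confusion matrix as in the sketch below. It covers the binary case only and is not the code in evaluation.py; the function name and the example labels are made up for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Compute precision, recall, F1, and the confusion matrix for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true classes (0, 1); columns are predicted classes (0, 1).
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


# Made-up labels and predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
report = binary_metrics(y_true, y_pred)
print(report)
```

Precision is the share of predicted positives that are correct, recall is the share of actual positives that are found, and F1 is their harmonic mean.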

Results

The location where the results (classification reports as .txt files, confusion matrices as .png files) are stored.

Owner

  • Name: Alessandra
  • Login: aapolimeno
  • Kind: user
  • Location: Utrecht

Student of Human Language Technology @ VU Amsterdam

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "My Research Software"
authors:
  - family-names: Druskat
    given-names: Stephan
    orcid: https://orcid.org/1234-5678-9101-1121
version: 2.0.4
date-released: 2021-08-11
doi: 10.5281/zenodo.1234
license: Apache-2.0
repository-code: "https://github.com/citation-file-format/my-research-software"

Dependencies

requirements.txt pypi