news_view_reddit

Reddit text data analysis, gathering data and basic analysis

https://github.com/ottokuosmanen/news_view_reddit

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: OttoKuosmanen
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 1.15 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

Reddit Data Extraction and Basic Language Analysis

This repository provides tools for extracting data from Reddit, running basic language analysis with VADER (Valence Aware Dictionary and sEntiment Reasoner), and visualizing the results. It includes a ready-to-analyze data file collected from the "worldnews" subreddit on March 27, 2023.

Author: Otto Kuosmanen
Data Source: Reddit
Subreddit: "worldnews" (modifiable)
Contribute: GitHub


Important Note

Some scripts require your own Reddit access key. You can obtain one through the Reddit website, which requires a Reddit account. For details, see the script in the KEY folder.

The word cloud, histogram, and emotion scatter plot scripts can be run with the default data file. To run live analysis or create your own data files, follow the instructions in the KEY folder.


Scripts

Visualization

  • wordclouds.py: Visualizes the post titles in the data file using the WordCloud library.

  • wordclouds_live.py: Fetches data from Reddit and visualizes post titles with the WordCloud library.
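
A word cloud is driven by word frequencies, so the core step behind these scripts can be illustrated without the plotting library. The sketch below is not the repository's code: the function name `title_word_frequencies`, the short `STOP_WORDS` set, and the sample titles are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "for", "on", "is"}


def title_word_frequencies(titles):
    """Count how often each non-stop word appears across post titles."""
    counts = Counter()
    for title in titles:
        # Lowercase the title and keep runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up titles, not taken from the bundled data file.
titles = ["Storm hits the coast", "Coast guard rescues crew in storm"]
frequencies = title_word_frequencies(titles)
```

If the `wordcloud` package is installed, a mapping like `frequencies` can be passed to `WordCloud().generate_from_frequencies(frequencies)` to render the image.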

Data Creation

  • create_data.py: Generates a custom data file by fetching data from a chosen subreddit. Performs basic language analysis on titles and comments, calculating valence scores (negative, positive, compound). Later analysis uses the compound score because it aggregates the valence scores into a single value. The data is saved in JSON format in the data folder.
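
The scoring and saving steps described for create_data.py might look roughly like the sketch below. It is an illustration under stated assumptions, not the script itself: the record layout, the function names `score_posts` and `save_posts`, and the `placeholder_scorer` stand-in are hypothetical. The scorer is passed in as a parameter so the example runs without extra packages.

```python
import json
from pathlib import Path


def score_posts(posts, scorer):
    """Attach negative, positive, and compound scores to each post title."""
    scored = []
    for post in posts:
        scores = scorer(post["title"])
        scored.append({
            "title": post["title"],
            "neg": scores["neg"],
            "pos": scores["pos"],
            "compound": scores["compound"],
        })
    return scored


def save_posts(scored, path):
    """Write the scored posts to a JSON file, creating the folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(scored, indent=2), encoding="utf-8")


def placeholder_scorer(text):
    # Stand-in that returns neutral scores (assumption for this sketch).
    # With the vaderSentiment package installed, the real scorer would be
    # SentimentIntensityAnalyzer().polarity_scores, which returns the same keys.
    return {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0}


# Made-up post, not taken from Reddit.
posts = [{"title": "Example headline"}]
scored = score_posts(posts, placeholder_scorer)
```

Calling `save_posts(scored, "data/example.json")` would then write the records to disk; the file name here is only an example.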

Analysis

  • histogram: Analyzes a data file and produces a histogram of the valence scores for titles and comments.

  • emo_scatter: Creates a scatter plot of comment emotionality against post rating.

  • redditaveragetime_live: Calculates the average time between posts in a selected subreddit.
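
The average-time calculation reduces to the mean gap between consecutive post timestamps. A minimal sketch, assuming each post's creation time is available as a Unix timestamp (Reddit's API exposes this as `created_utc`); the function name and the sample values are hypothetical, and the script may compute it differently.

```python
from statistics import mean


def average_seconds_between_posts(timestamps):
    """Mean gap in seconds between consecutive posts, from Unix timestamps."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    # Sort first so the gaps are taken between chronological neighbours.
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps)


# Made-up timestamps, given out of order on purpose.
created = [1679900000, 1679900600, 1679900300]
average_gap = average_seconds_between_posts(created)
```

Sorting before differencing keeps the result correct even when the posts are not returned in chronological order.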


Data File

Located in the DATA folder, the file contains 1000 posts from the "worldnews" subreddit, dated March 27, 2023, ready for analysis.

Owner

  • Name: Otto Kuosmanen
  • Login: OttoKuosmanen
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Reddit Data Extraction and Basic Language Analysis"
version: 1.0.0
date-released: 2024-04-30
authors:
  - family-names: Kuosmanen
    given-names: Otto Juhani Benjamin
url: "https://github.com/OttoKuosmanen/news_view_reddit"
