nyt-sentiment-index

A daily sentiment index based on New York Times economic news.

https://github.com/hakonmh/nyt-sentiment-index

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.0%) to scientific vocabulary

Keywords

finance nytimes sentiment-analysis

Repository

Basic Info

Host: GitHub
Owner: hakonmh
License: mit
Language: Jupyter Notebook
Default Branch: master
Homepage:
Size: 1.37 MB

Statistics

Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created almost 3 years ago · Last pushed over 2 years ago

Metadata Files

Readme Changelog License Citation

NYT Sentiment Index

This repository contains the source code used to create the New York Times Sentiment Index.

The New York Times Sentiment Index is a high-frequency measure of economic sentiment based on the classification of economics-related news articles. It covers all articles published in The New York Times since 1851, is updated daily with new data, and can be found here.

The New York Times Sentiment Index - Historical overview (1900-2023)
Figure 1: The chart shows a moving average of NYT news sentiment values since 1900; higher values indicate more positive sentiment, and lower values indicate more negative sentiment. Gray bars indicate NBER recession dates.

Methodology

The index is built from a daily sentiment score, which is computed from all articles published by The New York Times on a given day.

The first step is to analyze each article published that day using a two-step classification process. First, a fine-tuned transformer model classifies each article into either the Economic or Other category. Second, each article classified as Economic is classified by a second transformer as either Positive, Neutral, or Negative based on its sentiment.
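The two-step classification can be sketched as a simple filter-then-label loop. This is only an illustration of the control flow: `classify_headlines`, `toy_is_economic`, and `toy_sentiment` are hypothetical names, and the keyword rules below are toy stand-ins for the two fine-tuned transformers, not the repository's models.

```python
from typing import Callable, Iterable, List


def classify_headlines(
    headlines: Iterable[str],
    is_economic: Callable[[str], bool],
    sentiment_of: Callable[[str], str],
) -> List[str]:
    """Return one sentiment label per headline that passes the topic filter."""
    labels = []
    for headline in headlines:
        # Step 1: keep only headlines the first classifier tags as Economic.
        if not is_economic(headline):
            continue
        # Step 2: the second classifier assigns Positive, Neutral, or Negative.
        labels.append(sentiment_of(headline))
    return labels


# Toy stand-ins for the two models (keyword rules, for illustration only).
def toy_is_economic(text: str) -> bool:
    return any(word in text.lower() for word in ("stocks", "jobs", "inflation"))


def toy_sentiment(text: str) -> str:
    lowered = text.lower()
    if "rally" in lowered or "gain" in lowered:
        return "Positive"
    if "fall" in lowered or "slump" in lowered:
        return "Negative"
    return "Neutral"


sample = [
    "Stocks rally on earnings",
    "Local team wins final",
    "Jobs report due Friday",
    "Inflation fears as markets slump",
]
print(classify_headlines(sample, toy_is_economic, toy_sentiment))
```

Passing the classifiers in as callables keeps the sketch independent of any particular model; in practice each callable would wrap a transformer's prediction.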

The second step is to compute a sentiment score for each day, which is done by comparing the number of Positive and Negative headlines. Neutral headlines are simply ignored.

The sentiment score is computed by:

1. Counting the number of headlines tagged Positive ($N_{pos}$) and Negative ($N_{neg}$).
2. Applying the formula:

$$
\text{Sentiment} = \frac{N_{pos} - N_{neg}}{N_{pos} + N_{neg}}
$$

where $N_{pos}$ and $N_{neg}$ are the numbers of Economic headlines classified as Positive and Negative, respectively.
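As a minimal sketch, the daily score can be computed from a list of labels as follows. The function name `daily_sentiment` is hypothetical, and returning `None` for a day with no Positive or Negative headlines is an assumption; the source does not say how such days are handled.

```python
from collections import Counter
from typing import Iterable, Optional


def daily_sentiment(labels: Iterable[str]) -> Optional[float]:
    """Compute (N_pos - N_neg) / (N_pos + N_neg) for one day's labels."""
    counts = Counter(labels)
    n_pos, n_neg = counts["Positive"], counts["Negative"]
    if n_pos + n_neg == 0:
        # Assumption: the score is undefined when nothing is Positive or Negative.
        return None
    # Neutral labels are ignored: they appear in neither numerator nor denominator.
    return (n_pos - n_neg) / (n_pos + n_neg)


print(daily_sentiment(["Positive", "Positive", "Negative", "Neutral"]))
```

The score ranges from -1 (all Negative) to 1 (all Positive), and the example above evaluates to one third because two Positive and one Negative headline give (2 - 1) / (2 + 1).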

The final index, shown in Figure 1, is made by smoothing the sentiment score with a 100-day exponential moving average, then detrending it by subtracting a 7-year simple moving average and adding a constant $0.5$. This removes long-term fluctuations that could be attributed to evolving journalistic behavior (e.g., the increasing prominence of negative news of all forms in the social media era) without compromising the index's ability to capture sentiment throughout a business cycle (usually 5-7 years).
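The smoothing and detrending step can be sketched with pandas as below. Several details are assumptions rather than the repository's confirmed settings: that "100-day" means an EMA span of 100, that the 7-year average is taken over the smoothed series with a trailing window of 7 × 365 rows, that early days use whatever history is available (`min_periods=1`), and that the input has one row per calendar day. The function name `build_index` is hypothetical.

```python
import pandas as pd


def build_index(
    daily_score: pd.Series,
    ema_span: int = 100,        # assumption: "100-day" read as an EMA span
    trend_days: int = 7 * 365,  # assumption: 7 years as a trailing window of daily rows
    offset: float = 0.5,
) -> pd.Series:
    """Smooth the daily score, subtract its long-run average, and add a constant."""
    # 100-day exponential moving average of the raw daily score.
    smoothed = daily_score.ewm(span=ema_span, adjust=False).mean()
    # 7-year simple moving average, used as the slow-moving trend.
    trend = smoothed.rolling(window=trend_days, min_periods=1).mean()
    # Detrend and recentre around the constant offset.
    return smoothed - trend + offset


dates = pd.date_range("2000-01-01", periods=10, freq="D")
flat = pd.Series(0.2, index=dates)
print(build_index(flat))
```

A perfectly flat input has no deviation from its own trend, so under these assumptions it maps to the offset of 0.5 on every day, which is a quick sanity check for the detrending logic.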

Models

To classify each headline, two models are employed:

Both models are fine-tuned transformers based on xtremedistil-l12-h384-uncased, a model originally published by Microsoft. They were fine-tuned on labelled datasets of 300,000 and 600,000 headlines, respectively. The source code used for fine-tuning both models can be accessed here.

Data

The data used to create the index was gathered through The New York Times API. Similar indices could be constructed from other news sources, but The New York Times archive has a distinct advantage: it combines a comprehensive historical range with an accessible format.

Citation

If you republish or redistribute any part of this work, please acknowledge its source by including the following citation:

```text
Håkon Magne Holmen. 2023. New York Times Sentiment Index. Version 0.1.0 https://github.com/hakonmh/NYT-Sentiment-Index
```

Running the Code

Clone the repository and install the required packages:

```bash
git clone https://github.com/hakonmh/NYT-Sentiment-Index.git
cd NYT-Sentiment-Index
pip install -r requirements.txt
```

Get a New York Times API key at https://developer.nytimes.com/. You can supply the key either by setting the NYT_API_KEY environment variable or by changing the NYT_API_KEY variable in download.py.
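One way to resolve the key with that precedence is sketched below. This is not the code in download.py: the helper name `get_api_key` and its `fallback` parameter are hypothetical, and only the `NYT_API_KEY` variable name comes from the instructions above.

```python
import os


def get_api_key(fallback: str = "") -> str:
    """Prefer the NYT_API_KEY environment variable, else use a hard-coded fallback."""
    key = os.environ.get("NYT_API_KEY", "").strip() or fallback
    if not key:
        # Fail early with a clear message instead of sending unauthenticated requests.
        raise RuntimeError("No API key found: set the NYT_API_KEY environment variable.")
    return key
```

Reading the key from the environment keeps it out of version control, which is usually the safer of the two options.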

Finally, run the main.py script:

```bash
python main.py
```

Note that the New York Times API allows 500 requests per day, so at most 500 months of data can be downloaded per day. The script will crash once this limit is reached, but it will resume from where it left off the next time it is run. You can change the start date in main.py to download a shorter version of the index for testing purposes.
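The resume behaviour can be pictured with the sketch below, which assumes one request per month of data and tracks finished months in a set. It is an illustration, not the repository's implementation: `months_between`, `download_batch`, and the `fetch_month` callable are hypothetical, and a real script would persist the finished months between runs (for example by checking which files already exist on disk).

```python
from datetime import date
from typing import Callable, Iterator, Set, Tuple

Month = Tuple[int, int]  # (year, month)


def months_between(start: date, end: date) -> Iterator[Month]:
    """Yield every (year, month) from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def download_batch(
    start: date,
    end: date,
    done: Set[Month],
    fetch_month: Callable[[int, int], None],
    daily_limit: int = 500,
) -> int:
    """Fetch months not yet in `done`, stopping at the daily request budget."""
    used = 0
    for year, month in months_between(start, end):
        if (year, month) in done:
            continue  # already downloaded in an earlier run
        if used >= daily_limit:
            break  # budget exhausted; the next run continues from here
        fetch_month(year, month)  # one request per month of data (assumed)
        done.add((year, month))
        used += 1
    return used
```

Under the one-request-per-month assumption, the full history from 1851 to 2023 spans more than 2,000 months, so a complete download needs at least five daily runs.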

Owner

Login: hakonmh
Kind: user

Repositories: 4
Profile: https://github.com/hakonmh

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Holmen"
  given-names: "Håkon Magne"
title: "New York Times Sentiment Index"
version: 0.1.0
date-released: 2023-05-26
url: "https://github.com/hakonmh/NYT-Sentiment-Index"

Dependencies

requirements.txt pypi

pandas *
pynytimes *
torch *
transformers *

setup.py pypi

pandas >=1.4.0
pynytimes *
torch *
transformers *
