nyt-sentiment-index
A daily sentiment index based on New York Times economic news.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.0%) to scientific vocabulary
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
NYT Sentiment Index
This repository contains the source code used to create the New York Times Sentiment Index.
The New York Times Sentiment Index is a high-frequency measure of economic sentiment based on the classification of economics-related news articles. The index covers all articles published in the New York Times since 1851. It is updated daily with new data and can be found here.
Figure 1: The chart shows a moving average of NYT news sentiment values since 1900; higher
values indicate more positive sentiment, and lower values indicate more negative sentiment.
Gray bars indicate NBER recession dates.
Methodology
The index is built from a daily sentiment score, which is based on an analysis of all articles published by The New York Times on that day.
The first step is to analyze each article published that day using a two-step classification process. First, a fine-tuned transformer model classifies each article into either the Economic or the Other category. Second, each article classified as Economic is classified by a second transformer as Positive, Neutral, or Negative based on its sentiment.
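The two-stage filtering described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `classify_headlines`, `toy_topic_model`, and `toy_sentiment_model` are hypothetical names, and the keyword-based stand-ins only exist so the sketch runs without model weights. In practice the two callables would wrap the fine-tuned transformers.

```python
from typing import Callable, Iterable, List, Tuple


def classify_headlines(
    headlines: Iterable[str],
    topic_model: Callable[[str], str],
    sentiment_model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (headline, sentiment) pairs for headlines labelled Economic."""
    labelled = []
    for headline in headlines:
        # Stage 1: drop everything the topic classifier does not label Economic.
        if topic_model(headline) != "Economic":
            continue
        # Stage 2: label the remaining headlines Positive, Neutral, or Negative.
        labelled.append((headline, sentiment_model(headline)))
    return labelled


# Toy keyword-based stand-ins (assumptions, not the real models).
def toy_topic_model(headline: str) -> str:
    keywords = ("economy", "market", "jobs")
    return "Economic" if any(k in headline.lower() for k in keywords) else "Other"


def toy_sentiment_model(headline: str) -> str:
    text = headline.lower()
    if "surge" in text:
        return "Positive"
    if "slump" in text:
        return "Negative"
    return "Neutral"


sample = [
    "Jobs surge as factories reopen",
    "Local team wins championship",
    "Stock market slump deepens",
]
labelled = classify_headlines(sample, toy_topic_model, toy_sentiment_model)
print(labelled)
```

Passing the classifiers in as callables keeps the filtering logic independent of how the models are loaded.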
The second step is to compute a sentiment score for each day, which is done by comparing the number of Positive and Negative headlines. Neutral headlines are simply ignored.
The sentiment score is computed by:
- Counting the number of headlines tagged Positive ($N_{pos}$) and Negative ($N_{neg}$).
- Applying the formula:

$$
\text{Sentiment} = \frac{N_{pos} - N_{neg}}{N_{pos} + N_{neg}}
$$

where $N_{pos}$ and $N_{neg}$ are the numbers of Economic headlines classified as Positive and Negative, respectively. The score therefore ranges from $-1$ (all Negative) to $1$ (all Positive).
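As a worked example of the formula, the sketch below computes the score from a list of sentiment labels. The function name `daily_sentiment` is hypothetical, and returning `None` when a day has no Positive or Negative headlines is an assumption; the README does not say how such days are handled.

```python
from collections import Counter


def daily_sentiment(labels):
    """Compute (N_pos - N_neg) / (N_pos + N_neg) from a day's sentiment labels."""
    counts = Counter(labels)
    n_pos = counts["Positive"]
    n_neg = counts["Negative"]
    total = n_pos + n_neg  # Neutral labels are ignored
    if total == 0:
        # Assumption: the score is undefined without Positive or Negative headlines.
        return None
    return (n_pos - n_neg) / total


labels = ["Positive", "Negative", "Neutral", "Positive", "Positive"]
print(daily_sentiment(labels))  # (3 - 1) / (3 + 1) = 0.5
```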
The final index, shown in Figure 1, is built by smoothing the daily sentiment score with a 100-day exponential moving average, detrending it by subtracting a 7-year simple moving average, and adding a constant $0.5$. Detrending removes long-term fluctuations that could be attributed to evolving journalistic behavior (e.g., the increasing prominence of negative news of all forms in the social media era) without compromising the index's ability to capture sentiment over a business cycle (usually 5-7 years).
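The smoothing and detrending steps can be sketched in plain Python as below. Several details are assumptions rather than facts from the repository: the EMA uses the common span convention $\alpha = 2 / (\text{span} + 1)$, the 7-year window is taken as $7 \times 365$ daily observations, the trend is computed on the smoothed series, and early observations use a shorter trailing window. The function names are hypothetical.

```python
def exponential_moving_average(values, span):
    """EMA with alpha = 2 / (span + 1), seeded with the first observation."""
    alpha = 2.0 / (span + 1.0)
    smoothed = []
    previous = None
    for value in values:
        previous = value if previous is None else alpha * value + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed


def simple_moving_average(values, window):
    """Trailing SMA; averages over fewer points until a full window is available."""
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            running_total -= values[i - window]  # drop the value leaving the window
        averages.append(running_total / min(i + 1, window))
    return averages


def build_index(daily_scores, ema_span=100, trend_window=7 * 365, offset=0.5):
    """Smooth, subtract the long-run trend, and shift by a constant offset."""
    smoothed = exponential_moving_average(daily_scores, ema_span)
    trend = simple_moving_average(smoothed, trend_window)
    return [s - t + offset for s, t in zip(smoothed, trend)]


# A constant score has no cyclical component, so the index sits at the offset.
flat_index = build_index([0.2] * 50)
print(round(flat_index[-1], 6))
```

With pandas, which the project already depends on, the same steps would typically be written with `Series.ewm(span=...).mean()` and `Series.rolling(...).mean()`.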
Models
Two models are employed to classify each headline.
Both models are fine-tuned transformers based on
xtremedistil-l12-h384-uncased,
a model originally published by Microsoft. They were fine-tuned on labelled
datasets consisting of 300,000 and 600,000 headlines, respectively. The source code used for
fine-tuning both models can be accessed here.
Data
The data used to create the index was gathered through the use of The New York Times API. It is plausible to construct similar indices derived from other news sources. However, The New York Times archive offers a distinct advantage due to its comprehensive historical range, coupled with its accessible format.
Citation
If you republish or redistribute any part of this work, please acknowledge its source by including the following citation:

```text
Håkon Magne Holmen. 2023. New York Times Sentiment Index. Version 0.1.0 https://github.com/hakonmh/NYT-Sentiment-Index
```
Running the Code
Clone the repository and install the required packages:
```bash
git clone https://github.com/hakonmh/NYT-Sentiment-Index.git
cd NYT-Sentiment-Index
pip install -r requirements.txt
```
Get a New York Times API key at https://developer.nytimes.com/.
You can supply the API key either by setting the `NYT_API_KEY` environment variable or by changing the
`NYT_API_KEY` variable found in `download.py`.
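For illustration, reading the key from the environment could look like the sketch below. The helper `get_api_key` is hypothetical and is not taken from `download.py`; only the `NYT_API_KEY` variable name comes from this README.

```python
import os


def get_api_key(fallback=None):
    """Return the NYT API key from the environment, or a fallback if provided."""
    key = os.environ.get("NYT_API_KEY", fallback)
    if not key:
        raise RuntimeError("Set the NYT_API_KEY environment variable before running.")
    return key
```

Keeping the key in an environment variable avoids committing it to version control by accident.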
Finally, run the main.py script:
```bash
python main.py
```
Note that the New York Times API has a limit of 500 requests per day, meaning that at most
500 months of data can be downloaded per day. The script will crash once this limit is reached.
However, the script will resume from where it left off the next time it is called. You can change
the start date in `main.py` to download a shorter version of the index for testing purposes.
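To get a rough sense of how long a full download takes under this limit, the sketch below counts the months in a date range and divides by the daily request budget. It assumes one request per calendar month, as the paragraph above implies. The function names and the example dates (January 1851 through the 2023-05-26 release date) are illustrative assumptions.

```python
import math
from datetime import date


def months_between(start, end):
    """Count calendar months from start's month through end's month, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def days_to_download(start, end, daily_limit=500):
    """Number of days needed if each month costs one request."""
    return math.ceil(months_between(start, end) / daily_limit)


start, end = date(1851, 1, 1), date(2023, 5, 26)
print(months_between(start, end))    # 2069
print(days_to_download(start, end))  # 5
```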
Owner
- Login: hakonmh
- Kind: user
- Repositories: 4
- Profile: https://github.com/hakonmh
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Holmen"
    given-names: "Håkon Magne"
title: "New York Times Sentiment Index"
version: 0.1.0
date-released: 2023-05-26
url: "https://github.com/hakonmh/NYT-Sentiment-Index"
Dependencies
- pandas *
- pynytimes *
- torch *
- transformers *
- pandas >=1.4.0
- pynytimes *
- torch *
- transformers *
