Sentiment Analysis of Twitter Data (saotd)

Sentiment Analysis of Twitter Data (saotd) - Published in JOSS (2019)

https://github.com/evan-l-munson/saotd

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org, zenodo.org
- ✓ Committers with academic emails: 1 of 4 committers (25.0%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords

bing-lexicon latent-dirichlet-allocation n-grams plot r sentiment-analysis tidy-data topicanalysis tweets twitter-data

Scientific Fields

Engineering Computer Science - 60% confidence

Last synced: 6 months ago

Repository

Sentiment Analysis of Twitter Data (saotd)

Basic Info

Host: GitHub
Owner: evan-l-munson
Language: R
Default Branch: master
Homepage:
Size: 95.4 MB

Statistics

Stars: 12
Watchers: 3
Forks: 9
Open Issues: 0
Releases: 3

Topics

bing-lexicon latent-dirichlet-allocation n-grams plot r sentiment-analysis tidy-data topicanalysis tweets twitter-data

Created about 8 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog

Sentiment Analysis of Twitter Data

Purpose

This package focuses on Twitter data because of the platform's widespread global adoption. The rapid expansion and acceptance of social media, together with today's prevalence of mobile technology, has opened a window into opinions and perceptions that were never before so accessible. Harvested Twitter data, analyzed for opinions and sentiment, can provide powerful insight into a population. This insight can help companies better understand their target population, and it can help governments make more informed decisions for the populations they serve. In this research, Tweets were acquired through the public Twitter Application Programming Interface (API) to serve as the foundation of the data. The package builds a methodology that uses topic modeling and a lexicographical approach to analyze the sentiment and opinions of English-language text and determine a general sentiment, such as positive or negative. As more people express themselves on social media, this application can be used to gauge the general feeling of people.

Package

The saotd package is an R interface to the Twitter API that acquires Tweets based on user-selected #hashtags. It was developed using a tidyverse approach and designed so that a user can conduct a complete analysis with the functions it contains. The package cleans and tidies the Twitter data, determines the latent topics within the Tweets using Latent Dirichlet Allocation (LDA), computes a sentiment score using the Bing lexicon, and outputs visualizations.
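To make the lexicon-based scoring step concrete, here is a minimal base-R sketch. It is not saotd's implementation: `score_tweet` and `toy_lexicon` are hypothetical names, the six-word lexicon is only a stand-in for the much larger Bing lexicon, and summing +1 for each positive word and -1 for each negative word is an assumed (though common) scoring convention.

```r
# Toy lexicon in the spirit of Bing: each word is tagged positive (+1) or negative (-1).
# These six entries are illustrative only; they are NOT the real Bing lexicon.
toy_lexicon <- c(good = 1, great = 1, love = 1, bad = -1, awful = -1, hate = -1)

# Hypothetical helper (not a saotd function): drop web links, split the text
# into lowercase words, and sum the values of the words found in the lexicon.
score_tweet <- function(text, lexicon = toy_lexicon) {
  text    <- gsub("https?://\\S+", " ", tolower(text), perl = TRUE)  # remove links
  words   <- strsplit(text, "[^a-z']+")[[1]]                         # split on non-letters
  matched <- lexicon[words[words %in% names(lexicon)]]               # keep lexicon words
  sum(matched)                                                       # 0 when nothing matches
}

tweets <- c("I love this, great job! https://example.com",
            "Awful service, I hate waiting",
            "Just landed in Anchorage")

scores <- vapply(tweets, score_tweet, numeric(1), USE.NAMES = FALSE)
scores
# [1]  2 -2  0
```

A positive total suggests a mostly positive Tweet, a negative total a mostly negative one, and zero means the lexicon words balance out or none were found.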

Installation

You can install the CRAN version using:

```r
install.packages("saotd")
```

You can install the development version from GitHub using:

```r
install.packages("devtools")
devtools::install_github('evan-l-munson/saotd', build_vignettes = TRUE)
```

Using saotd

The functions that are provided by saotd are broken down into five different categories: Acquire, Explore, Topic Analysis, Sentiment Calculation, and Visualizations.

Acquire
- tweet_acquire allows a user to acquire Tweets of their choosing by accessing the Twitter API. To do this, the user needs a Twitter account and must then sign up for a Twitter Developer account. Once the user has received their individual consumer key, consumer secret key, access token, and access secret key, they can acquire Tweets based on a list of hashtags and a requested number of entries per hashtag.
Explore
- tweet_tidy removes all emoticons, punctuation, weblinks, etc., and converts the data to a tidy structure.
- merge_terms merges terms within a dataframe and prevents redundancy in the analysis.
- unigram displays the text Uni-Grams within the Twitter data in sequence from the most used to the least used. A Uni-Gram is a single word.
- bigram displays the text Bi-Grams within the Twitter data in sequence from the most used to the least used. A Bi-Gram is a combination of two consecutive words.
- trigram displays the text Tri-Grams within the Twitter data in sequence from the most used to the least used. A Tri-Gram is a combination of three consecutive words.
- bigram_network builds on computed Bi-Grams. Bi-Gram networks serve as a visualization tool that displays the relationships between the words simultaneously, as opposed to a tabular display of Bi-Gram words.
- word_corr displays the correlation between words.
- word_corr_network displays the mutual relationship between words. The correlation network shows higher correlations with a thicker and darker edge color.
Topic Analysis
- number_topics determines the optimal number of Latent topics within a dataframe by tuning the Latent Dirichlet Allocation (LDA) model parameters. Uses the ldatuning package and outputs an ldatuning plot. This process can be time consuming depending on the size of the dataframe.
- tweet_topics determines the Latent topics within a dataframe by using Latent Dirichlet Allocation (LDA) model parameters. Uses the ldatuning package and outputs an ldatuning plot. Prepares the Tweet text, creates a document-term matrix (DTM), conducts LDA, and displays the terms associated with each topic.
Sentiment Calculation
- tweet_scores calculates the Sentiment Scores using the Bing Lexicon Dictionary that will account for sentiment by hashtag or topic.
- posneg_words determines and displays the most positive and negative words within the Twitter data.
- tweet_min_scores determines the minimum scores for either the entire dataset or the minimum scores associated with a hashtag or topic analysis.
- tweet_max_scores determines the maximum scores for either the entire dataset or the maximum scores associated with a hashtag or topic analysis.
Visualizations
- tweet_corpus_distribution determines the scores distribution for the entire Twitter data corpus.
- tweet_distribution determines the scores distribution by hashtag or topic for Twitter data.
- tweet_box displays the distribution scores of either hashtag or topic Twitter data.
- tweet_violin displays the distribution scores of either hashtag or topic Twitter data.
- tweet_time displays how the Twitter data sentiment scores change through time.
- tweet_worldmap displays the location of a Tweet across the globe by hashtag or topic. This function is no longer exported, as the Twitter data does not contain latitude and longitude values.
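
The n-gram functions listed under Explore rank word sequences from most used to least used. As a rough illustration of that idea only, the base-R sketch below counts n-grams in a few made-up sentences. `count_ngrams` is a hypothetical helper, not a saotd function, and saotd's own tokenization and stop-word handling may differ.

```r
# Hypothetical helper (not part of saotd): count runs of n consecutive words
# across a character vector of texts and return them most frequent first.
count_ngrams <- function(texts, n = 2) {
  grams <- unlist(lapply(texts, function(txt) {
    words <- strsplit(tolower(txt), "[^a-z0-9']+")[[1]]
    words <- words[nzchar(words)]                 # drop empty tokens
    if (length(words) < n) return(character(0))   # text too short for an n-gram
    starts <- seq_len(length(words) - n + 1)      # start index of each n-gram
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) {
    return(data.frame(ngram = character(0), n = integer(0),
                      stringsAsFactors = FALSE))
  }
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(ngram = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

tweets <- c("I love R", "I love tidy data", "tidy data wins")
count_ngrams(tweets, n = 1)  # Uni-Grams: single words
count_ngrams(tweets, n = 2)  # Bi-Grams: two consecutive words
count_ngrams(tweets, n = 3)  # Tri-Grams: three consecutive words
```

In this toy input the Bi-Grams "i love" and "tidy data" each occur twice, so they appear at the top of the Bi-Gram table.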

Example

For an example of how to use this package, find the vignette at:

```r
library(saotd)
utils::vignette("saotd")
```

Getting help

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

Contributing

If you would like to contribute, please create a pull request with your proposed changes for review.

Owner

Name: Evan Munson
Login: evan-l-munson
Kind: user
Location: Anchorage, AK
Company: U.S. Army

Twitter: spot2ring
Repositories: 2
Profile: https://github.com/evan-l-munson

Operations Research Systems Analyst (ORSA) and Data Scientist for the U.S. Army's 11th Airborne Division.

JOSS Publication

Sentiment Analysis of Twitter Data (saotd)

Published

February 27, 2019

DOI

10.21105/joss.00764

Volume 4, Issue 34, Page 764

Authors

Evan L. Munson

Air Force Institute of Technology

Christopher M. Smith

Air Force Institute of Technology

Bradley C. Boehmke

Air Force Institute of Technology

Jason K. Freels

Air Force Institute of Technology

Editor

Arfon Smith

Committers

Last synced: 7 months ago

All Time

Total Commits: 458
Total Committers: 4
Avg Commits per committer: 114.5
Development Distribution Score (DDS): 0.009

Past Year

Commits: 1
Committers: 1
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
evan-l-munson	e**n@g**m	454
seanstuntz	s**z@g**m	2
kbenoit	k**t@l**k	1
Arfon Smith	a****n	1

Committer Domains (Top 20 + Academic)

lse.ac.uk: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 6
Total pull requests: 10
Average time to close issues: 7 months
Average time to close pull requests: about 1 month
Total issue authors: 5
Total pull request authors: 6
Average comments per issue: 1.33
Average comments per pull request: 1.0
Merged pull requests: 8
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

seanstuntz (2)
nuhorchak (1)
Auburngrads (1)
omnep1 (1)
djfgerber (1)

Pull Request Authors

evan-l-munson (5)
seanstuntz (1)
nuhorchak (1)
arfon (1)
whaleshark16 (1)
kbenoit (1)

Packages

Total packages: 1
Total downloads:
- cran: 207 last month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 3
Total maintainers: 1

cran.r-project.org: saotd

Sentiment Analysis of Twitter Data

Homepage: https://github.com/evan-l-munson/saotd
Documentation: http://cran.r-project.org/web/packages/saotd/saotd.pdf
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Status: removed
Latest release: 0.3.1 (published over 2 years ago)

Versions: 3
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 207 Last month

Rankings

Forks count: 7.7%

Stargazers count: 17.9%

Dependent packages count: 29.8%

Average: 30.8%

Dependent repos count: 35.5%

Downloads: 63.0%

Maintainers (1)

evan.l.munson@gmail.com

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.5.0 depends
dplyr * imports
ggplot2 * imports
ggraph * imports
igraph * imports
ldatuning * imports
lubridate * imports
magrittr * imports
reshape2 * imports
rtweet * imports
scales * imports
stats * imports
stringr * imports
tidyr * imports
tidytext * imports
topicmodels * imports
utils * imports
widyr * imports
base64enc * suggests
covr * suggests
httr * suggests
knitr * suggests
rmarkdown * suggests
testthat >= 3.0.0 suggests
tibble * suggests
