https://github.com/animesh/analyzing-twitter-trends-on-covid-19-vaccinations
A quantitative study comprising Twitter discussions and thematic analysis for COVID-19 vaccines
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.7%) to scientific vocabulary
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of rashidesai24/Analyzing-Twitter-Trends-On-COVID-19-Vaccinations
Created about 4 years ago · Last pushed about 4 years ago
https://github.com/animesh/Analyzing-Twitter-Trends-On-COVID-19-Vaccinations/blob/main/
# Analyzing-Twitter-Trends-On-COVID-19-Vaccinations
A quantitative study comprising Twitter discussions and thematic analysis for COVID-19 vaccines
Paper published at: https://infodemiology.jmir.org/2022/1/e33909
## TABLE OF CONTENTS
* [Background](#background)
* [Objective](#objective)
* [Tools and Packages](#tools)
* [Data Collection](#data-collection)
* [Data Pre-Processing](#data-preprocessing)
* [Data Modeling](#data-modeling)
* [Data Visualization](#data-visualization)
* [Results](#results)
* [Conclusion](#conclusion)
* [References](#references)
* [Challenges and Future Work](#challenges-and-futurework)
## BACKGROUND
As of April 30, 2021, the COVID-19 pandemic had killed 3.2 million people and infected 150 million around the world. Growing human rights concerns, vaccine movements, and skepticism towards the vaccines, their effects, and their efficacy have fueled a multitude of conversations on social media and made the process of vaccination a complicated task. No major studies have been conducted to analyze people's perception of COVID-19 vaccines on social media for the year 2021.
## OBJECTIVE
## TOOLS
| Task | Technique | Tools/Packages Used |
|---|---|---|
| Data Collection | Tweet extraction from Twitter | snscrape |
| Data Pre-processing | Removal of punctuation, stopwords, URLs, and emojis; lemmatization | re, nltk, CountVectorizer, pandas, numpy |
| Data Modeling | Unsupervised LDA | pyLDAvis.sklearn, LatentDirichletAllocation, sklearn |
| Text Analytics | Topic Modeling, Sentiment analysis | vaderSentiment, corextopic |
| Data Visualization | Multi-attribute plots | matplotlib, seaborn, Tableau, wordcloud |
| Environments & Platforms | | MS Excel, Google Colab, Jupyter Notebook, Twitter |
## DATA-COLLECTION
| Method | Notes |
|---|---|
| Tweepy | Limited to 3,200 tweets; no historical data |
| GetOldTweets3 | Twitter has removed the endpoint that GetOldTweets3 uses |
| TWINT | Twitter imposes a stricter device and IP ban after a certain number of queries |
| snscrape | Scraped 100K tweets, of which 96,641 were in English |
| Octoparse (software) | Very time-consuming because of the event loop |
Data Collection: Identifying COVID-19 Vaccines Content
Data Coverage:
With "covid vaccine" as the search terms, we believe that our set of keywords provides reasonable coverage and is representative of tweets communicating about COVID-19 vaccines.
* Individual tweets = 2.1 million
* Organizational tweets = 0.59 million
## DATA-PREPROCESSING
Data Cleaning
Individual vs Organizational Tweets
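The cleaning steps named in the tools table (removing URLs, emojis, punctuation, and stopwords) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `clean_tweet` function, the tiny `STOPWORDS` set, and the crude "drop all non-ASCII characters" emoji rule are assumptions made here so the example runs with the standard library alone. The project itself lists nltk, which would supply a full stopword list and the lemmatization step omitted below.

```python
import re
import string
from typing import List

# Hypothetical, deliberately small stopword set for illustration only.
# A real run would use a full list such as the one shipped with nltk.
STOPWORDS = {"a", "an", "and", "are", "for", "i", "in", "is", "it",
             "my", "of", "on", "the", "this", "to", "was"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: treating every non-ASCII run as an emoji. This also drops
# accented letters, which is acceptable only for an English-only corpus.
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")
# Map each punctuation character to a space so "COVID-19" splits cleanly.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text: str) -> List[str]:
    """Return lowercase tokens with URLs, emojis, punctuation, and stopwords removed."""
    text = URL_RE.sub(" ", text)          # URLs first, before punctuation is stripped
    text = NON_ASCII_RE.sub(" ", text)    # crude emoji removal
    text = text.lower().translate(PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    print(clean_tweet("Got my COVID-19 vaccine today! 💉 https://t.co/abc123"))
```

URLs are removed before punctuation on purpose: stripping punctuation first would break a link into fragments that the URL pattern no longer matches.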
## DATA-MODELING
Unsupervised LDA
To understand the abstract topics hidden in the tweets, we applied unsupervised LDA (scikit-learn's LatentDirichletAllocation) and explored the result with the 'pyLDAvis' library. We discovered 18 different topics, chosen by considering cluster size and the absence of overlap among the clusters.
Sentiment Analysis
Sentiment analysis is often framed as a supervised machine learning problem, and it comes in several types. We considered a fine-grained sentiment classification with five levels of sentiment: overly positive, positive, neutral, negative, and overly negative. We used VADER (Valence Aware Dictionary and sEntiment Reasoner), a rule-based model, to examine the impact of COVID-19 vaccines on the attitude of Twitter users during the pandemic.
CorEx
Correlation Explanation (CorEx) provides a flexible framework for learning topics that are maximally informative about a corpus of text. Through anchor words, we seeded and guided the topic model towards topics of substantive interest, which allowed us to interact with and refine topics in a way that is not possible with traditional topic models. Normalized Topic Correlation (NTC) represents the correlations within an individual document explained by a particular topic.
## DATA-VISUALIZATION
Unsupervised LDA
Trends in Sentiment Analysis
Distribution of Sentiments
Vaccine Conversation Trends
Popular Topics
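The sentiment distribution and trend views listed above depend on assigning each tweet one of the five sentiment levels. A minimal sketch of that mapping follows, assuming it is driven by VADER's compound score (which ranges from -1 to 1). The ±0.05 cut-offs are the ones VADER's documentation conventionally uses for positive, neutral, and negative; the ±0.6 cut-offs for the "overly" levels are hypothetical, since the thresholds the authors used are not stated here. The function names are likewise illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def sentiment_level(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to one of five levels.

    The 0.6 / -0.6 boundaries are assumptions for illustration only.
    """
    if compound >= 0.6:
        return "overly positive"
    if compound >= 0.05:
        return "positive"
    if compound > -0.05:
        return "neutral"
    if compound > -0.6:
        return "negative"
    return "overly negative"


def compound_score(text: str) -> float:
    """Score one tweet with VADER (requires the vaderSentiment package)."""
    # Imported lazily so the bucketing above can be used without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores(text)["compound"]


def sentiment_distribution(compounds: Iterable[float]) -> Dict[str, int]:
    """Count how many scores fall into each sentiment level."""
    return dict(Counter(sentiment_level(c) for c in compounds))


if __name__ == "__main__":
    # Made-up scores standing in for real tweet scores.
    print(sentiment_distribution([0.8, 0.3, 0.0, 0.0, -0.7]))
```

In practice one would build a fresh `SentimentIntensityAnalyzer` once and reuse it across all tweets; it is constructed per call here only to keep the sketch short.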
## RESULTS
## CONCLUSION
This study focused on demonstrating the conversations around COVID-19 vaccines on Twitter, using a dataset of tweets from individuals and applying machine learning and text analytics approaches. We performed exploratory data analysis using unsupervised LDA to identify initial implicit topics. The dataset was further analysed for positive and negative sentiments. We then performed topic modeling for a deeper understanding of topics and their popularity across time.
## REFERENCES
## CHALLENGES-AND-FUTUREWORK
Challenges: identifying a package for tweet scraping and recognizing its limitations on extraction; long execution times and runtime errors caused by memory limitations in parts of the data modeling.
Future Work
This project was made in collaboration with Harsh Shah and Vivek Kumar; do check out some of the amazing projects they've worked on.
Owner
- Name: Ani
- Login: animesh
- Kind: user
- Location: Norway
- Company: Norwegian University of Science and Technology
- Website: https://www.fuzzylife.org
- Twitter: animesh1977
- Repositories: 749
- Profile: https://github.com/animesh
A medical graduate from Delhi University with post-graduation in bioinformatics from Jawaharlal Nehru University, India.