https://github.com/bgonzalezbustamante/twitter_ideology

Estimating Ideological Positions with Twitter Data

Repository

Estimating Ideological Positions with Twitter Data

Basic Info

Host: GitHub
Owner: bgonzalezbustamante
License: gpl-2.0
Default Branch: master
Homepage:
Size: 4.05 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of pablobarbera/twitter_ideology

Created over 6 years ago · Last pushed almost 7 years ago

https://github.com/bgonzalezbustamante/twitter_ideology/blob/master/

Estimating Ideological Positions with Twitter Data
----------------

This GitHub repository contains code and materials related to the article "[Birds of a Feather Tweet Together. Bayesian Ideal Point Estimation Using Twitter Data](http://pan.oxfordjournals.org/content/23/1/76.full)," published in Political Analysis in 2015.

The original replication code can be found in the `replication` folder. See also [Dataverse](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/26589) for the full replication materials, including data and output.

As an application of the method, in June 2015 I wrote a blog post on The Monkey Cage / Washington Post entitled ["Who is the most conservative Republican candidate for president?."](http://www.washingtonpost.com/blogs/monkey-cage/wp/2015/06/16/who-is-the-most-conservative-republican-candidate-for-president/) The replication code for the figure in the post is available in the `primary` folder.

Finally, this repository also contains an R package (`tweetscores`) with several functions to facilitate the application of this method in future research. The rest of this README file provides a tutorial with instructions showing how to use it.

Authentication

To download data from Twitter's API, the first step is to create an authentication token. To do so, follow these steps:

1 - Go to apps.twitter.com and sign in

2 - Click on Create New App

3 - Fill in the name, description, and website (it can be anything, even google.com), and make sure you leave Callback URL empty

4 - Agree to user conditions

5 - Copy the consumer key and consumer secret and paste them into the code below

install.packages("ROAuth")
library(ROAuth)
requestURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "XXXXXXXXXXXX"
consumerSecret <- "YYYYYYYYYYYYYYYYYYY"
my_oauth <- OAuthFactory$new(consumerKey=consumerKey, consumerSecret=consumerSecret, 
    requestURL=requestURL, accessURL=accessURL, authURL=authURL)

6 - Run this line and go to the URL that appears on screen

my_oauth$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))

7 - Copy the 6-digit PIN and paste it into the R console

8 - Change the working directory to the folder where you will save all your tokens

setwd("~/Dropbox/credentials/twitter")

9 - Now you can save the OAuth token for use in future R sessions

save(my_oauth, file="my_oauth")
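In a later session the saved token can be restored with `load()`. The sketch below is a minimal illustration that runs without real credentials: it uses a plain list as a stand-in for the `my_oauth` object and a temporary folder instead of `~/Dropbox/credentials/twitter` (both are assumptions made only for this example).

```r
# Stand-in for the real token: a plain list, NOT an actual ROAuth object.
my_oauth <- list(consumerKey = "XXXXXXXXXXXX",
                 consumerSecret = "YYYYYYYYYYYYYYYYYYY")

# Hypothetical credentials folder; replace with your own path.
token_dir <- file.path(tempdir(), "credentials", "twitter")
dir.create(token_dir, showWarnings = FALSE, recursive = TRUE)

# Save with a full path, so the working directory does not need to change.
token_file <- file.path(token_dir, "my_oauth")
save(my_oauth, file = token_file)

# Later session: remove the object, then restore it from disk.
rm(my_oauth)
load(token_file)
names(my_oauth)
```

Passing a full path to `save()` and `load()` avoids depending on `setwd()`, which keeps scripts usable from any working directory.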

Installing the `tweetscores` package

The following code will install the tweetscores package, as well as all other R packages necessary for the functions to run.

toInstall <- c("ggplot2", "scales", "R2WinBUGS", "devtools", "yaml", "httr", "RJSONIO")
install.packages(toInstall, repos = "http://cran.r-project.org")
library(devtools)
install_github("pablobarbera/twitter_ideology/pkg/tweetscores")

Estimating the ideological positions of a US Twitter user

We can now go ahead and estimate ideology for any Twitter user in the US. To make this possible, the package includes pre-estimated ideology for political accounts and media outlets, so here we're just replicating the second stage in the method, that is, estimating a user's ideology based on the accounts they follow.

# load package
library(tweetscores)

# downloading friends of a user
user <- "p_barbera"
friends <- getFriends(screen_name=user, oauth="~/Dropbox/credentials/twitter")

## /Users/pablobarbera/Dropbox/credentials/twitter/oauth_token_32 
## 15  API calls left
## 1065 friends. Next cursor:  0 
## 14  API calls left

# estimate ideology with MCMC method
results <- estimateIdeology(user, friends)

## p_barbera follows 11 elites: nytimes maddow caitlindewey carr2n fivethirtyeight 
## NickKristof nytgraphics nytimesbits NYTimeskrugman nytlabs thecaucus
## Chain 1
  |=================================================================| 100%
## Chain 2
  |=================================================================| 100%

Once we have this set of estimates, we can analyze them with a series of built-in functions.

# summarizing results
summary(results)

##        mean   sd  2.5%   25%   50%   75% 97.5% Rhat n.eff
## beta  -2.30 0.57 -3.37 -2.72 -2.25 -1.92 -1.26 1.02   200
## theta -1.78 0.30 -2.28 -1.99 -1.82 -1.59 -1.11 1.00   200

# assessing chain convergence using a trace plot
tracePlot(results, "theta")

# comparing with other ideology estimates
plot(results)
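To give some intuition for what the sampler and the `Rhat` column are doing, here is a toy sketch in base R. It is not the code used by `estimateIdeology`. It assumes the follow model from the article, P(follow) = logit^-1(alpha_j + beta_i - gamma * (theta_i - phi_j)^2), with made-up elite parameters, an invented follow pattern, and N(0, 1) priors chosen only for illustration.

```r
set.seed(1)

# Hypothetical second-stage inputs (not the package's data): 41 elite
# accounts with fixed positions phi and popularity alpha, and a fixed gamma.
phi   <- seq(-2, 2, by = 0.1)
alpha <- rep(0.5, length(phi))
gamma <- 1
# Toy follow pattern: the user follows every elite within 0.75 of -1.
y <- as.integer(abs(phi + 1) < 0.75)

# Log-posterior of (theta, beta) under assumed N(0, 1) priors.
log_post <- function(par) {
  eta <- alpha + par[["beta"]] - gamma * (par[["theta"]] - phi)^2
  sum(dbinom(y, 1, plogis(eta), log = TRUE)) +
    sum(dnorm(par, 0, 1, log = TRUE))
}

# Random-walk Metropolis: propose a jittered point and accept it with
# probability min(1, posterior ratio).
run_chain <- function(n_iter, init, step = 0.3) {
  draws <- matrix(NA_real_, n_iter, 2,
                  dimnames = list(NULL, c("theta", "beta")))
  current <- init
  current_lp <- log_post(current)
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(2, 0, step)
    proposal_lp <- log_post(proposal)
    if (log(runif(1)) < proposal_lp - current_lp) {
      current <- proposal
      current_lp <- proposal_lp
    }
    draws[i, ] <- current
  }
  draws
}

# Two chains from dispersed starting values; drop the first 1000 draws.
keep   <- 1001:4000
chain1 <- run_chain(4000, c(theta = -2, beta = 0))[keep, ]
chain2 <- run_chain(4000, c(theta =  2, beta = 0))[keep, ]

# Gelman-Rubin potential scale reduction factor for one parameter.
rhat <- function(x) {          # x: iterations x chains
  n <- nrow(x)
  W <- mean(apply(x, 2, var))  # within-chain variance
  B <- n * var(colMeans(x))    # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

theta_draws <- cbind(chain1[, "theta"], chain2[, "theta"])
theta_mean  <- mean(theta_draws)
theta_rhat  <- rhat(theta_draws)
round(c(mean = theta_mean, Rhat = theta_rhat), 2)
```

Values of `Rhat` close to 1 indicate that the chains, despite starting far apart, ended up exploring the same region, which is what the trace plot shows visually.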

Faster ideology estimation

The previous function relies on a Metropolis-Hastings sampling algorithm to estimate ideology. However, we can also use maximum likelihood estimation to compute the distribution of the latent parameters. This method is much faster, since it's not sampling from the posterior distribution of the parameters, but it will tend to give smaller standard errors. Overall, the results should be almost identical. (The actual estimation functions for each of these two approaches are in the package source.)

# faster estimation using maximum likelihood
results <- estimateIdeology(user, friends, method="MLE")

## p_barbera follows 11 elites: nytimes maddow caitlindewey carr2n fivethirtyeight 
## NickKristof nytgraphics nytimesbits NYTimeskrugman nytlabs thecaucus

summary(results)

##        mean   sd  2.5%   25%   50%   75% 97.5% Rhat n.eff
## beta  -2.30 0.57 -3.37 -2.72 -2.25 -1.92 -1.26 1.02   200
## theta -1.78 0.30 -2.28 -1.99 -1.82 -1.59 -1.11 1.00   200
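The idea behind the maximum likelihood shortcut can be sketched with `optim()` in base R. This is an illustration under assumptions, not the package's implementation: the elite parameters and follow pattern are invented, the follow model is the one from the article, and the standard errors come from the inverse of the numerically estimated Hessian, which is one common approach.

```r
# Hypothetical second-stage inputs (not the package's data).
phi   <- seq(-2, 2, by = 0.1)          # elite positions
alpha <- rep(0.5, length(phi))         # elite popularity
gamma <- 1                             # weight of ideological distance
y     <- as.integer(abs(phi + 1) < 0.75)  # toy follow pattern around -1

# Negative log-likelihood of the user's parameters:
# par[1] = theta (ideology), par[2] = beta (political interest).
neg_loglik <- function(par) {
  eta <- alpha + par[2] - gamma * (par[1] - phi)^2
  -sum(dbinom(y, 1, plogis(eta), log = TRUE))
}

# A single optimization replaces thousands of sampler iterations.
fit <- optim(c(theta = 0, beta = 0), neg_loglik,
             method = "BFGS", hessian = TRUE)

# Approximate standard errors from the curvature at the optimum.
se <- sqrt(diag(solve(fit$hessian)))
estimates <- cbind(estimate = fit$par, se = se)
round(estimates, 2)
```

Because no prior enters the objective and the uncertainty is read off a local quadratic approximation, the resulting intervals can be narrower than those from the sampler, in line with the caveat above.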

Estimation using correspondence analysis

One limitation of the previous method is that users need to follow at least one political account. To partially overcome this problem, in a recently published article in Psychological Science, we add a third stage to the model: we include additional accounts (not necessarily political) followed predominantly by liberal or by conservative users, under the assumption that other users who follow this same set of accounts are also likely to be liberal or conservative. To reduce computational costs, we rely on correspondence analysis to project all users onto the latent ideological space (see Supplementary Materials), and then we normalize all the estimates so that they follow a normal distribution with mean zero and standard deviation one. This package also includes a function that reproduces the last stage in the estimation, after all the additional accounts have been added:

# estimation using correspondence analysis
results <- estimateIdeology2(user, friends)

## p_barbera follows 22 elites: andersoncooper, billclinton, BreakingNews, 
## cnnbrk, davidaxelrod, Gawker, HillaryClinton, maddow, MaddowBlog, mashable, mattyglesias,
## NateSilver538, NickKristof, nytimes, NYTimeskrugman, repjoecrowley, RonanFarrow, 
## SCOTUSblog, StephenAtHome, TheDailyShow, TheEconomist, UniteBlue

results

## [1] -1.06158
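The projection step can be illustrated with a small correspondence analysis in base R. This is a sketch on an invented 6-user by 4-account follow matrix, not the code behind `estimateIdeology2`. The account names are hypothetical, and the final step only standardizes the scores with `scale()`, which is a simplification of the normalization described above.

```r
# Toy follow matrix: rows are users, columns are accounts (all made up).
follows <- matrix(c(
  1, 1, 0, 0,
  1, 1, 1, 0,
  1, 0, 0, 0,
  0, 0, 1, 1,
  0, 1, 1, 1,
  0, 0, 0, 1
), nrow = 6, byrow = TRUE,
dimnames = list(paste0("user", 1:6), c("libA", "libB", "consA", "consB")))

# Simple correspondence analysis via SVD of the standardized residuals.
P     <- follows / sum(follows)
r     <- rowSums(P)                    # row masses
cmass <- colSums(P)                    # column masses
S   <- diag(1 / sqrt(r)) %*% (P - r %o% cmass) %*% diag(1 / sqrt(cmass))
dec <- svd(S)

# First dimension: standard coordinates for accounts,
# principal coordinates for users.
col_std       <- dec$v[, 1] / sqrt(cmass)
row_principal <- dec$u[, 1] * dec$d[1] / sqrt(r)

# A supplementary user is placed at the weighted average of the
# coordinates of the accounts they follow (the CA transition formula).
project_user <- function(follow_vector) {
  profile <- follow_vector / sum(follow_vector)
  sum(profile * col_std)
}

new_user  <- c(libA = 0, libB = 1, consA = 0, consB = 0)
new_score <- project_user(new_user)

# Standardize the fitted user scores to mean 0 and standard deviation 1.
scores <- as.numeric(scale(row_principal))
round(scores, 2)
```

Projecting a user who was already in the matrix returns that user's fitted coordinate, which is why new users can be scored without refitting the model. The sign of the dimension is arbitrary, so which end counts as liberal or conservative has to be fixed by reference to known accounts.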

Additional functions

The package also contains additional functions that I use in my research, which I'm providing here in case they are useful:

- `scrapeCongressData` is a scraper of the list of Twitter accounts for Members of the US Congress from the unitedstates GitHub account.
- `getUsersBatch` scrapes user information for more than 100 Twitter users from Twitter's REST API.
- `getFollowers` scrapes followers lists from Twitter's REST API.
- `CA` is a modified version of the `ca` function in the ca package (available on CRAN) that computes simple correspondence analysis with a much lower memory usage.
- `supplementaryColumns` and `supplementaryRows` take additional columns or rows of a follower matrix and project them onto the latent ideological space using the parameters of an already-fitted correspondence analysis model.
- `getCreated` returns the approximate date on which a Twitter account was created based on its Twitter ID. In combination with `estimatePastFollowers` and `estimateDateBreaks`, it can be used to infer past Twitter follower networks.
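The "more than 100 users" wording suggests that a single request handles at most 100 users, so longer lists have to be split into batches. The sketch below shows that batching pattern in base R. `lookup_batch` is a hypothetical stand-in for one API request and does not touch the network. It is not how `getUsersBatch` is implemented.

```r
# Hypothetical stand-in for a single API request (no network access).
lookup_batch <- function(ids) {
  stopifnot(length(ids) <= 100)        # assumed per-request limit
  data.frame(id = ids, looked_up = TRUE, stringsAsFactors = FALSE)
}

ids <- as.character(seq_len(250))      # 250 made-up user IDs

# Split the IDs into consecutive groups of at most 100.
batches <- split(ids, ceiling(seq_along(ids) / 100))

# One "request" per batch, then bind the results into one data frame.
users <- do.call(rbind, lapply(batches, lookup_batch))

lengths(batches)   # batch sizes: 100, 100, 50
nrow(users)
```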
