https://github.com/chainsawriot/ica2020what
Repository
Basic Info
- Host: GitHub
- Owner: chainsawriot
- Default Branch: master
- Size: 3.01 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Created over 6 years ago
· Last pushed over 6 years ago
Metadata Files
Readme
README.Rmd
---
title: "An analysis of the ICA 2020 program"
author:
  - Chung-hong Chan ^[University of Mannheim]
output:
  github_document:
    toc: true
    toc_depth: 1
---
```{r prepro1}
require(tidyverse)
require(textclean)
require(quanteda)
ica_raw <- rio::import('70th Annual ICA Conference_29FEB2020.csv') %>% as_tibble
ica_raw
```
```{r}
colnames(ica_raw)
```
I am only interested in a few columns.
```{r prepro2}
colnames(ica_raw)[2] <- "event_type"
colnames(ica_raw)[3] <- "event_group"
colnames(ica_raw)[5] <- "start_time"
colnames(ica_raw)[8] <- "event_info"
colnames(ica_raw)[17] <- "abstract"
ica_raw %>% count(event_type) -> all_event_types
### I am only interested in these sessions.
all_event_types[c(5, 6, 7, 8, 9, 11, 14),]
```
```{r prepro3, results = 'asis'}
##probably not the cleanest.
ica_raw %>% filter(event_type %in% all_event_types$event_type[c(5, 6, 7, 8, 9, 11, 14)]) %>% mutate(abstract = str_remove(replace_html(abstract), "^Abstracts? ?B?o?d?y?:? ?")) %>% filter(abstract != "") -> ica
ica %>% count(event_group, sort = TRUE) %>% add_count(wt = n, name = "total") %>% mutate(percent = round((n / total) * 100,2)) %>% select(-total) %>% knitr::kable()
```
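To make the prefix-stripping regular expression above easier to follow, here is a minimal base-R sketch on made-up strings. The example abstracts are assumptions for illustration, not rows from the ICA export, and `sub()` stands in for `str_remove()`.

```r
# Same pattern as in the pipeline above; sub() removes only the first match,
# which is also what stringr::str_remove() does.
prefix_pattern <- "^Abstracts? ?B?o?d?y?:? ?"

# Hypothetical abstracts, made up for illustration.
examples <- c(
  "Abstract Body: We study news sharing.",
  "Abstract: Another study.",
  "No label at all.",
  "Abstract Boundaries are blurry."
)

cleaned <- sub(prefix_pattern, "", examples)
print(cleaned)
```

Because every letter of "Body" is optional on its own, the last string loses "Abstract Bo" and becomes "undaries are blurry.". Abstracts that genuinely start with the word "Abstract" would be clipped in the same way, which is perhaps one reason the pattern is flagged as not the cleanest.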
```{r prepro4}
abstract_corpus <- corpus(ica$abstract)
docvars(abstract_corpus, "group") <- ica$event_group
```
```{r prepro5}
dfm(abstract_corpus, tolower = TRUE, stem = TRUE, remove_punct = TRUE, remove_symbols = TRUE, remove = stopwords('en')) %>% dfm_select("^[A-Za-z]+$", valuetype = 'regex') -> abstract_dfm
```
Top Features of all ICA abstracts.
```{r}
topfeatures(abstract_dfm, n = 50)
```
# What are the "big 5" divisions writing about?
## Health Communication
```{r hc}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Health Communication") %>% textplot_keyness
```
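As far as I understand, `textstat_keyness()` defaults to a signed chi-squared statistic: for each word, a 2x2 table compares its frequency in the target group with its frequency in all other abstracts. The base-R sketch below illustrates that idea for a single word. The helper `keyness_chi2()` and the counts are hypothetical, and the result is not guaranteed to match quanteda's output digit for digit.

```r
# Signed chi-squared keyness for one word (illustrative helper, not quanteda's code).
# a: count of the word in the target group
# b: count of the word in the reference group
# c: count of all other tokens in the target group
# d: count of all other tokens in the reference group
keyness_chi2 <- function(a, b, c, d) {
  tab <- matrix(c(a, b, c, d), nrow = 2)  # rows: target, reference
  stat <- unname(suppressWarnings(chisq.test(tab, correct = TRUE))$statistic)
  expected_a <- (a + b) * (a + c) / (a + b + c + d)
  # Negative sign when the word is rarer in the target than expected.
  if (a < expected_a) -stat else stat
}

# Hypothetical counts: the word is overrepresented in the target group ...
over <- keyness_chi2(a = 50, b = 10, c = 950, d = 1990)
# ... and underrepresented when the two groups are swapped.
under <- keyness_chi2(a = 10, b = 50, c = 1990, d = 950)
print(c(over = over, under = under))
```

Words with large positive values appear on the target side of the keyness plot, words with large negative values on the reference side.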
## CAT
```{r cat}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Communication and Technology") %>% textplot_keyness
```
## JSD
```{r jsd}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Journalism Studies") %>% textplot_keyness
```
## POLCOM
```{r polcom}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Political Communication") %>% textplot_keyness
```
## MASSCOM
```{r masscom}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Mass Communication") %>% textplot_keyness
```
and of course,
## Computational methods
```{r comp}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Computational Methods") %>% textplot_keyness
```
and
## Theme
```{r theme}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Theme") %>% textplot_keyness
```
# Similarity between groups
```{r}
uni_groups <- unique(docvars(abstract_dfm, "group"))
group_dfm <- map(uni_groups, ~ apply(dfm_subset(abstract_dfm, group == .), 2, sum))
names(group_dfm) <- uni_groups
## How similar are PolCom 3 and JSD 12
require(lsa)
## So ugly
cosine(group_dfm['Political Communication'][[1]], group_dfm['Journalism Studies'][[1]])
```
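The `cosine()` call above compares two groups by the angle between their term-count vectors, so groups of very different size can still be compared. A minimal base-R sketch of that measure follows; the three-word vocabulary and the counts are invented for illustration.

```r
# Cosine similarity between two numeric vectors of equal length.
cosine_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Hypothetical term counts per group over a shared vocabulary.
polcom <- c(vote = 10, news = 4, health = 0)
jsd    <- c(vote = 3,  news = 12, health = 1)
health <- c(vote = 0,  news = 1,  health = 15)

sim_polcom_jsd <- cosine_sim(polcom, jsd)
sim_polcom_health <- cosine_sim(polcom, health)
print(c(polcom_jsd = sim_polcom_jsd, polcom_health = sim_polcom_health))
```

Scaling a vector does not change the result, so `cosine_sim(polcom, 2 * polcom)` is 1 regardless of how many abstracts a group contributes.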
```{r}
## Polcom 3 and Comm Law 4
cosine(group_dfm['Political Communication'][[1]], group_dfm['Communication Law and Policy'][[1]])
```
```{r}
t(combn(uni_groups, 2)) %>% as_tibble(.name_repair = "minimal") -> pairs
colnames(pairs) <- c('gp1', 'gp2')
get_cosine <- function(gp1, gp2, group_dfm) {
  cosine(group_dfm[gp1][[1]], group_dfm[gp2][[1]])[1,1]
}
get_cosine("Political Communication", "Theme", group_dfm)
```
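As an alternative to looping over every pair with `get_cosine()`, all pairwise similarities can be computed in one step by scaling each group's count vector to unit length and taking a cross product. This is a sketch in base R on a made-up 3 x 3 count matrix; the group names and counts are assumptions.

```r
# Hypothetical group-by-term count matrix (rows are groups).
counts <- rbind(
  "Group A" = c(10, 4, 0),
  "Group B" = c(3, 12, 1),
  "Group C" = c(0, 1, 15)
)

# Divide each row by its Euclidean norm, then take all row-by-row dot products.
unit_rows <- counts / sqrt(rowSums(counts^2))
sim <- tcrossprod(unit_rows)

# Keep each unordered pair once (upper triangle) as an edge list.
idx <- which(upper.tri(sim), arr.ind = TRUE)
pairs_tbl <- data.frame(
  gp1 = rownames(sim)[idx[, "row"]],
  gp2 = colnames(sim)[idx[, "col"]],
  weight = sim[idx]
)
pairs_tbl <- pairs_tbl[order(-pairs_tbl$weight), ]
print(pairs_tbl)
```

The resulting data frame has the same `gp1`, `gp2`, `weight` layout as the edge list built from `combn()`.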
```{r network}
pairs %>% mutate(weight = map2_dbl(gp1, gp2, get_cosine, group_dfm = group_dfm)) -> pairs
require(igraph)
ica_graph <- graph_from_data_frame(pairs, directed = FALSE)
### Not informative at all
plot(ica_graph)
```
The 50 most similar pairs of groups:
```{r , results = 'asis'}
pairs %>% arrange(desc(weight)) %>% head(n = 50) %>% knitr::kable()
```
# Newsmap: US-Centrism
Classify all abstracts by country using the semi-supervised geographical classifier of Watanabe (2018) ^[Watanabe, K. (2018). Newsmap: A semi-supervised approach to geographical news classification. Digital Journalism, 6(3), 294-309.].
```{r}
require(newsmap)
toks <- tokens(abstract_corpus, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>% tokens_tolower %>% tokens_remove(stopwords('english'), valuetype = 'fixed', padding = TRUE)
country_label <- tokens_lookup(toks, data_dictionary_newsmap_en, levels = 3)
dfmt_label <- dfm(country_label)
dfmt_feat <- dfm(toks) %>% dfm_select(pattern = "^[a-z]", selection = "keep", valuetype = "regex")
model <- textmodel_newsmap(dfmt_feat, dfmt_label)
coef(model, n = 7)[c("us", "gb", "de", "br", "jp", "hk", "cn")]
```
How US-centric is ICA?
```{r, results = "asis"}
country <- predict(model)
tibble(country) %>% count(country, sort = TRUE) %>% add_count(wt = n, name = 'totaln') %>% mutate(percent = round((n / totaln) * 100, 2)) %>% select(country, percent) %>% knitr::kable()
```
Rank divisions/interest groups by percentage of non-US abstracts:
```{r, results = "asis"}
tibble(group = ica$event_group, country) %>% mutate(nonus = country != "us") %>% group_by(group) %>% summarise(totalnonus = sum(nonus), n = n()) %>% mutate(percent = round((totalnonus / n) * 100)) %>% arrange(desc(percent)) %>% knitr::kable()
```
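The ranking above boils down to a share of `country != "us"` within each group. A small base-R sketch of the same arithmetic, using invented group labels and predicted countries rather than the real predictions:

```r
# Hypothetical predictions: one row per abstract.
group   <- c("A", "A", "A", "B", "B", "C")
country <- c("us", "de", "us", "hk", "cn", "us")

# TRUE when the predicted country is not the US.
nonus <- country != "us"

# Mean of a logical vector is the proportion of TRUE values.
percent <- round(vapply(split(nonus, group), mean, numeric(1)) * 100)
ranking <- sort(percent, decreasing = TRUE)
print(ranking)
```

Groups with few abstracts can swing between 0 and 100 percent on a handful of predictions, so the group size `n` shown in the table is worth reading alongside the percentage.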
# Which groups need more chairs?
```{r}
str_extract(ica$event_info, "Expected Attendance:[0-9]+")
```
```{r}
ica$expected_attendance <- str_extract(ica$event_info, "Expected Attendance:[0-9]+") %>% str_extract("[0-9]+") %>% as.numeric
```
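The two-step `str_extract()` above first isolates the "Expected Attendance:NN" fragment and then pulls out the digits, leaving `NA` where the field is missing. A base-R sketch of the same idea follows; `extract_attendance()` and the example strings are hypothetical.

```r
# Extract the number after "Expected Attendance:"; NA when the field is absent.
extract_attendance <- function(info) {
  m <- regexpr("Expected Attendance:[0-9]+", info)
  hit <- !is.na(m) & m > 0
  out <- rep(NA_real_, length(info))
  out[hit] <- as.numeric(sub("Expected Attendance:", "", regmatches(info, m)))
  out
}

# Hypothetical event_info strings, made up for illustration.
info <- c(
  "Room A; Expected Attendance:45",
  "No attendance information",
  "Expected Attendance:120; Session chair: TBA"
)
attendance <- extract_attendance(info)
print(attendance)
```

Keeping the `NA` values is what makes `na.rm = TRUE` necessary in the summaries of expected attendance.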
```{r, results = "asis"}
ica %>% group_by(event_group) %>% summarise(mean_ea = mean(expected_attendance, na.rm = TRUE), median_ea = median(expected_attendance, na.rm = TRUE)) %>% arrange(desc(mean_ea)) %>% knitr::kable()
```
```{r chairs}
ica %>% group_by(event_group) %>% summarise(mean_ea = mean(expected_attendance, na.rm = TRUE), median_ea = median(expected_attendance, na.rm = TRUE)) %>% arrange(desc(mean_ea)) %>% ggplot(aes(x = fct_reorder(event_group, mean_ea), y = mean_ea)) + geom_bar(stat = 'identity') + coord_flip() + xlab("Group") + ylab("Mean expected attendance")
```
Is there a US bias in expected attendance (after adjusting for group and type)?
```{r}
require(MASS)
ica$country <- predict(model) == "us"
glm.nb(expected_attendance~event_group+event_type+country, data = ica) %>% summary
```
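A negative binomial model such as `glm.nb()` is commonly preferred over a Poisson model when counts are overdispersed, that is, when the variance is much larger than the mean. Whether that is the reason for the choice here is an assumption on my part. The sketch below shows one rough way to check for overdispersion, using simulated counts rather than the ICA data; all variable names are hypothetical.

```r
set.seed(2020)

# Simulated data: attendance counts with variance far above the mean.
n <- 300
is_us <- rep(c(TRUE, FALSE), length.out = n)
attendance <- rnbinom(n, size = 2, mu = 40)

# Fit a Poisson model and compute the Pearson dispersion statistic.
pois_fit <- glm(attendance ~ is_us, family = poisson)
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
print(dispersion)
```

A value close to 1 is consistent with a Poisson model; values well above 1 suggest a negative binomial model is the safer choice.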
The US indicator (`country`) is not significant.
# Why are some presentations early in the morning?
```{r, results = "asis"}
ica %>% count(start_time) %>% knitr::kable()
```
Starting time by group.
```{r starting}
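## Convert a clock string (hour, minute, AM/PM separated by ":" or " ")
## to minutes after midnight. Note: 12:xx AM is not special-cased.
time_conv <- function(date) {
  res <- str_split(date, "[: ]")[[1]]
  s <- 0
  ## Add 12 hours for afternoon times, except for 12:xx PM.
  if (res[[3]] == "PM" && res[[1]] != "12") {
    s <- s + (12 * 60)
  }
  s <- s + (as.numeric(res[[1]]) * 60) + as.numeric(res[[2]])
  return(s)
}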
ica %>% mutate(start_time2 = map_dbl(start_time, time_conv)) -> ica
ica %>% group_by(event_group) %>% summarise(mean_start_time = mean(start_time2)) %>% arrange(mean_start_time) %>% ggplot(aes(x = fct_reorder(event_group, mean_start_time), y = mean_start_time)) + geom_bar(stat = 'identity') + coord_flip() + xlab("Group") + ylab("Mean starting time")
```
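The plot above shows mean starting times as minutes after midnight, which is hard to read at a glance. Here is a base-R sketch of the conversion in both directions, assuming start times look like "8:00 AM". The helpers `to_minutes()` and `to_clock()` are hypothetical, and unlike `time_conv()` this version uses a modulo so that 12:xx AM maps to hour 0.

```r
# Clock string such as "8:00 AM" -> minutes after midnight.
to_minutes <- function(clock) {
  parts <- strsplit(clock, "[: ]")[[1]]
  hour <- as.numeric(parts[1]) %% 12   # 12 AM -> 0, 12 PM -> 0 (+12 below)
  if (parts[3] == "PM") hour <- hour + 12
  hour * 60 + as.numeric(parts[2])
}

# Minutes after midnight -> 24-hour "HH:MM" label, rounded to the minute.
to_clock <- function(minutes) {
  m <- as.integer(round(minutes))
  sprintf("%02d:%02d", m %/% 60L, m %% 60L)
}

# Hypothetical start times, made up for illustration.
starts <- c("8:00 AM", "12:30 PM", "1:15 PM")
mins <- vapply(starts, to_minutes, numeric(1), USE.NAMES = FALSE)
print(mins)
print(to_clock(mean(mins)))
```

Labels produced by `to_clock()` could be used for the axis of the bar chart, for example via the `labels` argument of `scale_y_continuous()`.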
Is there a US bias in starting time (after adjusting for group and type)?
```{r}
require(MASS)
ica$country <- predict(model) == "us"
glm.nb(start_time2~event_group+event_type+country, data = ica) %>% summary
```
Again, the US indicator (`country`) is not significant.
Owner
- Login: chainsawriot
- Kind: user
- Location: Germany
- Company: @gesistsa
- Website: http://www.chainsawriot.com
- Repositories: 241
- Profile: https://github.com/chainsawriot