https://github.com/chainsawriot/ica2020what


Repository

Basic Info

Host: GitHub
Owner: chainsawriot
Default Branch: master
Size: 3.01 MB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created over 6 years ago · Last pushed over 6 years ago

Metadata Files

Readme

README.Rmd

---
title: "An analysis of the ICA 2020 program"
author:
  - Chung-hong Chan ^[University of Mannheim]
output:
  github_document:
    toc: true
    toc_depth: 1
---


```{r prepro1}
require(tidyverse)
require(textclean)
require(quanteda)

ica_raw <- rio::import('70th Annual ICA Conference_29FEB2020.csv') %>% as_tibble
ica_raw
```

```{r}
colnames(ica_raw)
```

I am only interested in a few columns.

```{r prepro2}
colnames(ica_raw)[2] <- "event_type"
colnames(ica_raw)[3] <- "event_group"
colnames(ica_raw)[5] <- "start_time"
colnames(ica_raw)[8] <- "event_info"
colnames(ica_raw)[17] <- "abstract"
ica_raw %>% count(event_type) -> all_event_types
### I am only interested in these sessions.
all_event_types[c(5, 6, 7, 8, 9, 11, 14),]
```

```{r prepro3, results = 'asis'}
## Keep the selected session types, strip HTML and the leading "Abstract:" label,
## then drop events without an abstract. Probably not the cleanest.
ica_raw %>%
    filter(event_type %in% all_event_types$event_type[c(5, 6, 7, 8, 9, 11, 14)]) %>%
    mutate(abstract = str_remove(replace_html(abstract), "^Abstracts? ?B?o?d?y?:? ?")) %>%
    filter(abstract != "") -> ica

## Share of abstracts per division or interest group.
ica %>%
    count(event_group, sort = TRUE) %>%
    add_count(wt = n, name = "total") %>%
    mutate(percent = round((n / total) * 100, 2)) %>%
    select(-total) %>%
    knitr::kable()
```
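
The `str_remove()` pattern above is meant to strip a leading "Abstract:" or "Abstract Body:" label. The following is a minimal base-R sketch of what it matches, using `sub()` in place of `str_remove()`. The sample strings are made up for illustration and are not taken from the ICA file, and `strict_pattern` is a hypothetical alternative, not something the analysis uses.

```r
## Hypothetical example strings; the real abstracts come from the ICA CSV.
samples <- c("Abstract: We study news use.",
             "Abstract Body: We study news use.",
             "Abstracts We study news use.",
             "We study news use.")

## Same pattern as in the pipeline above. sub() removes the first match only,
## which mirrors what str_remove() does.
pattern <- "^Abstracts? ?B?o?d?y?:? ?"
cleaned <- sub(pattern, "", samples)
print(cleaned)

## A stricter, hypothetical variant that treats " Body" as one optional unit.
strict_pattern <- "^Abstracts?( Body)?:? ?"
cleaned_strict <- sub(strict_pattern, "", samples)
print(cleaned_strict)
```

Because every letter of "Body" is optional on its own in the original pattern, it also accepts fragments such as "Abstract oy:". That is harmless for these labels, but the stricter variant makes the intent easier to read.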

```{r prepro4}
abstract_corpus <- corpus(ica$abstract)
docvars(abstract_corpus, "group") <- ica$event_group
```

```{r prepro5}
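## Lower-case, stem, and drop punctuation, symbols and English stopwords;
## then keep purely alphabetic features.
dfm(abstract_corpus, tolower = TRUE, stem = TRUE, remove_punct = TRUE,
    remove_symbols = TRUE, remove = stopwords('en')) %>%
    dfm_select("^[A-Za-z]+$", valuetype = 'regex') -> abstract_dfm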
```

Top features across all ICA abstracts.

```{r}
topfeatures(abstract_dfm, n = 50)
```

# What are the "big 5" divisions writing about?

## Health Communication

```{r hc}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Health Communication") %>% textplot_keyness
```
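
Keyness compares how often a feature occurs in the target group with how often it occurs everywhere else. As a rough sketch of the idea, assuming the chi-squared measure that `textstat_keyness()` uses by default, the statistic for one word can be computed from a 2x2 table in base R. All counts below are made up, and quanteda's exact output may differ, for example in how it applies continuity corrections.

```r
## Made-up counts for a single feature (not taken from the ICA data).
word_target <- 30        # occurrences of the word in the target group
word_reference <- 10     # occurrences of the word in all other groups
total_target <- 1000     # all tokens in the target group
total_reference <- 4000  # all tokens in all other groups

## Columns are filled first: target = (30, 970), reference = (10, 3990).
tab <- matrix(c(word_target, total_target - word_target,
                word_reference, total_reference - word_reference),
              nrow = 2,
              dimnames = list(c("word", "other"), c("target", "reference")))

## Pearson chi-squared without continuity correction.
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

## Sign the statistic: positive if the word is relatively more frequent in the target.
over_represented <- (word_target / total_target) > (word_reference / total_reference)
signed_chi2 <- if (over_represented) chi2 else -chi2
print(signed_chi2)
```

Features with large positive values are the ones that end up at the top of a keyness plot; large negative values mark words the target group uses less than the rest.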

## CAT

```{r cat}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Communication and Technology") %>% textplot_keyness
```

## JSD

```{r jsd}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Journalism Studies") %>% textplot_keyness
```

## POLCOM

```{r polcom}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Political Communication") %>% textplot_keyness
```

## MASSCOM

```{r masscom}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Mass Communication") %>% textplot_keyness
```

and of course, 

## Computational methods

```{r comp}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Computational Methods") %>% textplot_keyness
```

and

## Theme

```{r theme}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Theme") %>% textplot_keyness
```

# Similarity between groups

```{r}
uni_groups <- unique(docvars(abstract_dfm, "group"))
group_dfm <- map(uni_groups, ~ apply(dfm_subset(abstract_dfm, group == .), 2, sum))
names(group_dfm) <- uni_groups

## How similar are PolCom 3 and JSD 12?

require(lsa)
## So ugly
cosine(group_dfm['Political Communication'][[1]], group_dfm['Journalism Studies'][[1]])
```

```{r}
## Polcom 3 and Comm Law 4
cosine(group_dfm['Political Communication'][[1]], group_dfm['Communication Law and Policy'][[1]])
```
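
The `cosine()` calls above compare two vectors of summed term counts. For two vectors, cosine similarity is the dot product divided by the product of the vector lengths. The sketch below computes it in base R on made-up counts; `cosine_sim` is a hypothetical helper written for illustration, not part of `lsa`.

```r
## Made-up term counts over a shared vocabulary (not from the ICA data).
polcom <- c(election = 12, news = 8, health = 0, platform = 3)
jsd    <- c(election = 5, news = 15, health = 1, platform = 2)

## Dot product divided by the product of the two Euclidean norms.
cosine_sim <- function(x, y) {
    sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

print(cosine_sim(polcom, jsd))
```

With non-negative counts the value lies between 0 (no shared vocabulary) and 1 (identical relative word use), so group size alone does not drive the similarity.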

```{r}
t(combn(uni_groups, 2)) %>% as_tibble(.name_repair = "minimal") -> pairs
colnames(pairs) <- c('gp1', 'gp2')

get_cosine <- function(gp1, gp2, group_dfm) {
    cosine(group_dfm[gp1][[1]], group_dfm[gp2][[1]])[1,1]
}

get_cosine("Political Communication", "Theme", group_dfm)
```

```{r network}
pairs %>% mutate(weight = map2_dbl(gp1, gp2, get_cosine, group_dfm = group_dfm)) -> pairs
require(igraph)

ica_graph <- graph_from_data_frame(pairs, directed = FALSE)
### Not informative at all
plot(ica_graph)
```

The 50 most similar pairs of groups:

```{r , results = 'asis'}
pairs %>% arrange(desc(weight)) %>% head(n = 50) %>% knitr::kable()
```

# Newsmap: US-Centrism

Classify all abstracts with the geographical classification algorithm by Watanabe (2018) ^[Watanabe, K. (2018). Newsmap: A semi-supervised approach to geographical news classification. Digital Journalism, 6(3), 294-309.].

```{r}
require(newsmap)
toks <- tokens(abstract_corpus, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>% tokens_tolower %>% tokens_remove(stopwords('english'), valuetype = 'fixed', padding = TRUE)
country_label <- tokens_lookup(toks, data_dictionary_newsmap_en, levels = 3)
dfmt_label <- dfm(country_label)

dfmt_feat <- dfm(toks) %>% dfm_select(pattern = "^[a-z]", selection = "keep", valuetype = "regex")

model <- textmodel_newsmap(dfmt_feat, dfmt_label)
coef(model, n = 7)[c("us", "gb", "de", "br", "jp", "hk", "cn")]
```

How US-centric is ICA?

```{r, results = "asis"}
country <- predict(model)
tibble(country) %>% count(country, sort = TRUE) %>% add_count(wt = n, name = 'totaln') %>% mutate(percent = round((n / totaln) * 100, 2)) %>% select(country, percent) %>% knitr::kable()
```

Rank divisions/IGs by percentage of non-US abstracts.

```{r, results = "asis"}
tibble(group = ica$event_group, country) %>% mutate(nonus = country != "us") %>% group_by(group) %>% summarise(totalnonus = sum(nonus), n = n()) %>% mutate(percent = round((totalnonus / n) * 100)) %>% arrange(desc(percent)) %>% knitr::kable()
```
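
The ranking above boils down to a per-group mean of a logical flag. A minimal base-R sketch of the same calculation on a toy data frame follows; the group labels and country codes are invented for illustration.

```r
## Toy data: one row per abstract with its group and predicted country (made up).
toy <- data.frame(
    group = c("A", "A", "A", "B", "B"),
    country = c("us", "de", "us", "hk", "cn"),
    stringsAsFactors = FALSE
)

## Flag non-US abstracts, then take the rounded percentage per group.
toy$nonus <- toy$country != "us"
share <- aggregate(nonus ~ group, data = toy,
                   FUN = function(x) round(mean(x) * 100))

## Sort from the most to the least non-US group.
share <- share[order(-share$nonus), ]
print(share)
```

Note that the percentage depends entirely on the predicted country label, so any misclassification by the model carries straight through to the ranking.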

# Which groups need more chairs?

```{r}
str_extract(ica$event_info, "Expected Attendance:[0-9]+")
```

```{r}
ica$expected_attendance <- str_extract(ica$event_info, "Expected Attendance:[0-9]+") %>% str_extract("[0-9]+") %>% as.numeric
```
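
The two-step extraction above first grabs the labelled number and then keeps only its digits. The sketch below does the same in base R on made-up `event_info` strings (the real field layout is an assumption here) and shows that events without the label end up as `NA`.

```r
## Hypothetical event_info strings; the real field comes from the ICA CSV.
event_info <- c("Room: A; Expected Attendance:45; Chair: TBA",
                "Expected Attendance:120",
                "No attendance given")

## Locate the labelled number. regmatches() drops non-matching elements,
## so matches are written back by position to keep NA for missing labels.
m <- regexpr("Expected Attendance:[0-9]+", event_info)
labelled <- rep(NA_character_, length(event_info))
labelled[m != -1] <- regmatches(event_info, m)

## Strip the label and convert the remaining digits to numbers.
expected_attendance <- as.numeric(sub("Expected Attendance:", "", labelled, fixed = TRUE))
print(expected_attendance)
```

The `NA` values are why the summaries that follow pass `na.rm = TRUE` to `mean()` and `median()`.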

```{r, results = "asis"}
ica %>%
    group_by(event_group) %>%
    summarise(mean_ea = mean(expected_attendance, na.rm = TRUE),
              median_ea = median(expected_attendance, na.rm = TRUE)) %>%
    arrange(desc(mean_ea)) %>%
    knitr::kable()
```

```{r chairs}
ica %>%
    group_by(event_group) %>%
    summarise(mean_ea = mean(expected_attendance, na.rm = TRUE),
              median_ea = median(expected_attendance, na.rm = TRUE)) %>%
    arrange(desc(mean_ea)) %>%
    ggplot(aes(x = fct_reorder(event_group, mean_ea), y = mean_ea)) +
    geom_bar(stat = 'identity') +
    coord_flip() +
    xlab("Group") +
    ylab("Mean expected attendance")
```

Is there a US bias in expected attendance (after adjusting for group and type)?

```{r}
require(MASS)
ica$country <- predict(model) == "us"
glm.nb(expected_attendance~event_group+event_type+country, data = ica) %>% summary
```

Not significant.

# Why are some presentations early in the morning?

```{r, results = "asis"}
ica %>% count(start_time) %>% knitr::kable()
```

Starting time by group.

```{r starting}

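## Convert a clock time (assumed to look like "9:30 AM") to minutes since midnight.
time_conv <- function(date) {
    res <- str_split(date, "[: ]")[[1]]
    ## Map 12 to 0 so that 12 AM becomes 0 and 12 PM becomes 12 after the shift below.
    hour <- as.numeric(res[[1]]) %% 12
    if (res[[3]] == "PM") {
        hour <- hour + 12
    }
    return((hour * 60) + as.numeric(res[[2]]))
}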

ica %>% mutate(start_time2 = map_dbl(start_time, time_conv)) -> ica

ica %>% group_by(event_group) %>% summarise(mean_start_time = mean(start_time2)) %>% arrange(mean_start_time) %>% ggplot(aes(x = fct_reorder(event_group, mean_start_time), y = mean_start_time)) + geom_bar(stat = 'identity') + coord_flip() + xlab("Group") + ylab("Mean starting time")
```
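
To make the plotted unit concrete: the starting times are turned into minutes since midnight before they are averaged. The sketch below is a vectorised base-R variant of that conversion. `to_minutes` is a hypothetical helper, and the `"9:30 AM"` style input is an assumption about how `start_time` is formatted in the CSV.

```r
## Vectorised sketch; assumes strings shaped like "9:30 AM".
to_minutes <- function(x) {
    parts <- strsplit(x, "[: ]")
    ## Map hour 12 to 0 so that 12 AM is midnight and 12 PM is noon.
    hour <- as.numeric(vapply(parts, `[`, character(1), 1)) %% 12
    minute <- as.numeric(vapply(parts, `[`, character(1), 2))
    is_pm <- vapply(parts, `[`, character(1), 3) == "PM"
    (hour + ifelse(is_pm, 12, 0)) * 60 + minute
}

print(to_minutes(c("8:00 AM", "12:30 PM", "5:15 PM")))
```

A mean starting time of 600 on the plot would therefore correspond to 10:00 in the morning.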

Is there a US bias in starting time (after adjusting for group and type)?


```{r}
require(MASS)
ica$country <- predict(model) == "us"
glm.nb(start_time2~event_group+event_type+country, data = ica) %>% summary
```

Not significant.

Owner

Login: chainsawriot
Kind: user
Location: Germany
Company: @gesistsa

Website: http://www.chainsawriot.com
Repositories: 241
Profile: https://github.com/chainsawriot


Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
