https://github.com/chainsawriot/ica2020what

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (3.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: chainsawriot
  • Default Branch: master
  • Size: 3.01 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed over 6 years ago
Metadata Files
Readme

README.Rmd

---
title: "An analysis of the ICA 2020 program"
author:
  - Chung-hong Chan ^[University of Mannheim]
output:
  github_document:
    toc: true
    toc_depth: 1
---


```{r prepro1}
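require(tidyverse)
require(textclean)
require(quanteda)
## With quanteda >= 3, textstat_keyness() and textplot_keyness() live in these two packages.
require(quanteda.textstats)
require(quanteda.textplots)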

ica_raw <- rio::import('70th Annual ICA Conference_29FEB2020.csv') %>% as_tibble
ica_raw
```

```{r}
colnames(ica_raw)
```

I am only interested in a few columns.

```{r prepro2}
colnames(ica_raw)[2] <- "event_type"
colnames(ica_raw)[3] <- "event_group"
colnames(ica_raw)[5] <- "start_time"
colnames(ica_raw)[8] <- "event_info"
colnames(ica_raw)[17] <- "abstract"
ica_raw %>% count(event_type) -> all_event_types
### I am only interested in these sessions.
all_event_types[c(5, 6, 7, 8, 9, 11, 14),]
```

```{r prepro3, results = 'asis'}
## Probably not the cleanest way to do this.

ica_raw %>% filter(event_type %in% all_event_types$event_type[c(5, 6, 7, 8, 9, 11, 14)]) %>% mutate(abstract = str_remove(replace_html(abstract), "^Abstracts? ?B?o?d?y?:? ?")) %>% filter(abstract != "") -> ica
ica %>% count(event_group, sort = TRUE) %>% add_count(wt = n, name = "total") %>% mutate(percent = round((n / total) * 100,2)) %>% select(-total) %>% knitr::kable()
```
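
The prefix-stripping pattern used above is easier to read on a few toy strings. This is a minimal sketch in base R (`sub()` instead of `str_remove()`); the example strings are hypothetical and not taken from the program file.

```r
## Toy strings (hypothetical, not taken from the program file).
raw <- c("Abstract Body: This study examines news use.",
         "Abstracts: Two papers on memes.",
         "Abstract",
         "No prefix here.")

## Same pattern as in the chunk above; every piece after "Abstract" is optional.
cleaned <- sub("^Abstracts? ?B?o?d?y?:? ?", "", raw)
print(cleaned)

## Entries that are empty after stripping are the ones the filter() step drops.
kept <- cleaned[cleaned != ""]
print(kept)
```

Because each letter of "Body" is optional on its own, the pattern is permissive, but it only fires when the string starts with "Abstract".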

```{r prepro4}
abstract_corpus <- corpus(ica$abstract)
docvars(abstract_corpus, "group") <- ica$event_group
```

```{r prepro5}
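## Tokenize first: recent quanteda versions no longer accept a corpus or these cleaning arguments in dfm().
tokens(abstract_corpus, remove_punct = TRUE, remove_symbols = TRUE) %>% tokens_tolower() %>% tokens_remove(stopwords('en')) %>% tokens_wordstem() %>% dfm() %>% dfm_select("^[A-Za-z]+$", valuetype = 'regex') -> abstract_dfm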
```

Top features across all ICA abstracts.

```{r}
topfeatures(abstract_dfm, n = 50)
```

# What are the "big 5" divisions writing about?

## Health Communication

```{r hc}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Health Communication") %>% textplot_keyness
```
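
For intuition about what a keyness score measures, here is a rough sketch in base R. It assumes the default chi-squared measure of `textstat_keyness()` and uses made-up counts for a single feature; it is not the package's internal code.

```r
## Hypothetical counts for one stemmed feature, e.g. "patient".
target_count <- 30      # occurrences in the target group
target_total <- 1000    # all tokens in the target group
ref_count <- 20         # occurrences in all other groups
ref_total <- 9000       # all tokens in all other groups

## 2 x 2 table: rows are target/reference, columns are feature/other tokens.
tab <- matrix(c(target_count, ref_count,
                target_total - target_count, ref_total - ref_count),
              nrow = 2,
              dimnames = list(c("target", "reference"), c("feature", "other")))

test <- chisq.test(tab, correct = TRUE)

## Sign the statistic: positive when the feature is over-represented in the target.
expected_target <- test$expected["target", "feature"]
keyness <- unname(test$statistic) * sign(target_count - expected_target)
print(keyness)
```

Features with large positive values are typical of the target group; large negative values mark features that the group uses less than the rest of the program.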

## Communication and Technology (CAT)

```{r cat}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Communication and Technology") %>% textplot_keyness
```

## Journalism Studies (JSD)

```{r jsd}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Journalism Studies") %>% textplot_keyness
```

## Political Communication (POLCOM)

```{r polcom}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Political Communication") %>% textplot_keyness
```

## Mass Communication (MASSCOM)

```{r masscom}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Mass Communication") %>% textplot_keyness
```

And, of course:

## Computational Methods

```{r comp}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Computational Methods") %>% textplot_keyness
```

And:

## Theme

```{r theme}
textstat_keyness(abstract_dfm, target = docvars(abstract_dfm, "group") == "Theme") %>% textplot_keyness
```

# Similarity between groups

```{r}
uni_groups <- unique(docvars(abstract_dfm, "group"))
group_dfm <- map(uni_groups, ~ apply(dfm_subset(abstract_dfm, group == .), 2, sum))
names(group_dfm) <- uni_groups

## How similar are PolCom 3 and JSD 12?

require(lsa)
## So ugly
cosine(group_dfm['Political Communication'][[1]], group_dfm['Journalism Studies'][[1]])
```

```{r}
## Polcom 3 and Comm Law 4
cosine(group_dfm['Political Communication'][[1]], group_dfm['Communication Law and Policy'][[1]])
```
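
The cosine similarity used above can be written out by hand. The following is a minimal base R sketch on hypothetical feature counts; `cosine_sim()` and the three toy vectors are illustrative names, not part of the analysis.

```r
## Cosine similarity: dot product divided by the product of the vector lengths.
cosine_sim <- function(x, y) {
    sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

## Hypothetical feature counts for three groups over the same four features.
polcom <- c(elect = 10, news = 5, health = 0, game = 0)
jsd    <- c(elect = 4,  news = 8, health = 0, game = 0)
hc     <- c(elect = 0,  news = 1, health = 9, game = 0)

cosine_sim(polcom, jsd)  # shares two frequent features
cosine_sim(polcom, hc)   # shares only one infrequent feature
```

Because the measure is normalised by vector length, a group with many abstracts is not automatically more similar to every other group.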

```{r}
t(combn(uni_groups, 2)) %>% as_tibble(.name_repair = "minimal") -> pairs
colnames(pairs) <- c('gp1', 'gp2')

get_cosine <- function(gp1, gp2, group_dfm) {
    cosine(group_dfm[gp1][[1]], group_dfm[gp2][[1]])[1,1]
}

get_cosine("Political Communication", "Theme", group_dfm)
```

```{r network}
pairs %>% mutate(weight = map2_dbl(gp1, gp2, get_cosine, group_dfm = group_dfm)) -> pairs
require(igraph)

ica_graph <- graph_from_data_frame(pairs, directed = FALSE)
### Not informative at all
plot(ica_graph)
```

The 50 most similar pairs of groups:

```{r , results = 'asis'}
pairs %>% arrange(desc(weight)) %>% head(n = 50) %>% knitr::kable()
```

# Newsmap: US-Centrism

Classify all abstracts with the geographical classification algorithm of Watanabe (2018) ^[Watanabe, K. (2018). Newsmap: A semi-supervised approach to geographical news classification. Digital Journalism, 6(3), 294-309.].

```{r}
require(newsmap)
toks <- tokens(abstract_corpus, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>% tokens_tolower %>% tokens_remove(stopwords('english'), valuetype = 'fixed', padding = TRUE)
country_label <- tokens_lookup(toks, data_dictionary_newsmap_en, levels = 3)
dfmt_label <- dfm(country_label)

dfmt_feat <- dfm(toks) %>% dfm_select(pattern = "^[a-z]", selection = "keep", valuetype = "regex")

model <- textmodel_newsmap(dfmt_feat, dfmt_label)
coef(model, n = 7)[c("us", "gb", "de", "br", "jp", "hk", "cn")]
```

How US-centric is ICA?

```{r, results = "asis"}
country <- predict(model)
tibble(country) %>% count(country, sort = TRUE) %>% add_count(wt = n, name = 'totaln') %>% mutate(percent = round((n / totaln) * 100, 2)) %>% select(country, percent) %>% knitr::kable()
```

Ranking of divisions and interest groups (IGs) by percentage of non-US abstracts:

```{r, results = "asis"}
tibble(group = ica$event_group, country) %>% mutate(nonus = country != "us") %>% group_by(group) %>% summarise(totalnonus = sum(nonus), n = n()) %>% mutate(percent = round((totalnonus / n) * 100)) %>% arrange(desc(percent)) %>% knitr::kable()
```

# Which groups need more chairs?

```{r}
str_extract(ica$event_info, "Expected Attendance:[0-9]+")
```

```{r}
ica$expected_attendance <- str_extract(ica$event_info, "Expected Attendance:[0-9]+") %>% str_extract("[0-9]+") %>% as.numeric
```
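
The two-step extraction above can be checked on a few toy strings. This is a sketch in base R; the strings are hypothetical, and only the `Expected Attendance:<n>` fragment mirrors the pattern used on `event_info`.

```r
## Hypothetical event_info strings.
info <- c("Room: A; Expected Attendance:45; Chair: TBA",
          "Expected Attendance:120",
          "No attendance figure given")

## Step 1: pull out the labelled figure; rows without a match stay NA.
m <- regexpr("Expected Attendance:[0-9]+", info)
labelled <- rep(NA_character_, length(info))
labelled[m > 0] <- regmatches(info, m)

## Step 2: drop everything before the digits and convert to numeric.
expected_attendance <- as.numeric(sub("^[^0-9]*", "", labelled))
print(expected_attendance)
```

Rows without a figure end up as `NA`, which is why the summaries that follow use `na.rm = TRUE`.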

```{r, results = "asis"}
ica %>% group_by(event_group) %>% summarise(mean_ea  = mean(expected_attendance, na.rm = TRUE), median_ea = median(expected_attendance, na.rm = TRUE)) %>% arrange(desc(mean_ea)) %>% knitr::kable()
```

```{r chairs}
ica %>% group_by(event_group) %>% summarise(mean_ea  = mean(expected_attendance, na.rm = TRUE), median_ea = median(expected_attendance, na.rm = TRUE)) %>% arrange(desc(mean_ea)) %>% ggplot(aes(x = fct_reorder(event_group, mean_ea), y = mean_ea)) + geom_bar(stat = 'identity') + coord_flip() + xlab("Group") + ylab("Mean expected attendance")
```

Is there a US bias in expected attendance (after adjusting for group and type)?

```{r}
## Call glm.nb() via MASS:: so that MASS::select() does not mask dplyr::select().
ica$country <- predict(model) == "us"
MASS::glm.nb(expected_attendance ~ event_group + event_type + country, data = ica) %>% summary
```

The US indicator (`country`) is not significant.

# Why are some presentations early in the morning?

```{r, results = "asis"}
ica %>% count(start_time) %>% knitr::kable()
```

Starting time by group.

```{r starting}

## Convert a clock time such as "9:30 AM" into minutes since midnight.
time_conv <- function(date) {
    res <- str_split(date, "[: ]")[[1]]
    hour <- as.numeric(res[[1]]) %% 12  ## "12" maps to hour 0 for both AM and PM
    if (res[[3]] == "PM") {
        hour <- hour + 12
    }
    return((hour * 60) + as.numeric(res[[2]]))
}

ica %>% mutate(start_time2 = map_dbl(start_time, time_conv)) -> ica

ica %>% group_by(event_group) %>% summarise(mean_start_time = mean(start_time2)) %>% arrange(mean_start_time) %>% ggplot(aes(x = fct_reorder(event_group, mean_start_time), y = mean_start_time)) + geom_bar(stat = 'identity') + coord_flip() + xlab("Group") + ylab("Mean starting time")
```
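
A quick sanity check of the time conversion, restated in base R so it runs on its own. It assumes start times look like `"9:30 AM"`; `to_minutes()` and `to_clock()` are illustrative helpers, not part of the analysis.

```r
## Minutes since midnight for a clock time such as "9:30 AM".
to_minutes <- function(time) {
    parts <- strsplit(time, "[: ]")[[1]]
    hour <- as.numeric(parts[1]) %% 12   # "12" counts as hour 0 in both halves of the day
    if (parts[3] == "PM") hour <- hour + 12
    hour * 60 + as.numeric(parts[2])
}

minutes <- vapply(c("8:00 AM", "12:30 PM", "5:15 PM"), to_minutes, numeric(1))
print(minutes)

## Turn a value in minutes back into a clock time, e.g. for reading the plot axis.
to_clock <- function(minutes) {
    sprintf("%02d:%02d", as.integer(minutes) %/% 60L, as.integer(minutes) %% 60L)
}
to_clock(750)
```

The bars in the plot are in minutes since midnight, so a mean of 750 corresponds to 12:30.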

Is there a US bias in starting time (after adjusting for group and type)?


```{r}
## Call glm.nb() via MASS:: so that MASS::select() does not mask dplyr::select().
ica$country <- predict(model) == "us"
MASS::glm.nb(start_time2 ~ event_group + event_type + country, data = ica) %>% summary
```

The US indicator (`country`) is not significant.

Owner

  • Login: chainsawriot
  • Kind: user
  • Location: Germany
  • Company: @gesistsa

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0