Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: koheiw
  • License: other
  • Language: R
  • Default Branch: master
  • Size: 3.27 MB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 1
  • Open Issues: 2
  • Releases: 0
Created about 2 years ago · Last pushed 12 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
editor_options: 
  chunk_output_type: console
---

```{r, echo=FALSE, message=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##",
  fig.path = "man/images/"
)
```

# Wordmap: Semi-supervised Multinomial Document Classifier

**wordmap** is a semi-supervised algorithm for multinomial document classification originally created for [newsmap](https://github.com/koheiw/newsmap). **wordmap** was separated from **newsmap** to expand the scope of its application beyond geographical classification of news.

The algorithm is also useful for extracting features associated with document metadata (industry group, patent class, etc.) from very large corpora. The list of features can be used to create a lexicon for dictionary analysis.

## How to install

**wordmap** has been available on CRAN since v0.8.0. You can install the package using the following R command.

```{r, eval=FALSE}
install.packages("wordmap")
```

If you want the latest version, install it from GitHub by running these commands in R. You need to have **devtools** installed beforehand.

```{r, eval=FALSE}
install.packages("devtools")
devtools::install_github("koheiw/wordmap")
```

## Example

In this example, we identify the topics of sentences using a seed topic dictionary adopted from [Watanabe & Zhou (2020)](https://journals.sagepub.com/doi/full/10.1177/0894439320907027).
`data_corpus_ungd2017` contains transcripts of speeches delivered at the United Nations General Assembly in 2017.

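```{r}
library(quanteda)
library(wordmap)

# seed topic dictionary
dict <- data_dictionary_topic
print(dict)

# split the speeches into sentences (the default unit of corpus_reshape())
corp <- data_corpus_ungd2017 |>
    corpus_reshape()

# tokenize and remove stopwords, leaving padding in their place
toks <- tokens(corp, remove_url = TRUE, remove_numbers = TRUE) |>
    tokens_remove(stopwords("en"), min_nchar = 2, padding = TRUE) #|>
    #tokens_remove("^[A-Z]", valuetype = "regex", case_insensitive = FALSE, padding = TRUE)

# features: tokens that occur at least five times
dfmt_feat <- dfm(toks, remove_padding = TRUE) |>
    dfm_trim(min_termfreq = 5)
# labels: matches of the seed words for each topic
dfmt_label <- tokens_lookup(toks, dict) |>
    dfm()

# fit the model and show the features associated with each topic
map <- textmodel_wordmap(dfmt_feat, dfmt_label)
coef(map)
```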

### Predict topics of sentences 

```{r}
dat <- data.frame(text = corp, topic = predict(map))
```

```{r echo=FALSE}
knitr::kable(head(dat, 10))
```
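
To get a quick overview of the predictions, it can help to tabulate how often each topic is assigned. The sketch below uses only base R. `topic_share()` is a hypothetical helper, not part of **wordmap**, and the toy labels merely stand in for `dat$topic`, which is assumed here to be a vector or factor of topic labels that may contain `NA`.

```r
# Hypothetical helper (not part of wordmap): share of sentences per topic.
topic_share <- function(topics) {
    # count each label, keeping NA as its own category if present
    tab <- table(topics, useNA = "ifany")
    # convert counts to proportions and order from most to least frequent
    sort(prop.table(tab), decreasing = TRUE)
}

# Toy labels standing in for dat$topic; the topic names are made up
toy <- factor(c("economy", "security", "economy", NA, "economy"))
topic_share(toy)
```

With the data frame from the example above, the call would be `topic_share(dat$topic)`.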

### Create a topic dictionary

Create a **quanteda** dictionary object from the extracted features. The dictionary can be used to analyze other corpora.

```{r}
as.dictionary(map, n = 100)
```
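
As a rough sketch of how such a dictionary might be applied to a different corpus, the code below counts topic words in `data_corpus_inaugural`, which ships with **quanteda**. To keep the block self-contained, a small hand-written dictionary stands in for the output of `as.dictionary(map, n = 100)`. Its keys and words are made up for illustration and are not taken from the fitted model.

```r
library(quanteda)

# Stand-in for as.dictionary(map, n = 100); keys and words are hypothetical
dict_topic <- dictionary(list(
    economy = c("trade", "growth", "investment"),
    security = c("terrorism", "weapons", "conflict")
))

# tokenize a different corpus (US presidential inaugural addresses)
toks_new <- tokens(data_corpus_inaugural, remove_punct = TRUE)

# replace matching tokens with dictionary keys and count them per document
dfmt_topic <- dfm(tokens_lookup(toks_new, dict_topic))
head(dfmt_topic)
```

In practice, `dict_topic` would be replaced with the dictionary created from the fitted model.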

Owner

  • Name: Kohei Watanabe
  • Login: koheiw
  • Kind: user
  • Location: Japan

Data analyst specializing in political and financial texts

GitHub Events

Total
  • Issues event: 1
  • Watch event: 2
  • Delete event: 3
  • Issue comment event: 1
  • Push event: 21
  • Pull request event: 11
  • Fork event: 1
  • Create event: 5
Last Year
  • Issues event: 1
  • Watch event: 2
  • Delete event: 3
  • Issue comment event: 1
  • Push event: 21
  • Pull request event: 11
  • Fork event: 1
  • Create event: 5

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 12
  • Average time to close issues: N/A
  • Average time to close pull requests: about 20 hours
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.17
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 10
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.1
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • koheiw (1)
Pull Request Authors
  • koheiw (14)
  • kbenoit (2)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 297 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
cran.r-project.org: wordmap

Feature Extraction and Document Classification with Noisy Labels

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 297 Last month
Rankings
Dependent packages count: 28.7%
Dependent repos count: 35.4%
Average: 50.1%
Downloads: 86.2%
Maintainers (1)
Last synced: 10 months ago

Dependencies

.github/workflows/check-standard.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • R >= 3.5 depends
  • methods * depends
  • Matrix * imports
  • quanteda >= 2.1 imports
  • quanteda.textstats * imports
  • stringi * imports
  • utils * imports
  • newsmap * suggests
  • testthat * suggests