Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: koheiw
- License: other
- Language: R
- Default Branch: master
- Size: 3.27 MB
Statistics
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 2
- Releases: 0
Created about 2 years ago
· Last pushed 12 months ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
editor_options:
  chunk_output_type: console
---
```{r, echo=FALSE, message=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##",
  fig.path = "man/images/"
)
```
# Wordmap: Semi-supervised Multinomial Document Classifier
**wordmap** is a semi-supervised algorithm for multinomial document classification originally created for [newsmap](https://github.com/koheiw/newsmap). **wordmap** was separated from **newsmap** to expand the scope of its application beyond geographical classification of news.
The algorithm is also useful for extracting features associated with document metadata (industry group, patent class, etc.) from very large corpora. The list of features can be used to create a lexicon for dictionary analysis.
## How to install
**wordmap** has been available on CRAN since v0.8.0. You can install the package using the following R command.
```{r, eval=FALSE}
install.packages("wordmap")
```
If you want the latest version, install it from GitHub by running the following commands in R. This requires **devtools**, which the first command installs.
```{r, eval=FALSE}
install.packages("devtools")
devtools::install_github("koheiw/wordmap")
```
## Example
In this example, we identify the topics of sentences using a seed topic dictionary adopted from [Watanabe & Zhou (2020)](https://journals.sagepub.com/doi/full/10.1177/0894439320907027).
`data_corpus_ungd2017` contains transcripts of speeches delivered at the United Nations General Assembly in 2017.
```{r}
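require(quanteda)
require(wordmap)

# Seed topic dictionary shipped with the package
dict <- data_dictionary_topic
print(dict)

# Split the speeches into sentences (the default unit of corpus_reshape())
corp <- data_corpus_ungd2017 %>%
    corpus_reshape()
toks <- tokens(corp, remove_url = TRUE, remove_numbers = TRUE) %>%
    tokens_remove(stopwords("en"), min_nchar = 2, padding = TRUE) #%>%
    #tokens_remove("^[A-Z]", valuetype = "regex", case_insensitive = FALSE, padding = TRUE)

# Feature matrix: keep words that occur at least 5 times
dfmt_feat <- dfm(toks, remove_padding = TRUE) %>%
    dfm_trim(min_termfreq = 5)
# Label matrix: count seed-word matches per topic
dfmt_label <- tokens_lookup(toks, dict) %>%
    dfm()

# Fit the model and inspect the words most strongly associated with each topic
map <- textmodel_wordmap(dfmt_feat, dfmt_label)
coef(map)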
```
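To make the role of the seed dictionary more concrete, the following minimal sketch shows how `tokens_lookup()` and `dfm()` turn seed words into a document-by-topic label matrix, the same kind of object that is passed to `textmodel_wordmap()` as `dfmt_label`. The dictionary `dict_seed`, the sentences in `txt`, and all `*_toy` names are hypothetical and only serve as an illustration; they are not part of the package.

```r
library(quanteda)

# Hypothetical seed dictionary (the example above uses data_dictionary_topic)
dict_seed <- dictionary(list(
    health = c("disease*", "health"),
    trade = c("trade", "tariff*")
))

# Hypothetical sentences standing in for the reshaped corpus
txt <- c(s1 = "Tariffs distort trade.",
         s2 = "Health systems must fight diseases.",
         s3 = "We thank the host country.")

toks_toy <- tokens(txt, remove_punct = TRUE)

# Label matrix: one row per sentence, one column per topic,
# each cell counting the seed words matched in that sentence
dfmt_label_toy <- dfm(tokens_lookup(toks_toy, dict_seed))
mat_label_toy <- as.matrix(dfmt_label_toy)
print(mat_label_toy)

# Number of seed-word matches per sentence; s3 contains none
n_seed_toy <- rowSums(mat_label_toy)
print(n_seed_toy)
```

Sentences such as `s3` contain no seed words, so their row in the label matrix is all zeros. This is why the labels are only partial: the seed words mark some sentences, and the model learns the remaining word-topic associations from the feature matrix.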
### Predict topics of sentences
```{r}
dat <- data.frame(text = corp, topic = predict(map))
```
```{r echo=FALSE}
knitr::kable(head(dat, 10))
```
### Create a topic dictionary
Create a **quanteda** dictionary object from the extracted features. The dictionary can be used to analyze other corpora.
```{r}
as.dictionary(map, n = 100)
```
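As a rough sketch of how such a dictionary might be applied to another corpus, the example below counts dictionary matches in new documents with `tokens_lookup()`. The object `dict_new` is a hand-written stand-in for the output of `as.dictionary(map, n = 100)`, and its topic names, its words, and the texts in `txt_new` are all hypothetical.

```r
library(quanteda)

# Hypothetical stand-in for the dictionary returned by as.dictionary(map, n = 100)
dict_new <- dictionary(list(
    security = c("peace", "terrorism", "conflict"),
    environment = c("climate", "emissions", "ocean")
))

# Hypothetical documents from another corpus
txt_new <- c(d1 = "Climate change threatens every ocean.",
             d2 = "Terrorism and conflict undermine peace.")

toks_new <- tokens(txt_new, remove_punct = TRUE)

# Count dictionary matches per document and topic
dfmt_new <- dfm(tokens_lookup(toks_new, dict_new))
mat_new <- as.matrix(dfmt_new)
print(mat_new)

# Assign each document the topic with the most matches
topic_new <- colnames(mat_new)[max.col(mat_new, ties.method = "first")]
names(topic_new) <- rownames(mat_new)
print(topic_new)
```

Picking the topic with the most matches is only one simple way to use the counts; depending on the analysis, the raw or length-normalized frequencies may be more informative.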
Owner
- Name: Kohei Watanabe
- Login: koheiw
- Kind: user
- Location: Japan
- Website: http://koheiw.net
- Twitter: koheiw7
- Repositories: 34
- Profile: https://github.com/koheiw
Data analyst specializes in political and financial texts
GitHub Events
Total
- Issues event: 1
- Watch event: 2
- Delete event: 3
- Issue comment event: 1
- Push event: 21
- Pull request event: 11
- Fork event: 1
- Create event: 5
Last Year
- Issues event: 1
- Watch event: 2
- Delete event: 3
- Issue comment event: 1
- Push event: 21
- Pull request event: 11
- Fork event: 1
- Create event: 5
Issues and Pull Requests
All Time
- Total issues: 1
- Total pull requests: 12
- Average time to close issues: N/A
- Average time to close pull requests: about 20 hours
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.17
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Issue authors: 1
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.1
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- koheiw (1)
Pull Request Authors
- koheiw (14)
- kbenoit (2)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - cran: 297 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 6
- Total maintainers: 1
cran.r-project.org: wordmap
Feature Extraction and Document Classification with Noisy Labels
- Homepage: https://github.com/koheiw/wordmap
- Documentation: http://cran.r-project.org/web/packages/wordmap/wordmap.pdf
- License: MIT + file LICENSE
- Latest release: 0.9.5 (published 12 months ago)
Rankings
Dependent packages count: 28.7%
Dependent repos count: 35.4%
Average: 50.1%
Downloads: 86.2%
Maintainers (1)
Dependencies
.github/workflows/check-standard.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v3 composite
- actions/upload-artifact v3 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 3.5 depends
- methods * depends
- Matrix * imports
- quanteda >= 2.1 imports
- quanteda.textstats * imports
- stringi * imports
- utils * imports
- newsmap * suggests
- testthat * suggests