LSX

Semi-supervised algorithm for document scaling

https://github.com/koheiw/lsx

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
    Links to: scholar.google, wiley.com
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.1%) to scientific vocabulary

Keywords

lsa quanteda sentiment-analysis text-analysis

Keywords from Contributors

corpus text-analytics
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 56
  • Watchers: 5
  • Forks: 5
  • Open Issues: 8
  • Releases: 0
Created almost 10 years ago · Last pushed 8 months ago
Metadata Files
Readme Changelog

README.Rmd

---
output: 
  rmarkdown::github_document
---

```{r, echo=FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "##",
  fig.path = "images/",
  dpi = 150,
  fig.height = 5,
  fig.width = 10
)
```

# LSS: Semi-supervised algorithm for document scaling


[![CRAN
Version](https://www.r-pkg.org/badges/version/LSX)](https://CRAN.R-project.org/package=LSX)
[![Downloads](https://cranlogs.r-pkg.org/badges/LSX)](https://CRAN.R-project.org/package=LSX)
[![Total
Downloads](https://cranlogs.r-pkg.org/badges/grand-total/LSX?color=orange)](https://CRAN.R-project.org/package=LSX)
[![R build
status](https://github.com/koheiw/LSX/workflows/R-CMD-check/badge.svg)](https://github.com/koheiw/LSX/actions)
[![codecov](https://codecov.io/gh/koheiw/LSX/branch/master/graph/badge.svg)](https://app.codecov.io/gh/koheiw/LSX)


In quantitative text analysis, the cost of training supervised machine learning models tends to be very high when the corpus is large. Latent Semantic Scaling (LSS) is a semi-supervised document scaling technique that I developed to perform large-scale analysis at low cost. Taking user-provided *seed words* as weak supervision, it estimates the polarity of words in the corpus by latent semantic analysis and locates documents on a unidimensional scale (e.g. sentiment).
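
The idea can be illustrated with a toy sketch in base R. This is not the package's implementation: the six-document corpus, the two seed words, and `k = 2` are made-up assumptions chosen so the example runs without any dependencies, and `cosine()` is a hypothetical helper. The sketch builds a document-term matrix, obtains word vectors from a truncated SVD (the latent semantic analysis step), scores each word by its seed-weighted cosine similarity to the seed words, and scores each document by the frequency-weighted mean polarity of its words.

```r
# Toy corpus (hypothetical data, for illustration only)
docs <- c(
  "good growth strong economy",
  "strong growth good jobs",
  "bad crisis weak economy",
  "weak jobs bad crisis",
  "good strong jobs",
  "bad weak jobs"
)

# Document-term count matrix (rows = documents, columns = words)
toks <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(toks)))
dtm <- t(vapply(
  toks,
  function(x) as.numeric(table(factor(x, levels = vocab))),
  numeric(length(vocab))
))
colnames(dtm) <- vocab

# Latent semantic analysis: a truncated SVD gives every word a k-dimensional vector
k <- 2
word_vec <- svd(dtm, nu = 0, nv = k)$v
rownames(word_vec) <- vocab

# Hypothetical helper: cosine similarity between two vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Seed words as weak supervision: +1 for positive, -1 for negative
seeds <- c(good = 1, bad = -1)

# Word polarity: seed-weighted mean of the cosine similarity to each seed word
polarity <- vapply(vocab, function(w) {
  mean(vapply(
    names(seeds),
    function(s) seeds[[s]] * cosine(word_vec[w, ], word_vec[s, ]),
    numeric(1)
  ))
}, numeric(1))

# Document score: mean polarity of the words in a document, weighted by frequency
doc_score <- as.numeric(dtm %*% polarity) / rowSums(dtm)

print(round(polarity, 2))
print(round(doc_score, 2))
```

In this toy corpus, words that co-occur with "good" (such as "strong" and "growth") receive positive polarity, words that co-occur with "bad" receive negative polarity, and documents are ordered accordingly. A real analysis needs a much larger corpus and a larger `k`.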

## Installation

From CRAN:

```{r, eval=FALSE}
install.packages("LSX")
```

From GitHub:

```{r, eval=FALSE}
devtools::install_github("koheiw/LSX")
```

## Examples

Please visit the package website to understand the usage of the functions:

- [Introduction to LSX](https://koheiw.github.io/LSX/articles/pkgdown/basic.html)
- [Application in research](https://koheiw.github.io/LSX/articles/pkgdown/research.html)
- [Selection of seed words](https://koheiw.github.io/LSX/articles/pkgdown/seedwords.html)

For the algorithm and methodology, and for an application of LSS to non-English texts (Japanese and Hebrew), please read the following papers:

- Watanabe, Kohei. 2020. ["Latent Semantic Scaling: A Semisupervised Text Analysis Technique for New Domains and Languages"](https://www.tandfonline.com/doi/full/10.1080/19312458.2020.1832976), *Communication Methods and Measures*.
- Watanabe, Kohei, Elad Segev, and Atsushi Tago. 2022. ["Discursive diversion: Manipulation of nuclear threats by the conservative leaders in Japan and Israel"](https://journals.sagepub.com/doi/full/10.1177/17480485221097967), *International Communication Gazette*.

## Other publications

LSS has been used for research in various fields of social science.

- Nakamura, Kentaro. 2022. ["Balancing Opportunities and Incentives: How Rising China’s Mediated Public Diplomacy Changes Under Crisis"](https://ijoc.org/index.php/ijoc/article/view/18676/3968), *International Journal of Communication*.
- Zollinger, Delia. 2022. ["Cleavage Identities in Voters’ Own Words: Harnessing Open-Ended Survey Responses"](https://onlinelibrary.wiley.com/doi/10.1111/ajps.12743), *American Journal of Political Science*.
- Brändle, Verena K., and Olga Eisele. 2022. ["A Thin Line: Governmental Border Communication in Times of European Crises"](https://onlinelibrary.wiley.com/doi/full/10.1111/jcms.13398), *Journal of Common Market Studies*.
- Umansky, Natalia. 2022. ["Who gets a say in this? Speaking security on social media"](https://journals.sagepub.com/doi/10.1177/14614448221111009), *New Media & Society*.
- Rauh, Christian. 2022. ["Supranational emergency politics? What executives’ public crisis communication may tell us"](https://www.tandfonline.com/doi/full/10.1080/13501763.2021.1916058), *Journal of European Public Policy*.
- Trubowitz, Peter, and Kohei Watanabe. 2021. ["The Geopolitical Threat Index: A Text-Based Computational Approach to Identifying Foreign Threats"](https://academic.oup.com/isq/advance-article/doi/10.1093/isq/sqab029/6278490), *International Studies Quarterly*.
- Vydra, Simon, and Jaroslaw Kantorowicz. 2020. ["Tracing Policy-relevant Information in Social Media: The Case of Twitter before and during the COVID-19 Crisis"](https://www.degruyter.com/document/doi/10.1515/spp-2020-0013/html), *Statistics, Politics and Policy*.
- Watanabe, Kohei. 2017. ["Measuring News Bias: Russia's Official News Agency ITAR-TASS’s Coverage of the Ukraine Crisis"](http://journals.sagepub.com/eprint/TBc9miIc89njZvY3gyAt/full), *European Journal of Communication*.

More publications are available on [Google Scholar](https://scholar.google.com/scholar?oi=bibs&hl=en&cites=5312969973901591795).

Owner

  • Name: Kohei Watanabe
  • Login: koheiw
  • Kind: user
  • Location: Japan

Data analyst specializing in political and financial texts

GitHub Events

Total
  • Issues event: 1
  • Watch event: 1
  • Delete event: 4
  • Issue comment event: 9
  • Push event: 43
  • Pull request event: 23
  • Create event: 15
Last Year
  • Issues event: 1
  • Watch event: 1
  • Delete event: 4
  • Issue comment event: 9
  • Push event: 43
  • Pull request event: 23
  • Create event: 15

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 650
  • Total Committers: 3
  • Avg Commits per committer: 216.667
  • Development Distribution Score (DDS): 0.014
Past Year
  • Commits: 112
  • Committers: 2
  • Avg Commits per committer: 56.0
  • Development Distribution Score (DDS): 0.009
Top Committers
Name Email Commits
Kohei Watanabe w****i@g****m 641
kbenoit k****t@l****k 8
Kenneth Benoit k****t@K****l 1

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 16
  • Total pull requests: 103
  • Average time to close issues: 3 months
  • Average time to close pull requests: 7 days
  • Total issue authors: 4
  • Total pull request authors: 3
  • Average comments per issue: 0.94
  • Average comments per pull request: 0.66
  • Merged pull requests: 96
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 17
  • Average time to close issues: 3 days
  • Average time to close pull requests: 12 days
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.82
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • koheiw (13)
  • teunbrand (1)
  • mrwunderbar666 (1)
  • jwijffels (1)
Pull Request Authors
  • koheiw (119)
  • kbenoit (4)
  • olivroy (2)
Top Labels
Issue Labels
bug (1) enhancement (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 521 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 24
  • Total maintainers: 1
cran.r-project.org: LSX

Semi-Supervised Algorithm for Document Scaling

  • Versions: 24
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 521 Last month
Rankings
Stargazers count: 6.8%
Forks count: 11.3%
Downloads: 19.7%
Average: 20.6%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • methods * depends
  • Matrix * imports
  • RSpectra * imports
  • digest * imports
  • ggplot2 * imports
  • ggrepel * imports
  • irlba * imports
  • locfit * imports
  • proxyC * imports
  • quanteda >= 2.0 imports
  • quanteda.textstats * imports
  • reshape2 * imports
  • rsparse * imports
  • rsvd * imports
  • stats * imports
  • stringi * imports
  • testthat * suggests
.github/workflows/check-standard.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite