corporaexplorer

corporaexplorer: An R package for dynamic exploration of text collections - Published in JOSS (2019)

https://github.com/kgjerde/corporaexplorer

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

corpora corpus r shiny text-analysis
Last synced: 6 months ago

Repository

An R package for dynamic exploration of text collections

Basic Info
Statistics
  • Stars: 65
  • Watchers: 7
  • Forks: 4
  • Open Issues: 0
  • Releases: 15
Created almost 7 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License

README.Rmd

---
output:
    github_document
---



```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# corporaexplorer: An R package for dynamic exploration of text collections


[![CRAN status](https://www.r-pkg.org/badges/version/corporaexplorer)](https://cran.r-project.org/package=corporaexplorer)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![R build
status](https://github.com/kgjerde/corporaexplorer/actions/workflows/check-standard.yaml/badge.svg)](https://github.com/kgjerde/corporaexplorer/actions)
[![DOI](http://joss.theoj.org/papers/10.21105/joss.01342/status.svg)](https://doi.org/10.21105/joss.01342)
[![Mentioned in Awesome R](https://awesome.re/mentioned-badge.svg)](https://github.com/qinwf/awesome-R)



> **"I really like the application and its simplicity. It looks great and is very functional. ... a nice addition to text analysis tools."**  
> *--[Kenneth Benoit](https://github.com/kbenoit), creator of [quanteda](https://quanteda.io/), professor of computational social science at [LSE](https://www.lse.ac.uk/Methodology/People/Academic-Staff/Kenneth-Benoit/Kenneth-Benoit)*

> **"I really enjoyed interacting with corporaexplorer.**
> **This is exciting work that opens up doors for non-technical users."**   
> *--[Tyler Rinker](https://github.com/trinker), creator of [sentimentr](https://github.com/trinker/sentimentr) and [qdap](https://github.com/trinker/qdap)*


– Featured in RStudio’s “R Views” blog’s “Top 40 New R Packages”

– Included in CRAN Task View: Natural Language Processing



```{r, out.width = "100%", echo = FALSE}
knitr::include_graphics("https://github.com/kgjerde/corporaexplorer/raw/master/man/figures/readme_illustration.png")
```
*^Illustration^ ^screenshots^*

## What is corporaexplorer?

**corporaexplorer** is an R package that uses the `Shiny` graphical user interface framework for dynamic exploration of text collections.

**corporaexplorer** is designed for use with a wide range of text collections; one example could be a collection of tens of thousands of documents scraped from a governmental website; another example could be the collected works of a novelist; a third example could be the chapters of a single book.

**corporaexplorer**'s intended primary audience is qualitatively oriented researchers who rely on close reading of textual documents as part of their academic activity, but the package should also be a useful supplement for those doing quantitative textual research and wishing to visit the texts under study. Finally, by offering a convenient way to explore any character vector, it can also be useful for a wide range of other R users.

While collecting and preparing the text collections to be explored requires some familiarity with R programming, using the Shiny apps for exploring and extracting documents from the corpus should be fairly intuitive even for those with no programming knowledge, once the apps have been set up by a collaborator. Thus, the aim is for the package to be useful for anyone with a rudimentary knowledge of R -- or with collaborators who have such knowledge.

## Installation

To install the released version from CRAN, simply run the following from an R console:

``` r
install.packages("corporaexplorer")
```

Alternatively, to install the development version from GitHub, run the following from an R console:

``` r
install.packages("devtools")
devtools::install_github("kgjerde/corporaexplorer")
```

**corporaexplorer** works on Mac OS, Windows and Linux. (The Shiny apps look much clunkier on Windows than on the other platforms, but the apps are fully functional.)

## How to cite

Please cite the following paper if you use **corporaexplorer** in your research.

> Gjerde, Kristian Lundby. 2019. "corporaexplorer: An R package for dynamic exploration of text collections." _Journal of Open Source Software_ 4 (38): 1342. [https://doi.org/10.21105/joss.01342](https://doi.org/10.21105/joss.01342).

For a BibTeX entry, use the output from `citation(package = "corporaexplorer")`.

## Usage

For usage instructions and example corpora, see the [package web page](https://kgjerde.github.io/corporaexplorer/).

## Demo apps

The package includes two demo apps.

To explore Jane Austen's novels (data accessed through the [**janeaustenr**](https://github.com/juliasilge/janeaustenr) package):

``` r
library(corporaexplorer)
run_janeausten_app()
```

To explore the US presidents' State of the Union addresses (data accessed through the [**sotu**](https://CRAN.R-project.org/package=sotu) package):

``` r
library(corporaexplorer)
run_sotu_app()
```

For more info, see https://kgjerde.github.io/corporaexplorer/articles/jane_austen.html and https://kgjerde.github.io/corporaexplorer/articles/sotu.html, and also the [function references](https://kgjerde.github.io/corporaexplorer/reference/index.html).

## A note on platforms and encoding

**corporaexplorer** works on Mac OS, Windows and Linux, and there are some important differences in how R handles text on the different platforms.

If you are working with plain English text, there will most likely be no issues with encoding on any platform. Unfortunately, working with non-[ASCII](https://en.wikipedia.org/wiki/ASCII) encoded text in R (e.g. non-English characters) *can* be complicated -- in particular on Windows.

**On Mac OS or Linux**, problems with encoding will likely not arise at all. If problems do arise, they can typically be solved by making the R "locale" Unicode-friendly (e.g. `Sys.setlocale("LC_ALL", "en_US.UTF-8")`). NB! This assumes that the text is UTF-8 encoded, so if changing the locale in this way does not help, make sure that the text is encoded as UTF-8 characters. Alternatively, if you can ascertain the character encoding, set the locale correspondingly.

**On Windows**, things can be much more complicated. The most important thing is to check carefully that the texts appear as expected in `corporaexplorer`'s apps, and that the searches function as expected. If there are problems, a good place to start is a blog post with the telling title ["Escaping from character encoding hell in R on Windows"](https://www.r-bloggers.com/2016/06/escaping-from-character-encoding-hell-in-r-on-windows/).

For (a lot) more information about encoding, see [this informative article](https://kunststube.net/encoding/) by David C. Zentgraf.

## Contributing

Contributions in the form of feedback, bug reports and code are most welcome. Ways to contribute:

* Contact [me](mailto:klg@nupi.no) by email.
* Issues and bug reports: [File a GitHub issue](https://github.com/kgjerde/corporaexplorer/issues).
* Fork the source code, modify, and issue a [pull request](https://docs.github.com/articles/creating-a-pull-request-from-a-fork/) through the [project GitHub page](https://github.com/kgjerde/corporaexplorer).
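
As a supplement to the note on platforms and encoding above, here is a minimal sketch of one way to check whether texts are valid UTF-8 and to convert those that are not, using only base R (`validUTF8()` and `iconv()`). It is not part of **corporaexplorer**. The example strings are hypothetical, and the sketch assumes that any invalid entry is Latin-1 encoded; if the actual source encoding differs, the `from` argument would need to change accordingly.

```r
# Hypothetical example texts: the second one contains a Latin-1 byte (\xE7)
# that is not valid UTF-8.
texts <- c("plain English text", "fa\xE7ile")

# Flag the entries that are already valid UTF-8.
is_valid <- validUTF8(texts)

# Convert only the invalid entries, ASSUMING they are Latin-1 encoded.
fixed <- texts
fixed[!is_valid] <- iconv(texts[!is_valid], from = "latin1", to = "UTF-8")

# Every entry should now be valid UTF-8.
stopifnot(all(validUTF8(fixed)))
```

Running a check like this before preparing a corpus can help narrow down whether display or search problems stem from the text itself or from the R locale.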

Owner

  • Name: Kristian Lundby Gjerde
  • Login: kgjerde
  • Kind: user
  • Company: Norwegian Institute of International Affairs (NUPI)

JOSS Publication

corporaexplorer: An R package for dynamic exploration of text collections
Published
June 13, 2019
Volume 4, Issue 38, Page 1342
Authors
Kristian Lundby Gjerde ORCID
Research Fellow, Norwegian Institute of International Affairs (NUPI)
Editor
Leonardo Uieda ORCID
Tags
qualitative research mixed methods text analysis corpora Shiny

GitHub Events

Total
  • Watch event: 5
Last Year
  • Watch event: 5

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 575
  • Total Committers: 1
  • Avg Commits per committer: 575.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 32
  • Committers: 1
  • Avg Commits per committer: 32.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
kgjerde k****e@g****m 575

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 29
  • Total pull requests: 6
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 13 hours
  • Total issue authors: 9
  • Total pull request authors: 2
  • Average comments per issue: 1.41
  • Average comments per pull request: 0.17
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 3
  • Average time to close issues: 3 days
  • Average time to close pull requests: 1 day
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.33
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • trinker (10)
  • kgjerde (7)
  • mvsell (4)
  • discoleo (2)
  • rickyking (2)
  • randomgambit (1)
  • wrathofquan (1)
  • Teolone88 (1)
  • MichaelChirico (1)
Pull Request Authors
  • kgjerde (7)
  • MichaelChirico (2)
Top Labels
Issue Labels
enhancement (4)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 360 last-month
  • Total docker downloads: 41,971
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 12
  • Total maintainers: 1
cran.r-project.org: corporaexplorer

A 'Shiny' App for Exploration of Text Collections

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 360 Last month
  • Docker Downloads: 41,971
Rankings
Stargazers count: 5.9%
Forks count: 12.8%
Average: 21.9%
Downloads: 25.3%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.0.0 depends
  • RColorBrewer * imports
  • data.table * imports
  • dplyr * imports
  • ggplot2 * imports
  • lubridate * imports
  • magrittr * imports
  • padr * imports
  • plyr * imports
  • re2 * imports
  • rlang * imports
  • rmarkdown * imports
  • scales * imports
  • shiny * imports
  • shinyWidgets * imports
  • shinydashboard * imports
  • shinyjs * imports
  • stringi * imports
  • stringr * imports
  • tibble * imports
  • tidyr * imports
  • janeaustenr * suggests
  • shinytest * suggests
  • sotu * suggests
  • testthat * suggests
.github/workflows/check-standard.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
  • rstudio/shiny-workflows/setup-phantomjs v1 composite
.github/workflows/pkgdown.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite