cwbtools

Tools to create and manage CWB-indexed corpora

https://github.com/polmine/cwbtools

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
    Organization polmine has institutional domain (polmine.sowi.uni-due.de)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.8%) to scientific vocabulary
Last synced: 7 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: PolMine
  • Language: R
  • Default Branch: master
  • Size: 1.28 MB
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 2
  • Open Issues: 6
  • Releases: 10
Created almost 8 years ago · Last pushed almost 2 years ago
Metadata Files
Readme

README.md

License: GPL v3

Tools to Create, Modify and Manage Corpora for the Corpus Workbench (CWB)

The Corpus Workbench (CWB) is a classic indexing and query engine for working efficiently with large, linguistically annotated corpora. The cwbtools package offers a set of tools to conveniently create, modify and manage CWB-indexed corpora from within R. It complements R packages that use the CWB as a backend for text mining, namely RcppCWB for low-level access to CWB-indexed corpora and polmineR, a toolset implementing common text mining workflows.

Installation

The package is available via CRAN and can be installed as follows on Windows, macOS and Linux.

```{r}
install.packages("cwbtools")
```

To install the development version of the package, use the installation mechanism offered by the remotes package. On Windows, an installation of Rtools may be necessary.

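```{r}
# Make sure the remotes package is present
if (!"remotes" %in% installed.packages()[, "Package"]) install.packages("remotes")

# Let remotes rely on a minimal set of additional packages (see note below)
Sys.setenv(R_REMOTES_STANDALONE = "true")

# Install the development version from the 'dev' branch
remotes::install_github("PolMine/cwbtools", ref = "dev", force = TRUE)
```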

Explanatory note:

The default approach to installing the development version of cwbtools from GitHub would be devtools::install_github("PolMine/cwbtools", ref = "dev"). However, because both devtools and cwbtools depend on the curl package, this can cause nerve-racking problems whenever curl can be updated: if a newer version of curl is available, the user is prompted whether to install the update. Most users will agree. However, the update will fail because curl has already been loaded by devtools, and parts of the curl package (the dynamic library that is loaded) cannot be deleted or replaced.

To avoid having to perform manual updates in the correct order, we recommend using install_github() directly from the remotes package, where the function originates. When the environment variable R_REMOTES_STANDALONE is set to "true", the remotes package relies on a minimal set of additional packages. This avoids the situation described above, which would otherwise make installing cwbtools difficult for many users.
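The note above implies a simple precaution: updating curl can only succeed if curl is not yet loaded in the running R session. The following minimal sketch checks for this before installing, using only base R. The helper name curl_is_loaded() is hypothetical and not part of cwbtools, remotes or devtools; the installation call itself is left commented out because it requires network access.

```{r}
# Hypothetical helper (not part of cwbtools, remotes or devtools): report
# whether the curl namespace is already loaded in this R session. If it is,
# an update of curl during installation may fail because its loaded dynamic
# library cannot be replaced.
curl_is_loaded <- function() {
  "curl" %in% loadedNamespaces()
}

if (curl_is_loaded()) {
  message("curl is already loaded; consider restarting R before installing cwbtools.")
}

# Ask remotes to rely on a minimal set of additional packages
Sys.setenv(R_REMOTES_STANDALONE = "true")

# Installation of the development version (requires network access):
# remotes::install_github("PolMine/cwbtools", ref = "dev", force = TRUE)
```

Running this in a fresh session, before devtools or any other curl-dependent package has been attached, is the safest order.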

Acknowledgements

The CWB is a classical indexing and query engine. Its character as an open source project is of great value for the community working with corpora. The enduring effort of the developers of the CWB is gratefully acknowledged!

Owner

  • Name: PolMine
  • Login: PolMine
  • Kind: organization
  • Location: Germany

GitHub Events

Total
  • Watch event: 1
  • Issue comment event: 2
Last Year
  • Watch event: 1
  • Issue comment event: 2

Committers

Last synced: about 3 years ago

All Time
  • Total Commits: 260
  • Total Committers: 3
  • Avg Commits per committer: 86.667
  • Development Distribution Score (DDS): 0.104
Top Committers
  • Andreas Blaette (a****e@u****e): 233 commits
  • Andreas Blätte (a****e@M****x): 26 commits
  • Andreas Blaette (a****e@A****x): 1 commit

Issues and Pull Requests

Last synced: almost 2 years ago

All Time
  • Total issues: 70
  • Total pull requests: 2
  • Average time to close issues: about 1 year
  • Average time to close pull requests: 4 months
  • Total issue authors: 7
  • Total pull request authors: 2
  • Average comments per issue: 1.01
  • Average comments per pull request: 0.5
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 22
  • Pull requests: 2
  • Average time to close issues: 9 days
  • Average time to close pull requests: 4 months
  • Issue authors: 3
  • Pull request authors: 2
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.5
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ablaette (49)
  • ChristophLeonhardt (15)
  • PolMine (5)
  • bandoose (1)
  • thedamsch (1)
  • Thore91 (1)
  • svjack (1)
  • dschuele (1)
Pull Request Authors
  • eblondel (2)
  • olivroy (1)
Top Labels
Issue Labels
  • enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • CRAN: 626 last month
  • Total dependent packages: 0
  • Total dependent repositories: 2
  • Total versions: 16
  • Total maintainers: 1
cran.r-project.org: cwbtools

Tools to Create, Modify and Manage 'CWB' Corpora

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 626 Last month
Rankings
  • Forks count: 17.0%
  • Dependent repos count: 19.2%
  • Stargazers count: 25.5%
  • Dependent packages count: 28.7%
  • Average: 35.1%
  • Downloads: 85.2%
Maintainers (1)
Last synced: about 1 year ago

Dependencies

DESCRIPTION cran
  • R6 * imports
  • RcppCWB >= 0.5.2 imports
  • cli * imports
  • curl * imports
  • data.table * imports
  • fs * imports
  • httr * imports
  • jsonlite * imports
  • lifecycle * imports
  • methods * imports
  • pbapply * imports
  • rstudioapi * imports
  • stringi * imports
  • tools * imports
  • xml2 * imports
  • zen4R * imports
  • NLP * suggests
  • SnowballC * suggests
  • aws.s3 * suggests
  • janeaustenr * suggests
  • knitr * suggests
  • markdown * suggests
  • openNLP * suggests
  • rmarkdown * suggests
  • testthat * suggests
  • tidytext * suggests
  • tm >= 0.7.3 suggests
  • tokenizers >= 0.2.1 suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/upload-artifact v2 composite
  • actions/upload-artifact main composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
.github/workflows/pkgdown.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite