stringi

Fast and portable character string processing in R (with the Unicode ICU)

https://github.com/gagolews/stringi

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 16 committers (6.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.9%) to scientific vocabulary

Keywords

icu icu4c natural-language-processing nlp r regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode

Keywords from Contributors

tidyverse
Last synced: 6 months ago

Repository

Statistics
  • Stars: 310
  • Watchers: 20
  • Forks: 49
  • Open Issues: 46
  • Releases: 39
Created about 13 years ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License Code of conduct Citation

README.md

stringi

Fast and Portable Character String Processing in R (with the Unicode ICU)

A comprehensive tutorial and reference manual is available at https://stringi.gagolewski.com/.

Check out stringx for a set of wrappers around stringi with a base R-compatible API.

To learn more about R, check out Marek's open-access (free!) textbook Deep R Programming.

stringi (pronounced "stringy", IPA [strinɡi]) is THE R package for string/text/natural language processing. It is very fast, consistent, convenient, and, thanks to ICU (International Components for Unicode), portable across all locales and platforms.

Available features include:

  • string concatenation, padding, wrapping,
  • substring extraction,
  • pattern searching (e.g., with Java-like regular expressions),
  • collation and sorting,
  • random string generation,
  • case mapping and folding,
  • string transliteration,
  • Unicode normalisation,
  • date-time formatting and parsing,

and many more.
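
As a rough illustration of a few of the features listed above, the following sketch calls some stringi functions on toy inputs. The sample strings and variable names are made up for this example; it is not an excerpt from the package documentation.

```r
library(stringi)

# Concatenation and padding
joined <- stri_c("string", "i")                      # "stringi"
padded <- stri_pad_left("7", width = 3, pad = "0")   # "007"

# Substring extraction (indices are 1-based and inclusive)
prefix <- stri_sub("stringi", from = 1, to = 6)      # "string"

# Pattern searching with ICU (Java-like) regular expressions
is_number <- stri_detect_regex(c("abc", "123"), "^[0-9]+$")
digits <- stri_extract_all_regex("a1b22c333", "[0-9]+")[[1]]

# Case mapping
upper <- stri_trans_toupper("stringi")               # "STRINGI"

# Locale-aware sorting (the locale identifier is an illustrative choice)
sorted <- stri_sort(c("b", "a", "c"), locale = "en_US")

# Transliteration to ASCII
ascii <- stri_trans_general("gro\u00df", "Latin-ASCII")

# Unicode normalisation: "e" followed by a combining acute accent
# (2 code points) is composed into a single code point under NFC
len_before <- stri_length("e\u0301")
len_after <- stri_length(stri_trans_nfc("e\u0301"))
```

All functions share the `stri_` prefix and are vectorised over their arguments, so the same calls work on whole character vectors.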

Package Maintainer: Marek Gagolewski

Authors and Contributors: Marek Gagolewski, with contributions from Bartłomiej Tartanus and many others.

The package's API was inspired by that of the early (pre-tidyverse; v0.6.2) version of Hadley Wickham's stringr package. Since v1.0.0 (released in 2015), stringr has itself been powered by stringi.

Homepage: https://stringi.gagolewski.com/

Citation: Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1–59, https://dx.doi.org/10.18637/jss.v103.i02.

CRAN Entry: https://CRAN.R-project.org/package=stringi

System Requirements: R >= 3.4, ICU4C >= 61 (refer to the INSTALL file for more details)

License: stringi's source code is distributed under the open source BSD-3-clause license. For more details, see LICENSE.

This git repository also contains a custom subset of the ICU4C source code, which is copyrighted by Unicode, Inc. and others. A binary version of the Unicode Character Database is included. For more details on copyright holders, see LICENSE. The ICU project is covered by the Unicode license: a simple, permissive, non-copyleft free software license compatible with the GNU GPL. The ICU license is intended to allow ICU to be included in free software projects as well as in proprietary or commercial products.

Changes: see the NEWS file.

How to access the stringi C++ API from within an Rcpp-based R package

Owner

  • Name: Marek Gagolewski
  • Login: gagolews
  • Kind: user
  • Location: Melbourne, VIC, Australia
  • Company: Deakin University

Free universities!

GitHub Events

Total
  • Create event: 1
  • Release event: 1
  • Issues event: 7
  • Watch event: 11
  • Issue comment event: 21
  • Push event: 8
  • Pull request event: 5
  • Fork event: 5
Last Year
  • Create event: 1
  • Release event: 1
  • Issues event: 7
  • Watch event: 11
  • Issue comment event: 21
  • Push event: 8
  • Pull request event: 5
  • Fork event: 5

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 1,637
  • Total Committers: 16
  • Avg Commits per committer: 102.313
  • Development Distribution Score (DDS): 0.151
Past Year
  • Commits: 9
  • Committers: 1
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
gagolews e****y@g****m 1,390
bartektartanus b****s@g****m 197
DavisVaughan d****s@r****m 11
ironholds i****s@g****m 7
Katrin Leinweber k****r@u****e 6
Marcin Bujarski m****n@J****) 6
Marcin Bujarski m****i@g****m 5
Jeroen Ooms j****s@g****m 3
Hiroaki Yutani y****i@g****m 2
Mikko Korpela m****l@i****i 2
liuxiang88 l****g@l****n 2
Bartolini b****i@b****) 2
Avraham Adler a****r 1
Brett Bialer b****t@b****o 1
Lukasz Daniel l****l@g****m 1
Salim B s****m@p****e 1
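
The page does not say how the Development Distribution Score is computed. Assuming the common definition (one minus the top committer's share of all commits), the short sketch below reproduces the reported figures from the commit counts in the table above; the function and variable names are illustrative only.

```r
# Commit counts copied from the "Top Committers" table above.
commits <- c(1390, 197, 11, 7, 6, 6, 5, 3, 2, 2, 2, 2, 1, 1, 1, 1)

# Assumed definition (not stated in the source):
# DDS = 1 - (commits by the top committer / total commits).
dds <- function(n) 1 - max(n) / sum(n)

total_commits <- sum(commits)                          # 1637
avg_per_committer <- total_commits / length(commits)   # 102.3125
dds_all_time <- dds(commits)                           # about 0.151
dds_past_year <- dds(9)   # 0: one committer made all 9 commits
```

Under this assumption, a score near 0 means that a single person authored almost all commits, which matches the all-time and past-year values listed above.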

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 114
  • Total pull requests: 11
  • Average time to close issues: 8 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 70
  • Total pull request authors: 6
  • Average comments per issue: 3.68
  • Average comments per pull request: 1.27
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 4
  • Average time to close issues: 3 days
  • Average time to close pull requests: 2 days
  • Issue authors: 4
  • Pull request authors: 2
  • Average comments per issue: 0.4
  • Average comments per pull request: 1.5
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gagolews (34)
  • hadley (5)
  • kwhkim (4)
  • discoleo (2)
  • asmlgkj (2)
  • thomasp85 (2)
  • DavisVaughan (1)
  • calpan (1)
  • fikovnik (1)
  • salim-b (1)
  • KyleHaynes (1)
  • sunnycxh0 (1)
  • kbenoit (1)
  • bhagwataditya (1)
  • dm8000 (1)
Pull Request Authors
  • jeroen (3)
  • andrjohns (2)
  • gagolews (2)
  • CGMossa (2)
  • MichaelChirico (2)
  • liuxiang88 (1)
Top Labels
Issue Labels
  • low priority (4)
  • new feature (3)
  • ICU-related (3)
  • question (3)
  • bug (3)
  • documentation (2)
  • wontfix (2)
  • need more info (1)
  • good first issue (1)
  • enhancement (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 36
proxy.golang.org: github.com/gagolews/stringi
  • Versions: 36
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 9.0%
Average: 9.6%
Dependent repos count: 10.2%
Last synced: 6 months ago

Dependencies

.github/workflows/r-check-other.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
.github/workflows/r-icu-bundle.yml actions
  • actions/checkout v2 composite
.github/workflows/r-icu-bundle55.yml actions
  • actions/checkout v2 composite
.github/workflows/r-icu-system.yml actions
  • actions/checkout v2 composite
DESCRIPTION cran
  • R >= 3.1 depends
  • stats * imports
  • tools * imports
  • utils * imports
devel/.old/donttouch_cran/DESCRIPTION cran
  • R >= 2.15.0 depends