Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.8%) to scientific vocabulary
Keywords
japanese-language
nlp
rpackage
Last synced: 6 months ago
Repository
R Interface to 'Sudachi'
Basic Info
- Host: GitHub
- Owner: uribo
- License: other
- Language: R
- Default Branch: main
- Homepage: https://uribo.github.io/sudachir/
- Size: 268 KB
Statistics
- Stars: 6
- Watchers: 2
- Forks: 1
- Open Issues: 2
- Releases: 0
Created over 5 years ago · Last pushed about 3 years ago
Metadata Files
Readme
Funding
License
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# sudachir
SudachiR is an R version of [Sudachi](https://github.com/WorksApplications/sudachi.rs), a Japanese morphological analyzer.
## Installation
You can install the released version of `{sudachir}` from CRAN with:
``` r
install.packages("sudachir")
```
You can also install the development version from GitHub:
``` r
if (!requireNamespace("remotes")) {
  install.packages("remotes")
}
remotes::install_github("uribo/sudachir")
```
## Usage
### Set up 'r-sudachipy' environment
`{sudachir}` works with [sudachipy](https://github.com/WorksApplications/sudachi.rs/tree/develop/python) (>= 0.6.\*) via the [reticulate](https://github.com/rstudio/reticulate/) package.
To get started, you need a Python environment in which sudachipy and its dictionaries are already installed.
This package provides the `install_sudachipy` function, which helps you prepare a Python virtual environment. It installs the required modules (`sudachipy`, `sudachidict_core`, `pandas`), but you can also install them manually.
```{r}
library(reticulate)
library(sudachir)
if (!virtualenv_exists("r-sudachipy")) {
  install_sudachipy()
}
use_virtualenv("r-sudachipy", required = TRUE)
```
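As a minimal sketch of the manual route mentioned above, the modules can be installed with reticulate's own virtual environment functions. The helper name `setup_sudachipy_manually` is hypothetical and is not part of `{sudachir}`; the environment name `r-sudachipy` is taken from the example above.

```r
# Hypothetical helper (not part of sudachir): prepares the Python
# virtual environment by hand instead of calling install_sudachipy().
setup_sudachipy_manually <- function(envname = "r-sudachipy") {
  # Create the virtual environment only if it does not exist yet
  if (!reticulate::virtualenv_exists(envname)) {
    reticulate::virtualenv_create(envname)
  }
  # Install the modules listed above into that environment
  reticulate::virtualenv_install(
    envname,
    packages = c("sudachipy", "sudachidict_core", "pandas")
  )
  # Bind the current R session to the environment
  reticulate::use_virtualenv(envname, required = TRUE)
  invisible(envname)
}
```

Call `setup_sudachipy_manually()` in a fresh R session before tokenizing, because reticulate binds an R session to a single Python installation.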
### Tokenize sentences
Use `tokenize_to_df` for tokenization.
```{r}
txt <- c(
  "国家公務員は鳴門海峡に行きたい",
  "吾輩は猫である。\n名前はまだない。"
)
tokenize_to_df(data.frame(doc_id = c(1, 2), text = txt))
```
You can control which dictionary features are parsed using the `col_select` argument.
```{r}
tokenize_to_df(txt, col_select = 1:3) |>
  dplyr::glimpse()
tokenize_to_df(
  txt,
  into = dict_features("en"),
  col_select = c("pos1", "pos2")
) |>
  dplyr::glimpse()
```
The `as_tokens` function tidies tokens and their first part-of-speech information into a list of named tokens. You can also use the `form` function as shorthand for `tokenize_to_df(txt) |> as_tokens()`.
```{r}
tokenize_to_df(txt) |> as_tokens(type = "surface")
form(txt, type = "surface")
form(txt, type = "normalized")
form(txt, type = "dictionary")
form(txt, type = "reading")
```
### Change split mode
```{r}
tokenize_to_df(txt, instance = rebuild_tokenizer("B")) |>
  as_tokens("surface", pos = FALSE)
tokenize_to_df(txt, instance = rebuild_tokenizer("A")) |>
  as_tokens("surface", pos = FALSE)
```
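Sudachi's split modes differ in granularity: A yields the shortest units, C the longest, and B sits in between. The sketch below compares how many tokens each mode produces for the same text. The helper `count_tokens_by_mode` is hypothetical, and it assumes that `tokenize_to_df` returns one row per token.

```r
# Hypothetical helper (not part of sudachir): counts the tokens
# produced under each split mode for the same input text.
count_tokens_by_mode <- function(text, modes = c("A", "B", "C")) {
  vapply(
    modes,
    function(mode) {
      # Build a tokenizer for this mode, then tokenize the text
      tokens <- sudachir::tokenize_to_df(
        text,
        instance = sudachir::rebuild_tokenizer(mode)
      )
      # Assumption: the result has one row per token
      nrow(tokens)
    },
    integer(1)
  )
}
```

For example, `count_tokens_by_mode("国家公務員は鳴門海峡に行きたい")` returns a named integer vector with one count per mode, which makes the effect of the mode easy to compare at a glance.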
### Change dictionary edition
You can change dictionary options using the `rebuild_tokenizer` function.
```{r}
if (py_module_available("sudachidict_full")) {
  tokenizer_full <- rebuild_tokenizer(mode = "C", dict_type = "full")
  tokenize_to_df(txt, instance = tokenizer_full) |>
    as_tokens("surface", pos = FALSE)
}
```
Owner
- Name: Shinya Uryu
- Login: uribo
- Kind: user
- Location: Tokushima, Japan
- Company: Tokushima University (徳島大学)
- Website: https://uribo.hatenablog.com
- Twitter: u_ribo
- Repositories: 210
- Profile: https://github.com/uribo
R / Data Engineer / Geo / Ecology / Visualization / Tokushima University (徳島大学)
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Shinya Uryu | s****7@g****m | 26 |
| paithiov909 | a****4@g****m | 25 |
| Shinya Uryu | u****a@t****p | 5 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 4
- Total pull requests: 4
- Average time to close issues: 3 months
- Average time to close pull requests: 5 days
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 2.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- uribo (3)
- barracuda156 (1)
Pull Request Authors
- paithiov909 (3)
- uribo (1)
Top Labels
Issue Labels
release 🚀 (1)
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 198 last month (CRAN)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 1
- Total maintainers: 1
cran.r-project.org: sudachir
R Interface to 'Sudachi'
- Homepage: https://github.com/uribo/sudachir
- Documentation: http://cran.r-project.org/web/packages/sudachir/sudachir.pdf
- License: Apache License (≥ 2.0)
- Latest release: 0.1.0 (published over 5 years ago)
Rankings
Forks count: 17.0%
Stargazers count: 20.6%
Dependent repos count: 23.8%
Dependent packages count: 28.7%
Average: 34.3%
Downloads: 81.6%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- cli >= 2.1.0 imports
- dplyr >= 1.0.2 imports
- glue >= 1.4.2 imports
- magrittr >= 1.5 imports
- purrr >= 0.3.4 imports
- reticulate >= 1.17 imports
- rlang >= 0.4.8 imports
- tibble >= 3.0.4 imports
- tidyselect >= 1.1.0 imports
- rstudioapi * suggests
- testthat * suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v3 composite
- actions/setup-python v4 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml
actions
- actions/cache v2 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc master composite
- r-lib/actions/setup-r master composite