moranajp

Tool of morphological analysis for Japanese

https://github.com/matutosi/moranajp

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.2%) to scientific vocabulary

Keywords

r r-package
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# moranajp

moranajp is a tool for morphological analysis of Japanese text.

## Installation

You can install the released version of moranajp from CRAN, or the latest development version from [GitHub](https://github.com/matutosi/moranajp).
To use the MeCab backend, you also need to install [MeCab](https://taku910.github.io/mecab/) separately.

``` r
# Released version from CRAN
install.packages("moranajp")

# Development version from GitHub
# install.packages("remotes")
remotes::install_github("matutosi/moranajp")
```

## Example

```{r}
library(moranajp)

data(neko)
neko <- 
  neko |>
  dplyr::mutate(text = stringi::stri_unescape_unicode(text)) |>
  tibble::rownames_to_column("cols")
neko  # First part of 'I Am a Cat' by Soseki Natsume

  # MeCab (requires a local MeCab installation)
bin_dir <- "d:/pf/mecab/bin" # folder where MeCab is installed; adjust to your environment
  # bin_dir <- "/opt/local/mecab/bin/"  # example for Mac or Linux
iconv <- "CP932_UTF-8"       # may be needed on Windows
  # If the output is garbled, set the iconv argument:
  # `iconv = "CP932_UTF-8"` or `iconv = "EUC_UTF-8"`
neko |>
  moranajp_all(text_col = "text", bin_dir = bin_dir, iconv = iconv) |>
  print(n=30)

  # chamame (no local installation needed; uses the Web chamame service)
neko |>
  head(3) |>
  moranajp_all(method = "chamame", text_col = "text") |>
  print(n=30)
```
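
The comments in the example above suggest that `iconv` is mainly needed on Windows. One way to keep a script portable is to build the argument list conditionally. The following is a minimal sketch, not part of moranajp: `mecab_args()` is a hypothetical helper, and it assumes that `iconv` can simply be left out on other systems.

```r
# Hypothetical helper (not part of moranajp): collect the arguments for
# moranajp_all() and add `iconv` only on Windows.
mecab_args <- function(bin_dir, os_type = .Platform$OS.type) {
  args <- list(text_col = "text", bin_dir = bin_dir)
  if (identical(os_type, "windows")) {
    # Assumption: CP932 matches the MeCab output encoding on this machine;
    # try "EUC_UTF-8" instead if the output is still garbled.
    args$iconv <- "CP932_UTF-8"
  }
  args
}

args <- mecab_args("d:/pf/mecab/bin")

# The call itself needs MeCab and moranajp, so it is left commented out:
# result <- do.call(moranajp::moranajp_all, c(list(neko), args))
```

Passing `os_type` explicitly (for example `mecab_args("/opt/local/mecab/bin/", os_type = "unix")`) makes it easy to check what the helper would do on another platform.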

```{r, fig.width=7, fig.height=7}
library(moranajp)

data(synonym)
synonym <- unescape_utf(synonym)

data(neko_mecab)
neko_mecab <- 
  neko_mecab  |>
  unescape_utf() |>
  add_sentence_no() |>
  clean_up(use_common_data = TRUE, synonym_df = synonym)

bigram_neko <- 
  neko_mecab |>
  draw_bigram_network()

add_stop_words <- 
  c("\\u3042\\u308b", "\\u3059\\u308b", "\\u3066\\u308b", 
    "\\u3044\\u308b","\\u306e", "\\u306a\\u308b", "\\u304a\\u308b", 
    "\\u3093", "\\u308c\\u308b", "*") |> 
   unescape_utf()

data(review_chamame)
bigram_review <- 
  review_chamame |>
  dplyr::slice(1:2000) |>
  unescape_utf() |>
  add_sentence_no() |>
  clean_up(add_stop_words = add_stop_words) |>
  draw_bigram_network()

data(review_ginza)
bigram_review_ginza <- 
  review_ginza |>
  unescape_utf() |>
  add_sentence_no() |>
  clean_up(add_depend = TRUE) |>
  draw_bigram_network(depend = TRUE)
```


## Note

Line breaks (`\r\n`, `\n`) in the text are removed so that they do not shift the text IDs out of alignment.
If the line breaks are meaningful, replace them with another string before running the analysis.
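
As a minimal sketch of that preprocessing step, the base R code below swaps line breaks for a placeholder before the text is analyzed. The sample texts and the `"<BR>"` placeholder are made up for illustration, and how a given analyzer tokenizes the placeholder is not covered here.

```r
# Sample texts containing both kinds of line break (made up for illustration).
texts <- c("first line\nsecond line", "first line\r\nsecond line", "no break")

# Hypothetical placeholder; pick a string that cannot occur in your own texts.
placeholder <- "<BR>"

# Try "\r\n" before "\n" so a Windows line break becomes a single placeholder.
protected <- gsub("\r\n|\n", placeholder, texts)
protected

# On plain strings the substitution can be reversed again.
restored <- gsub(placeholder, "\n", protected, fixed = TRUE)
```

In a data frame like `neko`, the same `gsub()` call could be applied to the text column (for example inside `dplyr::mutate()`) before calling `moranajp_all()`.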

## Citation

Toshikazu Matsumura (2021) Morphological analysis for Japanese with R. https://github.com/matutosi/moranajp/.

## Installing MeCab (Linux / Mac)

Download the following files (mecab-0.996.tar.gz and mecab-ipadic-2.7.0-20070801.tar.gz) from:

http://taku910.github.io/mecab/#download


```
  # build and install MeCab
  tar xvf mecab-0.996.tar.gz
  cd mecab-0.996
  ./configure --enable-utf8-only --prefix=/opt/local/mecab
  make
  sudo make install
  # go back to the directory that holds the downloaded archives
  cd ..
  # install the dictionary
  tar xvf mecab-ipadic-2.7.0-20070801.tar.gz
  cd mecab-ipadic-2.7.0-20070801
  ./configure --with-mecab-config=/opt/local/mecab/bin/mecab-config --with-charset=utf8 --prefix=/opt/local/mecab
  make
  sudo make install
  # add MeCab to the PATH
  echo 'export PATH=/opt/local/mecab/bin:$PATH' >> ~/.bash_profile
  source ~/.bash_profile
  # run mecab
  mecab
```

Reference (in Japanese):
https://qiita.com/nkjm/items/913584c00af199794257
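
After the installation, it can help to confirm from R that the `mecab` executable is visible and to derive `bin_dir` from its location. The following is a minimal sketch using base R only; `find_bin_dir()` is a hypothetical helper, not part of moranajp, and it assumes that the R session inherits the `PATH` set above.

```r
# Hypothetical helper (not part of moranajp): return the directory that
# contains an executable found on the PATH, or NA if it is not found.
find_bin_dir <- function(cmd = "mecab") {
  path <- unname(Sys.which(cmd))
  if (!nzchar(path)) {
    return(NA_character_)
  }
  dirname(path)
}

bin_dir <- find_bin_dir("mecab")
if (is.na(bin_dir)) {
  message("mecab was not found on the PATH; check the installation steps.")
} else {
  message("MeCab found in: ", bin_dir)
}
```

If the helper returns `NA` even though MeCab is installed, setting `bin_dir` by hand (for example `"/opt/local/mecab/bin/"`) remains an option.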

Owner

  • Name: Toshikazu Matsumura
  • Login: matutosi
  • Kind: user

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 288
  • Total Committers: 2
  • Avg Commits per committer: 144.0
  • Development Distribution Score (DDS): 0.003
Past Year
  • Commits: 109
  • Committers: 2
  • Avg Commits per committer: 54.5
  • Development Distribution Score (DDS): 0.009
Top Committers
Name Email Commits
matutosi m****i@g****m 287
松村 俊和 m****i@k****p 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • cran 347 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
cran.r-project.org: moranajp

Morphological Analysis for Japanese

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 347 Last month
Rankings
Downloads: 28.7%
Forks count: 28.8%
Dependent packages count: 29.8%
Average: 31.6%
Stargazers count: 35.2%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 7 months ago

Dependencies

.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/check-r-package v1 composite
  • r-lib/actions/setup-r v1 composite
  • r-lib/actions/setup-r-dependencies v1 composite
.github/workflows/r.yml actions
  • actions/checkout v2 composite
  • r-lib/actions/setup-r f57f1301a053485946083d7a45022b278929a78a composite
DESCRIPTION cran
  • R >= 3.5.0 depends
  • dplyr * imports
  • ggplot2 * imports
  • ggraph * imports
  • igraph * imports
  • magrittr * imports
  • purrr * imports
  • rlang * imports
  • stats * imports
  • stringr * imports
  • tibble * imports
  • tidyr * imports
  • knitr * suggests
  • rmarkdown * suggests
  • stringi * suggests
  • testthat >= 3.0.0 suggests
  • tidyverse * suggests