ginza
A Japanese NLP Library using spaCy as framework based on Universal Dependencies
Science Score: 41.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 15 committers (6.7%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.0%) to scientific vocabulary
Repository
A Japanese NLP Library using spaCy as framework based on Universal Dependencies
Basic Info
- Host: GitHub
- Owner: megagonlabs
- License: mit
- Language: Python
- Default Branch: develop
- Size: 1.02 MB
Statistics
- Stars: 806
- Watchers: 30
- Forks: 58
- Open Issues: 12
- Releases: 27
Metadata Files
README.md

GiNZA NLP Library
An Open Source Japanese NLP Library, based on Universal Dependencies
Please read the Important changes before you upgrade GiNZA.
License
GiNZA NLP Library and GiNZA Japanese Universal Dependencies Models are distributed under the MIT License. You must agree to and follow the MIT License to use GiNZA NLP Library and GiNZA Japanese Universal Dependencies Models.
Explosion / spaCy
spaCy is the key framework of GiNZA.
Works Applications Enterprise / Sudachi/SudachiPy - SudachiDict - chiVe
SudachiPy provides high accuracy for tokenization and POS tagging.
Sudachi LICENSE PAGE, SudachiPy LICENSE PAGE, SudachiDict LEGAL PAGE, chiVe LICENSE PAGE
Hugging Face / transformers
The GiNZA v5 Transformers model (ja_ginza_electra) is trained by using Hugging Face Transformers as a framework for pretrained models.
Training Datasets
UD Japanese BCCWJ r2.8
The parsing model of GiNZA v5 is trained on a part of UD Japanese BCCWJ r2.8 (Omura and Asahara:2018). This model is developed by the National Institute for Japanese Language and Linguistics and Megagon Labs.
GSK2014-A (2019) BCCWJ edition
The named entity recognition model of GiNZA v5 is trained on a part of GSK2014-A (2019) BCCWJ edition (Hashimoto, Inui, and Murakami:2008). We use two named entity label systems: Sekine's Extended Named Entity Hierarchy and an extended OntoNotes5. This model is developed by the National Institute for Japanese Language and Linguistics and Megagon Labs.
mC4
The GiNZA v5 Transformers model (ja_ginza_electra) is trained by using transformers-ud-japanese-electra-base-discriminator, which is pretrained on more than 200 million Japanese sentences extracted from mC4.
Contains information from mC4 which is made available under the ODC Attribution License.
@article{2019t5,
author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
journal = {arXiv e-prints},
year = {2019},
archivePrefix = {arXiv},
eprint = {1910.10683},
}
Runtime Environment
This project is developed with Python>=3.8 and its pip. We do not recommend using an Anaconda environment because the pip install step may not work properly.
Please also see the Development Environment section below.
Runtime set up
1. Install GiNZA NLP Library with Transformer-based Model
Uninstall previous versions of the ginza and ja_ginza_electra packages:
```console
$ pip uninstall ginza ja_ginza_electra
```
Then, install the latest version of ginza and ja_ginza_electra:
```console
$ pip install -U ginza ja_ginza_electra
```
The ja_ginza_electra package does not include pytorch_model.bin due to PyPI's archive size restrictions.
This large model file is downloaded automatically at the first run, and the locally cached file is used for subsequent runs.
If you need to install ja_ginza_electra along with pytorch_model.bin at install time, you can specify the direct link to the GitHub release archive as follows:
```console
$ pip install -U ginza https://github.com/megagonlabs/ginza/releases/download/latest/ja_ginza_electra-latest-with-model.tar.gz
```
If you want to accelerate the transformers-based models by using GPUs with CUDA support, you can install spaCy by specifying the CUDA version as follows:
```console
pip install -U "spacy[cuda117]"
```
You also need to install a version of PyTorch that is consistent with the CUDA version.
2. Install GiNZA NLP Library with Standard Model
Uninstall the previous version:
```console
$ pip uninstall ginza ja_ginza
```
Then, install the latest version of ginza and ja_ginza:
```console
$ pip install -U ginza ja_ginza
```
When using Apple Silicon such as M1 or M2, you can accelerate the analysis process by installing thinc-apple-ops:
```console
$ pip install torch thinc-apple-ops
```
Execute ginza command
Run the ginza command from the console, then input some Japanese text.
After you press the Enter key, the parsed results are printed in CoNLL-U Syntactic Annotation format.
```console
$ ginza
銀座でランチをご一緒しましょう。
# text = 銀座でランチをご一緒しましょう。
1 銀座 銀座 PROPN 名詞-固有名詞-地名-一般 _ 6 nmod _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEMHEAD|NPB|Reading=ギンザ|NE=B-GPE|ENE=B-City|ClauseHead=6
2 で で ADP 助詞-格助詞 _ 1 case _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYNHEAD|Reading=デ|ClauseHead=6
3 ランチ ランチ NOUN 名詞-普通名詞-一般 _ 6 obj _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEMHEAD|NPB|Reading=ランチ|ClauseHead=6
4 を を ADP 助詞-格助詞 _ 3 case _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYNHEAD|Reading=ヲ|ClauseHead=6
5 ご ご NOUN 接頭辞 _ 6 compound _ SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=CONT|NPB|Reading=ゴ|ClauseHead=6
6 一緒 一緒 NOUN 名詞-普通名詞-サ変可能 _ 0 root _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=ROOT|NPI|Reading=イッショ|ClauseHead=6
7 し する AUX 動詞-非自立可能 _ 6 aux _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYNHEAD|Inf=サ行変格,連用形-一般|Reading=シ|ClauseHead=6
8 ましょう ます AUX 助動詞 _ 6 aux _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYNHEAD|Inf=助動詞-マス,意志推量形|Reading=マショウ|ClauseHead=6
9 。 。 PUNCT 補助記号-句点 _ 6 punct _ SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=CONT|Reading=。|ClauseHead=6
```
The `ginzame` command provides a tokenization function like [MeCab](https://taku910.github.io/mecab/).
The output format of `ginzame` is almost the same as that of `mecab`, but the last `pronunciation` field is always '*'.
```console
$ ginzame
銀座でランチをご一緒しましょう。
銀座 名詞,固有名詞,地名,一般,*,*,銀座,ギンザ,*
で 助詞,格助詞,*,*,*,*,で,デ,*
ランチ 名詞,普通名詞,一般,*,*,*,ランチ,ランチ,*
を 助詞,格助詞,*,*,*,*,を,ヲ,*
ご 接頭辞,*,*,*,*,*,御,ゴ,*
一緒 名詞,普通名詞,サ変可能,*,*,*,一緒,イッショ,*
し 動詞,非自立可能,*,*,サ行変格,連用形-一般,為る,シ,*
ましょう 助動詞,*,*,*,助動詞-マス,意志推量形,ます,マショウ,*
。 補助記号,句点,*,*,*,*,。,。,*
EOS
```
The spaCy JSON format is available by specifying `-f 3` or `-f json` for the `ginza` command.
```console
$ ginza -f json
銀座でランチをご一緒しましょう。
[
  {
    "paragraphs": [
      {
        "raw": "銀座でランチをご一緒しましょう。",
        "sentences": [
          {
            "tokens": [
              {"id": 1, "orth": "銀座", "tag": "名詞-固有名詞-地名-一般", "pos": "PROPN", "lemma": "銀座", "head": 5, "dep": "obl", "ner": "B-City"},
              {"id": 2, "orth": "で", "tag": "助詞-格助詞", "pos": "ADP", "lemma": "で", "head": -1, "dep": "case", "ner": "O"},
              {"id": 3, "orth": "ランチ", "tag": "名詞-普通名詞-一般", "pos": "NOUN", "lemma": "ランチ", "head": 3, "dep": "obj", "ner": "O"},
              {"id": 4, "orth": "を", "tag": "助詞-格助詞", "pos": "ADP", "lemma": "を", "head": -1, "dep": "case", "ner": "O"},
              {"id": 5, "orth": "ご", "tag": "接頭辞", "pos": "NOUN", "lemma": "ご", "head": 1, "dep": "compound", "ner": "O"},
              {"id": 6, "orth": "一緒", "tag": "名詞-普通名詞-サ変可能", "pos": "VERB", "lemma": "一緒", "head": 0, "dep": "ROOT", "ner": "O"},
              {"id": 7, "orth": "し", "tag": "動詞-非自立可能", "pos": "AUX", "lemma": "する", "head": -1, "dep": "advcl", "ner": "O"},
              {"id": 8, "orth": "ましょう", "tag": "助動詞", "pos": "AUX", "lemma": "ます", "head": -2, "dep": "aux", "ner": "O"},
              {"id": 9, "orth": "。", "tag": "補助記号-句点", "pos": "PUNCT", "lemma": "。", "head": -3, "dep": "punct", "ner": "O"}
            ]
          }
        ]
      }
    ]
  }
]
```
If you want output in the style of [`cabocha -f1`](https://taku910.github.io/cabocha/) (lattice style), add the `-f 1` or `-f cabocha` option to the `ginza` command.
The format of this option is almost the same as `cabocha -f1`, but the `func_index` field (after the slash) is slightly different.
Our `func_index` field indicates the boundary where the `自立語` (content words) end in each `文節` (bunsetu), and the `機能語` (function words) might start from there.
The functional token filter is also slightly different between `cabocha -f1` and `ginza -f cabocha`.
```console
$ ginza -f cabocha
銀座でランチをご一緒しましょう。
* 0 2D 0/1 0.000000
銀座 名詞,固有名詞,地名,一般,,銀座,ギンザ,* B-City
で 助詞,格助詞,,,,で,デ,* O
* 1 2D 0/1 0.000000
ランチ 名詞,普通名詞,一般,,,ランチ,ランチ, O
を 助詞,格助詞,,,,を,ヲ,* O
* 2 -1D 0/2 0.000000
ご 接頭辞,,,,,ご,ゴ, O
一緒 名詞,普通名詞,サ変可能,,,一緒,イッショ, O
し 動詞,非自立可能,,,サ行変格,連用形-一般,する,シ,* O
ましょう 助動詞,,,,助動詞-マス,意志推量形,ます,マショウ, O
。 補助記号,句点,,,,。,。,* O
EOS
```
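The `func_index` description above can be illustrated with a small parser for the chunk header lines (`* 0 2D 0/1 0.000000`). This is a minimal sketch, not part of GiNZA: the names `ChunkHeader` and `parse_chunk_header` are hypothetical, and the meaning of the fields other than `func_index` is assumed to follow the usual CaboCha lattice layout (chunk index, dependency target followed by `D`, head index/func index, score).

```python
import re
from typing import NamedTuple


class ChunkHeader(NamedTuple):
    index: int       # bunsetu index within the sentence
    dep: int         # index of the bunsetu this one depends on (-1 for the root)
    head_index: int  # number before the slash (assumed CaboCha-style head index)
    func_index: int  # number after the slash: where the content words end
    score: float


# Hypothetical pattern for lines such as "* 2 -1D 0/2 0.000000".
_HEADER = re.compile(r"^\*\s+(\d+)\s+(-?\d+)D\s+(\d+)/(\d+)\s+(-?[\d.]+)$")


def parse_chunk_header(line: str) -> ChunkHeader:
    """Parse one chunk header line of the cabocha-style output."""
    match = _HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a chunk header: {line!r}")
    index, dep, head, func, score = match.groups()
    return ChunkHeader(int(index), int(dep), int(head), int(func), float(score))


# Third bunsetu of the example sentence above.
header = parse_chunk_header("* 2 -1D 0/2 0.000000")
tokens = ["ご", "一緒", "し", "ましょう", "。"]
content = tokens[:header.func_index]   # tokens before the boundary
rest = tokens[header.func_index:]      # tokens from the boundary onward
print(content, rest)
```

For the last bunsetu of the example, the boundary falls after `ご` and `一緒`, so `し`, `ましょう`, and `。` form the remainder.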
Multi-processing (Experimental)
The `-p NUM_PROCESS` option is available from GiNZA v3.0.
Specify the number of analysis processes as NUM_PROCESS.
To use all the CPU cores for GiNZA, run `ginza -p 0`.
The memory requirement is about 130MB/process (to be improved).
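As a rough guide to sizing, the figure above can be turned into a quick estimate. This is a back-of-the-envelope sketch only: `estimate_memory_mb` is a hypothetical helper, the 130MB value is the approximate number quoted above, and the treatment of `0` as "all cores" mirrors the description of `ginza -p 0`.

```python
import os

MB_PER_PROCESS = 130  # approximate per-process figure quoted above


def estimate_memory_mb(num_process: int) -> int:
    """Rough memory estimate in MB for `ginza -p NUM_PROCESS`."""
    if num_process < 0:
        # Negative values are not modeled in this sketch.
        raise ValueError("num_process must be 0 or a positive integer")
    # 0 is treated as "use all CPU cores".
    workers = num_process if num_process > 0 else (os.cpu_count() or 1)
    return workers * MB_PER_PROCESS


print(estimate_memory_mb(4))  # 520
```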
Coding example
The following steps show dependency parsing results with the sentence boundary marker 'EOS'.
```python
import spacy

# Load the transformer-based model (use 'ja_ginza' for the standard model)
nlp = spacy.load('ja_ginza_electra')
doc = nlp('銀座でランチをご一緒しましょう。')
for sent in doc.sents:
    for token in sent:
        print(
            token.i,
            token.orth_,
            token.lemma_,
            token.norm_,
            token.morph.get("Reading"),
            token.pos_,
            token.morph.get("Inflection"),
            token.tag_,
            token.dep_,
            token.head.i,
        )
    # Mark the end of each sentence
    print('EOS')
```
User Dictionary
The user dictionary files should be set in the userDict field of sudachi.json in the installed package directory of the ja_ginza_dict package.
Please read the official documents to compile user dictionaries with the sudachipy command.
SudachiPy - User defined Dictionary
Sudachi User Dictionary Construction (Japanese Only)
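The `userDict` setting described above can also be edited programmatically. The following is a minimal sketch, assuming that `userDict` holds a JSON array of dictionary file paths; `add_user_dict` is a hypothetical helper, and the file names in the usage note are placeholders, not paths taken from the package.

```python
import json
from pathlib import Path


def add_user_dict(config_path, user_dict_path):
    """Append a compiled user dictionary to the userDict list of sudachi.json."""
    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Assumption: userDict is a list of paths; create it if it is missing.
    user_dicts = config.setdefault("userDict", [])
    if user_dict_path not in user_dicts:
        user_dicts.append(user_dict_path)
    config_file.write_text(
        json.dumps(config, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    return config
```

A call such as `add_user_dict("sudachi.json", "user.dic")` would register a dictionary compiled beforehand with the sudachipy command; back up the original file before modifying it.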
Releases
version 5.x
ginza-5.2.0
- 2024-03-31
- Require python>=3.8
- Migrate to spaCy v3.7
- New functionality
- add Japanese clause recognition API (experimental)
ginza-5.1.3
- 2023-09-25
- Migrate to spaCy v3.6
- Beta release of `ja_ginza_bert_large`
ginza-5.1.2
- 2022-03-12
- Migrate to spaCy v3.4
ginza-5.1.1
- 2022-03-12
- Improvements
- auto deploy for pypi by @nimiusrd in #184
- modify github actions: trigger by tagging, stop uploading test pypi by @r-terada in #233
ginza-5.1.0
- 2021-12-10, Euclase
- Important changes
  - Upgrade: spaCy v3.2 and Sudachi.rs(SudachiPy v0.6.2)
  - Change token information fields #208 #209
    - `doc.user_data["reading_forms"][token.i]` -> `token.morph.get("Reading")`
    - `doc.user_data["inflections"][token.i]` -> `token.morph.get("Inflection")`
    - `force_using_normalized_form_as_lemma(True)` -> `token.norm_`
  - All spaCy models, including non-Japanese, are now available with the ginza command #217
    - Download and analyze the model at once by specifying the model name in the following form #219
      - `ginza -m en_core_web_md`
  - Change `ginza --require_gpu` and `ginza -g` to take a `gpu_id` argument
    - The default `gpu_id` value is `-1` which uses only CPUs
  - `ginza -f json` option always analyzes the line which starts with `#` regardless of the option value of `-c`. #215
- Improvements
  - Batch analysis processing speeds up by 50-60% in GPU environment and 10-40% in CPU environment
  - Improved processing efficiency of parallel execution options (`ginza -p {n_process}` and `ginzame`) of ginza command #204
  - add tests #198 #210 #214
  - add benchmark #207 #220
ginza-5.0.3
- 2021-10-15
- Bug fix
- Bunsetu span should not cross the sentence boundary #195
ginza-5.0.2
- 2021-09-06
- Bug fix
- Command Line -s option and set_split_mode() not working in v5.0.x #185
ginza-5.0.1
- 2021-08-26
- Bug fix
- ginzame not working in ginza ver. 5 #179
- Command Line -d option not working in v5.0.0 #178
- Improvement
  - accept `ja-ginza` and `ja-ginza-electra` for `-m` option of `ginza` command
ginza-5.0.0
- 2021-08-26, Demantoid
- Important changes
  - Upgrade spaCy to v3
    - Release transformer-based `ja-ginza-electra` model
    - Improve UPOS accuracy of the standard `ja-ginza` model by adding `morphologizer` to the tail of spaCy pipeline
  - Need to install analysis model along with `ginza` package
    - High accuracy model (>=16GB memory needed)
      - `pip install -U ginza ja-ginza-electra`
    - Speed oriented model
      - `pip install -U ginza ja-ginza`
  - Change component names of `CompoundSplitter` and `BunsetuRecognizer` to `compound_splitter` and `bunsetu_recognizer` respectively
  - Also see spaCy v3 Backwards Incompatibilities
- Improvements
  - Add command line options
    - `-n`
      - Force using SudachiPy's `normalized_form` as `Token.lemma_`
    - `-m (ja_ginza|ja_ginza_electra)`
      - Select model package
  - Revise ENE category name
    - `Degital_Game` to `Digital_Game`
version 4.x
ginza-4.0.6
- 2021-06-01
- Bug fix
- Issue #160: IndexError: list assignment index out of range for empty string
ginza-4.0.5
- 2020-10-01
- Improvements
  - Add `-d` option, which disables spaCy's sentence separator, to `ginza` command line tool
ginza-4.0.4
- 2020-09-11
- Improvements
  - `ginza` command line tool works correctly without BunsetuRecognizer in the pipeline
ginza-4.0.3
- 2020-09-10
- Improve bunsetu head identification accuracy over inconsistent deps in ent spans
ginza-4.0.2
- 2020-09-04
- Improvements
  - Serialization of `CompoundSplitter` for `nlp.to_disk()`
  - Bunsetu span detection accuracy
ginza-4.0.1
- 2020-08-30
- Debug
- Add type arguments for singledispatch register annotations (for Python 3.6)
ginza-4.0.0
- 2020-08-16, Chrysoberyl
- Important changes
  - Replace Japanese model with `spacy.lang.ja` of spaCy v2.3
    - Replace values of `Token.lemma_` with the output of SudachiPy's `Morpheme.dictionary_form()`
  - Replace ja_ginza_dict with official SudachiDict-core package
    - You can delete `ja_ginza_dict` package safely
  - Change options and misc field contents of output of command line tool
    - delete `use_sentence_separator(-s)`
    - NE(OntoNotes) BI labels as `B-GPE`
    - Add subfields: Reading, Inf(inflection) and ENE(Extended NE)
  - Obsolete `Token._.*` and add some entries for `Doc.user_data[]` and accessors
    - inflections (`ginza.inflection(Token)`)
    - reading_forms (`ginza.reading_form(Token)`)
    - bunsetu_bi_labels (`ginza.bunsetu_bi_label(Token)`)
    - bunsetu_position_types (`ginza.bunsetu_position_type(Token)`)
    - bunsetu_heads (`ginza.is_bunsetu_head(Token)`)
  - Change pipeline architecture
    - JapaneseCorrector was obsoleted
    - Add CompoundSplitter and BunsetuRecognizer
  - Upgrade UD_JAPANESE-BCCWJ to v2.6
  - Change word2vec to chiVe mc90
- API Changes
  - Add bunsetu-unit APIs (`from ginza import *`)
    - bunsetu(Token)
    - phrase(Token)
    - sub_phrases(Token)
    - phrases(Span)
    - bunsetu_spans(Span)
    - bunsetu_phrase_spans(Span)
    - bunsetu_head_list(Span)
    - bunsetu_head_tokens(Span)
    - bunsetu_bi_labels(Span)
    - bunsetu_position_types(Span)
version 3.x
ginza-3.1.2
- 2020-02-12
- Debug
- Fix: degrade of cabocha mode
ginza-3.1.1
- 2020-01-19
- API Changes
- Extension fields
  - The values of the `Token._.sudachi` field are set only after calling `SudachipyTokenizer.set_enable_ex_sudachi(True)`, to avoid serialization errors
```python
import spacy
import pickle

nlp = spacy.load('ja_ginza')
doc1 = nlp('This example will be serialized correctly.')
doc1.to_bytes()
with open('sample1.pickle', 'wb') as f:
    pickle.dump(doc1, f)

# After enabling the extension field, serialization fails
nlp.tokenizer.set_enable_ex_sudachi(True)
doc2 = nlp('This example will cause a serialization error.')
doc2.to_bytes()
with open('sample2.pickle', 'wb') as f:
    pickle.dump(doc2, f)
```
ginza-3.1.0
- 2020-01-16
- Important changes
  - Distribute `ja_ginza_dict` from PyPI
- API Changes
  - commands
    - `ginza` and `ginzame`
      - add `-i` option to initialize the files of `ja_ginza_dict`
ginza-3.0.0
- 2020-01-15, Benitoite
- Important changes
  - Distribute `ginza` and `ja_ginza` from PyPI
    - Simple installation; `pip install ginza`, and run `ginza`
    - The model package, `ja_ginza`, is also available from PyPI.
  - Model improvements
    - Change NER training data-set to GSK2014-A (2019) BCCWJ edition
      - Improved accuracy of NER
      - `token.ent_type_` value is changed to Sekine's Extended Named Entity Hierarchy
        - Add `ENE7` attribute to the last field of the output of `ginza`
      - Move OntoNotes5-based label to `token._.ne`
        - We extended the OntoNotes5 named entity labels with `PHONE`, `EMAIL`, `URL`, and `PET_NAME`
    - Overall accuracy is improved by executing `spacy pretrain` over 100 epochs
      - Multi-task learning of `spacy train` effectively working on UD Japanese BCCWJ
    - The newest `SudachiDict_core-20191224`
  - `ginzame`
    - Execute `sudachipy` by `multiprocessing.Pool` and output results with `mecab` like format
    - Now `sudachipy` command requires additional SudachiDict package installation
- Breaking API Changes
  - commands
    - `ginza` (`ginza.command_line.main_ginza`)
      - change option `mode` to `sudachipy_mode`
      - drop options: `disable_pipes` and `recreate_corrector`
      - add options: `hash_comment`, `parallel`, `files`
      - add `mecab` to the choices for the argument of `-f` option
      - add `parallel NUM_PROCESS` option (EXPERIMENTAL)
      - add `ENE7` attribute to conllu miscellaneous field
        - `ginza.ent_type_mapping.ENE_NE_MAPPING` is used to convert `ENE7` label to `NE`
    - `ginzame` (`ginza.command_line.main_ginzame`)
      - a multi-process tokenizer providing `mecab` like output format
  - spaCy field extensions
    - add `token._.ne` for ner label
  - `ginza/sudachipy_tokenizer.py`
    - change `SudachiTokenizer` to `SudachipyTokenizer`
    - use `SUDACHI_DEFAULT_SPLIT_MODE` instead of `SUDACHI_DEFAULT_SPLITMODE` or `SUDACHI_DEFAULT_MODE`
- Dependencies
  - upgrade `spacy` to v2.2.3
  - upgrade `sudachipy` to v0.4.2
version 2.x
ginza-2.2.1
- 2019-10-28
- Improvements
  - JapaneseCorrector can merge the `as_*` type dependencies completely
- Bug fixes
  - command line tool failed in specific situations
ginza-2.2.0
- 2019-10-04, Ametrine
- Important changes
  - `split_mode` has been set incorrectly to sudachipy.tokenizer from v2.0.0 (#43)
    - This bug caused `split_mode` incompatibility between the training phase and the `ginza` command.
    - `split_mode` was set to 'B' for training phase and python APIs, but 'C' for `ginza` command.
    - We fixed this bug by setting the default `split_mode` to 'C' entirely.
    - This fix may cause word segmentation incompatibilities when upgrading GiNZA from v2.0.0 to v2.2.0.
- New features
  - Add `-f` and `--output-format` option to `ginza` command:
    - `-f 0` or `-f conllu`: CoNLL-U Syntactic Annotation format
    - `-f 1` or `-f cabocha`: cabocha -f1 compatible format
  - Add custom token fields:
    - `bunsetu_index`: bunsetu index starting from 0
    - `reading`: reading of token (not a pronunciation)
    - `sudachi`: SudachiPy's morpheme instance (or its list when the tokens are gathered by JapaneseCorrector)
- Performance improvements
  - Tokenizer
    - Use latest SudachiDict (SudachiDict_core-20190927.tar.gz)
    - Use Cythonized SudachiPy (v0.4.0)
  - Dependency parser
    - Apply `spacy pretrain` command to capture the language model from UD-Japanese BCCWJ, UD_Japanese-PUD and KWDLC.
    - Apply multitask objectives by using `-pt 'tag,dep'` option of `spacy train`
  - New model file
    - ja_ginza-2.2.0.tar.gz
ginza-2.0.0
- 2019-07-08
- Add `ginza` command
  - run `ginza` from the console
- Change package structure
  - module package as `ginza`
  - language model package as `ja_ginza`
  - `spacy.lang.ja` is overridden by `ginza`
- Remove `sudachipy` related directories
  - SudachiPy and its dictionary are installed via `pip` during `ginza` installation
- User dictionary available
- Token extension fields
  - Added
    - `token._.bunsetu_bi_label`, `token._.bunsetu_position_type`
  - Remained
    - `token._.inf`
  - Removed
    - `pos_detail` (same value is set to `token.tag_`)
version 1.x
ja_ginza_nopn-1.0.2
- 2019-04-07
- Set depending token index of root as 0 to meet with conllu format definitions
ja_ginza_nopn-1.0.1
- 2019-04-02
- Add new Japanese era 'reiwa' to system_core.dic.
ja_ginza_nopn-1.0.0
- 2019-04-01
- First release version
Development Environment
Development set up
1. Clone from GitHub
```console
$ git clone 'https://github.com/megagonlabs/ginza.git'
```
2. Run python setup.py
For a normal environment:
```console
$ python setup.py develop
```
3. Set up system.dic
Copy system.dic from the installed package directory of ja_ginza_dict to ./ja_ginza_dict/sudachidict/.
Training models
The analysis model of GiNZA is trained with the `spacy train` command.
```console
$ python -m spacy train ja ja_ginza-4.0.0 corpus/ja_ginza-ud-train.json corpus/ja_ginza-ud-dev.json -b ja_vectors_chive_mc90_35k/ -ovl 0.3 -n 100 -m meta.json.ginza -V 4.0.0
```
Run tests
GiNZA uses the pytest framework for testing, and you can run the tests via setup.py without installing the test requirements explicitly.
Some tests depend on the GiNZA default models (ja-ginza, ja-ginza-electra), so install them before running the tests.
```console
$ pip install ja-ginza ja-ginza-electra
$ pip install -e .
# full test
$ python setup.py test
# test single file
$ python setup.py test --addopts ginza/tests/test_analyzer.py
```
Owner
- Name: Megagon Labs
- Login: megagonlabs
- Kind: organization
- Website: https://www.megagon.ai
- Repositories: 23
- Profile: https://github.com/megagonlabs
Citation (CITATION)
@ARTICLE{GiNZA NLP,
AUTHOR = {Hiroshi, Mai and Masayuki},
TITLE = {短単位品詞の用法曖昧性解決と依存関係ラベリングの同時学習},
YEAR = {2019},
JOURNAL = {言語処理学会第25回年次大会},
URL = {http://www.anlp.jp/proceedings/annual_meeting/2019/pdf_dir/F2-3.pdf}
}
GitHub Events
Total
- Issues event: 1
- Watch event: 56
- Fork event: 1
Last Year
- Issues event: 1
- Watch event: 56
- Fork event: 1
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| hiroshi | h****a@m****i | 322 |
| r-terada | r****3@g****m | 39 |
| Yuta Hayashibe | s****u | 3 |
| wafuwafu13 | m****e@i****p | 3 |
| Shin Uozumi | s****u | 2 |
| Yudai Udagawa | n****d@g****m | 2 |
| Koichi Yasuoka | y****a@k****p | 1 |
| Kuni88 | k****3@g****m | 1 |
| Paul O'Leary McCann | p****m@d****m | 1 |
| Sorami Hisamoto | s@8****o | 1 |
| Yohei Tamura | t****y@g****m | 1 |
| nikkie | t****p@g****m | 1 |
| wataruhashimoto52 | w****e@g****m | 1 |
| Sorami Hisamoto | h****s@w****p | 1 |
| Yusuke Yaguchi | m****e@m****l | 1 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 21
- Total pull requests: 93
- Average time to close issues: about 1 month
- Average time to close pull requests: 3 days
- Total issue authors: 14
- Total pull request authors: 8
- Average comments per issue: 0.71
- Average comments per pull request: 0.16
- Merged pull requests: 88
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- hiroshi-matsuda-rit (7)
- cidrugHug8 (2)
- YuseiYokoyama (1)
- divyadilip91 (1)
- ftnext (1)
- vincentmichael089 (1)
- PyVCEchecker (1)
- adamkolar (1)
- ShoSoejima (1)
- TatsuyaShirakawa (1)
- hungnmai (1)
- lemonov (1)
- tadashikumano (1)
- borh (1)
Pull Request Authors
- hiroshi-matsuda-rit (69)
- r-terada (10)
- shirayu (3)
- wafuwafu13 (3)
- nimiusrd (1)
- wataruhashimoto52 (1)
- ftnext (1)
- sinozu (1)
Packages
- Total packages: 4
- Total downloads: pypi 91,948 last-month
- Total docker downloads: 1,950
- Total dependent packages: 9 (may contain duplicates)
- Total dependent repositories: 118 (may contain duplicates)
- Total versions: 34
- Total maintainers: 1
pypi.org: ginza
GiNZA, An Open Source Japanese NLP Library, based on Universal Dependencies
- Homepage: https://github.com/megagonlabs/ginza
- Documentation: https://ginza.readthedocs.io/
- License: MIT
- Latest release: 5.2.0 (published about 2 years ago)
pypi.org: ja-ginza
Japanese multi-task CNN trained on UD-Japanese BCCWJ r2.8 + GSK2014-A(2019). Assigns word2vec token vectors. Components: tok2vec, parser, ner, morphologizer, attribute_ruler, compound_splitter, bunsetu_recognizer.
- Homepage: https://github.com/megagonlabs/ginza
- Documentation: https://ja-ginza.readthedocs.io/
- License: MIT License
- Latest release: 5.2.0 (published about 2 years ago)
pypi.org: ja-ginza-electra
Japanese multi-task CNN trained on UD-Japanese BCCWJ r2.8 + GSK2014-A(2019) + transformers-ud-japanese-electra--base. Components: transformer, parser, attribute_ruler, ner, morphologizer, compound_splitter, bunsetu_recognizer.
- Homepage: https://github.com/megagonlabs/ginza
- Documentation: https://ja-ginza-electra.readthedocs.io/
- License: MIT License
- Latest release: 5.2.0 (published about 2 years ago)
pypi.org: ja-ginza-dict
SudachiDict for ja_ginza (SudachiDict is originally developed by Works Applications Tokushima Laboratory of AI and NLP)
- Homepage: https://github.com/megagonlabs/ginza
- Documentation: https://ja-ginza-dict.readthedocs.io/
- License: MIT
- Latest release: 3.1.0 (published over 6 years ago)
Dependencies
- SudachiDict-core >=20210802
- SudachiPy >=0.6.2,<0.7.0
- plac >=1.3.3
- spacy >=3.2.0,<3.3.0
- actions/checkout master composite
- actions/setup-python v1 composite
- pypa/gh-action-pypi-publish master composite
- actions/checkout v2 composite
- actions/setup-python v1 composite