Releases | Open Source Science

What's Changed

Require python>=3.8
Migrate to spaCy v3.7
New functionality
- add Japanese clause recognition API (experimental)

Full Changelog: https://github.com/megagonlabs/ginza/compare/v5.1.3...v5.2.0

How to Use `ja_ginza_bert_large` β1

Prepare

Create a virtual-env to separate ja_ginza_bert_large from other GiNZA model environments. (ja_ginza_bert_large requires the latest spacy-transformers version, which is not compatible with ja_ginza or ja_ginza_electra.)

    $ python -m venv venv_bert_large
    $ source venv_bert_large/bin/activate

Install

    $ pip install "https://github.com/megagonlabs/ginza/releases/download/v5.2.0/ja_ginza_bert_large-5.2.0b1-py3-none-any.whl"

For CUDA environments, you also need to upgrade spaCy with the extra that matches your CUDA version, for example:

    $ pip install -U spacy[cuda117]

Analyze

```Console
$ ginza -g 0 -b ja_ginza_bert_large 銀座でランチをご一緒しましょう。
# text = 銀座でランチをご一緒しましょう。
1	銀座	銀座	PROPN	名詞-固有名詞-地名-一般	_	6	obl	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ギンザ|NE=B-GPE|ENE=B-City|ClauseHead=6
2	で	で	ADP	助詞-格助詞	_	1	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=デ|ClauseHead=6
3	ランチ	ランチ	NOUN	名詞-普通名詞-一般	_	6	obj	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ランチ|ClauseHead=6
4	を	を	ADP	助詞-格助詞	_	3	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ヲ|ClauseHead=6
5	ご	ご	NOUN	接頭辞	_	6	compound	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=CONT|Reading=ゴ|ClauseHead=6
6	一緒	一緒	VERB	名詞-普通名詞-サ変可能	_	0	root	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=ROOT|Reading=イッショ|ClauseHead=6
7	し	する	AUX	動詞-非自立可能	_	6	aux	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=サ行変格,連用形-一般|Reading=シ|ClauseHead=6
8	ましょう	ます	AUX	助動詞	_	6	aux	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=助動詞-マス,意志推量形|Reading=マショウ|ClauseHead=6
9	。	。	PUNCT	補助記号-句点	_	6	punct	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=CONT|Reading=。|ClauseHead=6
```
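The output above is CoNLL-U, where the tenth column (MISC) carries the GiNZA-specific fields, including the `ClauseHead` value that appears with the experimental clause recognition in this release. As a rough sketch of how that column could be consumed, the snippet below parses two of the token lines with the standard library only and groups token forms by `ClauseHead`. The helper names `parse_misc`, `parse_conllu`, and `clause_members` are hypothetical, and the sample assumes tab-separated columns and the `SEM_HEAD` / `SYN_HEAD` / `NP_B` spellings.

```python
from collections import defaultdict

# Two token lines taken from the analysis output (tab-separated CoNLL-U columns).
SAMPLE = (
    "# text = 銀座でランチをご一緒しましょう。\n"
    "1\t銀座\t銀座\tPROPN\t名詞-固有名詞-地名-一般\t_\t6\tobl\t_\t"
    "SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|"
    "Reading=ギンザ|NE=B-GPE|ENE=B-City|ClauseHead=6\n"
    "2\tで\tで\tADP\t助詞-格助詞\t_\t1\tcase\t_\t"
    "SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|"
    "Reading=デ|ClauseHead=6\n"
)


def parse_misc(misc):
    """Split a MISC column into a dict; bare flags such as NP_B map to True."""
    fields = {}
    if misc == "_":
        return fields
    for item in misc.split("|"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_conllu(text):
    """Parse token lines, skipping comments, blanks, and malformed rows."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        tokens.append({
            "id": int(cols[0]),  # assumes plain integer IDs, as in the output above
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
            "misc": parse_misc(cols[9]),
        })
    return tokens


def clause_members(tokens):
    """Group token forms by the ClauseHead value found in MISC."""
    groups = defaultdict(list)
    for tok in tokens:
        head = tok["misc"].get("ClauseHead")
        if head is not None:
            groups[int(head)].append(tok["form"])
    return dict(groups)


tokens = parse_conllu(SAMPLE)
print(tokens[0]["misc"]["Reading"])
print(clause_members(tokens))
```

In this sentence every token reports `ClauseHead=6`, so the whole sentence forms a single clause headed by token 6 (一緒).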

- Python
Published by hiroshi-matsuda-rit about 2 years ago

What's Changed

Migrate to spaCy v3.6
Beta release of ja_ginza_bert_large

Full Changelog: https://github.com/megagonlabs/ginza/compare/v5.1.2...v5.1.3

How to Use `ja_ginza_bert_large` β1

Prepare

Create a virtual-env to separate ja_ginza_bert_large from other GiNZA model environments. (ja_ginza_bert_large requires the latest spacy-transformers version, which is not compatible with ja_ginza or ja_ginza_electra.)

    $ python -m venv venv_bert_large
    $ source venv_bert_large/bin/activate

Install

    $ pip install "https://github.com/megagonlabs/ginza/releases/download/v5.1.3/ja_ginza_bert_large-5.1.3b1-py3-none-any.whl"

For CUDA environments, you also need to upgrade spaCy with the extra that matches your CUDA version, for example:

    $ pip install -U spacy[cuda117]

Analyze

```Console
$ ginza -g 0 -b ja_ginza_bert_large 銀座でランチをご一緒しましょう。
# text = 銀座でランチをご一緒しましょう。
1	銀座	銀座	PROPN	名詞-固有名詞-地名-一般	_	6	obl	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ギンザ|NE=B-GPE|ENE=B-City
2	で	で	ADP	助詞-格助詞	_	1	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=デ
3	ランチ	ランチ	NOUN	名詞-普通名詞-一般	_	6	obj	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|NP_B|Reading=ランチ
4	を	を	ADP	助詞-格助詞	_	3	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ヲ
5	ご	ご	NOUN	接頭辞	_	6	compound	_	SpaceAfter=No|BunsetuBILabel=B|BunsetuPositionType=CONT|Reading=ゴ
6	一緒	一緒	VERB	名詞-普通名詞-サ変可能	_	0	root	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=ROOT|Reading=イッショ
7	し	する	AUX	動詞-非自立可能	_	6	aux	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=サ行変格,連用形-一般|Reading=シ
8	ましょう	ます	AUX	助動詞	_	6	aux	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Inf=助動詞-マス,意志推量形|Reading=マショウ
9	。	。	PUNCT	補助記号-句点	_	6	punct	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=CONT|Reading=。
```
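The `BunsetuBILabel` field in the MISC column marks each token as beginning (`B`) or continuing (`I`) a bunsetu phrase. A minimal sketch of rebuilding the bunsetu chunks from those labels follows. The function name `bunsetu_chunks` is hypothetical, and the MISC strings are trimmed to the two bunsetu fields for brevity, assuming the `SEM_HEAD` / `SYN_HEAD` spellings.

```python
# (form, trimmed MISC) pairs in sentence order, based on the analysis output.
ROWS = [
    ("銀座", "BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD"),
    ("で", "BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD"),
    ("ランチ", "BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD"),
    ("を", "BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD"),
    ("ご", "BunsetuBILabel=B|BunsetuPositionType=CONT"),
    ("一緒", "BunsetuBILabel=I|BunsetuPositionType=ROOT"),
    ("し", "BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD"),
    ("ましょう", "BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD"),
    ("。", "BunsetuBILabel=I|BunsetuPositionType=CONT"),
]


def bunsetu_chunks(rows):
    """Concatenate token forms into bunsetu strings using the B/I labels."""
    chunks = []
    for form, misc in rows:
        label = None
        for item in misc.split("|"):
            key, _, value = item.partition("=")
            if key == "BunsetuBILabel":
                label = value
        # A "B" label (or the very first token) opens a new chunk.
        if label == "B" or not chunks:
            chunks.append([form])
        else:
            chunks[-1].append(form)
    return ["".join(chunk) for chunk in chunks]


print(bunsetu_chunks(ROWS))
# ['銀座で', 'ランチを', 'ご一緒しましょう。']
```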

- Python
Published by hiroshi-matsuda-rit almost 3 years ago

What's Changed

add pytest github actions workflow by @r-terada in https://github.com/megagonlabs/ginza/pull/241
Migrate to spaCy v3.4 by @hiroshi-matsuda-rit in https://github.com/megagonlabs/ginza/pull/250

New Contributors

@ftnext made their first contribution in https://github.com/megagonlabs/ginza/pull/239
@wafuwafu13 made their first contribution in https://github.com/megagonlabs/ginza/pull/244

Full Changelog: https://github.com/megagonlabs/ginza/compare/v5.1.1...v5.1.2

- Python
Published by hiroshi-matsuda-rit almost 4 years ago

What's Changed

auto deploy for pypi by @nimiusrd in https://github.com/megagonlabs/ginza/pull/184
modify github actions: trigger by tagging, stop uploading test pypi by @r-terada in https://github.com/megagonlabs/ginza/pull/233

New Contributors

@sinozu made their first contribution in https://github.com/megagonlabs/ginza/pull/230
@wataruhashimoto52 made their first contribution in https://github.com/megagonlabs/ginza/pull/236

Full Changelog: https://github.com/megagonlabs/ginza/compare/v5.1.0...v5.1.1

- Python
Published by hiroshi-matsuda-rit over 4 years ago

ginza-5.1.0

2021-12-10, Euclase
Important changes
- Upgrade: spaCy v3.2 and Sudachi.rs (SudachiPy v0.6.2)
- Change token information fields #208 #209
  - doc.user_data["reading_forms"][token.i] -> token.morph.get("Reading")
  - doc.user_data["inflections"][token.i] -> token.morph.get("Inflection")
  - force_using_normalized_form_as_lemma(True) -> token.norm_
- All spaCy models, including non-Japanese ones, are now available with the ginza command #217
- Download a model and analyze with it in one step by specifying the model name in the following form #219
  - ginza -m en_core_web_md
- With ginza -f json, lines starting with # are always analyzed, regardless of the value of the -c option #215
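The token field changes above move readings and inflections from `doc.user_data` into spaCy's morphological features. The sketch below shows one way calling code might adapt. `first_morph_value` is a hypothetical helper, not part of GiNZA, and the `SimpleNamespace` object is only a stand-in for a spaCy `Token` so that the snippet runs without a model installed. It relies on `token.morph.get(...)` returning a list of strings, as in spaCy v3.

```python
from types import SimpleNamespace


def first_morph_value(token, field):
    """Return the first value of a morph field, or None if the field is absent."""
    values = token.morph.get(field)
    return values[0] if values else None


# Stand-in for a spaCy Token (assumption: only .morph.get and .norm_ are used).
_features = {"Reading": ["ギンザ"]}
fake_token = SimpleNamespace(
    morph=SimpleNamespace(get=lambda field: _features.get(field, [])),
    norm_="銀座",
)

print(first_morph_value(fake_token, "Reading"))     # replaces doc.user_data["reading_forms"][token.i]
print(first_morph_value(fake_token, "Inflection"))  # None here: the stand-in has no Inflection
print(fake_token.norm_)                             # replaces force_using_normalized_form_as_lemma(True)

# With GiNZA and a model installed, the same helper would be used like this:
# import spacy
# nlp = spacy.load("ja_ginza")
# for token in nlp("銀座でランチをご一緒しましょう。"):
#     print(token.text, first_morph_value(token, "Reading"), token.norm_)
```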
Improvements
- Batch analysis is 50-60% faster in GPU environments and 10-40% faster in CPU environments
- Improved processing efficiency of the parallel execution options of the ginza command (ginza -p {n_process} and ginzame) #204
- Add tests #198 #210 #214
- Add benchmark #207 #220

- Python
Published by hiroshi-matsuda-rit over 4 years ago

ginza-5.0.3

2021-10-15
Bug fix
- Bunsetu span should not cross the sentence boundary #195

- Python
Published by hiroshi-matsuda-rit over 4 years ago

ginza-5.0.2

2021-09-06
Bug fix
- Command Line -s option and set_split_mode() not working in v5.0.x #185

- Python
Published by hiroshi-matsuda-rit over 4 years ago

ginza-5.0.1

2021-08-26
Bug fix
- ginzame not working in ginza ver. 5 #179
- Command Line -d option not working in v5.0.0 #178
Improvement
- accept ja-ginza and ja-ginza-electra for -m option of ginza command

- Python
Published by hiroshi-matsuda-rit almost 5 years ago

ginza-5.0.0

2021-08-26, Demantoid
Important changes
- Upgrade spaCy to v3
- Release transformer-based ja-ginza-electra model
- Improve UPOS accuracy of the standard ja-ginza model by adding morphologizer to the tail of the spaCy pipeline
- An analysis model must now be installed along with the ginza package
  - High accuracy model (>=16GB memory needed)
    - pip install -U ginza ja-ginza-electra
  - Speed oriented model
    - pip install -U ginza ja-ginza
- Change component names of CompoundSplitter and BunsetuRecognizer to compound_splitter and bunsetu_recognizer respectively
- Also see spaCy v3 Backwards Incompatibilities
Improvements
- Add command line options
  - -n
    - Force using SudachiPy's normalized_form as Token.lemma_
  - -m (ja_ginza|ja_ginza_electra)
    - Select model package
- Revise ENE category name
  - Degital_Game to Digital_Game

- Python
Published by hiroshi-matsuda-rit almost 5 years ago

ginza-4.0.6

2021-06-01
Bug fix
- Issue #160: IndexError: list assignment index out of range for empty string

- Python
Published by hiroshi-matsuda-rit about 5 years ago

ginza-4.0.5

2020-10-01
Improvements
- Add -d option, which disables spaCy's sentence separator, to ginza command line tool

- Python
Published by hiroshi-matsuda-rit over 5 years ago