cohesion-tools

Preprocessing and evaluation scripts for Japanese cohesion analysis

https://github.com/nobu-g/cohesion-tools

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

Preprocessing and evaluation scripts for Japanese cohesion analysis

Basic Info
  • Host: GitHub
  • Owner: nobu-g
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 474 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 11
Created over 2 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

Cohesion Tools


Requirements

Installation

```bash
pip install cohesion-tools  # or "cohesion-tools[eval]"
```

Usage

Evaluating Predicted Documents

```python
from pathlib import Path
from typing import List

from rhoknp import Document
from rhoknp.cohesion import ExophoraReferentType
from cohesion_tools.evaluators import CohesionEvaluator, CohesionScore

# Load gold-standard documents in KNP format
documents: List[Document] = [Document.from_knp(path.read_text()) for path in Path("your/dataset").glob("*.knp")]
predicted_documents = your_model(documents)  # placeholder: run your own analyzer here

scorer = CohesionEvaluator(
    # Exophora referents: author, reader, unspecified person, unspecified thing
    exophora_referent_types=[ExophoraReferentType(t) for t in ("著者", "読者", "不特定:人", "不特定:物")],
    # Target cases for predicate-argument structure (PAS) analysis: nominative, accusative, dative
    pas_cases=["ガ", "ヲ", "ニ"],
)

score: CohesionScore = scorer.run(predicted_documents=predicted_documents, gold_documents=documents)
score.to_dict()  # Convert the evaluation result to a dictionary
score.export_csv("score.csv")  # Export the evaluation result to score.csv
score.export_txt("score.txt")  # Export the evaluation result to score.txt
```
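For intuition about what such a score measures, here is a simplified, stdlib-only sketch of set-based precision, recall, and F1 over predicate-argument relations. It is an illustration under stated assumptions, not the cohesion-tools implementation: the `Relation` tuple (predicate index, case, argument label) and the `precision_recall_f1` helper are hypothetical, and the real evaluator may count matches differently (for example, separately per case or per argument type).

```python
from typing import NamedTuple, Set, Tuple


class Relation(NamedTuple):
    """Hypothetical flat encoding of one predicate-argument relation."""

    predicate_id: int  # index of the predicate base phrase in the document
    case: str  # case label, e.g. "ガ" (nominative), "ヲ" (accusative), "ニ" (dative)
    argument: str  # argument base phrase index or exophora referent, as a string


def precision_recall_f1(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float, float]:
    """Set-based P/R/F1: a predicted relation is correct only if it exactly matches a gold relation."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {
    Relation(3, "ガ", "1"),
    Relation(3, "ヲ", "2"),
    Relation(7, "ガ", "著者"),  # exophoric argument: 著者 (author)
}
predicted = {
    Relation(3, "ガ", "1"),
    Relation(3, "ヲ", "5"),  # wrong argument for ヲ
    Relation(7, "ガ", "著者"),
}
print(precision_recall_f1(predicted, gold))
```

Here two of the three predicted relations match the gold set, and two of the three gold relations are recovered, so precision, recall, and F1 all come out to 2/3.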

Extracting Labels From Base Phrases

```python
from pathlib import Path
from typing import Dict, List

from rhoknp import Document
from rhoknp.cohesion import Argument, ExophoraReferentType
from cohesion_tools.extractors import PasExtractor

pas_extractor = PasExtractor(
    cases=["ガ", "ヲ", "ニ"],  # nominative, accusative, dative
    # Exophora referents: author, reader, unspecified person, unspecified thing
    exophora_referent_types=[ExophoraReferentType(t) for t in ("著者", "読者", "不特定:人", "不特定:物")],
)

examples = []
documents: List[Document] = [Document.from_knp(path.read_text()) for path in Path("your/dataset").glob("*.knp")]
for document in documents:
    for base_phrase in document.base_phrases:
        # Only process base phrases the extractor treats as PAS analysis targets
        if pas_extractor.is_target(base_phrase):
            # Case label -> list of gold arguments for this base phrase
            rels: Dict[str, List[Argument]] = pas_extractor.extract_rels(base_phrase)
            examples.append(rels)

your_trainer.train(your_model, examples)  # placeholder: your own training loop
```
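The `rels` dictionaries collected above map case labels to lists of arguments, and a model usually needs them as fixed-size targets. The sketch below shows one hypothetical encoding, a multi-hot vector per case over a fixed candidate list. It assumes you have already turned each `Argument` into a string label yourself (for example, a base phrase index or an exophora type); the `to_multi_hot` helper and the candidate list are illustrative and are not part of cohesion-tools.

```python
from typing import Dict, List, Sequence


def to_multi_hot(
    rels: Dict[str, List[str]],
    cases: Sequence[str],
    candidates: Sequence[str],
) -> Dict[str, List[int]]:
    """Encode each case's argument labels as a 0/1 vector over a fixed candidate list.

    Labels not found in `candidates` are ignored; a case missing from `rels`
    (or mapped to an empty list) gets an all-zero vector, meaning "no argument".
    """
    index = {label: i for i, label in enumerate(candidates)}
    encoded: Dict[str, List[int]] = {}
    for case in cases:
        vector = [0] * len(candidates)
        for label in rels.get(case, []):
            if label in index:
                vector[index[label]] = 1
        encoded[case] = vector
    return encoded


# Hypothetical candidates: base phrase indices plus the four exophora referents
# (author, reader, unspecified person, unspecified thing)
candidates = ["0", "1", "2", "著者", "読者", "不特定:人", "不特定:物"]
rels = {"ガ": ["著者"], "ヲ": ["1"], "ニ": []}
print(to_multi_hot(rels, ["ガ", "ヲ", "ニ"], candidates))
```

Labels outside the candidate list are silently dropped, which keeps every vector the same size; an empty list (as for ニ here) encodes "no argument".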

Reference

```bibtex
@inproceedings{ueda-etal-2020-bert,
  title = {{BERT}-based Cohesion Analysis of {J}apanese Texts},
  author = {Ueda, Nobuhiro and Kawahara, Daisuke and Kurohashi, Sadao},
  booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
  month = dec,
  year = {2020},
  address = {Barcelona, Spain (Online)},
  publisher = {International Committee on Computational Linguistics},
  url = {https://aclanthology.org/2020.coling-main.114},
  doi = {10.18653/v1/2020.coling-main.114},
  pages = {1323--1333},
  abstract = {The meaning of natural language text is supported by cohesion among various kinds of entities, including coreference relations, predicate-argument structures, and bridging anaphora relations. However, predicate-argument structures for nominal predicates and bridging anaphora relations have not been studied well, and their analyses have been still very difficult. Recent advances in neural networks, in particular, self training-based language models including BERT (Devlin et al., 2019), have significantly improved many natural language processing tasks, making it possible to dive into the study on analysis of cohesion in the whole text. In this study, we tackle an integrated analysis of cohesion in Japanese texts. Our results significantly outperformed existing studies in each task, especially about 10 to 20 point improvement both for zero anaphora and coreference resolution. Furthermore, we also showed that coreference resolution is different in nature from the other tasks and should be treated specially.}
}
```

```bibtex
@inproceedings{ueda-etal-2023-kwja,
  title = {{KWJA}: A Unified {J}apanese Analyzer Based on Foundation Models},
  author = {Ueda, Nobuhiro and Omura, Kazumasa and Kodama, Takashi and Kiyomaru, Hirokazu and Murawaki, Yugo and Kawahara, Daisuke and Kurohashi, Sadao},
  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  month = jul,
  year = {2023},
  address = {Toronto, Canada},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.acl-demo.52},
  pages = {538--548},
  abstract = {We present KWJA, a high-performance unified Japanese text analyzer based on foundation models. KWJA supports a wide range of tasks, including typo correction, word segmentation, word normalization, morphological analysis, named entity recognition, linguistic feature tagging, dependency parsing, PAS analysis, bridging reference resolution, coreference resolution, and discourse relation analysis, making it the most versatile among existing Japanese text analyzers. KWJA solves these tasks in a multi-task manner but still achieves competitive or better performance compared to existing analyzers specialized for each task. KWJA is publicly available under the MIT license at https://github.com/ku-nlp/kwja.}
}
```

License

This software is released under the MIT License, see LICENSE.

Owner

  • Name: Nobuhiro Ueda
  • Login: nobu-g
  • Kind: user
  • Location: Kyoto, Japan
  • Company: Kyoto University

A Ph.D. student at Kyoto University.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Cohesion Tools: Preprocessing and evaluation scripts for Japanese cohesion analysis"
authors:
  - family-names: Ueda
    given-names: Nobuhiro
version: 0.5.0
date-released: 2023-08-17
url: "https://github.com/nobu-g/cohesion-tools"

GitHub Events

Total
  • Release event: 2
  • Delete event: 3
  • Push event: 39
  • Pull request event: 23
  • Create event: 6
Last Year
  • Release event: 2
  • Delete event: 3
  • Push event: 39
  • Pull request event: 23
  • Create event: 6

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 23
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.13
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 23
Past Year
  • Issues: 0
  • Pull requests: 9
  • Average time to close issues: N/A
  • Average time to close pull requests: about 9 hours
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 9
Top Authors
Issue Authors
Pull Request Authors
  • pre-commit-ci[bot] (33)
  • dependabot[bot] (8)
  • nobu-g (1)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (8)
  • github_actions (5)
  • python (3)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 914 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 1
  • Total versions: 11
  • Total maintainers: 1
pypi.org: cohesion-tools

Preprocessing and evaluation tools for Japanese cohesion analysis

  • Versions: 11
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 914 Last month
Rankings
Dependent packages count: 4.7%
Downloads: 7.0%
Average: 11.2%
Dependent repos count: 21.8%
Maintainers (1)
Last synced: 7 months ago

Dependencies

.github/workflows/lint.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/test.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
poetry.lock pypi
  • appnope 0.1.3
  • asttokens 2.2.1
  • backcall 0.2.0
  • colorama 0.4.6
  • decorator 5.1.1
  • exceptiongroup 1.1.2
  • executing 1.2.0
  • iniconfig 2.0.0
  • ipdb 0.13.13
  • ipython 8.12.2
  • ipython 8.14.0
  • jedi 0.18.2
  • matplotlib-inline 0.1.6
  • numpy 1.24.4
  • numpy 1.25.1
  • packaging 23.1
  • pandas 2.0.3
  • parso 0.8.3
  • pexpect 4.8.0
  • pickleshare 0.7.5
  • pluggy 1.2.0
  • prompt-toolkit 3.0.38
  • ptyprocess 0.7.0
  • pure-eval 0.2.2
  • pygments 2.15.1
  • pytest 7.4.0
  • python-dateutil 2.8.2
  • pytz 2023.3
  • rhoknp 1.3.2
  • six 1.16.0
  • stack-data 0.6.2
  • tomli 2.0.1
  • traitlets 5.9.0
  • typing-extensions 4.6.3
  • tzdata 2023.3
  • wcwidth 0.2.6
pyproject.toml pypi
  • pandas ^2.0
  • python ^3.8
  • rhoknp ^1.3