pilota

✈ SCUD generator (interpretation-sentence generator)

https://github.com/megagonlabs/pilota

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.7%) to scientific vocabulary

Keywords

japanese python sentence-generator

Keywords from Contributors

mesh interactive
Last synced: 6 months ago

Repository

✈ SCUD generator (interpretation-sentence generator)

Basic Info
  • Host: GitHub
  • Owner: megagonlabs
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 730 KB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 0
  • Open Issues: 16
  • Releases: 6
Topics
japanese python sentence-generator
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

✈ Pilota: SCUD generator


| Name | Input Utterance | Output SCUD |
| --- | --- | --- |
| Agent | 今回の旅行はどういったご旅行でしょうか? | - |
| User | 家族で一泊して、USJに行こうと思ってます。 | 今回の旅行は家族で一泊して、USJに行く。 |
| Agent | なるほど、ホテルはもうお決まりですか? | - |
| User | まだです。 | ホテルはまだ決まっていない。 |
| | ただ、近くが良いなとは思ってて。 | ホテルはUSJの近くが良い。 |
| | 景色が良くて食事も美味しいところが良いです | 景色が良いホテルが良い。<br>食事が美味しいホテルが良い。 |

Screenshot of web demo

Quick start

Install

    pip install -U 'pilota[ja-line] @ git+https://github.com/megagonlabs/pilota'

If you need a torch build that is compatible with your GPU, install the specific package as in the following example. See https://pytorch.org/ for details.

    pip install -U torch --extra-index-url https://download.pytorch.org/whl/cu118
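
After installing, it can be useful to confirm which torch build is active. The following is a minimal sketch, not part of Pilota; the helper name `describe_torch` is hypothetical. It only relies on `torch.__version__` and `torch.cuda.is_available()`, and it falls back to a message when torch is missing.

```python
import importlib.util


def describe_torch() -> str:
    """Summarize the installed torch build, or report that torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also runs without torch

    return f"torch {torch.__version__} (CUDA available: {torch.cuda.is_available()})"


if __name__ == "__main__":
    print(describe_torch())
```

If CUDA is reported as unavailable on a GPU machine, the installed wheel probably does not match the local CUDA version.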

Run

  1. Prepare inputs (Input Format and plain2request)

    • Command

      echo -e 'ご要望をお知らせください\tはい。部屋から富士山が見えて、夜景を見ながら食事のできるホテルがいいな。\nこんにちは\tこんにちは' | python -m pilota.convert.plain2request | tee input.jsonl

    • Output

    ```jsonl
    {"context": [{"name": "agent", "text": "ご要望をお知らせください"}], "utterance": "はい。部屋から富士山が見えて、夜景を見ながら食事のできるホテルがいいな。", "sentences": null, "meta": {}}
    {"context": [{"name": "agent", "text": "こんにちは"}], "utterance": "こんにちは", "sentences": null, "meta": {}}
    ```
  2. Feed it to Pilota

    • Command

      pilota -m megagonlabs/pilota_dialog --batch_size 1 --outlen 60 --nbest 1 --beam 5 < input.jsonl

    • Output

    ```jsonl
    [{"scuds_nbest": [[]], "original_ranks": [0], "scores": [0.9911208689212798], "scores_detail": [{"OK": 0.9704028964042664, "incorrect_none": 0.04205145686864853, "lack": 0.0007874675211496651, "limited": 0.0003119863977190107, "non_fluent": 0.0002362923405598849, "untruth": 0.0013080810895189643}], "sentence": "はい。"}, {"scuds_nbest": [["部屋から富士山が見えるホテルが良い。", "夜景を見ながら食事のできるホテルが良い。"]], "original_ranks": [0], "scores": [0.9952289938926696], "scores_detail": [{"OK": 0.9840966463088989, "incorrect_none": 0.010280555114150047, "lack": 0.0032871251460164785, "limited": 0.00041511686868034303, "non_fluent": 0.0002954243100248277, "untruth": 0.003289491171017289}], "sentence": "部屋から富士山が見えて、夜景を見ながら食事のできるホテルがいいな。"}]
    [{"scuds_nbest": [[]], "original_ranks": [0], "scores": [0.9831213414669036], "scores_detail": [{"OK": 0.9704028964042664, "incorrect_none": 0.04205145686864853, "lack": 0.0007874675211496651, "limited": 0.0003119863977190107, "non_fluent": 0.0002362923405598849, "untruth": 0.0013080810895189643}], "sentence": "こんにちは"}]
    ```
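
Each output line is a JSON array with one object per sentence of the input utterance. The following is a minimal sketch for reading that output in Python. It assumes only the field names visible in the sample above (`scuds_nbest`, `scores`, `sentence`); the helper name `best_scuds` is hypothetical and not part of Pilota, and the sample record is the second output line with `scores_detail` omitted for brevity.

```python
import json
from typing import Iterable, Iterator


def best_scuds(lines: Iterable[str]) -> Iterator[tuple[str, list[str], float]]:
    """Yield (sentence, first SCUD candidate list, its score) per sentence."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # One line per request; each line holds one entry per sentence.
        for entry in json.loads(line):
            # Index 0 is the first n-best entry (the only one with --nbest 1).
            yield entry["sentence"], entry["scuds_nbest"][0], entry["scores"][0]


if __name__ == "__main__":
    sample = (
        '[{"scuds_nbest": [[]], "original_ranks": [0], '
        '"scores": [0.9831213414669036], "sentence": "こんにちは"}]'
    )
    for sentence, scuds, score in best_scuds([sample]):
        print(f"{score:.3f}\t{sentence}\t{scuds}")
```

An empty SCUD list, as for "こんにちは", means that no SCUD was generated for that sentence.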

The -m option also accepts the path to a local model.

    pilota -m /path/to/model --batch_size 1 --ol 60 < input.jsonl

Run pilota -h to see the other options.
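
The request records from step 1 can also be written directly from Python. The sketch below rebuilds the same JSON Lines format using only the standard library. It is inferred from the sample output of step 1 and is not taken from `pilota.convert.plain2request`; the function name `to_request` is hypothetical, and speaker names other than `agent` are an assumption.

```python
import json


def to_request(context: list[tuple[str, str]], utterance: str) -> str:
    """Serialize one request as a JSON line in the format shown in step 1.

    `context` is a list of (speaker name, text) pairs preceding the utterance.
    """
    record = {
        "context": [{"name": name, "text": text} for name, text in context],
        "utterance": utterance,
        "sentences": None,  # left as null, as in the sample output
        "meta": {},
    }
    # ensure_ascii=False keeps Japanese text readable instead of \u escapes.
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    print(to_request([("agent", "こんにちは")], "こんにちは"))
```

Writing one such line per request to `input.jsonl` should give a file that can be fed to `pilota` as in step 2.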

Models

Models are available on https://huggingface.co/megagonlabs/.

| Model | Input Context | Input Utterance | Output |
| --- | --- | --- | --- |
| megagonlabs/pilota_dialog | Dialog between a user looking for an accommodation and an agent | User's last utterance | SCUDs |
| megagonlabs/pilota_scud2query | (Not required) | Users' SCUDs | Queries for accommodation search |
| megagonlabs/pilota_hotel_review | (Not required) | Text of an accommodation review | SCUDs |

Once downloaded, the model will not be downloaded again. If you cancel the download of a model halfway through the first start-up, or if you need to update it to the latest version, please run with --check_model_update.

You can check the local paths of downloaded models with the following command.

    huggingface-cli scan-cache | grep ^megagonlabs
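
The same lookup can be approximated in Python without extra dependencies. This is a sketch under two assumptions that are not stated in this README: that the Hugging Face hub cache lives under `~/.cache/huggingface/hub` unless `HF_HUB_CACHE` or `HF_HOME` is set, and that model folders are named `models--<owner>--<name>`. The function names are hypothetical.

```python
import os
from pathlib import Path


def default_hub_cache() -> Path:
    """Return the assumed hub cache directory, honouring common overrides."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def cached_models(cache_dir: Path, owner: str = "megagonlabs") -> dict[str, Path]:
    """Map repo IDs such as 'megagonlabs/pilota_dialog' to their cache folders."""
    found: dict[str, Path] = {}
    for path in sorted(cache_dir.glob(f"models--{owner}--*")):
        if path.is_dir():
            # Folder names are assumed to look like 'models--<owner>--<name>'.
            _, repo_owner, name = path.name.split("--", 2)
            found[f"{repo_owner}/{name}"] = path
    return found


if __name__ == "__main__":
    for repo_id, path in cached_models(default_hub_cache()).items():
        print(f"{repo_id}\t{path}")
```

For authoritative sizes and revisions, the `huggingface-cli scan-cache` command above remains the better source.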

Documents

References

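  1. Yuta Hayashibe. Self-Contained Utterance Description Corpus for Japanese Dialog. Proc. of LREC, pp. 1249-1255. 2022. (LREC 2022)
  2. Yuta Hayashibe. 要約付き宿検索対話コーパス (Accommodation Search Dialog Corpus with Summaries). Proceedings of the 27th Annual Meeting of the Association for Natural Language Processing, pp. 340-344. 2021. (NLP 2021)
  3. Yuta Hayashibe. 発話とレビューに対する解釈文生成とトピック分類 (SCUD Generation and Topic Classification for Utterances and Reviews). Proceedings of the 29th Annual Meeting of the Association for Natural Language Processing, pp. 2013-2017. 2023. (NLP 2023)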

License

Apache License 2.0

Owner

  • Name: Megagon Labs
  • Login: megagonlabs
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this work in a project of yours and write about it, please cite our paper using the following citation data."
authors:
  - family-names: Hayashibe
    given-names: Yuta
title: pilota
url: https://github.com/megagonlabs/pilota
preferred-citation:
  type: conference-paper
  title: Self-Contained Utterance Description Corpus for Japanese Dialog
  authors:
    - family-names: Hayashibe
      given-names: Yuta
  collection-title: Proceedings of the 13th Language Resources and Evaluation Conference
  year: 2022
  month: 5
  publisher: 
    name: European Language Resources Association
  url: https://aclanthology.org/2022.lrec-1.133
  start: 1249
  end: 1255

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 87
  • Total Committers: 2
  • Avg Commits per committer: 43.5
  • Development Distribution Score (DDS): 0.115
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Yuta Hayashibe | y****a@h****p | 77 |
| dependabot[bot] | 4****] | 10 |

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 1
  • Total pull requests: 109
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 9 days
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.73
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 109
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • shirayu (1)
Pull Request Authors
  • dependabot[bot] (100)
Top Labels
Issue Labels
Type: Documentation (1)
Pull Request Labels
Type: Dependencies (100) python (62) javascript (19) github_actions (17)