Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.7%) to scientific vocabulary
Repository
✈ SCUD generator (解釈文生成器, "interpretation sentence generator")
Statistics
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 16
- Releases: 6
README.md
✈ Pilota: SCUD generator
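| Name | Input Utterance | Output SCUD |
| --- | --- | --- |
| Agent | 今回の旅行はどういったご旅行でしょうか? (What kind of trip are you planning?) | - |
| User | 家族で一泊して、USJに行こうと思ってます。 (I'm planning to stay one night with my family and go to USJ.) | 今回の旅行は家族で一泊して、USJに行く。 (This trip is a one-night stay with family to visit USJ.) |
| Agent | なるほど、ホテルはもうお決まりですか? (I see. Have you already decided on a hotel?) | - |
| User | まだです。 (Not yet.) | ホテルはまだ決まっていない。 (The hotel has not been decided yet.) |
| | ただ、近くが良いなとは思ってて。 (But I'd like it to be nearby.) | ホテルはUSJの近くが良い。 (A hotel near USJ is preferred.) |
| | 景色が良くて食事も美味しいところが良いです (I'd like a place with a nice view and good food.) | 景色が良いホテルが良い。<br>食事が美味しいホテルが良い。 (A hotel with a nice view is preferred. A hotel with good food is preferred.) |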

Quick start
Install
```bash
pip install -U 'pilota[ja-line] @ git+https://github.com/megagonlabs/pilota'
```
If you need a torch build that is compatible with your GPU, install the specific package as in the following step. See https://pytorch.org/ for details.
```bash
pip install -U torch --extra-index-url https://download.pytorch.org/whl/cu118
```
Run
Prepare inputs (Input Format and plain2request)
- Command
```bash
echo -e 'ご要望をお知らせください\tはい。部屋から富士山が見えて、夜景を見ながら食事のできるホテルがいいな。\nこんにちは\tこんにちは' | python -m pilota.convert.plain2request | tee input.jsonl
```
- Output
```jsonl
{"context": [{"name": "agent", "text": "ご要望をお知らせください"}], "utterance": "はい。部屋から富士山が見えて、夜景を見ながら食事のできるホテルがいいな。", "sentences": null, "meta": {}}
{"context": [{"name": "agent", "text": "こんにちは"}], "utterance": "こんにちは", "sentences": null, "meta": {}}
```
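The request format shown above can also be produced directly from Python. The following is a minimal sketch that mirrors the JSON shape in the sample output; it does not call `pilota.convert.plain2request`. The helper name `to_request` is hypothetical, and the speaker-alternation rule for inputs with more than one context field is an assumption, since the sample only shows a single agent turn.

```python
import json


def to_request(line: str) -> dict:
    """Turn one tab-separated line into a request dict (hypothetical helper).

    Assumption: the last field is the user utterance and every earlier
    field is context, with speakers alternating so that the field just
    before the utterance is spoken by the agent.
    """
    *context_texts, utterance = line.rstrip("\n").split("\t")
    names = ["agent", "user"]
    context = [
        # Count back from the utterance: one field back is the agent,
        # two fields back is the user, and so on.
        {"name": names[(len(context_texts) - 1 - i) % 2], "text": text}
        for i, text in enumerate(context_texts)
    ]
    return {"context": context, "utterance": utterance, "sentences": None, "meta": {}}


# Print one JSON object per line, keeping Japanese text unescaped.
for raw in ["こんにちは\tこんにちは"]:
    print(json.dumps(to_request(raw), ensure_ascii=False))
```

Writing the printed lines to `input.jsonl` gives a file in the same shape as the sample above.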
Feed it to Pilota
- Command
```console
pilota -m megagonlabs/pilota_dialog --batch_size 1 --outlen 60 --nbest 1 --beam 5 < input.jsonl
```
- Output
```jsonl
[{"scuds_nbest": [[]], "original_ranks": [0], "scores": [0.9911208689212798], "scores_detail": [{"OK": 0.9704028964042664, "incorrect_none": 0.04205145686864853, "lack": 0.0007874675211496651, "limited": 0.0003119863977190107, "non_fluent": 0.0002362923405598849, "untruth": 0.0013080810895189643}], "sentence": "はい。"}, {"scuds_nbest": [["部屋から富士山が見えるホテルが良い。", "夜景を見ながら食事のできるホテルが良い。"]], "original_ranks": [0], "scores": [0.9952289938926696], "scores_detail": [{"OK": 0.9840966463088989, "incorrect_none": 0.010280555114150047, "lack": 0.0032871251460164785, "limited": 0.00041511686868034303, "non_fluent": 0.0002954243100248277, "untruth": 0.003289491171017289}], "sentence": "部屋から富士山が見えて、夜景を見ながら食事のできるホテルがいいな。"}]
[{"scuds_nbest": [[]], "original_ranks": [0], "scores": [0.9831213414669036], "scores_detail": [{"OK": 0.9704028964042664, "incorrect_none": 0.04205145686864853, "lack": 0.0007874675211496651, "limited": 0.0003119863977190107, "non_fluent": 0.0002362923405598849, "untruth": 0.0013080810895189643}], "sentence": "こんにちは"}]
```
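Each output line is a JSON array with one object per sentence of the input utterance. The sketch below reads such a line and pulls out the sentence, its SCUDs, and the score, using only the field names visible in the sample output (`sentence`, `scuds_nbest`, `scores`). The function name `top_scuds` is hypothetical, and treating index 0 as the preferred candidate is an assumption.

```python
import json


def top_scuds(output_line: str) -> list[tuple[str, list[str], float]]:
    """Return (sentence, first SCUD candidate, its score) per sentence."""
    results = []
    for item in json.loads(output_line):
        # Assumption: index 0 is the preferred candidate in the n-best list.
        results.append((item["sentence"], item["scuds_nbest"][0], item["scores"][0]))
    return results


# A shortened, hand-written line in the same shape as the sample output.
sample = '[{"scuds_nbest": [[]], "scores": [0.99], "sentence": "はい。"}]'
for sentence, scuds, score in top_scuds(sample):
    print(sentence, scuds, score)
```

An empty SCUD list, as for "はい。" in the sample, means that no SCUD was generated for that sentence.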
The `-m` option also accepts paths to local models.
```bash
pilota -m /path/to/model --batch_size 1 --ol 60 < input.jsonl
```
See the other options with `pilota -h`.
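To call the command-line tool from a script, one option is to wrap it with `subprocess`. This is only a sketch: the helper names `build_command` and `run_pilota` are hypothetical, the flags are those used in the command shown earlier, and it assumes the `pilota` executable is on `PATH`.

```python
import subprocess


def build_command(model: str, batch_size: int = 1, outlen: int = 60,
                  nbest: int = 1, beam: int = 5) -> list[str]:
    """Assemble the argument list for the pilota command shown above."""
    return [
        "pilota", "-m", model,
        "--batch_size", str(batch_size),
        "--outlen", str(outlen),
        "--nbest", str(nbest),
        "--beam", str(beam),
    ]


def run_pilota(input_path: str, model: str = "megagonlabs/pilota_dialog") -> list[str]:
    """Feed a request JSONL file to pilota on stdin and return its output lines."""
    with open(input_path, encoding="utf-8") as f:
        proc = subprocess.run(
            build_command(model),
            stdin=f,
            capture_output=True,
            encoding="utf-8",
            check=True,  # raise CalledProcessError if pilota exits with an error
        )
    return proc.stdout.splitlines()


# Usage (requires pilota to be installed and input.jsonl to exist):
#     for line in run_pilota("input.jsonl"):
#         print(line)
```

Passing the arguments as a list avoids shell quoting issues with model paths.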
Models
Models are available on https://huggingface.co/megagonlabs/.
| Model | Input Context | Input Utterance | Output |
| --- | --- | --- | --- |
| megagonlabs/pilota_dialog | Dialog between a user looking for an accommodation and an agent | User's last utterance | SCUDs |
| megagonlabs/pilota_scud2query | (Not required) | Users' SCUDs | Queries for accommodation search |
| megagonlabs/pilota_hotel_review | (Not required) | Text of an accommodation review | SCUDs |
Once downloaded, the model will not be downloaded again.
If you cancel the download of a model halfway through the first start-up, or if you need to update it to the latest version, please run with --check_model_update.
You can check the local paths of downloaded models.
```bash
huggingface-cli scan-cache | grep ^megagonlabs
```
Documents
References
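- Yuta Hayashibe. Self-Contained Utterance Description Corpus for Japanese Dialog. Proc. of LREC, pp.1249-1255. (LREC 2022)
- Yuta Hayashibe (林部祐太). Accommodation Search Dialog Corpus with Summaries (要約付き宿検索対話コーパス). Proceedings of the 27th Annual Meeting of the Association for Natural Language Processing (言語処理学会第27回年次大会論文集), pp.340-344. 2021. (NLP 2021)
- Yuta Hayashibe (林部祐太). Interpretation Sentence Generation and Topic Classification for Utterances and Reviews (発話とレビューに対する解釈文生成とトピック分類). Proceedings of the 29th Annual Meeting of the Association for Natural Language Processing (言語処理学会第29回年次大会論文集), pp.2013-2017. 2023. (NLP 2023)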
License
Apache License 2.0
Owner
- Name: Megagon Labs
- Login: megagonlabs
- Kind: organization
- Website: https://www.megagon.ai
- Repositories: 23
- Profile: https://github.com/megagonlabs
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this work in a project of yours and write about it, please cite our paper using the following citation data."
authors:
  - family-names: Hayashibe
    given-names: Yuta
title: pilota
url: https://github.com/megagonlabs/pilota
preferred-citation:
  type: conference-paper
  title: Self-Contained Utterance Description Corpus for Japanese Dialog
  authors:
    - family-names: Hayashibe
      given-names: Yuta
  collection-title: Proceedings of the 13th Language Resources and Evaluation Conference
  year: 2022
  month: 5
  publisher:
    name: European Language Resources Association
  url: https://aclanthology.org/2022.lrec-1.133
  start: 1249
  end: 1255
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yuta Hayashibe | y****a@h****p | 77 |
| dependabot[bot] | 4****] | 10 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 1
- Total pull requests: 109
- Average time to close issues: about 1 month
- Average time to close pull requests: 9 days
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.73
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 109
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- shirayu (1)
Pull Request Authors
- dependabot[bot] (100)