instruction_ja

Japanese instruction data (日本語指示データ)

https://github.com/megagonlabs/instruction_ja

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.2%) to scientific vocabulary

Keywords

corpus japanese-language
Last synced: 6 months ago

Repository

Japanese instruction data (日本語指示データ)

Basic Info
  • Host: GitHub
  • Owner: megagonlabs
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 2.26 MB
Statistics
  • Stars: 24
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
corpus japanese-language
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

megagonlabs/instruction_ja


This repository provides natural dialogs in Japanese. They are manual modifications of kunishou/hh-rlhf-49k-ja, which was machine-translated from anthropics/hh-rlhf into Japanese. Therefore, they are not necessarily faithful translations of the original English data.

The dataset currently contains 669 dialogs.

Examples

| Item | Content |
| --- | --- |
| Q | MacBook 13インチを使っています。バッテリーは、通常どのくらいもちますか? |
| A | ユーザーマニュアルには、「完全に充電されたバッテリーは、通常の電力レベルで約7時間の使用が可能です。」とあります。ただし実際には、バッテリーの使用時間は使用パターンによって大きく異なります。 |
| Q (original) | I have an M1 macbook 13 inch. How long is my battery typically supposed to last before needing service |
| A (original) | The user manual says that “A fully charged battery should provide approximately seven hours of use at normal power levels. The battery’s charge will decrease over time” (link). In fact, though, battery life varies quite a bit based on your usage patterns. |
| Q (machine translation) | M1 MacBook 13インチがあります。私のバッテリーは、サービスを必要とする前に通常どのくらい続くことになっていますか |
| A (machine translation) | ユーザーマニュアルには、「完全に充電されたバッテリーは、通常の電力レベルで約7時間の使用を提供する必要があります。バッテリーの充電は時間とともに減少します」(リンク)。実際、バッテリー寿命は使用パターンに基づいて大きく異なります。 |

| Item | Content |
| --- | --- |
| Q | ツナサラダはどのように作りますか? |
| A | ツナサラダの材料は、ツナ缶1缶、ガーキン(小さいキュウリのピクルス)1瓶、マヨネーズ大さじ1杯、塩ひとつまみ、黒コショウ少々です。これらを全て混ぜ合わせると完成です。 |
| Q (original) | How do you make tuna salad? |
| A (original) | To make tuna salad, you will need: 1 can of tuna, 1 jar of gherkins, 1 cup of mayonnaise, a pinch of salt, and black pepper. You will also need a cutting board and a chef’s knife. |
| Q (machine translation) | マグロサラダはどのように作りますか? |
| A (machine translation) | マグロサラダを作るには、1缶のマグロ、ガーキン1瓶、マヨネーズ1杯、塩のピンチ、黒胡pepperが必要です。また、まな板とシェフのナイフも必要です。 |

Files

  • data/data.jsonl
    • Manually modified dialogs
  • data/skipped.jsonl
    • Dialogs skipped during manual modification

Data format

The data format is described in src/schema.py.
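
As a rough illustration of how the `.jsonl` files listed under Files might be inspected, the sketch below reads a JSON Lines file and reports the record count and the top-level keys it finds. It assumes each non-empty line is a single JSON object; it does not rely on any field names, since those are defined in src/schema.py and are not reproduced here. The helper names `iter_jsonl` and `summarize` are hypothetical and not part of the repository.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line (assumes one object per line)."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(path: Path) -> tuple[int, list[str]]:
    """Return the record count and the sorted top-level keys seen across all records."""
    count = 0
    keys: set[str] = set()
    for record in iter_jsonl(path):
        count += 1
        keys.update(record)  # adds the record's top-level keys
    return count, sorted(keys)


if __name__ == "__main__":
    # Path as listed under Files; run from the repository root.
    data_path = Path("data/data.jsonl")
    if data_path.exists():
        n, keys = summarize(data_path)
        print(f"{n} dialogs; top-level keys: {keys}")
```

Listing the keys this way is only a quick sanity check; src/schema.py remains the authoritative description of the format.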

License

MIT

Citation

@misc{megagonlabs_instruction_ja,
    title={{megagonlabs/instruction_ja}: Japanese instructions data for LLM},
    url={https://github.com/megagonlabs/instruction_ja},
    author={Yuta Hayashibe},
    year={2023},
}

Owner

  • Name: Megagon Labs
  • Login: megagonlabs
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this work in a project of yours and write about it, please cite our paper using the following citation data."
authors:
  - family-names: Hayashibe
    given-names: Yuta
title: "megagonlabs/instruction_ja: Japanese instructions data for LLM"
url: https://github.com/megagonlabs/instruction_ja

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 25
  • Total Committers: 1
  • Avg Commits per committer: 25.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Yuta Hayashibe | y****a@h****p | 25 |

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0