Science Score: 52.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization llm-jp has institutional domain (llm-jp.nii.ac.jp)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: llm-jp
  • License: apache-2.0
  • Language: Python
  • Default Branch: dev
  • Size: 7.87 MB
Statistics
  • Stars: 137
  • Watchers: 12
  • Forks: 41
  • Open Issues: 6
  • Releases: 10
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

llm-jp-eval

This tool automatically evaluates Japanese large language models across multiple datasets. It provides:

  • Conversion of existing Japanese evaluation data into evaluation datasets for text-generation tasks (jaster)
  • Execution of evaluation across multiple datasets in a single run

For details on the supported datasets (jaster), see DATASET.md.

  1. Install dependencies with uv or pip

     • with uv:

       ```bash
       # see: https://docs.astral.sh/uv/getting-started/installation/
       # install uv
       $ curl -LsSf https://astral.sh/uv/install.sh | sh
       $ uv sync
       ```

     • with pip:

       ```bash
       $ cd llm-jp-eval
       $ pip install .
       ```

  2. Install Taskfile

     • with pipx: $ pipx install go-task-bin
     • with uv: $ uv tool install go-task-bin

  3. Copy the config template and .env:

     ```bash
     $ cp configs/config_template.yaml configs/config.yaml
     $ cp env.sample .env
     ```

Evaluation can then be run in one of two ways:

  1. via Taskfile
  2. via a vllm server

Taskfile

When using the Taskfile, extra options are forwarded to the underlying scripts via CLI_ARGS; append them after a -- separator (e.g. -- --optional_args), or edit config.yaml directly.

```bash
# setup eval & inference configs
$ cat << EOF >> configs/config.yaml
exporters:
  local:
    export_output_table: true
    output_top_n: 5
EOF
$ cat << EOF >> configs/vllm_inference.yaml
model:
  model: llm-jp/llm-jp-3-3.7b-instruct

tokenizer:
  pretrained_model_name_or_path: llm-jp/llm-jp-3-3.7b-instruct
EOF

# download llm-jp-eval-inference repository & build container for evaluation
$ task install
$ task eval_inference inference_config=configs/vllm_inference.yaml eval_config=configs/config.yaml

# if you want to evaluate non-commercial datasets as well, pass "-- --include_non_commercial" or revise config.yaml directly
$ task eval_inference inference_config=configs/vllm_inference.yaml eval_config=configs/config.yaml -- --include_non_commercial
```

vllm server

```bash
# download llm-jp-eval-inference repository & build container for evaluation
$ task install
$ cd llm-jp-eval-inference/inference-modules/vllm && uv run vllm serve llm-jp/llm-jp-3-3.7b-instruct &

# setup eval config
$ cat << EOF >> configs/config.yaml
exporters:
  local:
    export_output_table: true
    output_top_n: 5

online_inference_config:
  provider: vllm-openai
  max_concurrent: 4
  hostname: localhost:8000
  model_name: llm-jp/llm-jp-3-3.7b-instruct
  generation_config:
    temperature: 0.0
EOF

$ task run_sandbox

$ uv run scripts/evaluate_llm.py eval --config configs/config.yaml
```

llm-jp-eval downloads the evaluation datasets and stores the preprocessed local files under the directory given by --output-dir:

```bash
$ uv run python scripts/preprocess_dataset.py \
  --dataset-name example_data \
  --output-dir /path/to/dataset_dir \
  --version-name dataset_version_name
```

This creates the evaluation data and instruction data (jaster) under /path/to/dataset_dir.

Datasets can also be prepared with task prepare:

```bash
$ task prepare

$ task prepare dataset=example_data

$ task prepare dataset=all
```

By default, --version-name is set to the llm-jp-eval version.

configs/config.yaml controls the output_dir; a suffix for the dump is derived from a hash of the EvaluationConfig. When inference_input_dir is null, prompts are dumped to <output_dir>/datasets/<version>/evaluation/<split>/prompts_<suffix>:

```bash
$ uv run python scripts/evaluate_llm.py dump --config path/to/config.yaml
```
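The hash-derived suffix can be pictured with a minimal sketch; the hashing scheme and the names below (short_hash, the sample config) are illustrative assumptions, not llm-jp-eval's actual implementation:

```python
import hashlib
import json

def short_hash(config: dict) -> str:
    # Serialize with sorted keys so logically equal configs hash identically,
    # then keep a short hex prefix suitable for a directory suffix.
    canonical = json.dumps(config, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]

config = {"datasets": ["jamp", "janli"], "split": "test"}
print(f"prompts_{short_hash(config)}")  # a dump directory name like prompts_<8 hex chars>
```

Hashing a canonical serialization means re-running with an unchanged config reuses the same dump directory, while any config change produces a fresh one.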

**Configuring the datasets and categories to evaluate**

```yaml
datasets:
  - jamp
  - janli
  ...
categories:
  NLI:
    description: "Natural Language Inference"
    default_metric: exact_match
    metrics: {}
    # per-dataset metric overrides, e.g.:
    # metrics:
    #   wiki_reading: char_f1
    datasets:
      - jamp
      - janli
      - jnli
      - jsem
      - jsick
dataset_info_overrides:
  dataset_key:
    # attributes of OutputInfo in src/llm_jp_eval/jaster/base.py
    attributes: override_value
```
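How a category's default_metric interacts with per-dataset metrics overrides can be sketched as follows; resolve_metric is a hypothetical helper illustrating the fallback, not llm-jp-eval's actual code:

```python
def resolve_metric(dataset: str, category: dict) -> str:
    # Use the per-dataset override when present, otherwise the category default.
    return category.get("metrics", {}).get(dataset, category["default_metric"])

nli = {
    "description": "Natural Language Inference",
    "default_metric": "exact_match",
    "metrics": {"wiki_reading": "char_f1"},  # per-dataset override (illustrative)
    "datasets": ["jamp", "janli", "jnli", "jsem", "jsick"],
}
print(resolve_metric("jnli", nli))          # exact_match
print(resolve_metric("wiki_reading", nli))  # char_f1
```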

```yaml
# configs/config.yaml
eval_dataset_config_path: /path/to/dataset_config.yaml  # default: './eval_configs/all_datasets.yaml'

# whether to also evaluate datasets with non-commercial licenses
include_non_commercial: false
```

For inference, use llm-jp/llm-jp-eval-inference (see its README). Example:

```bash
$ git clone https://github.com/llm-jp/llm-jp-eval-inference
$ cd llm-jp-eval-inference/inference-modules/vllm && uv sync

$ cd llm-jp-eval-inference/inference-modules/vllm && uv run inference.py --config path/to/inference_config.yaml

# or serve the model with vllm serve
$ uv run vllm serve organization/model_name
```

Evaluation settings are managed through the config file (an EvaluationConfig in llm-jp-eval).

When saving results with wandb, set the WANDB_API_KEY environment variable to your WandB API key in advance.
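For example, the key can be exported in the shell before running evaluation (the value below is a placeholder; the .env file created from env.sample during setup can hold it instead):

```shell
# placeholder value -- substitute your actual W&B API key
export WANDB_API_KEY="dummy-key-for-illustration"
echo "WANDB_API_KEY set: ${WANDB_API_KEY:+yes}"  # prints: WANDB_API_KEY set: yes
```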

```bash
$ CUDA_VISIBLE_DEVICES=0 uv run scripts/evaluate_llm.py eval --config config.yaml \
  model.pretrained_model_name_or_path=/path/to/model_dir \
  tokenizer.pretrained_model_name_or_path=/path/to/tokenizer_dir \
  dataset_dir=/path/to/dataset_dir
```

scripts/evaluate_llm.py can also evaluate inference results saved offline; if those results were logged to wandb, the corresponding run is resumed.

Example:

```bash
$ CUDA_VISIBLE_DEVICES=0 uv run python scripts/evaluate_llm.py eval \
  --inference_result_dir=./llm-jp-eval-inference/inference-modules/vllm/outputs/llm-jp--llm-jp-13b-v2.0_vllm_yyyyMMdd_hhmmss/
```

Evaluation via an OpenAI-compatible HTTP server (e.g. vllm serve):

Config:

```yaml
online_inference_config:
  provider: vllm-openai
  max_concurrent: 4
  hostname: localhost:8000
  model_name: llm-jp/llm-jp-3-3.7b-instruct
  generation_config:
    temperature: 0.0
```

Run:

```bash
$ uv run scripts/evaluate_llm.py eval --config path/to/config.yaml
```

llm-jp-eval uses Dify-sandbox to execute model-generated code safely. To run code-execution metrics without the sandbox, override the metrics in eval_configs/all_datasets.yaml:

```yaml
dataset_info_overrides:
  mbpp:
    # use code_exec instead of code_exec_sandbox
    metrics: ["code_exec", "pylint_check"]
```

Exporters

Output destinations are implemented in src/llm_jp_eval/exporter; at least one exporter must be configured under exporters:

```yaml
exporters:
  # saves results as JSON files under output_dir/results
  local:
    filename_format: result_{run_name}_{timestamp}.json
    export_output_table: true
    output_top_n: 5
  # exports to W&B
  wandb:
    export_output_table: true
    project: project_name
    entity: entity_name
    output_top_n: null  # null exports all rows
```
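The filename_format entry is a plain placeholder template; a quick sketch of how the local exporter's pattern expands (the run name and timestamp format below are made-up illustrative values):

```python
from datetime import datetime

filename_format = "result_{run_name}_{timestamp}.json"

# Illustrative values; the exporter fills these in at export time.
filename = filename_format.format(
    run_name="llm-jp-3-3.7b-instruct",
    timestamp=datetime(2024, 3, 1, 12, 0, 0).strftime("%Y%m%d_%H%M%S"),
)
print(filename)  # result_llm-jp-3-3.7b-instruct_20240301_120000.json
```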

For configuration options not covered in this README, see the EvaluationConfig definition.

  1. To add a dataset, implement a preprocessor under src/llm_jp_eval/jaster (see example_data):

     • add src/llm_jp_eval/jaster/example_data.py
     • register the dataset in eval_configs/all_datasets.yaml, or point eval_dataset_config_path: path/to/eval_dataset_config.yaml at your own dataset config

  2. Preprocess the dataset:

     ```bash
     $ uv run scripts/preprocess_dataset.py \
       --dataset-name example_data \
       --output-dir /path/to/dataset_dir \
       --version-name dataset_version_name
     ```

     This creates the evaluation and instruction data under /path/to/dataset_dir.

  • Datasets are split into train, dev, and test where the source data provides them.
  • For datasets that provide only train and test splits, the train split is further divided into train : dev = 9 : 1.
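The 9:1 train/dev split can be sketched as follows (illustrative only; the real splitting lives in the jaster preprocessors and may order or shuffle samples differently):

```python
def split_train_dev(samples: list, dev_ratio: float = 0.1) -> tuple[list, list]:
    # Hold out the last dev_ratio fraction as dev; the rest stays train.
    n_dev = int(len(samples) * dev_ratio)
    cut = len(samples) - n_dev
    return samples[:cut], samples[cut:]

train, dev = split_train_dev(list(range(100)))
print(len(train), len(dev))  # 90 10
```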

jaster

jaster is the instruction-format data that llm-jp-eval generates from the evaluation datasets. Note that LLMs trained on jaster can obtain unfairly high llm-jp-eval scores, so exercise caution when using it as training data.

This tool is distributed under the Apache License 2.0. For the license of each dataset, see DATASET.md.

Contribution

  • For problems or suggestions, please report them in an Issue.
  • This repository uses pre-commit:
    • run pre-commit run --all-files (or uv run pre-commit run --all-files) to format and check your changes before committing
  • Pull Requests:
    • create a branch from dev and open the Pull Request against dev
    • Pull Requests are merged after review
    • dev is merged into main at release timing

Owner

  • Name: llm-jp
  • Login: llm-jp
  • Kind: organization
  • Email: llm-jp@nii.ac.jp
  • Location: Japan

Citation (CITATION.cff)

cff-version: 1.3.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Han"
  given-names: "Namgi"
- family-names: "植田"
  given-names: "暢大"
- family-names: "大嶽"
  given-names: "匡俊"
- family-names: "勝又"
  given-names: "智"
- family-names: "鎌田"
  given-names: "啓輔"
- family-names: "清丸"
  given-names: "寛一"
- family-names: "児玉"
  given-names: "貴志"
- family-names: "菅原"
  given-names: "朔"
- family-names: "Chen"
  given-names: "Bowen"
- family-names: "松田"
  given-names: "寛"
- family-names: "宮尾"
  given-names: "祐介"
- family-names: "村脇"
  given-names: "有吾"
- family-names: "劉"
  given-names: "弘毅"
title: "llm-jp-eval"
version: 1.3.0
url: "https://github.com/llm-jp/llm-jp-eval"
preferred-citation:
  type: proceedings
  authors:
  - family-names: "Han"
    given-names: "Namgi"
  - family-names: "植田"
    given-names: "暢大"
  - family-names: "大嶽"
    given-names: "匡俊"
  - family-names: "勝又"
    given-names: "智"
  - family-names: "鎌田"
    given-names: "啓輔"
  - family-names: "清丸"
    given-names: "寛一"
  - family-names: "児玉"
    given-names: "貴志"
  - family-names: "菅原"
    given-names: "朔"
  - family-names: "Chen"
    given-names: "Bowen"
  - family-names: "松田"
    given-names: "寛"
  - family-names: "宮尾"
    given-names: "祐介"
  - family-names: "村脇"
    given-names: "有吾"
  - family-names: "劉"
    given-names: "弘毅"
  title: "llm-jp-eval: 日本語大規模言語モデルの自動評価ツール"
  conference: "言語処理学会第30回年次大会 (NLP2024)"
  month: 3
  year: 2024
  url: "https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/A8-2.pdf"

GitHub Events

Total
  • Create event: 42
  • Release event: 3
  • Issues event: 4
  • Watch event: 36
  • Delete event: 36
  • Issue comment event: 95
  • Push event: 233
  • Pull request review comment event: 103
  • Pull request review event: 153
  • Pull request event: 67
  • Fork event: 4
Last Year
  • Create event: 42
  • Release event: 3
  • Issues event: 4
  • Watch event: 36
  • Delete event: 36
  • Issue comment event: 95
  • Push event: 233
  • Pull request review comment event: 103
  • Pull request review event: 153
  • Pull request event: 67
  • Fork event: 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 16
  • Total pull requests: 216
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 6 days
  • Total issue authors: 10
  • Total pull request authors: 31
  • Average comments per issue: 4.31
  • Average comments per pull request: 2.03
  • Merged pull requests: 155
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 1
  • Pull requests: 41
  • Average time to close issues: 6 months
  • Average time to close pull requests: 6 days
  • Issue authors: 1
  • Pull request authors: 8
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.9
  • Merged pull requests: 26
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • hiroshi-matsuda-rit (7)
  • odashi (2)
  • otakumesi (1)
  • namgiH (1)
  • hkiyomaru (1)
  • ohashi3399 (1)
  • AkimParis (1)
  • olachinkei (1)
  • Taichi-Ibi (1)
  • YumaTsuta (1)
  • kenoharada (1)
Pull Request Authors
  • namgiH (59)
  • Taichi-Ibi (22)
  • e-mon (19)
  • olachinkei (19)
  • hiroshi-matsuda-rit (12)
  • liwii (9)
  • polm (8)
  • t0-0 (7)
  • corochann (7)
  • nobu-g (5)
  • Taka008 (5)
  • AkimfromParis (4)
  • shintaro-ozaki (4)
  • niboshi (4)
  • Hakuyume (3)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (2)
  • github_actions (1)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 9
proxy.golang.org: github.com/llm-jp/llm-jp-eval
  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago

Dependencies

.github/workflows/lint.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
.github/workflows/requirements.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • stefanzweifel/git-auto-commit-action v5 composite
.github/workflows/test.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
poetry.lock pypi
  • accelerate 0.23.0
  • aiohttp 3.8.6
  • aiosignal 1.3.1
  • annotated-types 0.6.0
  • antlr4-python3-runtime 4.9.3
  • anyio 3.7.1
  • async-timeout 4.0.3
  • attrs 23.1.0
  • bitsandbytes 0.41.1
  • certifi 2023.7.22
  • charset-normalizer 3.3.0
  • cmake 3.27.7
  • colorama 0.4.6
  • dataclasses-json 0.6.1
  • exceptiongroup 1.1.3
  • filelock 3.12.4
  • frozenlist 1.4.0
  • fsspec 2023.9.2
  • fuzzywuzzy 0.18.0
  • greenlet 3.0.0
  • huggingface-hub 0.17.3
  • hydra-core 1.3.2
  • idna 3.4
  • jinja2 3.1.2
  • joblib 1.3.2
  • jsonpatch 1.33
  • jsonpointer 2.4
  • langchain 0.0.300
  • langsmith 0.0.44
  • levenshtein 0.21.1
  • lit 17.0.3
  • lxml 4.9.3
  • markupsafe 2.1.3
  • marshmallow 3.20.1
  • mpmath 1.3.0
  • multidict 6.0.4
  • mypy-extensions 1.0.0
  • networkx 3.1
  • numexpr 2.8.7
  • numpy 1.25.2
  • nvidia-cublas-cu11 11.10.3.66
  • nvidia-cuda-cupti-cu11 11.7.101
  • nvidia-cuda-nvrtc-cu11 11.7.99
  • nvidia-cuda-runtime-cu11 11.7.99
  • nvidia-cudnn-cu11 8.5.0.96
  • nvidia-cufft-cu11 10.9.0.58
  • nvidia-curand-cu11 10.2.10.91
  • nvidia-cusolver-cu11 11.4.0.1
  • nvidia-cusparse-cu11 11.7.4.91
  • nvidia-nccl-cu11 2.14.3
  • nvidia-nvtx-cu11 11.7.91
  • omegaconf 2.3.0
  • packaging 23.2
  • peft 0.5.0
  • plac 1.4.0
  • portalocker 2.8.2
  • psutil 5.9.6
  • pydantic 2.4.2
  • pydantic-core 2.10.1
  • python-levenshtein 0.21.1
  • pywin32 306
  • pyyaml 6.0.1
  • rapidfuzz 3.4.0
  • regex 2023.10.3
  • requests 2.31.0
  • sacrebleu 2.3.1
  • safetensors 0.4.0
  • scikit-learn 1.3.1
  • scipy 1.9.3
  • setuptools 68.2.2
  • sniffio 1.3.0
  • sqlalchemy 2.0.22
  • sumeval 0.2.2
  • sympy 1.12
  • tabulate 0.9.0
  • tenacity 8.2.3
  • threadpoolctl 3.2.0
  • tokenizers 0.14.1
  • torch 2.0.0
  • tqdm 4.66.1
  • transformers 4.34.0
  • triton 2.0.0
  • typing-extensions 4.8.0
  • typing-inspect 0.9.0
  • urllib3 2.0.7
  • wheel 0.41.2
  • xmltodict 0.13.0
  • yarl 1.9.2
pyproject.toml pypi
  • accelerate ^0.23.0
  • bitsandbytes >0.40.0
  • fuzzywuzzy ^0.18.0
  • hydra-core ^1.3.2
  • langchain ^0.0.300
  • peft ^0.5.0
  • python ^3.9
  • python-levenshtein ^0.21.1
  • scikit-learn ^1.3.1
  • sumeval ^0.2.2
  • tokenizers >=0.14.0
  • torch 2.0.0
  • transformers >=4.34.0
  • xmltodict ^0.13.0
requirements.txt pypi
  • accelerate ==0.23.0
  • aiohttp ==3.8.5
  • aiosignal ==1.3.1
  • annotated-types ==0.5.0
  • antlr4-python3-runtime ==4.9.3
  • anyio ==3.7.1
  • async-timeout ==4.0.3
  • attrs ==23.1.0
  • certifi ==2023.7.22
  • charset-normalizer ==3.3.0
  • colorama ==0.4.6
  • dataclasses-json ==0.6.1
  • exceptiongroup ==1.1.3
  • filelock ==3.12.4
  • frozenlist ==1.4.0
  • fsspec ==2023.9.2
  • fuzzywuzzy ==0.18.0
  • greenlet ==3.0.0
  • huggingface-hub ==0.16.4
  • hydra-core ==1.3.2
  • idna ==3.4
  • jinja2 ==3.1.2
  • joblib ==1.3.2
  • jsonpatch ==1.33
  • jsonpointer ==2.4
  • langchain ==0.0.300
  • langsmith ==0.0.42
  • levenshtein ==0.21.1
  • lxml ==4.9.3
  • markupsafe ==2.1.3
  • marshmallow ==3.20.1
  • mpmath ==1.3.0
  • multidict ==6.0.4
  • mypy-extensions ==1.0.0
  • networkx ==3.1
  • numexpr ==2.8.7
  • numpy ==1.25.2
  • omegaconf ==2.3.0
  • packaging ==23.2
  • peft ==0.5.0
  • plac ==1.4.0
  • portalocker ==2.8.2
  • psutil ==5.9.5
  • pydantic ==2.4.2
  • pydantic-core ==2.10.1
  • python-levenshtein ==0.21.1
  • pywin32 ==306
  • pyyaml ==6.0.1
  • rapidfuzz ==3.3.1
  • regex ==2023.10.3
  • requests ==2.31.0
  • sacrebleu ==2.3.1
  • safetensors ==0.3.3
  • scikit-learn ==1.3.1
  • scipy ==1.9.3
  • sniffio ==1.3.0
  • sqlalchemy ==2.0.21
  • sumeval ==0.2.2
  • sympy ==1.12
  • tabulate ==0.9.0
  • tenacity ==8.2.3
  • threadpoolctl ==3.2.0
  • tokenizers ==0.14.0
  • torch ==2.1.0
  • tqdm ==4.66.1
  • transformers ==4.34.0
  • typing-extensions ==4.8.0
  • typing-inspect ==0.9.0
  • urllib3 ==2.0.6
  • xmltodict ==0.13.0
  • yarl ==1.9.2