llm-jp-eval
Science Score: 52.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: organization llm-jp has institutional domain (llm-jp.nii.ac.jp)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.2%) to scientific vocabulary
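How the indicators above combine into the 52.0% score is not documented in this report. A minimal sketch of one plausible scheme, assuming hypothetical equal weights for the binary indicators plus a vocabulary-similarity term (the actual weights clearly differ, since this sketch does not reproduce 52.0%):

```python
# Hypothetical scoring sketch; indicator names and weights are illustrative,
# not the report's actual method.
indicators = {
    "citation_cff": True,
    "codemeta_json": True,
    "zenodo_json": True,
    "doi_references": False,
    "publication_links": False,
    "academic_email": False,
    "institutional_owner": True,
    "joss_metadata": False,
}

def science_score(indicators: dict, vocab_similarity: float) -> float:
    # Equal weight per binary indicator, vocabulary similarity mixed in.
    binary = sum(indicators.values()) / len(indicators)
    return 0.8 * binary + 0.2 * vocab_similarity

print(round(science_score(indicators, 0.072) * 100, 1))  # 41.4 with these weights
```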
Repository
Basic Info
- Host: GitHub
- Owner: llm-jp
- License: apache-2.0
- Language: Python
- Default Branch: dev
- Size: 7.87 MB
Statistics
- Stars: 137
- Watchers: 12
- Forks: 41
- Open Issues: 6
- Releases: 10
Metadata Files
README.md
LLM-jp
This tool automatically:
- converts existing Japanese evaluation datasets into evaluation datasets for text-generation tasks (jaster)
- evaluates large language models across multiple datasets

For details on the supported datasets, see DATASET.md.

Installation is supported with uv or pip.

- With uv:
```bash
# See https://docs.astral.sh/uv/getting-started/installation/ for details
# install uv
$ curl -LsSf https://astral.sh/uv/install.sh | sh
$ uv sync
```

- With pip:

```bash
$ cd llm-jp-eval
$ pip install .
```
To use the Taskfile, install go-task:

- With pipx:

```bash
$ pipx install go-task-bin
```

- With uv:

```bash
$ uv tool install go-task-bin
```
Copy the config file template and the .env template:

```bash
$ cp configs/config_template.yaml configs/config.yaml
$ cp env.sample .env
```
Evaluation can be run either through the Taskfile or against a running vllm server.

Commonly used commands are defined in the Taskfile. Extra options can be passed through to the underlying command via CLI_ARGS, i.e. everything after `--` (e.g. `task <task> -- --optional_args`), or by editing config.yaml directly.
```bash
# setup eval & inference configs
$ cat << EOF >> configs/config.yaml
exporters:
  local:
    export_output_table: true
    output_top_n: 5
EOF
$ cat << EOF >> configs/vllm_inference.yaml
model:
  model: llm-jp/llm-jp-3-3.7b-instruct
tokenizer:
  pretrained_model_name_or_path: llm-jp/llm-jp-3-3.7b-instruct
EOF

# download llm-jp-eval-inference repository & build container for evaluation
$ task install
$ task eval_inference inference_config=configs/vllm_inference.yaml eval_config=configs/config.yaml

# if you want to evaluate non-commercial datasets as well, pass "-- --include_non_commercial" or revise config.yaml directly
$ task eval_inference inference_config=configs/vllm_inference.yaml eval_config=configs/config.yaml -- --include_non_commercial
```
Using a vllm server:

```bash
# download llm-jp-eval-inference repository & build container for evaluation
$ task install
$ cd llm-jp-eval-inference/inference-modules/vllm && uv run vllm serve llm-jp/llm-jp-3-3.7b-instruct &

# setup eval config
$ cat << EOF >> configs/config.yaml
exporters:
  local:
    export_output_table: true
    output_top_n: 5
online_inference_config:
  provider: vllm-openai
  max_concurrent: 4
  hostname: localhost:8000
  model_name: llm-jp/llm-jp-3-3.7b-instruct
generation_config:
  temperature: 0.0
EOF

$ task run_sandbox
$ uv run scripts/evaluate_llm.py eval --config configs/config.yaml
```
To preprocess datasets locally, give llm-jp-eval an output directory for the generated files:
```bash
$ uv run python scripts/preprocess_dataset.py \
  --dataset-name example_data \
  --output-dir /path/to/dataset_dir \
  --version-name dataset_version_name
```

The preprocessed evaluation data (jaster) is created under /path/to/dataset_dir.
Alternatively, use `task prepare`:

```bash
$ task prepare
$ task prepare dataset=example_data
$ task prepare dataset=all
```

`--version-name` is optional and defaults to the llm-jp-eval version.
Set `output_dir` in configs/config.yaml. A suffix is derived from a hash of the EvaluationConfig, and evaluation datasets are placed under `<output_dir>/datasets/<version>/evaluation/<split>`. If `inference_input_dir` is null, prompts are dumped to `prompts_<suffix>`.
```bash
$ uv run python scripts/evaluate_llm.py dump --config path/to/config.yaml
```
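The suffix mentioned above comes from hashing the evaluation config; the exact fields and hash function llm-jp-eval uses are not shown in this README, so the following is only an illustrative sketch of the idea (a deterministic serialization hashed to a short, stable suffix):

```python
import hashlib
import json

def config_suffix(config: dict, length: int = 8) -> str:
    # Serialize deterministically (sorted keys), then hash; a stable suffix
    # lets runs with identical configs share dumped prompt files.
    canonical = json.dumps(config, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]

cfg = {"model": "llm-jp/llm-jp-3-3.7b-instruct", "temperature": 0.0}
print(f"prompts_{config_suffix(cfg)}")
```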
**Dataset configuration (`datasets`)**

```yaml
datasets:
  - jamp
  - janli
  ...
categories:
  NLI:
    description: "Natural Language Inference"
    default_metric: exact_match
    # per-dataset metric overrides
    metrics: {}
    # metrics:
    #   wiki_reading: char_f1
    datasets:
      - jamp
      - janli
      - jnli
      - jsem
      - jsick
dataset_info_overrides:
  dataset_key:
    # attributes of OutputInfo in src/llm_jp_eval/jaster/base.py can be overridden
    attributes: override_value
```

In configs/config.yaml, point to the dataset config; only datasets listed under `datasets` are evaluated:

```yaml
eval_dataset_config_path: /path/to/dataset_config.yaml  # default: './eval_configs/all_datasets.yaml'
include_non_commercial: false
```
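Each category's `default_metric` applies to its datasets unless overridden in `metrics`. A minimal sketch of that resolution logic on a plain dict mirroring the config structure above (illustrative only, not llm-jp-eval's actual code; the `wiki_reading` override is hypothetical):

```python
# Illustrative: resolve the effective metric for each dataset in a category
# from default_metric plus optional per-dataset overrides.
category = {
    "default_metric": "exact_match",
    "metrics": {"wiki_reading": "char_f1"},  # per-dataset overrides
    "datasets": ["jamp", "janli", "wiki_reading"],
}

def effective_metrics(category: dict) -> dict:
    overrides = category.get("metrics") or {}
    return {
        name: overrides.get(name, category["default_metric"])
        for name in category["datasets"]
    }

print(effective_metrics(category))
# {'jamp': 'exact_match', 'janli': 'exact_match', 'wiki_reading': 'char_f1'}
```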
For inference, see the README of llm-jp/llm-jp-eval-inference. Supported backends:
- vLLM
- Transformers (TextGenerationPipeline)
- TensorRT-LLM (experimental)

Example:

```bash
$ git clone https://github.com/llm-jp/llm-jp-eval-inference
$ cd llm-jp-eval-inference/inference-modules/vllm && uv sync
$ cd llm-jp-eval-inference/inference-modules/vllm && uv run inference.py --config path/to/inference_config.yaml

# or serve the model with vllm
$ uv run vllm serve organization/model_name
```
Evaluation settings are managed through the EvaluationConfig used by llm-jp-eval. To export results to wandb, set the WANDB_API_KEY environment variable to your W&B API key.

```bash
$ CUDA_VISIBLE_DEVICES=0 uv run scripts/evaluate_llm.py eval --config config.yaml \
  model.pretrained_model_name_or_path=/path/to/model_dir \
  tokenizer.pretrained_model_name_or_path=/path/to/tokenizer_dir \
  dataset_dir=/path/to/dataset_dir
```
To evaluate already-saved inference results, pass the output directory to scripts/evaluate_llm.py via `inference_result_dir`. If wandb logging is enabled, the corresponding run is resumed.

Example:

```bash
$ CUDA_VISIBLE_DEVICES=0 uv run python scripts/evaluate_llm.py eval \
  --inference_result_dir=./llm-jp-eval-inference/inference-modules/vllm/outputs/llm-jp--llm-jp-13b-v2.0_vllm_yyyyMMdd_hhmmss/
```
To evaluate via an OpenAI-compatible HTTP server, add the following to the config:

```yaml
online_inference_config:
  provider: vllm-openai
  # maximum number of concurrent requests
  max_concurrent: 4
  hostname: localhost:8000
  model_name: llm-jp/llm-jp-3-3.7b-instruct
  generation_config:
    temperature: 0.0
```

Then run:

```bash
$ uv run scripts/evaluate_llm.py eval --config path/to/config.yaml
```
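llm-jp-eval sends the requests itself, but for debugging it can help to hit the vllm OpenAI-compatible endpoint manually. A sketch using only the standard library, assuming the server and model from the config above (the prompt text is made up):

```python
import json
from urllib import request

# Hypothetical manual request against vllm's OpenAI-compatible chat endpoint.
payload = {
    "model": "llm-jp/llm-jp-3-3.7b-instruct",
    "messages": [{"role": "user", "content": "質問: 日本の首都は?"}],
    "temperature": 0.0,  # matches generation_config above
}

def build_request(hostname: str) -> request.Request:
    return request.Request(
        f"http://{hostname}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("localhost:8000")
# request.urlopen(req) would return the completion once the server is running.
```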
llm-jp-eval uses Dify-sandbox for sandboxed code-execution evaluation. If you do not use Dify-sandbox, override the metrics in eval_configs/all_datasets.yaml:

```yaml
dataset_info_overrides:
  mbpp:
    # use code_exec instead of code_exec_sandbox
    metrics: ["code_exec", "pylint_check"]
```
Exporters
Exporter implementations live in src/llm_jp_eval/exporter. At least one exporter must be configured.

```yaml
exporters:
  # saves results as json under {output_dir}/results
  local:
    filename_format: result_{run_name}_{timestamp}.json
    export_output_table: true
    output_top_n: 5
  # exports results to WandB
  # requires W&B credentials
  wandb:
    export_output_table: true
    project: project_name
    entity: entity_name
    output_top_n: null  # null exports all rows
```
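The local exporter's `filename_format` is a template with `run_name` and `timestamp` placeholders. An illustrative expansion (the timestamp format is an assumption; llm-jp-eval's actual implementation may differ):

```python
from datetime import datetime

# Illustrative expansion of the local exporter's filename_format template.
filename_format = "result_{run_name}_{timestamp}.json"

def result_filename(run_name: str, now: datetime) -> str:
    # Hypothetical timestamp format: YYYYmmdd_HHMMSS.
    return filename_format.format(
        run_name=run_name,
        timestamp=now.strftime("%Y%m%d_%H%M%S"),
    )

print(result_filename("demo", datetime(2024, 3, 1, 12, 0, 0)))
# result_demo_20240301_120000.json
```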
To add a dataset, implement a loader under src/llm_jp_eval/jaster (e.g. example_data: src/llm_jp_eval/jaster/example_data.py), then copy eval_configs/all_datasets.yaml and point `eval_dataset_config_path: path/to/eval_dataset_config.yaml` at your copy.

```bash
$ uv run scripts/preprocess_dataset.py \
  --dataset-name example_data \
  --output-dir /path/to/dataset_dir \
  --version-name dataset_version_name
```

The preprocessed dataset is created under /path/to/dataset_dir.

Datasets are split into train, dev, and test.
- For datasets that provide only train and test splits, the train split is subdivided into train: dev = 9:1.
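The 9:1 subdivision described above can be sketched as follows (illustrative, not the project's actual implementation; the seed is a made-up choice for determinism):

```python
import random

def split_train_dev(examples, dev_ratio=0.1, seed=42):
    # Shuffle deterministically, then carve off the dev portion (train:dev = 9:1).
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_ratio)
    return shuffled[n_dev:], shuffled[:n_dev]

train, dev = split_train_dev(list(range(100)))
print(len(train), len(dev))  # 90 10
```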
jaster
jaster is the collection of evaluation datasets automatically converted by llm-jp-eval. Because jaster also provides training splits, evaluation scores of LLMs trained on jaster data may be unfairly inflated. llm-jp-eval itself is distributed under the Apache License 2.0; for the license of each dataset, see DATASET.md.
Contribution
- Report problems or suggestions via Issues.
- Code style is checked with pre-commit; run `uv run pre-commit run --all-files` before committing.
- Pull Requests: branch from dev and open the Pull Request against dev.
- dev is merged into main on release.
Owner
- Name: llm-jp
- Login: llm-jp
- Kind: organization
- Email: llm-jp@nii.ac.jp
- Location: Japan
- Website: https://llm-jp.nii.ac.jp/
- Repositories: 20
- Profile: https://github.com/llm-jp
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Han"
given-names: "Namgi"
- family-names: "植田"
given-names: "暢大"
- family-names: "大嶽"
given-names: "匡俊"
- family-names: "勝又"
given-names: "智"
- family-names: "鎌田"
given-names: "啓輔"
- family-names: "清丸"
given-names: "寛一"
- family-names: "児玉"
given-names: "貴志"
- family-names: "菅原"
given-names: "朔"
- family-names: "Chen"
given-names: "Bowen"
- family-names: "松田"
given-names: "寛"
- family-names: "宮尾"
given-names: "祐介"
- family-names: "村脇"
given-names: "有吾"
- family-names: "劉"
given-names: "弘毅"
title: "llm-jp-eval"
version: 1.3.0
url: "https://github.com/llm-jp/llm-jp-eval"
preferred-citation:
type: proceedings
authors:
- family-names: "Han"
given-names: "Namgi"
- family-names: "植田"
given-names: "暢大"
- family-names: "大嶽"
given-names: "匡俊"
- family-names: "勝又"
given-names: "智"
- family-names: "鎌田"
given-names: "啓輔"
- family-names: "清丸"
given-names: "寛一"
- family-names: "児玉"
given-names: "貴志"
- family-names: "菅原"
given-names: "朔"
- family-names: "Chen"
given-names: "Bowen"
- family-names: "松田"
given-names: "寛"
- family-names: "宮尾"
given-names: "祐介"
- family-names: "村脇"
given-names: "有吾"
- family-names: "劉"
given-names: "弘毅"
title: "llm-jp-eval: 日本語大規模言語モデルの自動評価ツール"
conference: "言語処理学会第30回年次大会 (NLP2024)"
month: 3
year: 2024
url: "https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/A8-2.pdf"
GitHub Events
Total
- Create event: 42
- Release event: 3
- Issues event: 4
- Watch event: 36
- Delete event: 36
- Issue comment event: 95
- Push event: 233
- Pull request review comment event: 103
- Pull request review event: 153
- Pull request event: 67
- Fork event: 4
Last Year
- Create event: 42
- Release event: 3
- Issues event: 4
- Watch event: 36
- Delete event: 36
- Issue comment event: 95
- Push event: 233
- Pull request review comment event: 103
- Pull request review event: 153
- Pull request event: 67
- Fork event: 4
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 16
- Total pull requests: 216
- Average time to close issues: about 1 month
- Average time to close pull requests: 6 days
- Total issue authors: 10
- Total pull request authors: 31
- Average comments per issue: 4.31
- Average comments per pull request: 2.03
- Merged pull requests: 155
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 1
- Pull requests: 41
- Average time to close issues: 6 months
- Average time to close pull requests: 6 days
- Issue authors: 1
- Pull request authors: 8
- Average comments per issue: 1.0
- Average comments per pull request: 0.9
- Merged pull requests: 26
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- hiroshi-matsuda-rit (7)
- odashi (2)
- otakumesi (1)
- namgiH (1)
- hkiyomaru (1)
- ohashi3399 (1)
- AkimParis (1)
- olachinkei (1)
- Taichi-Ibi (1)
- YumaTsuta (1)
- kenoharada (1)
Pull Request Authors
- namgiH (59)
- Taichi-Ibi (22)
- e-mon (19)
- olachinkei (19)
- hiroshi-matsuda-rit (12)
- liwii (9)
- polm (8)
- t0-0 (7)
- corochann (7)
- nobu-g (5)
- Taka008 (5)
- AkimfromParis (4)
- shintaro-ozaki (4)
- niboshi (4)
- Hakuyume (3)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 9
proxy.golang.org: github.com/llm-jp/llm-jp-eval
- Documentation: https://pkg.go.dev/github.com/llm-jp/llm-jp-eval#section-documentation
- License: apache-2.0
- Latest release: v2.0.0+incompatible (published 9 months ago)
Rankings
Dependencies
- actions/checkout v4 composite
- actions/setup-python v4 composite
- actions/checkout v4 composite
- actions/setup-python v4 composite
- stefanzweifel/git-auto-commit-action v5 composite
- actions/checkout v4 composite
- actions/setup-python v4 composite
- accelerate 0.23.0
- aiohttp 3.8.6
- aiosignal 1.3.1
- annotated-types 0.6.0
- antlr4-python3-runtime 4.9.3
- anyio 3.7.1
- async-timeout 4.0.3
- attrs 23.1.0
- bitsandbytes 0.41.1
- certifi 2023.7.22
- charset-normalizer 3.3.0
- cmake 3.27.7
- colorama 0.4.6
- dataclasses-json 0.6.1
- exceptiongroup 1.1.3
- filelock 3.12.4
- frozenlist 1.4.0
- fsspec 2023.9.2
- fuzzywuzzy 0.18.0
- greenlet 3.0.0
- huggingface-hub 0.17.3
- hydra-core 1.3.2
- idna 3.4
- jinja2 3.1.2
- joblib 1.3.2
- jsonpatch 1.33
- jsonpointer 2.4
- langchain 0.0.300
- langsmith 0.0.44
- levenshtein 0.21.1
- lit 17.0.3
- lxml 4.9.3
- markupsafe 2.1.3
- marshmallow 3.20.1
- mpmath 1.3.0
- multidict 6.0.4
- mypy-extensions 1.0.0
- networkx 3.1
- numexpr 2.8.7
- numpy 1.25.2
- nvidia-cublas-cu11 11.10.3.66
- nvidia-cuda-cupti-cu11 11.7.101
- nvidia-cuda-nvrtc-cu11 11.7.99
- nvidia-cuda-runtime-cu11 11.7.99
- nvidia-cudnn-cu11 8.5.0.96
- nvidia-cufft-cu11 10.9.0.58
- nvidia-curand-cu11 10.2.10.91
- nvidia-cusolver-cu11 11.4.0.1
- nvidia-cusparse-cu11 11.7.4.91
- nvidia-nccl-cu11 2.14.3
- nvidia-nvtx-cu11 11.7.91
- omegaconf 2.3.0
- packaging 23.2
- peft 0.5.0
- plac 1.4.0
- portalocker 2.8.2
- psutil 5.9.6
- pydantic 2.4.2
- pydantic-core 2.10.1
- python-levenshtein 0.21.1
- pywin32 306
- pyyaml 6.0.1
- rapidfuzz 3.4.0
- regex 2023.10.3
- requests 2.31.0
- sacrebleu 2.3.1
- safetensors 0.4.0
- scikit-learn 1.3.1
- scipy 1.9.3
- setuptools 68.2.2
- sniffio 1.3.0
- sqlalchemy 2.0.22
- sumeval 0.2.2
- sympy 1.12
- tabulate 0.9.0
- tenacity 8.2.3
- threadpoolctl 3.2.0
- tokenizers 0.14.1
- torch 2.0.0
- tqdm 4.66.1
- transformers 4.34.0
- triton 2.0.0
- typing-extensions 4.8.0
- typing-inspect 0.9.0
- urllib3 2.0.7
- wheel 0.41.2
- xmltodict 0.13.0
- yarl 1.9.2
- accelerate ^0.23.0
- bitsandbytes >0.40.0
- fuzzywuzzy ^0.18.0
- hydra-core ^1.3.2
- langchain ^0.0.300
- peft ^0.5.0
- python ^3.9
- python-levenshtein ^0.21.1
- scikit-learn ^1.3.1
- sumeval ^0.2.2
- tokenizers >=0.14.0
- torch 2.0.0
- transformers >=4.34.0
- xmltodict ^0.13.0
- accelerate ==0.23.0
- aiohttp ==3.8.5
- aiosignal ==1.3.1
- annotated-types ==0.5.0
- antlr4-python3-runtime ==4.9.3
- anyio ==3.7.1
- async-timeout ==4.0.3
- attrs ==23.1.0
- certifi ==2023.7.22
- charset-normalizer ==3.3.0
- colorama ==0.4.6
- dataclasses-json ==0.6.1
- exceptiongroup ==1.1.3
- filelock ==3.12.4
- frozenlist ==1.4.0
- fsspec ==2023.9.2
- fuzzywuzzy ==0.18.0
- greenlet ==3.0.0
- huggingface-hub ==0.16.4
- hydra-core ==1.3.2
- idna ==3.4
- jinja2 ==3.1.2
- joblib ==1.3.2
- jsonpatch ==1.33
- jsonpointer ==2.4
- langchain ==0.0.300
- langsmith ==0.0.42
- levenshtein ==0.21.1
- lxml ==4.9.3
- markupsafe ==2.1.3
- marshmallow ==3.20.1
- mpmath ==1.3.0
- multidict ==6.0.4
- mypy-extensions ==1.0.0
- networkx ==3.1
- numexpr ==2.8.7
- numpy ==1.25.2
- omegaconf ==2.3.0
- packaging ==23.2
- peft ==0.5.0
- plac ==1.4.0
- portalocker ==2.8.2
- psutil ==5.9.5
- pydantic ==2.4.2
- pydantic-core ==2.10.1
- python-levenshtein ==0.21.1
- pywin32 ==306
- pyyaml ==6.0.1
- rapidfuzz ==3.3.1
- regex ==2023.10.3
- requests ==2.31.0
- sacrebleu ==2.3.1
- safetensors ==0.3.3
- scikit-learn ==1.3.1
- scipy ==1.9.3
- sniffio ==1.3.0
- sqlalchemy ==2.0.21
- sumeval ==0.2.2
- sympy ==1.12
- tabulate ==0.9.0
- tenacity ==8.2.3
- threadpoolctl ==3.2.0
- tokenizers ==0.14.0
- torch ==2.1.0
- tqdm ==4.66.1
- transformers ==4.34.0
- typing-extensions ==4.8.0
- typing-inspect ==0.9.0
- urllib3 ==2.0.6
- xmltodict ==0.13.0
- yarl ==1.9.2