llm-jp-eval
Modified llm-jp-eval with API and HF scripts for LFMs.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.5%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 1
- Watchers: 8
- Forks: 1
- Open Issues: 1
- Releases: 0
Topics
Metadata Files
README.md
Run Evaluation through vLLM API
Overview
- Run the model through vLLM with an OpenAI-compatible API. You can verify the endpoint first, as shown in the sketch after this list.
  - For Liquid models, run the on-prem stack, or use Liquid labs.
  - For other models, use the `run-vllm.sh` script, or use 3rd-party providers.
- Run the evaluation script with the model API endpoint and API key.
- The evaluation can be run with Docker (recommended) or locally without Docker.
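Before launching a long evaluation run, it can help to confirm that the OpenAI-compatible endpoint is reachable. A minimal sketch, assuming a server at `http://localhost:8000` and an `API_KEY` environment variable (both placeholders); `/v1/models` is part of the OpenAI-compatible API that vLLM serves:

```bash
# Placeholder URL and key; adjust to your deployment.
# Lists the models served by the OpenAI-compatible endpoint.
curl -s -H "Authorization: Bearer $API_KEY" \
  http://localhost:8000/v1/models
```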
Run Evaluation with Docker
```bash
bin/api/run_docker_eval.sh --config <config-file>.yaml \
  --model-name <model-name> \
  --model-url <model-url>/v1 \
  --model-api-key <API-KEY>
```
Examples
Run Swallow evaluation on lfm-3b-jp on-prem:
```bash
bin/api/run_docker_eval.sh --config config_api_swallow.yaml \
  --model-name lfm-3b-jp \
  --model-url http://localhost:8000/v1 \
  --model-api-key <API-KEY>

# output: ./results/swallow/lfm-3b-jp
```
Run Swallow evaluation on lfm-3b-ichikara on-prem:
```bash
bin/api/run_docker_eval.sh --config config_api_swallow.yaml \
  --model-name lfm-3b-ichikara \
  --model-url http://localhost:8000/v1 \
  --model-api-key <API-KEY>

# output: ./results/swallow/lfm-3b-ichikara
```
Run Nejumi evaluation on lfm-3b-jp on labs:
```bash
bin/api/run_docker_eval.sh --config config_api_nejumi.yaml \
  --model-name lfm-3b-jp \
  --model-url https://inference-1.liquid.ai/v1 \
  --model-api-key <API-KEY>

# output: ./results/nejumi/lfm-3b-jp
```
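The three examples above differ only in the config, model name, and URL, so multiple benchmarks can be batched. An illustrative sketch (not from this repository) that runs both documented configs against one on-prem model; `API_KEY` is a placeholder:

```bash
# Illustrative loop; the script, flags, and config names come from
# the examples above. API_KEY is a placeholder environment variable.
for config in config_api_swallow.yaml config_api_nejumi.yaml; do
  bin/api/run_docker_eval.sh --config "$config" \
    --model-name lfm-3b-jp \
    --model-url http://localhost:8000/v1 \
    --model-api-key "$API_KEY"
done
```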
Run Evaluation without Docker
(click to see details)
### Installation

It is recommended to create a brand new `conda` environment first, but this step is optional.

```bash
conda create -n llm-jp-eval python=3.10
conda activate llm-jp-eval
```

Run the following commands to set up the environment and install the dependencies. This step can take a few minutes. The scripts are idempotent and safe to run multiple times.

```bash
bin/api/prepare.sh
bin/api/download_data.sh
```

Then run the evaluation script:

```bash
bin/api/run_api_eval.sh --config <config-file>.yaml
```
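Only the `--config` flag for `run_api_eval.sh` appears verbatim in this README; whether it accepts the same model flags as the Docker wrapper is an assumption. A sketch under that assumption:

```bash
# Assumption: run_api_eval.sh mirrors the Docker wrapper's flags;
# only --config is confirmed by this README.
bin/api/run_api_eval.sh --config config_api_swallow.yaml \
  --model-name lfm-3b-jp \
  --model-url http://localhost:8000/v1 \
  --model-api-key "$API_KEY"
```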
Configs
(click to see details about swallow and nejumi configs)
### Swallow

Both `configs/config_api.yaml` and `configs/config_api_swallow.yaml` are for running [Swallow](https://swallow-llm.github.io/evaluation/about.ja.html) evaluations. It runs all samples, and sets different shots for different tests:

| Test | Number of Shots |
| --- | --- |
| ALT, JCom, JEMHopQA, JSQuAD, MGSM, NIILC, WikiCorpus | 4 |
| JMMLU, MMLU_EN, XL-SUM (0-shot) | 5 |

`configs/config_api.yaml` has been deprecated and will be removed in the future. Please use `configs/config_api_swallow.yaml` instead.

### Nejumi

`configs/config_api_nejumi.yaml` is for running Nejumi evaluations. It sets **0-shot** and runs **100 samples** for each test.
Non-Liquid Model Evaluation
To launch any model from Hugging Face, first run the following command in the on-prem stack:
```bash
./run-vllm.sh \
  --model-name <model-name> \
  --hf-model-path <hf-model-path> \
  --hf-token <HF-token>
```

e.g.

```bash
./run-vllm.sh \
  --model-name llama-7b \
  --hf-model-path "meta-llama/Llama-2-7b-chat-hf" \
  --hf-token hf_mock_token_abcd
```
Note that no API key is needed for generic vLLM launched by `run-vllm.sh`.
Then run the evaluation script using the relevant URL and model name.
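For instance, putting the two steps together for the hypothetical `llama-7b` launch above. The key below is a dummy value, assuming the wrapper still requires the flag even though generic vLLM ignores it:

```bash
# llama-7b matches the hypothetical launch example above; the key is
# a dummy because generic vLLM launched by run-vllm.sh ignores it.
bin/api/run_docker_eval.sh --config config_api_swallow.yaml \
  --model-name llama-7b \
  --model-url http://localhost:8000/v1 \
  --model-api-key dummy
```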
Troubleshooting
(click to expand)
### `PermissionError` when running `XL-SUM` tests

Tests like `XL-SUM` need to download extra models from Huggingface for evaluation. This process requires access to the Huggingface cache directory. The `bin/api/prepare.sh` script creates this directory. However, if the cache directory has already been created by root or another user on the machine, the download will fail with a `PermissionError` like the one below:

> PermissionError: [Errno 13] Permission denied: '/home/ubuntu/.cache/huggingface/hub/.locks/models--bert-base-multilingual-cased'

The fix is to change the ownership of the cache directory to the current user:

```bash
sudo chown $USER:$USER ~/.cache/huggingface/hub/.locks
```
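To confirm the ownership problem before applying the fix, inspect the lock directory with a generic shell check (not from this repository):

```bash
# Shows the owner and group of the lock directory; if it is not the
# current user, apply the chown fix above.
ls -ld ~/.cache/huggingface/hub/.locks
```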
Acknowledgement
This repository is modified from [llm-jp/llm-jp-eval](https://github.com/llm-jp/llm-jp-eval).
Owner
- Name: Liquid AI
- Login: Liquid4All
- Kind: organization
- Email: code@liquid.ai
- Location: United States of America
- Website: https://liquid.ai
- Twitter: LiquidAI_
- Repositories: 1
- Profile: https://github.com/Liquid4All
GitHub Events
Total
- Watch event: 1
- Delete event: 3
- Push event: 26
- Public event: 1
- Pull request event: 6
- Fork event: 1
- Create event: 3
Last Year
- Watch event: 1
- Delete event: 3
- Push event: 26
- Public event: 1
- Pull request event: 6
- Fork event: 1
- Create event: 3
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 7 minutes
- Total issue authors: 0
- Total pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 0.2
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 0
- Pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 7 minutes
- Issue authors: 0
- Pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 0.2
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 3
Top Authors
Issue Authors
Pull Request Authors
- tuliren (2)
- dependabot[bot] (2)
- devin-ai-integration[bot] (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- stefanzweifel/git-auto-commit-action v5 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- ubuntu 22.04 build
- bitsandbytes *
- hydra-core *
- peft >=0.12.0
- torch ==2.4.0
- transformers >=4.44.2
- wandb >=0.17.7,<0.18.0
- wheel *
- bitsandbytes *
- hydra-core *
- peft >=0.12.0
- torch ==2.4.0
- transformers >=4.44.2
- wandb >=0.17.7,<0.18.0
- wheel *
- click ==8.0.2
- cython <3.0.0
- hydra-core <1.3.0
- markdown-it-py <2.3.0
- omegaconf <2.3.0
- setuptools ==65.5.1
- wandb >=0.17.7,<0.18.0
- mpmath ==1.3.0
- nemo-toolkit <=1.20.0,>=1.18.0
- pydantic >=2.0.0
- transformers_stream_generator ==0.0.4
- hydra-core *
- numpy ==1.26.4
- torch ==2.4.0
- transformers >=4.45.1,<4.46.0
- vllm ==0.6.2
- vllm-flash-attn *
- wandb >=0.17.7,<0.18.0
- wheel *
- 155 dependencies
- mock * develop
- pytest ^7.4.3 develop
- accelerate ^0.26.0
- bert-score ^0.3.12
- datasets ^2.9.0
- fastparquet ^2023.10.0
- fuzzywuzzy ^0.18.0
- hydra-core ^1.3.2
- langchain ^0.2
- langchain-community ^0.2.3
- langchain-huggingface ^0.0.2
- langchain-openai ^0.1.7
- pandas ^2.1.3
- peft ^0.5.0
- pyarrow ^15.0.0
- pylint ^3.0.0
- python >=3.9,<3.13
- python-levenshtein ^0.25.1
- rhoknp ^1.6.0
- rouge-score ^0.1.2
- sacrebleu ^2.3.0
- scikit-learn ^1.3.1
- sumeval ^0.2.2
- tokenizers >=0.14.0
- torch >=2.1.1
- transformers ^4.42.0
- typing-extensions ^4.8.0
- unbabel-comet ^2.2.0
- wandb >=0.16.0
- xmltodict ^0.13.0
- absl-py ==2.1.0
- accelerate ==0.26.1
- aiohappyeyeballs ==2.4.0
- aiohttp ==3.10.5
- aiosignal ==1.3.1
- annotated-types ==0.7.0
- antlr4-python3-runtime ==4.9.3
- anyio ==4.4.0
- astroid ==3.2.4
- async-timeout ==4.0.3
- attrs ==24.2.0
- bert-score ==0.3.13
- certifi ==2024.8.30
- charset-normalizer ==3.3.2
- click ==8.1.7
- colorama ==0.4.6
- contourpy ==1.3.0
- cramjam ==2.8.3
- cycler ==0.12.1
- dataclasses-json ==0.6.7
- datasets ==2.21.0
- dill ==0.3.8
- distro ==1.9.0
- docker-pycreds ==0.4.0
- entmax ==1.3
- exceptiongroup ==1.2.2
- fastparquet ==2023.10.1
- filelock ==3.15.4
- fonttools ==4.53.1
- frozenlist ==1.4.1
- fsspec ==2024.6.1
- fuzzywuzzy ==0.18.0
- gitdb ==4.0.11
- gitpython ==3.1.43
- greenlet ==3.0.3
- h11 ==0.14.0
- httpcore ==1.0.5
- httpx ==0.27.2
- huggingface-hub ==0.24.6
- hydra-core ==1.3.2
- idna ==3.8
- importlib-resources ==6.4.4
- ipadic ==1.0.0
- isort ==5.13.2
- jinja2 ==3.1.4
- jiter ==0.5.0
- joblib ==1.4.2
- jsonargparse ==3.13.1
- jsonpatch ==1.33
- jsonpointer ==3.0.0
- kiwisolver ==1.4.7
- langchain ==0.2.16
- langchain-community ==0.2.16
- langchain-core ==0.2.38
- langchain-huggingface ==0.0.2
- langchain-openai ==0.1.23
- langchain-text-splitters ==0.2.4
- langsmith ==0.1.111
- levenshtein ==0.25.1
- lightning-utilities ==0.11.7
- lxml ==5.3.0
- markupsafe ==2.1.5
- marshmallow ==3.22.0
- matplotlib ==3.9.2
- mccabe ==0.7.0
- mecab-python3 ==1.0.9
- mpmath ==1.3.0
- multidict ==6.0.5
- multiprocess ==0.70.16
- mypy-extensions ==1.0.0
- networkx ==3.2.1
- nltk ==3.9.1
- numpy ==1.26.4
- nvidia-cublas-cu12 ==12.1.3.1
- nvidia-cuda-cupti-cu12 ==12.1.105
- nvidia-cuda-nvrtc-cu12 ==12.1.105
- nvidia-cuda-runtime-cu12 ==12.1.105
- nvidia-cudnn-cu12 ==9.1.0.70
- nvidia-cufft-cu12 ==11.0.2.54
- nvidia-curand-cu12 ==10.3.2.106
- nvidia-cusolver-cu12 ==11.4.5.107
- nvidia-cusparse-cu12 ==12.1.0.106
- nvidia-nccl-cu12 ==2.20.5
- nvidia-nvjitlink-cu12 ==12.6.68
- nvidia-nvtx-cu12 ==12.1.105
- omegaconf ==2.3.0
- openai ==1.43.0
- orjson ==3.10.7
- packaging ==24.1
- pandas ==2.2.2
- peft ==0.5.0
- pillow ==10.4.0
- plac ==1.4.3
- platformdirs ==4.2.2
- portalocker ==2.10.1
- protobuf ==4.25.4
- psutil ==6.0.0
- pyarrow ==15.0.2
- pydantic ==2.8.2
- pydantic-core ==2.20.1
- pylint ==3.2.7
- pyparsing ==3.1.4
- python-dateutil ==2.9.0.post0
- python-levenshtein ==0.25.1
- pytorch-lightning ==2.4.0
- pytz ==2024.1
- pywin32 ==306
- pyyaml ==6.0.2
- rapidfuzz ==3.9.7
- regex ==2024.7.24
- requests ==2.32.3
- rhoknp ==1.7.0
- rouge-score ==0.1.2
- sacrebleu ==2.4.3
- safetensors ==0.4.4
- scikit-learn ==1.5.1
- scipy ==1.13.1
- sentence-transformers ==3.0.1
- sentencepiece ==0.1.99
- sentry-sdk ==2.13.0
- setproctitle ==1.3.3
- setuptools ==74.1.1
- six ==1.16.0
- smmap ==5.0.1
- sniffio ==1.3.1
- sqlalchemy ==2.0.33
- sumeval ==0.2.2
- sympy ==1.13.2
- tabulate ==0.9.0
- tenacity ==8.5.0
- text-generation ==0.7.0
- threadpoolctl ==3.5.0
- tiktoken ==0.7.0
- tokenizers ==0.19.1
- tomli ==2.0.1
- tomlkit ==0.13.2
- torch ==2.4.0
- torchmetrics ==0.10.3
- tqdm ==4.66.5
- transformers ==4.44.2
- triton ==3.0.0
- typing-extensions ==4.12.2
- typing-inspect ==0.9.0
- tzdata ==2024.1
- unbabel-comet ==2.2.2
- urllib3 ==2.2.2
- wandb ==0.17.8
- xmltodict ==0.13.0
- xxhash ==3.5.0
- yarl ==1.9.8
- zipp ==3.20.1
- python 3.9-slim build