Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Wolffy427
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 268 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
EDINET-Bench
This code evaluates LLMs on EDINET-Bench, a Japanese financial benchmark built around challenging financial tasks, including accounting fraud detection, earnings forecasting, and industry prediction. The dataset is built from EDINET, a platform operated by the Financial Services Agency (FSA) of Japan that provides access to disclosure documents such as securities reports.
For the dataset construction code, please visit https://github.com/SakanaAI/edinet2dataset.
Install
Install the dependencies using uv.
bash
$ uv sync
You also need to configure the API keys for each LLM provider in the .env file.
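The .env file is a plain list of KEY=VALUE lines, which python-dotenv (one of the project's dependencies) loads into the process environment. The sketch below is a simplified, stdlib-only illustration of that format together with a check for missing keys. The variable names ANTHROPIC_API_KEY and OPENAI_API_KEY are an assumption (they are the defaults read by the official anthropic and openai SDKs), and the scripts may expect different or additional names. parse_env_text and missing_keys are hypothetical helpers, not part of this repository.

```python
from typing import Dict, List, Mapping, Sequence

# Assumption: default variable names read by the anthropic and openai SDKs.
# The repository's scripts may require different or additional keys.
ASSUMED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines (simplified; python-dotenv handles many more cases)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="value" -> value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(env: Mapping[str, str],
                 required: Sequence[str] = ASSUMED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


example = 'ANTHROPIC_API_KEY="sk-ant-..."\n# OPENAI_API_KEY=\n'
print(missing_keys(parse_env_text(example)))  # ['OPENAI_API_KEY']
```

In a real run you would instead call missing_keys(os.environ) after python-dotenv's load_dotenv() has populated the environment, and stop early if the list is not empty.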
Evaluation
Accounting Fraud Detection and Earnings Forecast
Use Claude 3.5 Sonnet to predict whether an annual report is fraudulent from its Balance Sheet (BS), Cash Flow statement (CF), Profit and Loss statement (PL), and summary items.
bash
$ python src/edinet_bench/predict.py --task fraud_detection --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary
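The --sheets flag selects which parts of each annual report the model sees. A minimal sketch of that idea is below, assuming each report is available as a mapping from sheet name (bs, cf, pl, summary) to its text. build_prompt, parse_label, the section titles, and the prompt wording are hypothetical illustrations, not the prompt or answer parser used by predict.py.

```python
import re
from typing import Mapping, Optional, Sequence

# Hypothetical section titles for the --sheets values used above.
SHEET_TITLES = {
    "bs": "Balance Sheet",
    "cf": "Cash Flow Statement",
    "pl": "Profit and Loss Statement",
    "summary": "Summary",
}


def build_prompt(report: Mapping[str, str], sheets: Sequence[str]) -> str:
    """Join the selected sheets, in the order given, under a task instruction."""
    parts = [
        "Is the following annual report fraudulent? "
        "End your answer with 1 (fraudulent) or 0 (not fraudulent)."
    ]
    for sheet in sheets:
        parts.append(f"## {SHEET_TITLES[sheet]}\n{report[sheet]}")
    return "\n\n".join(parts)


def parse_label(answer: str) -> Optional[int]:
    """Return the last standalone 0 or 1 in the model's answer, or None."""
    matches = re.findall(r"\b[01]\b", answer)
    return int(matches[-1]) if matches else None


report = {"bs": "Total assets: 1,000", "pl": "Net income: 50",
          "cf": "...", "summary": "..."}
prompt = build_prompt(report, ["bs", "pl"])  # only BS and PL are included
print(parse_label("The figures look consistent. Answer: 0"))  # 0
```

Taking the last standalone digit is a deliberately simple heuristic; a stricter output format (for example, a fixed final line) makes parsing more robust.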
Use a logistic regression model as a baseline (shown here for the earnings forecast task).
bash
$ python src/edinet_bench/logistic.py --task earnings_forecast
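A logistic regression baseline maps numeric features taken from the financial statements to the probability of a binary label. The dependency-free sketch below fits one by batch gradient descent on the mean log-loss, assuming a binary target (for example, whether earnings rise) and plain lists of numeric feature vectors. It illustrates the technique only; it is not the code in logistic.py, which may well use scikit-learn's LogisticRegression (scikit-learn is among the dependencies). The toy data is made up.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of label 1 for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up toy data: one feature, label 1 for positive values.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [1.5]) > 0.5)  # True
```

Raw financial figures span many orders of magnitude, so standardizing features first (for example with scikit-learn's StandardScaler) generally helps gradient descent converge.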
Create a leaderboard that ranks every evaluated model.
bash
$ python src/edinet_bench/make_leaderboard.py --task fraud_detection
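Conceptually, a leaderboard scores each model's predictions against the gold labels and sorts the models by that score. The sketch below assumes per-model lists of gold labels and predicted scores, and ranks models by ROC-AUC computed as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count as half). Both the input layout and the choice of metric are assumptions; make_leaderboard.py may read different files and report other metrics. The model names and numbers are made up.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) ROC-AUC; O(n_pos * n_neg), fine for small sets."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            # A correctly ordered pair counts 1, a tie counts 0.5.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def leaderboard(
    results: Dict[str, Tuple[Sequence[int], Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Return (model, ROC-AUC) pairs, best first."""
    rows = [(model, roc_auc(labels, scores))
            for model, (labels, scores) in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


results = {
    "model-a": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "model-b": ([0, 0, 1, 1], [0.2, 0.1, 0.9, 0.7]),
}
for model, auc in leaderboard(results):
    print(f"{model}\t{auc:.3f}")
```

If a model only outputs hard 0/1 labels rather than scores, the same function still runs, and the result reduces to balanced accuracy, (TPR + TNR) / 2.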
Industry Prediction
Predict a company's industry type (e.g., Banking) based on its current annual report.
bash
$ python src/edinet_bench/industry_prediction/predict.py --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary
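Industry prediction is a multi-class task, so a free-text answer has to be mapped onto one of a fixed set of industry labels before accuracy can be computed. The sketch below assumes a hypothetical English label list and picks the label whose mention ends last in the answer, preferring the longer label on ties so that overlapping names such as "Banking" and "Investment Banking" resolve correctly. match_label, accuracy, and LABELS are illustrative assumptions, not the repository's code or the benchmark's label set.

```python
from typing import Optional, Sequence

# Hypothetical label set; the benchmark's actual industry categories may differ.
LABELS = ["Banking", "Investment Banking", "Retail Trade", "Electric Appliances"]


def match_label(answer: str, labels: Sequence[str] = LABELS) -> Optional[str]:
    """Pick the label whose last mention ends latest; longer labels win ties."""
    best, best_key = None, (-1, -1)
    for label in labels:
        start = answer.rfind(label)
        if start == -1:
            continue
        key = (start + len(label), len(label))
        if key > best_key:
            best, best_key = label, key
    return best


def accuracy(predictions: Sequence[Optional[str]], gold: Sequence[str]) -> float:
    """Fraction of exact matches; unparseable answers (None) count as wrong."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


answers = ["This looks like Investment Banking.", "Industry: Retail Trade", "Unsure."]
preds = [match_label(a) for a in answers]
print(preds)  # ['Investment Banking', 'Retail Trade', None]
print(f"{accuracy(preds, ['Investment Banking', 'Banking', 'Banking']):.3f}")  # 0.333
```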
Create a leaderboard that ranks every evaluated model.
bash
$ python src/edinet_bench/industry_prediction/make_leaderboard.py
Citation
@misc{sugiura2025edinet,
author={Issa Sugiura and Takashi Ishida and Taro Makino and Chieko Tazuke and Takanori Nakagawa and Kosuke Nakago and David Ha},
title={{EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements}},
year={2025},
eprint={2506.08762},
archivePrefix={arXiv},
primaryClass={q-fin.ST},
url={https://arxiv.org/abs/2506.08762},
}
Owner
- Name: Zhou ke
- Login: Wolffy427
- Kind: user
- Repositories: 1
- Profile: https://github.com/Wolffy427
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
type: dataset
title: "EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements"
authors:
- family-names: Sugiura
  given-names: Issa
- family-names: Ishida
  given-names: Takashi
- family-names: Makino
  given-names: Taro
- family-names: Tazuke
  given-names: Chieko
- family-names: Nakagawa
  given-names: Takanori
- family-names: Nakago
  given-names: Kosuke
- family-names: Ha
  given-names: David
url: "https://arxiv.org/abs/2506.08762"
keywords:
- "financial analysis"
- "large language models"
- "benchmark"
- "Japanese"
- "EDINET"
license: Apache-2.0
GitHub Events
Total
- Push event: 1
- Create event: 2
Last Year
- Push event: 1
- Create event: 2
Dependencies
- anthropic >=0.40.0
- backoff >=2.2.1
- datasets >=3.5.0
- loguru >=0.7.3
- matplotlib >=3.10.0
- openai >=1.60.0
- pandas >=2.2.3
- python-dotenv >=1.0.1
- scikit-learn >=1.6.0
- tqdm >=4.67.1
- weave >=0.51.27