Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Wolffy427
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 268 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 12 months ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

EDINET-Bench


This repository contains code for evaluating LLMs on EDINET-Bench, a Japanese financial benchmark of challenging financial tasks including accounting fraud detection, earnings forecasting, and industry prediction. The dataset is built from EDINET, a platform managed by the Financial Services Agency (FSA) of Japan that provides access to disclosure documents such as securities reports.

Figure: Overview of EDINET-Bench.

For the dataset construction code, please visit https://github.com/SakanaAI/edinet2dataset.

Install

Install the dependencies using uv:

    $ uv sync

You also need to configure the API keys for each LLM provider in the .env file.
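As a minimal sketch of checking that setup before a long evaluation run, the snippet below loads a .env file with python-dotenv (listed in the dependencies) when it is installed, then reports which keys are still missing. The variable names ANTHROPIC_API_KEY and OPENAI_API_KEY are assumptions: they are the defaults read by the anthropic and openai SDKs, but the names this repository actually expects are not stated here.

```python
import os

# Assumed key names (SDK defaults); adjust to whatever the project's .env uses.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the given mapping."""
    return [key for key in required if not env.get(key)]


try:
    from dotenv import load_dotenv  # provided by the python-dotenv package

    load_dotenv()  # loads variables from a .env file, if one is found
except ImportError:
    pass  # fall back to whatever is already set in the environment

if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All expected API keys are set.")
```

Only the providers you actually evaluate need a key, so the required list can be trimmed accordingly.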

Evaluation

Accounting Fraud Detection and Earnings Forecast

Use Claude 3.5 Sonnet to predict whether a report is fraudulent based on the Balance Sheet (BS), Cash Flow (CF), Profit and Loss (PL), and summary items from annual reports:

    $ python src/edinet_bench/predict.py --task fraud_detection --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary
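The documented flags (--task, --model, --sheets) make it easy to script a sweep over several configurations. The sketch below only builds and prints the command lines. It assumes, without confirmation from the source, that predict.py also accepts --task earnings_forecast and that --sheets can take a subset such as summary alone. The helper name build_predict_command is hypothetical.

```python
import shlex


def build_predict_command(task, model, sheets):
    """Build the argument list for src/edinet_bench/predict.py."""
    return [
        "python", "src/edinet_bench/predict.py",
        "--task", task,
        "--model", model,
        "--sheets", *sheets,
    ]


MODEL = "claude-3-5-sonnet-20241022"
TASKS = ["fraud_detection", "earnings_forecast"]  # second task is an assumption
SHEET_SETS = [["summary"], ["bs", "cf", "pl", "summary"]]  # subset use is an assumption

commands = [
    build_predict_command(task, MODEL, sheets)
    for task in TASKS
    for sheets in SHEET_SETS
]

for cmd in commands:
    # Print only; to execute, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping each command as a list rather than a single string avoids shell-quoting problems if it is later handed to subprocess.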

Use a logistic model as a baseline:

    $ python src/edinet_bench/logistic.py --task earnings_forecast
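For readers unfamiliar with this kind of baseline, the following is a generic, dependency-free illustration of logistic regression trained by gradient descent on a binary label. It is not the repository's logistic.py, whose features and training setup are not shown here. The two numeric features and the four data points are made up for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Probability of label 1 for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up toy data: two hypothetical standardized financial features per company,
# with a binary label (for example, 1 = earnings increase, 0 = decrease).
X = [[-1.0, 0.5], [-0.8, 0.2], [0.9, -0.3], [1.2, 0.1]]
y = [0, 0, 1, 1]

w, b = fit_logistic(X, y)
predictions = [int(predict_proba(w, b, x) >= 0.5) for x in X]
print(predictions)
```

A baseline like this gives a reference point for judging whether an LLM's predictions add anything beyond a simple linear model on tabular features.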

Create a leaderboard for each model:

    $ python src/edinet_bench/make_leaderboard.py --task fraud_detection

Industry Prediction

Predict a company's industry type (e.g., Banking) based on its current annual report:

    $ python src/edinet_bench/industry_prediction/predict.py --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary

Create a leaderboard for each model:

    $ python src/edinet_bench/industry_prediction/make_leaderboard.py

Citation

    @misc{sugiura2025edinet,
      author={Issa Sugiura and Takashi Ishida and Taro Makino and Chieko Tazuke and Takanori Nakagawa and Kosuke Nakago and David Ha},
      title={{EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements}},
      year={2025},
      eprint={2506.08762},
      archivePrefix={arXiv},
      primaryClass={q-fin.ST},
      url={https://arxiv.org/abs/2506.08762},
    }

Owner

  • Name: Zhou ke
  • Login: Wolffy427
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
type: dataset
title: "EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements"
authors:
  - family-names: Sugiura
    given-names: Issa
  - family-names: Ishida
    given-names: Takashi
  - family-names: Makino
    given-names: Taro
  - family-names: Tazuke
    given-names: Chieko
  - family-names: Nakagawa
    given-names: Takanori
  - family-names: Nakago
    given-names: Kosuke
  - family-names: Ha
    given-names: David
url: "https://arxiv.org/abs/2506.08762"
keywords:
  - "financial analysis"
  - "large language models"
  - "benchmark"
  - "Japanese"
  - "EDINET"
license: Apache-2.0

GitHub Events

Total
  • Push event: 1
  • Create event: 2
Last Year
  • Push event: 1
  • Create event: 2

Dependencies

pyproject.toml pypi
  • anthropic >=0.40.0
  • backoff >=2.2.1
  • datasets >=3.5.0
  • loguru >=0.7.3
  • matplotlib >=3.10.0
  • openai >=1.60.0
  • pandas >=2.2.3
  • python-dotenv >=1.0.1
  • scikit-learn >=1.6.0
  • tqdm >=4.67.1
  • weave >=0.51.27