https://github.com/chenghaomou/idefics2-contract-qa

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (3.9%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ChenghaoMou
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 28.3 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed almost 2 years ago
README.md

Fine-tuning Idefics2 on EDGAR Contract QA Dataset

See Blog for more details.

  1. dataset.ipynb prepares the dataset.
  2. train.py fine-tunes the model on the dataset.
  3. benchmark.ipynb evaluates the model on the test dataset.

Datasets

chenghao/sec-material-contracts-qa-splitted combines the following datasets:

  1. chenghao/sec-material-contracts-qa
  2. jordyvl/DUDE_subset_100val

Data splits: train (80%), test (20%)
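
The 80/20 split above can be illustrated with a minimal, standard-library sketch. It is not the code from dataset.ipynb: the function name `split_train_test`, the seed, and the shuffle-based approach are assumptions made for illustration only.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out `test_fraction` for testing.

    Hypothetical helper: the notebook may split the data differently.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible

    n_test = round(len(records) * test_fraction)
    test_indices = set(indices[:n_test])

    # Preserve the original record order within each split.
    train = [r for i, r in enumerate(records) if i not in test_indices]
    test = [r for i, r in enumerate(records) if i in test_indices]
    return train, test


if __name__ == "__main__":
    train, test = split_train_test(list(range(100)))
    print(len(train), len(test))
```

Fixing the seed keeps the test set stable across runs, which matters when benchmark results are compared between training runs.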

Model

More details can be found at idefics2-edgar. The training script runs on a single GPU (A100-80GB) when using low-resolution input and QLoRA.
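
As a rough illustration of why 4-bit quantization helps the model fit on one GPU, the sketch below estimates the memory taken by the base model weights alone. It assumes a model of about 8 billion parameters (the size of Idefics2-8B; the exact count used by train.py is not stated here) and ignores quantization constants, LoRA adapter weights, activations, and optimizer state, so it is a lower bound and not a measurement. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: roughly 8 billion parameters.
NUM_PARAMS = 8e9

if __name__ == "__main__":
    bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # 16-bit weights
    nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit weights, as in QLoRA
    print(f"16-bit weights: ~{bf16_gb:.0f} GB, 4-bit weights: ~{nf4_gb:.0f} GB")
```

Under these assumptions the frozen base weights shrink by about a factor of four, which leaves more of the 80 GB for activations and the trainable adapter parameters.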

References:

  1. @NielsRogge's tutorial Fine_tune_Idefics2_for_multi_page_PDF_question_answering_on_DUDE.ipynb
  2. Idefics2

Owner

  • Name: Chenghao Mou
  • Login: ChenghaoMou
  • Kind: user
  • Location: Ireland

NLP/AI
