https://github.com/ai-forever/libra

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.1%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ai-forever
  • Language: Python
  • Default Branch: main
  • Size: 2.41 MB
Statistics
  • Stars: 18
  • Watchers: 5
  • Forks: 2
  • Open Issues: 1
  • Releases: 0
Created almost 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme

README.md

LIBRA: Long Input Benchmark for Russian Analysis

Introduction

Welcome to the official GitHub repository for LIBRA (Long Input Benchmark for Russian Analysis). This repository contains the codebase and documentation for evaluating the capabilities of large language models (LLMs) in understanding and processing long texts in Russian.

Usage

Adding Your Own Model

To add your own model, create a config file for it based on configs/template.ini (e.g., longchat32k.ini) and fill in the required parameters.
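
As a rough illustration, a config could be derived from a template programmatically with Python's standard `configparser`. The section and key names used here (`model`, `name`, `max_context`) are hypothetical placeholders, not the actual fields of configs/template.ini; check the template itself for the real ones.

```python
import configparser
import io


def make_config(template_text: str, overrides: dict) -> str:
    """Return INI text: the template with the given values overridden.

    `overrides` maps section name -> {key: value}. The names used in the
    example below are hypothetical, not the real LIBRA template fields.
    """
    parser = configparser.ConfigParser()
    parser.read_string(template_text)
    for section, values in overrides.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in values.items():
            parser.set(section, key, str(value))
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical template content, for illustration only.
TEMPLATE = "[model]\nname = \nmax_context = 4096\n"

new_config = make_config(
    TEMPLATE,
    {"model": {"name": "longchat32k", "max_context": 32768}},
)
print(new_config)
```

In practice you would read the template from configs/template.ini and write the result to a new file such as configs/longchat32k.ini; copying and editing the template by hand works just as well.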

Generating Answers

First, generate answers for each task with the following command:

    python predict.py -c path_to_config

The predictions are saved in "predictions/" or in the location set in your config.

Metric Evaluation

Once the predictions are saved, evaluate them with:

    python eval.py -p path_to_predictions

The results will be saved in "results/".
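
The two steps above can be chained from Python. This is a minimal sketch that only uses the `-c` and `-p` flags shown in this README; the config path and the "predictions/" default passed to `-p` are assumptions for illustration, and the script is expected to run from the repository root.

```python
import subprocess
import sys


def build_commands(config_path: str, predictions_path: str = "predictions/") -> list:
    """Build the predict and eval commands as argument lists."""
    predict_cmd = [sys.executable, "predict.py", "-c", config_path]
    eval_cmd = [sys.executable, "eval.py", "-p", predictions_path]
    return [predict_cmd, eval_cmd]


def run_pipeline(config_path: str, predictions_path: str = "predictions/") -> None:
    """Run prediction, then evaluation; raise if either step fails."""
    for cmd in build_commands(config_path, predictions_path):
        subprocess.run(cmd, check=True)


# Hypothetical config path; print the commands without executing them.
for cmd in build_commands("configs/longchat32k.ini"):
    print(" ".join(cmd))
```

Calling `run_pipeline("configs/longchat32k.ini")` would execute both commands in order; `check=True` stops the pipeline if prediction fails, so evaluation never runs on missing output.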

Datasets

LIBRA includes 21 datasets adapted for different tasks and complexities. The datasets are divided into four complexity groups and allow evaluation across various context lengths ranging from 4k to 128k tokens.
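
To make the context-length split concrete, the sketch below assigns a sample to the smallest length bucket that fits it. The bucket boundaries are an assumption (a doubling series); the text above only states the 4k to 128k range.

```python
import bisect

# Assumed bucket boundaries in tokens; only the 4k-128k range is given above.
CONTEXT_LENGTHS = [4_000, 8_000, 16_000, 32_000, 64_000, 128_000]


def length_bucket(num_tokens: int):
    """Return the label of the smallest bucket that fits num_tokens.

    Returns None when the sample is longer than the largest bucket.
    """
    idx = bisect.bisect_left(CONTEXT_LENGTHS, num_tokens)
    if idx == len(CONTEXT_LENGTHS):
        return None
    return f"{CONTEXT_LENGTHS[idx] // 1000}k"


for n in (3_500, 4_001, 128_000, 200_000):
    print(n, length_bucket(n))
```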

Tasks and Complexity Groups

Group I: Simple Information Retrieval

  • Passkey: Extract the relevant code number from a long text fragment. Based on the original PassKey test from the LongLLaMA GitHub repo.
  • PasskeyWithLibrusec: Similar to Passkey but with added noise from Librusec texts.
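
The passkey idea can be illustrated with a toy generator: a short "needle" sentence containing the code number is buried in repetitive filler, and the model has to return the number. This is only a sketch of the general technique; the English wording, the filler text, and both function names are illustrative assumptions, not the actual LIBRA data or prompts.

```python
import re

# Illustrative filler; per the task description, PasskeyWithLibrusec uses
# Librusec texts as noise instead.
FILLER = "The grass is green. The sky is blue. The sun is yellow. "


def build_passkey_sample(passkey: int, n_before: int, n_after: int) -> str:
    """Hide a passkey sentence between repeated filler sentences."""
    needle = f"The pass key is {passkey}. Remember it. "
    question = "What is the pass key?"
    return FILLER * n_before + needle + FILLER * n_after + question


def extract_passkey(text: str):
    """Reference extractor: find the number after 'pass key is'."""
    match = re.search(r"pass key is (\d+)", text)
    return int(match.group(1)) if match else None


sample = build_passkey_sample(71432, n_before=200, n_after=300)
print(len(sample.split()), "words; passkey:", extract_passkey(sample))
```

Varying `n_before` and `n_after` moves the needle within the context and changes the total length, which is how such a test can be scaled across context sizes.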

Group II: Question Answering and Multiple Choice

  • MatreshkaNames: Identify the person in dialogues based on the discussed topic. We used the Matreshka dataset and the Russian Names dataset to create this task and the next one.
  • MatreshkaYesNo: Indicate whether a specific topic was mentioned in the dialog.
  • LibrusecHistory: Answer questions based on historical texts. Conceptually similar to the PassageRetrieval dataset from LongBench.
  • ruTREC: Few-shot in-context learning for topic classification. Created by translating the TREC dataset from LongBench.
  • ruSciFi: Answer true/false based on context and general world knowledge. Translation of the SciFi dataset from L-Eval, which was originally based on SF-Gram.
  • ruSciAbstractRetrieval: Retrieve relevant paragraphs from scientific abstracts.
  • ruTPO: Multiple-choice questions similar to TOEFL exams. Translation of the TPO dataset from L-Eval.
  • ruQuALITY: Multiple-choice QA tasks based on detailed texts. Created by translating the QuALITY dataset from L-Eval.

Group III: Multi-hop Question Answering

  • ruBABILongQA: 5 long-context reasoning tasks for QA using facts hidden among irrelevant information.
  • LongContextMultiQ: Multi-hop QA based on Wikidata and Wikipedia.
  • LibrusecMHQA: Multi-hop QA requiring information distributed across several text parts.
  • ru2WikiMultihopQA: Translation of the 2WikiMultihopQA dataset from LongBench.

Group IV: Complex Reasoning and Mathematical Problems

  • ruSciPassageCount: Count unique paragraphs in a long text. Uses the basic idea of the original PassageCount dataset from LongBench.
  • ruQasper: Question answering over academic research papers. Created by translating the Qasper dataset from LongBench.
  • ruGSM100: Solve math problems using Chain-of-Thought reasoning. Created by translating the GSM100 dataset from L-Eval.
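
The grouping above can be captured in a small lookup table. The list names 17 entries, while the Datasets section states 21 datasets; the sketch below reconciles the two under the assumption that each of the five ruBABILongQA tasks counts as a separate dataset. That reading, and the `SUBTASK_COUNTS` helper, are assumptions rather than something the README states.

```python
LIBRA_GROUPS = {
    "I": ["Passkey", "PasskeyWithLibrusec"],
    "II": [
        "MatreshkaNames", "MatreshkaYesNo", "LibrusecHistory", "ruTREC",
        "ruSciFi", "ruSciAbstractRetrieval", "ruTPO", "ruQuALITY",
    ],
    "III": [
        "ruBABILongQA", "LongContextMultiQ", "LibrusecMHQA",
        "ru2WikiMultihopQA",
    ],
    "IV": ["ruSciPassageCount", "ruQasper", "ruGSM100"],
}

# Assumption: ruBABILongQA's five tasks are counted as five datasets.
SUBTASK_COUNTS = {"ruBABILongQA": 5}


def count_datasets(groups: dict, subtask_counts: dict) -> int:
    """Count datasets, expanding entries that bundle several subtasks."""
    return sum(
        subtask_counts.get(task, 1)
        for tasks in groups.values()
        for task in tasks
    )


listed = sum(len(tasks) for tasks in LIBRA_GROUPS.values())
total = count_datasets(LIBRA_GROUPS, SUBTASK_COUNTS)
print(f"listed entries: {listed}, datasets: {total}")
```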

Citation

@misc{churin2024longinputbenchmarkrussian,
  title={Long Input Benchmark for Russian Analysis},
  author={Igor Churin and Murat Apishev and Maria Tikhonova and Denis Shevelev and Aydar Bulatov and Yuri Kuratov and Sergei Averkiev and Alena Fenogenova},
  year={2024},
  eprint={2408.02439},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2408.02439},
}

Owner

  • Name: AI Forever
  • Login: ai-forever
  • Kind: organization
  • Location: Armenia

Creating ML for the future. AI projects you already know. We are a non-profit organization with members from all over the world.

GitHub Events

Total
  • Watch event: 1
  • Delete event: 6
  • Push event: 2
  • Pull request event: 1
  • Create event: 1
Last Year
  • Watch event: 1
  • Delete event: 6
  • Push event: 2
  • Pull request event: 1
  • Create event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 1
  • Total pull requests: 9
  • Average time to close issues: 14 days
  • Average time to close pull requests: 15 days
  • Total issue authors: 1
  • Total pull request authors: 4
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.11
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 9
  • Average time to close issues: 14 days
  • Average time to close pull requests: 15 days
  • Issue authors: 1
  • Pull request authors: 4
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.11
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • thehir0 (1)
Pull Request Authors
  • averkij (7)
  • Gscraid (4)
  • yurakuratov (2)
  • thehir0 (2)

Dependencies

requirements.txt pypi
  • accelerate ==0.28.0
  • datasets ==2.18.0
  • flash-attn ==2.6.1
  • transformers ==4.40.0