rolebench

ROLEBENCH- A Role Prompting Benchmark

https://github.com/devichand579/rolebench

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: devichand579
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 347 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

ROLEBENCH

ROLEBENCH is a framework for evaluating role prompting across different datasets and large language models. For a quick run, use the "Open In Colab" notebook linked from the repository.

Supported models

  • Llama3-8B Instruct
  • Phi-3 mini-4K Instruct
  • Mistral-7B Instruct
  • Gemma-7B Instruct

Datasets

  • BoolQ (validation split - 3270 samples)
  • COMMONSENSEQA (validation split - 1221 samples)
  • iwslt2017-en-fr dataset (validation split - 890 samples)
  • samsum dataset (test split - 819 samples)
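
As a rough illustration of the evaluation load implied by the list above, the following sketch records each dataset with its split and sample count and sums them. The `EvalSet` class and `total_prompts` helper are hypothetical names introduced here; they are not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSet:
    """One evaluation dataset as listed in the README (hypothetical container)."""
    name: str
    split: str
    num_samples: int


# Names, splits, and sample counts are taken from the list above.
EVAL_SETS = (
    EvalSet("BoolQ", "validation", 3270),
    EvalSet("COMMONSENSEQA", "validation", 1221),
    EvalSet("iwslt2017-en-fr", "validation", 890),
    EvalSet("samsum", "test", 819),
)


def total_prompts(eval_sets) -> int:
    """Number of prompts a single model answers across all datasets."""
    return sum(s.num_samples for s in eval_sets)


if __name__ == "__main__":
    for s in EVAL_SETS:
        print(f"{s.name:<16} {s.split:<11} {s.num_samples}")
    print("total per model:", total_prompts(EVAL_SETS))
```

Under these counts, one full pass over all four datasets amounts to 6200 prompts per model.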

Prompt Template

```bash

BoolQ - Based on the passage:'{passage}'\nAnswer True/False to the question: '{question}' as an Omniscient person.

COMMONSENSEQA - Choose the answer as a critical thinker.\n{question}\n{opt1}. {text1}\n{opt2}. {text2}\n{opt3}. {text3}\n{opt4}. {text4}\n{opt5}. {text5}

IWSLT2017en-fr - Translate '{eng_text}' to french as a Translator.

SamSum - Summarise the Dialogue: {dialogue} as a Storyteller.
```
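
The templates above can be filled with Python's `str.format`, as in the minimal sketch below. The template strings are copied from the README, with `\n` read as a newline escape (an assumption). The `TEMPLATES` dictionary, the helper names `build_prompt` and `build_commonsenseqa_prompt`, and the sample inputs are all illustrative; the notebooks may build their prompts differently.

```python
# Template strings copied from the README; "\n" is assumed to be a newline escape.
TEMPLATES = {
    "BoolQ": (
        "Based on the passage:'{passage}'\n"
        "Answer True/False to the question: '{question}' as an Omniscient person."
    ),
    "COMMONSENSEQA": (
        "Choose the answer as a critical thinker.\n{question}\n"
        "{opt1}. {text1}\n{opt2}. {text2}\n{opt3}. {text3}\n"
        "{opt4}. {text4}\n{opt5}. {text5}"
    ),
    "IWSLT2017en-fr": "Translate '{eng_text}' to french as a Translator.",
    "SamSum": "Summarise the Dialogue: {dialogue} as a Storyteller.",
}


def build_prompt(dataset: str, **fields: str) -> str:
    """Fill the role-prompt template for `dataset` (hypothetical helper)."""
    return TEMPLATES[dataset].format(**fields)


def build_commonsenseqa_prompt(question: str, labels: list, texts: list) -> str:
    """Map five (label, text) answer options onto opt1..opt5 / text1..text5."""
    fields = {"question": question}
    for i, (label, text) in enumerate(zip(labels, texts), start=1):
        fields[f"opt{i}"] = label
        fields[f"text{i}"] = text
    return build_prompt("COMMONSENSEQA", **fields)


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    print(build_prompt("IWSLT2017en-fr", eng_text="Good morning"))
    print(build_commonsenseqa_prompt(
        "Where do fish live?",
        ["A", "B", "C", "D", "E"],
        ["water", "sky", "desert", "forest", "city"],
    ))
```

Keeping the role phrase ("as a Translator", "as a Storyteller") inside the template means that trying a new role only requires editing one string per dataset.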

Results

| Model | BoolQ | COMMONSENSEQA | IWSLT2017en-fr | SamSum |
|---|---|---|---|---|
| Llama3 | Accuracy = 0.8507; F1 score = 0.8793 | Accuracy = 0.7371 | BLEU = 0.2399; METEOR = 0.5436 | Rouge1 = 0.1725; RougeL = 0.1229 |
| Phi-3 | Accuracy = 0.8113; F1 score = 0.8344 | Accuracy = 0.7068 | BLEU = 0.1928; METEOR = 0.4950 | Rouge1 = 0.1383; RougeL = 0.0951 |
| Mistral-7B | Accuracy = 0.8281; F1 score = 0.8548 | Accuracy = 0.6490 | BLEU = 0.1507; METEOR = 0.4763 | Rouge1 = 0.1359; RougeL = 0.0991 |
| Gemma-7B | Accuracy = 0.6288; F1 score = 0.5831 | Accuracy = 0.6288 | BLEU = 0.0940; METEOR = 0.3611 | Rouge1 = 0.1192; RougeL = 0.0793 |
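
For readers unfamiliar with the Accuracy and F1 figures reported for BoolQ, the sketch below shows how these two metrics are commonly computed from boolean labels. It is a plain-Python illustration under stated assumptions: the README does not say which F1 variant (binary, macro, or weighted) or which positive class the notebooks use, so the binary F1 with `True` as the positive class is a guess. The function names and sample labels are hypothetical, and BLEU, METEOR, and ROUGE are not covered here.

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_f1(y_true, y_pred, positive=True) -> float:
    """Harmonic mean of precision and recall for the `positive` class.

    Assumption: binary F1 with True as the positive class; the repository
    may use a different averaging scheme.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    gold = [True, True, False, False, True]
    pred = [True, False, False, True, True]
    print(f"Accuracy = {accuracy(gold, pred):.4f}")
    print(f"F1 score = {binary_f1(gold, pred):.4f}")
```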

Repository Structure

```bash

llama3roleall.ipynb -- Role prompting on all datasets using Llama3-8B Instruct model
phi3roleall.ipynb -- Role prompting on all datasets using Phi-3 mini-4K Instruct model
mistralroleall.ipynb -- Role prompting on all datasets using Mistral-7B Instruct model
Gemmaroleall.ipynb -- Role prompting on all datasets using Gemma-7B Instruct model
Roleprompting___quantitaiveanalysis.txt
qualitativeanalysis.txt
```

Contribution

The project will always remain open source. Contributions that add new models or datasets, or that formulate new roles for the prompt templates, are always welcome.

References

If you find this work useful, please cite this repository:

    @software{Budagam_ROLEBENCH-_A_Role_2024,
      author = {Budagam, Devichand},
      month = may,
      title = {{ROLEBENCH- A Role Prompting Benchmark}},
      url = {https://github.com/devichand579/ROLEBENCH},
      year = {2024}
    }

Owner

  • Name: Devichand
  • Login: devichand579
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Budagam"
  given-names: "Devichand"


title: "ROLEBENCH- A Role Prompting Benchmark"
date-released: 2024-05-16
url: "https://github.com/devichand579/ROLEBENCH"

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1