Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.9%) to scientific vocabulary
Repository
ROLEBENCH- A Role Prompting Benchmark
Basic Info
- Host: GitHub
- Owner: devichand579
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 347 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
ROLEBENCH
ROLEBENCH is a framework for evaluating the performance of Role-Prompting across different datasets and Large Language Models.
- Have a quick run 🏃
Supported models
- Llama3-8B Instruct
- Phi-3 mini-4K Instruct
- Mistral-7B Instruct
- Gemma-7B Instruct
Datasets
- BoolQ (validation split - 3270 samples)
- COMMONSENSEQA (validation split - 1221 samples)
- iwslt2017-en-fr dataset (validation split - 890 samples)
- samsum dataset (test split - 819 samples)
Prompt Template
```bash
BoolQ - Based on the passage:'{passage}'\nAnswer True/False to the question: '{question}' as an Omniscient person.
COMMONSENSEQA - Choose the answer as a critical thinker.\n{question}\n{opt1}. {text1}\n{opt2}. {text2}\n{opt3}. {text3}\n{opt4}. {text4}\n{opt5}. {text5}
IWSLT2017en-fr - Translate '{eng_text}' to french as a Translator.
SamSum - Summarise the Dialogue: {dialogue} as a Storyteller.
```
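As an illustration of how the templates above might be filled for a single example, here is a minimal sketch. The template strings are copied from this README; the dictionary keys and the helpers `build_prompt` and `build_commonsenseqa_prompt` are hypothetical names introduced for this sketch and are not taken from the notebooks. It also assumes that `\n` in the templates denotes a line break.

```python
from typing import Sequence

# Template strings copied from the README. The keys are hypothetical labels.
TEMPLATES = {
    "boolq": (
        "Based on the passage:'{passage}'\n"
        "Answer True/False to the question: '{question}' as an Omniscient person."
    ),
    "commonsenseqa": (
        "Choose the answer as a critical thinker.\n{question}\n"
        "{opt1}. {text1}\n{opt2}. {text2}\n{opt3}. {text3}\n"
        "{opt4}. {text4}\n{opt5}. {text5}"
    ),
    "iwslt2017-en-fr": "Translate '{eng_text}' to french as a Translator.",
    "samsum": "Summarise the Dialogue: {dialogue} as a Storyteller.",
}


def build_prompt(dataset: str, **fields: str) -> str:
    """Fill the role-prompt template of one dataset with the given fields."""
    return TEMPLATES[dataset].format(**fields)


def build_commonsenseqa_prompt(
    question: str, labels: Sequence[str], texts: Sequence[str]
) -> str:
    """Expand five (label, text) answer options into the opt1..opt5 slots."""
    fields = {"question": question}
    for i, (label, text) in enumerate(zip(labels, texts), start=1):
        fields[f"opt{i}"] = label
        fields[f"text{i}"] = text
    return build_prompt("commonsenseqa", **fields)


if __name__ == "__main__":
    print(build_prompt("iwslt2017-en-fr", eng_text="Good morning"))
    print(
        build_commonsenseqa_prompt(
            "Where do fish live?",
            ["A", "B", "C", "D", "E"],
            ["water", "sky", "desert", "forest", "cave"],
        )
    )
```

Keeping the role phrase ("as an Omniscient person", "as a Translator", and so on) inside the template string means a new role can be tried by editing one entry, without touching the formatting code.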
Results
| Model | BoolQ | COMMONSENSEQA | IWSLT2017en-fr | SamSum |
|---|---|---|---|---|
| Llama3 | Accuracy = 0.8507, F1 score = 0.8793 | Accuracy = 0.7371 | BLEU = 0.2399, METEOR = 0.5436 | Rouge1 = 0.1725, RougeL = 0.1229 |
| Phi-3 | Accuracy = 0.8113, F1 score = 0.8344 | Accuracy = 0.7068 | BLEU = 0.1928, METEOR = 0.4950 | Rouge1 = 0.1383, RougeL = 0.0951 |
| Mistral-7B | Accuracy = 0.8281, F1 score = 0.8548 | Accuracy = 0.6490 | BLEU = 0.1507, METEOR = 0.4763 | Rouge1 = 0.1359, RougeL = 0.0991 |
| Gemma-7B | Accuracy = 0.6288, F1 score = 0.5831 | Accuracy = 0.6288 | BLEU = 0.0940, METEOR = 0.3611 | Rouge1 = 0.1192, RougeL = 0.0793 |
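To make the BoolQ column easier to interpret, the following sketch shows one way accuracy and F1 could be computed from free-text generations. It is not the notebooks' code: the answer parser `parse_bool_answer`, the choice of `True` as the positive class for F1, and the rule that an unparseable answer counts as wrong are all assumptions made for this example.

```python
from typing import Optional, Sequence


def parse_bool_answer(generation: str) -> Optional[bool]:
    """Return True/False for whichever word appears first, or None if neither does."""
    text = generation.strip().lower()
    true_pos = text.find("true")
    false_pos = text.find("false")
    if true_pos == -1 and false_pos == -1:
        return None  # assumption: treated as a wrong answer downstream
    if false_pos == -1 or (true_pos != -1 and true_pos < false_pos):
        return True
    return False


def accuracy(preds: Sequence[Optional[bool]], golds: Sequence[bool]) -> float:
    """Fraction of predictions that equal the gold label."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def binary_f1(
    preds: Sequence[Optional[bool]], golds: Sequence[bool], positive: bool = True
) -> float:
    """Harmonic mean of precision and recall for the chosen positive class."""
    tp = sum(p == positive and g == positive for p, g in zip(preds, golds))
    fp = sum(p == positive and g != positive for p, g in zip(preds, golds))
    fn = sum(p != positive and g == positive for p, g in zip(preds, golds))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up generations and labels, for illustration only.
    generations = ["True.", "The answer is False", "true, because...", "I am not sure"]
    golds = [True, True, True, False]
    preds = [parse_bool_answer(g) for g in generations]
    print("accuracy:", accuracy(preds, golds))
    print("f1:", binary_f1(preds, golds))
```

Different parsing rules or a different positive class would change the reported numbers, so scores from this sketch are not expected to match the table exactly.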
Repository Structure
```bash
llama3roleall.ipynb -- Role prompting on all datasets using Llama3-8B Instruct model
|
|phi3roleall.ipynb -- Role prompting on all datasets using Phi-3 mini-4K Instruct model
|
|mistralroleall.ipynb -- Role prompting on all datasets using Mistral-7B Instruct model
|
|Gemmaroleall.ipynb -- Role prompting on all datasets using Gemma-7B Instruct model
|
|Roleprompting___quantitaiveanalysis.txt
|qualitativeanalysis.txt
```
Contribution
The project will always remain OPEN-SOURCE. Contributions that add new models or datasets, or that formulate new roles in the prompt templates, are always welcome.
References
If you find this work useful, please cite this repository:
```bibtex
@software{Budagam_ROLEBENCH-_A_Role_2024,
  author = {Budagam, Devichand},
  month = may,
  title = {{ROLEBENCH- A Role Prompting Benchmark}},
  url = {https://github.com/devichand579/ROLEBENCH},
  year = {2024}
}
```
Owner
- Name: Devichand
- Login: devichand579
- Kind: user
- Repositories: 1
- Profile: https://github.com/devichand579
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Budagam"
    given-names: "Devichand"
title: "ROLEBENCH- A Role Prompting Benchmark"
date-released: 2024-05-16
url: "https://github.com/devichand579/ROLEBENCH"
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1