turkish-question-generation
Automated question generation and question answering from Turkish texts using text-to-text transformers
Science Score: 57.0%
This score indicates how likely this project is to be science-related, based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.5%) to scientific vocabulary
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: obss
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/
- Size: 39.1 KB
Statistics
- Stars: 47
- Watchers: 2
- Forks: 3
- Open Issues: 0
- Releases: 1
Topics
arxiv
mt5
multilingual
neptune-ai
nlp
question-answering
question-generation
t5
transformers
turkish
wandb
xquad
- Created: over 4 years ago
- Last pushed: over 3 years ago
Metadata Files
- Readme
- License
- Citation
README.md
Turkish Question Generation
Official source code for "Automated question generation & question answering from Turkish texts"
citation
If you use this software in your work, please cite as:

```bibtex
@article{akyon2022questgen,
  author = {Akyon, Fatih Cagatay and Cavusoglu, Ali Devrim Ekin and Cengiz, Cemil and Altinuc, Sinan Onur and Temizel, Alptekin},
  doi = {10.3906/elk-1300-0632.3914},
  journal = {Turkish Journal of Electrical Engineering and Computer Sciences},
  title = {{Automated question generation and question answering from Turkish texts}},
  url = {https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/},
  year = {2022}
}
```

install
```bash
git clone https://github.com/obss/turkish-question-generation.git
cd turkish-question-generation
pip install -r requirements.txt
```

train
- start a training using args:

```bash
python run.py \
  --model_name_or_path google/mt5-small \
  --output_dir runs/exp1 \
  --do_train \
  --do_eval \
  --tokenizer_name_or_path mt5_qg_tokenizer \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 2 \
  --learning_rate 1e-4 \
  --seed 42 \
  --save_total_limit 1
```

- download the [json config](configs/default/config.json) file and start a training (see the sketch after this list for the keys such a config would carry):

```bash
python run.py config.json
```

- download the [yaml config](configs/default/config.yaml) file and start a training:

```bash
python run.py config.yaml
```
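The default config files are not reproduced on this page. As a hedged sketch, the snippet below writes a minimal config.json whose keys simply mirror the CLI flags shown above (the key names are an assumption based on the common HuggingFace-style convention of naming config keys after flags; the real configs/default/config.json may differ) and then launches training:

```python
import json
import subprocess

# Hypothetical minimal config mirroring the CLI flags above; the actual
# configs/default/config.json in the repo may contain additional keys.
config = {
    "model_name_or_path": "google/mt5-small",
    "output_dir": "runs/exp1",
    "do_train": True,
    "do_eval": True,
    "tokenizer_name_or_path": "mt5_qg_tokenizer",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 2,
    "learning_rate": 1e-4,
    "seed": 42,
    "save_total_limit": 1,
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)

# Equivalent to running: python run.py config.json
subprocess.run(["python", "run.py", "config.json"], check=True)
```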
evaluate

- arrange the related params in the config:

```yaml
do_train: false
do_eval: true
eval_dataset_list: ["tquad2-valid", "xquad.tr"]
prepare_data: true
mt5_task_list: ["qa", "qg", "ans_ext"]
mt5_qg_format: "both"
no_cuda: false
```

- start an evaluation (a standalone scoring sketch follows this list):

```bash
python run.py config.yaml
```
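run.py reports its own evaluation numbers; purely as an illustration of the evaluation stack, the sketch below scores hypothetical predictions against references with jury, which is pinned in requirements.txt. The callable API and metric names follow jury's documentation, and this is not necessarily how run.py computes its metrics:

```python
from jury import Jury

# Hedged illustration: jury (a dependency of this repo) aggregates several
# text-generation metrics; the metric names are taken from jury's docs.
scorer = Jury(metrics=["bleu", "rouge"])

# Hypothetical model outputs vs. gold questions. jury expects nested lists:
# one inner list of candidates/references per example.
predictions = [["Türkiye Cumhuriyeti'nin kurucusu kimdir?"]]
references = [["Türkiye Cumhuriyeti'ni kim kurdu?"]]

scores = scorer(predictions=predictions, references=references)
print(scores)
```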
neptune

- install neptune:

```bash
pip install neptune-client
```

- download the [config](configs/default/config.yaml) file and arrange the neptune params:

```yaml
run_name: 'exp1'
neptune_project: 'name/project'
neptune_api_token: 'YOUR_API_TOKEN'
```

- start a training:

```bash
python run.py config.yaml
```
wandb

- install wandb:

```bash
pip install wandb
```

- download the [config](configs/default/config.yaml) file and arrange the wandb params:

```yaml
run_name: 'exp1'
wandb_project: 'turque'
```

- start a training:

```bash
python run.py config.yaml
```

finetuned checkpoints
| name | model | training data | trained tasks | model size (GB) |
| --- | --- | --- | --- | --- |
| [mt5-small-3task-highlight-tquad2][model_url4] | [mt5-small][model_url2] | [tquad2-train][data_url1] | QA, QG, AnsExt | 1.2 |
| [mt5-small-3task-prepend-tquad2][model_url6] | [mt5-small][model_url2] | [tquad2-train][data_url1] | QA, QG, AnsExt | 1.2 |
| [mt5-small-3task-highlight-combined3][model_url7] | [mt5-small][model_url2] | [tquad2-train][data_url1]+[tquad2-valid][data_url2]+[xquad.tr][data_url3] | QA, QG, AnsExt | 1.2 |
| [mt5-base-3task-highlight-tquad2][model_url5] | [mt5-base][model_url3] | [tquad2-train][data_url1] | QA, QG, AnsExt | 2.3 |
| [mt5-base-3task-highlight-combined3][model_url8] | [mt5-base][model_url3] | [tquad2-train][data_url1]+[tquad2-valid][data_url2]+[xquad.tr][data_url3] | QA, QG, AnsExt | 2.3 |
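These checkpoints are standard mT5 models, so they should load with the HuggingFace transformers API. The sketch below is a hedged illustration, not the repo's own inference code: the checkpoint path is hypothetical (substitute a downloaded checkpoint from the table; "google/mt5-small" is used here only so the snippet runs), and the input string assumes the highlight-style format described in the next section:

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

# Hypothetical path: replace with a checkpoint downloaded from the table,
# e.g. a local copy of mt5-small-3task-highlight-tquad2.
checkpoint = "google/mt5-small"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = MT5ForConditionalGeneration.from_pretrained(checkpoint)

# Assumed highlight-style QG input; see the "format" section below.
text = "generate question: <hl> Mustafa Kemal Atatürk <hl> Türkiye Cumhuriyeti'nin kurucusudur."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```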
format

- answer extraction: input:
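As a hedged sketch of what such inputs look like for the three tasks in mt5_task_list above (ans_ext, qa, qg): the prefixes and the <hl> highlight token below follow the common T5-style question-generation convention, and the exact strings used by this repo's preprocessing are not confirmed here:

```python
# Assumed input formats for the three mT5 tasks; the task prefixes and the
# <hl> highlight token are conventions borrowed from T5-based QG pipelines,
# NOT verified against this repo's actual preprocessing code.
context = "Mustafa Kemal Atatürk, Türkiye Cumhuriyeti'nin kurucusudur."
answer = "Mustafa Kemal Atatürk"
question = "Türkiye Cumhuriyeti'nin kurucusu kimdir?"

# answer extraction: highlight the sentence to extract candidate answers from
ans_ext_input = f"extract answers: <hl> {context} <hl>"

# question answering: question prepended to the context
qa_input = f"question: {question} context: {context}"

# question generation (highlight format): the answer span is highlighted
highlighted = context.replace(answer, f"<hl> {answer} <hl>", 1)
qg_input = f"generate question: {highlighted}"

print(ans_ext_input)
print(qa_input)
print(qg_input)
```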
paper results

BERTurk-base and mT5-base QA evaluation results for TQuADv2 fine-tuning.

mT5-base QG evaluation results for single-task (ST) and multi-task (MT) TQuADv2 fine-tuning.

TQuADv1 and TQuADv2 fine-tuning QG evaluation results for multi-task mT5 variants. MT-Both means the mT5 model is fine-tuned with the 'Both' input format in a multi-task setting.
paper configs

You can find the config files used in the paper under [configs/paper](configs/paper).

contributing
Before opening a PR:

- Install the required development packages:

```bash
pip install "black==21.7b0" "flake8==3.9.2" "isort==5.9.2"
```

- Reformat with black and isort:

```bash
black . --config pyproject.toml
isort .
```

Owner
- Name: Open Business Software Solutions
- Login: obss
- Kind: organization
- Email: rcm@obss.tech
- Location: Istanbul
- Website: https://obss.tech
- Twitter: obsstech
- Repositories: 13
- Profile: https://github.com/obss
Open Source for Open Business
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this package, please consider citing it."
authors:
  - family-names: "Akyon"
    given-names: "Fatih Cagatay"
    orcid: "https://orcid.org/0000-0001-7098-3944"
  - family-names: "Cavusoglu"
    given-names: "Devrim"
    orcid: "https://orcid.org/0000-0002-5218-1283"
  - family-names: "Cengiz"
    given-names: "Cemil"
    orcid: "https://orcid.org/0000-0003-2681-5059"
  - family-names: "Altinuc"
    given-names: "Sinan Onur"
    orcid: "https://orcid.org/0000-0001-5119-160X"
  - family-names: "Temizel"
    given-names: "Alptekin"
    orcid: "https://orcid.org/0000-0001-6082-2573"
title: "Turkish Question Generation"
version: 2.0.4
doi: arXiv:2111.06476
date-released: 2021-11-11
url: "https://github.com/obss/turkish-question-generation"
preferred-citation:
  type: article
  title: Automated question generation and question answering from Turkish texts
  doi: 10.3906/elk-1300-0632.3914
  url: https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/
  journal: Turkish Journal of Electrical Engineering and Computer Sciences
  authors:
    - family-names: "Akyon"
      given-names: "Fatih Cagatay"
    - family-names: "Cavusoglu"
      given-names: "Ali Devrim Ekin"
    - family-names: "Cengiz"
      given-names: "Cemil"
    - family-names: "Altinuc"
      given-names: "Sinan Onur"
    - family-names: "Temizel"
      given-names: "Alptekin"
  year: 2022
  pages: 1931-1940
GitHub Events
Total
- Watch event: 4
- Member event: 1
Last Year
- Watch event: 4
- Member event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 6
- Average time to close issues: less than a minute
- Average time to close pull requests: about 13 hours
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- fcakyon (1)
Pull Request Authors
- fcakyon (6)
Top Labels
Issue Labels
Pull Request Labels
documentation (3)
enhancement (3)
Dependencies
environment.yml
pypi
- bert-score ==0.3.10
- black ==21.7b0
- datasets >=1.12.0,<2.0.0
- flake8 ==3.9.2
- gdown *
- isort ==5.9.2
- jupyterlab ==3.0.14
- jury >=2.1.0,<3.0.0
- protobuf >=3.17.3
- pysocks ==1.5.6
- pyyaml *
- rouge-score ==0.0.4
- sacrebleu ==1.5.1
- sentencepiece ==0.1.96
- transformers >=4.10.0,<5.0.0
- trtokenizer ==0.0.3
requirements.txt
pypi
- bert-score ==0.3.10
- click ==8.0.4
- datasets >=1.12.0,<2.0.0
- gdown *
- jury >=2.1.0,<3.0.0
- protobuf <=3.20.1
- protobuf >=3.17.3
- pysocks ==1.5.6
- pyyaml *
- rouge-score ==0.0.4
- sacrebleu ==1.5.1
- sentencepiece ==0.1.96
- torch ==1.10.0
- transformers >=4.10.0,<5.0.0
- trtokenizer ==0.0.3