turkish-question-generation

Automated question generation and question answering from Turkish texts using text-to-text transformers

https://github.com/obss/turkish-question-generation

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary

Keywords

arxiv mt5 multilingual neptune-ai nlp question-answering question-generation t5 transformers turkish wandb xquad
Last synced: 6 months ago

Repository

Automated question generation and question answering from Turkish texts using text-to-text transformers

Basic Info
Statistics
  • Stars: 47
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 1
Topics
arxiv mt5 multilingual neptune-ai nlp question-answering question-generation t5 transformers turkish wandb xquad
Created over 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md

Turkish Question Generation

Official source code for "Automated question generation & question answering from Turkish texts"

Citation

If you use this software in your work, please cite as:

```bibtex
@article{akyon2022questgen,
  author = {Akyon, Fatih Cagatay and Cavusoglu, Ali Devrim Ekin and Cengiz, Cemil and Altinuc, Sinan Onur and Temizel, Alptekin},
  doi = {10.3906/elk-1300-0632.3914},
  journal = {Turkish Journal of Electrical Engineering and Computer Sciences},
  title = {{Automated question generation and question answering from Turkish texts}},
  url = {https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/},
  year = {2022}
}
```
Install

```bash
git clone https://github.com/obss/turkish-question-generation.git
cd turkish-question-generation
pip install -r requirements.txt
```
Train

- start a training using args:

```bash
python run.py \
  --model_name_or_path google/mt5-small \
  --output_dir runs/exp1 \
  --do_train \
  --do_eval \
  --tokenizer_name_or_path mt5_qg_tokenizer \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 2 \
  --learning_rate 1e-4 \
  --seed 42 \
  --save_total_limit 1
```

- download the [json config](configs/default/config.json) file and start a training:

```bash
python run.py config.json
```

- download the [yaml config](configs/default/config.yaml) file and start a training:

```bash
python run.py config.yaml
```
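The config file is expected to carry the same settings as the CLI flags. A hypothetical `config.yaml` sketch based only on the arguments shown above (field names are assumed to mirror the flag names; check the shipped default config for the actual schema):

```yaml
# Hypothetical sketch; field names assumed to mirror the CLI flags above.
model_name_or_path: google/mt5-small
output_dir: runs/exp1
do_train: true
do_eval: true
tokenizer_name_or_path: mt5_qg_tokenizer
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
learning_rate: 0.0001
seed: 42
save_total_limit: 1
```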
Evaluate

- arrange the related params in the config:

```yaml
do_train: false
do_eval: true
eval_dataset_list: ["tquad2-valid", "xquad.tr"]
prepare_data: true
mt5_task_list: ["qa", "qg", "ans_ext"]
mt5_qg_format: "both"
no_cuda: false
```

- start an evaluation:

```bash
python run.py config.yaml
```
Neptune

- install neptune:

```bash
pip install neptune-client
```

- download the [config](configs/default/config.yaml) file and arrange the neptune params:

```yaml
run_name: 'exp1'
neptune_project: 'name/project'
neptune_api_token: 'YOUR_API_TOKEN'
```

- start a training:

```bash
python train.py config.yaml
```
Wandb

- install wandb:

```bash
pip install wandb
```

- download the [config](configs/default/config.yaml) file and arrange the wandb params:

```yaml
run_name: 'exp1'
wandb_project: 'turque'
```

- start a training:

```bash
python train.py config.yaml
```
Finetuned checkpoints

| name | model | training data | trained tasks | model size (GB) |
|---|---|---|---|---|
| [mt5-small-3task-highlight-tquad2][model_url4] | [mt5-small][model_url2] | [tquad2-train][data_url1] | QA, QG, AnsExt | 1.2 |
| [mt5-small-3task-prepend-tquad2][model_url6] | [mt5-small][model_url2] | [tquad2-train][data_url1] | QA, QG, AnsExt | 1.2 |
| [mt5-small-3task-highlight-combined3][model_url7] | [mt5-small][model_url2] | [tquad2-train][data_url1]+[tquad2-valid][data_url2]+[xquad.tr][data_url3] | QA, QG, AnsExt | 1.2 |
| [mt5-base-3task-highlight-tquad2][model_url5] | [mt5-base][model_url3] | [tquad2-train][data_url1] | QA, QG, AnsExt | 2.3 |
| [mt5-base-3task-highlight-combined3][model_url8] | [mt5-base][model_url3] | [tquad2-train][data_url1]+[tquad2-valid][data_url2]+[xquad.tr][data_url3] | QA, QG, AnsExt | 2.3 |
Format

- answer extraction:

  input:

  ```
  " Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
  ```

  target:

  ```
  1258 Söğüt’te
  ```

- question answering:

  input:

  ```
  "question: Osman Bey nerede doğmuştur? context: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
  ```

  target:

  ```
  "Söğüt’te"
  ```

- question generation (prepend):

  input:

  ```
  "answer: Söğüt’te context: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
  ```

  target:

  ```
  "Osman Bey nerede doğmuştur?"
  ```

- question generation (highlight):

  input:

  ```
  "generate question: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
  ```

  target:

  ```
  "Osman Bey nerede doğmuştur?"
  ```

- question generation (both):

  input:

  ```
  "answer: Söğüt’te context: Osman Bey 1258 yılında Söğüt’te doğdu. Osman Bey 1 Ağustos 1326’da Bursa’da hayatını kaybetmiştir.1281 yılında Osman Bey 23 yaşında iken Ahi teşkilatından olan Şeyh Edebali’nin kızı Malhun Hatun ile evlendi."
  ```

  target:

  ```
  "Osman Bey nerede doğmuştur?"
  ```
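The task inputs above are plain string templates with a task prefix. A minimal Python sketch that assembles them (the helper names below are illustrative only, not functions from this repository):

```python
# Illustrative helpers for building mT5 task input strings.
# These function names are hypothetical and not part of the repo's API.

def qa_input(question: str, context: str) -> str:
    # question answering: "question: ... context: ..."
    return f"question: {question} context: {context}"

def qg_prepend_input(answer: str, context: str) -> str:
    # question generation (prepend): "answer: ... context: ..."
    return f"answer: {answer} context: {context}"

def qg_highlight_input(context: str) -> str:
    # question generation (highlight): task prefix followed by the context
    return f"generate question: {context}"

context = "Osman Bey 1258 yılında Söğüt’te doğdu."
print(qa_input("Osman Bey nerede doğmuştur?", context))
print(qg_prepend_input("Söğüt’te", context))
print(qg_highlight_input(context))
```

The resulting strings would then be tokenized and passed to the seq2seq model, which is trained to emit the corresponding target string.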
Paper results
BERTurk-base and mT5-base QA evaluation results for TQuADv2 fine-tuning.

mT5-base QG evaluation results for single-task (ST) and multi-task (MT) for TQuADv2 fine-tuning.

TQuADv1 and TQuADv2 fine-tuning QG evaluation results for multi-task mT5 variants. MT-Both means the mT5 model is fine-tuned with the 'Both' input format in a multi-task setting.

Paper configs

You can find the config files used in the paper under [configs/paper](configs/paper).
Contributing

Before opening a PR:

- Install the required development packages:

```bash
pip install "black==21.7b0" "flake8==3.9.2" "isort==5.9.2"
```

- Reformat with black and isort:

```bash
black . --config pyproject.toml
isort .
```

Owner

  • Name: Open Business Software Solutions
  • Login: obss
  • Kind: organization
  • Email: rcm@obss.tech
  • Location: Istanbul

Open Source for Open Business

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this package, please consider citing it."
authors:
- family-names: "Akyon"
  given-names: "Fatih Cagatay"
  orcid: "https://orcid.org/0000-0001-7098-3944"
- family-names: "Cavusoglu"
  given-names: "Devrim"
  orcid: "https://orcid.org/0000-0002-5218-1283"
- family-names: "Cengiz"
  given-names: "Cemil"
  orcid: "https://orcid.org/0000-0003-2681-5059"
- family-names: "Altinuc"
  given-names: "Sinan Onur"
  orcid: "https://orcid.org/0000-0001-5119-160X"
- family-names: "Temizel"
  given-names: "Alptekin"
  orcid: "https://orcid.org/0000-0001-6082-2573"
title: "Turkish Question Generation"
version: 2.0.4
doi: arXiv:2111.06476
date-released: 2021-11-11
url: "https://github.com/obss/turkish-question-generation"
preferred-citation:
  type: article
  title: Automated question generation and question answering from Turkish texts
  doi: 10.3906/elk-1300-0632.3914
  url: https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/
  journal: Turkish Journal of Electrical Engineering and Computer Sciences
  authors:
  - family-names: "Akyon"
    given-names: "Fatih Cagatay"
  - family-names: "Cavusoglu"
    given-names: "Ali Devrim Ekin"
  - family-names: "Cengiz"
    given-names: "Cemil"
  - family-names: "Altinuc"
    given-names: "Sinan Onur"
  - family-names: "Temizel"
    given-names: "Alptekin"
  year: 2022
  pages: 1931,1940

GitHub Events

Total
  • Watch event: 4
  • Member event: 1
Last Year
  • Watch event: 4
  • Member event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 8
  • Total Committers: 1
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
fatih 3****n 8

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 6
  • Average time to close issues: less than a minute
  • Average time to close pull requests: about 13 hours
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • fcakyon (1)
Pull Request Authors
  • fcakyon (6)
Top Labels
Issue Labels
Pull Request Labels
documentation (3) enhancement (3)

Dependencies

environment.yml pypi
  • bert-score ==0.3.10
  • black ==21.7b0
  • datasets >=1.12.0,<2.0.0
  • flake8 ==3.9.2
  • gdown *
  • isort ==5.9.2
  • jupyterlab ==3.0.14
  • jury >=2.1.0,<3.0.0
  • protobuf >=3.17.3
  • pysocks ==1.5.6
  • pyyaml *
  • rouge-score ==0.0.4
  • sacrebleu ==1.5.1
  • sentencepiece ==0.1.96
  • transformers >=4.10.0,<5.0.0
  • trtokenizer ==0.0.3
requirements.txt pypi
  • bert-score ==0.3.10
  • click ==8.0.4
  • datasets >=1.12.0,<2.0.0
  • gdown *
  • jury >=2.1.0,<3.0.0
  • protobuf <=3.20.1
  • protobuf >=3.17.3
  • pysocks ==1.5.6
  • pyyaml *
  • rouge-score ==0.0.4
  • sacrebleu ==1.5.1
  • sentencepiece ==0.1.96
  • torch ==1.10.0
  • transformers >=4.10.0,<5.0.0
  • trtokenizer ==0.0.3