turkish-question-generation
Automated question generation and question answering from Turkish texts using text-to-text transformers
Science Score: 57.0%
This score indicates how likely this project is to be science-related, based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.5%) to scientific vocabulary
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: obss
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/
- Size: 39.1 KB
Statistics
- Stars: 47
- Watchers: 2
- Forks: 3
- Open Issues: 0
- Releases: 1
Topics
arxiv
mt5
multilingual
neptune-ai
nlp
question-answering
question-generation
t5
transformers
turkish
wandb
xquad
- Created: over 4 years ago
- Last pushed: over 3 years ago
Metadata Files
- Readme
- License
- Citation
README.md
Turkish Question Generation
Official source code for "Automated question generation & question answering from Turkish texts"
citation
If you use this software in your work, please cite as:

```bibtex
@article{akyon2022questgen,
  author = {Akyon, Fatih Cagatay and Cavusoglu, Ali Devrim Ekin and Cengiz, Cemil and Altinuc, Sinan Onur and Temizel, Alptekin},
  doi = {10.3906/elk-1300-0632.3914},
  journal = {Turkish Journal of Electrical Engineering and Computer Sciences},
  title = {{Automated question generation and question answering from Turkish texts}},
  url = {https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/},
  year = {2022}
}
```

install
```bash
git clone https://github.com/obss/turkish-question-generation.git
cd turkish-question-generation
pip install -r requirements.txt
```

train
- start a training using args:

```bash
python run.py \
  --model_name_or_path google/mt5-small \
  --output_dir runs/exp1 \
  --do_train \
  --do_eval \
  --tokenizer_name_or_path mt5_qg_tokenizer \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 2 \
  --learning_rate 1e-4 \
  --seed 42 \
  --save_total_limit 1
```

- download the [json config](configs/default/config.json) file and start a training (see the sketch after this list for the keys such a config would carry):

```bash
python run.py config.json
```

- download the [yaml config](configs/default/config.yaml) file and start a training:

```bash
python run.py config.yaml
```
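The default config files are not reproduced on this page. As a hedged sketch, the snippet below writes a minimal config.json whose keys simply mirror the CLI flags shown above (the key names are an assumption based on the common HuggingFace-style convention of naming config keys after flags; the real configs/default/config.json may differ) and then launches training:

```python
import json
import subprocess

# Hypothetical minimal config mirroring the CLI flags above; the actual
# configs/default/config.json in the repo may contain additional keys.
config = {
    "model_name_or_path": "google/mt5-small",
    "output_dir": "runs/exp1",
    "do_train": True,
    "do_eval": True,
    "tokenizer_name_or_path": "mt5_qg_tokenizer",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 2,
    "learning_rate": 1e-4,
    "seed": 42,
    "save_total_limit": 1,
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)

# Equivalent to running: python run.py config.json
subprocess.run(["python", "run.py", "config.json"], check=True)
```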
evaluate

- arrange the related params in the config:

```yaml
do_train: false
do_eval: true
eval_dataset_list: ["tquad2-valid", "xquad.tr"]
prepare_data: true
mt5_task_list: ["qa", "qg", "ans_ext"]
mt5_qg_format: "both"
no_cuda: false
```

- start an evaluation (a standalone scoring sketch follows this list):

```bash
python run.py config.yaml
```
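run.py reports its own evaluation numbers; purely as an illustration of the evaluation stack, the sketch below scores hypothetical predictions against references with jury, which is pinned in requirements.txt. The callable API and metric names follow jury's documentation, and this is not necessarily how run.py computes its metrics:

```python
from jury import Jury

# Hedged illustration: jury (a dependency of this repo) aggregates several
# text-generation metrics; the metric names are taken from jury's docs.
scorer = Jury(metrics=["bleu", "rouge"])

# Hypothetical model outputs vs. gold questions. jury expects nested lists:
# one inner list of candidates/references per example.
predictions = [["Türkiye Cumhuriyeti'nin kurucusu kimdir?"]]
references = [["Türkiye Cumhuriyeti'ni kim kurdu?"]]

scores = scorer(predictions=predictions, references=references)
print(scores)
```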
neptune

- install neptune:

```bash
pip install neptune-client
```

- download the [config](configs/default/config.yaml) file and arrange the neptune params:

```yaml
run_name: 'exp1'
neptune_project: 'name/project'
neptune_api_token: 'YOUR_API_TOKEN'
```

- start a training:

```bash
python run.py config.yaml
```
wandb

- install wandb:

```bash
pip install wandb
```

- download the [config](configs/default/config.yaml) file and arrange the wandb params:

```yaml
run_name: 'exp1'
wandb_project: 'turque'
```

- start a training:

```bash
python run.py config.yaml
```

finetuned checkpoints
| name | model | training data | trained tasks | model size (GB) |
| --- | --- | --- | --- | --- |
| [mt5-small-3task-highlight-tquad2][model_url4] | [mt5-small][model_url2] | [tquad2-train][data_url1] | QA, QG, AnsExt | 1.2 |
| [mt5-small-3task-prepend-tquad2][model_url6] | [mt5-small][model_url2] | [tquad2-train][data_url1] | QA, QG, AnsExt | 1.2 |
| [mt5-small-3task-highlight-combined3][model_url7] | [mt5-small][model_url2] | [tquad2-train][data_url1]+[tquad2-valid][data_url2]+[xquad.tr][data_url3] | QA, QG, AnsExt | 1.2 |
| [mt5-base-3task-highlight-tquad2][model_url5] | [mt5-base][model_url3] | [tquad2-train][data_url1] | QA, QG, AnsExt | 2.3 |
| [mt5-base-3task-highlight-combined3][model_url8] | [mt5-base][model_url3] | [tquad2-train][data_url1]+[tquad2-valid][data_url2]+[xquad.tr][data_url3] | QA, QG, AnsExt | 2.3 |
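These checkpoints are standard mT5 models, so they should load with the HuggingFace transformers API. The sketch below is a hedged illustration, not the repo's own inference code: the checkpoint path is hypothetical (substitute a downloaded checkpoint from the table; "google/mt5-small" is used here only so the snippet runs), and the input string assumes the highlight-style format described in the next section:

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

# Hypothetical path: replace with a checkpoint downloaded from the table,
# e.g. a local copy of mt5-small-3task-highlight-tquad2.
checkpoint = "google/mt5-small"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = MT5ForConditionalGeneration.from_pretrained(checkpoint)

# Assumed highlight-style QG input; see the "format" section below.
text = "generate question: <hl> Mustafa Kemal Atatürk <hl> Türkiye Cumhuriyeti'nin kurucusudur."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```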
format

- answer extraction: input:
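As a hedged sketch of what such inputs look like for the three tasks in mt5_task_list above (ans_ext, qa, qg): the prefixes and the <hl> highlight token below follow the common T5-style question-generation convention, and the exact strings used by this repo's preprocessing are not confirmed here:

```python
# Assumed input formats for the three mT5 tasks; the task prefixes and the
# <hl> highlight token are conventions borrowed from T5-based QG pipelines,
# NOT verified against this repo's actual preprocessing code.
context = "Mustafa Kemal Atatürk, Türkiye Cumhuriyeti'nin kurucusudur."
answer = "Mustafa Kemal Atatürk"
question = "Türkiye Cumhuriyeti'nin kurucusu kimdir?"

# answer extraction: highlight the sentence to extract candidate answers from
ans_ext_input = f"extract answers: <hl> {context} <hl>"

# question answering: question prepended to the context
qa_input = f"question: {question} context: {context}"

# question generation (highlight format): the answer span is highlighted
highlighted = context.replace(answer, f"<hl> {answer} <hl>", 1)
qg_input = f"generate question: {highlighted}"

print(ans_ext_input)
print(qa_input)
print(qg_input)
```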
paper results

BERTurk-base and mT5-base QA evaluation results for TQuADv2 fine-tuning.

mT5-base QG evaluation results for single-task (ST) and multi-task (MT) TQuADv2 fine-tuning.

TQuADv1 and TQuADv2 fine-tuning QG evaluation results for multi-task mT5 variants. MT-Both means the mT5 model is fine-tuned with the 'Both' input format in a multi-task setting.
paper configs

You can find the config files used in the paper under [configs/paper](configs/paper).

contributing
Before opening a PR:

- Install the required development packages:

```bash
pip install "black==21.7b0" "flake8==3.9.2" "isort==5.9.2"
```

- Reformat with black and isort:

```bash
black . --config pyproject.toml
isort .
```

Owner
- Name: Open Business Software Solutions
- Login: obss
- Kind: organization
- Email: rcm@obss.tech
- Location: Istanbul
- Website: https://obss.tech
- Twitter: obsstech
- Repositories: 13
- Profile: https://github.com/obss
Open Source for Open Business
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this package, please consider citing it."
authors:
  - family-names: "Akyon"
    given-names: "Fatih Cagatay"
    orcid: "https://orcid.org/0000-0001-7098-3944"
  - family-names: "Cavusoglu"
    given-names: "Devrim"
    orcid: "https://orcid.org/0000-0002-5218-1283"
  - family-names: "Cengiz"
    given-names: "Cemil"
    orcid: "https://orcid.org/0000-0003-2681-5059"
  - family-names: "Altinuc"
    given-names: "Sinan Onur"
    orcid: "https://orcid.org/0000-0001-5119-160X"
  - family-names: "Temizel"
    given-names: "Alptekin"
    orcid: "https://orcid.org/0000-0001-6082-2573"
title: "Turkish Question Generation"
version: 2.0.4
doi: arXiv:2111.06476
date-released: 2021-11-11
url: "https://github.com/obss/turkish-question-generation"
preferred-citation:
  type: article
  title: Automated question generation and question answering from Turkish texts
  doi: 10.3906/elk-1300-0632.3914
  url: https://journals.tubitak.gov.tr/elektrik/vol30/iss5/17/
  journal: Turkish Journal of Electrical Engineering and Computer Sciences
  authors:
    - family-names: "Akyon"
      given-names: "Fatih Cagatay"
    - family-names: "Cavusoglu"
      given-names: "Ali Devrim Ekin"
    - family-names: "Cengiz"
      given-names: "Cemil"
    - family-names: "Altinuc"
      given-names: "Sinan Onur"
    - family-names: "Temizel"
      given-names: "Alptekin"
  year: 2022
  pages: 1931-1940
GitHub Events
Total
- Watch event: 4
- Member event: 1
Last Year
- Watch event: 4
- Member event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 6
- Average time to close issues: less than a minute
- Average time to close pull requests: about 13 hours
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- fcakyon (1)
Pull Request Authors
- fcakyon (6)
Top Labels
Issue Labels
Pull Request Labels
documentation (3)
enhancement (3)
Dependencies
environment.yml
pypi
- bert-score ==0.3.10
- black ==21.7b0
- datasets >=1.12.0,<2.0.0
- flake8 ==3.9.2
- gdown *
- isort ==5.9.2
- jupyterlab ==3.0.14
- jury >=2.1.0,<3.0.0
- protobuf >=3.17.3
- pysocks ==1.5.6
- pyyaml *
- rouge-score ==0.0.4
- sacrebleu ==1.5.1
- sentencepiece ==0.1.96
- transformers >=4.10.0,<5.0.0
- trtokenizer ==0.0.3
requirements.txt
pypi
- bert-score ==0.3.10
- click ==8.0.4
- datasets >=1.12.0,<2.0.0
- gdown *
- jury >=2.1.0,<3.0.0
- protobuf <=3.20.1
- protobuf >=3.17.3
- pysocks ==1.5.6
- pyyaml *
- rouge-score ==0.0.4
- sacrebleu ==1.5.1
- sentencepiece ==0.1.96
- torch ==1.10.0
- transformers >=4.10.0,<5.0.0
- trtokenizer ==0.0.3