https://github.com/awslabs/durepa-hybrid-qa

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (10.5%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: awslabs
License: apache-2.0
Language: Python
Default Branch: main
Size: 44.9 KB

Statistics

Stars: 13
Watchers: 2
Forks: 1
Open Issues: 2
Releases: 0

Created over 4 years ago · Last pushed about 2 years ago

Metadata Files

Readme Contributing License Code of conduct

DuRePa: Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering

Code and model from our ACL 2021 paper.

Abstract

The current state-of-the-art generative models for open-domain question answering (ODQA) have focused on generating direct answers from unstructured textual information. However, a large amount of the world's knowledge is stored in structured databases and needs to be accessed using query languages such as SQL. Furthermore, query languages can answer questions that require complex reasoning and offer full explainability. In this paper, we propose a hybrid framework that takes both textual and tabular evidence as input and generates either direct answers or SQL queries, depending on which form could better answer the question. The generated SQL queries can then be executed on the associated databases to obtain the final answers. To the best of our knowledge, this is the first paper that applies Text2SQL to ODQA tasks. Empirically, we demonstrate that on several ODQA datasets, the hybrid methods consistently outperform the baseline models that only take homogeneous input by a large margin. Specifically, we achieve state-of-the-art performance on the OpenSQuAD dataset using a T5-base model. In a detailed analysis, we demonstrate that being able to generate structural SQL queries can always bring gains, especially for those questions that require complex reasoning.
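The last step described in the abstract, turning a generated string into a final answer, can be illustrated with a minimal sketch. It is not taken from this repository: the function name `resolve_output`, the `answer:` / `sql:` output prefixes, and the toy `medals` table are all assumptions made for illustration, and SQLite stands in for whatever database the tables are actually stored in.

```python
import sqlite3


def resolve_output(generated: str, conn: sqlite3.Connection) -> list:
    """Turn one generated string into a list of final answers.

    Assumption (hypothetical format): the generator marks its output
    form with an "answer:" or "sql:" prefix.
    """
    text = generated.strip()
    if text.lower().startswith("sql:"):
        query = text[len("sql:"):].strip()
        try:
            rows = conn.execute(query).fetchall()
        except sqlite3.Error:
            # A query that cannot be executed yields no answer.
            return []
        # Flatten the result rows into a list of answer strings.
        return [str(value) for row in rows for value in row]
    if text.lower().startswith("answer:"):
        text = text[len("answer:"):].strip()
    # Direct answers are returned as-is.
    return [text]


def build_demo_db() -> sqlite3.Connection:
    """Create a toy in-memory table (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE medals (country TEXT, gold INTEGER)")
    conn.executemany(
        "INSERT INTO medals VALUES (?, ?)",
        [("Norway", 16), ("Germany", 12), ("Canada", 4)],
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(resolve_output("answer: Oslo", db))
    print(resolve_output(
        "sql: SELECT country FROM medals ORDER BY gold DESC LIMIT 1", db))
```

Because the SQL here comes from a model, a real deployment would likely want to run it against a read-only connection; the sketch leaves that out for brevity.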

Setup

conda create --name durepa python=3.7
source activate durepa
conda install pytorch=1.6 cudatoolkit=11.0 -c pytorch
pip install -r requirements.txt

Train model

python run_ranking.py

Inference

python run_inference.py

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Owner

Name: Amazon Web Services - Labs
Login: awslabs
Kind: organization
Location: Seattle, WA

Website: http://amazon.com/aws/
Repositories: 914
Profile: https://github.com/awslabs

AWS Labs

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Issues and Pull Requests

Last synced: about 2 years ago

All Time

Total issues: 1
Total pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: 6 months
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 1.0
Average comments per pull request: 0.5
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 0
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: 6 months
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.5
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 2

Top Authors

Issue Authors

yeliu918 (1)

Pull Request Authors

dependabot[bot] (2)

Top Labels

Issue Labels

Pull Request Labels

dependencies (2)

Dependencies

requirements.txt pypi

jsonlines *
omegaconf ==2.0.5
pytorch-lightning ==1.1.4
transformers ==3.0.2
wandb *
