https://github.com/boostcampaitech5/level2_nlp_mrc-nlp-11

level2_nlp_mrc-nlp-11 created by GitHub Classroom

# Open-Domain Question Answering
> Boostcamp AI Tech 5 Level 2   

## Leader Board



![image](https://github.com/boostcampaitech5/level2_nlp_mrc-nlp-11/assets/95160680/a3462e23-b732-4127-8f73-2e6b8d9e7856)




## Outline

: **Linking MRC and Retrieval**

![image](https://github.com/boostcampaitech5/level2_nlp_mrc-nlp-11/assets/95160680/1fe5ad53-d143-487d-a260-029c60343539)


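- **ODQA:** a task in which a Retriever finds documents relevant to the question in a Knowledge Source, and a Reader extracts the answer from those documents.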
- Query(input):  GDP   ?
 Retriever Model    A, B, C    Reader Model 
Answer(output):  4.

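### A. Evaluation Metrics

- Two metrics are used, EM and F1 Score.
- **Exact Match (EM)**: the score is 1 only when the prediction exactly matches the ground-truth answer, and 0 otherwise.
- **F1 Score**: unlike EM, it gives partial credit. For example, if the answer is "Barack Obama" and the prediction is "Barak Hussein Obama II", EM is 0 but the F1 Score still awards partial credit for the overlap.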
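The two metrics can be sketched as follows. This is a minimal illustration, not the competition's official scorer: it assumes whitespace tokenization and no text normalization, and the function names `exact_match` and `f1_score` are chosen here for the example.

```python
from collections import Counter


def exact_match(prediction: str, ground_truth: str) -> int:
    """Return 1 if the two strings are identical after trimming, else 0."""
    return int(prediction.strip() == ground_truth.strip())


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 over whitespace tokens (a simplifying assumption)."""
    pred_tokens = prediction.split()
    gt_tokens = ground_truth.split()
    # Multiset intersection: a shared token counts at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(gt_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)
```

With the strings from the example above, only "Obama" is shared, so precision is 1/4, recall is 1/2, and F1 is 1/3 while EM is 0.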

### B. Team Members

|[line1029](https://github.com/line1029)|[Minwoo0206](https://github.com/Minwoo0206)|[jaekwanyda](https://github.com/jaekwanyda)|[wjdals3406](https://github.com/wjdals3406)|[jiho-hong](https://github.com/jiho-hong)|
|:-:|:-:|:-:|:-:|:-:|

### C. Roles

| Member | Role |
| --- | --- |
|  | BM25 with Elasticsearch, Hard Negative Sampling |
|  | BM25+CE, Negative Sampling |
|  | Negative Sampling, Fine-tuning with KorQuad |
|  | Fine-tuning with Curriculum Learning |
|  | BM25 with Elasticsearch, Fine-tuning |

### D. Skill

- PyTorch
- Hugging Face
- Elasticsearch

## Structure

```
level2_nlp_mrc-nlp-11
|-- README.md
|-- code
|   |-- arguments.py
|   |-- config.yaml
|   |-- curriculum_learning.py
|   |-- evaluation.py
|   |-- inference.py
|   |-- negative_sampling.py
|   |-- retrieval.py
|   |-- sweep.py
|   |-- sweep.yaml
|   |-- train.py
|   |-- trainer_qa.py
|   `-- utils_qa.py
|-- data
|-- elasticsearch
|   `-- README.md
`-- requirements.txt
```

- Set the parameters in the yaml file, then run train.py for training and inference.py for inference.

## Data (EDA)

### A. **Dataset Composition**

| Dataset | Splits (examples) | Columns |
| --- | --- | --- |
| train dataset | train (3952), validation (240) | id, question, context, answers, document_id, title |
| test_dataset | public (240), private (360) | id, question |

### B. Context Length 

![image](https://github.com/boostcampaitech5/level2_nlp_mrc-nlp-11/assets/95160680/10f13527-7cbd-4189-9f13-589483bf0846)

### C. Question Length 

![image](https://github.com/boostcampaitech5/level2_nlp_mrc-nlp-11/assets/95160680/773b8ca9-71f1-40ab-addf-45787286ed2a)


### D. Answers Length 

![image](https://github.com/boostcampaitech5/level2_nlp_mrc-nlp-11/assets/95160680/db86e37c-2c59-4007-8b28-ac9025a2b78a)


## Retrieval Model

### A. Baseline: TF-IDF

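- TF-IDF weights each term by the product of its TF and IDF, and ranks passages by how well their weighted terms match the question.
- Term Frequency (TF): how often a term occurs in a document.
- Inverse Document Frequency (IDF): how rare a term is across the corpus; rarer terms carry more information.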
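The TF and IDF definitions above can be illustrated with a small pure-Python sketch. It assumes whitespace-tokenized English toy documents and the plain `log(N / df)` form of IDF; the repository's baseline may use a different weighting, and the helper names `build_idf` and `tfidf_score` are invented for this example.

```python
import math
from collections import Counter


def build_idf(tokenized_docs):
    """IDF per term: log(N / df), where df is the number of documents containing the term."""
    n_docs = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_score(query_tokens, doc_tokens, idf):
    """Sum of TF * IDF over the query terms that occur in the document."""
    tf = Counter(doc_tokens)
    return sum(tf[term] * idf.get(term, 0.0) for term in query_tokens)


# Toy corpus (made up for illustration).
docs = [
    "seoul is the capital of korea".split(),
    "tokyo is the capital of japan".split(),
    "the han river flows through seoul".split(),
]
idf = build_idf(docs)
query = "capital of korea".split()
scores = [tfidf_score(query, doc, idf) for doc in docs]
best = max(range(len(docs)), key=scores.__getitem__)
```

Here the first document ranks highest because it is the only one containing the rare term "korea", while a term such as "the" that occurs in every document gets an IDF of zero.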

### B. Elasticsearch BM25

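- [Elasticsearch](https://www.elastic.co/kr/elasticsearch/) is a search engine built on Apache Lucene that supports several similarity algorithms, including Okapi BM25 and DFR. Of these, we used BM25.

- **BM25**
    - Builds on the idea of TF-IDF and additionally takes document length into account.
    - Caps the contribution of TF so that it stays within a bounded range.
    - Gives more weight to a term match in a document that is shorter than the average document length.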
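The three BM25 properties listed above show up directly in the scoring formula. The sketch below is a standalone illustration of Okapi BM25 in the Lucene-style form, not the Elasticsearch implementation itself; `k1=1.2` and `b=0.75` are the commonly used defaults, and the function name `bm25_scores` is an assumption of this example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, tokenized_docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every document for one query."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        # Longer-than-average documents get a larger normaliser, shorter ones
        # a smaller one, so a match in a short document counts for more.
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # tf * (k1 + 1) / (tf + norm) approaches k1 + 1 as tf grows,
            # which is what bounds the contribution of TF.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores
```

For two documents that each contain a query term once, the shorter one scores higher, and repeating a term many times cannot push its contribution past `idf * (k1 + 1)`.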

### C. Performance check

#### Hit@k

- Scores 1 if the Positive Passage is among the top k retrieved passages, and 0 otherwise.
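As a minimal sketch, Hit@k for a set of questions can be computed as below. The document ids and rankings are made up for illustration, and averaging over questions is an assumption about how the per-question scores are aggregated.

```python
def hit_at_k(retrieved_ids, positive_id, k):
    """1 if the positive passage is among the top-k retrieved ids, else 0."""
    return int(positive_id in retrieved_ids[:k])


def mean_hit_at_k(all_retrieved, all_positives, k):
    """Average Hit@k over all questions."""
    hits = [hit_at_k(r, p, k) for r, p in zip(all_retrieved, all_positives)]
    return sum(hits) / len(hits)


# Hypothetical rankings for three questions (document ids are made up).
rankings = [[7, 3, 9], [2, 5, 1], [4, 8, 6]]
positives = [3, 1, 0]
```

With these toy rankings the score rises from 0 at k=1 to 1/3 at k=2 and 2/3 at k=3, since a larger k can only add hits.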

![image](https://github.com/boostcampaitech5/level2_nlp_mrc-nlp-11/assets/95160680/21fb00ee-2ddd-472a-b07f-33ddcbd81777)

![image](https://github.com/boostcampaitech5/level2_nlp_mrc-nlp-11/assets/95160680/5415207f-469b-4958-ae87-6ed3de93b2e1)

## Reader Model

The Reader experiments were run on the passages returned by the Retrieval model, with top-k set to 10.

### A. Model Selection

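Of the pretrained models compared, klue/roberta-large scored highest on both EM and F1, so it was chosen as the Reader model.

| Model | EM | F1 | Retrieval Model |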
| --- | --- | --- | --- |
| klue/bert-base | 35.4200 | 48.4100 | TF-IDF |
| klue/roberta-large | 42.0800 | 53.1700 | TF-IDF |
| xlm-roberta-large | 35.4200 | 44.0600 | TF-IDF |
| monologg/koelectra-base-v3-finetuned-korquad | 37.5000 | 42.2600 | TF-IDF |

           . 

### B. Training Strategy

#### **1) KorQuad Fine-tuning**

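- The KorQuad dataset was used for data augmentation and negative sampling. Of the KorQuad releases, only the v1.0 training dataset was used.
- First, the KLUE and KorQuad data were merged into a single training set, which scored slightly below the baseline.
    
    
    | Model | EM (lb) | F1 (lb) | Retrieval |
    | --- | --- | --- | --- |
    | Baseline | 46.2505 | 55.3097 | BM25 |
    | Baseline (Augmentation) | 45.8333 | 53.7075 | BM25 |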
-        context        .
    
  ![image](https://github.com/boostcampaitech5/level2_nlp_mrc-nlp-11/assets/95160680/2a3ff0bc-0cb4-4f8e-b311-409ff852e200)

    
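- We fine-tuned in two stages: a first fine-tuning on KorQuad, followed by a second fine-tuning on the Train data, which improved both EM and F1.
    
    
    | Model | EM | F1 | Retrieval |
    | --- | --- | --- | --- |
    | Baseline (1 fine-tuning) | 50.0 | 59.4816 | BM25 |
    | Baseline (2 fine-tuning) | 56.25 | 64.7707 | BM25 |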

#### 2) Curriculum Learning

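- Curriculum learning mimics the way humans learn: the model is trained on easy examples first and on progressively harder ones afterwards.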
-                .
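- How the curriculum dataset was built
    - Compute the F1 Score of each Train example with a Reader fine-tuned on KorQuad, before it is trained on KLUE.
    - Split the examples into 5 groups by F1 Score.
    - Train on the 5 groups in order of difficulty.
        
        
        | Training stages | EM | F1 |
        | --- | --- | --- |
        | 1: KorQuad → 2: KLUE | 59.1667 | 67.1318 |
        | 1: KorQuad → 2: Curriculum | 56.25 | 63.5327 |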
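The grouping step can be sketched as follows. This is an illustration under stated assumptions rather than the contents of `curriculum_learning.py`: it treats a high F1 Score as "easy", orders groups from easy to hard, and makes the groups roughly equal in size. The function name `build_curriculum` is hypothetical.

```python
def build_curriculum(examples, f1_scores, n_groups=5):
    """Order examples from easy (high F1) to hard (low F1) and split them into groups."""
    order = sorted(range(len(examples)), key=lambda i: f1_scores[i], reverse=True)
    n = len(order)
    # Group boundaries that spread any remainder across the groups.
    bounds = [n * g // n_groups for g in range(n_groups + 1)]
    return [
        [examples[i] for i in order[bounds[g]:bounds[g + 1]]]
        for g in range(n_groups)
    ]
```

Training would then iterate over the returned groups in order, for example one training pass per group, which is one possible reading of the procedure described above.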

#### 3) Negative Sampling

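- In DPR (Dense Passage Retrieval), Hard Negative Sampling, which trains with passages that look relevant to the question but do not contain the answer, improves retrieval performance.
- Following the same idea, we applied Hard Negative Sampling to the Reader.
- KorQuad: negatives are other contexts that share the same title.
- KLUE: negatives are the top 20 hard negative contexts retrieved by BM25.
    
    
    | Training stages | EM | F1 |
    | --- | --- | --- |
    | 1: KorQuad → 2: KLUE → 3: KLUE(negative) | 63.3300 | 73.6100 |
    | 1: KorQuad(negative) → 2: KLUE → 3: KLUE(negative) | 61.6667 | 70.4874 |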
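Selecting hard negatives from a ranked retrieval result might look like the sketch below. It assumes the passages are already sorted by BM25 score and that a passage counts as negative when it is not the gold context and does not contain the answer string; whether `negative_sampling.py` applies the same filter is not stated in the text, and `pick_hard_negatives` is a hypothetical name.

```python
def pick_hard_negatives(ranked_passages, answer_text, positive_context, n_negatives=20):
    """Keep the highest-ranked passages that are neither the gold context
    nor contain the answer string."""
    negatives = []
    for passage in ranked_passages:
        if passage == positive_context or answer_text in passage:
            continue
        negatives.append(passage)
        if len(negatives) == n_negatives:
            break
    return negatives
```

Because the input is ranked, the passages kept are the ones a sparse retriever finds most similar to the question, which is what makes them "hard".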

### C. Hyperparameter Tuning

Hyperparameters were tuned with wandb sweep.

#### 1) Hyperparameter list

- learning rate
- epochs
- batch size
- warmup ratio
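A sweep over these four hyperparameters could be described with a configuration like the one below, written as a Python dict in the wandb sweep format. Every value, range, and name here is a placeholder chosen for illustration (including the search method, the metric name, and the Hugging Face style parameter names); the repository's own `sweep.yaml` is the authoritative configuration.

```python
# Hypothetical sweep definition; none of these ranges come from the project.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "eval/exact_match", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-6, "max": 5e-5},
        "num_train_epochs": {"values": [1, 2, 3]},
        "per_device_train_batch_size": {"values": [8, 16, 32]},
        "warmup_ratio": {"min": 0.0, "max": 0.2},
    },
}
```

A dict like this is typically registered with `wandb.sweep(...)` and executed with `wandb.agent(...)`, with the training function reading the sampled values from `wandb.config`.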

### D. Ensemble

We ensembled the outputs of several models with both hard voting and soft voting. Of the two, soft voting gave the higher EM Score.
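The difference between the two voting schemes can be sketched as follows. The example assumes each model contributes a single `(answer, probability)` pair per question and that soft voting sums probabilities per answer string; the project's actual aggregation may differ, and the predictions are made up.

```python
from collections import Counter, defaultdict


def hard_vote(predictions):
    """predictions: one (answer, probability) pair per model; the majority answer wins."""
    counts = Counter(answer for answer, _ in predictions)
    return counts.most_common(1)[0][0]


def soft_vote(predictions):
    """Sum each answer's probability across models; the highest total wins."""
    totals = defaultdict(float)
    for answer, prob in predictions:
        totals[answer] += prob
    return max(totals, key=totals.get)


# Hypothetical outputs of three models for one question.
preds = [("Seoul", 0.40), ("Seoul", 0.35), ("Busan", 0.95)]
```

On this toy input hard voting picks "Seoul" (two votes to one), while soft voting picks "Busan" because one confident model outweighs two uncertain ones.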

## Result

  

|  | EM | F1 |
| --- | --- | --- |
| Public Score | 67.0800 | 77.0000 |
| Private Score | 65.8300 | 77.8700 |

## Additional Experiments

### A. BM25 + CE

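- [BEIR (Thakur et al., 2021)](https://arxiv.org/pdf/2104.08663.pdf) reports that re-ranking BM25 results with a Cross Encoder gave the best Retriever performance.
- Accordingly, we retrieved the top $k_x$ passages with BM25, re-ranked them with a Cross Encoder, and passed the top $k_y\ \ (y < x)$ passages to the Reader Model.
- Even after fine-tuning the Cross Encoder, the Hit@k of BM25 + CE did not exceed that of BM25 alone, so BM25 alone was used.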
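The two-stage pipeline described above can be sketched with pluggable scoring functions. Both scorers are stand-ins: `first_stage_score` plays the role of BM25 and `cross_encoder_score` the role of a Cross Encoder, and the function name and signature are assumptions of this example rather than code from the repository.

```python
def retrieve_then_rerank(query, passages, first_stage_score, cross_encoder_score, k_x, k_y):
    """Two-stage retrieval: keep the top k_x passages by the cheap scorer,
    then return the top k_y of those according to the re-ranker."""
    assert k_y < k_x, "the re-ranked set must be smaller than the candidate set"
    candidates = sorted(
        passages, key=lambda p: first_stage_score(query, p), reverse=True
    )[:k_x]
    reranked = sorted(
        candidates, key=lambda p: cross_encoder_score(query, p), reverse=True
    )
    return reranked[:k_y]
```

The expensive scorer only ever sees `k_x` candidates per question, which is why the re-ranker can afford to read the query and passage jointly.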

##  

### A.  

-                 .
-      .    ,     ,      .                      .          .
- Pytorch  PytorchLightning  Train    Huggingface          . ,     Huggingface trainer source code    .

### B.  

-    .                   .                  .
-           .           .      ,         .
-         . Dense Retrieval Model, ODQA Task  SOTA          .             .

## Reference

[1] Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009, June). Curriculum learning. In *Proceedings of the 26th Annual International Conference on Machine Learning*.

[2] Kedia, A., Zaidi, M. A., & Lee, H. (2022). FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering. *arXiv preprint arXiv:2211.10147*.

[3] Thakur, N., Reimers, N., Rücklé, A., Srivastava, A., & Gurevych, I. (2021). BEIR: A heterogenous benchmark for zero-shot evaluation of information retrieval models. *arXiv preprint arXiv:2104.08663*.