https://github.com/artificialzeng/e-commerce-search-recall

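Unofficial baseline for the Tianchi Alibaba Lingjie Wentian Engine e-commerce search algorithm competition, also known as "NLP from beginner to 22/2771".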

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 4.3%, to scientific vocabulary)

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: ArtificialZeng
Default Branch: main
Homepage:
Size: 1.77 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of muyuuuu/E-commerce-Search-Recall

Created about 4 years ago · Last pushed about 4 years ago

https://github.com/ArtificialZeng/E-commerce-Search-Recall/blob/main/

# E-commerce Search Recall

![](docs/0.png)

## Task and data

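The data consists of queries and titles: 10w (100,000) query-title pairs plus a further 90w (900,000) titles. See also [Alibaba-NLP/Multi-CPR, data/ecom](https://github.com/Alibaba-NLP/Multi-CPR/tree/main/data/ecom). Sample query and title lines: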

```
	M27Q27KVM0.5ms170Hz2K
	2kg
	
	
	1312-1514
	
125	LX125/TK120120
	
```

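- Recall: given a query, retrieve its title from the 100w (1,000,000) titles; the metric is MRR@10.
- Ranking: given a query and 10 candidate titles, order the 10 titles for that query; the metric is MRR@10.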
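
Both tasks are scored with MRR@10, so a small reference implementation helps when debugging locally. The sketch below is not taken from the repository; the function name `mrr_at_k` and the assumption that every query has exactly one relevant title are illustrative choices.

```python
def mrr_at_k(ranked_ids, gold_ids, k=10):
    """Mean reciprocal rank, truncated at k.

    ranked_ids: list of lists, the retrieved title ids per query, best first.
    gold_ids: list holding the single relevant title id for each query
        (assumption: exactly one relevant title per query).
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("need one gold id per query")
    total = 0.0
    for ranked, gold in zip(ranked_ids, gold_ids):
        # Only the first k results count; a miss contributes 0.
        for rank, title_id in enumerate(ranked[:k], start=1):
            if title_id == gold:
                total += 1.0 / rank
                break
    return total / len(gold_ids) if gold_ids else 0.0
```

A query whose gold title is ranked second contributes 1/2, one ranked first contributes 1, and one that misses the top 10 contributes 0.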

 NLP NLP  10 `main.py`  `run.sh`  debug

## Baseline

Baseline results:

1.  DSSM baseline: 0.057
2.  CoSENT: 0.159
3.  SimCSE: 0.227
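
For readers new to contrastive training, the following is a minimal sketch of a SimCSE-style loss with in-batch negatives applied to query-title pairs. It is an illustration under stated assumptions, not the repository's training code: the function name `in_batch_contrastive_loss` and the temperature value 0.05 are hypothetical choices.

```python
import torch
import torch.nn.functional as F


def in_batch_contrastive_loss(query_emb, title_emb, temperature=0.05):
    """Cross-entropy over cosine similarities with in-batch negatives.

    query_emb, title_emb: [batch, dim]; row i of each forms a positive pair,
    and every other title in the batch serves as a negative for query i.
    """
    q = F.normalize(query_emb, p=2, dim=-1)
    t = F.normalize(title_emb, p=2, dim=-1)
    logits = q @ t.T / temperature  # [batch, batch] scaled cosine similarities
    labels = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

The loss is close to zero when each query is most similar to its own title and grows when a different title in the batch scores higher.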

tools 

## Trick

1.  `model.py`: first-last-avg pooling; scores 0.22, 0.245, 0.25.
    Details

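    ```py
    # Method of the model class; assumes `import torch` and
    # `import torch.nn.functional as F` at module level.
    def forward(self, input_ids, attention_mask, token_type_ids):
        out = self.extractor(input_ids,
                             attention_mask=attention_mask,
                             token_type_ids=token_type_ids,
                             output_hidden_states=True)

        # hidden_states[1] is the first encoder layer, [-1] the last;
        # transpose so the sequence axis is last: [batch, 768, seq].
        first = out.hidden_states[1].transpose(1, 2)
        last = out.hidden_states[-1].transpose(1, 2)
        # Average each layer over the whole sequence length.
        first_avg = torch.avg_pool1d(
            first, kernel_size=last.shape[-1]).squeeze(-1)  # [batch, 768]
        last_avg = torch.avg_pool1d(last, kernel_size=last.shape[-1]).squeeze(
            -1)  # [batch, 768]
        # Stack the two layer averages and take their mean.
        avg = torch.cat((first_avg.unsqueeze(1), last_avg.unsqueeze(1)),
                        dim=1)  # [batch, 2, 768]
        out = torch.avg_pool1d(avg.transpose(1, 2), kernel_size=2).squeeze(-1)
        # Project, then L2-normalize the sentence embedding.
        x = self.fc(out)
        x = F.normalize(x, p=2, dim=-1)
        return x
    ```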

    


2.  `unilm`: `UniLM` on the 100w titles ([utils_unilm.py#L268-L282](https://github.com/muyuuuu/E-commerce-Search-Recall/blob/main/unilm/utils_unilm.py#L268-L282)); score 0.265. Reference: [YunwenTechnology/Unilm](https://github.com/YunwenTechnology/Unilm)

3.  `simbert`: `simbertv2` (bart) on the 100w titles, score 0.3; with query, 0.31. Reference: [ZhuiyiTechnology/roformer-sim](https://github.com/ZhuiyiTechnology/roformer-sim)
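
The `forward` shown under item 1 averages over every sequence position, so padding tokens enter the mean. A padding-aware variant of first-last-avg could look like the sketch below. This is not what the repository does; the helper name `masked_first_last_avg` is hypothetical, and it assumes `hidden_states` comes from a Hugging Face model called with `output_hidden_states=True`.

```python
import torch


def masked_first_last_avg(hidden_states, attention_mask):
    """Average first- and last-layer token states, ignoring padding.

    hidden_states: sequence of [batch, seq, dim] tensors; index 0 is the
        embedding output, so index 1 is the first encoder layer.
    attention_mask: [batch, seq], 1 for real tokens and 0 for padding.
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states[1].dtype)  # [batch, seq, 1]
    counts = mask.sum(dim=1).clamp(min=1.0)  # [batch, 1], avoids division by zero
    first_avg = (hidden_states[1] * mask).sum(dim=1) / counts
    last_avg = (hidden_states[-1] * mask).sum(dim=1) / counts
    return (first_avg + last_avg) / 2  # [batch, dim]
```

The result can be passed to the same projection and `F.normalize` steps as in the original `forward`.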

Other tricks and references: transfer-mixup, label-smooth, am-softmax, [EASE: Entity-Aware Contrastive Learning of Sentence Embedding](https://github.com/studio-ousia/ease), [Dense Passage Retrieval](https://github.com/facebookresearch/DPR), [Embedding-based Retrieval in Facebook Search](https://arxiv.org/abs/2006.11632).

## Ranking

 `rank`  `tf1.x`  baseline baseline `tf1.x` 

`pair-wise`  NSP 

# References

- [CoSENT](https://github.com/shawroad/CoSENT_Pytorch)
- [SimCSE](https://github.com/zhengyanzhao1997/NLP-model/tree/main/model/model/Torch_model/SimCSE-Chinese)
- [UniLM](https://github.com/YunwenTechnology/Unilm)
- [SimBERTv2](https://github.com/ZhuiyiTechnology/roformer-sim)
- [tensorflow-simcse](https://github.com/jifei/simcse-tf2)

# Related project

LSH for ANN: [muyuuuu/high-performance-LSH](https://github.com/muyuuuu/high-performance-LSH) (stars welcome).
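
To make the LSH idea concrete, here is a small random-hyperplane sketch for cosine similarity in plain Python. It illustrates the general technique only and is not necessarily what the linked project implements; all function names (`make_hyperplanes`, `signature`, `build_index`, `candidates`) are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of size dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes
    )


def build_index(vectors, planes):
    """Group vector indices into buckets keyed by their bit signature."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, planes), []).append(idx)
    return buckets


def candidates(query, planes, buckets):
    """Return indices that share the query's bucket (approximate neighbours)."""
    return buckets.get(signature(query, planes), [])
```

Vectors pointing in similar directions tend to share a signature, so only the matching bucket needs an exact similarity check instead of all 100w titles.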

# Acknowledgements

- Supported by High-performance Computing Platform of XiDian University.
- [xzhws](https://github.com/xzhws)
- [DunZhang](https://github.com/DunZhang)

Owner

Name: Dr. Artificial曾小健
Login: ArtificialZeng
Kind: user
Location: Beijing

Website: https://blog.csdn.net/sinat_37574187?type=blog
Repositories: 171
Profile: https://github.com/ArtificialZeng

LLM practitioner/engineer, AI/ML/DL Quant
