https://github.com/artificialzeng/e-commerce-search-recall

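Unofficial baseline for the Tianchi "Alibaba Lingjie" Wentian Engine e-commerce search algorithm competition, also known as "NLP from beginner to 22/2771".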


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.3%) to scientific vocabulary
Last synced: 10 months ago

Repository

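Unofficial baseline for the Tianchi "Alibaba Lingjie" Wentian Engine e-commerce search algorithm competition, also known as "NLP from beginner to 22/2771".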

Basic Info
  • Host: GitHub
  • Owner: ArtificialZeng
  • Default Branch: main
  • Homepage:
  • Size: 1.77 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of muyuuuu/E-commerce-Search-Recall
Created about 4 years ago · Last pushed about 4 years ago

https://github.com/ArtificialZeng/E-commerce-Search-Recall/blob/main/

# E-commerce-Search-Recall

![](docs/0.png)

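## Data

The data consists of queries and titles: 10w (100,000) query-title pairs and a further 90w (900,000) titles ([Alibaba-NLP/Multi-CPR ecom data](https://github.com/Alibaba-NLP/Multi-CPR/tree/main/data/ecom)). Sample query and title rows: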

```
	M27Q27KVM0.5ms170Hz2K
	2kg
	
	
	1312-1514
	
125	LX125/TK120120
	
```

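- Recall: for each query, retrieve matching titles from the 100w (1,000,000) titles. The metric is MRR@10.
- Ranking: for each query, rank its 10 candidate titles. The metric is MRR@10.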
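MRR@10, the metric named in the list above, can be sketched in a few lines. This is a minimal illustration, assuming each query has exactly one relevant title; the function names and the toy ids are hypothetical and not taken from the repository.

```python
def reciprocal_rank_at_k(ranked_ids, relevant_id, k=10):
    """Return 1/rank of the relevant id if it is in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mrr_at_k(predictions, gold, k=10):
    """Average the reciprocal rank over every query in `gold`.

    predictions: dict mapping query id -> list of title ids, best first
    gold:        dict mapping query id -> the single relevant title id
    """
    scores = [
        reciprocal_rank_at_k(predictions.get(qid, []), title_id, k)
        for qid, title_id in gold.items()
    ]
    return sum(scores) / len(scores)


# Toy data (hypothetical ids): q1 hits at rank 2, q2 at rank 1, q3 misses.
predictions = {"q1": [5, 3, 9], "q2": [7, 8], "q3": [1, 2, 4]}
gold = {"q1": 3, "q2": 7, "q3": 6}
score = mrr_at_k(predictions, gold)  # (1/2 + 1 + 0) / 3
```

A query whose relevant title falls outside the top 10 contributes 0, so the metric rewards placing the correct title as early as possible.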

 NLP NLP  10 `main.py`  `run.sh`  debug

## Baselines

Baseline scores:

1. DSSM baseline: 0.057
2. CoSENT: 0.159
3. SimCSE: 0.227
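For readers new to SimCSE-style training, the in-batch contrastive objective can be sketched without any framework so the arithmetic stays visible. This is a sketch under assumptions: it treats each query's paired title as the positive and the other titles in the batch as negatives, and it uses a temperature of 0.05. Neither choice is confirmed by this repository, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(query_vecs, title_vecs, temperature=0.05):
    """Mean cross-entropy where title i is the positive for query i.

    Every other title in the batch acts as a negative (assumed setup).
    """
    losses = []
    for i, q in enumerate(query_vecs):
        logits = [cosine(q, t) / temperature for t in title_vecs]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: aligned pairs give a loss near 0, swapped pairs a large one.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_contrastive_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
```

In a real training loop the same computation is usually a matrix product followed by a cross-entropy call in the deep learning framework. The list-based form here is only meant to show what is being optimized.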

tools 

## Trick

1.  model.py  first-last-avg  0.22  0.245  0.25
    
    ```py
    def forward(self, input_ids, attention_mask, token_type_ids):
        out = self.extractor(input_ids,
                             attention_mask=attention_mask,
                             token_type_ids=token_type_ids,
                             output_hidden_states=True)

        first = out.hidden_states[1].transpose(1, 2)
        last = out.hidden_states[-1].transpose(1, 2)
        first_avg = torch.avg_pool1d(
            first, kernel_size=last.shape[-1]).squeeze(-1)  # [batch, 768]
        last_avg = torch.avg_pool1d(
            last, kernel_size=last.shape[-1]).squeeze(-1)  # [batch, 768]
        avg = torch.cat((first_avg.unsqueeze(1), last_avg.unsqueeze(1)),
                        dim=1)  # [batch, 2, 768]
        out = torch.avg_pool1d(avg.transpose(1, 2), kernel_size=2).squeeze(-1)
        x = self.fc(out)
        x = F.normalize(x, p=2, dim=-1)
        return x
    ```
2. `unilm`: `UniLM` on the 100w (1,000,000) titles ([utils_unilm.py#L268-L282](https://github.com/muyuuuu/E-commerce-Search-Recall/blob/main/unilm/utils_unilm.py#L268-L282)), 0.265. Reference: [YunwenTechnology/Unilm](https://github.com/YunwenTechnology/Unilm)
3. `simbert`: `simbertv2` (bart) on the 100w titles, 0.3; with query, 0.31. Reference: [ZhuiyiTechnology/roformer-sim](https://github.com/ZhuiyiTechnology/roformer-sim)

Tricks: transfer-mixup, label-smooth, am-softmax. Papers: [EASE: Entity-Aware Contrastive Learning of Sentence Embedding](https://github.com/studio-ousia/ease), [Dense Passage Retrieval](https://github.com/facebookresearch/DPR), [Embedding-based Retrieval in Facebook Search](https://arxiv.org/abs/2006.11632)

## Ranking

`rank`: `tf1.x` baseline (`pair-wise`, NSP).

# References

- [CoSENT](https://github.com/shawroad/CoSENT_Pytorch)
- [SimCSE](https://github.com/zhengyanzhao1997/NLP-model/tree/main/model/model/Torch_model/SimCSE-Chinese)
- [UniLM](https://github.com/YunwenTechnology/Unilm)
- [SimBERTv2](https://github.com/ZhuiyiTechnology/roformer-sim)
- [tensorflow-simcse](https://github.com/jifei/simcse-tf2)

# LSH

LSH, ANN: [muyuuuu/high-performance-LSH](https://github.com/muyuuuu/high-performance-LSH) (star)

# Acknowledgements

- Supported by High-performance Computing Platform of XiDian University.
- [xzhws](https://github.com/xzhws)
- [DunZhang](https://github.com/DunZhang)

Owner

  • Name: Dr. Artificial曾小健
  • Login: ArtificialZeng
  • Kind: user
  • Location: Beijing

LLM practitioner/engineer, AI/ML/DL Quant
