https://github.com/artificialzeng/e-commerce-search-recall
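Unofficial baseline for the Tianchi Alibaba Lingjie Wentian Engine e-commerce search algorithm competition, also known as "NLP from beginner to 22/2771".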
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of muyuuuu/E-commerce-Search-Recall
Created about 4 years ago
· Last pushed about 4 years ago
https://github.com/ArtificialZeng/E-commerce-Search-Recall/blob/main/
#

##
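The data consists of query and title text: 10w (100,000) query-title pairs plus 90w (900,000) additional titles. The dataset is available from [Alibaba-NLP/Multi-CPR](https://github.com/Alibaba-NLP/Multi-CPR/tree/main/data/ecom). Sample query and title text: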
```
M27Q27KVM0.5ms170Hz2K
2kg
1312-1514
125 LX125/TK120120
```
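- Task: for each query, retrieve matching titles from the 100w-title corpus; the metric is MRR@10.
- For each query, return the top 10 titles; MRR@10 is computed over those 10 titles.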
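As a rough illustration of the metric, the sketch below computes MRR@10 in plain Python. It assumes each query has exactly one correct title id; the function name `mrr_at_k` and the toy ids are hypothetical and are not taken from the competition's scoring script.

```python
def mrr_at_k(ranked_ids, gold_ids, k=10):
    """Mean reciprocal rank over queries, looking only at the top-k results.

    ranked_ids: one list of retrieved title ids per query, best first.
    gold_ids:   the single correct title id per query (an assumption here).
    """
    if not gold_ids or len(ranked_ids) != len(gold_ids):
        raise ValueError("need one ranked list per gold id, and at least one query")
    total = 0.0
    for ranked, gold in zip(ranked_ids, gold_ids):
        for rank, title_id in enumerate(ranked[:k], start=1):
            if title_id == gold:
                total += 1.0 / rank  # reciprocal rank of the first hit
                break                # a miss within top-k contributes 0
    return total / len(gold_ids)


# Toy example: hit at rank 2, hit at rank 1, no hit.
ranked = [["t7", "t3", "t9"], ["t4", "t5"], ["t1", "t2"]]
gold = ["t3", "t4", "t0"]
score = mrr_at_k(ranked, gold)  # (1/2 + 1/1 + 0) / 3 = 0.5
```

A query whose correct title falls outside the top 10 scores zero, so the metric rewards placing the right title near the top of the list rather than merely somewhere in the recall set.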
NLP NLP 10 `main.py` `run.sh` debug
## Baseline
Baseline models and their scores:
1. DSSM baseline: 0.057
2. CoSENT: 0.159
3. SimCSE: 0.227
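For readers new to the contrastive setup behind the SimCSE entry, here is a minimal plain-Python sketch of an in-batch contrastive (InfoNCE-style) loss over query-title pairs. It is one common formulation and not necessarily what this repository trains with; the function names and the temperature of 0.05 are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def in_batch_contrastive_loss(query_vecs, title_vecs, temperature=0.05):
    """Cross-entropy over cosine similarities with in-batch negatives.

    Row i of query_vecs is paired with row i of title_vecs; every other
    title in the batch acts as a negative for that query.
    """
    queries = [l2_normalize(v) for v in query_vecs]
    titles = [l2_normalize(v) for v in title_vecs]
    losses = []
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, t)) / temperature for t in titles]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Aligned pairs give a loss near zero; swapping the titles gives a large loss.
aligned = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[0.0, 1.0], [1.0, 0.0]])
```

In a real training loop the same computation would run on batched tensors with automatic differentiation; the list-based version here only shows what is being optimised.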
tools
## Trick
1. `model.py`: first-last-avg pooling; scores 0.22, 0.245, 0.25.
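```py
import torch
import torch.nn.functional as F


def forward(self, input_ids, attention_mask, token_type_ids):
    # Method of the model class; self.extractor is the pretrained encoder.
    out = self.extractor(input_ids,
                         attention_mask=attention_mask,
                         token_type_ids=token_type_ids,
                         output_hidden_states=True)
    # hidden_states[0] is the embedding output, so [1] is the first encoder layer.
    first = out.hidden_states[1].transpose(1, 2)  # [batch, 768, seq_len]
    last = out.hidden_states[-1].transpose(1, 2)  # [batch, 768, seq_len]
    # Average over the whole sequence dimension (padding positions included).
    first_avg = torch.avg_pool1d(
        first, kernel_size=last.shape[-1]).squeeze(-1)  # [batch, 768]
    last_avg = torch.avg_pool1d(
        last, kernel_size=last.shape[-1]).squeeze(-1)  # [batch, 768]
    avg = torch.cat((first_avg.unsqueeze(1), last_avg.unsqueeze(1)),
                    dim=1)  # [batch, 2, 768]
    # Average the two layer-level vectors into one sentence embedding.
    out = torch.avg_pool1d(avg.transpose(1, 2), kernel_size=2).squeeze(-1)
    x = self.fc(out)
    x = F.normalize(x, p=2, dim=-1)  # unit length, so dot product = cosine
    return x
```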
2. `unilm`: `UniLM` on the 100w titles (see [utils_unilm.py#L268-L282](https://github.com/muyuuuu/E-commerce-Search-Recall/blob/main/unilm/utils_unilm.py#L268-L282)); score 0.265. Reference: [YunwenTechnology/Unilm](https://github.com/YunwenTechnology/Unilm)
3. `simbert`: `simbertv2` and bart on the 100w titles, score 0.3; with query, 0.31. Reference: [ZhuiyiTechnology/roformer-sim](https://github.com/ZhuiyiTechnology/roformer-sim)
Other tricks: transfer-mixup, label-smooth, am-softmax. See also: [EASE: Entity-Aware Contrastive Learning of Sentence Embedding](https://github.com/studio-ousia/ease), [Dense Passage Retrieval](https://github.com/facebookresearch/DPR), [Embedding-based Retrieval in Facebook Search](https://arxiv.org/abs/2006.11632)
##
`rank` `tf1.x` baseline basline `tf1.x`
`pair-wise` NSP
# References
- [CoSENT](https://github.com/shawroad/CoSENT_Pytorch)
- [SimCSE](https://github.com/zhengyanzhao1997/NLP-model/tree/main/model/model/Torch_model/SimCSE-Chinese)
- [UniLM](https://github.com/YunwenTechnology/Unilm)
- [SimBERTv2](https://github.com/ZhuiyiTechnology/roformer-sim)
- [tensorflow-simcse](https://github.com/jifei/simcse-tf2)
#
LSH-based ANN search: [muyuuuu/high-performance-LSH](https://github.com/muyuuuu/high-performance-LSH) (stars welcome).
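To give a feel for how LSH can narrow down ANN candidates, the following is a minimal random-hyperplane sketch in plain Python. It is not the API of the linked repository; every name here (`make_hyperplanes`, `signature`, `build_index`, `candidates`) is hypothetical, and a practical system would use several hash tables and re-rank the candidates by exact similarity.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def build_index(vectors, planes):
    """Group vector indices into buckets keyed by their bit signature."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, planes), []).append(idx)
    return buckets


def candidates(query_vec, planes, buckets):
    """Return indices sharing the query's bucket (may be empty)."""
    return buckets.get(signature(query_vec, planes), [])


planes = make_hyperplanes(dim=4, n_bits=8)
title_vecs = [[0.3, -1.2, 0.5, 2.0], [-0.7, 0.1, 1.5, -0.4]]
index = build_index(title_vecs, planes)
# A positively scaled copy points in the same direction, so it shares a bucket.
hits = candidates([0.6, -2.4, 1.0, 4.0], planes, index)
```

Vectors with a small angle between them agree on most bits, so only titles in the matching bucket need an exact similarity check instead of the full 100w-title corpus.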
# Acknowledgements
- Supported by High-performance Computing Platform of XiDian University.
- [xzhws](https://github.com/xzhws)
- [DunZhang](https://github.com/DunZhang)
Owner
- Name: Dr. Artificial曾小健
- Login: ArtificialZeng
- Kind: user
- Location: Beijing
- Website: https://blog.csdn.net/sinat_37574187?type=blog
- Repositories: 171
- Profile: https://github.com/ArtificialZeng
LLM practitioner/engineer, AI/ML/DL Quant