https://github.com/hidadeng/wordexpansion

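Quickly build domain-specific sentiment lexicons (for phones, cars, and so on) using the SO_PMI mutual-information algorithm and a word-vector method.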

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (2.8%) to scientific vocabulary

Last synced: 8 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: hiDaDeng
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 3.13 MB

Statistics

Stars: 85
Watchers: 2
Forks: 17
Open Issues: 0
Releases: 0

Created about 6 years ago · Last pushed over 4 years ago

https://github.com/hiDaDeng/wordexpansion/blob/master/

See also the author's related project [cntext](https://github.com/hidadeng/cntext); stars are welcome.




# Introduction

Widely used general-purpose sentiment lexicons include HowNet.

# Algorithms

- https://github.com/liuhuanyong/SentimentWordExpansion
- https://github.com/MS20190155/Measuring-Corporate-Culture-Using-Machine-Learning








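## 2.1 SO_PMI

The SO_PMI method scores a candidate word by how strongly it co-occurs with positive seed words compared with negative seed words. Reference implementation: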
https://github.com/liuhuanyong/SentimentWordExpansion 
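
As a rough illustration of the idea, the sketch below computes an SO-PMI score from document-level co-occurrence counts, using the textbook definition: the sum of PMI with the positive seeds minus the sum of PMI with the negative seeds, where PMI(a, b) = log2(P(a, b) / (P(a) P(b))). The function names, the toy corpus, and the choice to let a pair that never co-occurs contribute 0 are assumptions made for this example; it is not the code used inside wordexpansion.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count in how many documents each word and each word pair appears."""
    word_df = Counter()
    pair_df = Counter()
    for tokens in docs:
        vocab = set(tokens)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))
    return word_df, pair_df


def pmi(a, b, word_df, pair_df, n_docs):
    """PMI(a, b) = log2(P(a, b) / (P(a) * P(b))).

    Simplification (assumption): a pair that never co-occurs contributes 0.0.
    """
    joint = pair_df[frozenset((a, b))]
    if joint == 0:
        return 0.0
    return math.log2(joint * n_docs / (word_df[a] * word_df[b]))


def so_pmi(word, pos_seeds, neg_seeds, docs):
    """SO-PMI = sum of PMI with positive seeds - sum of PMI with negative seeds."""
    word_df, pair_df = cooccurrence_counts(docs)
    n_docs = len(docs)
    pos = sum(pmi(word, s, word_df, pair_df, n_docs) for s in pos_seeds)
    neg = sum(pmi(word, s, word_df, pair_df, n_docs) for s in neg_seeds)
    return pos - neg


# Toy tokenized corpus, made up for illustration.
docs = [
    ["screen", "good"],
    ["screen", "good", "battery"],
    ["battery", "bad"],
    ["battery", "bad", "screen"],
]
score_screen = so_pmi("screen", ["good"], ["bad"], docs)    # positive score
score_battery = so_pmi("battery", ["good"], ["bad"], docs)  # negative score
```

A positive score suggests the word leans toward the positive seeds, a negative score toward the negative seeds.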





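## 2.2 Word vectors

For each seed-word list, the 100 words with the highest cosine similarity are taken as candidate words.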



> Kai Li, Feng Mai, Rui Shen, Xinyan Yan, [**Measuring Corporate Culture Using Machine Learning**](https://academic.oup.com/rfs/advance-article-abstract/doi/10.1093/rfs/hhaa079/5869446?redirectedFrom=fulltext), *The Review of Financial Studies*, 2020
>
> GitHub: https://github.com/MS20190155/Measuring-Corporate-Culture-Using-Machine-Learning

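Differences from the original GitHub code:

- The original uses stanfordnlp; wordexpansion uses jieba and nltk.

- The original word2vec pipeline and wordexpansion handle N-grams differently.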
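
The ranking step of the word-vector approach can be sketched as follows. This is a minimal example that assumes candidates are ranked by cosine similarity to the mean of the seed vectors; whether wordexpansion averages the seed vectors in exactly this way is an assumption. The `expand` helper and the 2-dimensional toy vectors are hypothetical stand-ins for a trained word2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def expand(seeds, embeddings, topn=100):
    """Rank non-seed words by cosine similarity to the mean seed vector."""
    center = mean_vector([embeddings[w] for w in seeds if w in embeddings])
    scored = [(w, cosine(vec, center))
              for w, vec in embeddings.items() if w not in seeds]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [w for w, _ in scored[:topn]]


# Toy 2-d vectors, made up for illustration only.
embeddings = {
    "innovation": [1.0, 0.1],
    "creativity": [0.9, 0.2],
    "innovative": [0.95, 0.15],
    "dealer": [0.1, 1.0],
    "wholesaler": [0.2, 0.9],
}
candidates = expand(["innovation", "creativity"], embeddings, topn=2)
```

With real embeddings the same ranking usually surfaces some unrelated words as well, which is why the output is treated as a candidate list.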

  











# Installation

```
pip3 install wordexpansion
```






# test

> **Note:** all txt files must be UTF-8 encoded.
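
One way to check the encoding before running the scripts is sketched below. The helper names are hypothetical, and the assumption that a non-UTF-8 file is GBK-encoded is only a common case for Chinese text, so adjust `src_encoding` to match the actual file.

```python
import os
import tempfile


def is_utf8(path):
    """Return True if the whole file decodes as UTF-8."""
    try:
        with open(path, encoding="utf-8") as f:
            for _ in f:
                pass
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src, dst, src_encoding="gbk"):
    """Re-encode a text file as UTF-8 (src_encoding is an assumption)."""
    with open(src, encoding=src_encoding) as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(line)


# Demo with temporary files so nothing in the working directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    gbk_path = os.path.join(tmp, "gbk.txt")
    utf8_path = os.path.join(tmp, "utf8.txt")
    with open(gbk_path, "w", encoding="gbk") as f:
        f.write("手机 屏幕 很好\n")
    before = is_utf8(gbk_path)   # GBK bytes are not valid UTF-8 here
    convert_to_utf8(gbk_path, utf8_path)
    after = is_utf8(utf8_path)
```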

```
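|---test
    |---<SO_PMI example directory>
       |--find_newwords.py          # SO_PMI script
       |--corpus1.txt               # corpus, 5.5M
       |--test_seed_words.txt       # seed words
       |--neg_candi.txt             # output of find_newwords.py
       |--pos_candi.txt             # output of find_newwords.py

    |---<word-vector example directory>
       |--run_w2v.py                # word-vector script
       |--corpus2.txt               # corpus, 34M
       |--seeds                     # seed words (5 txt files)
       |--model                     # word2vec model
       |--candidate_words           # output, 5 txt files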

```



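# SO_PMI method

### 5.1 Seed words

Prepare a set of seed words by hand, for example 100 words: 50 positive and 50 negative.

Format of **test_seed_words.txt**:

- one seed word per line
- each word is labeled neg or pos
- the word and its label are separated by a tab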

```
	neg
	neg
	neg
	neg
	neg
	neg
	neg
	neg
	neg
	neg
	neg
	pos
	pos
	pos
	pos
	pos
	pos
	pos
	pos
	pos
	pos
...
...
```
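
A small loader can catch formatting mistakes in the seed file before a long run. The sketch below assumes the tab-separated `word<TAB>label` layout described above; `load_seed_words` is a hypothetical helper, not part of wordexpansion, and the demo entries are made up.

```python
import os
import tempfile
from collections import Counter


def load_seed_words(path):
    """Parse 'word<TAB>label' lines; the label must be 'pos' or 'neg'."""
    seeds = {}
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            parts = line.split("\t")
            if len(parts) != 2 or parts[1] not in ("pos", "neg"):
                raise ValueError(
                    f"line {lineno}: expected 'word<TAB>pos|neg', got {line!r}")
            seeds[parts[0]] = parts[1]
    return seeds


# Demo with a temporary file; 'good', 'great', and 'bad' are made-up entries.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_seed_words.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("good\tpos\ngreat\tpos\nbad\tneg\n")
    seeds = load_seed_words(demo_path)

# Number of seed words per label, useful for checking the pos/neg balance.
label_counts = Counter(seeds.values())
```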



### 5.2 Find candidate words

The **wordexpansion** code in **find_newwords.py**:



```python
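from wordexpansion import ChineseSoPmi

# Inputs: corpus and seed-word file; outputs: positive and negative candidate files
sopmier = ChineseSoPmi(inputtext_file='test_corpus.txt',
                       seedword_txtfile='test_seed_words.txt',
                       pos_candi_txt_file='pos_candi.txt',
                       neg_candi_txtfile='neg_candi.txt')
# Run SO_PMI and write the candidate files
sopmier.sopmi()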
```



With the 5.5M **test_corpus.txt** and 100 seed words, the run takes about 60 s.



### 5.3 Output

Running **find_newwords.py** produces two txt files **in the same directory as find_newwords.py**:

- pos_candi.txt
- neg_candi.txt

Contents of **pos_candi.txt**:

```
word,sopmi,polarity,word_length,postag
,87.28493062512524,pos,2,v
,70.15627986116269,pos,2,n
,66.28476448498694,pos,4,n
,64.40272795986517,pos,2,vn
,63.71800916752807,pos,2,df
,61.2024367757337,pos,2,n
,59.415315156715586,pos,2,n
,59.321140440512984,pos,1,f
,58.5817208758171,pos,2,v
,57.71720491331896,pos,2,vn
,57.067969337267684,pos,2,v
,53.25503772499689,pos,2,r
,52.80686380719989,pos,2,v
,52.12334472663675,pos,1,c
,51.58193211655792,pos,2,d
,51.095865548255034,pos,2,a
...
```

Contents of **neg_candi.txt**:

```
word,sopmi,polarity,word_length,postag
,33.17993872989303,neg,2,n
,31.77900620939178,neg,2,f
,30.87839808390589,neg,2,ns
,29.594976229171877,neg,2,n
,29.47870186147108,neg,2,a
,27.86014637934966,neg,2,v
,27.27304813428452,neg,2,nr
,26.433136238404746,neg,2,n
,25.83859896903048,neg,2,v
,25.105021416064616,neg,2,d
,25.09148586460598,neg,2,vn
,24.48343281812743,neg,1,c
,22.20695894382675,neg,1,v
,22.041049266517774,neg,2,v
...
```





The words in neg_candi.txt and pos_candi.txt are candidates and should be filtered before use.
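
Filtering can be partly automated with rules on the columns of the candidate files (`sopmi`, `word_length`, `postag`). The sketch below is one possible rule set; the thresholds, the allowed part-of-speech tags, and the `filter_candidates` helper are assumptions chosen for illustration, and the demo rows are made up.

```python
import csv
import io


def filter_candidates(rows, min_sopmi=50.0, min_length=2,
                      allowed_postags=("a", "v", "n", "vn")):
    """Keep words that pass the score, length, and part-of-speech rules."""
    kept = []
    for row in rows:
        if (float(row["sopmi"]) >= min_sopmi
                and int(row["word_length"]) >= min_length
                and row["postag"] in allowed_postags):
            kept.append(row["word"])
    return kept


# Made-up rows in the same column layout as pos_candi.txt.
demo = io.StringIO(
    "word,sopmi,polarity,word_length,postag\n"
    "w1,87.28,pos,2,v\n"
    "w2,59.32,pos,1,f\n"
    "w3,52.12,pos,1,c\n"
    "w4,51.09,pos,2,a\n"
    "w5,40.00,pos,2,n\n"
)
kept = filter_candidates(csv.DictReader(demo))
```

For a real file, pass `csv.DictReader(open('pos_candi.txt', encoding='utf-8', newline=''))` instead of the in-memory demo. A manual pass over the remaining words is still advisable.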



# Word-vector method

## 6.1 Seed words

The seed words are stored in txt files:

- innovation.txt
- integrity.txt
- quality.txt
- respect.txt
- teamwork.txt

Each txt file lists one seed word per line.



### 6.2 Find candidate words

The **wordexpansion** code in **run_w2v.py**:



```python
import os

from wordexpansion import W2VModels

# Train the word2vec model on the corpus (one document per line)
model = W2VModels(cwd=os.getcwd())
with open('documents.txt', encoding='utf-8') as f:
    model.train(documents=f.readlines())


def read_seeds(path):
    """Read one seed word per line, skipping empty lines."""
    with open(path, encoding='utf-8') as f:
        return [w for w in f.read().split('\n') if w != '']


# Load the seed words for each category
integrity = read_seeds('seeds/integrity.txt')
innovation = read_seeds('seeds/innovation.txt')
quality = read_seeds('seeds/quality.txt')
respect = read_seeds('seeds/respect.txt')
teamwork = read_seeds('seeds/teamwork.txt')

# Find the 100 candidate words most similar to each seed list
model.find(seedwords=integrity, seedwordsname='integrity', topn=100)
model.find(seedwords=innovation, seedwordsname='innovation', topn=100)
model.find(seedwords=quality, seedwordsname='quality', topn=100)
model.find(seedwords=respect, seedwordsname='respect', topn=100)
model.find(seedwords=teamwork, seedwordsname='teamwork', topn=100)
```



**30+** 30+M5030s




### 6.3 Output

Running **run_w2v.py** writes 5 txt files to **candidate_words**:

- innovation.txt
- integrity.txt
- quality.txt
- respect.txt
- teamwork.txt

Contents of **innovation.txt**:

```
innovation
innovate
innovative
creativity
creative
create
passion
passionate
efficiency
efficient
excellence
pride
enhance
expertise
optimizing
adapt
capability
awareness
creating
value-added
optimize
leveraging
attract
innovative
manufacture
attracting
maximizing
fine-tune
enable
headquarter
platform
tightly
aligned
flexible
fulfillment
rationalize
back-office
...
```

Contents of **respect.txt**:

```
respectful
talent
talented
employee
dignity
empowerment
empower
skills
backbone
training
database
designers
sdk
recruit
engine
dealers
selecting
resource
onsite
computer
functions
wholesalers
educational
expertise
coordination
value-added
...

```

The words in these 5 txt files are candidates; review them manually to build the 5 final word lists.








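# Notes
1. The quality of so_pmi results depends on the training corpus; a larger corpus generally gives better results.  

2. The running time of so_pmi also depends on the corpus: the larger the corpus, the longer the run. With 100 seed words and 5M of data, it takes about 62.679 s.  

3. Candidate words can be filtered with rules based on PMI value, word length, and part of speech. 

4. **All txt files must be UTF-8 encoded.**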

5. NgramNgram 

6. **5**

7. > Kai Li, Feng Mai, Rui Shen, Xinyan Yan, Measuring Corporate Culture Using Machine Learning, *The Review of Financial Studies*,2020









# More

- [Bilibili channel](https://space.bilibili.com/122592901/channel/detail?cid=66008)

- [Zhihu column](https://zhuanlan.zhihu.com/dadeng)

Owner

Name: DaDeng
Login: hiDaDeng
Kind: user
Location: China

Website: https://hidadeng.github.io/
Repositories: 15
Profile: https://github.com/hiDaDeng

PhD student at 圣马家沟男子职业技术学院

GitHub Events

Total

Watch event: 7

Last Year

Watch event: 7

Committers

Last synced: 9 months ago

All Time

Total Commits: 31
Total Committers: 1
Avg Commits per committer: 31.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
DaDeng	t**t@q**m	31

Committer Domains (Top 20 + Academic)

qq.com: 1

Packages

Total packages: 1
Total downloads:
- pypi 18 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 3
Total maintainers: 1

pypi.org: wordexpansion

Homepage: https://github.com/hidadeng/wordexpansion
Documentation: https://wordexpansion.readthedocs.io/
License: MIT
Latest release: 0.0.7
published about 6 years ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 18 Last month

Rankings

Stargazers count: 8.3%

Forks count: 8.9%

Dependent packages count: 10.1%

Average: 17.7%

Dependent repos count: 21.6%

Downloads: 39.6%

Maintainers (1)

thunderhit

Last synced: 9 months ago
