https://github.com/hidadeng/wordexpansion
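Quickly build domain-specific sentiment lexicons for different domains (mobile phones, automobiles, etc.) using the SO_PMI mutual-information algorithm and word-vector methods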
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (2.8%) to scientific vocabulary
Last synced: 6 months ago
Repository
Basic Info
Statistics
- Stars: 85
- Watchers: 2
- Forks: 17
- Open Issues: 0
- Releases: 0
Created about 6 years ago · Last pushed over 4 years ago
https://github.com/hiDaDeng/wordexpansion/blob/master/
**wordexpansion** (see also [cntext](https://github.com/hidadeng/cntext); stars welcome)
# Background

- HowNet …
- …
# Algorithms

- https://github.com/liuhuanyong/SentimentWordExpansion
- https://github.com/MS20190155/Measuring-Corporate-Culture-Using-Machine-Learning
## 2.1 SO_PMI

Reference implementation: https://github.com/liuhuanyong/SentimentWordExpansion

## 2.2 Word-vector method

- Rank words by cosine similarity to the seed words and keep the top 100.

> Kai Li, Feng Mai, Rui Shen, Xinyan Yan, [**Measuring Corporate Culture Using Machine Learning**](https://academic.oup.com/rfs/advance-article-abstract/doi/10.1093/rfs/hhaa079/5869446?redirectedFrom=fulltext), *The Review of Financial Studies*, 2020
>
> Code on GitHub: https://github.com/MS20190155/Measuring-Corporate-Culture-Using-Machine-Learning

Compared with that GitHub code:

- the original uses stanfordnlp, while wordexpansion uses jieba and nltk;
- the original trains word2vec with N-grams, while wordexpansion does not use N-grams.
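The SO_PMI idea can be sketched in plain Python. This is a minimal illustration, not the wordexpansion or SentimentWordExpansion code: it assumes sentence-level co-occurrence, base-2 PMI, PMI(a, b) = log2(P(a, b) / (P(a)P(b))), and a score of 0 for pairs that never co-occur. Each candidate word is scored as SO-PMI(w) = Σ PMI(w, positive seed) − Σ PMI(w, negative seed). `so_pmi_scores` is a hypothetical helper.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi_scores(sentences, pos_seeds, neg_seeds, min_count=1):
    """Score every non-seed word by SO-PMI over sentence-level co-occurrence.

    sentences: list of token lists (already segmented, e.g. with jieba).
    Returns {word: score}; > 0 leans positive, < 0 leans negative.
    """
    n = len(sentences)
    word_df = Counter()  # number of sentences containing each word
    pair_df = Counter()  # number of sentences containing both words of a pair
    for tokens in sentences:
        uniq = sorted(set(tokens))
        word_df.update(uniq)
        pair_df.update(combinations(uniq, 2))  # pairs stored as (smaller, larger)

    def pmi(a, b):
        key = (a, b) if a < b else (b, a)
        joint = pair_df[key]
        if joint == 0:
            return 0.0  # assumption: no co-occurrence evidence counts as neutral
        # P(a,b) / (P(a) P(b)) = (joint/n) / ((df_a/n) (df_b/n))
        return math.log2(joint * n / (word_df[a] * word_df[b]))

    seeds = set(pos_seeds) | set(neg_seeds)
    scores = {}
    for w, count in word_df.items():
        if w in seeds or count < min_count:
            continue
        pos = sum(pmi(w, p) for p in pos_seeds if p in word_df)
        neg = sum(pmi(w, q) for q in neg_seeds if q in word_df)
        scores[w] = pos - neg
    return scores


if __name__ == "__main__":
    toy = [["screen", "clear", "good"], ["battery", "bad", "lag"],
           ["screen", "good"], ["battery", "bad"]]
    print(so_pmi_scores(toy, ["good"], ["bad"]))
    # -> {'clear': 1.0, 'screen': 1.0, 'battery': -1.0, 'lag': -1.0}
```

Words that co-occur only with positive seeds end up with positive scores, which is how candidates can be split into positive and negative lists.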
# Installation

```
pip3 install wordexpansion
```
# Test files

> **Note**
>
> All txt files must be UTF-8 encoded.

```
|---test
    |---                         # SO-PMI example
        |--find_newwords.py      # SO-PMI script
        |--corpus1.txt           # corpus, 5.5 MB
        |--test_seed_words.txt   # seed words
        |--neg_candi.txt         # generated by find_newwords.py
        |--pos_candi.txt         # generated by find_newwords.py
    |---                         # word-vector example
        |--run_w2v.py            # word-vector script
        |--corpus2.txt           # corpus, 34 MB
        |--seeds                 # seed words (5 txt files)
        |--model                 # trained word2vec model
        |--candidate_words       # candidate words (5 txt files)
```

# SO-PMI

### 5.1 Seed words

Prepare 100 seed words (50 per polarity) in **test_seed_words.txt**:

- one seed word per line
- each word is labeled neg or pos
- the word and its label are separated by a tab

```
<word><TAB>neg
<word><TAB>neg
<word><TAB>neg
...
<word><TAB>pos
<word><TAB>pos
<word><TAB>pos
...
```

### 5.2 Run

Call **wordexpansion** from **find_newwords.py**:

```python
from wordexpansion import ChineseSoPmi

# Find candidate sentiment words in the corpus, starting from the labeled seed words
sopmier = ChineseSoPmi(inputtext_file='test_corpus.txt',        # corpus (UTF-8)
                       seedword_txtfile='test_seed_words.txt',   # seed words with neg/pos labels
                       pos_candi_txt_file='pos_candi.txt',       # output: positive candidates
                       neg_candi_txtfile='neg_candi.txt')        # output: negative candidates
sopmier.sopmi()
```

**test_corpus.txt** is a 5.5 MB corpus; with 100 seed words the run takes about 60 s.

### 5.3 Results

Running **find_newwords.py** generates two txt files in the same folder as find_newwords.py:

- pos_candi.txt
- neg_candi.txt

**pos_candi.txt** (excerpt):

```
word,sopmi,polarity,word_length,postag
,87.28493062512524,pos,2,v
,70.15627986116269,pos,2,n
,66.28476448498694,pos,4,n
,64.40272795986517,pos,2,vn
,63.71800916752807,pos,2,df
,61.2024367757337,pos,2,n
,59.415315156715586,pos,2,n
,59.321140440512984,pos,1,f
,58.5817208758171,pos,2,v
,57.71720491331896,pos,2,vn
,57.067969337267684,pos,2,v
,53.25503772499689,pos,2,r
,52.80686380719989,pos,2,v
,52.12334472663675,pos,1,c
,51.58193211655792,pos,2,d
,51.095865548255034,pos,2,a
...
```

**neg_candi.txt** (excerpt):

```
word,sopmi,polarity,word_length,postag
,33.17993872989303,neg,2,n
,31.77900620939178,neg,2,f
,30.87839808390589,neg,2,ns
,29.594976229171877,neg,2,n
,29.47870186147108,neg,2,a
,27.86014637934966,neg,2,v
,27.27304813428452,neg,2,nr
,26.433136238404746,neg,2,n
,25.83859896903048,neg,2,v
,25.105021416064616,neg,2,d
,25.09148586460598,neg,2,vn
,24.48343281812743,neg,1,c
,22.20695894382675,neg,1,v
,22.041049266517774,neg,2,v
...
```

In neg_candi.txt and pos_candi.txt the columns are the word, its SO-PMI score (sopmi), polarity, word length, and part-of-speech tag (postag).

# Word-vector method

### 6.1 Seed words

Prepare one txt file per category in the seeds folder:

- innovation.txt
- integrity.txt
- quality.txt
- respect.txt
- teamwork.txt

Each txt file lists one seed word per line.

### 6.2 Run

Call **wordexpansion** from **run_w2v.py**:

```python
import os

from wordexpansion import W2VModels

# Train the word2vec model on the corpus (one document per line, UTF-8)
model = W2VModels(cwd=os.getcwd())
with open('documents.txt', encoding='utf-8') as f:
    model.train(documents=f.readlines())

# Load the seed words for each category (one word per line, blank lines skipped)
categories = ['integrity', 'innovation', 'quality', 'respect', 'teamwork']
seeds = {}
for name in categories:
    with open(f'seeds/{name}.txt', encoding='utf-8') as f:
        seeds[name] = [w for w in f.read().split('\n') if w != '']

# Find the top 100 candidate words for each category
for name in categories:
    model.find(seedwords=seeds[name], seedwordsname=name, topn=100)
```

Corpus size: **30+ MB**. On a 30+ MB corpus the reported run times are 50 s and 30 s.
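The word-vector method ranks vocabulary words by cosine similarity to the seed words (topn=100 in run_w2v.py). The sketch below shows one common way to do this, assuming the trained vectors are available as a plain `{word: vector}` dict: average the seed vectors into a centroid and rank every other word by cosine similarity to it. `expand_seeds` is a hypothetical helper; whether `W2VModels.find` averages seeds or scores them one by one is not confirmed here. In gensim, `KeyedVectors.most_similar(positive=seeds, topn=100)` performs a comparable ranking (it averages unit-normalized vectors).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_seeds(vectors, seedwords, topn=100):
    """Return the topn (word, similarity) pairs closest to the mean seed vector.

    vectors: {word: list[float]}, e.g. exported from a trained word2vec model.
    Seed words missing from the vocabulary are ignored.
    """
    seeds = [w for w in seedwords if w in vectors]
    if not seeds:
        return []
    seed_set = set(seeds)
    dim = len(vectors[seeds[0]])
    centroid = [sum(vectors[w][i] for w in seeds) / len(seeds) for i in range(dim)]
    ranked = sorted(
        ((w, cosine(vec, centroid)) for w, vec in vectors.items() if w not in seed_set),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:topn]


if __name__ == "__main__":
    toy = {"innovate": [1.0, 0.0], "creative": [0.9, 0.1], "novel": [0.8, 0.2],
           "talent": [0.1, 0.9], "respect": [0.0, 1.0]}
    # 'creative' ranks first, then 'novel'
    print(expand_seeds(toy, ["innovate"], topn=2))
```

Averaging the seeds rewards words that are close to the category as a whole rather than to a single seed word.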
### 6.3 Results

Running **run_w2v.py** generates 5 txt files in the **candidate_words** folder:

- innovation.txt
- integrity.txt
- quality.txt
- respect.txt
- teamwork.txt

**innovation.txt** (excerpt):

```
innovation
innovate
innovative
creativity
creative
create
passion
passionate
efficiency
efficient
excellence
pride
enhance
expertise
optimizing
adapt
capability
awareness
creating
value-added
optimize
leveraging
attract
innovative
manufacture
attracting
maximizing
fine-tune
enable
headquarter
platform
tightly
aligned
flexible
fulfillment
rationalize
back-office
...
```

**respect.txt** (excerpt):

```
respectful
talent
talented
employee
dignity
empowerment
empower
skills
backbone
training
database
designers
sdk
recruit
engine
dealers
selecting
resource
onsite
computer
functions
wholesalers
educational
expertise
coordination
value-added
...
```

The other candidate files have the same format: each of the 5 seed txt files yields one of the 5 candidate txt files.
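After reviewing the five files in candidate_words, they can be gathered into one category dictionary. A minimal sketch, assuming one UTF-8 word per line as in the excerpts; `load_dictionary` is a hypothetical helper, not part of wordexpansion. It also drops repeated entries, since the innovation.txt excerpt lists "innovative" twice.

```python
from pathlib import Path


def load_dictionary(folder, names):
    """Read <folder>/<name>.txt for each name into {name: [words]}.

    Blank lines are skipped and duplicates removed, keeping first-seen order.
    """
    dictionary = {}
    for name in names:
        text = (Path(folder) / f"{name}.txt").read_text(encoding="utf-8")
        words = [w.strip() for w in text.splitlines() if w.strip()]
        dictionary[name] = list(dict.fromkeys(words))  # order-preserving dedupe
    return dictionary


if __name__ == "__main__":
    categories = ["innovation", "integrity", "quality", "respect", "teamwork"]
    lexicon = load_dictionary("candidate_words", categories)
    print({name: len(words) for name, words in lexicon.items()})
```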
# Notes

1. so_pmi …
2. so_pmi: with 100 seed words and a 5 MB corpus, the run took 62.679 s.
3. PMI …
4. **All txt files must be UTF-8 encoded.**
5. N-gram …
6. **5** …
7. Reference:

   > Kai Li, Feng Mai, Rui Shen, Xinyan Yan, Measuring Corporate Culture Using Machine Learning, *The Review of Financial Studies*, 2020
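The txt inputs must be UTF-8 encoded. If a corpus or seed file was saved in another encoding, it can be re-encoded first. A minimal sketch, assuming the source encoding is GBK (common for Chinese text saved on Windows); `to_utf8` is a hypothetical helper. Files that already decode as UTF-8 are left untouched, and the result is written as bytes so line endings are not altered.

```python
from pathlib import Path


def to_utf8(path, source_encoding="gbk"):
    """Re-encode a text file to UTF-8 in place.

    Returns True if the file was converted, False if it was already valid UTF-8.
    The 'gbk' default is an assumption; pass the real source encoding if known.
    """
    p = Path(path)
    raw = p.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    text = raw.decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))  # bytes keep the original line endings
    return True


if __name__ == "__main__":
    for name in ["test_corpus.txt", "test_seed_words.txt"]:
        print(name, "converted" if to_utf8(name) else "already UTF-8")
```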
# Python course

- [python](https://ke.qq.com/course/482241?tuin=163164df)

# More

- [Bilibili: python](https://space.bilibili.com/122592901/channel/detail?cid=66008)
- [Zhihu column](https://zhuanlan.zhihu.com/dadeng)
Owner
- Name: DaDeng
- Login: hiDaDeng
- Kind: user
- Location: China
- Website: https://hidadeng.github.io/
- Repositories: 15
- Profile: https://github.com/hiDaDeng
PhD student at St. Majiagou Men's Vocational and Technical College
GitHub Events
Total
- Watch event: 7
Last Year
- Watch event: 7
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| DaDeng | t****t@q****m | 31 |
Committer Domains (Top 20 + Academic)
qq.com: 1
Packages
- Total packages: 1
- Total downloads:
  - pypi: 18 last month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 3
- Total maintainers: 1
pypi.org: wordexpansion
- Homepage: https://github.com/hidadeng/wordexpansion
- Documentation: https://wordexpansion.readthedocs.io/
- License: MIT
- Latest release: 0.0.7 (published about 6 years ago)
Rankings
Stargazers count: 8.3%
Forks count: 8.9%
Dependent packages count: 10.1%
Average: 17.7%
Dependent repos count: 21.6%
Downloads: 39.6%
Maintainers (1)
Last synced: 7 months ago