https://github.com/hidadeng/wordexpansion

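Quickly build domain-specific sentiment dictionaries (mobile phones, automobiles, etc.) with the SO_PMI mutual-information algorithm and the word-vector method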


Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (2.8%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: hiDaDeng
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 3.13 MB
Statistics
  • Stars: 85
  • Watchers: 2
  • Forks: 17
  • Open Issues: 0
  • Releases: 0
Created about 6 years ago · Last pushed over 4 years ago

https://github.com/hiDaDeng/wordexpansion/blob/master/

**wordexpansion** (see also the related project [cntext](https://github.com/hidadeng/cntext); stars welcome)


# Background

- General-purpose sentiment lexicons such as HowNet …
# Algorithms

This project draws on two existing implementations:

- SO_PMI: https://github.com/liuhuanyong/SentimentWordExpansion
- Word vectors: https://github.com/MS20190155/Measuring-Corporate-Culture-Using-Machine-Learning
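## 2.1 SO_PMI

Reference implementation: https://github.com/liuhuanyong/SentimentWordExpansion

## 2.2 Word-vector method (word2vec)

- Train word2vec on the corpus, then take the 100 words with the highest cosine similarity to the seed words as candidates.

> Kai Li, Feng Mai, Rui Shen, Xinyan Yan, [**Measuring Corporate Culture Using Machine Learning**](https://academic.oup.com/rfs/advance-article-abstract/doi/10.1093/rfs/hhaa079/5869446?redirectedFrom=fulltext), *The Review of Financial Studies*, 2020
>
> Code on GitHub: https://github.com/MS20190155/Measuring-Corporate-Culture-Using-Machine-Learning

Differences from the original GitHub project:

- The original project uses stanfordnlp; wordexpansion uses jieba and nltk instead.
- The original project trains word2vec on N-gram phrases; wordexpansion does not use N-grams.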
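SO_PMI scores a candidate word w as the sum of its PMI with the positive seeds minus the sum of its PMI with the negative seeds, where PMI(w1, w2) = log2(P(w1, w2) / (P(w1) P(w2))). Below is a minimal, self-contained sketch of that idea. It assumes co-occurrence is counted per sentence on text that has already been segmented. The function `so_pmi` and the toy phone-review sentences are hypothetical and are not wordexpansion's API; the library's own co-occurrence window and smoothing may differ.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(sentences, pos_seeds, neg_seeds):
    """Score every non-seed word by SO-PMI using sentence-level co-occurrence.

    sentences: list of token lists (text already segmented, e.g. by jieba).
    Returns {word: score}; scores > 0 lean positive, scores < 0 lean negative.
    """
    n = len(sentences)
    word_df = Counter()  # number of sentences containing each word
    pair_df = Counter()  # number of sentences containing each unordered word pair
    for tokens in sentences:
        vocab = set(tokens)
        word_df.update(vocab)
        pair_df.update(frozenset(pair) for pair in combinations(vocab, 2))

    def pmi(w1, w2):
        joint = pair_df[frozenset((w1, w2))]
        if joint == 0:
            return 0.0  # assumption: pairs that never co-occur contribute nothing
        # log2( P(w1, w2) / (P(w1) * P(w2)) ) with probabilities = counts / n
        return math.log2(joint * n / (word_df[w1] * word_df[w2]))

    seeds = set(pos_seeds) | set(neg_seeds)
    return {
        w: sum(pmi(w, p) for p in pos_seeds if p in word_df)
        - sum(pmi(w, q) for q in neg_seeds if q in word_df)
        for w in word_df
        if w not in seeds
    }


# Toy phone reviews: screen/clear/good, battery/durable/good,
# screen/blurry/bad, battery/overheats/bad
sentences = [
    ["屏幕", "清晰", "好"],
    ["电池", "耐用", "好"],
    ["屏幕", "模糊", "差"],
    ["电池", "发热", "差"],
]
scores = so_pmi(sentences, pos_seeds=["好"], neg_seeds=["差"])
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(word, score)
```

In this toy corpus, words that only co-occur with the positive seed (清晰, 耐用) score above zero, words that only co-occur with the negative seed (模糊, 发热) score below zero, and neutral nouns that appear with both (屏幕, 电池) score zero.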
# Installation

```
pip3 install wordexpansion
```
# The test folder

> **Note:** all txt files must be utf-8 encoded.

```
|---test
    |---                       # SO_PMI example
        |--find_newwords.py
        |--corpus1.txt         # 5.5M
        |--test_seed_words.txt
        |--neg_candi.txt       # generated by find_newwords.py
        |--pos_candi.txt       # generated by find_newwords.py
    |---                       # word2vec example
        |--run_w2v.py
        |--corpus2.txt         # 34M
        |--seeds               # 5 txt files
        |--model               # word2vec model
        |--candidate_words     # 5 txt files
```

# Expanding a sentiment dictionary with SO_PMI

### 5.1 Prepare the seed words

**test_seed_words.txt** contains 100 seed words: 50 negative and 50 positive.

- Each seed word is labeled neg or pos.
- The word and its label are separated by a tab.

```
neg
neg
neg
neg
neg
neg
neg
neg
neg
neg
neg
pos
pos
pos
pos
pos
pos
pos
pos
pos
pos
...
...
```

### 5.2 Run find_newwords.py

**find_newwords.py** calls **wordexpansion**:

```python
from wordexpansion import ChineseSoPmi

# Expand the seed words with SO_PMI over the corpus and write the
# positive/negative candidate words to two txt files
sopmier = ChineseSoPmi(inputtext_file='test_corpus.txt',
                       seedword_txtfile='test_seed_words.txt',
                       pos_candi_txt_file='pos_candi.txt',
                       neg_candi_txtfile='neg_candi.txt')
sopmier.sopmi()
```

**test_corpus.txt** is 5.5 MB; with 100 seed words the run took about 60 s.

### 5.3 Output

Running **find_newwords.py** writes two txt files to the folder that contains find_newwords.py:

- pos_candi.txt
- neg_candi.txt

Excerpt from **pos_candi.txt**:

```
word,sopmi,polarity,word_length,postag
,87.28493062512524,pos,2,v
,70.15627986116269,pos,2,n
,66.28476448498694,pos,4,n
,64.40272795986517,pos,2,vn
,63.71800916752807,pos,2,df
,61.2024367757337,pos,2,n
,59.415315156715586,pos,2,n
,59.321140440512984,pos,1,f
,58.5817208758171,pos,2,v
,57.71720491331896,pos,2,vn
,57.067969337267684,pos,2,v
,53.25503772499689,pos,2,r
,52.80686380719989,pos,2,v
,52.12334472663675,pos,1,c
,51.58193211655792,pos,2,d
,51.095865548255034,pos,2,a
...
```

Excerpt from **neg_candi.txt**:

```
word,sopmi,polarity,word_length,postag
,33.17993872989303,neg,2,n
,31.77900620939178,neg,2,f
,30.87839808390589,neg,2,ns
,29.594976229171877,neg,2,n
,29.47870186147108,neg,2,a
,27.86014637934966,neg,2,v
,27.27304813428452,neg,2,nr
,26.433136238404746,neg,2,n
,25.83859896903048,neg,2,v
,25.105021416064616,neg,2,d
,25.09148586460598,neg,2,vn
,24.48343281812743,neg,1,c
,22.20695894382675,neg,1,v
,22.041049266517774,neg,2,v
...
```

# Expanding concept dictionaries with word2vec

### 6.1 Prepare the seed words

The seeds folder contains five txt files:

- innovation.txt
- integrity.txt
- quality.txt
- respect.txt
- teamwork.txt

Each file lists one seed word per line.

### 6.2 Run run_w2v.py

**run_w2v.py** calls **wordexpansion**:

```python
import os

from wordexpansion import W2VModels

# Train a word2vec model on the corpus (one document per line);
# cwd tells W2VModels where to write its output
model = W2VModels(cwd=os.getcwd())
with open('documents.txt', encoding='utf-8') as f:
    model.train(documents=f.readlines())

# Load each seed-word list and keep the 100 most similar words per concept
for name in ['integrity', 'innovation', 'quality', 'respect', 'teamwork']:
    with open(f'seeds/{name}.txt', encoding='utf-8') as f:
        seeds = [w for w in f.read().split('\n') if w != '']
    model.find(seedwords=seeds, seedwordsname=name, topn=100)
```

With the **30+ MB** corpus, the run took about 30 s.
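The word-vector method keeps the words with the highest cosine similarity to the seed words. The sketch below shows one common way to do that ranking: it averages the seed vectors into a single centroid and sorts the rest of the vocabulary by cosine similarity to that centroid. `expand` and `cosine` are hypothetical helpers, not `W2VModels.find`, whose internals are not shown here. The toy 2-d vectors stand in for a trained word2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(vectors, seedwords, topn=100):
    """Return the topn non-seed words closest (by cosine) to the mean seed vector."""
    seeds = [w for w in seedwords if w in vectors]  # drop out-of-vocabulary seeds
    if not seeds:
        return []
    dim = len(vectors[seeds[0]])
    centroid = [sum(vectors[w][i] for w in seeds) / len(seeds) for i in range(dim)]
    candidates = [w for w in vectors if w not in seedwords]
    candidates.sort(key=lambda w: cosine(vectors[w], centroid), reverse=True)
    return candidates[:topn]


# Toy 2-d "embeddings"; real ones would come from a trained word2vec model
vectors = {
    "innovation": [1.0, 0.0],
    "creative": [0.9, 0.1],
    "innovate": [0.8, 0.2],
    "employee": [0.0, 1.0],
    "dignity": [0.1, 0.9],
}
print(expand(vectors, ["innovation"], topn=2))  # → ['creative', 'innovate']
```

With gensim, `KeyedVectors.most_similar(positive=seeds, topn=100)` performs a comparable ranking. It averages the unit-normalized seed vectors, whereas averaging raw vectors as above lets longer vectors pull the centroid further.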
### 6.3 Output

Running **run_w2v.py** writes five txt files to the **candidate_words** folder:

- innovation.txt
- integrity.txt
- quality.txt
- respect.txt
- teamwork.txt

Excerpt from **innovation.txt**:

```
innovation
innovate
innovative
creativity
creative
create
passion
passionate
efficiency
efficient
excellence
pride
enhance
expertise
optimizing
adapt
capability
awareness
creating
value-added
optimize
leveraging
attract
innovative
manufacture
attracting
maximizing
fine-tune
enable
headquarter
platform
tightly
aligned
flexible
fulfillment
rationalize
back-office
...
```

Excerpt from **respect.txt**:

```
respectful
talent
talented
employee
dignity
empowerment
empower
skills
backbone
training
database
designers
sdk
recruit
engine
dealers
selecting
resource
onsite
computer
functions
wholesalers
educational
expertise
coordination
value-added
...
```

The other three txt files use the same one-word-per-line format.
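Each file in candidate_words holds one word per line. The excerpts show that candidates can repeat (innovative appears twice in innovation.txt), and they may also overlap with the seed words. Before the lists are screened by hand, a small helper like the hypothetical `new_candidates` below can drop seed words and duplicates. It assumes both files are utf-8 with one word per line, as in the seeds and candidate_words folders.

```python
from pathlib import Path


def load_words(path):
    """Read a utf-8, one-word-per-line txt file, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def new_candidates(candidate_path, seed_path):
    """Candidates that are not seeds, deduplicated, in their original order."""
    seeds = set(load_words(seed_path))
    seen = set()
    result = []
    for word in load_words(candidate_path):
        if word not in seeds and word not in seen:
            seen.add(word)
            result.append(word)
    return result
```

Usage, with paths following the test folder layout: `new_candidates("candidate_words/innovation.txt", "seeds/innovation.txt")`. Preserving the original order keeps the most similar candidates at the top of the review list.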
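# Notes

1. so_pmi …
2. so_pmi: with 100 seed words on a 5 MB corpus, the run took 62.679 s.
3. PMI …
4. **All txt files must be utf-8 encoded.**
5. N-gram …
6. **5** …
7. Reference for the word-vector method:
   > Kai Li, Feng Mai, Rui Shen, Xinyan Yan, Measuring Corporate Culture Using Machine Learning, *The Review of Financial Studies*, 2020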
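Because every input txt file must be utf-8, it helps to check a file before running either script. Chinese corpora saved on Windows are often GBK. The sketch below uses two hypothetical helpers, `is_utf8` and `to_utf8`. The default source encoding (GBK) is an assumption about where the file came from, so pass the encoding the file was actually written in.

```python
from pathlib import Path


def is_utf8(path):
    """True if the file's bytes decode cleanly as utf-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(src, dst, source_encoding="gbk"):
    """Re-encode a text file as utf-8.

    source_encoding is an assumption: GBK is common for Chinese text saved on
    Windows, but pass whatever encoding the file was actually written in.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

For example, `to_utf8("my_corpus_gbk.txt", "corpus1.txt")`, where the first file name is a placeholder. Decoding the whole file is a strict check: a single invalid byte makes `is_utf8` return False.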

Owner

  • Name: DaDeng
  • Login: hiDaDeng
  • Kind: user
  • Location: China

PhD student at St. Majiagou Men's Vocational and Technical College

GitHub Events

Total
  • Watch event: 7
Last Year
  • Watch event: 7

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 31
  • Total Committers: 1
  • Avg Commits per committer: 31.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
DaDeng t****t@q****m 31
Committer Domains (Top 20 + Academic)
qq.com: 1

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 18 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 3
  • Total maintainers: 1
pypi.org: wordexpansion
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 18 Last month
Rankings
Stargazers count: 8.3%
Forks count: 8.9%
Dependent packages count: 10.1%
Average: 17.7%
Dependent repos count: 21.6%
Downloads: 39.6%
Maintainers (1)
Last synced: 7 months ago