senta

Baidu's open-source Sentiment Analysis System.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 13.3%, to scientific vocabulary)

Keywords

aspect-level-sentiment natural-language-processing opinion-target-extraction paddlepaddle sentiment-analysis sentiment-classification

Keywords from Contributors

interactive serializer packaging network-simulation shellcodes hacking autograding observability genomics embedded

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: baidu
License: apache-2.0
Language: Python
Default Branch: master
Homepage:
Size: 26 MB

Statistics

Stars: 1,981
Watchers: 60
Forks: 370
Open Issues: 74
Releases: 0

Topics

aspect-level-sentiment natural-language-processing opinion-target-extraction paddlepaddle sentiment-analysis sentiment-classification

Created over 7 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Authors

`Senta`

Senta is a Python library for sentiment analysis. It supports several tasks, including sentence-level sentiment classification, aspect-level sentiment classification and opinion role labeling. Most of the code in this repository implements SKEP: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis. In the paper, we show how to integrate sentiment knowledge into pre-trained models to learn a unified sentiment representation for multiple sentiment analysis tasks.

How to use

Pip

You can use the Python package directly to run sentiment analysis tasks with a pre-trained SKEP model.

Installation

Senta supports Python 3.6 or later and requires PaddlePaddle 1.6.3; see the PaddlePaddle installation instructions to set it up.

Install Senta from PyPI with `python -m pip install Senta`, or from source by running `git clone https://github.com/baidu/Senta.git`, then `cd Senta`, then `python -m pip install .`.

Quick Tour

```python
from senta import Senta
my_senta = Senta()

# get pre-trained model, we provide three pre-trained models, all of which are based on the SKEP
print(my_senta.get_support_model()) # ["ernie_1.0_skep_large_ch", "ernie_2.0_skep_large_en", "roberta_skep_large_en"]
                                    # ernie_1.0_skep_large_ch, skep Chinese pre-trained model based on ERNIE 1.0 large.
                                    # ernie_2.0_skep_large_en, skep English pre-trained model based on ERNIE 2.0 large.
                                    # roberta_skep_large_en, skep English pre-trained model based on RoBERTa large, which is used in our paper.

# get supported task
print(my_senta.get_support_task()) # ["sentiment_classify", "aspect_sentiment_classify", "extraction"]

use_cuda = True # set True or False

# predict different tasks
my_senta.init_model(model_class="roberta_skep_large_en", task="sentiment_classify", use_cuda=use_cuda)
texts = ["a sometimes tedious film ."]
result = my_senta.predict(texts)
print(result)

my_senta.init_model(model_class="roberta_skep_large_en", task="aspect_sentiment_classify", use_cuda=use_cuda)
texts = ["I love the operating system and the preloaded software."]
aspects = ["operating system"]
result = my_senta.predict(texts, aspects)
print(result)

my_senta.init_model(model_class="roberta_skep_large_en", task="extraction", use_cuda=use_cuda)
texts = ["The JCC would be very pleased to welcome your organization as a corporate sponsor ."]
result = my_senta.predict(texts)
print(result)
```
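The calls above take the model name and the task name as plain strings, so a typo only surfaces once `init_model` runs. A minimal sketch of a pre-flight check follows. `check_request` is a hypothetical helper, not part of Senta, and the two name lists are copied from the comments in the Quick Tour rather than queried from the library. The rule that every text needs a matching aspect is an assumption based on the aspect-level example.

```python
# Names copied from the Quick Tour comments; not queried from Senta itself.
SUPPORTED_MODELS = [
    "ernie_1.0_skep_large_ch",
    "ernie_2.0_skep_large_en",
    "roberta_skep_large_en",
]
SUPPORTED_TASKS = ["sentiment_classify", "aspect_sentiment_classify", "extraction"]


def check_request(model_class, task, texts, aspects=None):
    """Raise ValueError early if the arguments look wrong (hypothetical helper)."""
    if model_class not in SUPPORTED_MODELS:
        raise ValueError("unknown model: %r" % model_class)
    if task not in SUPPORTED_TASKS:
        raise ValueError("unknown task: %r" % task)
    if not texts:
        raise ValueError("texts must be a non-empty list")
    if task == "aspect_sentiment_classify":
        # Assumption: one aspect per input text, as in the example above.
        if aspects is None or len(aspects) != len(texts):
            raise ValueError("need one aspect per text")
    return True


if __name__ == "__main__":
    print(check_request("roberta_skep_large_en", "sentiment_classify",
                        ["a sometimes tedious film ."]))
```

Calling the helper before `my_senta.init_model(...)` keeps the failure close to the mistake and avoids loading a large model only to discover a misspelled task name.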

From source

You can use the source code to run pre-training and fine-tuning tasks. The config folder has different files to help you reproduce the results of our paper.

Preparation

```shell
# download code
git clone https://github.com/baidu/Senta.git

# download a pre-trained skep model
cd ./Senta/model_files
sh download_roberta_skep_large_en.sh # download roberta_skep_large_en model. For other pre-trained skep models, you can find them in this dir.
cd -

# download task dataset
cd ./Senta/data/
sh download_en_data.sh # download English dataset used in our paper. For Chinese dataset, you can find its download script in this dir.
cd - 
```

Installation

Senta supports Python 3.6 or later and requires PaddlePaddle 1.6.3; see the PaddlePaddle installation instructions to set it up.

Install the Python dependencies with `python -m pip install -r requirements.txt`.

Set up environment variables such as Python, CUDA, cuDNN and PaddlePaddle in the env.sh file. Details about the environment variables related to PaddlePaddle can be found in the PaddlePaddle documentation.
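Because the source install depends on an old interpreter range and exact dependency pins, it can help to sanity-check both before training. The sketch below is illustrative only: `parse_pins` and `python_ok` are hypothetical helpers, and the pin list is copied by hand from the requirements.txt entries listed under Dependencies on this page rather than read from the repository.

```python
import sys

# Pins as listed for requirements.txt on this page (copied by hand for this sketch).
PINS = """
nltk==3.4.5
numpy==1.14.5
scikit-learn==0.20.4
sentencepiece==0.1.83
six==1.11.0
"""


def parse_pins(text):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def python_ok(version_info=None, minimum=(3, 6)):
    """True if the interpreter meets the documented minimum (Python 3.6)."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) >= minimum


if __name__ == "__main__":
    print("Python OK:", python_ok())
    for name, version in sorted(parse_pins(PINS).items()):
        print(name, version)
```

The parsed dictionary can then be compared against whatever is installed in the active environment, for example the output of `python -m pip freeze`.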

Quick Tour

Training

Run `sh ./script/run_pretrain_roberta_skep_large_en.sh` to pre-train roberta_skep_large_en, the model used in our paper.

Fine-tuning and prediction

```shell
sh ./script/run_train.sh ./config/roberta_skep_large_en.SST-2.cls.json # fine-tuning on SST-2
sh ./script/run_infer.sh ./config/roberta_skep_large_en.SST-2.infer.json # predict

sh ./script/run_train.sh ./config/roberta_skep_large_en.absa_laptops.cls.json # fine-tuning on ABSA(laptops)
sh ./script/run_infer.sh ./config/roberta_skep_large_en.absa_laptops.infer.json # predict

sh ./script/run_train.sh ./config/roberta_skep_large_en.MPQA.orl.json # fine-tuning on MPQA 2.0
sh ./script/run_infer.sh ./config/roberta_skep_large_en.MPQA.infer.json # predict
```
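The fine-tuning and prediction commands follow one pattern per task, so they can be generated rather than typed. The sketch below assumes the script and config names use underscores (`run_train.sh`, `run_infer.sh`, `roberta_skep_large_en.<task>.<kind>.json`), in line with the `run_pretrain_roberta_skep_large_en.sh` script and the model name used elsewhere in this README. `build_commands` is a hypothetical helper, and it only prints the commands instead of running them.

```python
import shlex

MODEL = "roberta_skep_large_en"
# (task tag, fine-tuning config kind); file names are assumed, see the note above.
TASKS = [("SST-2", "cls"), ("absa_laptops", "cls"), ("MPQA", "orl")]


def build_commands(model=MODEL, tasks=TASKS):
    """Return a (train_cmd, infer_cmd) pair of argument lists for each task."""
    pairs = []
    for tag, kind in tasks:
        train = ["sh", "./script/run_train.sh",
                 "./config/%s.%s.%s.json" % (model, tag, kind)]
        infer = ["sh", "./script/run_infer.sh",
                 "./config/%s.%s.infer.json" % (model, tag)]
        pairs.append((train, infer))
    return pairs


if __name__ == "__main__":
    for train, infer in build_commands():
        # Printed only; pass a list to subprocess.run(cmd, check=True) to execute it.
        print(" ".join(shlex.quote(part) for part in train))
        print(" ".join(shlex.quote(part) for part in infer))
```

Keeping the commands as argument lists avoids shell quoting problems if a config path ever contains spaces.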
An older version of Senta, which includes BoW, CNN and BiLSTM models for Chinese sentence-level sentiment classification, is also available.

Citation

If you extend or use this work, please cite the paper where it was introduced:

@inproceedings{tian-etal-2020-skep,
    title = "{SKEP}: Sentiment Knowledge Enhanced Pre-training for Sentiment Analysis",
    author = "Tian, Hao and Gao, Can and Xiao, Xinyan and Liu, Hao and He, Bolei and Wu, Hua and Wang, Haifeng and Wu, Feng",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.374",
    pages = "4067--4076",
    abstract = "Recently, sentiment analysis has seen remarkable advance with the help of pre-training approaches. However, sentiment knowledge, such as sentiment words and aspect-sentiment pairs, is ignored in the process of pre-training, despite the fact that they are widely used in traditional sentiment analysis approaches. In this paper, we introduce Sentiment Knowledge Enhanced Pre-training (SKEP) in order to learn a unified sentiment representation for multiple sentiment analysis tasks. With the help of automatically-mined knowledge, SKEP conducts sentiment masking and constructs three sentiment knowledge prediction objectives, so as to embed sentiment information at the word, polarity and aspect level into pre-trained sentiment representation. In particular, the prediction of aspect-sentiment pairs is converted into multi-label classification, aiming to capture the dependency between words in a pair. Experiments on three kinds of sentiment tasks show that SKEP significantly outperforms strong pre-training baseline, and achieves new state-of-the-art results on most of the test datasets. We release our code at https://github.com/baidu/Senta.",
}

Owner

Name: Baidu
Login: baidu
Kind: organization
Email: opensource@baidu.com
Location: Beijing, China

Website: http://www.baidu.com
Repositories: 110
Profile: https://github.com/baidu

Baidu Open Source Projects

GitHub Events

Total

Watch event: 81
Issue comment event: 2
Fork event: 8

Last Year

Watch event: 81
Issue comment event: 2
Fork event: 8

Committers

Last synced: 9 months ago

All Time

Total Commits: 29
Total Committers: 5
Avg Commits per committer: 5.8
Development Distribution Score (DDS): 0.517

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
ChinaLiuHao	l**4@b**m	14
xfcygaocan	x**n@g**m	8
ChinaLiuHao	g**1@b**m	4
lujun	l**3@1**m	2
dependabot[bot]	4****]	1
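
The All Time figures can be reproduced from the commit counts in this table. The sketch below assumes the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is not stated on this page, but it is consistent with the reported values. Note that the two ChinaLiuHao rows are counted as separate committers, as in the table.

```python
def average_commits(commits):
    """Mean commits per committer; 0.0 when there are no committers."""
    return sum(commits) / len(commits) if commits else 0.0


def dds(commits):
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    total = sum(commits)
    if total == 0:
        return 0.0  # matches the Past Year row, which has no commits
    return 1 - max(commits) / total


if __name__ == "__main__":
    all_time = [14, 8, 4, 2, 1]  # commit counts from the Top Committers table
    print(sum(all_time))                         # 29
    print(round(average_commits(all_time), 1))   # 5.8
    print(round(dds(all_time), 3))               # 0.517
```

A score near 0 means one person wrote almost everything, while a score near 1 means commits are spread across many committers.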

Committer Domains (Top 20 + Academic)

baidu.com: 2
126.com: 1

Issues and Pull Requests

Last synced: 8 months ago

All Time

Total issues: 86
Total pull requests: 15
Average time to close issues: 2 months
Average time to close pull requests: over 1 year
Total issue authors: 82
Total pull request authors: 10
Average comments per issue: 1.73
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 4

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 1

Top Authors

Issue Authors

QiAnXinCodeSafe (2)
JohnnyYuan93 (2)
ericxsun (2)
bitallin (2)
iclementine (1)
Kolt1911 (1)
izhx (1)
MrWaterZhou (1)
s5248 (1)
chengyumeng (1)
wangmeng1014 (1)
chensvm (1)
jiangxinke (1)
mrmrfan (1)
ZhiliWang (1)

Pull Request Authors

dependabot[bot] (5)
hczhcz (2)
junjun315 (2)
Alexis-GP (2)
igfuns (1)
UncleLLD (1)
slyrx (1)
wi24rd (1)
iclementine (1)
LucienShui (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (5)

Packages

Total packages: 1
Total downloads:
- pypi 46 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 5
Total maintainers: 2

pypi.org: senta

A sentiment classification tool made by Baidu NLP.

Homepage: https://github.com/baidu/senta
Documentation: https://senta.readthedocs.io/
License: Apache 2.0
Latest release: 2.0.0
published almost 6 years ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 46 Last month

Rankings

Stargazers count: 1.7%

Forks count: 2.8%

Dependent packages count: 10.0%

Average: 10.9%

Downloads: 18.3%

Dependent repos count: 21.7%

Maintainers (2)

jiyunjie xfcygaocan

Last synced: 6 months ago

Dependencies

requirements.txt pypi

nltk ==3.4.5
numpy ==1.14.5
scikit-learn ==0.20.4
sentencepiece ==0.1.83
six ==1.11.0

setup.py pypi

nltk *
numpy *
scikit-learn *
sentencepiece *
six *
