multi-language-sentiment

Pipeline for language-agnostic sentiment analysis

https://github.com/aaltorse/multi-language-sentiment

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (6.1%) to scientific vocabulary

Last synced: 11 months ago

Repository

Pipeline for language-agnostic sentiment analysis

Basic Info

Host: GitHub
Owner: AaltoRSE
License: MIT
Language: Python
Default Branch: main
Size: 19.5 KB

Statistics

Stars: 0
Watchers: 5
Forks: 0
Open Issues: 0
Releases: 1

Created over 2 years ago · Last pushed over 2 years ago

Metadata Files

Readme, License, Citation

README.md

multi-language-sentiment

A pipeline for sentiment analysis of texts whose language is unknown.

We use lingua-language-detector to detect the language of each text sample, then run the sample through a sentiment analysis pipeline appropriate for that language.
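The detect-then-dispatch idea can be sketched as follows. This is only an illustration, not the package's actual implementation: `group_by_language`, `analyse`, `toy_detect`, and the two toy pipelines are hypothetical names, and the detector is passed in as a plain callable so the sketch runs without any dependencies. With lingua installed, `detect` could be `LanguageDetectorBuilder.from_all_languages().build().detect_language_of`.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional


def group_by_language(
    texts: List[str], detect: Callable[[str], Hashable]
) -> Dict[Hashable, List[int]]:
    """Map each detected language to the indices of the texts in that language."""
    groups = defaultdict(list)
    for index, text in enumerate(texts):
        groups[detect(text)].append(index)
    return dict(groups)


def analyse(
    texts: List[str],
    detect: Callable[[str], Hashable],
    pipelines: Dict[Hashable, Callable[[str], dict]],
) -> List[Optional[dict]]:
    """Run every text through the pipeline registered for its language.

    Texts whose language has no registered pipeline are left as None.
    """
    results: List[Optional[dict]] = [None] * len(texts)
    for language, indices in group_by_language(texts, detect).items():
        pipeline = pipelines.get(language)
        if pipeline is None:
            continue
        for index in indices:
            results[index] = pipeline(texts[index])
    return results


def toy_detect(text: str) -> str:
    """Hypothetical stand-in for a real language detector."""
    return "fi" if "ä" in text else "en"


# Hypothetical stand-ins for per-language sentiment pipelines.
toy_pipelines = {
    "en": lambda text: {"label": "positive", "score": 1.0},
    "fi": lambda text: {"label": "negative", "score": 1.0},
}

texts = ["This is a positive sentence", "Tämä on ikävä juttu"]
print(analyse(texts, toy_detect, toy_pipelines))
```

Grouping by language first means each language's pipeline is looked up once, and writing results back by index keeps the output in the same order as the input.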

Usage

Basic usage for analysing a list of sentences:

```python
import multilanguagesentiment

texts = ["This is a positive sentence", "Tämä on ikävä juttu"]
sentiments = multilanguagesentiment.sentiment(texts)
print(sentiments)
```

This should print `[{'label': 'positive', 'score': 0.89024418592453}, {'label': 'negative', 'score': 0.8899219632148743}]`.

Supported languages

The module currently supports the following languages by default: English, Japanese, Arabic, German, Spanish, French, Chinese, Indonesian, Hindi, Italian, Malay, Portuguese, Swedish, and Finnish.

For other languages, you must supply the path to a Hugging Face sentiment analysis pipeline. To supply a pipeline for a new language, use the `models` parameter:

```python
import multilanguagesentiment
from lingua import Language

texts = ["This is a positive sentence", "Tämä on ikävä juttu"]
models = {Language.FINNISH: "fergusq/finbert-finnsentiment"}
sentiments = multilanguagesentiment.sentiment(texts, models=models)
```

Technical details

Note that the pipeline splits each text sample into chunks of at most 512 characters. The chunk-level sentiments are aggregated by summing the scores for each label and taking the label with the largest total.
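The splitting and aggregation described above might look roughly like the sketch below. It reflects one reading of the description, not the package's source: the helper names (`split_text`, `aggregate`, `classify_long_text`, `toy_classifier`) are hypothetical, the split is a naive fixed-width cut, and returning the summed score unnormalised is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List

MAX_CHARS = 512  # maximum chunk length stated in the README


def split_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Cut text into consecutive pieces of at most max_chars characters."""
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    return chunks or [""]  # keep one (empty) chunk for empty input


def aggregate(chunk_results: List[Dict]) -> Dict:
    """Sum the scores per label and return the label with the largest total."""
    totals = defaultdict(float)
    for result in chunk_results:
        totals[result["label"]] += result["score"]
    label = max(totals, key=totals.get)
    # Assumption: the summed score is reported without normalisation.
    return {"label": label, "score": totals[label]}


def classify_long_text(text: str, classify_chunk: Callable[[str], Dict]) -> Dict:
    """Classify each chunk separately, then combine the chunk-level results."""
    return aggregate([classify_chunk(chunk) for chunk in split_text(text)])


def toy_classifier(chunk: str) -> Dict:
    """Hypothetical stand-in for a real sentiment pipeline."""
    if "bad" in chunk:
        return {"label": "negative", "score": 0.9}
    return {"label": "positive", "score": 0.6}


long_text = "good " * 150  # 750 characters, so two chunks
print(classify_long_text(long_text, toy_classifier))
```

Because scores are summed rather than averaged, a label that appears in many chunks with moderate confidence can outweigh a label that appears once with high confidence.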

Owner

Name: AaltoRSE
Login: AaltoRSE
Kind: organization

Repositories: 38
Profile: https://github.com/AaltoRSE

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Rantaharju"
  given-names: "Jarno"
  orcid: "https://orcid.org/0000-0002-0072-7707"
title: "multi-language-sentiment"
version: 0.1.0
doi: 10.5281/zenodo.10639831
date-released: 2024-02-09
url: "https://github.com/AaltoRSE/multi-language-sentiment"

Dependencies

requirements.txt pypi

torch *
transformers *

pyproject.toml pypi

lingua-language-detector *
torch *
transformers *
