multi-language-sentiment
Pipeline for language agnostic sentiment analysis
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.1%) to scientific vocabulary
Repository
Pipeline for language agnostic sentiment analysis
Basic Info
- Host: GitHub
- Owner: AaltoRSE
- License: mit
- Language: Python
- Default Branch: main
- Size: 19.5 KB
Statistics
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
multi-language-sentiment
A pipeline for sentiment analysis of texts whose language is unknown.
We use the lingua-language-detector to detect the language of each text sample and then run the sample through a sentiment analysis pipeline appropriate for that language.
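The detect-then-dispatch idea can be sketched without any of the real dependencies. The block below is only an illustration, not the module's actual implementation: `route_by_language`, `toy_detect`, and `toy_analyzers` are hypothetical names, and the toy detector and analyzers merely stand in for the language detector and the per-language sentiment pipelines.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional


def route_by_language(
    texts: List[str],
    detect: Callable[[str], Hashable],
    analyzers: Dict[Hashable, Callable[[List[str]], List[dict]]],
) -> List[Optional[dict]]:
    """Group texts by detected language, run each group through the
    analyzer for that language, and return results in input order."""
    groups = defaultdict(list)
    for index, text in enumerate(texts):
        groups[detect(text)].append(index)

    results: List[Optional[dict]] = [None] * len(texts)
    for language, indices in groups.items():
        analyzer = analyzers[language]  # KeyError if the language has no analyzer
        outputs = analyzer([texts[i] for i in indices])
        for i, output in zip(indices, outputs):
            results[i] = output
    return results


# Toy stand-ins, for illustration only (not real detection or sentiment models).
def toy_detect(text: str) -> str:
    return "fi" if "ä" in text else "en"


toy_analyzers = {
    "en": lambda batch: [{"label": "positive", "score": 1.0} for _ in batch],
    "fi": lambda batch: [{"label": "negative", "score": 1.0} for _ in batch],
}

results = route_by_language(
    ["This is a positive sentence", "Tämä on ikävä juttu"],
    toy_detect,
    toy_analyzers,
)
print(results)
```

Grouping by language before calling an analyzer means each model sees one batch rather than one call per text, while the index bookkeeping keeps the output aligned with the input list.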
Usage
Basic usage for analysing a list of sentences:
```python
import multilanguagesentiment

texts = ["This is a positive sentence", "Tämä on ikävä juttu"]
sentiments = multilanguagesentiment.sentiment(texts)
print(sentiments)
```
This should print:
[{'label': 'positive', 'score': 0.89024418592453}, {'label': 'negative', 'score': 0.8899219632148743}]
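Since the result is a list of dictionaries in the same order as the input, it can be paired back with the original texts. The following sketch assumes only the output format shown above; the `sentiments` list is copied from that output so the snippet runs without downloading any models, and `labelled` is a name introduced here for illustration.

```python
texts = ["This is a positive sentence", "Tämä on ikävä juttu"]

# Copied from the printed output above; stands in for the result of sentiment(texts).
sentiments = [
    {"label": "positive", "score": 0.89024418592453},
    {"label": "negative", "score": 0.8899219632148743},
]

# Pair each text with its label and a rounded score.
labelled = [
    (text, result["label"], round(result["score"], 2))
    for text, result in zip(texts, sentiments)
]

for text, label, score in labelled:
    print(f"{label:>8}  {score:.2f}  {text}")
```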
Supported languages
The module currently supports the following languages by default: English, Japanese, Arabic, German, Spanish, French, Chinese, Indonesian, Hindi, Italian, Malay, Portuguese, Swedish, and Finnish.
For other languages, you must supply a path to a HuggingFace sentiment analysis pipeline. To supply a pipeline for a language, pass a dictionary that maps the lingua `Language` to the model path via the `models` parameter:
```python
import multilanguagesentiment
from lingua import Language

texts = ["This is a positive sentence", "Tämä on ikävä juttu"]
models = {Language.FINNISH: "fergusq/finbert-finnsentiment"}
sentiments = multilanguagesentiment.sentiment(texts, models = models)
```
Technical details
Note that the pipeline splits each text sample into pieces of at most 512 characters. The sentiments of the pieces are aggregated by adding up the scores for each label and taking the label with the largest total.
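To make the splitting and aggregation concrete, here is a minimal sketch. It is not the module's actual code: `split_text` and `aggregate` are hypothetical helpers, the split is a naive fixed-width cut (the real pipeline may choose boundaries differently), and returning the summed score as the final score is an assumption.

```python
from collections import defaultdict
from typing import Dict, List

MAX_CHARS = 512  # maximum piece length stated in the README


def split_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Cut text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def aggregate(chunk_results: List[Dict]) -> Dict:
    """Sum the scores per label and return the label with the largest total.

    Assumption: the summed score is returned unnormalized.
    Raises ValueError if chunk_results is empty.
    """
    totals = defaultdict(float)
    for result in chunk_results:
        totals[result["label"]] += result["score"]
    label = max(totals, key=totals.get)
    return {"label": label, "score": totals[label]}


# A 1100-character sample is cut into three pieces.
pieces = split_text("a" * 1100)
print([len(piece) for piece in pieces])

# Hypothetical per-piece results, as a sentiment pipeline might return them.
combined = aggregate([
    {"label": "positive", "score": 0.6},
    {"label": "negative", "score": 0.9},
    {"label": "positive", "score": 0.5},
])
print(combined["label"])
```

In this example the single negative piece has the highest individual score, but the two positive pieces together outweigh it, which is the effect of summing before taking the maximum.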
Owner
- Name: AaltoRSE
- Login: AaltoRSE
- Kind: organization
- Repositories: 38
- Profile: https://github.com/AaltoRSE
Citation (citation.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Rantaharju"
    given-names: "Jarno"
    orcid: "https://orcid.org/0000-0002-0072-7707"
title: "multi-language-sentiment"
version: 0.1.0
doi: 10.5281/zenodo.10639831
date-released: 2024-02-09
url: "https://github.com/AaltoRSE/multi-language-sentiment"
Dependencies
- torch *
- transformers *
- lingua-language-detector *