multi-language-sentiment

Pipeline for language agnostic sentiment analysis

https://github.com/aaltorse/multi-language-sentiment

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

Pipeline for language agnostic sentiment analysis

Basic Info
  • Host: GitHub
  • Owner: AaltoRSE
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 19.5 KB
Statistics
  • Stars: 0
  • Watchers: 5
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

multi-language-sentiment

A pipeline for sentiment analysis of texts whose language is unknown.

We use the lingua-language-detector to detect the language of each text sample and then run the sample through a sentiment analysis pipeline appropriate for that language.
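The detect-then-dispatch idea can be sketched roughly as follows. This is an illustration only, not the package's actual code: `route_by_language`, `toy_detect`, and `toy_pipelines` are hypothetical names, the detector and pipelines are toy stand-ins for lingua and the HuggingFace models, and grouping texts by language before running each pipeline is an assumption.

```python
from collections import defaultdict


def route_by_language(texts, detect, pipelines):
    """Run each text through the pipeline registered for its detected language."""
    # Group text indices by detected language so each pipeline sees one batch.
    groups = defaultdict(list)
    for index, text in enumerate(texts):
        groups[detect(text)].append(index)

    results = [None] * len(texts)
    for language, indices in groups.items():
        if language not in pipelines:
            raise KeyError(f"no sentiment pipeline registered for {language!r}")
        outputs = pipelines[language]([texts[i] for i in indices])
        # Write results back in the original input order.
        for i, output in zip(indices, outputs):
            results[i] = output
    return results


# Toy stand-ins (hypothetical) so the sketch runs without downloading models.
def toy_detect(text):
    return "fi" if "ä" in text else "en"


toy_pipelines = {
    "en": lambda batch: [{"label": "positive", "score": 1.0} for _ in batch],
    "fi": lambda batch: [{"label": "negative", "score": 1.0} for _ in batch],
}

texts = ["This is a positive sentence", "Tämä on ikävä juttu"]
print(route_by_language(texts, toy_detect, toy_pipelines))
```

In a real setup, `detect` would wrap a language detector and each entry in `pipelines` would wrap a sentiment model for that language.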

Usage

Basic usage for analysing a list of sentences:

```python
import multilanguagesentiment

texts = ["This is a positive sentence", "Tämä on ikävä juttu"]
sentiments = multilanguagesentiment.sentiment(texts)
print(sentiments)
```

This should print [{'label': 'positive', 'score': 0.89024418592453}, {'label': 'negative', 'score': 0.8899219632148743}]

Supported languages

The module currently supports the following languages by default: English, Japanese, Arabic, German, Spanish, French, Chinese, Indonesian, Hindi, Italian, Malay, Portuguese, Swedish, and Finnish.

For other languages, you must supply a path for a HuggingFace sentiment analysis pipeline. To supply a pipeline for a language, use the models parameter, a dictionary that maps lingua Language values to model paths:

```python
import multilanguagesentiment
from lingua import Language

texts = ["This is a positive sentence", "Tämä on ikävä juttu"]
models = {Language.FINNISH: "fergusq/finbert-finnsentiment"}
sentiments = multilanguagesentiment.sentiment(texts, models=models)
```

Technical details

Note that the pipeline splits each text sample into pieces of at most 512 characters. The sentiments of the pieces are aggregated by adding up the scores for each label and taking the label with the largest total.
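A minimal sketch of that splitting and aggregation, under stated assumptions: `split_text` and `aggregate` are hypothetical helper names, the split is a plain fixed-width cut (the package may split differently, for example on sentence boundaries), and the returned score is the raw summed total, since the README does not say how the final score is normalized.

```python
from collections import defaultdict

MAX_CHARS = 512  # maximum piece length stated in the README


def split_text(text, max_chars=MAX_CHARS):
    """Cut text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def aggregate(chunk_results):
    """Sum scores per label and return the label with the largest total."""
    totals = defaultdict(float)
    for result in chunk_results:
        totals[result["label"]] += result["score"]
    label = max(totals, key=totals.get)
    # Assumption: report the unnormalized sum as the score.
    return {"label": label, "score": totals[label]}


chunks = split_text("a" * 1200)
print([len(chunk) for chunk in chunks])  # [512, 512, 176]

# Made-up per-piece results, for illustration only.
results = [
    {"label": "positive", "score": 0.9},
    {"label": "negative", "score": 0.6},
    {"label": "negative", "score": 0.7},
]
print(aggregate(results)["label"])  # negative
```

Here a single confident "positive" piece loses to two moderately "negative" pieces because their scores add up to a larger total. Note that `aggregate` as written raises a `ValueError` on an empty list, so empty texts would need separate handling.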

Owner

  • Name: AaltoRSE
  • Login: AaltoRSE
  • Kind: organization

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Rantaharju"
  given-names: "Jarno"
  orcid: "https://orcid.org/0000-0002-0072-7707"
title: "multi-language-sentiment"
version: 0.1.0
doi: 10.5281/zenodo.10639831
date-released: 2024-02-09
url: "https://github.com/AaltoRSE/multi-language-sentiment"

Dependencies

requirements.txt pypi
  • torch *
  • transformers *
pyproject.toml pypi
  • lingua-language-detector *
  • torch *
  • transformers *