sidata

https://github.com/regretcode/sidata

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (4.3%) to scientific vocabulary

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: RegretCode
License: mit
Language: Python
Default Branch: main
Homepage: https://regretcode.github.io/SIDATA/
Size: 1.46 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

Sarcasm and Irony Dataset and Analysis

This repository gathers sarcasm and irony datasets in various languages and compares how they were created, how models were trained on them, what results were achieved, and how these methodologies can be adapted to Brazilian Portuguese. Our goal is to provide an overview of current approaches and explore how these research findings can be extended to Portuguese, with a focus on the cultural and linguistic adaptation of models.

Arabic

Bengali

adhirajghosh/irony-bengali-baseline

Chinese

English

French

Greek

charbgr/greek-irony-detection

Hindi

Portuguese

Spanish

LaSTUS-TALN-UPF/IroMovies

Turkish

teghub/Turkish-Irony-Dataset

Annotated Datasets for Sarcasm and Irony Detection

This repository aims to aggregate and organize datasets for sarcasm and irony detection across different languages, with a primary focus on Brazilian Portuguese. The goal is to provide resources and references for the automatic detection of sarcasm and irony, addressing the linguistic and cultural differences of Brazilian Portuguese.

The repository provides information on how the datasets were created, the training methods applied, the results obtained from different sarcasm and irony detection tasks, and how these methodologies can be adapted for Brazilian Portuguese.

Project Objective

This final year project aims to understand and automatically identify sarcasm and irony in Brazilian Portuguese. These two rhetorical devices are often ambiguous. Sarcasm says the opposite of what is meant, often in a biting manner, while irony is characterized by a discrepancy between the literal and the intended meaning. Detecting either one is considered a complex task in Natural Language Processing (NLP).

The primary goal of this project is to create an organized and accessible dataset for sarcasm and irony detection, focusing on Brazilian Portuguese. In addition to organizing existing resources in the literature, the project also aims to establish new collections of sarcasm and irony data in Portuguese. Furthermore, it seeks to evaluate the feasibility of adapting existing techniques from other languages for the Portuguese context.

How to Contribute

If you'd like to contribute to this repository, you can add new repositories, datasets, or results by following the guidelines below.

Guidelines

Results: While results from peer-reviewed papers are preferred, any relevant results can be included.
Datasets: Datasets should be used for evaluation, and references to any related publications can be added if available.
Training Methods Applied: If an implementation is available, include a link to it. If not, simply omit this section.

Adding a New Repository Reference

To add a new repository reference, follow these steps:

Navigate to the correct language folder:
- For example, if the repository is related to Portuguese, navigate to the SIDATA/portuguese folder.
Create a new .md file with the name of the repository:
- Name the file exactly as the repository name. For example, for the "Sarcasm-Detection" repository, create Sarcasm-Detection.md.
Fill in the repository details: Include the following information in the .md file:
- URL: Link to the repository.
- Description: A short description of what the repository does.
- Dataset: (Optional) Provide a link to the dataset if available.
- Results: (Optional) Provide results, such as accuracy or other evaluation metrics.
- Model: (Optional) Specify the model used, if known.

Here's an example of how the file might look:

```markdown
# Sarcasm-Detection

URL: https://github.com/user/Sarcasm-Detection

Description: Repository for sarcasm detection using RNNs and word embeddings.

Dataset: Sarcasm Dataset

Results:
- Training Accuracy: 92.3%
- Validation Accuracy: 89.4%

Model: RNN with Word Embeddings.
```

Handle missing information: If some information is unavailable (such as results or model details), use a fallback template:

```markdown
# Sarcasm-Detection

URL: https://github.com/user/Sarcasm-Detection

Description: Repository for sarcasm detection. Details about the model and results are not specified.

Results: Evaluation results not provided.

Model: Model not specified.
```
Save and commit the .md file to the repository.
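
As a rough illustration of the two templates above, the sketch below renders a reference file from a handful of fields and falls back to the placeholder wording when results or model details are missing. The function name `render_reference` and its parameters are hypothetical and not part of this repository; treat it as one possible way to keep entries consistent, not as the project's tooling.

```python
from typing import Dict, Optional


def render_reference(
    name: str,
    url: str,
    description: str,
    dataset: Optional[str] = None,
    results: Optional[Dict[str, str]] = None,
    model: Optional[str] = None,
) -> str:
    """Return the Markdown text for one repository reference file.

    Hypothetical helper: the field layout mirrors the example templates,
    and the fallback sentences are copied from the fallback template.
    """
    lines = [f"# {name}", "", f"URL: {url}", "", f"Description: {description}", ""]

    # Dataset is optional and is simply left out when unknown.
    if dataset:
        lines += [f"Dataset: {dataset}", ""]

    # Results become a bullet list when available, else the fallback sentence.
    if results:
        lines.append("Results:")
        lines += [f"- {metric}: {value}" for metric, value in results.items()]
        lines.append("")
    else:
        lines += ["Results: Evaluation results not provided.", ""]

    lines.append(f"Model: {model}" if model else "Model: Model not specified.")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_reference(
            "Sarcasm-Detection",
            "https://github.com/user/Sarcasm-Detection",
            "Repository for sarcasm detection.",
        )
    )
```

The returned string could then be written to the language folder, for example with `pathlib.Path("SIDATA/portuguese/Sarcasm-Detection.md").write_text(...)`.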

Adding a Reference to the README.md File

After creating the .md file, update the README.md file to include a reference to the newly added repository:
1. Open the README.md file at the root of the repository.
2. Add a new section or update an existing section with a link to the new repository. For example, if the repository is in Portuguese, add it to the "Repositories in Portuguese" section.
  
  Example of adding a repository reference:
  
```markdown
## Repositories in Portuguese

Here are some repositories related to sarcasm and irony detection in Portuguese:

- [Sarcasm-Detection](SIDATA/portuguese/Sarcasm-Detection.md) - Repository for sarcasm detection using RNNs and word embeddings.
```
3. Commit your changes:
  - Stage and commit the changes to README.md and the new .md file:

    git add SIDATA/portuguese/Sarcasm-Detection.md
    git add README.md
    git commit -m "Add Sarcasm-Detection repository reference and create corresponding .md file"
4. Push the changes:
  
    git push
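
The README update described above can also be scripted. The sketch below builds the link line for a new reference and inserts it at the end of the matching section, without duplicating a line that is already there. Both function names are hypothetical, and the heading text passed in (here `## Repositories in Portuguese`) is an assumption about how the section is marked up in README.md.

```python
def readme_entry(language: str, name: str, description: str) -> str:
    """Build the list item that links to SIDATA/<language>/<name>.md."""
    return f"- [{name}](SIDATA/{language}/{name}.md) - {description}"


def add_entry(readme_text: str, heading: str, entry: str) -> str:
    """Insert `entry` at the end of the section that starts with `heading`.

    Returns the text unchanged if the exact entry line is already present.
    If the heading is missing, a new section is appended at the end.
    """
    lines = readme_text.splitlines()
    if entry in lines:
        return readme_text
    if heading not in lines:
        return "\n".join(lines + ["", heading, "", entry]) + "\n"

    start = lines.index(heading)
    # The section ends at the next heading line or at the end of the file.
    end = start + 1
    while end < len(lines) and not lines[end].startswith("#"):
        end += 1
    # Step back over trailing blank lines so the entry joins the list.
    insert_at = end
    while insert_at > start + 1 and lines[insert_at - 1].strip() == "":
        insert_at -= 1
    lines.insert(insert_at, entry)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    readme = (
        "# SIDATA\n\n## Repositories in Portuguese\n\n"
        "- [Other](SIDATA/portuguese/Other.md) - Another repository.\n\n"
        "## Wishlist\n"
    )
    entry = readme_entry(
        "portuguese", "Sarcasm-Detection", "Repository for sarcasm detection."
    )
    print(add_entry(readme, "## Repositories in Portuguese", entry))
```

Because `add_entry` returns the input unchanged when the line already exists, running it twice does not produce duplicate entries. It does not look inside code fences, so a `#` line within a fenced block would be mistaken for a heading.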

Alternative When Information is Missing

If some information about the repository is not available (e.g., missing model details or results), you can still add the repository with the available information, using the fallback template. This ensures that the repository is included in the project, even if it is incomplete.

Wishlist

These are tasks and datasets that are still missing or needed:

Sarcasm and irony detection datasets in Brazilian Portuguese
Tools for adapting sarcasm and irony detection models to Brazilian Portuguese
Datasets and techniques for sarcasm and irony detection in other languages (contributions in any language are welcome!)
Evaluation of sarcasm and irony detection models in various cultural contexts

Exporting to a Structured Format

You can extract all the data into a structured, machine-readable JSON format with parsed tasks, descriptions, and SOTA tables. Instructions are available in structured/README.md.
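
To give a feel for what such an export involves, here is a minimal sketch that parses one reference file into a dictionary and serializes it as JSON. It is not the exporter documented in structured/README.md. It assumes the reference files follow the contribution template, with one `Key: value` field per line and metrics optionally listed as bullets under `Results:`, and the name `parse_reference` is hypothetical.

```python
import json

# Field names taken from the contribution template.
FIELDS = ("URL", "Description", "Dataset", "Results", "Model")


def parse_reference(markdown_text: str) -> dict:
    """Parse one reference file into a dict (illustrative sketch only)."""
    record = {}
    current = None  # lower-cased name of the field read most recently
    for raw in markdown_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# "):
            # The top-level heading carries the repository name.
            record["name"] = line[2:].strip()
            current = None
            continue
        key, sep, value = line.partition(":")
        if sep and key in FIELDS:
            current = key.lower()
            record[current] = value.strip()
        elif line.startswith("- ") and current == "results":
            # Bullets under "Results:" become metric/value pairs.
            metric, _, score = line[2:].partition(":")
            if not isinstance(record["results"], dict):
                record["results"] = {}
            record["results"][metric.strip()] = score.strip()
    return record


if __name__ == "__main__":
    sample = (
        "# Sarcasm-Detection\n\n"
        "URL: https://github.com/user/Sarcasm-Detection\n\n"
        "Results:\n- Training Accuracy: 92.3%\n- Validation Accuracy: 89.4%\n\n"
        "Model: RNN with Word Embeddings.\n"
    )
    print(json.dumps(parse_reference(sample), indent=2))
```

A file that uses the fallback template keeps `results` as the plain sentence "Evaluation results not provided." rather than a mapping, so consumers of the JSON would need to handle both shapes.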

Instructions for Building the Site Locally

If you want to build the site locally using Jekyll, follow the instructions in jekyll_instructions.md.

Owner

Name: Ricardo Cordeiro
Login: RegretCode
Kind: user
Location: Brazil

Repositories: 1
Profile: https://github.com/RegretCode

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Cordeiro"
  given-names: "Ricardo"
title: "SIDATA"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2025-01-25
url: "https://regretcode.github.io/SIDATA/"