semeval2024-boundary-detection
Solution for SemEval-2024 Task 8, Subtask C. Our solution achieves the best MAE score according to the leaderboard.
https://github.com/natriistorm/semeval2024-boundary-detection
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.0%) to scientific vocabulary
Repository
Solution for SemEval-2024 Task 8, Subtask C. Our solution achieves the best MAE score according to the leaderboard.
Basic Info
- Host: GitHub
- Owner: natriistorm
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://aclanthology.org/2024.semeval-1.257/
- Size: 1.67 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts
[Anastasia Voznyuk](https://github.com/natriistorm)¹ ✉, Vasily Konovalov¹
¹ Moscow Institute of Physics and Technology
✉ Corresponding author: vozniuk.ae@phystech.edu
[📝 Paper](https://aclanthology.org/2024.semeval-1.257/), [Code](https://github.com/natriistorm/SemEval2024-boundary-detection/tree/main/src)
💡 Abstract
The Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection shared task in the SemEval-2024 competition aims to tackle the problem of misusing collaborative human-AI writing. Although many detectors of AI content exist, they are often designed to give a binary answer and thus may not be suitable for the more nuanced problem of finding the boundaries between human-written and machine-generated text, even as hybrid human-AI writing becomes increasingly popular. In this paper, we address the boundary detection problem. In particular, we present a pipeline for augmenting data for supervised fine-tuning of DeBERTaV3. With this pipeline, we achieve a new best MAE score according to the competition leaderboard.
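To make the boundary detection framing concrete, here is a minimal sketch, not the authors' code. It assumes the task is cast as per-word binary labeling (0 for human-written, 1 for machine-generated), with the boundary taken as the index of the first machine-generated word. The function names `labels_from_boundary` and `boundary_from_labels`, and the fallback used when no word is labeled 1, are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def labels_from_boundary(num_words: int, boundary: int) -> List[int]:
    """Build per-word training labels from a gold boundary index.

    Words before `boundary` are labeled 0 (human-written), the rest 1
    (machine-generated). This labeling scheme is an assumption.
    """
    return [0 if i < boundary else 1 for i in range(num_words)]


def boundary_from_labels(labels: Sequence[int]) -> int:
    """Recover a boundary index from per-word predictions.

    Returns the index of the first word labeled 1. If no word is labeled 1,
    returns len(labels) (a hypothetical convention meaning "fully human").
    """
    for index, label in enumerate(labels):
        if label == 1:
            return index
    return len(labels)


words = "the cat sat on the mat and then it left".split()
labels = labels_from_boundary(len(words), boundary=6)
print(labels)                        # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
print(boundary_from_labels(labels))  # 6
```

Taking the first predicted 1 is only one possible decoding rule; a noisy model might benefit from smoothing the label sequence first.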
🔎 Overview
🛠️ Repository Structure
The repository is structured as follows:
- src: This directory contains the code used in the paper and for submission.
semeval2024-boundary-detection
├── LICENSE
├── README.md
└── src
    ├── run.sh                   # shell script that loads transformer_baseline and starts the experiment
    ├── data_augmentation.py     # main file for augmentation
    ├── transformer_baseline.py  # file to run experiments
    ├── splitter.py              # utility file for splitting the texts
    └── scorer.py                # file to calculate MAE
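scorer.py is described above only as the file that calculates MAE. As a rough illustration of that metric, and not a copy of the repository's implementation, the sketch below computes the mean absolute error between gold and predicted boundary indices. The function name `mean_absolute_error` and the example values are assumptions.

```python
from typing import Sequence


def mean_absolute_error(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Average absolute distance between gold and predicted boundary indices."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if len(gold) == 0:
        raise ValueError("at least one example is required")
    total = sum(abs(g - p) for g, p in zip(gold, predicted))
    return total / len(gold)


# Hypothetical boundary indices for four texts.
gold_boundaries = [10, 25, 3, 40]
predicted_boundaries = [12, 25, 0, 43]
print(mean_absolute_error(gold_boundaries, predicted_boundaries))  # 2.0
```

A lower value means the predicted boundaries sit closer, on average, to the true switch points; 0 would mean every boundary was predicted exactly.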
🔎 Citation
@inproceedings{voznyuk-konovalov-2024-deeppavlov,
    title = "{D}eep{P}avlov at {S}em{E}val-2024 Task 8: Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts",
    author = "Voznyuk, Anastasia and
      Konovalov, Vasily",
    booktitle = "Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.semeval-1.257",
    pages = "1821--1829"
}
Owner
- Name: Anastasia Voznyuk
- Login: natriistorm
- Kind: user
- Repositories: 2
- Profile: https://github.com/natriistorm
Citation (CITATION.cff)
title: "DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for
  Detecting Boundaries of Machine-Generated Texts"
abstract: The Multigenerator, Multidomain, and Multilingual Black-Box
  Machine-Generated Text Detection shared task in the SemEval-2024 competition
  aims to tackle the problem of misusing collaborative human-AI writing.
  Although there are a lot of existing detectors of AI content, they are often
  designed to give a binary answer and thus may not be suitable for more nuanced
  problem of finding the boundaries between human-written and machine-generated
  texts, while hybrid human-AI writing becomes more and more popular. In this
  paper, we address the boundary detection problem. Particularly, we present a
  pipeline for augmenting data for supervised fine-tuning of DeBERTaV3. We
  receive new best MAE score, according to the leaderboard of the competition,
  with this pipeline.
authors:
  - family-names: Voznyuk
    given-names: Anastasia
  - family-names: Konovalov
    given-names: Vasily
cff-version: 1.2.0
date-released: 2024-06-28
identifiers:
  - type: url
    value: "https://aclanthology.org/2024.semeval-1.257"
    description: Latest version
license: Apache-2.0
repository-code: https://github.com/natriistorm/SemEval2024-boundary-detection
preferred-citation:
  title: "DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for
    Detecting Boundaries of Machine-Generated Texts"
  type: conference-paper
  authors:
    - family-names: Voznyuk
      given-names: Anastasia
    - family-names: Konovalov
      given-names: Vasily
  collection-title: Proceedings of the 18th International Workshop on Semantic
    Evaluation (SemEval-2024)
  collection-type: proceedings
  conference:
    name: SemEval
  start: 1821
  end: 1829
  year: 2024