alta2023_shared_task

Baseline model for the ALTA 2023 shared task

https://github.com/zhanhl316/alta2023_shared_task

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.8%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: zhanhl316
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 19.2 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

ALTA 2023 Shared Task

Homepage

Basic Task Description

Recent advances in Large Language Models (LLMs) represent a paradigm shift in human-computer interaction. However, like any groundbreaking technology, LLMs are a double-edged sword for society. Beyond spreading distorted news, the misuse of LLMs may give rise to a range of social and ethical problems, including academic misconduct and election manipulation. These risks underscore the growing urgency within scholarly communities to devise strategies for detecting and thoroughly scrutinizing synthetic text.

How to use this baseline?

Step 0: Requirements

  • Python 3.8
  • PyTorch 1.8.1
  • CUDA 10.1
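
To confirm that an environment matches the versions listed above, a small check like the following may help. It is only a sketch: the helper name `version_matches` is hypothetical and not part of the repository, and it compares version prefixes, so a build string such as `1.8.1+cu101` counts as PyTorch 1.8.1.

```python
import sys


def version_matches(found: str, wanted: str) -> bool:
    """Return True if `found` begins with the dotted components of `wanted`.

    Any local build suffix after '+' (e.g. '+cu101') is ignored.
    """
    found_parts = found.split("+")[0].split(".")
    wanted_parts = wanted.split(".")
    return found_parts[: len(wanted_parts)] == wanted_parts


if __name__ == "__main__":
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    print("Python", python_version, "matches 3.8:", version_matches(python_version, "3.8"))
    try:
        import torch  # only available after the installation step
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("PyTorch", torch.__version__, "matches 1.8.1:",
              version_matches(torch.__version__, "1.8.1"))
        # torch.version.cuda is None for CPU-only builds.
        cuda_version = torch.version.cuda or ""
        print("CUDA", cuda_version or "not available", "matches 10.1:",
              version_matches(cuda_version, "10.1"))
```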

Step 1: Installation

Please follow these steps to initialize your environment:

    conda create -n alta2023_baseline python=3.8
    source activate alta2023_baseline
    git clone https://github.com/zhanhl316/ALTA2023_shared_task.git
    cd ALTA2023_shared_task
    pip install -r requirements.txt

Step 2: Data and Pretrained Model Preparation

(1) Data Preparation: Please follow the format of the train/test .json files in the folder "data", and replace them with your own train/dev/test files.
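
One way to check that a replacement file follows the format of the shipped examples is to compare the field names of the two files. The sketch below makes several assumptions not stated in this README: that each file is either a JSON array of objects or JSON Lines, that every record is a JSON object, and that the file names in the usage note are placeholders. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Set


def load_records(path: str) -> List[Dict[str, Any]]:
    """Load records from a JSON array file or a JSON Lines file."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def field_names(records: List[Dict[str, Any]]) -> Set[str]:
    """Collect every key that appears in at least one record."""
    names: Set[str] = set()
    for record in records:
        names.update(record.keys())
    return names


def compare_fields(reference_path: str, candidate_path: str) -> Dict[str, Set[str]]:
    """Report fields missing from, or unexpected in, the candidate file."""
    reference = field_names(load_records(reference_path))
    candidate = field_names(load_records(candidate_path))
    return {"missing": reference - candidate, "unexpected": candidate - reference}
```

For example, `compare_fields("data/train.json", "my_train.json")` (both paths are placeholders) returns two sets; if both are empty, the new file uses the same field names as the example.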

(2) Pretrained Model Preparation: The baseline model is based on RoBERTa (large). Please download the RoBERTa-large model files from the Hugging Face repository (roberta-large) and put them in the folder "pretrained_model/roberta-large".
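
The download can also be scripted instead of done by hand. The sketch below assumes a reasonably recent `huggingface_hub` package is installed (it is not confirmed to be in `requirements.txt`) and uses its `snapshot_download` function. The list of required files is an assumption about what a PyTorch RoBERTa checkpoint usually needs, not something this README specifies, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence

MODEL_DIR = "pretrained_model/roberta-large"  # folder named in the README

# Assumption: files typically needed to load a PyTorch RoBERTa checkpoint.
REQUIRED_FILES = ("config.json", "pytorch_model.bin", "vocab.json", "merges.txt")


def missing_model_files(model_dir: str,
                        required: Sequence[str] = REQUIRED_FILES) -> List[str]:
    """Return the names in `required` that are not present in `model_dir`."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


def download_model(model_dir: str = MODEL_DIR) -> None:
    """Fetch the roberta-large files from the Hugging Face Hub into `model_dir`."""
    # Imported lazily so the file check works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="roberta-large", local_dir=model_dir)


if __name__ == "__main__":
    missing = missing_model_files(MODEL_DIR)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files found in", MODEL_DIR)
```

Calling `download_model()` once and then re-running the check should report no missing files, provided the assumed file list matches what the training script loads.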

Step 3: Training

sh run_glue.sh

Step 4: Test

sh run_glue-test.sh

Have any questions?

Please contact Haolan Zhan at haolan.zhan@monash.edu.

Owner

  • Name: Haolan Zhan
  • Login: zhanhl316
  • Kind: user

Research Track: Natural Language Processing, Deep Learning, Text Generation.
