https://github.com/centre-for-humanities-computing/llm-tweet-classification

Classifying tweets with large language models with zero- and few-shot learning.

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Classifying tweets with large language models with zero- and few-shot learning.

Basic Info
  • Host: GitHub
  • Owner: centre-for-humanities-computing
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 2.65 MB
Statistics
  • Stars: 7
  • Watchers: 0
  • Forks: 1
  • Open Issues: 3
  • Releases: 1
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
  • Readme
  • License

README.md


llm-tweet-classification

Classifying tweets with large language models using zero- and few-shot learning with custom and generic prompts, as well as with supervised learning algorithms for comparison.

Our results on annotating tweets with the labels exemplar and political:

| F1-scores & Accuracies | Precision-Recall |
|:----------------------:|:----------------:|
|       F1 scores        | prec-rec scores  |

Getting Started

Install all requirements for the LLM classification script:

```bash
pip install -r requirements.txt
```

NB: This will only install a minimal set of requirements, enough to reproduce the figures with the code below for reproducibility's sake. A more complete requirements file for running the full pipeline can be found in configs.

Inference

The repo contains a CLI script, llm_classification.py. You can use it for running arbitrary classification tasks on .tsv or .csv files with large language models from either HuggingFace or OpenAI.

If you intend to use OpenAI models, you will have to specify your API key and ORG as environment variables.

```bash
export OPENAI_API_KEY="..."
export OPENAI_ORG="..."
```

The script has one command-line argument, namely the path to a config file of the following format:

```
[paths]
in_file="labelled_data.csv"
out_dir="predictions/"

[system]
seed=0
device="cpu"

[model]
name="google/flan-t5-base"
task="few-shot"

[inference]
x_column="raw_text"
y_column="exemplar"
n_examples=5
```
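Since the config is INI-style, it can be read with Python's standard-library configparser. The sketch below is illustrative only (not part of the repo) and assumes the quoted string values are stripped manually:

```python
from configparser import ConfigParser

# Load a config in the format shown above. Section and key names
# follow the config documentation below.
config = ConfigParser()
config.read_string("""
[paths]
in_file="labelled_data.csv"
out_dir="predictions/"

[model]
name="google/flan-t5-base"
task="few-shot"

[inference]
x_column="raw_text"
y_column="exemplar"
n_examples=5
""")

# String values are stored with their quotes; strip them when reading.
model_name = config["model"]["name"].strip('"')
n_examples = int(config["inference"]["n_examples"])
```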

If you intend to use a custom prompt for a given model, you can save it in a txt file and add its path to the paths section of the config.

```
[paths]
in_file="labelled_data.csv"
out_dir="predictions/"
prompt_file="custom_prompt.txt"
```

If you want to use hand-selected examples for few-shot learning, pass along a subset of the original data in the paths section of the config. Examples have to be in the same format as the data.

```
[paths]
in_file="labelled_data.csv"
out_dir="predictions/"
examples="examples.csv"
```

You can run the CLI like this:

```bash
python3 llm_classification.py "config.cfg"
```

Config Documentation

  • Paths:
    • in_file: str - Path to input file, either .csv or .tsv
    • out_dir: str - Output directory. The script creates it if it does not already exist.
  • System:
    • seed: int - Random seed for selecting few-shot examples. Is ignored when task=="zero-shot"
    • device: str - Device to run inference on. Change to cuda:0 if you want to run on GPU.
  • Model:
    • name: str - Name of the model from OpenAI or HuggingFace.
    • task: {"few-shot", "zero-shot"} - Indicates whether zero-shot or few-shot inference should be run.
  • Inference:
    • x_column: str - Name of independent variable in the table.
    • y_column: str - Name of dependent variable in the table.
    • n_examples: int - Number of examples to give to few-shot models. Is ignored when task=="zero-shot"

OpenAI script

For ease of use, we have developed a script that generates predictions for all OpenAI models in one run. OpenAI inference can run on low-performance instances, so long run times are not a problem; and since all instances access the same rate-limited API, we could not start multiple instances and run them in parallel.

Paths in this script are hardcoded, so you might need to adjust them for personal use.

```bash
python3 run_gpt_inference.py
```

Supervised Classification

For supervised models we made a separate script. This includes running and evaluating GloVe-200d with logistic regression and finetuning DistilBERT for classification.

This script has different dependencies, which you should install from the appropriate file:

```bash
pip install -r supervised_requirements.txt
```

Paths in this script are hardcoded, so you might need to adjust them for personal use.

```bash
python3 supervised_classification.py
```

Output

Running inference outputs a table with predictions added, saved to the out_dir folder specified in the config.

The file name format is as follows:

```python
f"predictions/{task}_pred_{column}_{model}.csv"
```

Each table will have a pred_<y_column> column, as well as a train_test_set column that is labelled train for all examples included in the prompt for few-shot learning and test everywhere else.
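As an illustration of how this output can be consumed, here is a minimal sketch (with made-up data, not actual predictions) that filters the held-out rows and computes test accuracy with pandas, following the column naming described above:

```python
import pandas as pd

# Made-up stand-in for a prediction table; a real one would be read
# from f"predictions/{task}_pred_{column}_{model}.csv".
preds = pd.DataFrame({
    "exemplar": [1, 0, 1, 0],           # gold labels (y_column)
    "pred_exemplar": [1, 0, 0, 0],      # model predictions (pred_<y_column>)
    "train_test_set": ["train", "test", "test", "test"],
})

# Only rows not shown to the model in the few-shot prompt count as test data.
test_rows = preds[preds["train_test_set"] == "test"]
accuracy = (test_rows["exemplar"] == test_rows["pred_exemplar"]).mean()
```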

Evaluating results

To evaluate the performance of the model(s), you can run the CLI script evaluation.py. It has two command-line arguments: --in_dir and --out_dir. These refer, respectively, to the folder in which the predictions from the llm_classification.py script have been saved (i.e., your predictions folder), and the folder where the classification report(s) should be saved. --in_dir defaults to 'predictions/' and --out_dir defaults to 'output/' (a folder that is created if it does not already exist).

It can be run as follows:

```bash
python3 evaluation.py --in_dir "your/data/path" --out_dir "your/out/path"
```

It expects the output file(s) from llm_classification.py in the specified file name format and placement. It will output two files to the specified out folder:

- a txt file with the classification report for the test data for each of the files in the --in_dir folder.
- a csv file with the same information as the txt file, which can be used for plotting the results.
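Given the expected file name format, the evaluation step can recover the task, label column, and model from each prediction file name. The sketch below is illustrative only; the parse_prediction_filename helper is hypothetical and not part of the repo:

```python
import re

def parse_prediction_filename(name: str) -> tuple[str, str, str]:
    """Split a file name of the form {task}_pred_{column}_{model}.csv
    into its (task, column, model) parts."""
    match = re.match(r"(?P<task>.+)_pred_(?P<column>[^_]+)_(?P<model>.+)\.csv", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match["task"], match["column"], match["model"]
```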

Plotting results

The plotting.py script takes the csv file produced by the evaluation script and makes three plots:

- acc_figure.png: The accuracy for each of the 8 models on each outcome (political, exemplar) in each task (zero-shot, few-shot) with each prompt type (generic, custom). It is split into four quadrants: the left side shows the exemplar column and the right side political, while the upper row shows custom prompts and the lower row generic prompts.
- f1_figure.png: The F1-score for positive labels for each model in each task, again split into political and exemplar as well as generic and custom prompts.
- prec_rec_figure.png: Precision plotted against recall for each of the models, split into three rows and four columns. Rows indicate task (zero-shot, few-shot, supervised classification); columns indicate label column (political, exemplar) and prompt type (generic, custom).

```bash
python3 plotting.py
```

These are all saved in a figures/ folder.

Owner

  • Name: Center for Humanities Computing Aarhus
  • Login: centre-for-humanities-computing
  • Kind: organization
  • Email: chcaa@cas.au.dk
  • Location: Aarhus, Denmark

GitHub Events

Total
  • Release event: 2
  • Watch event: 4
  • Push event: 2
  • Create event: 1
Last Year
  • Release event: 2
  • Watch event: 4
  • Push event: 2
  • Create event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • KasperFyhn (3)
Pull Request Authors
Top Labels
Issue Labels
  • bug (2)
  • documentation (1)
Pull Request Labels

Dependencies

pyproject.toml pypi
requirements.txt pypi
  • numpy >=1.23.5
  • pandas >=1.5.0
  • scikit-learn >=1.2.0
  • scikit-llm >=0.2.0
  • stormtrooper >=0.2.1
  • torch >=2.0.0
supervised_requirements.txt pypi
  • datasets >=2.14.5
  • embetter >=0.5.2
  • gensim >=4.2.0
  • numpy >=1.23.0
  • pandas >=2.0.0
  • scikit-learn >=1.2.0
  • torch >=2.0.1
  • tqdm >=4.66.0
  • transformers >=4.23.0
Dockerfile docker
  • nvidia/cuda 12.2.0-devel-ubuntu22.04 build