llm_crowd_experts_annotation

https://github.com/digitalepidemiologylab/llm_crowd_experts_annotation

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (7.6%) to scientific vocabulary

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: digitalepidemiologylab
Language: Jupyter Notebook
Default Branch: main
Size: 25.5 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 3

Created over 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme, Citation

Use of large language models as a scalable approach to understanding public health discourse

This repository assesses how well LLMs (GPT-3.5, GPT-4, Mistral, and Mixtral) and Amazon Mechanical Turk (MTurk) workers perform, compared with experts, when annotating tweets for public perception of vaccines.

Because Twitter/X data cannot be shared freely, only part of the data is available in the 'data' folder, including the IDs of tweets on which the experts reached at least partial agreement.
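The comparison against experts can be illustrated with a minimal Python sketch. It is not the repository's code. It assumes that "partial agreement" means a strict majority of experts chose the same label, and the function names, tweet IDs, and label values (`positive`, `negative`, `neutral`) are hypothetical, chosen only for illustration.

```python
from collections import Counter


def expert_consensus(labels):
    """Return (majority_label, agreement) for one tweet's expert labels.

    Assumption: agreement is "full" if all experts agree, "partial" if a
    strict majority agree, and "none" otherwise (the label is then None).
    """
    label, count = Counter(labels).most_common(1)[0]
    if count == len(labels):
        return label, "full"
    if count > len(labels) / 2:
        return label, "partial"
    return None, "none"


def accuracy_vs_experts(expert_labels, annotator_labels):
    """Share of tweets with expert consensus where the annotator matches it."""
    hits = 0
    total = 0
    for tweet_id, labels in expert_labels.items():
        consensus, agreement = expert_consensus(labels)
        # Skip tweets without expert consensus or without an annotation.
        if agreement == "none" or tweet_id not in annotator_labels:
            continue
        total += 1
        hits += annotator_labels[tweet_id] == consensus
    return hits / total if total else float("nan")


# Made-up example data (hypothetical IDs and label set).
experts = {
    "t1": ["positive", "positive", "positive"],
    "t2": ["negative", "negative", "neutral"],
    "t3": ["positive", "negative", "neutral"],
}
annotator = {"t1": "positive", "t2": "neutral", "t3": "positive"}

print(accuracy_vs_experts(experts, annotator))
```

In this toy data, tweet `t3` is dropped because the experts reach no consensus, so the annotator is scored only on `t1` and `t2`.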

To visualise the main results of the analysis, including the Shiny application, follow these steps:

Open the R project.
Check that the working directory is "~/gpt_annotation". If not, change it to that path.
Open "scripts/main.R".
Source the scripts for which all data are publicly available; these are marked "(public)".

Structure of this repository

R project: keeps this repository as a portable, self-contained folder.
Shiny app: web application to visualise some of the results of the study.
data: folder with the publicly available or aggregated data used in this study.
scripts: folder with the R and Python scripts used to produce the results of the study. Some of the scripts cannot be run because they depend on restricted data that is not available in the repository.
outputs: folder with the outputs produced by the scripts and included in the study.

Owner

Name: digitalepidemiologylab
Login: digitalepidemiologylab
Kind: organization
Location: Geneva

Website: http://www.digitalepidemiologylab.org
Repositories: 82
Profile: https://github.com/digitalepidemiologylab

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Use of large language models as a scalable approach to understanding public health discourse
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Laura
    family-names: Espinosa
    orcid: 'https://orcid.org/0000-0003-0748-9657'
  - given-names: Marcel
    family-names: Salathé
    orcid: 'https://orcid.org/0000-0002-5079-7797'
repository-code: >-
  https://github.com/digitalepidemiologylab/llm_crowd_experts_annotation
license: EUPL-1.2
version: '2'
date-released: '2024-07-20'
