Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: digitalepidemiologylab
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 25.5 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Created almost 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

Use of large language models as a scalable approach to understanding public health discourse

This repository assesses how well LLMs (GPT versions 3.5 and 4, Mistral and Mixtral) and Amazon MTurk workers perform, compared with experts, when annotating tweets for public perception of vaccines.
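The README does not say which agreement measures the study reports. As a minimal sketch of how an annotator (an LLM or a crowd worker) could be compared with an expert reference, the snippet below computes plain accuracy and Cohen's kappa. The label values, the variable names and the choice of these two measures are assumptions made for illustration only, not the authors' method or data.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Share of items where the predicted label equals the reference label."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two label lists, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both raters pick the same label
    # if each rater drew labels independently from their own distribution.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels, not taken from the study.
expert_reference = ["positive", "negative", "neutral", "positive"]
annotator_labels = ["positive", "negative", "positive", "positive"]

acc = accuracy(expert_reference, annotator_labels)
kappa = cohen_kappa(expert_reference, annotator_labels)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Kappa is shown next to accuracy because accuracy alone can look high when one label dominates; whether that matters here depends on the label distribution in the actual data.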

Because Twitter/X data cannot be shared freely, only some of the data is available in the 'data' folder, including the IDs of tweets on which the experts reached at least partial agreement.
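The README does not define "partial agreement". One possible reading, sketched below, is that at least two experts assigned the same label to a tweet. The three-expert setup, the label values, the tweet IDs and the function names are all hypothetical and serve only to illustrate the filtering idea.

```python
from collections import Counter

# Hypothetical toy data: tweet ID -> labels assigned by three experts.
expert_labels = {
    "1001": ["positive", "positive", "positive"],  # all experts agree
    "1002": ["negative", "neutral", "negative"],   # two of three agree
    "1003": ["positive", "neutral", "negative"],   # no two experts agree
}


def agreement_level(labels):
    """Classify a list of labels as 'full', 'partial' or 'none' (assumed definition)."""
    top_count = Counter(labels).most_common(1)[0][1]
    if top_count == len(labels):
        return "full"
    if top_count >= 2:
        return "partial"
    return "none"


def ids_with_at_least_partial_agreement(labels_by_id):
    """Return the tweet IDs whose expert labels agree fully or partially."""
    return [
        tweet_id
        for tweet_id, labels in labels_by_id.items()
        if agreement_level(labels) != "none"
    ]


shared_ids = ids_with_at_least_partial_agreement(expert_labels)
print(shared_ids)  # ['1001', '1002']
```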

To visualise the main results of the analysis, including through the Shiny application, follow these steps:

  1. Open the R project.
  2. Check that the working directory is "~/gpt_annotation". If not, change it to that path.
  3. Open "scripts/main.R".
  4. Source the scripts for which all data are publicly available; these are marked "(public)".

Structure of this repository

  • R project: keeps this repository as a portable, self-contained folder.
  • Shiny app: web application for visualising some of the results of the study.
  • data: folder with the publicly available or aggregated data used in this study.
  • scripts: folder with the R and Python scripts used in the study to produce the results. Some scripts cannot be run because they depend on restricted data that is not available in the repository.
  • outputs: folder with the outputs produced by the scripts and included in the study.

Owner

  • Name: digitalepidemiologylab
  • Login: digitalepidemiologylab
  • Kind: organization
  • Location: Geneva

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Use of large language models as a scalable approach to understanding public health discourse
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Laura
    family-names: Espinosa
    orcid: 'https://orcid.org/0000-0003-0748-9657'
  - given-names: Marcel
    family-names: Salathé
    orcid: 'https://orcid.org/0000-0002-5079-7797'
repository-code: >-
  https://github.com/digitalepidemiologylab/llm_crowd_experts_annotation
license: EUPL-1.2
version: '2'
date-released: '2024-07-20'

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1