Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: AntonioLiu97
  • Language: Jupyter Notebook
  • Default Branch: inPCA
  • Size: 293 MB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

LLMICL InPCA

This repository contains the companion codebase for the paper "Density estimation with LLMs: a geometric investigation of in-context learning trajectories" (arXiv:2410.05218).

Figure: LLaMA-2 70B estimating a randomly generated, multimodal distribution from 400 data points.

Figure: In-context density-estimation trajectories traversed by LLaMA-2 70B, a Bayesian histogram, and a kernel density estimator.
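The two baselines in the figure above can be illustrated with a few lines of NumPy. The sketch below computes a trajectory, meaning one density estimate per prefix of the data (after 1, 2, ..., n points), for a Gaussian KDE and for a Bayesian histogram. The bandwidth schedule h(n) = 0.3 n^(-1/5), the Dirichlet pseudo-count `alpha`, the bin layout on [lo, hi), and both function names are illustrative assumptions, not the settings used in `baseline_models.py`.

```python
import numpy as np

def kde_trajectory(samples, grid, bandwidth=lambda n: 0.3 * n ** -0.2):
    """Gaussian-KDE density on `grid` after the first 1, 2, ..., len(samples) points.

    The default bandwidth schedule is a hypothetical Silverman-style n**(-1/5) decay.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    traj = []
    for n in range(1, len(samples) + 1):
        h = bandwidth(n)
        z = (grid[:, None] - samples[None, :n]) / h
        # Average of n Gaussian kernels, each normalised to integrate to 1
        density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
        traj.append(density)
    return np.array(traj)

def bayesian_histogram_trajectory(samples, n_bins=10, alpha=1.0, lo=0.0, hi=1.0):
    """Posterior-mean histogram densities under a symmetric Dirichlet(alpha) prior."""
    samples = np.asarray(samples, dtype=float)
    width = (hi - lo) / n_bins
    # Bin index of every sample; out-of-range values are clamped to the edge bins
    idx = np.clip(np.floor((samples - lo) / width).astype(int), 0, n_bins - 1)
    counts = np.zeros(n_bins)
    traj = []
    for n, i in enumerate(idx, start=1):
        counts[i] += 1
        # Posterior mean bin probability, divided by bin width to give a density
        traj.append((counts + alpha) / (n + n_bins * alpha) / width)
    return np.array(traj)
```

Each row of either output is a normalised density, so the rows can be compared directly once they are evaluated on a common grid.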

Directory structure

  • /data: Contains functions that convert lists of sampled data $X_1, X_2, \ldots, X_n \sim P(x)$ into 1D strings, which are then used to prompt LLMs. It also contains `series_generator.ipynb`, a Jupyter notebook that generates all distributions investigated in the paper: Gaussian, uniform, Student's t-distribution, and random PDFs.

  • /generated_series: Caches all prompts generated by `series_generator.ipynb` as pickled dictionaries.

  • /models:

    • ICL.py implements core components such as Hierarchy-PDF and its auxiliary functions.
    • generate_predictions.py prompts LLMs such as LLaMA, Mistral, and Gemma with the generated prompts and saves the estimated PDFs as pickled Hierarchy-PDFs.
    • baseline_models.py implements baseline density-estimation algorithms such as kernel density estimation (KDE) and the Bayesian histogram.
  • /processed_series: Stores the LLMs' density-estimation (DE) trajectories.

  • /inPCA: Contains Jupyter notebooks for analyzing LLMs' DE trajectories with InPCA:

    • inPCA_multi_traj.ipynb simultaneously embeds multiple DE trajectories within the same inPCA visualization.
    • inPCA_multi_traj_kernel_nD_fit.ipynb simultaneously embeds multiple DE trajectories, as well as their bespoke KDE trajectories.
    • inPCA_multi_traj_kernel_nD_fit_meta_embed.ipynb performs meta-inPCA embeddings of multiple trajectories and their bespoke KDE imitations.
  • /figures: Stores all figures generated during the analysis.
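The README does not specify the exact string format produced by the /data functions. As a hedged illustration, suppose samples are rescaled to [0, 1), truncated to a fixed number of decimal digits, and joined with commas. Under that assumption the conversion might look like the following; `serialize_samples`, `n_digits`, and the comma separator are hypothetical choices, not the repository's API.

```python
import numpy as np

def serialize_samples(samples, n_digits=2, separator=","):
    """Hypothetical encoder: rescale samples to [0, 1) and write each as n_digits digits."""
    x = np.asarray(samples, dtype=float)
    lo, hi = x.min(), x.max()
    span = hi - lo if hi > lo else 1.0  # avoid division by zero for constant input
    # Map [lo, hi] onto integer codes 0 .. 10**n_digits - 1 (the maximum is clamped)
    codes = np.floor((x - lo) / span * 10 ** n_digits)
    codes = np.minimum(codes, 10 ** n_digits - 1).astype(int)
    # Zero-pad so every sample occupies exactly n_digits characters
    return separator.join(f"{int(c):0{n_digits}d}" for c in codes)
```

For example, `serialize_samples([0.0, 0.5, 1.0])` gives `"00,50,99"`. Fixed-width codes mean every sample is written with the same number of digits, which makes it straightforward to read a model's digit-by-digit predictions back as probabilities over value bins.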
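Hierarchy-PDF, as we understand it, turns an LLM's digit-by-digit next-token probabilities into a probability distribution over value bins. The README gives no implementation details, so the following is only a brute-force sketch of the underlying chain-rule idea. It assumes fixed-width digit codes on [0, 1) and ignores any adaptive refinement that `ICL.py` may perform. The functions `digit_tree_bin_probs` and `bin_probs_to_density`, and the `next_digit_probs` callable, are hypothetical names.

```python
import numpy as np

def digit_tree_bin_probs(next_digit_probs, n_digits=2, prefix=()):
    """Probabilities of all 10**n_digits bins on [0, 1), built digit by digit.

    next_digit_probs(prefix) is a hypothetical callable returning ten non-negative
    scores for the next digit given the tuple of digits emitted so far (for an LLM,
    the next-token probabilities of the tokens "0" .. "9").
    """
    probs = np.asarray(next_digit_probs(prefix), dtype=float)
    probs = probs / probs.sum()  # renormalise over the ten digit tokens only
    if len(prefix) == n_digits - 1:
        return probs  # leaf level: one probability per finest bin
    # Chain rule: P(d, rest | prefix) = P(d | prefix) * P(rest | prefix + (d,))
    return np.concatenate([
        probs[d] * digit_tree_bin_probs(next_digit_probs, n_digits, prefix + (d,))
        for d in range(10)
    ])

def bin_probs_to_density(bin_probs):
    """Convert bin probabilities on [0, 1) into a piecewise-constant density."""
    return bin_probs * len(bin_probs)  # each bin has width 1 / len(bin_probs)
```

Expanding the full tree costs (10**n_digits - 1) / 9 calls to `next_digit_probs`. A practical implementation would likely refine only the bins that carry significant probability mass.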
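The InPCA notebooks embed whole density-estimation trajectories as point clouds. The following minimal NumPy sketch shows the embedding step under stated assumptions. Each PDF is discretised on a shared grid as a row of probabilities. The divergence is the Bhattacharyya divergence -log Σ √(p q), which intensive PCA is commonly built on, with any constant prefactor omitted; the notebooks may use a different divergence or normalisation. The function name `inpca_embed` is hypothetical. Because the double-centred divergence matrix is generally not positive semi-definite, some eigenvalues can be negative. The sketch ranks components by |λ| and returns the signed eigenvalues.

```python
import numpy as np

def inpca_embed(P, n_components=3):
    """InPCA-style embedding of discretised PDFs (rows of P, each summing to 1)."""
    P = np.asarray(P, dtype=float)
    # Bhattacharyya coefficients between every pair of distributions
    bc = np.sqrt(P) @ np.sqrt(P).T
    # Bhattacharyya divergence; clipping avoids log(0) for disjoint supports
    D = -np.log(np.clip(bc, 1e-300, None))
    n = D.shape[0]
    # Double-centre the divergence matrix, as in classical multidimensional scaling
    J = np.eye(n) - np.ones((n, n)) / n
    W = -0.5 * J @ D @ J
    # W is symmetric, so eigh applies, but its eigenvalues may be negative
    eigvals, eigvecs = np.linalg.eigh(W)
    order = np.argsort(-np.abs(eigvals))[:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale each direction by sqrt(|lambda|); the sign is kept in eigvals
    coords = eigvecs * np.sqrt(np.abs(eigvals))
    return coords, eigvals
```

With `n_components` equal to the number of distributions, the signed squared distances Σ_k sign(λ_k)(x_ik - x_jk)² reproduce the divergence matrix up to round-off. Truncating to the leading components gives the low-dimensional picture used for plotting.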

Owner

  • Name: Toni Liu
  • Login: AntonioLiu97
  • Kind: user
  • Company: Cornell University

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: ICL_inPCA
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Toni J.B.
    family-names: Liu
    email: jl3499@cornell.edu
    affiliation: 'Cornell University'
    orcid: 'https://orcid.org/0009-0001-3142-5402'
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2410.05218'
    description: ArXiv URL
repository-code: 'https://github.com/AntonioLiu97/LLMICL_inPCA'
url: 'https://github.com/AntonioLiu97/LLMICL_inPCA'
abstract: >-
  This is the codebase for the paper "Density estimation
  with LLMs: a geometric investigation of in-context
  learning trajectories"
keywords:
  - density estimation
  - DE
  - KDE
  - LLM
  - in-context learning
  - kernel methods
  - PCA
  - InPCA
  - multidimensional scaling
license: MIT
commit: Initial release
version: 1.0.0
date-released: '2025-02-14'

GitHub Events

Total
  • Watch event: 1
  • Push event: 5
  • Public event: 1
Last Year
  • Watch event: 1
  • Push event: 5
  • Public event: 1