https://github.com/alignmentresearch/deception-evasion-honesty

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (3.2%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: AlignmentResearch
  • Language: Python
  • Default Branch: main
  • Size: 6.38 MB
Statistics
  • Stars: 2
  • Watchers: 3
  • Forks: 3
  • Open Issues: 2
  • Releases: 0
Created about 1 year ago · Last pushed 11 months ago
Metadata Files
Readme

README.md

This repository hosts the code for the paper "Preference Learning with Lie Detectors can Induce Honesty or Evasion".

run.sh gives an example setup and a basic experimental run. Run configurations can be adjusted by setting flags such as DO_DPO to true or false. The codebase has been tested on the pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel Docker image.

Owner

  • Name: FAR AI
  • Login: AlignmentResearch
  • Kind: organization
  • Email: hello@far.ai

FAR AI is an alignment research non-profit working to ensure AI systems are trustworthy and beneficial to society.

GitHub Events

Total
  • Issues event: 2
  • Member event: 1
  • Issue comment event: 5
  • Push event: 6
  • Public event: 1
  • Pull request event: 8
  • Fork event: 1
  • Create event: 1
Last Year
  • Issues event: 2
  • Member event: 1
  • Issue comment event: 5
  • Push event: 6
  • Public event: 1
  • Pull request event: 8
  • Fork event: 1
  • Create event: 1

Dependencies

Dockerfile docker
  • ${BASE_IMAGE} latest build
pyproject.toml pypi
  • bitsandbytes *
  • evaluate *
  • matplotlib *
  • openai *
  • pandas *
  • peft *
  • pre-commit *
  • scikit-learn *
  • seaborn *
  • torch *
  • torchvision *
  • transformers ==4.46
  • trl ==0.12
  • typeapi ==2.1.2
  • wandb *
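
Only three of the listed packages carry exact pins (transformers, trl, typeapi), so a quick environment check can catch version drift before a run. The following is a minimal sketch, not part of the repository: the helper names release_tuple, satisfies_pin, and check_pins are hypothetical, and the comparison is a simplification that only handles purely numeric versions (pre-release or local suffixes are reported as mismatches).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the pyproject.toml listing; other packages are unpinned (*).
PINS = {"transformers": "4.46", "trl": "0.12", "typeapi": "2.1.2"}


def release_tuple(v):
    """Parse a purely numeric version such as '4.46.0' into (4, 46).

    Trailing zero segments are dropped so that '4.46' and '4.46.0' compare
    equal. Returns None for anything with a non-numeric suffix.
    """
    if not re.fullmatch(r"\d+(\.\d+)*", v):
        return None
    parts = [int(p) for p in v.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies_pin(installed, pin):
    """Return True if the installed version equals the pinned version."""
    rel = release_tuple(installed)
    return rel is not None and rel == release_tuple(pin)


def check_pins(pins=PINS):
    """Map each pinned package name to 'ok', 'not installed', or a mismatch note."""
    report = {}
    for name, pin in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        if satisfies_pin(installed, pin):
            report[name] = "ok"
        else:
            report[name] = f"mismatch (installed {installed}, pinned {pin})"
    return report


if __name__ == "__main__":
    for name, status in check_pins().items():
        print(f"{name}: {status}")
```

Running the script inside the environment prints one status line per pinned package. Note that an exact pin such as ==4.46 does not match a patch release like 4.46.3, which the sketch reports as a mismatch.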