https://github.com/alignmentresearch/deception-evasion-honesty

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (3.2%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: AlignmentResearch
Language: Python
Default Branch: main
Size: 6.38 MB

Statistics

Stars: 2
Watchers: 3
Forks: 3
Open Issues: 2
Releases: 0

Created about 1 year ago · Last pushed 11 months ago

Metadata Files

Readme

README.md

This repository hosts the code for the paper "Preference Learning with Lie Detectors can Induce Honesty or Evasion".

run.sh gives an example of the setup and a basic experimental run. To change the run configuration, set flags such as DO_DPO to true or false. The codebase has been tested on the pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel Docker image.
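
As an illustration of how a true/false flag such as DO_DPO might be consumed on the Python side, here is a minimal sketch. It assumes the flag reaches the process as an environment variable, which the README does not state; run.sh may pass it differently. The helper name env_flag is hypothetical and not part of the repository.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a true/false flag such as DO_DPO from the environment.

    Hypothetical helper: assumes the flag is exported as an environment
    variable, which may not match how run.sh actually passes it.
    """
    raw = os.environ.get(name)
    if raw is None:
        # Flag not set at all: fall back to the caller's default.
        return default
    value = raw.strip().lower()
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no"}:
        return False
    # Fail loudly on typos instead of silently disabling a stage.
    raise ValueError(f"{name} must be true or false, got {raw!r}")
```

With this sketch, running a script as `DO_DPO=true python script.py` would make `env_flag("DO_DPO")` return True. Rejecting unrecognized values is a deliberate choice, so a misspelled setting does not quietly skip a training stage.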

Owner

Name: FAR AI
Login: AlignmentResearch
Kind: organization
Email: hello@far.ai

Website: https://far.ai
Repositories: 16
Profile: https://github.com/AlignmentResearch

FAR AI is an alignment research non-profit working to ensure AI systems are trustworthy and beneficial to society.

GitHub Events

Total

Issues event: 2
Member event: 1
Issue comment event: 5
Push event: 6
Public event: 1
Pull request event: 8
Fork event: 1
Create event: 1

Last Year

Issues event: 2
Member event: 1
Issue comment event: 5
Push event: 6
Public event: 1
Pull request event: 8
Fork event: 1
Create event: 1

Dependencies

Dockerfile docker

${BASE_IMAGE} latest build

pyproject.toml pypi

bitsandbytes *
evaluate *
matplotlib *
openai *
pandas *
peft *
pre-commit *
scikit-learn *
seaborn *
torch *
torchvision *
transformers ==4.46
trl ==0.12
typeapi ==2.1.2
wandb *
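
Three of the dependencies above are pinned to exact versions (transformers, trl, typeapi). The sketch below checks an environment against those pins using only the standard library. The helper names matches_pin and check_pins are hypothetical, and the comparison handles plain numeric versions only; anything with a suffix such as .dev0 is reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the dependency list above.
PINS = {"transformers": "4.46", "trl": "0.12", "typeapi": "2.1.2"}


def _release(v: str) -> tuple[int, ...]:
    """Parse 'X.Y.Z' into integers, dropping trailing zeros (4.46.0 -> (4, 46))."""
    parts = [int(p) for p in v.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def matches_pin(installed: str, pin: str) -> bool:
    """Return True if an installed version satisfies an '==pin' requirement.

    Simplified: trailing zeros are ignored, so 4.46.0 matches 4.46 while
    4.46.1 does not. Non-numeric versions are treated as mismatches.
    """
    try:
        return _release(installed) == _release(pin)
    except ValueError:
        return False


def check_pins(pins: dict[str, str] = PINS) -> dict[str, str]:
    """Map each pinned package to 'ok', 'not installed', or a mismatch note."""
    report = {}
    for name, pin in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        if matches_pin(installed, pin):
            report[name] = "ok"
        else:
            report[name] = f"mismatch: found {installed}, expected {pin}"
    return report
```

Calling `check_pins()` inside the project environment returns one status string per pinned package. For full PEP 440 handling, a dedicated version-parsing library would be a better fit than this simplified comparison.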
