https://github.com/alignmentresearch/deception-evasion-honesty
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (3.2%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: AlignmentResearch
- Language: Python
- Default Branch: main
- Size: 6.38 MB
Statistics
- Stars: 2
- Watchers: 3
- Forks: 3
- Open Issues: 2
- Releases: 0
Created about 1 year ago · Last pushed 11 months ago
Metadata Files
Readme
README.md
This repository hosts the code for the paper "Preference Learning with Lie Detectors can Induce Honesty or Evasion".
run.sh gives an example setup and a basic experimental run. To change the run configuration, set flags such as DO_DPO to true or false. The codebase has been tested on the pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel Docker image.
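As an illustration of how a true/false flag such as DO_DPO could be consumed from Python, here is a minimal sketch. It assumes the flag is exposed as an environment variable holding the string true or false; the README does not say how run.sh actually passes its flags along, and the helper name read_bool_flag is hypothetical, not part of this repository.

```python
import os


def read_bool_flag(name, default=False, environ=None):
    """Read a true/false flag from an environment mapping.

    Hypothetical helper: assumes flags such as DO_DPO are exported as
    environment variables, which the repository README does not confirm.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        # Flag not set at all: fall back to the caller's default.
        return default
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    # Reject anything else so a typo does not silently disable a stage.
    raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")
```

Rejecting unrecognized values is a deliberate choice in this sketch: a misspelled flag value fails loudly instead of quietly skipping a training stage.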
Owner
- Name: FAR AI
- Login: AlignmentResearch
- Kind: organization
- Email: hello@far.ai
- Website: https://far.ai
- Repositories: 16
- Profile: https://github.com/AlignmentResearch
FAR AI is an alignment research non-profit working to ensure AI systems are trustworthy and beneficial to society.
GitHub Events
Total
- Issues event: 2
- Member event: 1
- Issue comment event: 5
- Push event: 6
- Public event: 1
- Pull request event: 8
- Fork event: 1
- Create event: 1
Last Year
- Issues event: 2
- Member event: 1
- Issue comment event: 5
- Push event: 6
- Public event: 1
- Pull request event: 8
- Fork event: 1
- Create event: 1
Dependencies
Dockerfile
docker
- ${BASE_IMAGE} latest build
pyproject.toml
pypi
- bitsandbytes *
- evaluate *
- matplotlib *
- openai *
- pandas *
- peft *
- pre-commit *
- scikit-learn *
- seaborn *
- torch *
- torchvision *
- transformers ==4.46
- trl ==0.12
- typeapi ==2.1.2
- wandb *
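Three of the dependencies above are pinned to exact versions (transformers ==4.46, trl ==0.12, typeapi ==2.1.2). The following is a minimal sketch of how an environment could be checked against those pins using only the standard library. The names PINNED, matches_pin, and check_pins are hypothetical, and the comparison handles plain numeric release versions only, so a pre-release or local version string is reported as a mismatch.

```python
from importlib import metadata

# Exact pins copied from the pyproject.toml dependency list above.
PINNED = {"transformers": "4.46", "trl": "0.12", "typeapi": "2.1.2"}


def _release(version):
    """Turn '4.46.0' into (4, 46), dropping trailing zero components."""
    parts = [int(p) for p in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def matches_pin(installed, pinned):
    """Return True if an installed version satisfies an '==' pin.

    Simplification: only numeric release segments are understood;
    anything else (for example '4.46.0.dev0') counts as a mismatch.
    """
    try:
        return _release(installed) == _release(pinned)
    except ValueError:
        return False


def check_pins(pins=PINNED, get_version=metadata.version):
    """Map each pinned package to (installed_version_or_None, ok)."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and matches_pin(installed, pinned)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} pinned={PINNED[name]} {status}")
```

Trailing zeros are ignored so that an installed 4.46.0 satisfies the pin ==4.46, while 4.46.1 does not.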