https://github.com/camtice/sandbagdetect

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: camtice
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 31.4 MB
Statistics
  • Stars: 6
  • Watchers: 3
  • Forks: 1
  • Open Issues: 1
  • Releases: 1
Created about 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

README

The requirements.txt file lists all Python libraries required to run the various models. Requirements are specified per model because some packages are incompatible with each other.

Run

Specify the settings in the 'config.py' file, then run the main script in the home directory. No other steps are needed.

Settings

  • token_name (str): The authentication token for Hugging Face. (Sometimes required)
  • model_id (str): The model id of the model you want to test.
  • model_name (str): The name of the model you want to test. (required for system prompt)
  • datafilename (str): The name of the benchmark you want to test.
  • result_fname (str): The name of the file where the results should be saved.
  • percentages (list): The percentages of weights that should be tested.
  • noise_means (list): The means of the noise that should be tested.
  • noise_stds (list): The standard deviations of the noise that should be tested.
  • n_trials (int): The number of trials that should be run.
  • seeds (list or None): The seeds that should be used for the trials. If None, random seeds are used.
  • n_prompts (int or None): The number of prompts between 0% and 100% sandbagging.
  • sample_method (str): The sampling method that should be used. Either "greedy" or "multinomial".
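
As a rough illustration of how the settings above fit together, the sketch below collects them in a dictionary and enumerates the runs they would describe. All values are placeholders, and the `check_settings` and `iter_runs` helpers are hypothetical. The README does not say whether 'config.py' uses a dictionary or module-level variables, whether percentages are fractions or values from 0 to 100, or whether every combination of percentage, mean, and standard deviation is actually run; the full grid here is an assumption.

```python
from itertools import product

# Illustrative values only; none of them come from the repository.
settings = {
    "token_name": None,                          # Hugging Face token, if the model needs one
    "model_id": "example-org/example-model",     # hypothetical model id
    "model_name": "example-model",               # hypothetical; used for the system prompt
    "datafilename": "example_benchmark.jsonl",   # hypothetical benchmark file
    "result_fname": "example_results.csv",       # hypothetical output file
    "percentages": [0.0, 0.5, 1.0],              # assumed here to be fractions of weights
    "noise_means": [0.0],
    "noise_stds": [0.0, 0.001, 0.01],
    "n_trials": 5,
    "seeds": None,                               # None means random seeds are used
    "n_prompts": None,
    "sample_method": "greedy",                   # either "greedy" or "multinomial"
}


def check_settings(s):
    """Basic sanity checks; only the sample_method values come from the README."""
    if s["sample_method"] not in ("greedy", "multinomial"):
        raise ValueError("sample_method must be 'greedy' or 'multinomial'")
    if s["n_trials"] < 1:
        raise ValueError("n_trials must be at least 1")


def iter_runs(s):
    """Yield (percentage, noise_mean, noise_std, trial) for an assumed full grid."""
    check_settings(s)
    for pct, mean, std in product(s["percentages"], s["noise_means"], s["noise_stds"]):
        for trial in range(s["n_trials"]):
            yield pct, mean, std, trial
```

Under these assumptions, `len(list(iter_runs(settings)))` gives the number of evaluations a configuration implies, which is a quick way to estimate runtime before starting.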

Advanced Settings for password locked models

These settings apply only to password-locked models. In most cases, the standard settings are all you need.

  • PEFT (bool): True
  • adapter_id (str): None
  • bias (bool): False; controls forcing ABCD answers.
  • password (str): None or any string that is the password for the model.

Other Settings

  • exclude_layers (list): The layers that should not receive any noise.
  • quiet (bool): Set to True to suppress the print outputs.
  • verbose (int): The level of detail in the print outputs, from 1 to 4.
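
To make the roles of percentages, noise_means, noise_stds, seeds, and exclude_layers more concrete, here is a minimal pure-Python sketch of adding Gaussian noise to a fraction of the weights in each layer. It is not the repository's implementation: the `add_noise` function is hypothetical, weights are modelled as plain lists rather than model tensors, `percentage` is assumed to be a fraction between 0 and 1, and `exclude_layers` is assumed to hold layer names.

```python
import random


def add_noise(weights, percentage, noise_mean, noise_std, seed=None, exclude_layers=()):
    """Return a noised copy of `weights` (dict: layer name -> list of floats).

    A random `percentage` (fraction in [0, 1]) of the entries in every
    non-excluded layer receives additive Gaussian noise.
    """
    rng = random.Random(seed)  # a fixed seed makes the trial reproducible
    noised = {}
    for name, values in weights.items():
        new_values = list(values)  # copy so the input is left untouched
        if name not in exclude_layers:
            k = round(percentage * len(new_values))
            # Pick k distinct positions and perturb only those.
            for i in rng.sample(range(len(new_values)), k):
                new_values[i] += rng.gauss(noise_mean, noise_std)
        noised[name] = new_values
    return noised


# Toy example with two hypothetical layers; "head" is excluded from noise.
toy_weights = {"body": [0.0] * 10, "head": [0.0] * 10}
noised = add_noise(toy_weights, percentage=0.5, noise_mean=0.0, noise_std=0.01,
                   seed=0, exclude_layers=["head"])
```

In this sketch the excluded layer is returned unchanged, and repeating the call with the same seed reproduces the same perturbation.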

Contact

For questions, reach out to Philipp Alexander Kreer (philipp.a.kreer@outlook.de), Cameron Tice (cwtice72@gmail.com), or Prithvi Shahani (victeldo@gmail.com).

Owner

  • Name: Cameron Tice
  • Login: camtice
  • Kind: user

Biomedical Sciences, Psychology, and Statistics student at Auburn University

GitHub Events

Total
  • Watch event: 1
  • Delete event: 18
  • Push event: 119
  • Pull request event: 4
  • Fork event: 1
  • Create event: 10
Last Year
  • Watch event: 1
  • Delete event: 18
  • Push event: 119
  • Pull request event: 4
  • Fork event: 1
  • Create event: 10