memprompt
A method to fix GPT-3 after deployment with user feedback, without re-training.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.2%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 328
- Watchers: 10
- Forks: 13
- Open Issues: 2
- Releases: 0
Topics
Metadata Files
README.md
Memory-assisted prompt editing to improve GPT-3 after deployment
Code and data for our EMNLP 2022 work, memprompt.

Check out https://www.memprompt.com/ for more details!
Architecture

Running a job
Please note that you need to `export OPENAI_API_KEY="YOUR_OPENAI_KEY"` before running the scripts.
The required libraries are listed in the requirements.txt file.
Notebook
- This notebook implements a version of memprompt (called grow-prompt in the paper) for simulating a Python terminal. The repo has a lot more going on, including memory-based querying, but this notebook captures one important aspect of the work (stateful interactions with few-shot models).
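The grow-prompt idea can be sketched in a few lines: the prompt is a running transcript, and each new interaction (input and model output) is appended to it, so state survives across calls to a stateless few-shot model. The sketch below is illustrative only; `call_model` is a stand-in for a real GPT-3 call, and none of these names are the repo's API.

```python
def call_model(prompt: str) -> str:
    """Stand-in for a GPT-3 completion call (illustrative only)."""
    # A real implementation would send `prompt` to the completion API.
    return "42" if "40 + 2" in prompt else "?"

def grow_prompt_session(turns, prompt=""):
    """Simulate a stateful session by growing the prompt with each turn."""
    outputs = []
    for user_input in turns:
        prompt += f">>> {user_input}\n"
        answer = call_model(prompt)
        prompt += answer + "\n"  # the model's output becomes context, too
        outputs.append(answer)
    return outputs, prompt

outputs, transcript = grow_prompt_session(["x = 40 + 2", "print(x)"])
```

Because earlier turns stay in the prompt, the model can refer back to `x` on the second turn even though the underlying model holds no state.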
Streaming with memory
- To run a streaming job using memory, run the following command:
```sh
python src/streaming/stream_with_memory.py --task_file ${FILE} \
    --job_id ${JOB_ID} \
    --getting_clarification_probability ${CLARIFICATION_PROB} \
    --memory_type ${MEMORY_TYPE} \
    --checkpoint_path ${CHECKPOINT_PATH} \
    --prompt_path ${PROMPT_PATH}
```
Where:
- `FILE` is the path to the task file
- `JOB_ID` is the job ID
- `CLARIFICATION_PROB` is the probability of getting clarification from the user
- `MEMORY_TYPE` is the type of memory to use. Can be `closest` or `semantic`
- `CHECKPOINT_PATH` is the path to the trained memory checkpoint file, only needed if `MEMORY_TYPE` is `semantic`
- `PROMPT_PATH` is the path to the prompt file. Currently, we have two prompts located in `data/prompts/`: one for the lexical QA tasks (`data/prompts/prompt_lexical_qa.txt`) and the other for the GPT-3 synthetic word tasks (`data/prompts/prompt_gpt3_synthetic_tasks.txt`)
For example, to run a job with 10 samples on the linguistic variation prompts with a clarification probability of 0.5 and closest memory, run:
```sh
python src/streaming/stream_with_memory.py --task_file tasks/linguistic/tasks_10.jsonl \
    --job_id linguistic \
    --getting_clarification_probability 0.5 \
    --memory_type closest
```
After the job finishes, it'll create a log file in `logs/` of the form `logs/${task_type}_num_tasks=${num_samples}_clarification_prob=${clarification_prob}_job_id=${job_id}_${timestamp}.jsonl`. For the sample script above, the file should be `closest_memory_task_type=linguistic_num_tasks=10_clarification_prob=0.5_ts=TS.jsonl`.

To run a job with the trained retriever, please download the checkpoint (link), and set `CHECKPOINT_PATH` to the path of the checkpoint. Please note that the checkpoint may be deleted by the hosting service after a while. We are sorry for the inconvenience, and promise to make the checkpoint available upon acceptance.

The `tasks` folder provides several different task files of various sizes and types for you to try out.
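The `closest` memory type retrieves the stored clarification whose key is most similar to the incoming question. A minimal stdlib-only sketch of such a lookup, using `difflib` similarity in place of the repo's actual Levenshtein/semantic retrievers (the class and method names here are illustrative, not the repo's API):

```python
import difflib

class ClosestMemory:
    """Key-value memory retrieved by surface similarity (illustrative sketch)."""

    def __init__(self, threshold=0.6):
        self.store = {}          # question -> clarification
        self.threshold = threshold

    def write(self, question, clarification):
        self.store[question] = clarification

    def lookup(self, query):
        """Return (clarification, score) for the closest key, or (None, score)."""
        best_key, best_score = None, 0.0
        for key in self.store:
            score = difflib.SequenceMatcher(None, query, key).ratio()
            if score > best_score:
                best_key, best_score = key, score
        if best_key is not None and best_score >= self.threshold:
            return self.store[best_key], best_score
        return None, best_score

mem = ClosestMemory()
mem.write("what has a like ring to it",
          "clarification: when I ask for a word that has a similar ring to it, "
          "I want a homonym.")
hit, score = mem.lookup("what has a like ring to it ?")
```

The threshold keeps unrelated questions from pulling in spurious clarifications; the repo's `semantic` mode replaces the surface similarity with a trained retriever.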
Stream with growing prompt
- The usage is similar to the streaming with memory, but the prompt grows as the job proceeds.
sh
```sh
python src/streaming/stream_with_growing_prompt.py --task_file memprompt/tasks/linguistic/tasks_10.jsonl \
    --job_id linguistic \
    --getting_clarification_probability 0.5
```
- The default prompt is `prompt.txt` in the current directory.
Run logs
- The log files created in `logs/` are jsonl, where each line is a json object which contains various details about the job.
```json
{
  "question": "what has a < sonny > like ring to it ?",
  "expected_answer": "the homonym for sonny is sunny",
  "generated_answer": " the homonym for sonny is stunny ",
  "response": {
    "id": "",
    "object": "text_completion",
    "created": 1637049359,
    "model": "davinci:2020-05-03",
    "choices": [
      {
        "text": " the homonym for sonny is stun ",
        "index": 0,
        "logprobs": null,
        "finish_reason": "stop"
      }
    ]
  },
  "is_correct": true,
  "correct_so_far": 57.14,
  "idx": 6,
  "memory_metadata": {
    "memory_used": true,
    "memory_size": 1,
    "content": "what has a like ring to it: clarification: when I ask for a word that has a similar ring to it , I want a homonym.",
    "match_score": 0,
    "query_for_memory_lookup": "what has a like ring to it",
    "memory_key": "what has a like ring to it",
    "memory_value": "clarification: when I ask for a word that has a similar ring to it , I want a homonym."
  },
  "elapsed_time": 707385024
}
```
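Each line of such a log can be parsed with the standard `json` module. For instance, a small helper (illustrative, not part of the repo) to recover the per-example accuracy trajectory from the `idx` and `correct_so_far` fields shown above:

```python
import json
import os
import tempfile

def progressive_scores(log_path):
    """Return (idx, correct_so_far) pairs from a memprompt-style jsonl log."""
    scores = []
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            scores.append((record["idx"], record["correct_so_far"]))
    return scores

# Tiny demo log with just the two fields the helper reads:
records = [{"idx": 0, "correct_so_far": 100.0, "is_correct": True},
           {"idx": 1, "correct_so_far": 50.0, "is_correct": False}]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write("\n".join(json.dumps(r) for r in records))
scores = progressive_scores(f.name)
os.unlink(f.name)
```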
Creating new tasks
- New tasks can be created by running the following command:
```sh
python src/utils/task_handler.py --dump --n n --template_class task --raw_file raw_files
```
Where:
- `n` is the number of tasks to create
- `task` is the task type (hin, pun, or linguistic)
- `raw_files` is a comma-separated list of raw files to use. These files contain the actual examples that are included in the tasks.
The generated task files are stored in `tasks/<task_type>/tasks_<n>.jsonl`.
- For example, to create a file with 500 tasks of type `hin` (Hindi prompts):

```sh
python src/utils/task_handler.py 500 hin
```

This creates a file `tasks/hin/tasks_500.jsonl` with 500 tasks.
- To create a file with 300 tasks of type `synthetic_gpt3`:

```sh
python src/utils/task_handler.py --dump --n 300 --template_class synthetic_gpt3 --task_files data/gpt3-word-tasks/raw.jsonl
```

This creates a file `tasks/synthetic_gpt3/tasks_300.jsonl` with 300 tasks.
Processing logs
To list the progressive scores in a matrix form that can be loaded into Google Sheets to generate charts:

```sh
python src/utils/log_utils.py --path memprompt/logs/ --pattern "task_type="
```
Where:
- `path` is the path to a single jsonl log file or a directory containing multiple files
- `pattern` matches the files within a path directory to process
- `out_path` is the output path to store the results
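As a sketch of what such a score matrix might look like (illustrative only; the actual `log_utils.py` output format may differ), one column per run of `correct_so_far` values, padded where runs have different lengths:

```python
import csv
import io

def to_matrix(runs):
    """runs: {job_id: [correct_so_far, ...]} -> CSV text, one column per run."""
    job_ids = sorted(runs)
    n_rows = max(len(v) for v in runs.values())
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["idx"] + job_ids)
    for i in range(n_rows):
        writer.writerow(
            [i] + [runs[j][i] if i < len(runs[j]) else "" for j in job_ids])
    return buf.getvalue()

table = to_matrix({"closest": [100.0, 50.0, 66.7], "no_memory": [100.0, 50.0]})
```

Pasting the CSV into a spreadsheet gives one line per run when charted against `idx`.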
Task files
The directory structure is `tasks/<task_name>/tasks_<num_tasks>.jsonl`.
The current task files are:
```
tasks/
├── hin
│   ├── tasks_1000.jsonl
│   └── tasks_100.jsonl
├── linguistic
│   ├── tasks_1000.jsonl
│   ├── tasks_100.jsonl
│   └── tasks_10.jsonl
└── pun
    ├── tasks_1000.jsonl
    └── tasks_100.jsonl
```
Owner
- Name: Aman Madaan
- Login: madaan
- Kind: user
- Location: Pittsburgh, PA
- Website: https://madaan.github.io
- Twitter: aman_madaan
- Repositories: 11
- Profile: https://github.com/madaan
PhD student at CMU
Citation (CITATION.cff)
```bibtex
@article{madaan2022memory,
  title={Memory-assisted prompt editing to improve GPT-3 after deployment},
  author={Madaan, Aman and Tandon, Niket and Clark, Peter and Yang, Yiming},
  journal={arXiv preprint arXiv:2201.06009},
  year={2022}
}
```
GitHub Events
Total
- Watch event: 7
Last Year
- Watch event: 7
Dependencies
- nltk ==3.6.2
- openai ==0.11.0
- pandas ==1.3.0
- python_Levenshtein ==0.12.2
- scikit_learn ==0.24.2
- sentence_transformers ==2.1.0
- spacy ==3.1.1
- torch ==1.9.0