Science Score: 18.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.6%) to scientific vocabulary
Repository
do no harm
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
See my modifications step-by-step in citations.md
Hippocratic AI Coding Assignment
Welcome to the Hippocratic AI coding assignment
Instructions
The attached code is a simple multiple-choice question taker. We have included sample questions. Your goal is to make this code "better":
- Do not modify testbench.py.
- You may do anything you like inside hip_agent.py (or add more files) as long as the interface to testbench.py remains the same.
- You must use GPT-3.5 as the LLM (not GPT-4, PaLM 2, a fine-tuned BERT, etc.).
- We included an OpenAI API key. Please don't abuse it.
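For orientation, here is a minimal sketch of an agent that sits behind a fixed testbench-facing interface. The class name `HIPAgent`, the method `get_response`, and its signature are assumptions for illustration only, since testbench.py's actual interface is not reproduced here:

```python
class HIPAgent:
    """Hypothetical agent shape; the real interface expected by
    testbench.py may differ."""

    def __init__(self, model="gpt-3.5-turbo"):
        # Assumption: the "must use GPT 3.5" rule maps to this model name.
        self.model = model

    def get_response(self, question, answer_choices):
        """Return the index of the chosen answer.

        A real implementation would call the OpenAI chat completions
        API and map the reply to a choice index; returning 0 keeps this
        sketch runnable without an API key.
        """
        return 0
```

Keeping the agent behind one small method like this makes it easy to swap in prompting strategies later without touching the test harness.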
Rules
- This assignment is open-ended. Part of the test is seeing what you decide is important.
- You may use any resources you like, with the following restrictions:
  - They must be resources that would be available to you if you worked here (so no other humans, no closed AIs, no unlicensed code, etc.).
  - Allowed resources include, but are not limited to, Stack Overflow, random blogs, ChatGPT, et al.
- You must cite your sources.
- If you use an AI coding tool, then in addition to citing the AI-generated lines of code, please also include a transcript of the prompts and completions from ChatGPT that you used.
- The recommended time to spend on this assignment is 4 hours, but there are no restrictions.
- DO NOT PUSH THE API KEY TO GITHUB. OpenAI will automatically delete it.
- You may ask questions.
What does "better" mean?
You decide what better means, but here are some ideas to help get the brain-juices flowing!
- Improve the score using various well-studied methods
- Shots
- Chain of thought
- Introduce documents and retrieval augmented generation (we included one open source book, but you are welcome to add whatever you like)
- AutoGPT
- Improve the quality of the code
- Add a front end interface
- Add testbenches
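The "shots" and chain-of-thought ideas above can be sketched as a prompt builder whose output would be sent to gpt-3.5-turbo. The function name, few-shot tuple layout, and output format are illustrative assumptions, not code from the repository:

```python
def build_cot_prompt(question, choices, examples=None):
    """Assemble a chain-of-thought, few-shot prompt for a
    multiple-choice question.

    `examples` is an optional list of (question, choices, reasoning,
    answer_index) tuples used as worked "shots" before the real
    question. All names and formats here are illustrative.
    """
    lines = []
    # Few-shot section: show the model worked examples with reasoning.
    for ex_q, ex_choices, ex_reasoning, ex_idx in (examples or []):
        lines.append(f"Question: {ex_q}")
        for i, c in enumerate(ex_choices):
            lines.append(f"{i}. {c}")
        lines.append(f"Reasoning: {ex_reasoning}")
        lines.append(f"Answer: {ex_idx}")
        lines.append("")
    # The actual question, with an explicit chain-of-thought cue.
    lines.append(f"Question: {question}")
    for i, c in enumerate(choices):
        lines.append(f"{i}. {c}")
    lines.append("Think step by step, then give only the answer index.")
    return "\n".join(lines)
```

Asking for "only the answer index" narrows the reply format, which also makes the downstream answer-matching step less fragile.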
How will I be evaluated?
Good question. We want to know the following:
- Can you code?
- Can you understand and deconstruct a problem?
- Can you operate in an open-ended environment?
- Can you be creative?
- Do you understand what it means to deliver value versus check a box?
- Can you really code?
- Can you surprise us?
Owner
- Login: abrarfrahman
- Kind: user
- Location: SF Bay Area | Madison, WI
- Company: Epic
- Website: https://scholar.google.com/citations?user=BZR-fHoAAAAJ&hl=en
- Repositories: 15
- Profile: https://github.com/abrarfrahman
- Bio: gob ears 🐻 ...working on something new 🦈
Citation (citations.md)
# Changelog

## Initial Commit
**Testbench: 9/20**

## v01: Implement RAG for textbook.txt
**Testbench: 14/20**
- Outline of the RAG implementation: https://towardsdatascience.com/retrieval-augmented-generation-rag-from-theory-to-langchain-implementation-4e9bd5f6a4f2
- Resolved an error with Pydantic: https://stackoverflow.com/questions/76313592/import-langchain-error-typeerror-issubclass-arg-1-must-be-a-class
- I had an old version of LangChain, and they've moved a bunch of packages around: https://python.langchain.com/docs/expression_language/how_to/passthrough
- Chunks were too large: https://github.com/langchain-ai/langchain/discussions/3786
- Switched to a different package for the OpenAI client: https://stackoverflow.com/questions/77505030/openai-api-error-you-tried-to-access-openai-chatcompletion-but-this-is-no-lon

## v02: More complex matching
**Testbench: 16/20**
- The closest multiple-choice answer to the generated response would be "could help make informed choices about medical treatment".
- This fails because OpenAI added extra text to the response.
- It also fails due to capitalization differences.

## v03: Add error handling and logging
**Testbench: 16/20**
- Copied hip_agent.py into GPT and typed "Can you add logging and error handling?"
- The bot added try/except blocks and logging imports.

## v04: One-shot prompt enhancement
**Testbench: 17/20**
- Noticed we were struggling with "all of the above"-type answers; the agent would just return the first correct option.

## v05 (INCOMPLETE): Front-end
**Testbench: 17/20**
- I have made significant progress toward a front-end, with paths for uploading other testbenches and even testing a single question.
- This is a straightforward Flask application.
- The CSS is GPT-generated: "Can you add a css file to make the form pretty?"
- Right now, this is still a work in progress. Given more time, I would have fixed the API calls (right now the server returns an error message instead of valid JSON) and spun up a demo page on Vercel, but I'm pushing the recommended 4-hour limit and want to show an accurate reflection of what I can do in that timespan.

<img width="527" alt="Screenshot 2024-03-28 at 11 30 22 AM" src="https://github.com/abrarfrahman/hippocratic-th/assets/28537119/860f0bd4-ae5a-4297-8090-f0ddf5da8c4a">
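The v02 matching failures (extra text and capitalization in the model's reply) suggest normalizing both sides before comparing. Here is a sketch of one such approach using Python's standard `difflib`; the function name and logic are an illustration of the technique, not the repository's actual matching code:

```python
import difflib

def match_answer(response, choices):
    """Map a free-form model reply to the index of the closest answer
    choice, ignoring case and surrounding text (a sketch of the kind
    of normalization v02 needed)."""
    response = response.strip().lower()
    # Exact substring match first: handles replies that quote a choice
    # verbatim inside extra text like "The answer is: ...".
    for i, choice in enumerate(choices):
        if choice.strip().lower() in response:
            return i
    # Fall back to fuzzy similarity against each choice.
    scores = [
        difflib.SequenceMatcher(None, response, c.strip().lower()).ratio()
        for c in choices
    ]
    return scores.index(max(scores))
```

For example, a reply of "The answer is: Could help make informed choices about medical treatment." would match the corresponding choice despite the extra prefix and different capitalization.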