Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (8.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cafelabai
- Language: Jupyter Notebook
- Default Branch: main
- Size: 2.08 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Overview
This repository houses a comprehensive approach to developing an AI assistant tailored for K12 STEM education, with a strong emphasis on culturally responsive teaching practices. Our project spans several interconnected components: the generation of a culturally responsive dataset, prompt engineering for an AI assistant, and the implementation of a Retrieval-Augmented Generation (RAG) model using PostgreSQL and pgvector.
Repository Structure
Dataset Preparation
- Focuses on generating a culturally responsive dataset for fine-tuning a Large Language Model (LLM).
- Techniques: PDF text extraction, segmentation, and Q&A generation.
- Tools:
UnstructuredPDFLoader,OllamaFunctionsLLM model. - Output: JSON file with Q&A pairs that reflect diverse voices and inclusive practices.
Prompt Engineering
- Details the creation and refinement of prompts to develop an AI assistant that supports culturally relevant K12 STEM education.
- Techniques: System prompt design, iterative feedback, and testing.
- Tools: Llama 3.1 model on the Ollama platform.
- Focus: Inclusivity, engagement, cultural relevance, and continuous improvement.
RAG Implementation
- Implements a Retrieval-Augmented Generation (RAG) model that utilizes PostgreSQL and pgvector for vector embedding storage.
- Techniques: Document chunking, vector embedding generation, cosine similarity retrieval.
- Tools: PostgreSQL with pgvector, LLaMA-3 model.
- Output: Contextually accurate and relevant responses to user queries based on retrieved text chunks.
1. Objective and Scope
Objective: To develop an AI assistant and tools that help educators create culturally relevant, engaging, and inclusive lesson plans for K12 STEM education.
Scope: - Design and test initial prompts for culturally responsive AI. - Generate a culturally responsive dataset for fine-tuning. - Implement and refine a RAG model for efficient information retrieval and augmentation.
2. Detailed Components
2.1 Dataset Preparation
- Objective: Generate a dataset that reflects diverse voices and inclusive practices in STEM education.
- Techniques:
- Text Extraction: Extract meaningful text segments from PDFs using
UnstructuredPDFLoader. - Text Segmentation: Apply fixed-size and sliding window segmentation to maintain context and structure.
- Q&A Generation: Use
OllamaFunctionsto create Q&A pairs focused on culturally responsive teaching. - Data Cleaning and Export: Structure the dataset into JSON, with options to convert to CSV for easier integration.
- Text Extraction: Extract meaningful text segments from PDFs using
2.2 Prompt Engineering
- Objective: Design prompts that guide the AI assistant to generate culturally relevant lesson plans.
- Approach:
- System Prompt: Emphasizes inclusivity, engagement, and cultural relevance.
- Testing and Feedback: Continuous refinement based on feedback to improve cultural nuance handling and engagement.
- Enhancements:
- Updated prompts for clarity and specificity.
- Revised structures for broader cultural relevance.
- Ongoing evaluation to identify and close gaps.
2.3 RAG Model Implementation
- Objective: Implement a RAG model using PostgreSQL and pgvector for efficient retrieval of contextually relevant information.
- Techniques:
- Vector Embedding Generation: Extract and store embeddings in PostgreSQL with pgvector.
- Cosine Similarity Retrieval: Retrieve the most relevant text chunks based on user queries.
- Augmentation: Use retrieved chunks to provide contextually accurate responses with the LLaMA-3 model.
3. Future Plans
- Continuous Feedback Loop: Engage with educators and experts for ongoing review and improvement.
- Fine-Tuning: Create and refine a dataset for further fine-tuning based on identified gaps.
- Expanded Testing: Continue testing and documenting results to close remaining gaps.
Owner
- Name: CAFE Lab
- Login: cafelabai
- Kind: organization
- Location: United States of America
- Repositories: 1
- Profile: https://github.com/cafelabai
Community AI For Education Lab @ IUI
Citation (CITATION.cff)
cff-version: 1.2.0 message: "If you use this software, please cite it as below." authors: - family-names: "Jadhav" given-names: "Kshitija Suresh" - family-names: "Chauhan" given-names: "Ankit Singh" - family-names: "Chakraborty" given-names: "Sunandan" orcid: "https://orcid.org/0000-0002-3331-6082" - family-names: "Price" given-names: "Jeremy F" orcid: "https://orcid.org/0000-0002-6506-3526" title: "CATpc" date-released: 2024-11-07 url: "https://github.com/cafelabai/CATpc"
GitHub Events
Total
- Member event: 2
- Push event: 6
- Create event: 3
Last Year
- Member event: 2
- Push event: 6
- Create event: 3
Dependencies
- SQLAlchemy *
- bitsandbytes *
- huggingface_hub *
- langchain *
- langchain-community *
- langchain_huggingface *
- langchain_postgres *
- llama-index *
- llama-index-vector-stores-postgres *
- pgvector *
- psycopg2 *
- psycopg2-binary *
- python-docx *
- sentence-transformers *
- sqlalchemy *
- transformers *