catpc

https://github.com/cafelabai/catpc

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (8.5%) to scientific vocabulary

Last synced: 11 months ago · JSON representation ·

Repository

Basic Info

Host: GitHub
Owner: cafelabai
Language: Jupyter Notebook
Default Branch: main
Size: 2.08 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme Citation

Overview

This repository houses a comprehensive approach to developing an AI assistant tailored for K12 STEM education, with a strong emphasis on culturally responsive teaching practices. Our project spans several interconnected components: the generation of a culturally responsive dataset, prompt engineering for an AI assistant, and the implementation of a Retrieval-Augmented Generation (RAG) model using PostgreSQL and pgvector.

Repository Structure

Dataset Preparation
- Focuses on generating a culturally responsive dataset for fine-tuning a Large Language Model (LLM).
- Techniques: PDF text extraction, segmentation, and Q&A generation.
- Tools: UnstructuredPDFLoader, OllamaFunctions LLM model.
- Output: JSON file with Q&A pairs that reflect diverse voices and inclusive practices.
Prompt Engineering
- Details the creation and refinement of prompts to develop an AI assistant that supports culturally relevant K12 STEM education.
- Techniques: System prompt design, iterative feedback, and testing.
- Tools: Llama 3.1 model on the Ollama platform.
- Focus: Inclusivity, engagement, cultural relevance, and continuous improvement.
RAG Implementation
- Implements a Retrieval-Augmented Generation (RAG) model that utilizes PostgreSQL and pgvector for vector embedding storage.
- Techniques: Document chunking, vector embedding generation, cosine similarity retrieval.
- Tools: PostgreSQL with pgvector, LLaMA-3 model.
- Output: Contextually accurate and relevant responses to user queries based on retrieved text chunks.

1. Objective and Scope

Objective: To develop an AI assistant and tools that help educators create culturally relevant, engaging, and inclusive lesson plans for K12 STEM education.

Scope: - Design and test initial prompts for culturally responsive AI. - Generate a culturally responsive dataset for fine-tuning. - Implement and refine a RAG model for efficient information retrieval and augmentation.

2. Detailed Components

2.1 Dataset Preparation

Objective: Generate a dataset that reflects diverse voices and inclusive practices in STEM education.
Techniques:
- Text Extraction: Extract meaningful text segments from PDFs using UnstructuredPDFLoader.
- Text Segmentation: Apply fixed-size and sliding window segmentation to maintain context and structure.
- Q&A Generation: Use OllamaFunctions to create Q&A pairs focused on culturally responsive teaching.
- Data Cleaning and Export: Structure the dataset into JSON, with options to convert to CSV for easier integration.

2.2 Prompt Engineering

Objective: Design prompts that guide the AI assistant to generate culturally relevant lesson plans.
Approach:
- System Prompt: Emphasizes inclusivity, engagement, and cultural relevance.
- Testing and Feedback: Continuous refinement based on feedback to improve cultural nuance handling and engagement.
Enhancements:
- Updated prompts for clarity and specificity.
- Revised structures for broader cultural relevance.
- Ongoing evaluation to identify and close gaps.

2.3 RAG Model Implementation

Objective: Implement a RAG model using PostgreSQL and pgvector for efficient retrieval of contextually relevant information.
Techniques:
- Vector Embedding Generation: Extract and store embeddings in PostgreSQL with pgvector.
- Cosine Similarity Retrieval: Retrieve the most relevant text chunks based on user queries.
- Augmentation: Use retrieved chunks to provide contextually accurate responses with the LLaMA-3 model.

3. Future Plans

Continuous Feedback Loop: Engage with educators and experts for ongoing review and improvement.
Fine-Tuning: Create and refine a dataset for further fine-tuning based on identified gaps.
Expanded Testing: Continue testing and documenting results to close remaining gaps.

Owner

Name: CAFE Lab
Login: cafelabai
Kind: organization
Location: United States of America

Repositories: 1
Profile: https://github.com/cafelabai

Community AI For Education Lab @ IUI

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Jadhav"
  given-names: "Kshitija Suresh"
- family-names: "Chauhan"
  given-names: "Ankit Singh"
- family-names: "Chakraborty"
  given-names: "Sunandan"
  orcid: "https://orcid.org/0000-0002-3331-6082"
- family-names: "Price"
  given-names: "Jeremy F"
  orcid: "https://orcid.org/0000-0002-6506-3526"
title: "CATpc"
date-released: 2024-11-07
url: "https://github.com/cafelabai/CATpc"

GitHub Events

Total

Member event: 2
Push event: 6
Create event: 3

Last Year

Member event: 2
Push event: 6
Create event: 3

Dependencies

RAG Implementation/requirements.txt pypi

SQLAlchemy *
bitsandbytes *
huggingface_hub *
langchain *
langchain-community *
langchain_huggingface *
langchain_postgres *
llama-index *
llama-index-vector-stores-postgres *
pgvector *
psycopg2 *
psycopg2-binary *
python-docx *
sentence-transformers *
sqlalchemy *
transformers *

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science