jarvis_ai
AI Platform with multi LLM Chat and Agent creation capabilities
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (5.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: mantanz
- Language: Python
- Default Branch: main
- Size: 6.26 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🚀 Modern RAG Pipeline - React + FastAPI
This project migrates a Streamlit-based RAG pipeline to a React frontend backed by a FastAPI service. Uploaded PDFs are chunked and indexed in ChromaDB, and a chat interface answers questions about them with citations that link back to the source pages.
🏗️ ARCHITECTURE OVERVIEW
┌─────────────────────────────────────────────────────────────────┐
│ Frontend (React) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Chat │ │ Document │ │ PDF │ │
│ │ Interface │ │ Manager │ │ Viewer │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ React Context │
│ (State) │
└─────────────────────────┼───────────────────────────────────────┘
│ HTTP/REST API
┌─────────────────────────┼───────────────────────────────────────┐
│ Backend (FastAPI) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Document │ │ RAG │ │ Citation │ │
│ │ Processing │ │ Pipeline │ │ Management │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────┼───────────────────────────────────────┘
│
┌─────────────────────────┼───────────────────────────────────────┐
│ Data Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ ChromaDB │ │ Vector │ │ PDF │ │
│ │ Database │ │ Embeddings │ │ Storage │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
✨ KEY FEATURES
🎯 Frontend (React)
- Modern UI/UX: Beautiful, responsive interface with Tailwind CSS
- Real-time Chat: Interactive chat interface with typing indicators
- Drag & Drop Upload: Intuitive document upload with progress tracking
- Smart Citations: Interactive citations with hover tooltips and navigation
- PDF Integration: Embedded PDF viewer with chunk highlighting
- State Management: Centralized state with React Context
- Error Handling: Comprehensive error boundaries and user feedback
⚡ Backend (FastAPI)
- RESTful API: Clean, documented API endpoints
- Async Processing: Non-blocking document processing
- Auto Documentation: Interactive API docs at /docs
- CORS Support: Configured for frontend integration
- File Management: Efficient PDF storage and retrieval
- Citation Enhancement: Advanced citation processing and navigation
🧠 RAG Pipeline
- Advanced Chunking: Paragraph-aware document processing
- Vector Search: ChromaDB integration for semantic search
- Citation Tracking: Source attribution with page references
- LLM Integration: Ollama support for local inference
- Smart Renumbering: Dynamic citation reorganization
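To make the vector search step concrete, here is a minimal sketch of top-k retrieval by cosine similarity. It is a plain-Python illustration of the idea, not the ChromaDB API or this project's code; the function names and the toy two-dimensional vectors are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real embeddings have hundreds of dimensions.
chunks = [
    ("intro", [1.0, 0.0]),
    ("methods", [0.6, 0.8]),
    ("appendix", [0.0, 1.0]),
]
results = top_k([1.0, 0.1], chunks, k=2)
print(results)
```

A vector database performs the same ranking with an index instead of a full scan, which is what keeps lookups fast as the number of chunks grows.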
🚀 QUICK START
Option 1: Full Stack (Recommended)
```bash
# Start both backend and frontend
./start_full_app.sh
```
Option 2: Individual Services
```bash
# Terminal 1: Start Backend
./start_backend.sh

# Terminal 2: Start Frontend
./start_frontend.sh
```
Access Points
- 🌐 Frontend App: http://localhost:3000
- 📖 Backend API: http://localhost:8000
- 📚 API Documentation: http://localhost:8000/docs
📋 PREREQUISITES
System Requirements
- Python: 3.8+ (for backend)
- Node.js: 16+ (for frontend)
- npm: 8+ (for package management)
External Services
- Ollama: Local LLM inference
bash
# Install Ollama from https://ollama.com/
ollama pull llama3.2:latest
ollama serve
📦 INSTALLATION
1. Backend Setup
```bash
# Create virtual environment
python -m venv rag_env
source rag_env/bin/activate  # Windows: rag_env\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
2. Frontend Setup
```bash
# Install Node.js dependencies
npm install
```
3. Environment Configuration
Create environment variables:
```bash
# For React (can be set in scripts)
export REACT_APP_API_URL=http://localhost:8000
```
🏗️ PROJECT STRUCTURE
📁 Docs_RAG/
├── 🐍 Backend (Python/FastAPI)
│ ├── main.py # FastAPI application
│ ├── processing.py # Document processing
│ ├── query_data.py # RAG pipeline
│ ├── citation_manager.py # Citation management
│ ├── document_service.py # Document utilities
│ └── requirements.txt # Python dependencies
│
├── ⚛️ Frontend (React)
│ ├── src/
│ │ ├── components/
│ │ │ ├── chat/ # Chat interface
│ │ │ │ ├── ChatInterface.js
│ │ │ │ ├── MessageBubble.js
│ │ │ │ └── CitationTooltip.js
│ │ │ ├── ui/ # Reusable UI components
│ │ │ │ ├── Button.js
│ │ │ │ ├── LoadingSpinner.js
│ │ │ │ └── DocumentUploader.js
│ │ │ └── MainInterface.js # Main application layout
│ │ ├── contexts/
│ │ │ └── AppContext.js # Global state management
│ │ ├── services/
│ │ │ └── api.js # API communication
│ │ └── App.js # Root component
│ ├── public/
│ └── package.json # Node.js dependencies
│
├── 🚀 Scripts
│ ├── start_backend.sh # Backend startup
│ ├── start_frontend.sh # Frontend startup
│ └── start_full_app.sh # Full stack startup
│
└── 📊 Data
├── data/ # PDF document storage
└── chroma/ # Vector database
🔧 COMPONENT ARCHITECTURE
React Components
Core Layout
- MainInterface: Main application shell with sidebar and tabs
- ChatInterface: Real-time chat with RAG pipeline
- DocumentUploader: Drag & drop file upload with progress
UI Components
- Button: Configurable button with variants and loading states
- LoadingSpinner: Reusable loading indicators
- MessageBubble: Chat messages with citation integration
Citation System
- CitationTooltip: Interactive citation previews
- Smart parsing of [Source X] patterns
- Click-to-navigate functionality
API Endpoints
Document Management
http
POST /documents/upload # Upload and process PDFs
GET /documents # List all documents
GET /documents/{id}/info # Get document details
DELETE /documents/clear # Clear all documents
RAG Operations
http
POST /query # Perform RAG query
GET /citations/{id}/navigate # Get citation navigation
Utilities
http
GET /health # Health check
GET /documents/{id}/base64 # Get PDF as base64
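As an illustration of calling these endpoints from Python, the sketch below builds a `POST /query` request with the standard library. The `{"query": ...}` payload matches the curl example in the Testing section; the helper name `build_query_request` is hypothetical, and the response schema is not documented here, so the snippet stops short of interpreting the reply.

```python
import json
from urllib.request import Request

API_BASE_URL = "http://localhost:8000"  # default backend address from this README


def build_query_request(question: str, base_url: str = API_BASE_URL) -> Request:
    """Build (but do not send) a POST /query request with a JSON body."""
    body = json.dumps({"query": question}).encode("utf-8")
    return Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("What does the report conclude?")
print(request.get_method(), request.full_url)

# To send it while the backend is running:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(json.loads(response.read()))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a live server.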
🎨 UI/UX FEATURES
Modern Design
- Tailwind CSS: Utility-first styling
- Framer Motion: Smooth animations and transitions
- Responsive Layout: Mobile-friendly design
- Collapsible Sidebar: Space-efficient navigation
User Experience
- Real-time Feedback: Loading states and progress indicators
- Toast Notifications: Success/error messages
- Keyboard Shortcuts: Cmd/Ctrl+Enter to send messages
- Auto-scroll: Chat automatically scrolls to new messages
Accessibility
- Semantic HTML: Proper ARIA labels and roles
- Focus Management: Keyboard navigation support
- Screen Reader: Compatible with assistive technologies
📚 USAGE GUIDE
1. Document Upload
- Navigate to the "Upload" tab
- Drag & drop PDF files or click to browse
- Monitor upload progress
- Files are automatically processed and vectorized
2. Querying Documents
- Switch to "Chat" tab
- Type your question in the input field
- Press Enter or click Send
- View results with interactive citations
3. Citation Navigation
- Click on [Source X] tags in responses
- View citation details in tooltips
- Navigate to specific document pages
- Explore source content in PDF viewer
4. Document Management
- Use "Documents" tab to view uploaded files
- Click "View" to open PDF viewer
- Clear individual or all documents
- Monitor document status and metadata
🔧 CONFIGURATION
Backend Configuration
```python
# main.py - CORS settings
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
Frontend Configuration
javascript
// src/services/api.js - API base URL
const API_BASE_URL = process.env.REACT_APP_API_URL || 'http://localhost:8000';
Processing Parameters
```python
# processing.py - Chunking configuration
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,            # Adjust for your documents
    chunk_overlap=80,          # Maintain context overlap
    separators=["\n\n", "\n"]  # Paragraph-aware splitting
)
```
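To show what `chunk_size` and `chunk_overlap` control, here is a simplified stand-in for paragraph-aware chunking written in plain Python. It is not LangChain's `RecursiveCharacterTextSplitter` algorithm, and the function name `chunk_paragraphs` is hypothetical; it only illustrates the packing-with-overlap idea under the assumption that paragraphs are separated by blank lines.

```python
from typing import List


def chunk_paragraphs(text: str, chunk_size: int = 800, chunk_overlap: int = 80) -> List[str]:
    """Pack paragraphs into chunks of at most chunk_size characters.

    Every chunk after the first begins with the last chunk_overlap characters
    of the previous chunk's new text, so context carries across boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    budget = chunk_size - chunk_overlap  # room for new text in each chunk

    # Split on blank lines, then hard-split any paragraph that exceeds the budget.
    pieces: List[str] = []
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        for start in range(0, len(paragraph), budget):
            pieces.append(paragraph[start:start + budget])

    # Greedily pack consecutive pieces while they still fit in the budget.
    base_chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}\n\n{piece}" if current else piece
        if len(candidate) <= budget:
            current = candidate
        else:
            base_chunks.append(current)
            current = piece
    if current:
        base_chunks.append(current)

    # Prefix each chunk after the first with the tail of its predecessor.
    chunks: List[str] = []
    for index, chunk in enumerate(base_chunks):
        tail = base_chunks[index - 1][-chunk_overlap:] if index > 0 and chunk_overlap else ""
        chunks.append(tail + chunk)
    return chunks


sample = "A" * 30 + "\n\n" + "B" * 30 + "\n\n" + "C" * 30
chunks = chunk_paragraphs(sample, chunk_size=50, chunk_overlap=10)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Larger overlaps reduce the chance that an answer straddles a chunk boundary, at the cost of storing and embedding more duplicated text.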
🚀 DEPLOYMENT
Development
```bash
# Full development stack
./start_full_app.sh
```
Production
```bash
# Backend (production)
uvicorn main:app --host 0.0.0.0 --port 8000

# Frontend (build)
npm run build
npx serve -s build -l 3000
```
🔍 TROUBLESHOOTING
Common Issues
Backend Connection
```bash
# Check if FastAPI is running
curl http://localhost:8000/health

# Restart backend
./start_backend.sh
```
Frontend Issues
```bash
# Clear node modules and reinstall
rm -rf node_modules package-lock.json
npm install

# Restart development server
npm start
```
Ollama Problems
```bash
# Ensure Ollama is running
ollama serve

# Pull/update model
ollama pull llama3.2:latest
```
Performance Optimization
Backend
- Adjust chunk sizes for your document types
- Optimize embedding model selection
- Configure ChromaDB persistence settings
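When tuning chunk sizes, it can help to estimate how many chunks (and therefore embeddings) a document will produce. The sketch below assumes a fixed-stride sliding window over characters; a separator-aware splitter breaks at paragraph boundaries, so real counts will differ somewhat. The function name `estimate_chunk_count` is hypothetical.

```python
import math


def estimate_chunk_count(text_length: int, chunk_size: int = 800, chunk_overlap: int = 80) -> int:
    """Estimate how many chunks a fixed-stride sliding window would produce."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap  # new characters covered by each extra chunk
    return math.ceil((text_length - chunk_overlap) / stride)


# Compare a few settings for a 10,000-character document.
for size, overlap in [(400, 40), (800, 80), (1600, 160)]:
    print(size, overlap, estimate_chunk_count(10_000, size, overlap))
```

Halving the chunk size roughly doubles the number of embeddings to compute and store, which is the main cost to weigh against finer-grained retrieval.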
Frontend
- Enable React production build
- Implement code splitting for large components
- Optimize image and asset loading
🌟 ADVANCED FEATURES
Citation Enhancement
- Source Tracking: Full document lineage
- Page References: Exact page and paragraph locations
- Relevance Scoring: ML-based relevance metrics
- Content Previews: Rich tooltip content
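The page and paragraph references come from chunk ids of the form `file:page:paragraph:chunk`, as parsed in `citation_manager.py`. The standalone sketch below mirrors that parsing; the function name `parse_source_id` and the sample path `data/report.pdf` are illustrative assumptions.

```python
from typing import Tuple


def parse_source_id(source_id: str) -> Tuple[str, str]:
    """Return (filename, page_reference) for an id like 'data/report.pdf:3:2:0'."""
    parts = source_id.split(":")
    if len(parts) < 2:
        return "Unknown Document", "N/A"
    filename = parts[0].split("/")[-1]  # keep only the file name, not the directory
    page = parts[1]
    if len(parts) >= 4:
        return filename, f"{page} (¶{parts[2]}.{parts[3]})"
    if len(parts) == 3:
        return filename, f"{page} (¶{parts[2]})"
    return filename, page


print(parse_source_id("data/report.pdf:3:2:0"))
print(parse_source_id("report.pdf:7"))
print(parse_source_id("Unknown"))
```

Because the id is split on `:`, a path that itself contains a colon (for example a Windows drive letter) would be misparsed, so ids are best built from relative, forward-slash paths.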
PDF Integration
- Text Layer Highlighting: Precise text selection
- Chunk Navigation: Jump to specific content sections
- Cross-platform Support: Works on all modern browsers
- Fallback Strategies: Multiple highlighting approaches
State Management
- Centralized Store: React Context with useReducer
- Optimistic Updates: Immediate UI feedback
- Error Recovery: Graceful error handling
- Connection Monitoring: Real-time backend status
🧪 TESTING
Backend Testing
```bash
# Test API health
curl http://localhost:8000/health

# Test document upload
curl -X POST -F "files=@document.pdf" http://localhost:8000/documents/upload

# Test query
curl -X POST -H "Content-Type: application/json" \
  -d '{"query": "test question"}' \
  http://localhost:8000/query
```
Frontend Testing
```bash
# Run React tests
npm test

# Manual testing
# 1. Upload documents via UI
# 2. Send queries in chat
# 3. Test citation navigation
```
📖 API DOCUMENTATION
Visit http://localhost:8000/docs for interactive API documentation with:
- Request/Response Schemas
- Try It Out functionality
- Model Definitions
- Error Codes reference
🤝 CONTRIBUTING
- Fork the repository
- Create a feature branch
- Implement your changes
- Test thoroughly
- Submit a pull request
Development Guidelines
- Follow React hooks patterns
- Use TypeScript for new components (future enhancement)
- Maintain FastAPI async patterns
- Add comprehensive error handling
📄 LICENSE
MIT License - see LICENSE file for details.
🙏 ACKNOWLEDGMENTS
- React Team: For the amazing frontend framework
- FastAPI: For the modern Python web framework
- ChromaDB: For vector database capabilities
- Tailwind CSS: For utility-first styling
- Framer Motion: For smooth animations
- Ollama: For local LLM integration
- Langchain: For document processing utilities
🚀 Ready to explore intelligent document analysis with modern web technologies!
🎯 TRANSFORMATION SUMMARY
This project successfully transforms a Streamlit-based RAG pipeline into a modern, scalable React + FastAPI architecture:
Before (Streamlit)
- ❌ Server-side rendering
- ❌ Limited customization
- ❌ Monolithic architecture
- ❌ Basic UI components
After (React + FastAPI)
- ✅ Client-side React application
- ✅ Fully customizable UI/UX
- ✅ Decoupled frontend and backend services
- ✅ Modern component system
- ✅ Real-time interactions
- ✅ Professional deployment ready
The new architecture maintains all original RAG functionality while providing a superior user experience and development workflow.
Owner
- Name: Manish Taneja
- Login: mantanz
- Kind: organization
- Location: India
- Repositories: 2
- Profile: https://github.com/mantanz
Citation (citation_manager.py)
# citation_manager.py
import re
from typing import List, Tuple

from langchain_core.documents import Document

from citation_models import Citation, RenumberedCitation, ProcessedLLMResponse
from citation_utils import strip_html_tags  # Assuming you have this helper


class CitationManager:
    """
    Manages the creation, formatting, and processing of citations for the RAG pipeline.
    """

    def __init__(self, search_results: List[Tuple[Document, float]], k_chunks: int):
        self.k_chunks = k_chunks
        self.all_citations: List[Citation] = self._create_initial_citations(search_results)
        self.lookup = {c.source_num: c for c in self.all_citations}

    def _create_initial_citations(self, search_results: List[Tuple[Document, float]]) -> List[Citation]:
        """Creates the initial list of Citation objects from ChromaDB results."""
        citations = []
        for i, (doc, score) in enumerate(search_results, 1):
            source_id = doc.metadata.get("id", "Unknown")
            source_parts = source_id.split(":")

            # Parse the new format: file:page:paragraph:chunk
            if len(source_parts) >= 4:
                file_path, page_num, paragraph_num, chunk_num = source_parts[0], source_parts[1], source_parts[2], source_parts[3]
                filename = file_path.split("/")[-1] if "/" in file_path else file_path
                # Create a more informative page reference
                page_ref = f"{page_num} (¶{paragraph_num}.{chunk_num})"
            elif len(source_parts) >= 3:
                file_path, page_num, paragraph_num = source_parts[0], source_parts[1], source_parts[2]
                filename = file_path.split("/")[-1] if "/" in file_path else file_path
                page_ref = f"{page_num} (¶{paragraph_num})"
            elif len(source_parts) >= 2:
                file_path, page_num = source_parts[0], source_parts[1]
                filename = file_path.split("/")[-1] if "/" in file_path else file_path
                page_ref = page_num
            else:
                filename, page_ref = "Unknown Document", "N/A"

            clean_content = strip_html_tags(doc.page_content)
            # Remove PDF filename references from content
            clean_content = self._remove_filename_references(clean_content, filename)

            citations.append(
                Citation(
                    source_num=i,
                    filename=filename,
                    page=page_ref,  # Now includes paragraph info
                    source_id=source_id,
                    # The search score is treated as a distance, so 1 - score is reported as relevance
                    relevance_score=round(1 - score, 3) if score is not None else None,
                    content=clean_content
                )
            )
        return citations

    def _remove_filename_references(self, content: str, filename: str) -> str:
        """Remove filename and common document metadata from content."""
        # Remove file extension and get base name
        base_filename = filename.replace('.pdf', '').replace('.txt', '').replace('.docx', '')

        # Remove various patterns that might include filename
        patterns_to_remove = [
            # Exact filename matches at end of content
            rf'\s*{re.escape(filename)}\s*$',
            rf'\s*{re.escape(base_filename)}\s*$',
            # Common patterns with numbers (like "effective headline 1311")
            rf'\s*{re.escape(base_filename)}\s*\d+\s*$',
            # Remove trailing numbers that might be page numbers or file references
            r'\s*\d{3,4}\s*$',  # Remove 3-4 digit numbers at end
            # Remove common document footers
            r'\s*Page \d+.*$',
            r'\s*p\.\s*\d+.*$',
            r'\s*\d+/\d+\s*$',  # Page numbers like "1/10"
            # Remove trailing whitespace and cleanup
            r'\s+$'
        ]

        cleaned_content = content
        for pattern in patterns_to_remove:
            cleaned_content = re.sub(pattern, '', cleaned_content, flags=re.IGNORECASE)
        return cleaned_content.strip()

    def get_llm_context(self) -> str:
        """Formats the context string to be passed to the LLM."""
        context_parts = [f"[Source {c.source_num}] {c.content}" for c in self.all_citations]
        return "\n\n---\n\n".join(context_parts)

    def process_response(self, response_text: str) -> ProcessedLLMResponse:
        """
        Parses the LLM response, identifies used citations, renumbers them,
        and returns a structured result.
        """
        # Find all cited source numbers, preserving order of appearance
        cited_nums_str = re.findall(r'\[Source (\d+)\]', response_text)
        cited_original_nums = [int(num) for num in cited_nums_str]

        # Get unique, valid citations in order of first appearance.
        # Checking self.lookup as well avoids a KeyError when the search
        # returned fewer than k_chunks results.
        used_citations_ordered = []
        seen = set()
        for num in cited_original_nums:
            if num not in seen and 1 <= num <= self.k_chunks and num in self.lookup:
                seen.add(num)
                used_citations_ordered.append(self.lookup[num])

        if not used_citations_ordered:
            return ProcessedLLMResponse(renumbered_response_text=response_text, used_citations=[])

        # Create a mapping from original numbers to new sequential numbers (1, 2, 3...)
        renumber_map = {citation.source_num: new_num for new_num, citation in enumerate(used_citations_ordered, 1)}

        # Renumber the response text using a safe, two-step replacement
        renumbered_text = response_text

        # Step 1: Replace valid citations with temporary, unique placeholders
        for original_num, new_num in renumber_map.items():
            renumbered_text = re.sub(
                rf'\[Source {original_num}\]',
                f"__TEMP_SOURCE_{new_num}__",
                renumbered_text
            )

        # Step 2: Remove any remaining invalid citations (outside our valid range)
        # This handles cases where LLM generates citations beyond k_chunks
        for num in set(cited_original_nums):
            if num not in renumber_map:
                renumbered_text = re.sub(rf'\[Source {num}\]', '', renumbered_text)

        # Step 2.5: Remove original citations from source documents (like [23], [12], etc.)
        # These are citations that existed in the original documents, not our RAG citations
        renumbered_text = re.sub(r'\[(\d+)\]', '', renumbered_text)

        # Step 3: Replace placeholders with final, renumbered citation format
        for original_num, new_num in renumber_map.items():
            renumbered_text = re.sub(
                f"__TEMP_SOURCE_{new_num}__",
                f"[Source {new_num}]",
                renumbered_text
            )

        # Step 4: Clean up any extra whitespace left by removed citations
        renumbered_text = re.sub(r'\s+', ' ', renumbered_text).strip()

        # Create the final list of renumbered citation objects
        final_citations = []
        for new_num, original_citation in enumerate(used_citations_ordered, 1):
            final_citations.append(
                RenumberedCitation(
                    new_source_num=new_num,
                    original_source_num=original_citation.source_num,
                    filename=original_citation.filename,
                    page=original_citation.page,
                    relevance_score=original_citation.relevance_score,
                    content=original_citation.content
                )
            )

        return ProcessedLLMResponse(
            renumbered_response_text=renumbered_text,
            used_citations=final_citations
        )
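To see the renumbering behaviour in isolation, here is a condensed, standalone sketch. It swaps the placeholder-based replacement used in `process_response` for a single `re.sub` pass with a callback, which avoids collisions between old and new numbers in a different way. The function name `renumber_citations` is hypothetical, it does not depend on the `Citation` models, and unlike the class above it does not strip bare references such as `[23]`.

```python
import re
from typing import Dict, Tuple


def renumber_citations(text: str, valid_sources: int) -> Tuple[str, Dict[int, int]]:
    """Renumber [Source N] tags by order of first appearance.

    Tags whose number falls outside 1..valid_sources are dropped. Returns the
    rewritten text and the mapping {original_number: new_number}.
    """
    mapping: Dict[int, int] = {}

    def replace(match: re.Match) -> str:
        original = int(match.group(1))
        if not 1 <= original <= valid_sources:
            return ""  # the model cited a source it was never given
        new = mapping.setdefault(original, len(mapping) + 1)
        return f"[Source {new}]"

    rewritten = re.sub(r"\[Source (\d+)\]", replace, text)
    # Collapse the whitespace left behind by dropped tags.
    return re.sub(r"\s+", " ", rewritten).strip(), mapping


answer = "Revenue grew [Source 3]. Costs fell [Source 9] [Source 1]. Growth continued [Source 3]."
renumbered, mapping = renumber_citations(answer, valid_sources=5)
print(renumbered)
print(mapping)
```

Because each tag is rewritten exactly once as the regex scans the text, a newly assigned number can never be mistaken for an original one, which is the problem the temporary placeholders solve in the class above.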
GitHub Events
Total
- Member event: 2
- Push event: 5
- Create event: 2
Last Year
- Member event: 2
- Push event: 5
- Create event: 2
Dependencies
- 1330 dependencies
- @types/react ^18.2.15 development
- @types/react-dom ^18.2.7 development
- @testing-library/jest-dom ^5.16.4
- @testing-library/react ^13.3.0
- @testing-library/user-event ^13.5.0
- autoprefixer ^10.4.14
- lucide-react ^0.263.1
- pdfjs-dist ^3.11.174
- postcss ^8.4.24
- react ^18.2.0
- react-dom ^18.2.0
- react-pdf ^7.5.1
- react-router-dom ^6.8.1
- react-scripts 5.0.1
- tailwindcss ^3.3.2
- chromadb ==0.4.18
- ollama ==0.1.7
- pathlib2 >=2.3.0
- pdfplumber ==0.10.3
- psutil ==5.9.6
- sentence-transformers ==2.2.2
- torch >=1.9.0
- tqdm ==4.66.1
- transformers >=4.21.0
- urllib3 >=1.26.0