jarvis_ai

AI Platform with multi LLM Chat and Agent creation capabilities

https://github.com/mantanz/jarvis_ai

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (5.7%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: mantanz
Language: Python
Default Branch: main
Size: 6.26 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme Citation

🚀 Modern RAG Pipeline - React + FastAPI

A complete transformation of the RAG pipeline from Streamlit to a modern React frontend with FastAPI backend. This implementation provides a robust, scalable, and maintainable architecture for document processing and intelligent querying.

🏗️ ARCHITECTURE OVERVIEW

┌─────────────────────────────────────────────────────────────────┐
│                        Frontend (React)                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │     Chat     │  │   Document   │  │     PDF      │           │
│  │  Interface   │  │   Manager    │  │    Viewer    │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
│                         │                                       │
│                   React Context                                 │
│                      (State)                                    │
└─────────────────────────┼───────────────────────────────────────┘
                          │ HTTP/REST API
┌─────────────────────────┼───────────────────────────────────────┐
│                        Backend (FastAPI)                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   Document   │  │     RAG      │  │   Citation   │           │
│  │  Processing  │  │   Pipeline   │  │  Management  │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────┼───────────────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────────────┐
│                           Data Layer                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   ChromaDB   │  │    Vector    │  │     PDF      │           │
│  │   Database   │  │  Embeddings  │  │   Storage    │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘

✨ KEY FEATURES

🎯 Frontend (React)

Modern UI/UX: Beautiful, responsive interface with Tailwind CSS
Real-time Chat: Interactive chat interface with typing indicators
Drag & Drop Upload: Intuitive document upload with progress tracking
Smart Citations: Interactive citations with hover tooltips and navigation
PDF Integration: Embedded PDF viewer with chunk highlighting
State Management: Centralized state with React Context
Error Handling: Comprehensive error boundaries and user feedback

⚡ Backend (FastAPI)

RESTful API: Clean, documented API endpoints
Async Processing: Non-blocking document processing
Auto Documentation: Interactive API docs at /docs
CORS Support: Configured for frontend integration
File Management: Efficient PDF storage and retrieval
Citation Enhancement: Advanced citation processing and navigation

🧠 RAG Pipeline

Advanced Chunking: Paragraph-aware document processing
Vector Search: ChromaDB integration for semantic search
Citation Tracking: Source attribution with page references
LLM Integration: Ollama support for local inference
Smart Renumbering: Dynamic citation reorganization

🚀 QUICK START

Option 1: Full Stack (Recommended)

```bash
# Start both backend and frontend
./start_full_app.sh
```

Option 2: Individual Services

```bash
# Terminal 1: Start Backend
./start_backend.sh

# Terminal 2: Start Frontend
./start_frontend.sh
```

Access Points

🌐 Frontend App: http://localhost:3000
📖 Backend API: http://localhost:8000
📚 API Documentation: http://localhost:8000/docs

📋 PREREQUISITES

System Requirements

Python: 3.8+ (for backend)
Node.js: 16+ (for frontend)
npm: 8+ (for package management)

External Services

Ollama: Local LLM inference

```bash
# Install Ollama from https://ollama.com/
ollama pull llama3.2:latest
ollama serve
```

📦 INSTALLATION

1. Backend Setup

```bash
# Create virtual environment
python -m venv rag_env
source rag_env/bin/activate  # Windows: rag_env\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

2. Frontend Setup

```bash
# Install Node.js dependencies
npm install
```

3. Environment Configuration

Create environment variables:

```bash
# For React (can be set in scripts)
export REACT_APP_API_URL=http://localhost:8000
```

🏗️ PROJECT STRUCTURE

📁 Docs_RAG/
├── 🐍 Backend (Python/FastAPI)
│   ├── main.py                 # FastAPI application
│   ├── processing.py           # Document processing
│   ├── query_data.py           # RAG pipeline
│   ├── citation_manager.py     # Citation management
│   ├── document_service.py     # Document utilities
│   └── requirements.txt        # Python dependencies
│
├── ⚛️ Frontend (React)
│   ├── src/
│   │   ├── components/
│   │   │   ├── chat/                   # Chat interface
│   │   │   │   ├── ChatInterface.js
│   │   │   │   ├── MessageBubble.js
│   │   │   │   └── CitationTooltip.js
│   │   │   ├── ui/                     # Reusable UI components
│   │   │   │   ├── Button.js
│   │   │   │   ├── LoadingSpinner.js
│   │   │   │   └── DocumentUploader.js
│   │   │   └── MainInterface.js        # Main application layout
│   │   ├── contexts/
│   │   │   └── AppContext.js           # Global state management
│   │   ├── services/
│   │   │   └── api.js                  # API communication
│   │   └── App.js                      # Root component
│   ├── public/
│   └── package.json                    # Node.js dependencies
│
├── 🚀 Scripts
│   ├── start_backend.sh        # Backend startup
│   ├── start_frontend.sh       # Frontend startup
│   └── start_full_app.sh       # Full stack startup
│
└── 📊 Data
    ├── data/                   # PDF document storage
    └── chroma/                 # Vector database

🔧 COMPONENT ARCHITECTURE

React Components

Core Layout

MainInterface: Main application shell with sidebar and tabs
ChatInterface: Real-time chat with RAG pipeline
DocumentUploader: Drag & drop file upload with progress

UI Components

Button: Configurable button with variants and loading states
LoadingSpinner: Reusable loading indicators
MessageBubble: Chat messages with citation integration

Citation System

CitationTooltip: Interactive citation previews
Smart parsing of [Source X] patterns
Click-to-navigate functionality
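
The `[Source X]` parsing can be illustrated with a short sketch. The frontend component is JavaScript, so the Python below is only an analogue of the idea, not the project's code; it reuses the regular expression that `citation_manager.py` applies to LLM responses, and the `split_citations` name and segment format are assumptions made for this example.

```python
import re
from typing import List, Tuple

# Same pattern citation_manager.py uses to find citations in a response.
CITATION_RE = re.compile(r"\[Source (\d+)\]")


def split_citations(text: str) -> List[Tuple[str, str]]:
    """Split text into ("text", chunk) and ("citation", number) segments."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in CITATION_RE.finditer(text):
        if match.start() > pos:
            # Plain text between the previous citation and this one.
            segments.append(("text", text[pos:match.start()]))
        segments.append(("citation", match.group(1)))
        pos = match.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments
```

A renderer can then walk the segments in order, emitting plain text for `"text"` entries and a clickable element with a tooltip for `"citation"` entries.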

API Endpoints

Document Management

```http
POST   /documents/upload       # Upload and process PDFs
GET    /documents              # List all documents
GET    /documents/{id}/info    # Get document details
DELETE /documents/clear        # Clear all documents
```

RAG Operations

```http
POST /query                    # Perform RAG query
GET  /citations/{id}/navigate  # Get citation navigation
```

Utilities

```http
GET /health                    # Health check
GET /documents/{id}/base64     # Get PDF as base64
```
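
As a rough illustration of calling these endpoints from Python, the sketch below builds a `POST /query` request with the standard library only. The `{"query": ...}` body matches the curl example in the Testing section; the response shape is not documented on this page, so `send` simply returns whatever JSON the server replies with. The function names are assumptions made for this example.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"  # default backend address from the Access Points list


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /query request with a JSON body."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a request and decode the JSON reply; requires a running backend."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(build_query_request("test question"))` performs the same call as the curl query example.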

🎨 UI/UX FEATURES

Modern Design

Tailwind CSS: Utility-first styling
Framer Motion: Smooth animations and transitions
Responsive Layout: Mobile-friendly design
Collapsible Sidebar: Space-efficient navigation

User Experience

Real-time Feedback: Loading states and progress indicators
Toast Notifications: Success/error messages
Keyboard Shortcuts: Cmd/Ctrl+Enter to send messages
Auto-scroll: Chat automatically scrolls to new messages

Accessibility

Semantic HTML: Proper ARIA labels and roles
Focus Management: Keyboard navigation support
Screen Reader: Compatible with assistive technologies

📚 USAGE GUIDE

1. Document Upload

Navigate to the "Upload" tab
Drag & drop PDF files or click to browse
Monitor upload progress
Files are automatically processed and vectorized

2. Querying Documents

Switch to "Chat" tab
Type your question in the input field
Press Enter or click Send
View results with interactive citations

3. Citation Navigation

Click on [Source X] tags in responses
View citation details in tooltips
Navigate to specific document pages
Explore source content in PDF viewer

4. Document Management

Use "Documents" tab to view uploaded files
Click "View" to open PDF viewer
Clear individual or all documents
Monitor document status and metadata

🔧 CONFIGURATION

Backend Configuration

```python
# main.py - CORS settings
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```

Frontend Configuration

```javascript
// src/services/api.js - API base URL
const API_BASE_URL = process.env.REACT_APP_API_URL || 'http://localhost:8000';
```

Processing Parameters

```python
# processing.py - Chunking configuration
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,              # Adjust for your documents
    chunk_overlap=80,            # Maintain context overlap
    separators=["\n\n", "\n"],   # Paragraph-aware splitting
)
```
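
To show what the chunk size and overlap parameters control, here is a deliberately simplified, dependency-free sketch of paragraph-aware chunking. It is not LangChain's `RecursiveCharacterTextSplitter` algorithm: it only packs whole paragraphs into chunks of at most `chunk_size` characters and applies the overlap when a single paragraph is too long to fit. All function names are hypothetical.

```python
from typing import List


def split_long(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Slide a fixed-size window over text, stepping by chunk_size - overlap."""
    step = chunk_size - overlap
    # Skip a trailing window that would contain only already-covered characters.
    return [
        text[i:i + chunk_size]
        for i in range(0, len(text), step)
        if i == 0 or i + overlap < len(text)
    ]


def chunk_paragraphs(text: str, chunk_size: int = 800, overlap: int = 80) -> List[str]:
    """Pack paragraphs (split on blank lines) into chunks of at most chunk_size."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if len(para) > chunk_size:
            # Oversized paragraph: flush what we have, then window it.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_long(para, chunk_size, overlap))
        elif current and len(current) + 2 + len(para) > chunk_size:
            # Adding this paragraph would overflow the chunk, so start a new one.
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Smaller chunks tend to give more precise citations at the cost of less context per chunk, which is why the size is worth tuning per document type.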

🚀 DEPLOYMENT

Development

```bash
# Full development stack
./start_full_app.sh
```

Production

```bash
# Backend (production)
uvicorn main:app --host 0.0.0.0 --port 8000

# Frontend (build)
npm run build
npx serve -s build -l 3000
```

🔍 TROUBLESHOOTING

Common Issues

Backend Connection

```bash
# Check if FastAPI is running
curl http://localhost:8000/health

# Restart backend
./start_backend.sh
```

Frontend Issues

```bash
# Clear node modules and reinstall
rm -rf node_modules package-lock.json
npm install

# Restart development server
npm start
```

Ollama Problems

```bash
# Ensure Ollama is running
ollama serve

# Pull/update model
ollama pull llama3.2:latest
```

Performance Optimization

Backend

Adjust chunk sizes for your document types
Optimize embedding model selection
Configure ChromaDB persistence settings

Frontend

Enable React production build
Implement code splitting for large components
Optimize image and asset loading

🌟 ADVANCED FEATURES

Citation Enhancement

Source Tracking: Full document lineage
Page References: Exact page and paragraph locations
Relevance Scoring: ML-based relevance metrics
Content Previews: Rich tooltip content
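
The page and paragraph references come from the chunk ID stored with each vector. The sketch below restates, as standalone functions, the parsing that `citation_manager.py` performs on IDs of the form `file:page:paragraph:chunk`, along with its `1 - distance` relevance score. The `SourceRef` type, the function names, and the sample ID used in the usage note are hypothetical.

```python
from typing import NamedTuple, Optional


class SourceRef(NamedTuple):
    filename: str
    page: Optional[str]
    paragraph: Optional[str]
    chunk: Optional[str]


def parse_source_id(source_id: str) -> SourceRef:
    """Parse 'file:page[:paragraph[:chunk]]' into its parts."""
    parts = source_id.split(":")
    if len(parts) < 2:
        return SourceRef("Unknown Document", None, None, None)
    filename = parts[0].split("/")[-1]  # keep only the base name
    paragraph = parts[2] if len(parts) >= 3 else None
    chunk = parts[3] if len(parts) >= 4 else None
    return SourceRef(filename, parts[1], paragraph, chunk)


def page_label(ref: SourceRef) -> str:
    """Format a page reference such as '3 (¶2.1)'."""
    if ref.page is None:
        return "N/A"
    if ref.paragraph is None:
        return ref.page
    if ref.chunk is None:
        return f"{ref.page} (¶{ref.paragraph})"
    return f"{ref.page} (¶{ref.paragraph}.{ref.chunk})"


def relevance_from_distance(distance: Optional[float]) -> Optional[float]:
    """Convert a vector distance into a score, as citation_manager.py does."""
    return round(1 - distance, 3) if distance is not None else None
```

For an ID like `data/report.pdf:3:2:1`, `page_label(parse_source_id(...))` yields `3 (¶2.1)`. Note that splitting on `:` assumes file paths contain no colons, so Windows drive-letter paths would need separate handling.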

PDF Integration

Text Layer Highlighting: Precise text selection
Chunk Navigation: Jump to specific content sections
Cross-platform Support: Works on all modern browsers
Fallback Strategies: Multiple highlighting approaches

State Management

Centralized Store: React Context with useReducer
Optimistic Updates: Immediate UI feedback
Error Recovery: Graceful error handling
Connection Monitoring: Real-time backend status

🧪 TESTING

Backend Testing

```bash
# Test API health
curl http://localhost:8000/health

# Test document upload
curl -X POST -F "files=@document.pdf" http://localhost:8000/documents/upload

# Test query
curl -X POST -H "Content-Type: application/json" \
  -d '{"query": "test question"}' \
  http://localhost:8000/query
```

Frontend Testing

```bash
# Run React tests
npm test

# Manual testing
# 1. Upload documents via UI
# 2. Send queries in chat
# 3. Test citation navigation
```

📖 API DOCUMENTATION

Visit http://localhost:8000/docs for interactive API documentation with:

- Request/Response Schemas
- Try It Out functionality
- Model Definitions
- Error Codes reference

🤝 CONTRIBUTING

Fork the repository
Create a feature branch
Implement your changes
Test thoroughly
Submit a pull request

Development Guidelines

Follow React hooks patterns
Use TypeScript for new components (future enhancement)
Maintain FastAPI async patterns
Add comprehensive error handling

📄 LICENSE

MIT License - see LICENSE file for details.

🙏 ACKNOWLEDGMENTS

React Team: For the amazing frontend framework
FastAPI: For the modern Python web framework
ChromaDB: For vector database capabilities
Tailwind CSS: For utility-first styling
Framer Motion: For smooth animations
Ollama: For local LLM integration
Langchain: For document processing utilities

🚀 Ready to explore intelligent document analysis with modern web technologies!

🎯 TRANSFORMATION SUMMARY

This project successfully transforms a Streamlit-based RAG pipeline into a modern, scalable React + FastAPI architecture:

Before (Streamlit)

❌ Server-side rendering
❌ Limited customization
❌ Monolithic architecture
❌ Basic UI components

After (React + FastAPI)

✅ Client-side React application
✅ Fully customizable UI/UX
✅ Microservices architecture
✅ Modern component system
✅ Real-time interactions
✅ Professional deployment ready

The new architecture maintains all original RAG functionality while providing a superior user experience and development workflow.

Owner

Name: Manish Taneja
Login: mantanz
Kind: organization
Location: India

Repositories: 2
Profile: https://github.com/mantanz

Citation (citation_manager.py)

# citation_manager.py

import re
from typing import List, Tuple
from langchain_core.documents import Document

from citation_models import Citation, RenumberedCitation, ProcessedLLMResponse
from citation_utils import strip_html_tags # Assuming you have this helper

class CitationManager:
    """
    Manages the creation, formatting, and processing of citations for the RAG pipeline.
    """
    def __init__(self, search_results: List[Tuple[Document, float]], k_chunks: int):
        self.k_chunks = k_chunks
        self.all_citations: List[Citation] = self._create_initial_citations(search_results)
        self.lookup = {c.source_num: c for c in self.all_citations}

    def _create_initial_citations(self, search_results: List[Tuple[Document, float]]) -> List[Citation]:
        """Creates the initial list of Citation objects from ChromaDB results."""
        citations = []
        for i, (doc, score) in enumerate(search_results, 1):
            source_id = doc.metadata.get("id", "Unknown")
            source_parts = source_id.split(":")
            
            # Parse the new format: file:page:paragraph:chunk
            if len(source_parts) >= 4:
                file_path, page_num, paragraph_num, chunk_num = source_parts[0], source_parts[1], source_parts[2], source_parts[3]
                filename = file_path.split("/")[-1] if "/" in file_path else file_path
                # Create a more informative page reference
                page_ref = f"{page_num} (¶{paragraph_num}.{chunk_num})"
            elif len(source_parts) >= 3:
                file_path, page_num, paragraph_num = source_parts[0], source_parts[1], source_parts[2]
                filename = file_path.split("/")[-1] if "/" in file_path else file_path
                page_ref = f"{page_num} (¶{paragraph_num})"
            elif len(source_parts) >= 2:
                file_path, page_num = source_parts[0], source_parts[1]
                filename = file_path.split("/")[-1] if "/" in file_path else file_path
                page_ref = page_num
            else:
                filename, page_ref = "Unknown Document", "N/A"
            
            clean_content = strip_html_tags(doc.page_content)
            
            # Remove PDF filename references from content
            clean_content = self._remove_filename_references(clean_content, filename)
            
            citations.append(
                Citation(
                    source_num=i,
                    filename=filename,
                    page=page_ref,  # Now includes paragraph info
                    source_id=source_id,
                    relevance_score=round(1 - score, 3) if score is not None else None,
                    content=clean_content
                )
            )
        return citations

    def _remove_filename_references(self, content: str, filename: str) -> str:
        """Remove filename and common document metadata from content."""
        
        # Remove file extension and get base name
        base_filename = filename.replace('.pdf', '').replace('.txt', '').replace('.docx', '')
        
        # Remove various patterns that might include filename
        patterns_to_remove = [
            # Exact filename matches at end of content
            rf'\s*{re.escape(filename)}\s*$',
            rf'\s*{re.escape(base_filename)}\s*$',
            # Common patterns with numbers (like "effective headline 1311")
            rf'\s*{re.escape(base_filename)}\s*\d+\s*$',
            # Remove trailing numbers that might be page numbers or file references
            r'\s*\d{3,4}\s*$',  # Remove 3-4 digit numbers at end
            # Remove common document footers
            r'\s*Page \d+.*$',
            r'\s*p\.\s*\d+.*$',
            r'\s*\d+/\d+\s*$',  # Page numbers like "1/10"
            # Remove trailing whitespace and cleanup
            r'\s+$'
        ]
        
        cleaned_content = content
        for pattern in patterns_to_remove:
            cleaned_content = re.sub(pattern, '', cleaned_content, flags=re.IGNORECASE)
        
        return cleaned_content.strip()

    def get_llm_context(self) -> str:
        """Formats the context string to be passed to the LLM."""
        context_parts = [f"[Source {c.source_num}] {c.content}" for c in self.all_citations]
        return "\n\n---\n\n".join(context_parts)

    def process_response(self, response_text: str) -> ProcessedLLMResponse:
        """
        Parses the LLM response, identifies used citations, renumbers them,
        and returns a structured result.
        """
        # Find all cited source numbers, preserving order of appearance
        cited_nums_str = re.findall(r'\[Source (\d+)\]', response_text)
        cited_original_nums = [int(num) for num in cited_nums_str]
        
        # Get unique, valid citations in order of first appearance
        used_citations_ordered = []
        seen = set()
        for num in cited_original_nums:
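            # Also require the number to exist in the lookup: fewer than k_chunks
            # results may have been retrieved, and a bare index would raise KeyError.
            if num not in seen and 1 <= num <= self.k_chunks and num in self.lookup: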
                seen.add(num)
                used_citations_ordered.append(self.lookup[num])

        if not used_citations_ordered:
            return ProcessedLLMResponse(renumbered_response_text=response_text, used_citations=[])

        # Create a mapping from original numbers to new sequential numbers (1, 2, 3...)
        renumber_map = {citation.source_num: new_num for new_num, citation in enumerate(used_citations_ordered, 1)}

        # Renumber the response text using a safe, two-step replacement
        renumbered_text = response_text
        
        # Step 1: Replace valid citations with temporary, unique placeholders
        for original_num, new_num in renumber_map.items():
            renumbered_text = re.sub(
                rf'\[Source {original_num}\]',
                f"__TEMP_SOURCE_{new_num}__", 
                renumbered_text
            )
        
        # Step 2: Remove any remaining citations that did not map to a retrieved chunk
        # This handles cases where the LLM generates citations beyond k_chunks
        for num in set(cited_original_nums):
            if num not in renumber_map:
                renumbered_text = re.sub(rf'\[Source {num}\]', '', renumbered_text)
        
        # Step 2.5: Remove original citations from source documents (like [23], [12], etc.)
        # These are citations that existed in the original documents, not our RAG citations
        renumbered_text = re.sub(r'\[(\d+)\]', '', renumbered_text)
        
        # Step 3: Replace placeholders with final, renumbered citation format
        for original_num, new_num in renumber_map.items():
            renumbered_text = re.sub(
                f"__TEMP_SOURCE_{new_num}__",
                f"[Source {new_num}]",
                renumbered_text
            )
        
        # Step 4: Clean up any extra whitespace left by removed citations
        renumbered_text = re.sub(r'\s+', ' ', renumbered_text).strip()
        
        # Create the final list of renumbered citation objects
        final_citations = []
        for new_num, original_citation in enumerate(used_citations_ordered, 1):
            final_citations.append(
                RenumberedCitation(
                    new_source_num=new_num,
                    original_source_num=original_citation.source_num,
                    filename=original_citation.filename,
                    page=original_citation.page,
                    relevance_score=original_citation.relevance_score,
                    content=original_citation.content
                )
            )
        
        return ProcessedLLMResponse(
            renumbered_response_text=renumbered_text,
            used_citations=final_citations
        )
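
The renumbering step in `process_response` can be tried in isolation with a small sketch. The version below is an alternative formulation rather than the class's own method: it replaces the temporary-placeholder passes with a single `re.sub` callback, which avoids collisions between old and new numbers in one pass. The `renumber_citations` name and its `valid_nums` parameter are assumptions made for this example.

```python
import re
from typing import Dict, List, Tuple

CITATION_RE = re.compile(r"\[Source (\d+)\]")


def renumber_citations(text: str, valid_nums: List[int]) -> Tuple[str, Dict[int, int]]:
    """Renumber [Source N] tags to 1, 2, 3... in order of first appearance."""
    valid = set(valid_nums)
    mapping: Dict[int, int] = {}
    for raw in CITATION_RE.findall(text):
        num = int(raw)
        if num in valid and num not in mapping:
            mapping[num] = len(mapping) + 1

    def replace(match: "re.Match[str]") -> str:
        num = int(match.group(1))
        # Citations that do not refer to a retrieved chunk are dropped.
        return f"[Source {mapping[num]}]" if num in mapping else ""

    renumbered = CITATION_RE.sub(replace, text)
    # Collapse whitespace left behind by removed citations.
    return re.sub(r"\s+", " ", renumbered).strip(), mapping
```

For a response that cites sources 3, 1, and an out-of-range 9, the function maps 3 to 1 and 1 to 2 and removes the reference to 9, so the reader always sees citations numbered in reading order.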

GitHub Events

Total

Member event: 2
Push event: 5
Create event: 2

Last Year

Member event: 2
Push event: 5
Create event: 2

Dependencies

package-lock.json npm

1330 dependencies

package.json npm

@types/react ^18.2.15 development
@types/react-dom ^18.2.7 development
@testing-library/jest-dom ^5.16.4
@testing-library/react ^13.3.0
@testing-library/user-event ^13.5.0
autoprefixer ^10.4.14
lucide-react ^0.263.1
pdfjs-dist ^3.11.174
postcss ^8.4.24
react ^18.2.0
react-dom ^18.2.0
react-pdf ^7.5.1
react-router-dom ^6.8.1
react-scripts 5.0.1
tailwindcss ^3.3.2

requirements.txt pypi

chromadb ==0.4.18
ollama ==0.1.7
pathlib2 >=2.3.0
pdfplumber ==0.10.3
psutil ==5.9.6
sentence-transformers ==2.2.2
torch >=1.9.0
tqdm ==4.66.1
transformers >=4.21.0
urllib3 >=1.26.0
