jarvis_ai

AI Platform with multi LLM Chat and Agent creation capabilities

https://github.com/mantanz/jarvis_ai

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.7%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: mantanz
  • Language: Python
  • Default Branch: main
  • Size: 6.26 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme Citation

README.md

🚀 Modern RAG Pipeline - React + FastAPI

This project migrates the RAG pipeline from Streamlit to a React frontend with a FastAPI backend. The frontend handles chat, document upload, and PDF viewing; the backend handles document processing, querying, and citation management.

🏗️ ARCHITECTURE OVERVIEW

┌─────────────────────────────────────────────────────────────────┐
│                        Frontend (React)                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │     Chat     │  │   Document   │  │     PDF      │           │
│  │  Interface   │  │   Manager    │  │    Viewer    │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
│                         │                                       │
│                   React Context                                 │
│                      (State)                                    │
└─────────────────────────┼───────────────────────────────────────┘
                          │ HTTP/REST API
┌─────────────────────────┼───────────────────────────────────────┐
│                        Backend (FastAPI)                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   Document   │  │     RAG      │  │   Citation   │           │
│  │  Processing  │  │   Pipeline   │  │  Management  │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────┼───────────────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────────────┐
│                           Data Layer                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │   ChromaDB   │  │    Vector    │  │     PDF      │           │
│  │   Database   │  │  Embeddings  │  │   Storage    │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘

KEY FEATURES

🎯 Frontend (React)

  • Modern UI/UX: Beautiful, responsive interface with Tailwind CSS
  • Real-time Chat: Interactive chat interface with typing indicators
  • Drag & Drop Upload: Intuitive document upload with progress tracking
  • Smart Citations: Interactive citations with hover tooltips and navigation
  • PDF Integration: Embedded PDF viewer with chunk highlighting
  • State Management: Centralized state with React Context
  • Error Handling: Comprehensive error boundaries and user feedback

Backend (FastAPI)

  • RESTful API: Clean, documented API endpoints
  • Async Processing: Non-blocking document processing
  • Auto Documentation: Interactive API docs at /docs
  • CORS Support: Configured for frontend integration
  • File Management: Efficient PDF storage and retrieval
  • Citation Enhancement: Advanced citation processing and navigation

🧠 RAG Pipeline

  • Advanced Chunking: Paragraph-aware document processing
  • Vector Search: ChromaDB integration for semantic search
  • Citation Tracking: Source attribution with page references
  • LLM Integration: Ollama support for local inference
  • Smart Renumbering: Dynamic citation reorganization
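
To make the citation tracking and renumbering items more concrete, here is a minimal sketch in plain Python. It is an illustration only, not the project's implementation (that lives in citation_manager.py), and the function name `renumber_citations` is hypothetical. It rewrites `[Source N]` tags to 1, 2, 3... in order of first appearance with a single regex pass, and drops tags that point outside the retrieved set.

```python
import re
from typing import Dict, Tuple

SOURCE_TAG = re.compile(r"\[Source (\d+)\]")


def renumber_citations(text: str, num_sources: int) -> Tuple[str, Dict[int, int]]:
    """Renumber [Source N] tags to 1, 2, 3... by order of first appearance.

    Tags whose number falls outside 1..num_sources are removed.
    Returns the rewritten text and the original -> new number mapping.
    """
    mapping: Dict[int, int] = {}

    def replace(match: re.Match) -> str:
        original = int(match.group(1))
        if not 1 <= original <= num_sources:
            return ""  # citation does not correspond to a retrieved chunk
        # setdefault keeps the first number assigned to this source
        new_num = mapping.setdefault(original, len(mapping) + 1)
        return f"[Source {new_num}]"

    return SOURCE_TAG.sub(replace, text), mapping


answer = "Cats purr [Source 3]. Dogs bark [Source 1][Source 9]. Cats nap [Source 3]."
new_text, mapping = renumber_citations(answer, num_sources=5)
print(new_text)
print(mapping)
```

Using a replacement callback means every tag is visited exactly once, so a renumbered tag can never be mistaken for an original one.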

🚀 QUICK START

Option 1: Full Stack (Recommended)

```bash
# Start both backend and frontend
./start_full_app.sh
```

Option 2: Individual Services

```bash
# Terminal 1: Start Backend
./start_backend.sh

# Terminal 2: Start Frontend
./start_frontend.sh
```

Access Points

  • 🌐 Frontend App: http://localhost:3000
  • 📖 Backend API: http://localhost:8000
  • 📚 API Documentation: http://localhost:8000/docs

📋 PREREQUISITES

System Requirements

  • Python: 3.8+ (for backend)
  • Node.js: 16+ (for frontend)
  • npm: 8+ (for package management)

External Services

  • Ollama: Local LLM inference

```bash
# Install Ollama from https://ollama.com/
ollama pull llama3.2:latest
ollama serve
```

📦 INSTALLATION

1. Backend Setup

```bash
# Create virtual environment
python -m venv rag_env
source rag_env/bin/activate  # Windows: rag_env\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

2. Frontend Setup

```bash
# Install Node.js dependencies
npm install
```

3. Environment Configuration

Create environment variables:

```bash
# For React (can be set in scripts)
export REACT_APP_API_URL=http://localhost:8000
```

🏗️ PROJECT STRUCTURE

📁 Docs_RAG/
├── 🐍 Backend (Python/FastAPI)
│   ├── main.py                    # FastAPI application
│   ├── processing.py              # Document processing
│   ├── query_data.py              # RAG pipeline
│   ├── citation_manager.py        # Citation management
│   ├── document_service.py        # Document utilities
│   └── requirements.txt           # Python dependencies
│
├── ⚛️ Frontend (React)
│   ├── src/
│   │   ├── components/
│   │   │   ├── chat/              # Chat interface
│   │   │   │   ├── ChatInterface.js
│   │   │   │   ├── MessageBubble.js
│   │   │   │   └── CitationTooltip.js
│   │   │   ├── ui/                # Reusable UI components
│   │   │   │   ├── Button.js
│   │   │   │   ├── LoadingSpinner.js
│   │   │   │   └── DocumentUploader.js
│   │   │   └── MainInterface.js   # Main application layout
│   │   ├── contexts/
│   │   │   └── AppContext.js      # Global state management
│   │   ├── services/
│   │   │   └── api.js             # API communication
│   │   └── App.js                 # Root component
│   ├── public/
│   └── package.json               # Node.js dependencies
│
├── 🚀 Scripts
│   ├── start_backend.sh           # Backend startup
│   ├── start_frontend.sh          # Frontend startup
│   └── start_full_app.sh          # Full stack startup
│
└── 📊 Data
    ├── data/                      # PDF document storage
    └── chroma/                    # Vector database

🔧 COMPONENT ARCHITECTURE

React Components

Core Layout

  • MainInterface: Main application shell with sidebar and tabs
  • ChatInterface: Real-time chat with RAG pipeline
  • DocumentUploader: Drag & drop file upload with progress

UI Components

  • Button: Configurable button with variants and loading states
  • LoadingSpinner: Reusable loading indicators
  • MessageBubble: Chat messages with citation integration

Citation System

  • CitationTooltip: Interactive citation previews
  • Smart parsing of [Source X] patterns
  • Click-to-navigate functionality
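
Before citations can be rendered as interactive elements, a response has to be split into plain text and citation tokens. The sketch below shows that parsing step in Python for illustration. The project's actual components are JavaScript files that are not shown in this README, and `split_citations` is a hypothetical name.

```python
import re
from typing import List, Tuple

# One capturing group so re.split keeps the citation numbers in the output.
SOURCE_TAG = re.compile(r"\[Source (\d+)\]")


def split_citations(text: str) -> List[Tuple[str, str]]:
    """Split text into ("text", chunk) and ("citation", number) segments."""
    parts = SOURCE_TAG.split(text)
    segments: List[Tuple[str, str]] = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            segments.append(("citation", part))  # odd slots hold captured numbers
        elif part:
            segments.append(("text", part))  # skip empty strings between tags
    return segments


for kind, value in split_citations("Revenue grew [Source 1][Source 2] last year."):
    print(kind, repr(value))
```

A renderer could then map each `"citation"` segment to a clickable element and each `"text"` segment to ordinary text.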

API Endpoints

Document Management

```http
POST   /documents/upload        # Upload and process PDFs
GET    /documents               # List all documents
GET    /documents/{id}/info     # Get document details
DELETE /documents/clear         # Clear all documents
```

RAG Operations

```http
POST   /query                       # Perform RAG query
GET    /citations/{id}/navigate     # Get citation navigation
```

Utilities

```http
GET    /health                  # Health check
GET    /documents/{id}/base64   # Get PDF as base64
```
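
The endpoints can also be called from Python using only the standard library. The following is a rough sketch under stated assumptions: the `{"query": ...}` request body is taken from the curl example in the Testing section, the response schema is not documented in this README (so the reply is returned as decoded JSON without interpretation), and the helper names `build_query_request` and `ask` are hypothetical.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8000"  # default backend address from this README


def build_query_request(question: str, base_url: str = API_BASE_URL) -> urllib.request.Request:
    """Build a POST /query request carrying a JSON body of the form {"query": ...}."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 60.0):
    """Send the question to a running backend and return the decoded JSON reply."""
    with urllib.request.urlopen(build_query_request(question), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server; calling ask() does.
request = build_query_request("What does the report conclude?")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect without a running backend.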

🎨 UI/UX FEATURES

Modern Design

  • Tailwind CSS: Utility-first styling
  • Framer Motion: Smooth animations and transitions
  • Responsive Layout: Mobile-friendly design
  • Collapsible Sidebar: Space-efficient navigation

User Experience

  • Real-time Feedback: Loading states and progress indicators
  • Toast Notifications: Success/error messages
  • Keyboard Shortcuts: Cmd/Ctrl+Enter to send messages
  • Auto-scroll: Chat automatically scrolls to new messages

Accessibility

  • Semantic HTML: Proper ARIA labels and roles
  • Focus Management: Keyboard navigation support
  • Screen Reader: Compatible with assistive technologies

📚 USAGE GUIDE

1. Document Upload

  1. Navigate to the "Upload" tab
  2. Drag & drop PDF files or click to browse
  3. Monitor upload progress
  4. Files are automatically processed and vectorized

2. Querying Documents

  1. Switch to "Chat" tab
  2. Type your question in the input field
  3. Press Enter or click Send
  4. View results with interactive citations

3. Citation Navigation

  1. Click on [Source X] tags in responses
  2. View citation details in tooltips
  3. Navigate to specific document pages
  4. Explore source content in PDF viewer

4. Document Management

  1. Use "Documents" tab to view uploaded files
  2. Click "View" to open PDF viewer
  3. Clear individual or all documents
  4. Monitor document status and metadata

🔧 CONFIGURATION

Backend Configuration

```python
# main.py - CORS settings
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```

Frontend Configuration

```javascript
// src/services/api.js - API base URL
const API_BASE_URL = process.env.REACT_APP_API_URL || 'http://localhost:8000';
```

Processing Parameters

```python
# processing.py - Chunking configuration
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,            # Adjust for your documents
    chunk_overlap=80,          # Maintain context overlap
    separators=["\n\n", "\n"]  # Paragraph-aware splitting
)
```
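
To show what paragraph-aware splitting means in practice, here is a simplified stand-in written in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses: it splits only on blank lines, applies no overlap, and the function name `chunk_by_paragraph` is hypothetical. It packs whole paragraphs into chunks up to a size limit.

```python
from typing import List


def chunk_by_paragraph(text: str, chunk_size: int = 800) -> List[str]:
    """Greedily pack whole paragraphs into chunks of at most chunk_size characters.

    A paragraph longer than chunk_size becomes its own oversized chunk rather
    than being cut in the middle.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= chunk_size or not current:
            current = candidate  # still fits, or first paragraph of a new chunk
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for chunk in chunk_by_paragraph(sample, chunk_size=40):
    print(repr(chunk))
```

With `chunk_size=40`, the first two paragraphs share a chunk and the third starts a new one, so no paragraph is split across chunks.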

🚀 DEPLOYMENT

Development

```bash
# Full development stack
./start_full_app.sh
```

Production

```bash
# Backend (production)
uvicorn main:app --host 0.0.0.0 --port 8000

# Frontend (build)
npm run build
npx serve -s build -l 3000
```

🔍 TROUBLESHOOTING

Common Issues

Backend Connection

```bash
# Check if FastAPI is running
curl http://localhost:8000/health

# Restart backend
./start_backend.sh
```

Frontend Issues

```bash
# Clear node modules and reinstall
rm -rf node_modules package-lock.json
npm install

# Restart development server
npm start
```

Ollama Problems

```bash
# Ensure Ollama is running
ollama serve

# Pull/update model
ollama pull llama3.2:latest
```

Performance Optimization

Backend

  • Adjust chunk sizes for your document types
  • Optimize embedding model selection
  • Configure ChromaDB persistence settings

Frontend

  • Enable React production build
  • Implement code splitting for large components
  • Optimize image and asset loading

🌟 ADVANCED FEATURES

Citation Enhancement

  • Source Tracking: Full document lineage
  • Page References: Exact page and paragraph locations
  • Relevance Scoring: ML-based relevance metrics
  • Content Previews: Rich tooltip content

PDF Integration

  • Text Layer Highlighting: Precise text selection
  • Chunk Navigation: Jump to specific content sections
  • Cross-platform Support: Works on all modern browsers
  • Fallback Strategies: Multiple highlighting approaches

State Management

  • Centralized Store: React Context with useReducer
  • Optimistic Updates: Immediate UI feedback
  • Error Recovery: Graceful error handling
  • Connection Monitoring: Real-time backend status

🧪 TESTING

Backend Testing

```bash
# Test API health
curl http://localhost:8000/health

# Test document upload
curl -X POST -F "files=@document.pdf" http://localhost:8000/documents/upload

# Test query
curl -X POST -H "Content-Type: application/json" \
  -d '{"query": "test question"}' \
  http://localhost:8000/query
```

Frontend Testing

```bash
# Run React tests
npm test

# Manual testing
# 1. Upload documents via UI
# 2. Send queries in chat
# 3. Test citation navigation
```

📖 API DOCUMENTATION

Visit http://localhost:8000/docs for interactive API documentation with:

  • Request/Response Schemas
  • Try It Out functionality
  • Model Definitions
  • Error Codes reference

🤝 CONTRIBUTING

  1. Fork the repository
  2. Create a feature branch
  3. Implement your changes
  4. Test thoroughly
  5. Submit a pull request

Development Guidelines

  • Follow React hooks patterns
  • Use TypeScript for new components (future enhancement)
  • Maintain FastAPI async patterns
  • Add comprehensive error handling

📄 LICENSE

MIT License - see LICENSE file for details.

🙏 ACKNOWLEDGMENTS

  • React Team: For the amazing frontend framework
  • FastAPI: For the modern Python web framework
  • ChromaDB: For vector database capabilities
  • Tailwind CSS: For utility-first styling
  • Framer Motion: For smooth animations
  • Ollama: For local LLM integration
  • LangChain: For document processing utilities

🚀 Ready to explore intelligent document analysis with modern web technologies!

🎯 TRANSFORMATION SUMMARY

This project successfully transforms a Streamlit-based RAG pipeline into a modern, scalable React + FastAPI architecture:

Before (Streamlit)

  • ❌ Server-side rendering
  • ❌ Limited customization
  • ❌ Monolithic architecture
  • ❌ Basic UI components

After (React + FastAPI)

  • ✅ Client-side React application
  • ✅ Fully customizable UI/UX
  • ✅ Decoupled frontend and backend services
  • ✅ Modern component system
  • ✅ Real-time interactions
  • ✅ Professional deployment ready

The new architecture maintains all original RAG functionality while providing a superior user experience and development workflow.

Owner

  • Name: Manish Taneja
  • Login: mantanz
  • Kind: organization
  • Location: India

Citation (citation_manager.py)

# citation_manager.py

import re
from typing import List, Tuple
from langchain_core.documents import Document

from citation_models import Citation, RenumberedCitation, ProcessedLLMResponse
from citation_utils import strip_html_tags # Assuming you have this helper

class CitationManager:
    """
    Manages the creation, formatting, and processing of citations for the RAG pipeline.
    """
    def __init__(self, search_results: List[Tuple[Document, float]], k_chunks: int):
        self.k_chunks = k_chunks
        self.all_citations: List[Citation] = self._create_initial_citations(search_results)
        self.lookup = {c.source_num: c for c in self.all_citations}

    def _create_initial_citations(self, search_results: List[Tuple[Document, float]]) -> List[Citation]:
        """Creates the initial list of Citation objects from ChromaDB results."""
        citations = []
        for i, (doc, score) in enumerate(search_results, 1):
            source_id = doc.metadata.get("id", "Unknown")
            source_parts = source_id.split(":")
            
            # Parse the new format: file:page:paragraph:chunk
            if len(source_parts) >= 4:
                file_path, page_num, paragraph_num, chunk_num = source_parts[0], source_parts[1], source_parts[2], source_parts[3]
                filename = file_path.split("/")[-1] if "/" in file_path else file_path
                # Create a more informative page reference
                page_ref = f"{page_num} (¶{paragraph_num}.{chunk_num})"
            elif len(source_parts) >= 3:
                file_path, page_num, paragraph_num = source_parts[0], source_parts[1], source_parts[2]
                filename = file_path.split("/")[-1] if "/" in file_path else file_path
                page_ref = f"{page_num} (¶{paragraph_num})"
            elif len(source_parts) >= 2:
                file_path, page_num = source_parts[0], source_parts[1]
                filename = file_path.split("/")[-1] if "/" in file_path else file_path
                page_ref = page_num
            else:
                filename, page_ref = "Unknown Document", "N/A"
            
            clean_content = strip_html_tags(doc.page_content)
            
            # Remove PDF filename references from content
            clean_content = self._remove_filename_references(clean_content, filename)
            
            citations.append(
                Citation(
                    source_num=i,
                    filename=filename,
                    page=page_ref,  # Now includes paragraph info
                    source_id=source_id,
                    relevance_score=round(1 - score, 3) if score is not None else None,
                    content=clean_content
                )
            )
        return citations

    def _remove_filename_references(self, content: str, filename: str) -> str:
        """Remove filename and common document metadata from content."""
        
        # Remove file extension and get base name
        base_filename = filename.replace('.pdf', '').replace('.txt', '').replace('.docx', '')
        
        # Remove various patterns that might include filename
        patterns_to_remove = [
            # Exact filename matches at end of content
            rf'\s*{re.escape(filename)}\s*$',
            rf'\s*{re.escape(base_filename)}\s*$',
            # Common patterns with numbers (like "effective headline 1311")
            rf'\s*{re.escape(base_filename)}\s*\d+\s*$',
            # Remove trailing numbers that might be page numbers or file references
            r'\s*\d{3,4}\s*$',  # Remove 3-4 digit numbers at end
            # Remove common document footers
            r'\s*Page \d+.*$',
            r'\s*p\.\s*\d+.*$',
            r'\s*\d+/\d+\s*$',  # Page numbers like "1/10"
            # Remove trailing whitespace and cleanup
            r'\s+$'
        ]
        
        cleaned_content = content
        for pattern in patterns_to_remove:
            cleaned_content = re.sub(pattern, '', cleaned_content, flags=re.IGNORECASE)
        
        return cleaned_content.strip()

    def get_llm_context(self) -> str:
        """Formats the context string to be passed to the LLM."""
        context_parts = [f"[Source {c.source_num}] {c.content}" for c in self.all_citations]
        return "\n\n---\n\n".join(context_parts)

    def process_response(self, response_text: str) -> ProcessedLLMResponse:
        """
        Parses the LLM response, identifies used citations, renumbers them,
        and returns a structured result.
        """
        # Find all cited source numbers, preserving order of appearance
        cited_nums_str = re.findall(r'\[Source (\d+)\]', response_text)
        cited_original_nums = [int(num) for num in cited_nums_str]
        
        # Get unique, valid citations in order of first appearance
        used_citations_ordered = []
        seen = set()
        for num in cited_original_nums:
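            # Require a retrieved chunk for this number; avoids a KeyError when
            # fewer than k_chunks results were returned
            if num not in seen and num <= self.k_chunks and num in self.lookup: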
                seen.add(num)
                used_citations_ordered.append(self.lookup[num])

        if not used_citations_ordered:
            return ProcessedLLMResponse(renumbered_response_text=response_text, used_citations=[])

        # Create a mapping from original numbers to new sequential numbers (1, 2, 3...)
        renumber_map = {citation.source_num: new_num for new_num, citation in enumerate(used_citations_ordered, 1)}

        # Renumber the response text using a safe, two-step replacement
        renumbered_text = response_text
        
        # Step 1: Replace valid citations with temporary, unique placeholders
        for original_num, new_num in renumber_map.items():
            renumbered_text = re.sub(
                rf'\[Source {original_num}\]',  # raw f-string: \[ is a regex escape, not a string escape
                f"__TEMP_SOURCE_{new_num}__",
                renumbered_text
            )
        
        # Step 2: Remove any remaining invalid citations (outside our valid range)
        # This handles cases where LLM generates citations beyond k_chunks
        for num in set(cited_original_nums):
            if num not in renumber_map:
                renumbered_text = re.sub(rf'\[Source {num}\]', '', renumbered_text)
        
        # Step 2.5: Remove original citations from source documents (like [23], [12], etc.)
        # These are citations that existed in the original documents, not our RAG citations
        renumbered_text = re.sub(r'\[(\d+)\]', '', renumbered_text)
        
        # Step 3: Replace placeholders with final, renumbered citation format
        for original_num, new_num in renumber_map.items():
            renumbered_text = re.sub(
                f"__TEMP_SOURCE_{new_num}__",
                f"[Source {new_num}]",
                renumbered_text
            )
        
        # Step 4: Clean up any extra whitespace left by removed citations
        renumbered_text = re.sub(r'\s+', ' ', renumbered_text).strip()
        
        # Create the final list of renumbered citation objects
        final_citations = []
        for new_num, original_citation in enumerate(used_citations_ordered, 1):
            final_citations.append(
                RenumberedCitation(
                    new_source_num=new_num,
                    original_source_num=original_citation.source_num,
                    filename=original_citation.filename,
                    page=original_citation.page,
                    relevance_score=original_citation.relevance_score,
                    content=original_citation.content
                )
            )
        
        return ProcessedLLMResponse(
            renumbered_response_text=renumbered_text,
            used_citations=final_citations
        )
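
The three ID-parsing branches in `_create_initial_citations` differ only in how many colon-separated fields are present. One possible way to factor them is sketched below as a standalone helper. The name `parse_source_id` is hypothetical, and like the original code it assumes the file path itself contains no colon.

```python
from typing import Tuple


def parse_source_id(source_id: str) -> Tuple[str, str]:
    """Return (filename, page_ref) for an ID shaped like file:page[:paragraph[:chunk]]."""
    parts = source_id.split(":")
    if len(parts) < 2:
        return "Unknown Document", "N/A"

    filename = parts[0].split("/")[-1]  # keep only the last path component
    page, extra = parts[1], parts[2:4]  # extra holds paragraph and chunk, if present
    if not extra:
        return filename, page
    return filename, f"{page} (¶{'.'.join(extra)})"


print(parse_source_id("data/report.pdf:3:2:0"))
print(parse_source_id("data/report.pdf:3"))
print(parse_source_id("Unknown"))
```

The helper produces the same page references as the branches above: `3 (¶2.0)` for four fields, `3 (¶2)` for three, and the bare page number for two.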

GitHub Events

Total
  • Member event: 2
  • Push event: 5
  • Create event: 2
Last Year
  • Member event: 2
  • Push event: 5
  • Create event: 2

Dependencies

package-lock.json npm
  • 1330 dependencies
package.json npm
  • @types/react ^18.2.15 development
  • @types/react-dom ^18.2.7 development
  • @testing-library/jest-dom ^5.16.4
  • @testing-library/react ^13.3.0
  • @testing-library/user-event ^13.5.0
  • autoprefixer ^10.4.14
  • lucide-react ^0.263.1
  • pdfjs-dist ^3.11.174
  • postcss ^8.4.24
  • react ^18.2.0
  • react-dom ^18.2.0
  • react-pdf ^7.5.1
  • react-router-dom ^6.8.1
  • react-scripts 5.0.1
  • tailwindcss ^3.3.2
requirements.txt pypi
  • chromadb ==0.4.18
  • ollama ==0.1.7
  • pathlib2 >=2.3.0
  • pdfplumber ==0.10.3
  • psutil ==5.9.6
  • sentence-transformers ==2.2.2
  • torch >=1.9.0
  • tqdm ==4.66.1
  • transformers >=4.21.0
  • urllib3 >=1.26.0