https://github.com/carnotresearch/new-qdoc

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: carnotresearch
  • Language: Python
  • Default Branch: main
  • Size: 497 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 11 months ago
Metadata Files
Readme

readme.md

icarKno - Document Intelligence Platform

An AI-powered document analysis platform that enables intelligent querying, summarization, and knowledge extraction from various document formats.

✨ Features

📄 Document Processing

  • Multi-format Support: PDF, DOCX, DOC, TXT, PPTX, CSV, XLSX
  • Intelligent Chunking: Hierarchical text segmentation for optimal retrieval
  • Vector Embeddings: Advanced semantic search using OpenAI embeddings
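
As a rough illustration of hierarchical segmentation, the sketch below splits text on paragraph boundaries first and falls back to sentence boundaries only when a paragraph exceeds a size budget. The function name `split_hierarchically` and the `max_chars` budget are assumptions made for this example; the platform's actual chunking logic is not described here and may differ.

```python
import re
from typing import List


def split_hierarchically(text: str, max_chars: int = 200) -> List[str]:
    """Split text into chunks, preferring paragraph, then sentence boundaries."""
    chunks: List[str] = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())  # normalise internal whitespace
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)
            continue
        # Paragraph is too long: pack whole sentences greedily into chunks.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence, which trades a strict size limit for readable chunks.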

🔍 Smart Search & Retrieval

  • Hybrid Search: Combines keyword and vector search for best results
  • Contextual Answers: AI-powered responses with source citations
  • Creative Reasoning: Advanced multi-step analysis for complex queries
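
One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is illustrative only: this README does not say which fusion method the platform uses, and the function name and the constant `k = 60` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]  # hypothetical keyword search results
vector_hits = ["b", "d", "a"]   # hypothetical vector search results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear near the top of both lists accumulate the largest scores, so they rise above documents found by only one retriever.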

🧠 AI Capabilities

  • Document Summarization: Extractive and abstractive summaries
  • Question Answering: Natural language queries with source attribution
  • Knowledge Graphs: Visual representation of document relationships
  • Multi-language: Support for 23+ languages including Hindi, Tamil, Bengali

📊 Data Integration

  • Structured Data: Query CSV/Excel files using natural language
  • SQL Generation: Automatic conversion of questions to database queries
  • Hybrid Analysis: Combine document insights with data analytics
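
To make the structured-data flow concrete, the sketch below loads CSV text into an in-memory SQLite table and runs a SQL query against it. The query string stands in for whatever SQL a language model would generate from the user's question; the helper name, the sample data, and the use of SQLite are all assumptions for illustration, not the platform's confirmed implementation.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text: str, table: str) -> sqlite3.Connection:
    """Load CSV text into an in-memory SQLite table (every column stored as TEXT)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    return conn


SALES_CSV = "region,amount\nnorth,10\nsouth,25\nnorth,5\n"  # hypothetical upload
conn = load_csv_into_sqlite(SALES_CSV, "sales")

# Stand-in for model-generated SQL answering "What is the total amount per region?"
generated_sql = (
    "SELECT region, SUM(CAST(amount AS INTEGER)) "
    "FROM sales GROUP BY region ORDER BY region"
)
result = conn.execute(generated_sql).fetchall()
```

In a real deployment, generated SQL should be treated as untrusted input, for example by running it on a read-only connection.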

🌐 Integration & Access

  • REST API: Complete programmatic access
  • WhatsApp Bot: Chat with your documents via WhatsApp
  • Web Interface: User-friendly document upload and query interface
  • Authentication: JWT-based secure access with subscription tiers

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • Elasticsearch cluster (cloud or self-hosted)
  • OpenAI API access
  • MySQL database
  • MongoDB instance

Installation

  1. Clone the repository

```bash
git clone https://github.com/your-org/icarkno.git
cd icarkno
```

  2. Create virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

  3. Install dependencies

```bash
pip install -r requirements.txt
```

  4. Environment setup

```bash
cp .env.example .env
# Edit .env with your configuration
```

  5. Configure environment variables

```env
# Elasticsearch
ES_CLOUD_ID=your-elasticsearch-cloud-id
ES_API_KEY=your-elasticsearch-api-key

# OpenAI
OPENAI_API_KEY=your-openai-api-key

# Database
MONGO_URL=mongodb://localhost:27017/icarkno
MYSQL_HOST=localhost
MYSQL_USERNAME=your-username
MYSQL_PASSWORD=your-password

# Security
JWT_SECRET_KEY=your-jwt-secret
SECRET_KEY=your-flask-secret

# Neo4j (for knowledge graphs)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
```

  6. Run the application

```bash
python run.py
```

The API will be available at http://localhost:5000

📖 Usage

Basic Document Processing

```bash
# Upload documents
curl -X POST http://localhost:5000/upload \
  -F "files=@document.pdf" \
  -F "token=your-jwt-token" \
  -F "sessionId=session123"

# Query documents
curl -X POST http://localhost:5000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "token": "your-jwt-token",
    "sessionId": "session123",
    "message": "What are the key findings?",
    "context": true
  }'
```

Free Trial Mode

```bash
# Upload for trial users
curl -X POST http://localhost:5000/freeTrial \
  -F "files=@document.pdf" \
  -F "fingerprint=unique-browser-fingerprint"

# Query in trial mode
curl -X POST http://localhost:5000/trialAsk \
  -H "Content-Type: application/json" \
  -d '{
    "fingerprint": "unique-browser-fingerprint",
    "message": "Summarize this document"
  }'
```

Advanced Features

  • Creative Mode: Add "mode": "creative" for multi-step reasoning
  • Knowledge Graphs: Use /create_graph endpoint for visual relationships
  • Multi-language: Set "outputLanguage": 1 for Hindi responses
  • Data Queries: Upload CSV/Excel and query with natural language
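
The options above can be combined in a single `/ask` request. The following sketch builds such a request from Python using only the standard library. It assumes that `mode` and `outputLanguage` are sent as top-level fields of the same JSON body as the other `/ask` parameters; the helper name `build_ask_request` is hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:5000"  # default address from the Quick Start


def build_ask_request(
    token: str,
    session_id: str,
    message: str,
    *,
    creative: bool = False,
    output_language: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /ask request with optional advanced fields."""
    payload = {
        "token": token,
        "sessionId": session_id,
        "message": message,
        "context": True,
    }
    if creative:
        payload["mode"] = "creative"  # multi-step reasoning
    if output_language is not None:
        payload["outputLanguage"] = output_language  # 1 selects Hindi
    return urllib.request.Request(
        f"{BASE_URL}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ask_request(
    "your-jwt-token", "session123", "What are the key findings?",
    creative=True, output_language=1,
)
```

Passing `request` to `urllib.request.urlopen` would send it to a running server; the response format is not documented in this README.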

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Client Apps   │    │    Flask API     │    │   AI Services   │
│                 │────│                  │────│                 │
│ • Web Interface │    │ • Authentication │    │ • OpenAI GPT    │
│ • WhatsApp Bot  │    │ • Rate Limiting  │    │ • Embeddings    │
│ • Mobile App    │    │ • File Processing│    │ • Summarization │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                       ┌────────┴────────┐
                       │   Data Layer    │
                       │                 │
                       │ • Elasticsearch │
                       │ • MongoDB       │
                       │ • MySQL         │
                       │ • Neo4j         │
                       └─────────────────┘

📁 Project Structure

icarkno/
├── app/                  # Flask application
│   ├── __init__.py       # App factory
│   ├── api/              # API blueprints
│   ├── services/         # Business logic
│   └── config.py         # Configuration
├── controllers/          # Legacy controllers
├── elastic/              # Elasticsearch integration
├── utils/                # Utilities
├── webhook/              # WhatsApp integration
├── requirements.txt      # Dependencies
├── run.py                # Application entry point
└── README.md             # This file

🔧 Configuration

Environment Variables

| Variable         | Description               | Required |
| ---------------- | ------------------------- | -------- |
| ES_CLOUD_ID      | Elasticsearch Cloud ID    | Yes      |
| ES_API_KEY       | Elasticsearch API Key     | Yes      |
| OPENAI_API_KEY   | OpenAI API Key            | Yes      |
| MONGO_URL        | MongoDB connection string | Yes      |
| MYSQL_HOST       | MySQL host                | Yes      |
| JWT_SECRET_KEY   | JWT signing key           | Yes      |
| NEO4J_URI        | Neo4j connection URI      | Optional |
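
A startup check along the following lines can surface missing configuration early. This is a minimal sketch based only on the table above; the function names are hypothetical and the application's own `config.py` may handle this differently.

```python
import os
from typing import Dict, List, Mapping

# Names and required/optional status are taken from the table above.
REQUIRED_VARS = [
    "ES_CLOUD_ID",
    "ES_API_KEY",
    "OPENAI_API_KEY",
    "MONGO_URL",
    "MYSQL_HOST",
    "JWT_SECRET_KEY",
]
OPTIONAL_VARS = ["NEO4J_URI"]


def missing_required(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect settings from the environment, failing fast if any are missing."""
    missing = missing_required(env)
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Optional variables are included only when they are set.
    settings.update({name: env[name] for name in OPTIONAL_VARS if env.get(name)})
    return settings
```

Calling `load_settings()` before starting the server reports every missing variable in one error instead of failing on the first one at request time.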

Supported File Formats

  • Documents: PDF, DOCX, DOC, TXT, PPTX
  • Data: CSV, XLSX, XLS
  • Max file size: 50MB per file
  • Supported languages: 23+
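
The limits above can be checked on the client before uploading. The sketch below is illustrative: the function name is hypothetical, and it assumes "50MB" means 50 × 1024 × 1024 bytes, which the README does not specify.

```python
from pathlib import Path

DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".pptx"}
DATA_EXTENSIONS = {".csv", ".xlsx", ".xls"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50MB" interpreted as 50 MiB


def classify_upload(filename: str, size_bytes: int) -> str:
    """Return "document" or "data"; raise ValueError if the file would be rejected."""
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"{filename} exceeds the 50MB per-file limit")
    extension = Path(filename).suffix.lower()
    if extension in DOCUMENT_EXTENSIONS:
        return "document"
    if extension in DATA_EXTENSIONS:
        return "data"
    raise ValueError(f"Unsupported file format: {extension or '(none)'}")
```

For example, `classify_upload("report.PDF", 1024)` returns `"document"`, while a 60 MB file or a `.exe` file raises `ValueError`.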

🔌 API Reference

See API Documentation for detailed endpoint information.

Key Endpoints

  • POST /upload - Upload documents
  • POST /ask - Query documents
  • POST /freeTrial - Trial mode upload
  • POST /trialAsk - Trial mode queries
  • POST /updatepayment - Manage subscriptions
  • GET /healthcheck - Health status

🧪 Testing

Unit Tests

```bash
python -m pytest tests/
```

Integration Tests

```bash
# Test document processing
python uploadtoelastic.py --file test.pdf --index test_index

# Test querying
python queryelastic.py --index test_index --query "test question"
```

🚀 Deployment

Docker (Recommended)

```bash
docker build -t icarkno .
docker run -p 5000:5000 --env-file .env icarkno
```

Production Setup

```bash
# Using Gunicorn
gunicorn --bind 0.0.0.0:5000 --workers 4 run:app

# With SSL
gunicorn --bind 0.0.0.0:443 --certfile cert.pem --keyfile key.pem run:app
```

📊 Performance

  • Processing Speed: ~2-5 seconds per page
  • Concurrent Users: Supports 100+ simultaneous users
  • Storage: Elasticsearch scales horizontally
  • Languages: 23+ supported with translation API
  • Rate Limits: Configurable per user tier
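
To show what per-tier rate limiting can look like, here is a minimal fixed-window limiter. The tier names, the per-minute limits, and the class itself are hypothetical; the README does not document the real tiers or values, and the project lists `flask-limiter` as a dependency, so its actual mechanism is likely different.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical tiers and requests-per-minute limits, for illustration only.
TIER_LIMITS = {"trial": 5, "paid": 60}
WINDOW_SECONDS = 60


class TierRateLimiter:
    """Fixed-window request counter keyed by user, with limits set per tier."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._windows: Dict[str, Tuple[int, int]] = {}  # user -> (window, count)

    def allow(self, user_id: str, tier: str) -> bool:
        window = int(self._clock() // WINDOW_SECONDS)
        start, count = self._windows.get(user_id, (window, 0))
        if start != window:
            start, count = window, 0  # a new window has begun: reset the counter
        if count >= TIER_LIMITS[tier]:
            self._windows[user_id] = (start, count)
            return False
        self._windows[user_id] = (start, count + 1)
        return True
```

The clock is injectable so the limiter can be tested without waiting for real time to pass.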

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is proprietary software. See LICENSE for details.

🆘 Support

🙏 Acknowledgments


Made with ❤️ by Carnot Research

Owner

  • Name: Carnot Research Pvt. Ltd.
  • Login: carnotresearch
  • Kind: organization

GitHub Events

Total
  • Member event: 2
  • Push event: 17
  • Create event: 2
Last Year
  • Member event: 2
  • Push event: 17
  • Create event: 2

Committers

Last synced: 12 months ago

All Time
  • Total Commits: 18
  • Total Committers: 1
  • Avg Commits per committer: 18.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 18
  • Committers: 1
  • Avg Commits per committer: 18.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Pranav p****e@g****m 18

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

requirements.txt pypi
  • Flask ==2.3.2
  • PyJWT ==2.8.0
  • beautifulsoup4 ==4.12.2
  • bert-extractive-summarizer ==0.10.1
  • docx2txt ==0.8
  • elasticsearch ==8.11.0
  • fastapi ==0.104.1
  • flask-cors ==4.0.0
  • flask-limiter ==2.7.0
  • langchain ==0.1.0
  • langchain-community ==0.0.13
  • langchain-elasticsearch ==0.0.2
  • langchain-experimental ==0.0.52
  • langchain-openai ==0.0.5
  • mysql-connector-python ==8.2.0
  • openai ==1.1.9
  • pdfplumber ==0.10.2
  • pydantic ==2.4.2
  • pymongo ==4.6.1
  • pypdf ==3.15.1
  • pypdfium2 ==4.10.0
  • python-docx ==0.8.11
  • python-dotenv ==1.0.0
  • pyvis ==0.3.2
  • requests ==2.31.0
  • sqlalchemy ==2.0.23
  • torch ==2.1.1
  • uvicorn ==0.27.0