https://github.com/carnotresearch/new-qdoc
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: carnotresearch
- Language: Python
- Default Branch: main
- Size: 497 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
readme.md
icarKno - Document Intelligence Platform
An AI-powered document analysis platform that enables intelligent querying, summarization, and knowledge extraction from various document formats.
✨ Features
📄 Document Processing
- Multi-format Support: PDF, DOCX, DOC, TXT, PPTX, CSV, XLSX
- Intelligent Chunking: Hierarchical text segmentation to improve retrieval quality
- Vector Embeddings: Advanced semantic search using OpenAI embeddings
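The README does not describe the chunking algorithm. As a rough illustration of hierarchical segmentation, the sketch below tries coarse separators first (paragraphs) and only falls back to finer ones (lines, sentences, words) when a piece is still too long. The separator list and the `max_chars` limit are assumptions, not the project's settings.

```python
from typing import List

# Assumed separator hierarchy, coarsest first: paragraphs, lines, sentences, words.
SEPARATORS = ["\n\n", "\n", ". ", " "]


def split_hierarchical(text: str, max_chars: int = 500, level: int = 0) -> List[str]:
    """Split text into chunks of at most max_chars, preferring coarse boundaries."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if level >= len(SEPARATORS):
        # No separator left: cut fixed-width slices.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep = SEPARATORS[level]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small neighbouring pieces
            continue
        # Close the current chunk; the separator at this boundary is dropped.
        chunks.append(current.strip())
        if len(piece) > max_chars:
            # This piece alone is too long: descend to the next, finer separator.
            chunks.extend(split_hierarchical(piece, max_chars, level + 1))
            current = ""
        else:
            current = piece
    chunks.append(current.strip())
    return [chunk for chunk in chunks if chunk]
```

Each resulting chunk would then be embedded and indexed separately; merging small neighbours keeps chunks close to the size limit instead of producing many tiny fragments.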
🔍 Smart Search & Retrieval
- Hybrid Search: Combines keyword and vector search to improve result relevance
- Contextual Answers: AI-powered responses with source citations
- Creative Reasoning: Advanced multi-step analysis for complex queries
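The README does not say how keyword and vector results are merged. One simple, widely used option is reciprocal rank fusion (RRF), sketched below over two ranked lists of document IDs. Whether icarKno uses RRF, score normalisation, or another strategy is not stated; `k = 60` is a commonly used default, assumed here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1, so documents found by both the keyword and the
    vector search tend to rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorting is stable).
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

For example, fusing keyword hits `["doc3", "doc1", "doc7"]` with vector hits `["doc1", "doc5", "doc3"]` ranks `doc1` first, because it sits near the top of both lists.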
🧠 AI Capabilities
- Document Summarization: Extractive and abstractive summaries
- Question Answering: Natural language queries with source attribution
- Knowledge Graphs: Visual representation of document relationships
- Multi-language: Support for 23+ languages including Hindi, Tamil, Bengali
📊 Data Integration
- Structured Data: Query CSV/Excel files using natural language
- SQL Generation: Automatic conversion of questions to database queries
- Hybrid Analysis: Combine document insights with data analytics
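The question-to-SQL step is performed by a language model and is not reproduced here. The sketch below only illustrates the surrounding plumbing, under stated assumptions: a CSV is loaded into an in-memory SQLite table (the project's dependencies list SQLAlchemy and MySQL, not SQLite), and a hard-coded query stands in for what a model might generate. The table name `data` and the column names are hypothetical.

```python
import csv
import io
import sqlite3
from typing import List, Tuple


def load_csv_into_sqlite(csv_text: str, table: str = "data") -> sqlite3.Connection:
    """Load CSV text into an in-memory SQLite table with TEXT columns."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    # Column names come straight from the CSV header; real code should validate them.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)
    return conn


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> List[Tuple]:
    """Run a model-generated query and return all rows."""
    # Minimal guard: reject anything that does not start with SELECT.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()
```

For a question such as "What is the total revenue per region?", a model might produce `SELECT region, SUM(CAST(revenue AS REAL)) FROM data GROUP BY region`; the guard keeps generated statements read-only in this simple setup.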
🌐 Integration & Access
- REST API: Complete programmatic access
- WhatsApp Bot: Chat with your documents via WhatsApp
- Web Interface: User-friendly document upload and query interface
- Authentication: JWT-based secure access with subscription tiers
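The README states that access is JWT-based but not which algorithm or claims are used. The standard-library sketch below shows what verifying an HS256-signed token involves (signature check plus expiry), assuming HS256 and a hypothetical `tier` claim for the subscription level. In the application itself, PyJWT (listed in the dependencies) would normally handle this via `jwt.decode(token, key, algorithms=["HS256"])`.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # restore stripped base64 padding
    return base64.urlsafe_b64decode(segment + padding)


def sign_hs256(claims: Dict[str, Any], secret: str) -> str:
    """Create a compact HS256 JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: str) -> Dict[str, Any]:
    """Return the claims if the signature is valid and the token has not expired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

A request handler would call `verify_hs256` (or PyJWT's `jwt.decode`) on the `token` field and then read the tier claim to choose limits and features.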
🚀 Quick Start
Prerequisites
- Python 3.8 or higher
- Elasticsearch cluster (cloud or self-hosted)
- OpenAI API access
- MySQL database
- MongoDB instance
Installation
- Clone the repository
```bash
git clone https://github.com/carnotresearch/new-qdoc.git
cd new-qdoc
```
- Create a virtual environment
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
- Install dependencies
```bash
pip install -r requirements.txt
```
- Environment setup
```bash
cp .env.example .env
# Edit .env with your configuration
```
- Configure environment variables
```env
# Elasticsearch
ES_CLOUD_ID=your-elasticsearch-cloud-id
ES_API_KEY=your-elasticsearch-api-key
# OpenAI
OPENAI_API_KEY=your-openai-api-key
# Database
MONGO_URL=mongodb://localhost:27017/icarkno
MYSQL_HOST=localhost
MYSQL_USERNAME=your-username
MYSQL_PASSWORD=your-password
# Security
JWT_SECRET_KEY=your-jwt-secret
SECRET_KEY=your-flask-secret
# Neo4j (for knowledge graphs)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
```
- Run the application
```bash
python run.py
```
The API will be available at http://localhost:5000.
📖 Usage
Basic Document Processing
```bash
# Upload documents
curl -X POST http://localhost:5000/upload \
  -F "files=@document.pdf" \
  -F "token=your-jwt-token" \
  -F "sessionId=session123"

# Query documents
curl -X POST http://localhost:5000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "token": "your-jwt-token",
    "sessionId": "session123",
    "message": "What are the key findings?",
    "context": true
  }'
```
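For scripted access, the `/ask` call above can also be made from Python with only the standard library. This sketch mirrors the curl example's JSON fields (`token`, `sessionId`, `message`, `context`); the response format is not documented here, so `ask` simply assumes a JSON body and returns it decoded. Both helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def build_ask_request(
    base_url: str, token: str, session_id: str, message: str, context: bool = True
) -> urllib.request.Request:
    """Build a POST /ask request with the same JSON body as the curl example."""
    body = {"token": token, "sessionId": session_id, "message": message, "context": context}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/ask",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, token: str, session_id: str, message: str) -> Dict[str, Any]:
    """Send the request and decode the response, assumed to be JSON."""
    request = build_ask_request(base_url, token, session_id, message)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage against a local server: `ask("http://localhost:5000", "your-jwt-token", "session123", "What are the key findings?")`.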
Free Trial Mode
```bash
# Upload for trial users
curl -X POST http://localhost:5000/freeTrial \
  -F "files=@document.pdf" \
  -F "fingerprint=unique-browser-fingerprint"

# Query in trial mode
curl -X POST http://localhost:5000/trialAsk \
  -H "Content-Type: application/json" \
  -d '{
    "fingerprint": "unique-browser-fingerprint",
    "message": "Summarize this document"
  }'
```
Advanced Features
- Creative Mode: Add `"mode": "creative"` for multi-step reasoning
- Knowledge Graphs: Use the `/create_graph` endpoint for visual relationships
- Multi-language: Set `"outputLanguage": 1` for Hindi responses
- Data Queries: Upload CSV/Excel and query with natural language
🏗️ Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Client Apps   │    │    Flask API     │    │   AI Services   │
│                 │────│                  │────│                 │
│ • Web Interface │    │ • Authentication │    │ • OpenAI GPT    │
│ • WhatsApp Bot  │    │ • Rate Limiting  │    │ • Embeddings    │
│ • Mobile App    │    │ • File Processing│    │ • Summarization │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                 │
                        ┌────────┴────────┐
                        │   Data Layer    │
                        │                 │
                        │ • Elasticsearch │
                        │ • MongoDB       │
                        │ • MySQL         │
                        │ • Neo4j         │
                        └─────────────────┘
```
📁 Project Structure
```
icarkno/
├── app/                # Flask application
│   ├── __init__.py     # App factory
│   ├── api/            # API blueprints
│   ├── services/       # Business logic
│   └── config.py       # Configuration
├── controllers/        # Legacy controllers
├── elastic/            # Elasticsearch integration
├── utils/              # Utilities
├── webhook/            # WhatsApp integration
├── requirements.txt    # Dependencies
├── run.py              # Application entry point
└── README.md           # This file
```
🔧 Configuration
Environment Variables
| Variable | Description | Required |
| ---------------- | ------------------------- | -------- |
| ES_CLOUD_ID | Elasticsearch Cloud ID | Yes |
| ES_API_KEY | Elasticsearch API Key | Yes |
| OPENAI_API_KEY | OpenAI API Key | Yes |
| MONGO_URL | MongoDB connection string | Yes |
| MYSQL_HOST | MySQL host | Yes |
| JWT_SECRET_KEY | JWT signing key | Yes |
| NEO4J_URI | Neo4j connection URI | Optional |
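A start-up check for the required variables in the table can catch misconfiguration before the first request. This is a minimal sketch, not code from the repository: the variable list is copied from the table, and it assumes the variables are already in the process environment (for example, loaded from `.env` by python-dotenv, which is among the dependencies).

```python
import os
from typing import List, Mapping, Optional

# Variables marked "Yes" in the table above; NEO4J_URI is optional.
REQUIRED_VARS = [
    "ES_CLOUD_ID",
    "ES_API_KEY",
    "OPENAI_API_KEY",
    "MONGO_URL",
    "MYSQL_HOST",
    "JWT_SECRET_KEY",
]


def missing_required(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

At start-up, an application could call `missing_required()` and exit with a clear message listing the missing names if the result is non-empty.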
Supported File Formats
- Documents: PDF, DOCX, DOC, TXT, PPTX
- Data: CSV, XLSX, XLS
- Max file size: 50MB per file
- Supported languages: 23+ languages
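Uploads can be screened against the listed formats and the 50 MB limit before they are sent or processed. This sketch makes two assumptions: the limit is taken as 50 × 1024 × 1024 bytes (the README does not say whether MB is decimal or binary), and formats are recognised by file extension only. The server may check more than this.

```python
import os

# Union of the document and data formats listed above.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".pptx", ".csv", ".xlsx", ".xls"}
MAX_FILE_BYTES = 50 * 1024 * 1024  # assumed binary megabytes


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is not accepted."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes (limit {MAX_FILE_BYTES})")
```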
🔌 API Reference
See API Documentation for detailed endpoint information.
Key Endpoints
- `POST /upload` - Upload documents
- `POST /ask` - Query documents
- `POST /freeTrial` - Trial mode upload
- `POST /trialAsk` - Trial mode queries
- `POST /updatepayment` - Manage subscriptions
- `GET /healthcheck` - Health status
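Deployment scripts often poll the health endpoint before routing traffic. A minimal sketch, assuming that an HTTP 200 from `GET /healthcheck` means healthy (the response body is not documented here); the helper name `is_healthy` is hypothetical.

```python
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /healthcheck answers with HTTP 200 within the timeout."""
    url = f"{base_url.rstrip('/')}/healthcheck"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses (HTTPError).
        return False
```

Usage: `is_healthy("http://localhost:5000")`.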
🧪 Testing
Unit Tests
```bash
python -m pytest tests/
```
Integration Tests
```bash
# Test document processing
python upload_to_elastic.py --file test.pdf --index test_index

# Test querying
python query_elastic.py --index test_index --query "test question"
```
🚀 Deployment
Docker (Recommended)
```bash
docker build -t icarkno .
docker run -p 5000:5000 --env-file .env icarkno
```
Production Setup
```bash
# Using Gunicorn
gunicorn --bind 0.0.0.0:5000 --workers 4 run:app

# With SSL
gunicorn --bind 0.0.0.0:443 --certfile cert.pem --keyfile key.pem run:app
```
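The same Gunicorn options can live in a configuration file, which Gunicorn reads as Python (for example `gunicorn -c gunicorn.conf.py run:app`). `bind`, `workers`, `timeout`, `certfile`, and `keyfile` are standard Gunicorn settings; the values, the worker formula (the `(2 x CPU cores) + 1` starting point suggested in Gunicorn's documentation), and the `ENABLE_SSL` switch are assumptions for illustration.

```python
# gunicorn.conf.py: loaded by Gunicorn as a Python module.
import multiprocessing
import os

# Listen on all interfaces; port 5000 matches the examples above.
bind = "0.0.0.0:5000"

# Starting point suggested by the Gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# LLM-backed requests can be slow, so allow more than the 30 s default (assumed value).
timeout = 120

# Hypothetical switch: terminate TLS in Gunicorn when ENABLE_SSL=1.
if os.environ.get("ENABLE_SSL") == "1":
    bind = "0.0.0.0:443"
    certfile = "cert.pem"
    keyfile = "key.pem"
```

Binding to port 443 usually requires elevated privileges; terminating TLS in a reverse proxy in front of Gunicorn is a common alternative.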
📊 Performance
- Processing Speed: ~2-5 seconds per page
- Concurrent Users: Supports 100+ simultaneous users
- Storage: Elasticsearch scales horizontally
- Languages: 23+ supported with translation API
- Rate Limits: Configurable per user tier
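Per-tier rate limits are described only as configurable. Flask-Limiter appears in the dependencies and is the likely mechanism; the standalone token-bucket sketch below only illustrates the idea. The tier names and rates are hypothetical, and the clock is injectable so the behaviour can be tested deterministically.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical tiers: (tokens refilled per minute, burst capacity).
TIER_LIMITS: Dict[str, Tuple[float, int]] = {
    "free": (10.0, 5),
    "pro": (60.0, 20),
}


@dataclass
class TokenBucketLimiter:
    """One token bucket per user; each request consumes one token."""

    clock: Callable[[], float] = time.monotonic
    # Per-user state: (tokens remaining, time of last update).
    _state: Dict[str, Tuple[float, float]] = field(default_factory=dict, init=False, repr=False)

    def allow(self, user_id: str, tier: str) -> bool:
        """Return True and consume a token if the user is within the tier limit."""
        per_minute, capacity = TIER_LIMITS[tier]
        now = self.clock()
        tokens, last = self._state.get(user_id, (float(capacity), now))
        # Refill in proportion to elapsed time, never above the burst capacity.
        tokens = min(float(capacity), tokens + (now - last) * per_minute / 60.0)
        if tokens < 1.0:
            self._state[user_id] = (tokens, now)
            return False
        self._state[user_id] = (tokens - 1.0, now)
        return True
```

With the hypothetical `free` tier, a user can burst five requests, after which one token is regained every six seconds.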
🤝 Contributing
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
📝 License
This project is proprietary software. See LICENSE for details.
🆘 Support
- Documentation: API Docs
- Issues: GitHub Issues
- Email: support@carnotresearch.com
🙏 Acknowledgments
- OpenAI for GPT and embedding models
- Elasticsearch for search and analytics
- LangChain for LLM orchestration
- Flask for the web framework
Made with ❤️ by Carnot Research
Owner
- Name: Carnot Research Pvt. Ltd.
- Login: carnotresearch
- Kind: organization
- Website: https://www.carnotresearch.com/
- Repositories: 8
- Profile: https://github.com/carnotresearch
GitHub Events
Total
- Member event: 2
- Push event: 17
- Create event: 2
Last Year
- Member event: 2
- Push event: 17
- Create event: 2
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- Flask ==2.3.2
- PyJWT ==2.8.0
- beautifulsoup4 ==4.12.2
- bert-extractive-summarizer ==0.10.1
- docx2txt ==0.8
- elasticsearch ==8.11.0
- fastapi ==0.104.1
- flask-cors ==4.0.0
- flask-limiter ==2.7.0
- langchain ==0.1.0
- langchain-community ==0.0.13
- langchain-elasticsearch ==0.0.2
- langchain-experimental ==0.0.52
- langchain-openai ==0.0.5
- mysql-connector-python ==8.2.0
- openai ==1.1.9
- pdfplumber ==0.10.2
- pydantic ==2.4.2
- pymongo ==4.6.1
- pypdf ==3.15.1
- pypdfium2 ==4.10.0
- python-docx ==0.8.11
- python-dotenv ==1.0.0
- pyvis ==0.3.2
- requests ==2.31.0
- sqlalchemy ==2.0.23
- torch ==2.1.1
- uvicorn ==0.27.0