https://github.com/arash-keshavarz/visualsearchengine-cv
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Arash-Keshavarz
- Language: Python
- Default Branch: main
- Size: 11.9 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🔍 Visual Search Engine
A powerful visual search engine that uses deep learning models (CLIP and ViT) to find similar images in a custom dataset. Built with Python, PyTorch, FAISS, and Gradio.
🎯 Features
- Dual Model Support: CLIP for semantic similarity, ViT for visual features
- Fast Similarity Search: FAISS indexing for efficient retrieval
- Web Interface: Beautiful Gradio UI for easy interaction
- Scalable: Handles large image datasets efficiently
- Cross-Platform: Works on macOS, Linux, and Windows
🏗️ Architecture
The system consists of several key components:
- Image Collection: Downloads images from multiple sources (Google, Bing, Baidu)
- Feature Extraction: Extracts deep features using CLIP and ViT models
- Database Storage: SQLite database for metadata and feature vectors
- FAISS Indexing: High-performance similarity search indexing
- Web Interface: Gradio-based UI for image search
🚀 Quick Start
Prerequisites
- Python 3.8+
- Conda (recommended)
- Git
Installation
Clone the repository
bash
git clone https://github.com/yourusername/VisualSearchEngine-CV.git
cd VisualSearchEngine-CV
Create conda environment
bash
conda create -n VisualEngine python=3.9
conda activate VisualEngine
Install dependencies
bash
pip install -r requirements.txt
Set environment variable (for macOS)
bash
export KMP_DUPLICATE_LIB_OK=TRUE
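The `export` command above only works in a POSIX shell. On platforms or launchers where that is inconvenient, the same variable can be set from Python instead. This is a minimal sketch, assuming it runs at the very top of the entry script, before any library that loads an OpenMP runtime (for example torch or faiss) is imported. Note that the OpenMP runtime itself describes this setting as an unsupported workaround for duplicate-library errors rather than a fix.

```python
import os

# Assumption: this line executes before torch, faiss, or any other
# OpenMP-linked library is imported; setting it later has no effect
# on a runtime that is already loaded.
# setdefault() keeps a value that the shell has already exported.
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")

print("KMP_DUPLICATE_LIB_OK =", os.environ["KMP_DUPLICATE_LIB_OK"])
```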
Usage
Option 1: CLIP Model (Recommended for semantic similarity)
bash
conda activate VisualEngine
export KMP_DUPLICATE_LIB_OK=TRUE
python gradio_app_clip_only.py
Option 2: ViT Model (For visual feature matching)
bash
conda activate VisualEngine
export KMP_DUPLICATE_LIB_OK=TRUE
python gradio_app_vit_safe.py
Open your browser to http://localhost:7860 to access the web interface.
📹 Demo
Watch the demo video to see the visual search engine in action.
🧠 Models
CLIP (Contrastive Language-Image Pre-training)
- Use Case: Semantic similarity, understanding image content
- Features: 512-dimensional feature vectors
- Strengths: Better for understanding image meaning and context
- App: gradio_app_clip_only.py
ViT (Vision Transformer)
- Use Case: Visual feature matching, detailed image analysis
- Features: 768-dimensional feature vectors
- Strengths: Better for visual pattern recognition
- App: gradio_app_vit_safe.py
📁 Project Structure
VisualSearchEngine-CV/
├── data/
│   └── datasets/
│       ├── dataset/                     # Image dataset
│       └── visual_search_dataset.db     # SQLite database
├── feature_extractor/
│   ├── base_extractor.py                # Base feature extractor class
│   ├── clip_extractor.py                # CLIP feature extractor
│   └── vit_extractor.py                 # ViT feature extractor
├── scripts/
│   ├── download_images.py               # Image downloader
│   ├── dataset_to_db.py                 # Database creation
│   └── run_feature_extraction.py        # Feature extraction
├── utils/
│   └── database_utils.py                # Database utilities
├── gradio_app_clip_only.py              # CLIP-only web interface
├── gradio_app_vit_safe.py               # ViT-only web interface
├── visual_search_engine.py              # Core search engine
├── requirements.txt                     # Python dependencies
└── README.md                            # This file
🔧 Configuration
Database
The system uses SQLite for storing image metadata and feature vectors:
- Images table: Image metadata (filename, path, size, category)
- Categories table: Image categories
- Image_features table: Feature vectors for each model
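To make the three-table layout concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The column names, the key layout, and the choice to store vectors as float32 BLOBs are assumptions made for illustration. The actual schema in `visual_search_dataset.db` may differ.

```python
import sqlite3
from array import array

# Hypothetical schema: table names follow the README, but every column
# name and type below is an assumption, not the project's real layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS images (
    id          INTEGER PRIMARY KEY,
    filename    TEXT NOT NULL,
    path        TEXT NOT NULL UNIQUE,
    size_bytes  INTEGER,
    category_id INTEGER REFERENCES categories(id)
);
CREATE TABLE IF NOT EXISTS image_features (
    image_id INTEGER NOT NULL REFERENCES images(id),
    model    TEXT NOT NULL,   -- e.g. 'clip' or 'vit'
    vector   BLOB NOT NULL,   -- float32 values packed as raw bytes
    PRIMARY KEY (image_id, model)
);
"""


def pack_vector(values):
    """Serialize a sequence of floats to float32 bytes for a BLOB column."""
    return array("f", values).tobytes()


def unpack_vector(blob):
    """Inverse of pack_vector: bytes back to a list of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return list(vec)


conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
conn.executescript(SCHEMA)
conn.execute("INSERT INTO categories (id, name) VALUES (1, 'cats')")
conn.execute(
    "INSERT INTO images (id, filename, path, size_bytes, category_id) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "cat_001.jpg", "data/datasets/dataset/cats/cat_001.jpg", 20480, 1),
)
conn.execute(
    "INSERT INTO image_features (image_id, model, vector) VALUES (?, ?, ?)",
    (1, "clip", pack_vector([0.5, 0.25, -1.0])),
)

row = conn.execute(
    "SELECT vector FROM image_features WHERE image_id = ? AND model = ?",
    (1, "clip"),
).fetchone()
restored = unpack_vector(row[0])
print(restored)
```

Keying `image_features` on `(image_id, model)` lets one image carry a CLIP vector and a ViT vector side by side, which matches the "feature vectors for each model" description.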
FAISS Index
- Index Type: Flat index with cosine similarity
- Normalization: Feature vectors are L2-normalized, so the inner product of two vectors equals their cosine similarity
- Storage: Pickled index files (faiss_index_clip.pkl, faiss_index_vit.pkl)
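The link between a flat index, L2 normalization, and cosine similarity can be shown without FAISS. The sketch below is a pure-Python stand-in for an exhaustive inner-product search. It is not the project's code, and the function names are hypothetical. In FAISS the equivalent is typically `faiss.normalize_L2` followed by a search on an `IndexFlatIP`, though the README does not say which index class the project uses.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def flat_cosine_search(index_vectors, query, k=3):
    """Exhaustive ("flat") search over unit-length stored vectors.

    Returns the top-k (position, score) pairs. Because both sides are
    unit length, the inner product is the cosine similarity.
    """
    q = l2_normalize(query)
    scores = [
        (i, sum(a * b for a, b in zip(q, v)))
        for i, v in enumerate(index_vectors)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy 2-dimensional "features"; real CLIP/ViT vectors have 512/768 dims.
stored = [l2_normalize(v) for v in [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]]
results = flat_cosine_search(stored, [2.0, 0.0], k=2)
print(results)
```

A flat index compares the query against every stored vector, so results are exact. With a collection of roughly a thousand images this is inexpensive.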
📊 Performance
- Feature Extraction: ~2-3 seconds per image
- Search Speed: <100ms for similarity search
- Index Size: ~933 images with both CLIP and ViT features
- Memory Usage: ~2GB for both models
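To put these figures in perspective, here is a back-of-the-envelope estimate of the raw feature-vector storage for the dataset size quoted above. It assumes vectors are stored as float32 (4 bytes per value), which the README does not state.

```python
NUM_IMAGES = 933                       # from the Index Size figure above
BYTES_PER_FLOAT32 = 4                  # assumption: float32 storage
DIMS = {"clip": 512, "vit": 768}       # dimensions from the Models section

# Raw vector bytes per model: images x dimensions x bytes per value.
sizes = {
    model: NUM_IMAGES * dim * BYTES_PER_FLOAT32
    for model, dim in DIMS.items()
}
total_mib = sum(sizes.values()) / (1024 ** 2)

for model, size in sizes.items():
    print(f"{model}: {size:,} bytes")
print(f"total: {total_mib:.2f} MiB")
```

Under that assumption the vectors for both models come to under 5 MiB, a small fraction of the ~2GB memory figure quoted above.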
🛠️ Development
Adding New Models
- Create a new feature extractor in feature_extractor/
- Inherit from BaseFeatureExtractor
- Implement the extract_features() method
- Add it to the search engine
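The steps above can be sketched as follows. The `BaseFeatureExtractor` defined here is a stand-in written for this example. The real class in `feature_extractor/base_extractor.py` may have a different constructor, attributes, and `extract_features()` signature, so check it before subclassing. `MeanColorExtractor` and its pixel-list input format are likewise hypothetical.

```python
from abc import ABC, abstractmethod
import math


class BaseFeatureExtractor(ABC):
    """Stand-in for the project's base class (assumed interface)."""

    feature_dim: int  # length of the vectors this extractor returns

    @abstractmethod
    def extract_features(self, image):
        """Return a feature vector (list of floats) for one image."""


class MeanColorExtractor(BaseFeatureExtractor):
    """Toy extractor: mean R, G, B over all pixels, L2-normalized."""

    feature_dim = 3

    def extract_features(self, image):
        # Hypothetical input format: a non-empty list of (r, g, b) tuples.
        if not image:
            raise ValueError("image has no pixels")
        n = len(image)
        means = [sum(pixel[c] for pixel in image) / n for c in range(3)]
        # Fall back to 1.0 so an all-black image does not divide by zero.
        norm = math.sqrt(sum(m * m for m in means)) or 1.0
        return [m / norm for m in means]


extractor = MeanColorExtractor()
features = extractor.extract_features([(255, 0, 0), (255, 0, 0)])
print(features)
```

A real extractor would wrap a pretrained model rather than average colors, but the shape is the same: one class per model, one method that maps an image to a fixed-length vector.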
Extending the Dataset
- Add images to data/datasets/dataset/
- Run scripts/dataset_to_db.py to update the database
- Run scripts/run_feature_extraction.py to extract features
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- CLIP by OpenAI
- ViT by Google Research
- FAISS by Facebook Research
- Gradio for the web interface
- Hugging Face Transformers for model loading
📞 Contact
For questions or support, please open an issue on GitHub or contact me directly via email.
Note: This project is designed to work with separate apps for each model to avoid memory conflicts and ensure stability. CLIP works best with MPS (Apple Silicon), while ViT is optimized for CPU usage.
Owner
- Name: Arash
- Login: Arash-Keshavarz
- Kind: user
- Repositories: 1
- Profile: https://github.com/Arash-Keshavarz
GitHub Events
Total
- Watch event: 1
- Push event: 4
- Create event: 2
Last Year
- Watch event: 1
- Push event: 4
- Create event: 2
Dependencies
- Pillow *
- accelerate *
- faiss-cpu *
- gradio *
- matplotlib *
- numpy *
- opencv-python *
- pandas *
- seaborn *
- sqlite3 *
- torch *
- torchvision *
- tqdm *
- transformers *