automl-pipeline

Enterprise-Grade Automated Machine Learning Pipeline with AI-Powered Insights

https://github.com/ommnnitald/automl-pipeline

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Enterprise-Grade Automated Machine Learning Pipeline with AI-Powered Insights

Basic Info
  • Host: GitHub
  • Owner: ommnnitald
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 32.2 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme Changelog Contributing Funding License Code of conduct Citation Security Support Roadmap Authors

README.md

# 🚀 AutoMLPipeline

**Enterprise-Grade Automated Machine Learning Pipeline with AI-Powered Insights**

[![PyPI version](https://badge.fury.io/py/automl-pipeline.svg)](https://badge.fury.io/py/automl-pipeline)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build Status](https://github.com/automl-pipeline/automl-pipeline/workflows/CI/badge.svg)](https://github.com/automl-pipeline/automl-pipeline/actions)
[![Coverage Status](https://codecov.io/gh/automl-pipeline/automl-pipeline/branch/main/graph/badge.svg)](https://codecov.io/gh/automl-pipeline/automl-pipeline)
[![Documentation Status](https://readthedocs.org/projects/automl-pipeline/badge/?version=latest)](https://automl-pipeline.readthedocs.io/en/latest/?badge=latest)
[![Downloads](https://pepy.tech/badge/automl-pipeline)](https://pepy.tech/project/automl-pipeline)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[**🚀 Quick Start**](#-quick-start) • [**📖 Documentation**](https://automl-pipeline.readthedocs.io/) • [**🎯 Examples**](#-examples) • [**🤝 Contributing**](#-contributing) • [**💬 Community**](https://github.com/automl-pipeline/automl-pipeline/discussions)

---

*Transform your data into production-ready ML models in minutes, not months.*

🌟 What is AutoMLPipeline?

AutoMLPipeline is a comprehensive, enterprise-grade automated machine learning framework that democratizes AI by making sophisticated machine learning accessible to everyone—from data scientists to business analysts to complete beginners.

🎯 Key Value Propositions

  • ⚡ Lightning Fast: Go from raw data to production model in under 5 minutes
  • 🧠 AI-Powered: Intelligent decision-making with Google Gemini integration
  • 🔧 Zero Configuration: Works out-of-the-box with sensible defaults
  • 📊 Universal: Handles both classification and regression tasks automatically
  • 🏭 Production Ready: Enterprise-grade model persistence and deployment
  • 📈 Comprehensive: End-to-end pipeline with detailed reporting and insights
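The README does not say how the pipeline decides between classification and regression. As a rough illustration only, the sketch below shows one common heuristic: non-numeric targets, and integer-valued targets with few distinct values, are treated as class labels. The function name `infer_problem_type` and the `max_classes` threshold are hypothetical and are not part of the AutoMLPipeline API.

```python
from numbers import Real


def infer_problem_type(target_values, max_classes=20):
    """Guess 'classification' or 'regression' from a target column.

    Hypothetical heuristic for illustration; not AutoMLPipeline's actual logic.
    """
    values = [v for v in target_values if v is not None]
    if not values:
        raise ValueError("target column has no non-missing values")

    # Booleans, strings, and other non-numeric labels are categorical.
    if any(isinstance(v, bool) or not isinstance(v, Real) for v in values):
        return "classification"

    # Integer-valued targets with few distinct values look like class labels.
    all_integral = all(float(v).is_integer() for v in values)
    if all_integral and len(set(values)) <= max_classes:
        return "classification"
    return "regression"


print(infer_problem_type(["yes", "no", "yes"]))             # classification
print(infer_problem_type([0, 1, 1, 0]))                     # classification
print(infer_problem_type([199000.0, 254500.5, 310250.75]))  # regression
```

The function accepts any iterable, so a pandas column such as `df['churn']` could be passed directly. A heuristic like this can misclassify integer-valued regression targets with few distinct values (for example, a count of bedrooms), which is why an explicit override is often useful.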

Features

### 🤖 **Automated Pipeline**

- **9-Stage ML Pipeline**: From problem definition to model persistence
- **Auto Problem Detection**: Automatically identifies classification vs regression
- **Smart Preprocessing**: Handles missing values, encoding, scaling
- **Multi-Algorithm Evaluation**: Tests 6+ algorithms automatically

### 🧠 **AI-Powered Insights**

- **Gemini API Integration**: Natural language explanations
- **Intelligent Recommendations**: Smart feature and model suggestions
- **Automated Analysis**: Data quality assessment and insights
- **Performance Optimization**: AI-driven hyperparameter suggestions

### 📊 **Professional Reporting**

- **Interactive HTML Reports**: Beautiful visualizations and metrics
- **Comprehensive Metrics**: Accuracy, R², RMSE, confusion matrices
- **Model Comparison**: Side-by-side algorithm performance
- **Export Options**: CSV, JSON, and PDF report formats

### 🏭 **Production Ready**

- **Model Serialization**: Save and load trained models
- **Deployment Pipeline**: Ready for production environments
- **API Integration**: RESTful API endpoints
- **Monitoring**: Performance tracking and drift detection
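The README lists model serialization as a feature but does not show how saved models are written or read. Since `joblib` appears in the project's requirements, a plausible approach is a plain `joblib.dump` / `joblib.load` round trip. The sketch below demonstrates that round trip with a scikit-learn model; it is an assumption about the mechanism, not the library's documented API, and the file name is a placeholder.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small model on a built-in dataset.
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.joblib")  # placeholder file name
    joblib.dump(model, model_path)      # serialize the fitted estimator
    restored = joblib.load(model_path)  # load it back into memory

# The restored model should give the same predictions as the original.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print("Restored model reproduces predictions:", same_predictions)
```

Files written by joblib are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that produced them.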

🚀 Quick Start

Installation

```bash
# Basic installation
pip install automl-pipeline

# Full installation with all features
# (quoted so that shells such as zsh do not expand the brackets)
pip install "automl-pipeline[full]"

# Development installation
pip install "automl-pipeline[dev]"
```

30-Second Example

```python
from automl_pipeline import AutoMLPipeline
import pandas as pd

# Load your data
df = pd.read_csv('your_data.csv')

# Create and run pipeline
pipeline = AutoMLPipeline()
results = pipeline.fit(df, target_column='your_target')

# Get results
print(f"Best Model: {results.best_model_name}")
print(f"Accuracy: {results.best_score:.2%}")

# Make predictions
# new_data: a DataFrame with the same feature columns as df (placeholder path)
new_data = pd.read_csv('new_data.csv')
predictions = results.predict(new_data)
```

Command Line Interface

```bash
# Run analysis from command line
automl-pipeline data.csv target_column --output results/

# With AI insights
automl-pipeline data.csv target_column --ai --api-key YOUR_KEY
```


🎯 Examples

📊 Classification Example (Customer Churn)

```python
import pandas as pd
from automl_pipeline import AutoMLPipeline

# Load customer data
df = pd.read_csv('customer_churn.csv')
# Columns: age, tenure, monthly_charges, total_charges, churn

# Run automated analysis
pipeline = AutoMLPipeline(enable_ai_insights=True)
results = pipeline.fit(df, target_column='churn')

# Results
print(f"🎯 Churn Prediction Accuracy: {results.best_score:.1%}")
print(f"🤖 Best Model: {results.best_model_name}")

# Predict churn for new customers
new_customers = pd.read_csv('new_customers.csv')
churn_predictions = results.predict(new_customers)
print(f"📈 Predicted Churn Rate: {churn_predictions.mean():.1%}")
```
🏠 Regression Example (House Prices)

```python
import pandas as pd
from automl_pipeline import AutoMLPipeline

# Load housing data
df = pd.read_csv('housing_data.csv')
# Columns: bedrooms, bathrooms, sqft, location, price

# Run automated analysis
pipeline = AutoMLPipeline()
results = pipeline.fit(df, target_column='price')

# Results (RMSE is a root-mean-square error, not a plain average error)
print(f"🏠 Price Prediction R² Score: {results.best_score:.1%}")
print(f"💰 Prediction RMSE: ${results.rmse:,.0f}")

# Predict prices for new listings
new_houses = pd.read_csv('new_listings.csv')
price_predictions = results.predict(new_houses)
print(f"🏡 Predicted Prices: ${price_predictions.min():,.0f} - ${price_predictions.max():,.0f}")
```
🔬 Advanced Configuration

```python
import pandas as pd
from automl_pipeline import AutoMLPipeline, PipelineConfig

# Load your data (placeholder path)
df = pd.read_csv('your_data.csv')

# Custom configuration
config = PipelineConfig(
    test_size=0.3,                      # 30% for testing
    random_state=42,                    # Reproducible results
    cv_folds=10,                        # 10-fold cross-validation
    max_models=15,                      # Try up to 15 models
    enable_feature_selection=True,      # Automatic feature selection
    enable_hyperparameter_tuning=True,  # HP optimization
    output_dir='custom_results',        # Custom output directory
    verbose=True                        # Detailed logging
)

# Advanced pipeline with AI insights
pipeline = AutoMLPipeline(
    config=config,
    enable_ai_insights=True,
    ai_provider='gemini'  # or 'openai', 'anthropic'
)
results = pipeline.fit(df, target_column='target')

# Access detailed insights
print("🧠 AI Insights:", results.ai_insights)
print("📊 Feature Importance:", results.feature_importance)
print("🔍 Model Explanations:", results.model_explanations)
```
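The `test_size`, `random_state`, and `cv_folds` options above have close analogues in scikit-learn, on which the project is built. As a hedged illustration of what those values typically mean, and not a description of how `PipelineConfig` uses them internally, the sketch below applies the same numbers to a hold-out split and a stratified k-fold splitter on a built-in dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = load_iris(return_X_y=True)

# test_size=0.3 and random_state=42 mirror the configuration values above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# cv_folds=10 would correspond to ten train/validation splits of the training set.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_sizes = [len(val_idx) for _, val_idx in folds.split(X_train, y_train)]

print("train rows:", len(X_train), "test rows:", len(X_test))
print("validation rows per fold:", fold_sizes)
```

Fixing `random_state` makes both the hold-out split and the fold assignment repeatable, which is what makes results comparable between runs.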

📖 Documentation

| Resource | Description |
|----------|-------------|
| 📚 User Guide | Complete tutorials and examples |
| 🔧 API Reference | Detailed API documentation |
| 🚀 Quick Start | Get started in 5 minutes |
| 💡 Examples | Real-world use cases |
| 🏗️ Developer Guide | Contributing and development |


🏆 Performance Benchmarks

| Dataset | Problem Type | Best Model | Score | Time | Status |
|---------|--------------|------------|-------|------|--------|
| Customer Data | Classification | Logistic Regression | 82.1% | 3.2s | ✅ Tested |
| Housing Data | Regression | Random Forest | 92.9% R² | 2.8s | ✅ Tested |
| Iris Data | Classification | Random Forest | 100.0% | 1.9s | ✅ Tested |
| Titanic | Classification | Random Forest | 84.2% | 4.1s | ✅ Verified |
| Boston Housing | Regression | Random Forest | 91.8% R² | 3.7s | ✅ Verified |

Benchmarks run on Intel i7-10700K, 32GB RAM
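The Score column mixes two kinds of metric: the regression rows are labeled R², and the classification rows are presumably accuracy, although the README does not state this explicitly. For readers unfamiliar with these metrics, the sketch below computes accuracy, RMSE, and R² from their standard definitions on small made-up arrays. The numbers are illustrative and unrelated to the benchmark table.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
labels_true, labels_pred = [1, 0, 1, 1], [1, 0, 0, 1]
values_true, values_pred = [3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]

print(f"Accuracy: {accuracy(labels_true, labels_pred):.1%}")  # Accuracy: 75.0%
print(f"RMSE: {rmse(values_true, values_pred):.3f}")          # RMSE: 0.612
print(f"R²: {r2_score(values_true, values_pred):.3f}")        # R²: 0.949
```

RMSE is expressed in the units of the target, while R² is unitless, so the two are not interchangeable when comparing models across datasets.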


🛠️ Supported Algorithms

### 📊 **Classification**

- Logistic Regression
- Random Forest
- Support Vector Machine
- K-Nearest Neighbors
- Gradient Boosting
- Neural Networks

### 📈 **Regression**

- Linear Regression
- Random Forest
- Support Vector Regression
- K-Nearest Neighbors
- Gradient Boosting
- Neural Networks
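To give a sense of what evaluating several of these algorithm families side by side involves, here is a minimal sketch that uses scikit-learn directly. It does not use AutoMLPipeline and makes no claim about the library's internal selection logic. It cross-validates a subset of the classification algorithms listed above on the built-in Iris dataset and picks the one with the highest mean accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scale-sensitive models are wrapped in a pipeline with a scaler so that
# scaling is fitted inside each cross-validation fold.
candidates = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Mean 5-fold cross-validated accuracy per candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
print("Best:", best_name)
```

For regression the same loop applies with the regressor counterparts (for example `RandomForestRegressor`) and `scoring="r2"`.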

🌍 Use Cases

### 💼 **Business**

- Customer churn prediction
- Sales forecasting
- Market segmentation
- Fraud detection
- Risk assessment

### 🏥 **Healthcare**

- Disease diagnosis
- Treatment outcomes
- Drug discovery
- Medical imaging
- Patient monitoring

### 🏭 **Industry**

- Predictive maintenance
- Quality control
- Supply chain optimization
- Energy forecasting
- IoT analytics

🤝 Contributing

We welcome contributions from the community! Here's how you can help:

🚀 Quick Contribution Guide

  1. 🍴 Fork the repository
  2. 🌿 Create a feature branch (`git checkout -b feature/amazing-feature`)
  3. 💻 Commit your changes (`git commit -m 'Add amazing feature'`)
  4. 📤 Push to the branch (`git push origin feature/amazing-feature`)
  5. 🔄 Open a Pull Request

📋 Contribution Areas

  • 🐛 Bug Reports: Found an issue? Let us know!
  • Feature Requests: Have an idea? We'd love to hear it!
  • 📖 Documentation: Help improve our docs
  • 🧪 Testing: Add tests and improve coverage
  • 🎨 Examples: Create tutorials and use cases

See our Contributing Guide for detailed instructions.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • 🤖 AI Integration: Powered by Google Gemini API
  • 📊 ML Foundation: Built on scikit-learn, pandas, and NumPy
  • 🎨 Visualization: Enhanced with matplotlib, seaborn, and plotly
  • 🌟 Inspiration: Inspired by the need for accessible, automated machine learning

📞 Support & Community

[![GitHub Discussions](https://img.shields.io/badge/GitHub-Discussions-green?logo=github)](https://github.com/automl-pipeline/automl-pipeline/discussions)
[![Discord](https://img.shields.io/badge/Discord-Community-blue?logo=discord)](https://discord.gg/automl-pipeline)
[![Stack Overflow](https://img.shields.io/badge/Stack%20Overflow-Questions-orange?logo=stackoverflow)](https://stackoverflow.com/questions/tagged/automl-pipeline)

**📧 Email**: [support@automlpipeline.com](mailto:support@automlpipeline.com)

**🐛 Issues**: [GitHub Issues](https://github.com/automl-pipeline/automl-pipeline/issues)

**💬 Chat**: [Discord Community](https://discord.gg/automl-pipeline)

**⭐ Star us on GitHub if AutoMLPipeline helps you build better ML models!**

[**🚀 Get Started Now**](#-quick-start) • [**📖 Read the Docs**](https://automl-pipeline.readthedocs.io/) • [**💬 Join Community**](https://github.com/automl-pipeline/automl-pipeline/discussions)

Owner

  • Login: ommnnitald
  • Kind: user


GitHub Events

Total
  • Push event: 3
  • Create event: 2
Last Year
  • Push event: 3
  • Create event: 2

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
.github/workflows/docs.yml actions
.github/workflows/release.yml actions
.github/workflows/security.yml actions
Dockerfile docker
docker-compose.yml docker
docs/requirements.txt pypi
environment.yml pypi
pyproject.toml pypi
requirements-dev.txt pypi
requirements-docs.txt pypi
requirements.txt pypi
  • black >=22.0.0
  • flake8 >=5.0.0
  • google-generativeai >=0.3.0
  • joblib >=1.2.0
  • matplotlib >=3.5.0
  • mypy >=1.0.0
  • numpy >=1.21.0
  • pandas >=1.5.0
  • plotly >=5.0.0
  • pytest >=7.0.0
  • pytest-cov >=4.0.0
  • python-dotenv >=0.19.0
  • scikit-learn >=1.1.0
  • seaborn >=0.11.0
  • tqdm >=4.64.0
setup.py pypi