Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.7%) to scientific vocabulary
Repository
Docker Image for Machine Learning & Data Science
Basic Info
- Host: GitHub
- Owner: humbertolvarona
- License: cc-by-4.0
- Language: Shell
- Default Branch: main
- Size: 269 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🧠 AI Lab - Docker Image for Machine Learning & Data Science
📘 Overview
AI Lab is a ready-to-use Docker image providing a full environment for data science, machine learning, and deep learning. It includes support for:
- Training on CPU or GPU (CUDA / ROCm)
- Jupyter Notebook + nbextensions + ipywidgets
- Visualization and statistical modeling
- Time series and date/time analysis
- Structured and unstructured data modeling
- Support for NetCDF, Excel, CSV, and more
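Since the image targets CPU, CUDA, and ROCm, it can help to confirm from inside a notebook which accelerator is actually visible. The following is a minimal sketch, not part of the image itself: `detect_accelerator` is a hypothetical helper name, and it assumes PyTorch is the framework being checked. It falls back to `cpu` when PyTorch is not importable.

```python
import importlib.util


def detect_accelerator() -> str:
    """Return 'cuda', 'rocm', or 'cpu' based on what PyTorch can see.

    Hypothetical helper for illustration; not shipped with the image.
    """
    # Fall back to CPU when PyTorch is not importable in this environment.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    if not torch.cuda.is_available():
        return "cpu"
    # ROCm builds of PyTorch reuse the torch.cuda API and set torch.version.hip.
    return "rocm" if getattr(torch.version, "hip", None) else "cuda"


print(f"Accelerator visible to PyTorch: {detect_accelerator()}")
```

If the result is `cpu` in a container started in GPU mode, the usual suspects are the host driver, the container runtime flags, or a CPU-only framework build.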
🚀 Key Features
- ✅ Ready for TensorFlow, PyTorch, Ultralytics
- ✅ Includes tools like `Transformers`, `Optuna`, `MLFlow`, `Ray`, `DVC`
- ✅ Full support for `ipywidgets` interactive elements
- ✅ Full LaTeX & Markdown rendering in Jupyter
- ✅ Reads `.csv`, `.xlsx`, `.xls`, `.nc` (NetCDF), `HDF` files
- ✅ Includes statistical and time-series packages
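The file formats listed above are typically read with different libraries. The sketch below shows one way to dispatch on the file suffix. `READERS`, `pick_reader`, and `load` are hypothetical names used for illustration, and the `.h5`/`.hdf5` suffixes are an assumption about how HDF files are named. The dotted targets (`pandas.read_csv`, `pandas.read_excel`, `xarray.open_dataset`, `h5py.File`) are existing entry points in packages from the dependency list.

```python
import importlib
from pathlib import Path

# Suffix -> dotted name of the reader usually used for that format.
# The .h5/.hdf5 suffixes are an assumption about HDF file naming.
READERS = {
    ".csv": "pandas.read_csv",
    ".xlsx": "pandas.read_excel",
    ".xls": "pandas.read_excel",
    ".nc": "xarray.open_dataset",
    ".h5": "h5py.File",
    ".hdf5": "h5py.File",
}


def pick_reader(path: str) -> str:
    """Return the dotted reader name for a file, based on its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"No reader registered for {suffix!r}")
    return READERS[suffix]


def load(path: str, **kwargs):
    """Import the reader lazily and open the file with it."""
    module_name, attr = pick_reader(path).rsplit(".", 1)
    reader = getattr(importlib.import_module(module_name), attr)
    return reader(path, **kwargs)
```

Importing lazily keeps the helper cheap to load: only the library needed for the requested format is imported.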
📚 Included Packages
A comprehensive and organized list of essential Python packages used in Data Science, Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Data Visualization, Model Deployment, Database Interaction, and more.
🧮 Data Science & Machine Learning
🔹 Core Libraries
| Package | Description |
| --- | --- |
| numpy | Fundamental package for scientific computing in Python. Provides support for large, multi-dimensional arrays and matrices with optimized performance under the hood using C/Fortran. Essential for numerical operations. |
| pandas | Powerful data manipulation library built on NumPy. Introduces DataFrame and Series objects for structured data analysis, including tools for cleaning, transforming, and exploring datasets. |
| scikit-learn | Comprehensive machine learning library offering classification, regression, clustering, and preprocessing tools. Features a consistent API for model training, evaluation, and pipelines. |
🔥 Deep Learning
| Package | Description |
| --- | --- |
| tensorflow | End-to-end platform for ML and deep learning developed by Google. Supports both high-level Keras API and low-level customization. Ideal for production systems and scalable ML solutions. |
| torch | Flexible deep learning framework with dynamic computation graphs, preferred in research environments. Developed by Facebook’s AI Research lab. Offers GPU acceleration and an intuitive Pythonic interface. |
🧠 Specialized ML
| Package | Description |
| --- | --- |
| transformers | State-of-the-art NLP library by Hugging Face. Includes thousands of pretrained models like BERT, GPT, T5 for text classification, translation, summarization, and generation. Compatible with PyTorch and TensorFlow. |
| ultralytics | High-performance computer vision library specializing in object detection and segmentation using YOLO models. Known for real-time inference speed and production-ready deployment tools. |
📈 Data Visualization
📊 Basic Plotting
| Package | Description |
| --- | --- |
| matplotlib | Foundational plotting library for creating static, animated, and interactive visualizations. Highly customizable and serves as the base for many other visualization libraries. |
| seaborn | High-level statistical visualization library built on matplotlib. Simplifies creation of visually appealing plots for categorical data, distributions, and regression trends. |
💡 Interactive Visualization
| Package | Description |
| --- | --- |
| plotly | Library for creating interactive, publication-quality charts. Supports 3D plots, geographic maps, and financial charts. Integrates with Dash for building analytical dashboards. |
| streamlit | Fast framework to build interactive web apps from Python scripts. Ideal for turning ML models and data exploration tools into shareable interfaces with minimal effort. |
⚙️ Utilities & Preprocessing
🖼️ Data Processing
| Package | Description |
| --- | --- |
| opencv-python | Open Source Computer Vision library. Contains algorithms for image/video processing, facial recognition, augmented reality, and more. Used across robotics, surveillance, and medical imaging. |
| Pillow | Friendly fork of PIL for image processing. Supports opening, modifying, and saving various image formats. Useful for basic image transformations and filters. |
🛠️ Workflow Tools
| Package | Description |
| --- | --- |
| tqdm | Lightweight progress bar for loops and iterables. Helps monitor long-running operations with minimal overhead. Supports Jupyter and custom formatting. |
| joblib | Lightweight pipelining library for caching expensive computations and parallel execution. Commonly used in scikit-learn for model persistence and optimization. |
🗃️ Databases & Storage
🗄️ SQL Databases
| Package | Description |
| --- | --- |
| psycopg2-binary | PostgreSQL adapter for Python. Implements DB API 2.0 and supports advanced PostgreSQL features like asynchronous notifications and COPY commands. Binary version avoids compilation requirements. |
| PyMySQL | Pure-Python MySQL client that doesn’t require external dependencies. Implements Python DB API v2.0 and supports parameterized queries. |
📦 NoSQL Databases
| Package | Description |
| --- | --- |
| pymongo | Official MongoDB driver for Python. Enables working with documents and collections using an intuitive API. Supports aggregation, GridFS, and change streams. |
| redis | Python interface to Redis, a fast key-value store. Supports transactions, pub/sub messaging, Lua scripting, and is widely used for caching and real-time applications. |
🤖 Model Deployment & APIs
🌐 Web Frameworks
| Package | Description |
| --- | --- |
| fastapi | Modern, high-performance web framework for building APIs using Python 3.7+. Based on type hints with automatic data validation and OpenAPI/Swagger documentation. Excellent for deploying ML models. |
| gradio | Easy-to-use library for creating interactive UI around ML models. Great for demos, testing, and sharing models with non-technical users. Supports inputs like images, audio, and video. |
🏭 MLOps
| Package | Description |
| --- | --- |
| mlflow | Open source platform for managing the end-to-end ML lifecycle: experiment tracking, reproducible runs, and model deployment. Works with multiple frameworks. |
| bentoml | Framework for serving, deploying, and monitoring ML models. Packages models with dependencies and provides high-performance serving with adaptive batching. Integrates with Kubernetes and cloud platforms. |
📁 File Formats & I/O
📦 Specialized Formats
| Package | Description |
| --- | --- |
| pyarrow | Python implementation of Apache Arrow. Provides a fast, language-independent columnar memory format for analytics. Enables efficient data sharing between Python, R, Spark, and more. |
| h5py | Python interface to HDF5 binary data format. Efficient for storing and manipulating large numerical datasets. Supports hierarchical organization of data in groups and datasets. |
🔄 Data Transfer
| Package | Description |
| --- | --- |
| kaggle | Official Kaggle API client. Enables downloading datasets and competition entries programmatically. Useful for automating data science workflows. |
| wandb | Weights & Biases library for tracking experiments, metrics, and hyperparameters during model training. Provides team dashboards and integrates with most ML frameworks. |
🎮 Reinforcement Learning
| Package | Description |
| --- | --- |
| gym | Toolkit for developing and comparing reinforcement learning algorithms. Provides standardized environments ranging from classic control problems to Atari games. Originally developed by OpenAI; its actively maintained successor is Gymnasium (Farama Foundation). |
📝 Other Useful Packages
🧱 3D Processing
| Package | Description |
| --- | --- |
| open3d | Modern library for 3D data processing. Supports point clouds, RGB-D images, and meshes. Includes algorithms for registration, reconstruction, and visualization. |
📝 NLP Utilities
| Package | Description |
| --- | --- |
| tiktoken | Fast BPE tokenizer used by OpenAI models. Optimized for tokenizing text for GPT and similar transformer-based models. Extremely fast and efficient. |
🧰 System Tools
| Package | Description |
| --- | --- |
| blosc | High-performance compression library for binary data. Multi-threaded and faster than traditional compressors for numerical arrays. Often used as a backend for scientific computing. |
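Because the dependency versions are unpinned, the exact versions inside a given build can vary. A minimal sketch for checking what actually got installed is shown below, using only the standard library. `installed_versions` is a hypothetical helper name. It looks packages up by distribution name (for example `scikit-learn` and `opencv-python`), which can differ from the import name.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the package tables in this README.
wanted = ["numpy", "pandas", "scikit-learn", "tensorflow", "torch", "opencv-python"]
for name, version in installed_versions(wanted).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Running this in a notebook cell right after starting the container gives a quick record of the environment for reproducibility notes.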
🧪 Building the Image
Clone the repo and build:
```bash
docker build -t ai-lab .
```
▶️ Running the Container
💻 CPU Mode (default)
```bash
# Quote the volume path so directories containing spaces still work.
docker run -d -e TZ=America/Recife \
  -p 8888:8888 \
  -v "$(pwd)/workspace:/workspace" ai-lab
```
⚡ GPU Mode - CUDA (NVIDIA)
```bash
# Quote the volume path so directories containing spaces still work.
docker run -d --gpus all \
  -e TZ=America/Recife \
  -e ENABLE_GPU=yes \
  -e GPU_TYPE=CUDA \
  -p 8888:8888 -v "$(pwd)/notebooks:/workspace" ai-lab
```
🔷 GPU Mode - ROCm (AMD)
```bash
# Quote the volume path so directories containing spaces still work.
docker run -d \
  -e TZ=America/Recife \
  -e ENABLE_GPU=yes \
  -e GPU_TYPE=ROCM \
  -p 8888:8888 -v "$(pwd)/notebooks:/workspace" ai-lab
```
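The three `docker run` variants differ only in the `--gpus all` flag and the `ENABLE_GPU` / `GPU_TYPE` variables. As a rough sketch, the same commands can be assembled from Python. `docker_run_args` is a hypothetical helper, and its defaults (`workspace` as the host directory, `ai-lab` as the image tag) simply mirror the commands shown above.

```python
import os


def docker_run_args(gpu_type=None, tz="America/Recife",
                    host_dir="workspace", image="ai-lab"):
    """Build the `docker run` argument list for CPU, CUDA, or ROCm mode."""
    if gpu_type not in (None, "CUDA", "ROCM"):
        raise ValueError("gpu_type must be None, 'CUDA', or 'ROCM'")

    args = ["docker", "run", "-d"]
    if gpu_type == "CUDA":
        # Only the CUDA command in this README passes --gpus all.
        args += ["--gpus", "all"]
    args += ["-e", f"TZ={tz}"]
    if gpu_type is not None:
        args += ["-e", "ENABLE_GPU=yes", "-e", f"GPU_TYPE={gpu_type}"]
    # Equivalent of "$(pwd)/<host_dir>:/workspace" in the shell commands.
    volume = f"{os.path.abspath(host_dir)}:/workspace"
    args += ["-p", "8888:8888", "-v", volume, image]
    return args


print(" ".join(docker_run_args("CUDA", host_dir="notebooks")))
```

Passing the list to `subprocess.run(..., check=True)` would start the container. Using an argument list rather than a single shell string avoids quoting problems with paths that contain spaces.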
docker-compose.yml
```yaml
version: '3'

services:
  ai-lab:
    build: .
    container_name: ai-lab
    ports:
      - "8888:8888"
    volumes:
      - ./workspace:/workspace
    environment:
      ENABLE_GPU: ${ENABLE_GPU:-no}
      GPU_TYPE: ${GPU_TYPE:-CUDA}
    profiles:
      - cpu
      - gpu-cuda
      - gpu-rocm
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]

# Per-profile configuration
profiles:
  cpu:
    description: "Run in CPU mode (default)"
  gpu-cuda:
    description: "Run with NVIDIA GPU support (CUDA)"
    services:
      ai-jupyter:
        runtime: nvidia
        environment:
          ENABLE_GPU: "yes"
          GPU_TYPE: "CUDA"
          TZ: "America/Recife"

  gpu-rocm:
    description: "Run with AMD GPU support (ROCm)"
    services:
      ai-jupyter:
        environment:
          ENABLE_GPU: "yes"
          GPU_TYPE: "ROCM"
          TZ: "America/Recife"
```
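The compose file relies on `${VAR:-default}` interpolation, as in `${GPU_TYPE:-CUDA}`: the default is used when the variable is unset or empty. The sketch below imitates that single rule in Python to make the behavior concrete. `resolve_defaults` is a hypothetical helper, and it deliberately handles only the `:-` form, not the rest of Compose's interpolation syntax.

```python
import os
import re

# Matches ${NAME:-default}; other interpolation forms are left untouched.
_DEFAULT_PATTERN = re.compile(r"\$\{(\w+):-([^}]*)\}")


def resolve_defaults(text, env=None):
    """Substitute ${NAME:-default} using `env` (os.environ when omitted)."""
    env = os.environ if env is None else env

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        # `:-` falls back when the variable is unset or set to an empty string.
        return env.get(name) or default

    return _DEFAULT_PATTERN.sub(_substitute, text)


print(resolve_defaults("GPU_TYPE: ${GPU_TYPE:-CUDA}", env={}))
print(resolve_defaults("GPU_TYPE: ${GPU_TYPE:-CUDA}", env={"GPU_TYPE": "ROCM"}))
```

In practice this means exporting `GPU_TYPE=ROCM` in the shell (or setting it in an `.env` file) before `docker compose up` overrides the `CUDA` default.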
Run with Docker Compose
💻 CPU Mode (default)
```bash
docker compose --profile cpu up --build
```
⚡ GPU Mode - CUDA (NVIDIA)
```bash
docker compose --profile gpu-cuda up --build
```
🔷 GPU Mode - ROCm (AMD)
```bash
docker compose --profile gpu-rocm up --build
```
📓 Jupyter Access
Once running, open your browser:
http://localhost:8888
No token or password required.
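The container starts in detached mode, so Jupyter may take a moment to come up. A small standard-library check like the one below can confirm that something is answering on the published port before opening the browser. `jupyter_is_up` is a hypothetical helper, and it only tests that an HTTP server responds, not that the responder is Jupyter.

```python
import urllib.error
import urllib.request


def jupyter_is_up(url="http://localhost:8888", timeout=3.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False
```

Calling `jupyter_is_up()` in a short retry loop is usually enough to wait for the server after `docker run -d`.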
🧩 Extensions Enabled
- `ipywidgets`: sliders, buttons, inputs
- `toc2`: table of contents
- `code_prettify`: code formatter
- `LaTeX`: full math support ($\alpha + \beta = \gamma$)
🧪 Recommended Notebook Tests
Run demo.ipynb at http://localhost:8888
📄 License
This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to use, share, and adapt the material, provided that appropriate credit is given to the original author.
📦 Software Repository and Citation Notice
This software is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to use, modify, and distribute this code for any purpose, provided that proper credit is given to the original author.
Please cite this software using the following reference:
- Author: H. L. Varona
- Title: AI Lab: Docker Image for Machine Learning & Data Science
- Zenodo DOI: https://doi.org/10.5281/zenodo.15353983
BibTeX citation format:
```latex
@software{hlvarona-ailab-v1,
  author    = {H. L. Varona},
  title     = {AI Lab: Docker Image for Machine Learning \& Data Science},
  year      = 2025,
  publisher = {Zenodo},
  version   = {v1.0},
  doi       = {10.5281/zenodo.15353983}
}
```
👤 Author
HL Varona
📧 humberto.varona@gmail.com
🔧 Project: VaronaTech
Owner
- Name: Humberto L. Varona
- Login: humbertolvarona
- Kind: user
- Location: Brazil
- Repositories: 2
- Profile: https://github.com/humbertolvarona
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "AI Lab: Docker Image for Machine Learning & Data Science"
version: "v1.0"
doi: "10.5281/zenodo.15353983"
date-released: 2025-05-03
authors:
  - family-names: Varona
    given-names: Humberto L.
    name: H. L. Varona
license: "CC-BY-4.0"
preferred-citation:
  type: software
  title: "AI Lab: Docker Image for Machine Learning & Data Science"
  version: "v1.0"
  doi: "10.5281/zenodo.15353983"
  authors:
    - name: H. L. Varona
  publisher: "Zenodo"
  year: 2025
GitHub Events
Total
- Push event: 7
- Create event: 2
Last Year
- Push event: 7
- Create event: 2
Dependencies
- python 3.10-slim build
- Pillow *
- PyMySQL *
- albumentations *
- altair *
- arrow *
- asteroid *
- asyncpg *
- autokeras *
- azure-cosmos *
- bentoml *
- blosc *
- bokeh *
- boto3 *
- captum *
- cassandra-driver *
- catboost *
- clickhouse-connect *
- cloudpickle *
- couchdb *
- cx_Oracle *
- darts *
- dash *
- datasets *
- datatable *
- dateparser *
- diffusers *
- dvc *
- elasticsearch *
- espnet *
- fastai *
- fastapi *
- feather-format *
- flaml *
- google-cloud-bigquery *
- google-cloud-firestore *
- gradio *
- gym *
- h5py *
- hdfs *
- holoviews *
- huggingface_hub *
- ignite *
- imgaug *
- ipywidgets *
- joblib *
- jupyter *
- jupyter_contrib_nbextensions *
- jupyterlab *
- kaggle *
- keras *
- keras-cv *
- keras-nlp *
- keras-tuner *
- langchain *
- lightgbm *
- lime *
- llama-index *
- matplotlib *
- mlflow *
- mysql-connector-python *
- netCDF4 *
- neuralprophet *
- nltk *
- notebook *
- numpy *
- onnx *
- onnxruntime *
- open3d *
- openai *
- opencv-python *
- opendatasets *
- openpyxl *
- openvino-dev *
- optuna *
- pandas *
- pandas-profiling *
- pendulum *
- pickle5 *
- pingouin *
- plotly *
- pmdarima *
- pretrainedmodels *
- prophet *
- psycopg2-binary *
- pyarrow *
- pymongo *
- pyodbc *
- pyspark *
- pystan *
- pytorch-lightning *
- pytorchcv *
- ray *
- redis *
- requests *
- scikit-learn *
- scipy *
- seaborn *
- sentence-transformers *
- shap *
- sktime *
- spacy *
- speechbrain *
- statsmodels *
- streamlit *
- sweetviz *
- tables *
- tensorflow *
- tensorflow-cpu *
- tensorflow-datasets *
- tensorflow-probability *
- tensorflow-rocm *
- tensorflow_hub *
- textattack *
- tiktoken *
- timm *
- tinydb *
- torch *
- torch-geometric *
- torchaudio *
- torchinfo *
- torchmetrics *
- torchsummary *
- torchtext *
- torchvision *
- tqdm *
- transformers *
- trimesh *
- ts *
- tsfresh *
- ultralytics *
- vaex *
- vedo *
- wandb *
- whisper *
- widgetsnbextension *
- xarray *
- xgboost *
- xlrd *
- yellowbrick *
- zstandard *