t4ai-signature-detect-server

This project provides a pipeline for deploying and performing inference with the YOLOv8 object detection model using the Triton Inference Server. It supports integration with local systems, Docker-based setups, or Google Cloud’s Vertex AI. The repository includes scripts for automated deployment, benchmarking, and GUI-based inference.

https://github.com/tech4ai/t4ai-signature-detect-server

Science Score: 31.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Keywords

deep-learning python triton-inference-server yolov8
Last synced: 7 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: tech4ai
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: develop
  • Homepage:
  • Size: 151 MB
Statistics
  • Stars: 12
  • Watchers: 3
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Topics
deep-learning python triton-inference-server yolov8
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License Code of conduct Citation Codeowners Security

README.md

Object Detection with Triton Inference Server

This project provides a pipeline for deploying and performing inference with a YOLOv8 object detection model using Triton Inference Server locally, in Docker-based setups, or on Google Cloud's Vertex AI. The repository includes scripts for automating the deployment process, a graphical user interface for inference, and performance analysis tools for optimizing the model's performance.

📁 Project Structure

Key Files

  • requirements.txt: Lists the external libraries and dependencies required for the project.
  • server/: Contains scripts for deploying the model to Triton Inference Server.
    • local/: Scripts for running the Triton Inference Server locally.
    • vertexai/: Scripts for deploying the model to Vertex AI Endpoint.
  • signature-detection/: Contains scripts for performing inference with the YOLOv8 model.
    • analyzer/: Contains results and configuration for performance analysis using Triton Model Analyzer.
    • inference/: Scripts for performing inference via Triton Client, Vertex AI, or locally, plus a GUI for visualization.
      • inference_onnx.py: Script for performing inference with ONNX runtime locally.
      • inference_pipeline.py: Script for performing inference on images using different methods.
      • predictors.py: Contains the predictor classes for different inference methods. You can add new predictors for custom inference methods.
    • gui/: Contains the Gradio interface for interacting with the deployed model. The inference_gui.py script can be used to test the model in real time. The UI has built-in examples and plots of results and performance.
    • models/: Contains the Model Repository for Triton Server, including the YOLOv8 model and pre/post-processing scripts in an Ensemble Model.
    • data/: Contains the datasets and data processing scripts.
    • utils/: Scripts for uploading/downloading the model to/from Google Cloud Storage or Azure Storage and exporting the model to ONNX/TensorRT format.
  • Dockerfile: Contains the configuration for building the Docker image for Triton Inference Server.
  • Dockerfile.dev: Contains the configuration for building the Docker image for local development.
  • docker-compose.yml: Contains the configuration for running Dockerfile.dev.
  • entrypoint.sh: Script for initializing the Triton Inference Server with the required configurations.

  • LICENSE: The license for the project.

🛠️ Features

  • Seamless Model Deployment: Automates the deployment of the YOLOv8 model using Triton Inference Server.
  • Multi-Backend Support: Allows inference locally, on Vertex AI, or directly with Triton Client.
  • Optimized Performance: Utilizes Triton's features like dynamic batching, OpenVINO backend and Ensemble Model for efficient inference.
  • GUI for Easy Inference: Provides an intuitive Gradio interface for interacting with the deployed model.
  • Automated Scripts: Includes scripts for model uploading, server startup, and resource cleanup.
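Dynamic batching, mentioned above, is enabled per model in a Triton `config.pbtxt`. As an illustrative fragment (the batch sizes and queue delay here are placeholders, not the repository's actual configuration):

```protobuf
# Hypothetical config.pbtxt fragment -- values are illustrative only
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

With this, Triton groups individual inference requests into batches up to the preferred sizes, waiting at most the configured delay before dispatching.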

💻 Installation

  1. Clone the repository:

     ```bash
     git clone https://github.com/your-username/t4ai-triton-server.git
     ```

  2. Install dependencies (optional: create a virtual environment first):

     ```bash
     pip install -r requirements.txt
     ```

  3. Configure your environment: set up Google Cloud credentials and an env file (see .env.example).
  4. Build and deploy:
  5. Run inference: the scripts in signature-detection/inference can be used to perform inference on images using different methods (requests, Triton Client, Vertex AI).
    • GUI: Use inference_gui.py to test the deployed model and visualize the results.
    • CLI: Use the inference_pipeline.py script to select a predictor and perform inference on test dataset images.
    • ONNX: Use the inference_onnx.py script to perform inference with the ONNX runtime locally.

🧩 Ensemble Model

The repository includes an Ensemble Model for the YOLOv8 object detection model. The Ensemble Model combines the YOLOv8 model with pre and post-processing scripts to perform inference on images. The model repository is located in the models/ directory.

```mermaid
flowchart TB
subgraph "Triton Inference Server"
    direction TB
    subgraph "Ensemble Model Pipeline"
        direction TB
        subgraph Input
            raw["raw_image (UINT8, [-1])"]
            conf["confidence_threshold (FP16, [1])"]
            iou["iou_threshold (FP16, [1])"]
        end

        subgraph "Preprocess Py-Backend"
            direction TB
            pre1["Decode Image
                BGR to RGB"]
            pre2["Resize (640x640)"]
            pre3["Normalize (/255.0)"]
            pre4["Transpose
            [H,W,C]->[C,H,W]"]
            pre1 --> pre2 --> pre3 --> pre4
        end

        subgraph "YOLOv8 Model ONNX Backend"
            yolo["Inference YOLOv8s"]
        end

        subgraph "Postprocess Python Backend"
            direction TB
            post1["Transpose
               Outputs"]
            post2["Filter Boxes (confidence_threshold)"]
            post3["NMS (iou_threshold)"]
            post4["Format Results [x,y,w,h,score]"]
            post1 --> post2 --> post3 --> post4
        end

        subgraph Output
            result["detection_result
                (FP16, [-1,5])"]
        end

        raw --> pre1
        pre4 --> |"preprocessed_image (FP32, [3,-1,-1])"| yolo
        yolo --> |"output0"| post1
        conf --> post2
        iou --> post3
        post4 --> result
    end
end

subgraph Client
    direction TB
    client_start["Client Application"]
    response["Detections Result
            [x,y,w,h,score]"]
end

client_start -->|"HTTP/gRPC Request
      with raw image
      confidence_threshold
      iou_threshold"| raw
result -->|"HTTP/gRPC Response with detections"| response

style Client fill:#e6f3ff,stroke:#333
style Input fill:#f9f,stroke:#333
style Output fill:#9ff,stroke:#333

```
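The post-processing stage in the diagram (filter boxes by confidence, then non-maximum suppression with an IoU threshold) can be illustrated in plain Python. This is a sketch of the idea, not the repository's actual Python-backend code:

```python
def filter_and_nms(boxes, scores, conf_thres=0.5, iou_thres=0.5):
    """Greedy NMS over [x1, y1, x2, y2] boxes after confidence filtering.

    Illustrative re-implementation of the diagram's postprocess steps;
    the real backend works on the YOLOv8 output tensor directly.
    """
    def iou(a, b):
        # Intersection-over-union of two axis-aligned boxes
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter) if inter else 0.0

    # 1) drop detections below the confidence threshold
    dets = [(b, s) for b, s in zip(boxes, scores) if s >= conf_thres]
    # 2) highest score first; suppress boxes overlapping an already-kept box
    dets.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, kb) <= iou_thres for kb, _ in kept):
            kept.append((box, score))
    return kept

dets = filter_and_nms(
    boxes=[[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]],
    scores=[0.9, 0.8, 0.3],
)
# the two heavily overlapping boxes collapse to the higher-scoring one,
# and the 0.3-confidence box is dropped by the threshold
```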


⚡ Inference

The inference module allows you to perform image analysis using different methods, leveraging both local and cloud-based solutions. The pipeline is designed to be flexible and supports multiple prediction methods, making it easy to experiment and deploy in different environments.

Available Methods

The pipeline supports the following inference methods:

  1. Triton Client: Inference using the Triton Inference Server SDK.
  2. Vertex AI: Inference using Google Cloud's Vertex AI Endpoint.
  3. HTTP: Inference using HTTP requests to the Triton Inference Server.
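For the HTTP method, a request body can be built following Triton's KServe-v2-style JSON schema. The tensor names, dtypes, and shapes below are assumptions taken from the ensemble diagram in this README; check the model configuration before relying on them:

```python
import json

def build_infer_payload(image_bytes, conf=0.5, iou=0.5):
    """Build a KServe-v2-style infer request body for the ensemble model.

    Tensor names/dtypes follow the ensemble diagram above and should be
    verified against the repository's actual model config.
    """
    return {
        "inputs": [
            {"name": "raw_image", "datatype": "UINT8",
             "shape": [len(image_bytes)],
             "data": list(image_bytes)},
            {"name": "confidence_threshold", "datatype": "FP16",
             "shape": [1], "data": [conf]},
            {"name": "iou_threshold", "datatype": "FP16",
             "shape": [1], "data": [iou]},
        ]
    }

payload = build_infer_payload(b"\x00\x01\x02")
body = json.dumps(payload)
# POST this body to http://<triton_host>:8000/v2/models/<model_name>/infer
```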

How To Use

The inference module provides both a graphical user interface (GUI) and command-line tools for performing inference.

1. Graphical User Interface (GUI)

The GUI allows you to interactively test the deployed model and visualize the results in real-time.

```bash
python signature-detection/gui/inference_gui.py --triton-url {triton_url}
```

https://github.com/user-attachments/assets/d41a45a1-8783-41a6-b963-b315d0e994b4

2. Command-Line Interface (CLI)

The CLI tool provides a flexible way to perform inference on a dataset using different predictors.

  • Script: inference_pipeline.py
  • Usage: The script will show a menu to select a predictor and perform inference on the test dataset.

```bash
python signature-detection/inference/inference_pipeline.py
```

💡 This script computes inference-time metrics and prints a tabulated final report like this:

        
+------------------------+----------------------+
| Metric                 | Value                |
+========================+======================+
| Mean time (ms)         | 141.20447635650635   |
+------------------------+----------------------+
| Std deviation (ms)     | 17.0417248165512     |
+------------------------+----------------------+
| Max time (ms)          | 175.67205429077148   |
+------------------------+----------------------+
| Min time (ms)          | 125.48470497131348   |
+------------------------+----------------------+
| Total time (min)       | 00:02:541            |
+------------------------+----------------------+
| Number of inferences   | 18                   |
+------------------------+----------------------+

3. ONNX Runtime

For local inference without relying on external services, you can use the ONNX runtime.

```bash
python signature-detection/inference/inference_onnx.py \
    --model_path {onnx_model_path} \
    --img './input/test_image.jpg' \
    --conf-thres 0.5 \
    --iou-thres 0.5
```

  • All arguments are optional; the default values are:
    • --model_path: signature-detection/models/yolov8s.onnx
    • --img: Random image from the test dataset
    • --conf-thres: 0.5
    • --iou-thres: 0.5

Extending the Pipeline

If you need to extend the inference pipeline or add custom prediction methods, you can:

  1. Create a new predictor class that inherits from BasePredictor.
  2. Implement the required methods (request, format_response, etc.).
  3. Update the InferencePipeline to support the new predictor.
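The extension steps above can be sketched as follows. The `BasePredictor` interface here is inferred from the class diagram below, and `EchoPredictor` is a hypothetical example, so check predictors.py for the real signatures:

```python
from abc import ABC, abstractmethod

class BasePredictor(ABC):
    """Minimal stand-in for the repository's BasePredictor
    (interface inferred from the class diagram, not copied from the code)."""

    @abstractmethod
    def request(self, input):
        ...

    @abstractmethod
    def format_response(self, response):
        ...

    def predict(self, input):
        # Template method: issue the request, then normalize the response
        return self.format_response(self.request(input))

class EchoPredictor(BasePredictor):
    """Hypothetical custom predictor that returns canned detections."""

    def request(self, input):
        # A real predictor would call Triton, Vertex AI, or an HTTP endpoint here
        return {"detections": [[0, 0, 10, 10, 0.9]]}

    def format_response(self, response):
        return response["detections"]

detections = EchoPredictor().predict("image.jpg")
# -> [[0, 0, 10, 10, 0.9]]
```

The final step would be registering the new class wherever InferencePipeline selects its predictor.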

Class Diagram

The inference pipeline is built around a modular class structure that allows for easy extension and customization. Here's the class hierarchy:

```mermaid
classDiagram
    class ABC {
    }
    class BasePredictor {
        +__init__()
        +request(input)
        +format_response(response)
        +predict(input)
    }
    class HttpPredictor {
        +__init__(url)
        ~create_payload(image)
        +request(input)
        +format_response(response)
    }
    class VertexAIPredictor {
        +__init__(url, access_token)
        ~get_google_access_token()
    }
    class TritonClientPredictor {
        +__init__(url, endpoint, scheme)
        +request()
        +format_response(response)
    }
    class InferencePipeline {
        +__init__(predictor)
        +run(image_path)
        ~process_response(response)
    }

ABC <|-- BasePredictor
BasePredictor <|-- HttpPredictor
HttpPredictor <|-- VertexAIPredictor
BasePredictor <|-- TritonClientPredictor
InferencePipeline --> BasePredictor : uses

```

🔒 Limit Endpoint Access

To control access to specific server protocols, the server uses the --http-restricted-api and --grpc-restricted-protocol flags. These flags ensure that only requests containing the required admin-key header with the correct value will have access to restricted endpoints.

In this project, the entrypoint configuration restricts access to the following endpoints over both the HTTP and gRPC protocols:

Restricted Endpoints:

  • model-repository
  • model-config
  • shared-memory
  • statistics
  • trace

Entry Point Configuration

The entrypoint.sh script is configured to restrict access to the server's administrative endpoints. The access control is enforced via both HTTP and GRPC protocols, ensuring that only requests containing the admin-key header with the correct value will be allowed.

```bash
tritonserver \
    --model-repository=${TRITON_MODEL_REPOSITORY} \
    --model-control-mode=explicit \
    --load-model=* \
    --log-verbose=1 \
    --allow-metrics=false \
    --allow-grpc=true \
    --grpc-restricted-protocol=model-repository,model-config,shared-memory,statistics,trace:admin-key=${TRITON_ADMIN_KEY} \
    --http-restricted-api=model-repository,model-config,shared-memory,statistics,trace:admin-key=${TRITON_ADMIN_KEY}
```

Key Points:

  1. Inference Access: The server allows inference requests from any user.
  2. Admin Access: Access to the restricted endpoints (model-repository, model-config, etc.) is limited to requests that include the admin-key header with the correct value defined in the .env file.
  3. GRPC Protocol: The GRPC protocol is enabled and restricted in the same way as HTTP, providing consistent security across both protocols.

This configuration ensures that sensitive operations and configurations are protected, while still allowing regular inference requests to proceed without restrictions.
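As an illustration, a client calling a restricted endpoint must attach the admin-key header. This sketch uses Triton's repository index endpoint with a placeholder key value (the real value comes from your .env file); sending it of course requires a running server:

```python
import json
import urllib.request

# Hypothetical example: build a request to the restricted repository
# index endpoint, carrying the admin-key header Triton was told to require.
req = urllib.request.Request(
    "http://localhost:8000/v2/repository/index",
    data=json.dumps({}).encode(),
    headers={"admin-key": "your-admin-key-value"},
    method="POST",
)

# With a server running, the call would be:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
# Without the correct admin-key header, Triton rejects the request.
```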

📊 Model Analyzer

The Triton Model Analyzer can be used to profile the model and generate performance reports. The metrics-model-inference.csv file contains performance metrics for various configurations of the YOLOv8 model.

You can run the Model Analyzer using the following command:

```bash
docker run -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v $(pwd)/signature-detection/models:/signature-detection/models \
    --net=host nvcr.io/nvidia/tritonserver:24.11-py3-sdk
```

```bash
model-analyzer profile -f perf.yaml \
    --triton-launch-mode=remote --triton-http-endpoint=localhost:8000 \
    --output-model-repository-path /signature-detection/analyzer/configs \
    --export-path profile_results --override-output-model-repository \
    --collect-cpu-metrics --monitoring-interval=5
```

```bash
model-analyzer report \
    --report-model-configs yolov8s_config_0,yolov8s_config_12,yolov8s_config_4,yolov8s_config_8 ... \
    --export-path /workspace --config-file perf.yaml
```

You can modify the perf.yaml file to experiment with different configurations and analyze the performance of the model in your deployment environment. See the Triton Model Analyzer documentation for more details.
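As a rough sketch of what such a configuration can contain (the model name, paths, and search ranges here are placeholders, not the repository's actual perf.yaml):

```yaml
# Illustrative Model Analyzer config sketch -- values are placeholders
model_repository: /signature-detection/models
profile_models:
  yolov8s:
    parameters:
      concurrency:
        start: 1
        stop: 8
        step: 2
```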

🤗 Model & Dataset Resources

This project uses a custom-trained YOLOv8 model for signature detection. All model weights, training artifacts, and the dataset are hosted on Hugging Face to comply with Ultralytics' YOLO licensing requirements and to ensure proper versioning and documentation.

  • Model Repository: Contains the trained model weights, ONNX exports, and comprehensive model card detailing the training process, performance metrics, and usage guidelines.

Model Card

  • Dataset Repository: Includes the training dataset, validation splits, and detailed documentation about data collection and preprocessing steps.

Dataset Card

  • Demo Space: Provides a live demo for testing the model and dataset on Hugging Face Spaces.

Open in Spaces

🧰 Utils

The utils/ folder contains scripts designed to simplify interactions with cloud storage providers and the process of exporting machine learning models. Below is an overview of the available scripts and their usage examples.

1. Downloading Models from Cloud Storage

The download_from_cloud.py script allows you to download models or other files from Google Cloud Storage (GCP) or Azure Blob Storage. Use the appropriate arguments to specify the provider, storage credentials, and paths.

  • Google Cloud Storage (GCP):

    ```bash
    python signature-detection/utils/download_from_cloud.py --provider gcp --bucket-name <your-bucket-name>
    ```

  • Azure Blob Storage:

    ```bash
    python signature-detection/utils/download_from_cloud.py --provider az --container-name <your-container-name> --connection-string "<your-connection-string>"
    ```

Arguments:

  • --provider: Cloud provider (gcp or az).
  • --bucket-name: GCP bucket name (required for gcp).
  • --container-name: Azure container name (required for az).
  • --connection-string: Azure connection string (required for az).
  • --local-folder: Local folder to save downloaded files (default: models folder).
  • --remote-folder: Remote folder path in the cloud (default: triton-server/image/signature-detection/models).
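The argument surface documented above can be mirrored with argparse. This is a sketch of the CLI's shape, not the repository's actual download_from_cloud.py; the "models" default stands in for the "models folder" the README mentions:

```python
import argparse

# Hypothetical reconstruction of the documented CLI arguments
parser = argparse.ArgumentParser(description="Download models from cloud storage")
parser.add_argument("--provider", choices=["gcp", "az"], required=True,
                    help="Cloud provider (gcp or az)")
parser.add_argument("--bucket-name", help="GCP bucket name (required for gcp)")
parser.add_argument("--container-name", help="Azure container name (required for az)")
parser.add_argument("--connection-string", help="Azure connection string (required for az)")
parser.add_argument("--local-folder", default="models",
                    help="Local folder to save downloaded files")
parser.add_argument("--remote-folder",
                    default="triton-server/image/signature-detection/models",
                    help="Remote folder path in the cloud")

args = parser.parse_args(["--provider", "gcp", "--bucket-name", "my-bucket"])
# Cross-argument requirements are validated after parsing:
if args.provider == "gcp" and not args.bucket_name:
    parser.error("--bucket-name is required for gcp")
```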

2. Uploading Models to Cloud Storage

The upload_models_to_cloud.py script allows you to upload models or files from a local directory to either GCP or Azure storage.

  • Google Cloud Storage (GCP):

    ```bash
    python signature-detection/utils/upload_models_to_cloud.py --provider gcp --bucket-name <your-bucket-name>
    ```

  • Azure Blob Storage:

    ```bash
    python signature-detection/utils/upload_models_to_cloud.py --provider az --container-name <your-container-name> --connection-string "<your-connection-string>"
    ```

Arguments:

  • --provider: Cloud provider (gcp or az).
  • --bucket-name: GCP bucket name (required for gcp).
  • --container-name: Azure container name (required for az).
  • --connection-string: Azure connection string (required for az).
  • --local-folder: Local folder containing files to upload (default: models folder).
  • --remote-folder: Remote folder path in the cloud (default: triton-server/image/signature-detection/models).

3. Exporting Models

The export_model.py script simplifies the process of exporting YOLOv8 models to either ONNX or TensorRT formats. This is useful for deploying models in environments requiring specific formats.

  • Export to ONNX:

    ```bash
    python signature-detection/utils/export_model.py --model-path /path/to/yolov8s.pt --output-path model.onnx --format onnx
    ```

  • Export to TensorRT:

    ```bash
    python signature-detection/utils/export_model.py --model-path /path/to/yolov8s.pt --output-path model.engine --format tensorrt
    ```

Arguments:

  • --model-path: Path to the input model file (e.g., a YOLOv8 .pt file).
  • --output-path: Path to save the exported model.
  • --format: Export format (onnx or tensorrt).

🤝 Contributors

Samuel Lima Braz
Jorge Willians
Nixon Silva
ronaldobalzi-tech4h
Add your contributions

Contributing

First off, thanks for taking the time to contribute! Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make will benefit everybody else and are greatly appreciated.

Please read our contribution guidelines, and thank you for being involved!

License

This project is licensed under the Apache Software License 2.0.

See LICENSE for more information.

Owner

  • Name: Tech4Ai
  • Login: tech4ai
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with https://bit.ly/cffinit.

cff-version: 1.2.0
title: "Signature Detect Server"
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: Tech4Humans
    country: BR
    website: 'https://www.tech4.ai/'
    email: iag@tech4h.com.br
repository-code: 'https://github.com/tech4ai/t4ai-signature-detect-server'
repository: >-
  https://huggingface.co/collections/tech4humans/deteccao-de-assinaturas-678b087d8b0ce22ae8c3f60e
abstract: >-
  This project provides a pipeline for deploying and
  performing inference with a YOLOv8 object detection model
  using Triton Inference Server on Google Cloud's Vertex AI,
  locally or Docker based systems. The repository includes
  scripts for automating the deployment process, a graphical
  user interface for inference, and performance analysis
  tools for optimizing the model's performance.
keywords:
  - Object Detection
  - Triton Inference Server
  - Yolov8
  - Artificial Intelligence
license: Apache-2.0

GitHub Events

Total
  • Create event: 1
  • Issues event: 1
  • Release event: 1
  • Watch event: 15
  • Push event: 10
  • Pull request event: 2
  • Fork event: 3
Last Year
  • Create event: 1
  • Issues event: 1
  • Release event: 1
  • Watch event: 15
  • Push event: 10
  • Pull request event: 2
  • Fork event: 3