https://github.com/darkstarstrix/datavolt

Reusable data engineering toolkit. My personal data infrastructure.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.6%) to scientific vocabulary

Keywords

data-engineering data-engineering-pipeline data-loading infrastructure performance preprocessing
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: DarkStarStrix
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 56.6 MB
Statistics
  • Stars: 17
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 2
Created about 1 year ago · Last pushed 8 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct

README.md

DataVolt: Modular Enterprise Data Engineering Framework

Overview

DataVolt is an enterprise-grade framework for building and maintaining scalable data engineering pipelines. It provides a comprehensive suite of tools for data ingestion, transformation, and processing, enabling organizations to standardize their data operations and accelerate development cycles.

Modular VoltModule Architecture

At the core of DataVolt is the concept of VoltModules: modular, domain-scoped directories (mini_dirs) that encapsulate a single use case or data engineering workflow. Each VoltModule follows a consistent internal structure and pattern, making it easy to:

  • Reuse, extend, or compose modules for new domains or projects
  • Standardize data engineering practices across teams
  • Rapidly spin up new pipelines by combining or customizing VoltModules

VoltModules can cover a wide range of data engineering needs, from market analysis to tokenization and feature engineering. The repository provides a set of ready-to-use modules, and you can add your own or extend existing ones.

Repository Structure

Note: The structure below is an illustrative example of how DataVolt is organized around VoltModules and shared utilities. Your actual repository may differ. To view your current structure, use a tool like tree or ls in your project root.

    DataVolt/
    ├── modules/                 # Collection of VoltModules (domain-specific mini_dirs)
    │   ├── market_analysis/     # Example VoltModule: Market Analysis
    │   │   ├── __init__.py
    │   │   └── ...              # Module-specific logic
    │   ├── tokenization/        # Example VoltModule: Tokenization
    │   │   ├── __init__.py
    │   │   └── ...
    │   └── ...                  # Add or extend VoltModules as needed
    ├── loaders/                 # Data Ingestion Layer (shared utilities)
    │   ├── __init__.py
    │   └── ...
    ├── preprocess/              # Data Transformation Layer (shared utilities)
    │   ├── __init__.py
    │   └── ...
    ├── ext/                     # Extension Layer (logging, custom steps, etc.)
    │   ├── logger.py
    │   └── ...
    └── ...

  • modules/: Houses all VoltModules, each in its own directory, following a common pattern.
  • loaders/, preprocess/, ext/: Provide shared utilities and frameworks for use within VoltModules or standalone.
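
To check which VoltModules a given checkout actually contains, a small helper along these lines could be used. It is a sketch that assumes the illustrative layout above (a `modules/` directory whose subdirectories each hold an `__init__.py`); `find_volt_modules` is a hypothetical name and not part of DataVolt.

```python
from pathlib import Path


def find_volt_modules(repo_root: str) -> list[str]:
    """Return the names of subdirectories of modules/ that contain an __init__.py.

    Assumes the illustrative layout from the README; returns an empty list
    when there is no modules/ directory under repo_root.
    """
    modules_dir = Path(repo_root) / "modules"
    if not modules_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in modules_dir.iterdir()
        if child.is_dir() and (child / "__init__.py").is_file()
    )


if __name__ == "__main__":
    # Run from the project root to list the modules found there.
    print(find_volt_modules("."))
```

Directories without an `__init__.py` are skipped, so scratch folders under `modules/` do not show up as VoltModules.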

Key Features

  • VoltModules: Modular, domain-scoped, and reusable mini_dirs for any data engineering use case
  • Rapid Customization: Add, extend, or compose modules to fit evolving requirements
  • Standardization: Consistent patterns and internal structure across all modules
  • Comprehensive Toolkit: Everything needed for data engineering, from ingestion to advanced analytics

Installation

    pip install datavolt

Or with uv:

    uv pip install datavolt

Quick Start

Using a VoltModule

```python
from datavolt.modules.market_analysis import MarketAnalysisModule

# {...} stands in for your module configuration
module = MarketAnalysisModule(config={...})
result = module.run()
```

Building Your Own VoltModule

  1. Create a new directory under modules/ (e.g., my_use_case/)
  2. Add an __init__.py and implement your logic following the VoltModule pattern
  3. Import and use your module as needed
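
The steps above can be sketched as follows. The README does not spell out the VoltModule pattern, so this sketch assumes only what the Quick Start shows: a class constructed with a `config` mapping that exposes `run()`. The class name `MyUseCaseModule` and its config keys (`records`, `threshold`) are hypothetical.

```python
# modules/my_use_case/__init__.py  (hypothetical VoltModule)
from typing import Any


class MyUseCaseModule:
    """Toy VoltModule: keeps records whose "value" meets a configured threshold."""

    def __init__(self, config: dict[str, Any]) -> None:
        # Hypothetical config keys; a real module would define its own.
        self.records = list(config.get("records", []))
        self.threshold = float(config.get("threshold", 0.0))

    def run(self) -> list[dict[str, Any]]:
        """Return the records whose "value" field is >= the threshold."""
        return [r for r in self.records if r["value"] >= self.threshold]


if __name__ == "__main__":
    module = MyUseCaseModule(
        config={"records": [{"value": 1}, {"value": 5}], "threshold": 3}
    )
    print(module.run())
```

Mirroring the constructor-plus-`run()` shape of `MarketAnalysisModule` keeps a new module interchangeable with the bundled ones from the caller's point of view.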

Example: Data Ingestion and Transformation

```python
from datavolt.loaders.csv_loader import CSVLoader
from datavolt.preprocess.pipeline import PreprocessingPipeline

loader = CSVLoader(file_path="data.csv")
dataset = loader.load()

pipeline = PreprocessingPipeline([...])
processed_dataset = pipeline.run(dataset)
```
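
For readers without DataVolt installed, the same load-then-transform flow can be illustrated with the standard library alone. This is a stand-in, not DataVolt's implementation: `load_csv`, `run_pipeline`, and the two step functions are hypothetical names, and each step is assumed to be a plain function from rows to rows.

```python
import csv
import io
from typing import Callable, Iterable, Optional

Row = dict[str, Optional[str]]
Step = Callable[[list[Row]], list[Row]]


def load_csv(text: str) -> list[Row]:
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def strip_whitespace(rows: list[Row]) -> list[Row]:
    """Trim surrounding whitespace from every value (missing values become "")."""
    return [{k: (v or "").strip() for k, v in r.items()} for r in rows]


def drop_incomplete(rows: list[Row]) -> list[Row]:
    """Remove rows that have an empty or missing value in any column."""
    return [r for r in rows if all(v not in ("", None) for v in r.values())]


def run_pipeline(steps: Iterable[Step], rows: list[Row]) -> list[Row]:
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        rows = step(rows)
    return rows


if __name__ == "__main__":
    raw = "name,price\n apple ,1.5\npear,\n"
    print(run_pipeline([strip_whitespace, drop_incomplete], load_csv(raw)))
```

Keeping each step a pure function makes the order explicit and lets individual steps be tested in isolation.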

Extending DataVolt

  • Add new VoltModules for new domains or workflows
  • Plug in tools (e.g., new loaders, preprocessors) into existing modules
  • Compose modules to build complex pipelines
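
One way to make loaders pluggable, as the second bullet suggests, is a registry keyed by file extension. The sketch below is an assumption about how such a plug-in point could look; `register_loader`, `load`, and the registry itself are hypothetical and do not describe DataVolt's internals.

```python
import json
from pathlib import Path
from typing import Any, Callable

Loader = Callable[[Path], Any]

# Hypothetical registry mapping a lowercase file extension to a loader function.
_LOADERS: dict[str, Loader] = {}


def register_loader(extension: str) -> Callable[[Loader], Loader]:
    """Decorator that registers a loader for an extension such as ".json"."""
    def decorator(func: Loader) -> Loader:
        _LOADERS[extension.lower()] = func
        return func
    return decorator


@register_loader(".json")
def load_json(path: Path) -> Any:
    return json.loads(path.read_text(encoding="utf-8"))


@register_loader(".txt")
def load_lines(path: Path) -> list[str]:
    return path.read_text(encoding="utf-8").splitlines()


def load(path: str) -> Any:
    """Dispatch to the loader registered for the file's extension."""
    p = Path(path)
    try:
        loader = _LOADERS[p.suffix.lower()]
    except KeyError:
        raise ValueError(f"no loader registered for {p.suffix!r}") from None
    return loader(p)
```

With this shape, adding support for a new format means writing one decorated function; callers of `load` do not change.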

Use Cases

  • Market analysis, tokenization, and domain-specific analytics
  • Standardized, reproducible data preprocessing
  • Scalable machine learning and feature engineering pipelines
  • Integration with cloud, SQL, and ML frameworks

Contributing

We welcome contributions! To add a new VoltModule or extend the framework:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-module)
  3. Add your module under modules/ and follow the VoltModule pattern
  4. Commit and push your changes
  5. Open a Pull Request

License

DataVolt is distributed under the MIT License. See LICENSE for details.

Support


DataVolt: Empowering Modular Data Engineering Excellence

Owner

  • Name: Allan Murimi Wandia
  • Login: DarkStarStrix
  • Kind: user
  • Location: U.S.A
  • Company: Freelance

Full stack Dev. Turning ideas into projects.

GitHub Events

Total
  • Release event: 2
  • Watch event: 16
  • Delete event: 6
  • Push event: 51
  • Pull request event: 12
  • Create event: 7
Last Year
  • Release event: 2
  • Watch event: 16
  • Delete event: 6
  • Push event: 51
  • Pull request event: 12
  • Create event: 7

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: about 16 hours
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 6
Past Year
  • Issues: 0
  • Pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: about 16 hours
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 6
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (4)
  • imgbot[bot] (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (4) rust (3)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 13 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
pypi.org: datavolt

A reusable workflow for data engineering pipelines

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 13 Last month
Rankings
Dependent packages count: 9.7%
Average: 32.2%
Dependent repos count: 54.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/Tests.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
pyproject.toml pypi
requirements.txt pypi
  • boto3 *
  • configparser *
  • joblib *
  • matplotlib *
  • nltk *
  • numpy >=2.0.0
  • pandas >=1.3.0
  • pillow *
  • plotly *
  • psutil *
  • pytest *
  • scikit-learn >=1.5.0
  • scipy *
  • seaborn *
  • setuptools >=70.0.0
  • sqlalchemy *
  • torch >=2.5.0
setup.py pypi
  • boto3 *
  • joblib *
  • neptune-client >=0.9.0
  • numpy ==1.22.0
  • pandas >=1.3.0
  • pillow *
  • scikit-learn ==1.5.0
  • setuptools >=70.0.0
  • sqlalchemy *
  • torch ==2.5.0
.github/workflows/python-publish.yml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
Cargo.toml cargo
  • tempfile 3.8 development
  • log 0.4
  • polars 0.35
  • rayon 1.8
  • sysinfo 0.29
  • thiserror 1.0