https://github.com/darkstarstrix/datavolt
Reusable data engineering toolkit. My personal data infrastructure.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.6%) to scientific vocabulary
Basic Info
Statistics
- Stars: 17
- Watchers: 1
- Forks: 2
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
DataVolt: Modular Enterprise Data Engineering Framework
Overview
DataVolt is an enterprise-grade framework for building and maintaining scalable data engineering pipelines. It provides a comprehensive suite of tools for data ingestion, transformation, and processing, enabling organizations to standardize their data operations and accelerate development cycles.
Modular VoltModule Architecture
At the core of DataVolt is the concept of VoltModules: modular, domain-scoped directories (mini_dirs) that encapsulate a single use case or data engineering workflow. Each VoltModule follows a consistent internal structure and pattern, making it easy to:
- Reuse, extend, or compose modules for new domains or projects
- Standardize data engineering practices across teams
- Rapidly spin up new pipelines by combining or customizing VoltModules
VoltModules can cover a wide range of data engineering needs, such as market analysis, tokenization, and feature engineering. The repository provides ready-to-use modules, and you can add your own or extend existing ones.
Repository Structure
Note: The structure below is an illustrative example of how DataVolt is organized around VoltModules and shared utilities. Your actual repository may differ. To view your current structure, run a tool such as tree or ls in your project root.
DataVolt/
├── modules/ # Collection of VoltModules (domain-specific mini_dirs)
│ ├── market_analysis/ # Example VoltModule: Market Analysis
│ │ ├── __init__.py
│ │ └── ... # Module-specific logic
│ ├── tokenization/ # Example VoltModule: Tokenization
│ │ ├── __init__.py
│ │ └── ...
│ └── ... # Add or extend VoltModules as needed
├── loaders/ # Data Ingestion Layer (shared utilities)
│ ├── __init__.py
│ └── ...
├── preprocess/ # Data Transformation Layer (shared utilities)
│ ├── __init__.py
│ └── ...
├── ext/ # Extension Layer (logging, custom steps, etc.)
│ ├── logger.py
│ └── ...
└── ...
- modules/: Houses all VoltModules, each in its own directory, following a common pattern.
- loaders/, preprocess/, ext/: Provide shared utilities and frameworks for use within VoltModules or standalone.
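As a programmatic alternative to tree or ls, the sketch below lists the VoltModules in a checkout. It assumes, following the illustrative layout above, that a VoltModule is any immediate subdirectory of modules/ containing an __init__.py; the function name discover_volt_modules is hypothetical and not part of DataVolt.

```python
from __future__ import annotations

from pathlib import Path


def discover_volt_modules(root: str | Path) -> list[str]:
    """Return sorted VoltModule directory names under <root>/modules.

    Hypothetical helper. Assumption: a VoltModule is an immediate
    subdirectory of modules/ that contains an __init__.py file.
    """
    modules_dir = Path(root) / "modules"
    if not modules_dir.is_dir():
        # No modules/ directory: nothing to discover.
        return []
    return sorted(
        child.name
        for child in modules_dir.iterdir()
        if child.is_dir() and (child / "__init__.py").is_file()
    )


if __name__ == "__main__":
    print(discover_volt_modules("."))
```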
Key Features
- VoltModules: Modular, domain-scoped, and reusable mini_dirs for any data engineering use case
- Rapid Customization: Add, extend, or compose modules to fit evolving requirements
- Standardization: Consistent patterns and internal structure across all modules
- Comprehensive Toolkit: Shared loaders, preprocessing, and extension utilities covering ingestion through advanced analytics
Installation
```bash
pip install datavolt
```
Or with uv:
```bash
uv pip install datavolt
```
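To confirm the install from Python, a check along these lines should work. It relies only on the standard library's importlib.metadata and assumes the distribution name is datavolt, as in the install commands; the helper name installed_version is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "datavolt") -> str | None:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "datavolt is not installed")
```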
Quick Start
Using a VoltModule
```python
from datavolt.modules.market_analysis import MarketAnalysisModule

module = MarketAnalysisModule(config={...})  # supply module-specific settings
result = module.run()
```
Building Your Own VoltModule
- Create a new directory under modules/ (e.g., my_use_case/)
- Add an __init__.py and implement your logic following the VoltModule pattern
- Import and use your module as needed
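The README does not spell out the VoltModule pattern, so the following is only a sketch inferred from the Quick Start usage (construct with config, then call run()). VoltModule, required_keys, and WordCountModule are hypothetical names, not DataVolt's documented API; a real module's __init__.py would follow whatever base class the repository actually provides.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class VoltModule(ABC):
    """Hypothetical base class: configure in __init__, execute in run()."""

    # Config keys a subclass requires; override in subclasses.
    required_keys: tuple[str, ...] = ()

    def __init__(self, config: dict[str, Any] | None = None) -> None:
        self.config = dict(config or {})
        # Fail fast if required settings are missing.
        missing = [k for k in self.required_keys if k not in self.config]
        if missing:
            raise ValueError(f"{type(self).__name__} missing config keys: {missing}")

    @abstractmethod
    def run(self) -> Any:
        """Execute the module's workflow and return its result."""


class WordCountModule(VoltModule):
    """Hypothetical module, e.g. living in modules/my_use_case/__init__.py."""

    required_keys = ("text",)

    def run(self) -> dict[str, int]:
        counts: dict[str, int] = {}
        for word in self.config["text"].lower().split():
            counts[word] = counts.get(word, 0) + 1
        return counts


if __name__ == "__main__":
    module = WordCountModule(config={"text": "volt data volt"})
    print(module.run())  # {'volt': 2, 'data': 1}
```

Validating config in the constructor keeps misconfiguration errors at setup time rather than partway through run().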
Example: Data Ingestion and Transformation
```python
from datavolt.loaders.csv_loader import CSVLoader
from datavolt.preprocess.pipeline import PreprocessingPipeline

# Ingest: load the raw CSV into a dataset
loader = CSVLoader(file_path="data.csv")
dataset = loader.load()

# Transform: apply the configured preprocessing steps in order
pipeline = PreprocessingPipeline([...])
processed_dataset = pipeline.run(dataset)
```
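The load-then-transform flow above can be illustrated without DataVolt installed. This standard-library sketch assumes a pipeline is an ordered list of callables, each taking and returning rows; SimplePipeline, load_csv, drop_incomplete, and strip_whitespace are hypothetical stand-ins, not the behavior of CSVLoader or PreprocessingPipeline.

```python
from __future__ import annotations

import csv
import io
from typing import Callable, Iterable

Row = dict  # one CSV record: column name -> string value
Step = Callable[[list], list]


def load_csv(text: str) -> list[Row]:
    """Stand-in loader: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_incomplete(rows: list[Row]) -> list[Row]:
    """Drop rows with a missing (None) or blank field."""
    return [r for r in rows if all(v and v.strip() for v in r.values())]


def strip_whitespace(rows: list[Row]) -> list[Row]:
    """Trim surrounding whitespace from every value."""
    return [{k: v.strip() for k, v in r.items()} for r in rows]


class SimplePipeline:
    """Hypothetical stand-in: apply each step to the rows, in order."""

    def __init__(self, steps: Iterable[Step]) -> None:
        self.steps = list(steps)

    def run(self, rows: list[Row]) -> list[Row]:
        for step in self.steps:
            rows = step(rows)
        return rows


if __name__ == "__main__":
    raw = "name,city\n Ada ,London\nBob,\n"
    pipeline = SimplePipeline([drop_incomplete, strip_whitespace])
    print(pipeline.run(load_csv(raw)))  # [{'name': 'Ada', 'city': 'London'}]
```

Step order matters here: drop_incomplete runs first so that strip_whitespace never sees a None value from a short row.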
Extending DataVolt
- Add new VoltModules for new domains or workflows
- Plug in tools (e.g., new loaders, preprocessors) into existing modules
- Compose modules to build complex pipelines
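One common way to make loaders pluggable is a registry keyed by file suffix. The sketch below shows that pattern under the assumption that a loader is a function taking a Path; register_loader, load, and the sample loaders are hypothetical and not taken from DataVolt's loaders/ package.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable

Loader = Callable[[Path], Any]
_LOADERS: dict[str, Loader] = {}  # file suffix -> loader function


def register_loader(*suffixes: str) -> Callable[[Loader], Loader]:
    """Decorator that registers a loader for one or more file suffixes."""
    def decorator(func: Loader) -> Loader:
        for suffix in suffixes:
            _LOADERS[suffix.lower()] = func
        return func
    return decorator


@register_loader(".json")
def load_json(path: Path) -> Any:
    return json.loads(path.read_text(encoding="utf-8"))


@register_loader(".txt", ".log")
def load_lines(path: Path) -> list[str]:
    return path.read_text(encoding="utf-8").splitlines()


def load(path: str | Path) -> Any:
    """Dispatch to the loader registered for the file's suffix."""
    path = Path(path)
    try:
        loader = _LOADERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"No loader registered for {path.suffix!r}") from None
    return loader(path)
```

Adding a new format then means writing one decorated function, with no changes to the code that calls load().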
Use Cases
- Market analysis, tokenization, and domain-specific analytics
- Standardized, reproducible data preprocessing
- Scalable machine learning and feature engineering pipelines
- Integration with cloud, SQL, and ML frameworks
Contributing
We welcome contributions! To add a new VoltModule or extend the framework:
- Fork the repository
- Create a feature branch (git checkout -b feature/my-module)
- Add your module under modules/ and follow the VoltModule pattern
- Commit and push your changes
- Open a Pull Request
License
DataVolt is distributed under the MIT License. See LICENSE for details.
Support
- Documentation: DataVolt Docs (https://datavolt.readthedocs.io/)
- Issue Tracking: GitHub Issues (https://github.com/DarkStarStrix/DataVolt/issues)
- Professional Support: Contact allanw.mk@gmail.com
DataVolt: Empowering Modular Data Engineering Excellence
Owner
- Name: Allan Murimi Wandia
- Login: DarkStarStrix
- Kind: user
- Location: U.S.A
- Company: Freelance
- Website: https://www.kaggle.com/allanwandia
- Repositories: 1
- Profile: https://github.com/DarkStarStrix
Full-stack dev. Turning ideas into projects.
GitHub Events
Total
- Release event: 2
- Watch event: 16
- Delete event: 6
- Push event: 51
- Pull request event: 12
- Create event: 7
Last Year
- Release event: 2
- Watch event: 16
- Delete event: 6
- Push event: 51
- Pull request event: 12
- Create event: 7
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 6
- Average time to close issues: N/A
- Average time to close pull requests: about 16 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 6
Past Year
- Issues: 0
- Pull requests: 6
- Average time to close issues: N/A
- Average time to close pull requests: about 16 hours
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 6
Top Authors
Pull Request Authors
- dependabot[bot] (4)
- imgbot[bot] (1)
Packages
- Total packages: 1
- Total downloads: pypi 13 (last month)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
pypi.org: datavolt
A reusable workflow for data engineering pipelines
- Homepage: https://github.com/DarkStarStrix/DataVolt
- Documentation: https://datavolt.readthedocs.io/
- License: MIT License
- Latest release: 0.0.1 (published about 1 year ago)
Maintainers (1)
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- boto3 *
- configparser *
- joblib *
- matplotlib *
- nltk *
- numpy >=2.0.0
- pandas >=1.3.0
- pillow *
- plotly *
- psutil *
- pytest *
- scikit-learn >=1.5.0
- scipy *
- seaborn *
- setuptools >=70.0.0
- sqlalchemy *
- torch >=2.5.0
- boto3 *
- joblib *
- neptune-client >=0.9.0
- numpy ==1.22.0
- pandas >=1.3.0
- pillow *
- scikit-learn ==1.5.0
- setuptools >=70.0.0
- sqlalchemy *
- torch ==2.5.0
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- tempfile 3.8 development
- log 0.4
- polars 0.35
- rayon 1.8
- sysinfo 0.29
- thiserror 1.0