lab-dotnet-spark
Proof of Concept for Local Testable Spark Environment
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.7%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Proof of Concept for Local Testable Spark Environment
This repository demonstrates how to set up a local environment for testing against Apache Spark using .NET Aspire. Benefits include:
- Rapid Development: Quickly iterate and test Spark applications locally without needing a full cluster setup.
- Cost Efficiency: Save costs by testing locally rather than on expensive cloud resources.
- Debugging: Easier to debug and troubleshoot issues in a controlled local environment.
Testable Spark Environment using Aspire
The Program.cs file sets up a testable Spark environment using Aspire. Below is an overview of the process:
- Application Setup: The application is initiated using `DistributedApplication.CreateBuilder(args)`, which sets up the environment.
- SQL Server Integration: A SQL Server container is added with `builder.AddSqlServer("sql").WithDataVolume()`.
- Spark Master Node:
  - A Spark master container is configured with environment variables that make it run in master mode.
  - HTTP and Spark master endpoints are set up for communication.
- Spark Worker Node:
  - A Spark worker container is configured and linked to the master node.
  - Environment variables are set so that the worker connects to the Spark master.
- Jupyter Notebook Integration:
  - A Jupyter container is added to the environment and configured with an access token for security.
  - Volumes and endpoints are set up for the Jupyter container.
  - The Jupyter container is set to wait for the Spark master and worker containers before starting.
The result is a locally runnable Spark cluster (one master, one worker), with SQL Server as a data source and Jupyter Notebook as an interactive front end for testing.
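For orientation, here is a minimal sketch of what such an AppHost `Program.cs` could look like. It is not the repository's actual file: the container images (`bitnami/spark`, `quay.io/jupyter/pyspark-notebook`), tags, ports, the `SPARK_MODE`/`SPARK_MASTER_URL` variables (Bitnami image conventions), the `jupyter-token` parameter name and the `./notebooks` mount are all assumptions. It uses the Aspire 9 hosting APIs (`AddContainer`, `WithEnvironment`, `WithHttpEndpoint`, `WithEndpoint`, `WithBindMount`, `WaitFor`) and needs the `Aspire.Hosting.SqlServer` package for `AddSqlServer`.

```csharp
// Hypothetical AppHost sketch. Images, tags, ports, env vars and the
// parameter/mount names are assumptions, not taken from the repository.
using Aspire.Hosting;

var builder = DistributedApplication.CreateBuilder(args);

// SQL Server container with a persistent data volume, as described above.
builder.AddSqlServer("sql")
    .WithDataVolume();

// Spark master: assumes the Bitnami Spark image, which selects its role from SPARK_MODE.
var sparkMaster = builder.AddContainer("spark-master", "bitnami/spark", "3.5")
    .WithEnvironment("SPARK_MODE", "master")
    .WithEnvironment("SPARK_RPC_AUTHENTICATION_ENABLED", "no")
    .WithEnvironment("SPARK_RPC_ENCRYPTION_ENABLED", "no")
    .WithHttpEndpoint(port: 8080, targetPort: 8080, name: "http")                 // master web UI
    .WithEndpoint(port: 7077, targetPort: 7077, name: "spark", isProxied: false); // cluster port

// Spark worker: registers with the master. Assumes containers can reach each
// other by resource name on Aspire's shared container network.
var sparkWorker = builder.AddContainer("spark-worker", "bitnami/spark", "3.5")
    .WithEnvironment("SPARK_MODE", "worker")
    .WithEnvironment("SPARK_MASTER_URL", "spark://spark-master:7077")
    .WithHttpEndpoint(port: 8081, targetPort: 8081, name: "http")                 // worker web UI
    .WaitFor(sparkMaster);

// Secret parameter holding the Jupyter access token (hypothetical name).
var jupyterToken = builder.AddParameter("jupyter-token", secret: true);

// Jupyter with PySpark; starts only after the master and worker are running.
builder.AddContainer("jupyter", "quay.io/jupyter/pyspark-notebook", "latest")
    .WithEnvironment("JUPYTER_TOKEN", jupyterToken)
    .WithBindMount("./notebooks", "/home/jovyan/work")
    .WithHttpEndpoint(port: 8888, targetPort: 8888, name: "http")
    .WaitFor(sparkMaster)
    .WaitFor(sparkWorker);

builder.Build().Run();
```

Aspire resolves the `jupyter-token` parameter from configuration, for example a `Parameters:jupyter-token` entry in user secrets. The Spark version bundled in the notebook image should match the cluster's version, otherwise jobs submitted from Jupyter may fail.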
Architecture Diagram
```mermaid
graph TD
    SQLServer[SQL Server] -->|Data Source| SparkMaster(Spark Master)
    SparkMaster --> SparkWorker(Spark Worker)
    SparkMaster --> JupyterNotebook(Jupyter Notebook)
    SparkWorker --> JupyterNotebook
    Kafka --> SparkWorker
    Kafka --> KafkaUI
    SQLServer --> CloudBeaver
```
- SQL Server: Acts as the data source for the Spark environment.
- Spark Master: Coordinates the Spark application, managing resources and scheduling tasks.
- Spark Worker: Executes tasks assigned by the Spark Master.
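- Jupyter Notebook: Provides an interactive environment to run Spark jobs, analyze data, and visualize results.
- Kafka: Message broker that streams data to the Spark Worker.
- Kafka UI: Web interface for inspecting Kafka topics and messages.
- CloudBeaver: Web-based database manager for browsing the SQL Server instance.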
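To make the SQL Server to Spark "Data Source" edge concrete, the following sketch reads a table over JDBC using .NET for Apache Spark (`Microsoft.Spark`). The README does not say which client library the lab uses, so treat this as an illustration only. The `LabDb` database, `dbo.SampleData` table, `SQL_PASSWORD` variable and the `sql`/`spark-master` host names are hypothetical. The job is assumed to be launched with `spark-submit` (using the `DotnetRunner` class) from a location where both the driver and the workers can resolve those host names, with the Microsoft SQL Server JDBC driver jar on the classpath (for example via `--jars`).

```csharp
// Hypothetical job: loads a SQL Server table into a Spark DataFrame over JDBC.
// Host names, database, table and the SQL_PASSWORD variable are assumptions.
using System;
using Microsoft.Spark.Sql;

class Program
{
    static void Main(string[] args)
    {
        // Connect to the standalone master (assumed reachable as spark-master:7077).
        SparkSession spark = SparkSession.Builder()
            .AppName("sqlserver-jdbc-sample")
            .Master("spark://spark-master:7077")
            .GetOrCreate();

        // Read the SQL Server password from the environment instead of hard-coding it.
        string password = Environment.GetEnvironmentVariable("SQL_PASSWORD")
            ?? throw new InvalidOperationException("SQL_PASSWORD is not set.");

        // Standard Spark JDBC source options; the driver and executors both connect to "sql".
        DataFrame df = spark.Read()
            .Format("jdbc")
            .Option("url", "jdbc:sqlserver://sql:1433;databaseName=LabDb;encrypt=true;trustServerCertificate=true")
            .Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
            .Option("dbtable", "dbo.SampleData")
            .Option("user", "sa")
            .Option("password", password)
            .Load();

        // Print the first rows and the total row count.
        df.Show();
        Console.WriteLine($"Rows read: {df.Count()}");

        spark.Stop();
    }
}
```

`trustServerCertificate=true` accepts the self-signed certificate a development SQL Server container typically presents. It is suitable for a local lab but not for production.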
Owner
- Name: David Vreony
- Login: dpvreony
- Kind: user
- Location: UK
- Website: http://www.dpvreony.co.uk
- Repositories: 39
- Profile: https://github.com/dpvreony
.NET Developer \ Analyst \ Architect with experience in Developer Experience, Financial Services and Healthcare domains
Citation (citation.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Building a lab for local development of spark
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: David
    family-names: Vreony
    orcid: 'https://orcid.org/0000-0001-6855-0779'
repository-code: 'https://github.com/dpvreony/lab-dotnet-spark'
license: MIT
```
GitHub Events
Total
- Delete event: 36
- Issue comment event: 53
- Public event: 1
- Push event: 123
- Pull request review event: 5
- Pull request event: 71
- Create event: 29
Last Year
- Delete event: 36
- Issue comment event: 53
- Public event: 1
- Push event: 123
- Pull request review event: 5
- Pull request event: 71
- Create event: 29
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| renovate[bot] | 2****] | 50 |
| David Vreony | d****y | 19 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 1
- Total pull requests: 60
- Average time to close issues: N/A
- Average time to close pull requests: 4 days
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.42
- Merged pull requests: 46
- Bot issues: 1
- Bot pull requests: 58
Past Year
- Issues: 1
- Pull requests: 60
- Average time to close issues: N/A
- Average time to close pull requests: 4 days
- Issue authors: 1
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.42
- Merged pull requests: 46
- Bot issues: 1
- Bot pull requests: 58
Top Authors
Issue Authors
- renovate[bot] (1)
Pull Request Authors
- renovate[bot] (58)
- dpvreony (2)