lab-dotnet-spark

Proof of Concept for Local Testable Spark Environment

https://github.com/dpvreony/lab-dotnet-spark

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.7%) to scientific vocabulary

Keywords

aspire, spark
Last synced: 8 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: dpvreony
  • License: MIT
  • Language: C#
  • Default Branch: main
  • Homepage:
  • Size: 120 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 2
  • Releases: 0
Topics
aspire, spark
Created about 1 year ago · Last pushed 8 months ago
Metadata Files
Readme, Contributing, License, Code of conduct, Citation

README.md

Proof of Concept for Local Testable Spark Environment

This repository demonstrates how to use Aspire to set up a local environment for testing against Spark. Benefits of this approach include:

  • Rapid Development: Quickly iterate and test Spark applications locally without needing a full cluster setup.
  • Cost Efficiency: Save costs by testing locally rather than on expensive cloud resources.
  • Debugging: Easier to debug and troubleshoot issues in a controlled local environment.

Testable Spark Environment using Aspire

The Program.cs file sets up a testable Spark environment using Aspire. Below is an overview of the process:

  • Application Setup: The application host is created with DistributedApplication.CreateBuilder(args), which returns the builder used to register each of the resources below.
  • SQL Server Integration: A SQL Server container is added with builder.AddSqlServer("sql").WithDataVolume(). WithDataVolume() stores the database files in a named volume, so the data survives container restarts.
  • Spark Master Node:
    • A Spark master container is configured with environment variables that start it in master mode.
    • An HTTP endpoint and a Spark master endpoint are exposed. Workers and clients connect to the Spark master endpoint.
  • Spark Worker Node:
    • A Spark worker container is added and linked to the master node.
    • Its environment variables give it the address of the Spark master.
  • Jupyter Notebook Integration:
    • A Jupyter container is added and secured with an access token. The token must be supplied to open the notebook server.
    • Volumes and endpoints are configured for the Jupyter container.
    • The Jupyter container waits for the Spark master and worker containers to start before it starts itself.

Together, these resources form a self-contained, testable Spark environment. SQL Server acts as a data source and Jupyter Notebook acts as an interactive front end.
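The steps above can be sketched as an Aspire AppHost Program.cs. This is a minimal illustration under stated assumptions, not the repository's actual file. The following are all assumptions:

  • the container images (bitnami/spark, jupyter/pyspark-notebook) and their tags
  • the ports and the environment variable names, which follow those images' conventions
  • the jupyter-token parameter and the ./notebooks folder

The sketch also assumes Aspire 9 or later (for WaitFor) and a reference to the Aspire.Hosting.SqlServer package.

```csharp
// Hypothetical AppHost Program.cs (sketch). Images, tags, ports, env var names,
// the "jupyter-token" parameter and "./notebooks" are assumptions.
var builder = DistributedApplication.CreateBuilder(args);

// SQL Server container; WithDataVolume() keeps database files in a named volume.
builder.AddSqlServer("sql").WithDataVolume();

// Spark master (assumed bitnami/spark image, which reads SPARK_MODE).
var sparkMaster = builder.AddContainer("spark-master", "bitnami/spark", "3.5")
    .WithEnvironment("SPARK_MODE", "master")
    .WithHttpEndpoint(port: 8080, targetPort: 8080, name: "http")          // master web UI
    .WithEndpoint(port: 7077, targetPort: 7077, scheme: "tcp", name: "spark"); // spark:// endpoint

// Spark worker. Using "spark-master" as a hostname assumes the containers share
// Aspire's container network, where resource names resolve to containers.
var sparkWorker = builder.AddContainer("spark-worker", "bitnami/spark", "3.5")
    .WithEnvironment("SPARK_MODE", "worker")
    .WithEnvironment("SPARK_MASTER_URL", "spark://spark-master:7077")
    .WaitFor(sparkMaster);

// Access token for Jupyter, read from the configuration key
// "Parameters:jupyter-token" (for example, user secrets) instead of being hard-coded.
var jupyterToken = builder.AddParameter("jupyter-token", secret: true);

builder.AddContainer("jupyter", "jupyter/pyspark-notebook")
    .WithEnvironment("JUPYTER_TOKEN", jupyterToken)
    .WithBindMount("./notebooks", "/home/jovyan/work") // notebooks persisted on the host
    .WithHttpEndpoint(port: 8888, targetPort: 8888, name: "http")
    .WaitFor(sparkMaster)
    .WaitFor(sparkWorker);

builder.Build().Run();
```

Running the AppHost (for example with dotnet run from its project directory) starts the containers and lists their endpoints in the Aspire dashboard. WaitFor only delays a resource until its dependency is running and reports healthy. It does not check that the worker has actually registered with the master.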

Architecture Diagram

```mermaid
graph TD
    SQLServer[SQL Server] -->|Data Source| SparkMaster(Spark Master)
    SparkMaster --> SparkWorker(Spark Worker)
    SparkMaster --> JupyterNotebook(Jupyter Notebook)
    SparkWorker --> JupyterNotebook
    Kafka --> SparkWorker
    Kafka --> KafkaUI
    SQLServer --> CloudBeaver
```

  • SQL Server: Acts as the data source for the Spark environment.
  • Spark Master: Coordinates the Spark application, managing resources and scheduling tasks.
  • Spark Worker: Executes tasks assigned by the Spark Master.
  • Jupyter Notebook: Provides an interactive environment to run Spark jobs, analyze data, and visualize results.

Owner

  • Name: David Vreony
  • Login: dpvreony
  • Kind: user
  • Location: UK

.NET Developer \ Analyst \ Architect with experience in Developer Experience, Financial Services and Healthcare domains

Citation (citation.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Building a lab for local development of spark
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: David
    family-names: Vreony
    orcid: 'https://orcid.org/0000-0001-6855-0779'
repository-code: 'https://github.com/dpvreony/lab-dotnet-spark'
license: MIT

GitHub Events

Total
  • Delete event: 36
  • Issue comment event: 53
  • Public event: 1
  • Push event: 123
  • Pull request review event: 5
  • Pull request event: 71
  • Create event: 29
Last Year
  • Delete event: 36
  • Issue comment event: 53
  • Public event: 1
  • Push event: 123
  • Pull request review event: 5
  • Pull request event: 71
  • Create event: 29

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 69
  • Total Committers: 2
  • Avg Commits per committer: 34.5
  • Development Distribution Score (DDS): 0.275
Past Year
  • Commits: 69
  • Committers: 2
  • Avg Commits per committer: 34.5
  • Development Distribution Score (DDS): 0.275
Top Committers
  • renovate[bot] (email: 2****]): 50 commits
  • David Vreony (email: d****y): 19 commits
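The committer statistics above follow from the commit counts. This assumes the Development Distribution Score uses its usual definition: one minus the share of all commits made by the top committer.

```latex
\text{Avg commits per committer} = \frac{50 + 19}{2} = \frac{69}{2} = 34.5
\qquad
\text{DDS} = 1 - \frac{50}{69} = \frac{19}{69} \approx 0.275
```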

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 1
  • Total pull requests: 60
  • Average time to close issues: N/A
  • Average time to close pull requests: 4 days
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.42
  • Merged pull requests: 46
  • Bot issues: 1
  • Bot pull requests: 58
Past Year
  • Issues: 1
  • Pull requests: 60
  • Average time to close issues: N/A
  • Average time to close pull requests: 4 days
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.42
  • Merged pull requests: 46
  • Bot issues: 1
  • Bot pull requests: 58
Top Authors
Issue Authors
  • renovate[bot] (1)
Pull Request Authors
  • renovate[bot] (58)
  • dpvreony (2)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (1)