https://github.com/awslabs/aws-ddk

An open source development framework to help you build data workflows and modern data architecture on AWS.


Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary

Keywords

aws dataengineering dataops python

Keywords from Contributors

diagram reinforcement-learning labels interaction deployment
Last synced: 5 months ago

Repository

An open source development framework to help you build data workflows and modern data architecture on AWS.

Basic Info
Statistics
  • Stars: 269
  • Watchers: 13
  • Forks: 23
  • Open Issues: 17
  • Releases: 19
Topics
aws dataengineering dataops python
Created about 4 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct

README.md

AWS DataOps Development Kit (DDK)


Packages 🗳️

The AWS DataOps Development Kit is an open source development framework for customers that build data workflows and modern data architecture on AWS.

Based on the AWS CDK, it offers high-level abstractions that let you build pipelines to manage data flows on AWS, driven by DevOps best practices. The framework is extensible: you can add abstractions for your own data processing infrastructure or replace our best practices with your own standards. Templates are easy to share, so everyone in your organisation can concentrate on the business logic of dealing with their data rather than on boilerplate.


The DDK Core is a library of CDK constructs that you can use to build data workflows and modern data architecture on AWS, following our best practices. The DDK Core is modular and extensible: if our best practices don't work for you, you can update the constructs and share your own version with the rest of your organisation through a private AWS CodeArtifact repository.

You can compose constructs from the DDK Core into a DDK App. Your DDK App can also contain constructs from the CDK Framework or the AWS Construct Library.

Overview

For a detailed walk-through, check out our Workshop or take a look at examples.

Build Data Pipelines

One of the core features of the DDK is the ability to create data pipelines. A DDK DataPipeline is a chained series of stages. It automatically "wires" the stages together using Amazon EventBridge rules.

The DDK comes with a library of stages, but users can also create their own for their use cases and are encouraged to share them with the community.

Let's take a look at an example below:

```python
...

firehose_s3_stage = FirehoseToS3Stage(
    self,
    "ddk-firehose-s3",
    bucket=ddk_bucket,
    data_output_prefix="raw/",
)
sqs_lambda_stage = SqsToLambdaStage(
    scope=self,
    id="ddk-sqs-lambda",
    code=Code.from_asset("./lambda"),
    handler="index.lambda_handler",
    layers=[
        LayerVersion.from_layer_version_arn(
            self,
            "ddk-lambda-layer-wrangler",
            f"arn:aws:lambda:{self.region}:336392948345:layer:AWSSDKPandas-Python39:1",
        )
    ],
)

(
    DataPipeline(scope=self, id="ddk-pipeline")
    .add_stage(firehose_s3_stage)
    .add_stage(sqs_lambda_stage)
)
...
```

First, we import the required resources from the aws_ddk_core library, including the two stage constructs: FirehoseToS3Stage() and SqsToLambdaStage(). These two classes are then instantiated, and the delivery stream is configured with the S3 prefix (raw/). Finally, the DDK DataPipeline construct is used to chain these two stages together into a data pipeline.

The complete source code of the data pipeline above can be found in AWS DDK Examples - Basic Data Pipeline.
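To make the "wiring" idea concrete, here is a minimal conceptual sketch in plain Python with no AWS dependencies. It is not the aws_ddk_core implementation: the `Stage`, `Rule`, and `Pipeline` classes, their fields, and the example event pattern and target name are all hypothetical stand-ins. It only assumes that each stage can describe the event it emits on completion and the targets that start it, and that adding a stage creates one rule connecting the previous stage's event to the new stage's targets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Stage:
    """Hypothetical stand-in for a DDK stage (not the aws_ddk_core class)."""

    id: str
    # Pattern describing the event this stage emits when it finishes.
    event_pattern: Optional[Dict[str, List[str]]] = None
    # Names of the resources to invoke in order to start this stage.
    targets: List[str] = field(default_factory=list)


@dataclass
class Rule:
    """Hypothetical stand-in for an EventBridge rule."""

    id: str
    event_pattern: Dict[str, List[str]]
    targets: List[str]


class Pipeline:
    """Chains stages by creating one rule between each consecutive pair."""

    def __init__(self, id: str) -> None:
        self.id = id
        self.rules: List[Rule] = []
        self._previous: Optional[Stage] = None

    def add_stage(self, stage: Stage) -> "Pipeline":
        previous = self._previous
        if previous is not None:
            if previous.event_pattern is None:
                raise ValueError(f"Stage {previous.id!r} emits no event to chain from")
            # The previous stage's output event triggers the new stage's targets.
            self.rules.append(
                Rule(
                    id=f"{self.id}-{stage.id}-rule",
                    event_pattern=previous.event_pattern,
                    targets=stage.targets,
                )
            )
        self._previous = stage
        return self  # returning self allows chained add_stage calls


# Illustrative values only; the pattern and target name are assumptions.
firehose_s3 = Stage(
    id="ddk-firehose-s3",
    event_pattern={"source": ["aws.s3"], "detail-type": ["Object Created"]},
)
sqs_lambda = Stage(id="ddk-sqs-lambda", targets=["ddk-sqs-lambda-queue"])

pipeline = Pipeline("ddk-pipeline").add_stage(firehose_s3).add_stage(sqs_lambda)
for rule in pipeline.rules:
    print(rule.id, rule.event_pattern, "->", rule.targets)
```

Two stages produce exactly one rule, because the first stage has no predecessor to be triggered by. Returning `self` from `add_stage` is what makes the fluent chaining style in the README example possible.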

Official Resources

Getting Help

The best way to interact with our team is through GitHub. You can open an issue and choose from one of our templates for bug reports, feature requests, or documentation issues. If you have a feature request, search the existing issues first: you can upvote or comment on one that matches instead of creating a new one.

Contributing

We welcome community contributions and pull requests. Please see CONTRIBUTING.md for details on how to set up a development environment and submit code.

Other Ways to Support

One way you can support our project is by letting others know that your organisation uses the DDK. If you would like us to include your company's name and/or logo in this README file, please raise a 'Support the DDK' issue. Note that by raising this issue (and related pull request), you are granting AWS permission to use your company’s name (and logo) for the limited purpose described here and you are confirming that you have authority to grant such permission.

License

This project is licensed under the Apache-2.0 License.

Owner

  • Name: Amazon Web Services - Labs
  • Login: awslabs
  • Kind: organization
  • Location: Seattle, WA

AWS Labs

GitHub Events

Total
  • Create event: 14
  • Release event: 1
  • Issues event: 2
  • Watch event: 12
  • Delete event: 11
  • Issue comment event: 33
  • Push event: 24
  • Pull request review comment event: 12
  • Pull request review event: 15
  • Pull request event: 24
  • Fork event: 2
Last Year
  • Create event: 14
  • Release event: 1
  • Issues event: 2
  • Watch event: 12
  • Delete event: 11
  • Issue comment event: 33
  • Push event: 24
  • Pull request review comment event: 12
  • Pull request review event: 15
  • Pull request event: 24
  • Fork event: 2

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 199
  • Total Committers: 12
  • Avg Commits per committer: 16.583
  • Development Distribution Score (DDS): 0.422
Top Committers
Name Email Commits
Lucas Hanson l****n@g****m 115
jaidisido j****o@g****m 39
kukushking 3****g@u****m 15
dependabot[bot] 4****]@u****m 12
Nick Corbett c****n@a****k 7
anmolsgandhi 7****i@u****m 3
Cyril Fait 9****t@u****m 2
Nick Corbett n****t@g****m 2
Amazon GitHub Automation 5****o@u****m 1
Vlad Emelianov v****z@g****m 1
kukushking a****k@a****k 1
Akhil B 9****l@u****m 1
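The summary figures above can be reproduced from the Top Committers table. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is an assumption here, although it does yield the reported 0.422.

```python
# Commit counts copied from the Top Committers table, in table order.
commits = [115, 39, 15, 12, 7, 3, 2, 2, 1, 1, 1, 1]

total_commits = sum(commits)
total_committers = len(commits)
avg_commits = total_commits / total_committers

# Assumed definition: DDS = 1 - (top committer's commits / total commits).
dds = 1 - max(commits) / total_commits

print(total_commits, total_committers, round(avg_commits, 3), round(dds, 3))
# prints: 199 12 16.583 0.422
```

Under this definition, a score near 0 means one person wrote almost every commit, while a score near 1 means the work is spread across many committers.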
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 77
  • Total pull requests: 266
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 3 days
  • Total issue authors: 9
  • Total pull request authors: 10
  • Average comments per issue: 0.77
  • Average comments per pull request: 1.72
  • Merged pull requests: 228
  • Bot issues: 0
  • Bot pull requests: 36
Past Year
  • Issues: 1
  • Pull requests: 28
  • Average time to close issues: N/A
  • Average time to close pull requests: 15 days
  • Issue authors: 1
  • Pull request authors: 3
  • Average comments per issue: 0.0
  • Average comments per pull request: 1.82
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 16
Top Authors
Issue Authors
  • malachi-constant (41)
  • anmolsgandhi (14)
  • Rizxcviii (7)
  • kukushking (5)
  • devansh-gandhi (3)
  • stevebanik (2)
  • awspbade (1)
  • noah-paige (1)
  • awsdiegorad (1)
Pull Request Authors
  • malachi-constant (250)
  • dependabot[bot] (47)
  • LeonLuttenberger (11)
  • stevebanik (2)
  • cyclich (2)
  • anmolsgandhi (2)
  • kukushking (1)
  • devansh-gandhi (1)
  • EgorDm (1)
  • Rizxcviii (1)
Top Labels
Issue Labels
core (26) bug (18) enhancement (15) feature-request (13) backlog (12) typescript (11) documentation (6) effort/small (5) effort/medium (4) effort/large (4) question (3) investigating (3) decision-point (3) testing (3) cli (3) p1 (1) good first issue (1) help wanted (1) wontfix (1)
Pull Request Labels
dependencies (55) core (26) javascript (22) ruby (22) bug (20) typescript (17) enhancement (15) feature-request (7) documentation (5) testing (3) python (2) security (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 1,106 last-month
    • npm 454 last-month
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 6
    (may contain duplicates)
  • Total versions: 49
  • Total maintainers: 6
pypi.org: aws-ddk-core

The AWS DataOps Development Kit is an open source development framework for customers that build data workflows and modern data architecture on AWS.

  • Versions: 23
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 1,032 Last month
Rankings
Stargazers count: 4.9%
Downloads: 5.3%
Average: 7.0%
Dependent packages count: 7.4%
Forks count: 8.6%
Dependent repos count: 9.2%
Maintainers (3)
Last synced: 6 months ago
pypi.org: aws-ddk

AWS DataOps Development Kit - CLI

  • Versions: 15
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 74 Last month
Rankings
Stargazers count: 4.9%
Downloads: 6.4%
Average: 7.3%
Dependent packages count: 7.4%
Forks count: 8.6%
Dependent repos count: 9.2%
Maintainers (2)
Last synced: 6 months ago
npmjs.org: aws-ddk-core

The AWS DataOps Development Kit is an open source development framework for customers that build data workflows and modern data architecture on AWS.

  • Versions: 11
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 454 Last month
Rankings
Dependent repos count: 21.2%
Average: 26.5%
Dependent packages count: 29.0%
Downloads: 29.4%
Last synced: 6 months ago

Dependencies

docs/Gemfile rubygems
  • github-pages >= 0 development
  • jekyll-feed >= 0 development
  • jekyll-theme-cayman ~> 0.2
  • tzinfo ~> 1.2
  • tzinfo-data >= 0
  • wdm ~> 0.1.1
  • webrick ~> 1.7
docs/Gemfile.lock rubygems
  • 102 dependencies
.github/workflows/pages.yml actions
  • actions/checkout v2 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/docs-release.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v4 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-node v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
.github/workflows/pull-request-lint.yml actions
  • amannn/action-semantic-pull-request v5.0.2 composite
.github/workflows/upgrade-main.yml actions
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-node v3 composite
  • actions/upload-artifact v3 composite
  • peter-evans/create-pull-request v4 composite
docs/_plugins/jekyll-tabs/jekyll-tabs.gemspec rubygems
  • jekyll >= 3.0, < 5.0
package-lock.json npm
  • 919 dependencies
package.json npm
  • @aws-cdk/aws-glue-alpha 2.85.0-alpha.0 development
  • @aws-cdk/aws-kinesisfirehose-alpha 2.85.0-alpha.0 development
  • @aws-cdk/aws-kinesisfirehose-destinations-alpha 2.85.0-alpha.0 development
  • @aws-cdk/integ-tests-alpha 2.85.0-alpha.0 development
  • @types/jest ^27 development
  • @types/node ^16 development
  • @typescript-eslint/eslint-plugin ^6 development
  • @typescript-eslint/parser ^6 development
  • aws-cdk-lib 2.85.0 development
  • constructs 10.0.5 development
  • eslint ^8 development
  • eslint-config-prettier ^8.10.0 development
  • eslint-import-resolver-node ^0.3.9 development
  • eslint-import-resolver-typescript ^3.6.1 development
  • eslint-plugin-import ^2.29.0 development
  • eslint-plugin-prettier ^4.2.1 development
  • jest ^27 development
  • jest-junit ^15 development
  • jsii 1.x development
  • jsii-diff ^1.91.0 development
  • jsii-docgen ^7.2.9 development
  • jsii-pacmak ^1.91.0 development
  • jsii-rosetta 1.x development
  • npm-check-updates ^16 development
  • prettier ^2.8.8 development
  • projen ^0.76.15 development
  • standard-version ^9 development
  • ts-jest ^27 development
  • typescript ^4.9.5 development
  • deepmerge 4.0.0
  • ts-node ^10.9.1
  • yaml ^2.3.3
yarn.lock npm
  • 925 dependencies
test/mwaa/requirements.txt pypi