https://github.com/awslabs/amazon-msk-data-generator
Data generator for Amazon MSK
Statistics
- Stars: 14
- Watchers: 1
- Forks: 4
- Open Issues: 6
- Releases: 0
README.md
Amazon MSK Data Generator
MSK Data Generator is a translation of the awesome Voluble Apache Kafka data generator from Clojure to Java. (See the link in the External References section below.)
The killer feature is being able to generate events which reference other generated events. (AKA: cross-reference, reference-able, joinable, etc.)
For example, we can generate one stream of Order events containing a customer_id (as well as price, sku, quantity, etc.) and, at the same time, a different stream of Customer events containing a customer_id (as well as first name, last name, location, etc.). The customer_id in a dynamically generated Order event can then reference the customer_id of a generated Customer event.
Why does this matter?
Multiple streams of "joinable" data are especially useful when building stream processor applications that perform joins (in Kinesis Data Analytics for Apache Flink or Kinesis Data Analytics Studio, for example).
For an example, see the AWS Big Data Blog post "Query your Amazon MSK topics interactively using Amazon Kinesis Data Analytics Studio".
Why translate to Java?
By translating to Java, the hope is we open up the potential of wider community collaboration. (Nothing against Clojure mind you! It's just more folks know Java.)
This project can likely be used outside of Amazon MSK, but to start at least, the focus will be making this generator easy to use with Amazon MSK.
Further Context
MSK Data Generator is deployed and configured as a Kafka Connect Source, so basic knowledge of Kafka Connect will be helpful.
Like many dynamic data generation projects, the key component is the Java Faker library. Knowing more about Java Faker capabilities and options will be helpful.
See the links in the External References section below.
Getting Started
MSK Data Generator can be deployed in a variety of ways including:
- Deploying in a container running in Elastic Container Service
- Deploying as a Kafka Connect source connector in MSK Connect
Customizing Data Generation Configuration
There are 5 essential constructs to understand when customizing key-value data generation:
- Directives: `genk`, `genkp`, `genv`, and `genvp`
- Generators: `with` or `matching`
- Attribute: the name of the field to generate data for
- Qualifiers: `sometimes`
- Expressions: based on Java Faker
For example, consider the following configuration:
```
"genkp.customer.with": "#{Internet.uuid}",
"genv.customer.name.with": "#{Name.full_name}",
"genv.customer.gender.with": "#{Demographic.sex}",
"genv.customer.favorite_beer.with": "#{Beer.name}",
"genv.customer.state.with": "#{Address.state}",

"genkp.order.with": "#{Internet.uuid}",
"genv.order.product_id.with": "#{number.number_between '101','109'}",
"genv.order.quantity.with": "#{number.number_between '1','5'}",
"genv.order.customer_id.matching": "customer.key"
```
This config will generate data to the customer and order topics. It assumes that either the MSK cluster has been configured to allow auto topic creation or the customer and order topics have already been created.
With every iteration, the above configuration will create 2 events similar to the following:
A customer event with a key of 0c88cbb7-eb4a-44f0-83aa-00957761b3b6 (a random UUID string, because the key uses Internet.uuid from Java Faker) and a JSON payload of
{
  "favorite_beer": "Weihenstephaner Hefeweissbier",
  "gender": "Male",
  "name": "Miss Gilbert Luettgen",
  "state": "Oregon"
}
An order event with a random string key of dc236186-9037-45a0-8b91-a3c2b50f0582 (again, because of Internet.uuid) and a JSON payload of
{
  "quantity": "4",
  "product_id": "104",
  "customer_id": "0c88cbb7-eb4a-44f0-83aa-00957761b3b6"
}
Notice how the order event customer_id value references the previously generated customer key field? (Hint: with this kind of data generation, we can test our join code!)
This also highlights the difference between `with` and `matching` in the configuration.
In this example, `with` uses the classes and methods available from Java Faker. See the API docs and compare the class methods with the configuration above, such as Name.full_name, Beer.name, etc. By contrast, `matching` reuses a value from a previously generated event (here, the customer key).
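The difference between `with` and `matching` can be sketched in plain Java. This is a conceptual illustration only, not the generator's implementation: the class `JoinableDataSketch`, its `Event` type, and all method names are hypothetical, and `java.util.UUID` stands in for Java Faker's Internet.uuid so the sketch has no dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.UUID;

// Hypothetical illustration only: this is NOT how the generator is implemented.
final class JoinableDataSketch {

    // A generated event: a key plus a flat, string-valued payload.
    static final class Event {
        final String key;
        final Map<String, String> value;

        Event(String key, Map<String, String> value) {
            this.key = key;
            this.value = value;
        }
    }

    // Plays the role of "with": the key is freshly generated for every event.
    static Event newCustomer(String name) {
        Map<String, String> value = new LinkedHashMap<>();
        value.put("name", name);
        return new Event(UUID.randomUUID().toString(), value);
    }

    // Plays the role of "matching": customer_id is copied from the key of a
    // previously generated customer event instead of being generated anew.
    static Event newOrder(List<Event> customers, Random random) {
        Event customer = customers.get(random.nextInt(customers.size()));
        Map<String, String> value = new LinkedHashMap<>();
        value.put("quantity", String.valueOf(1 + random.nextInt(4)));
        value.put("customer_id", customer.key);
        return new Event(UUID.randomUUID().toString(), value);
    }

    // A tiny in-memory join: resolve each order to the name of its customer.
    static List<String> joinOrdersToCustomerNames(List<Event> orders, List<Event> customers) {
        Map<String, String> nameByKey = new HashMap<>();
        for (Event customer : customers) {
            nameByKey.put(customer.key, customer.value.get("name"));
        }
        List<String> names = new ArrayList<>();
        for (Event order : orders) {
            names.add(nameByKey.get(order.value.get("customer_id")));
        }
        return names;
    }

    public static void main(String[] args) {
        Random random = new Random();
        List<Event> customers = Arrays.asList(newCustomer("Ada"), newCustomer("Grace"));
        List<Event> orders = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            orders.add(newOrder(customers, random));
        }
        // Every order resolves to a customer because its customer_id was copied from one.
        System.out.println(joinOrdersToCustomerNames(orders, customers));
    }
}
```

Because every order copies its customer_id from an existing customer key, the join never produces an unmatched order, which is the property that makes this kind of generated data useful for testing join code.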
With the customer and order configuration above and the 5 previously mentioned essential constructs in mind, the sequence of each configuration key is:
directive.topic.attribute-or-qualifier.generator: expression
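To see how a key decomposes under this sequence, here is a minimal sketch of a parser for the key forms shown in this README. It is not part of the project: `ConfigKeySketch` and its fields are hypothetical names, and the sketch assumes that topic and attribute names contain no dots. Keys such as `genkp.customer.with` have no attribute segment, so the middle part may be empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for reading the key layout; not part of the project.
final class ConfigKeySketch {

    static final List<String> DIRECTIVES = Arrays.asList("genk", "genkp", "genv", "genvp");
    static final List<String> GENERATORS = Arrays.asList("with", "matching");

    final String directive;
    final String topic;
    final List<String> middle; // attribute and/or qualifier segments; may be empty
    final String generator;

    private ConfigKeySketch(String directive, String topic, List<String> middle, String generator) {
        this.directive = directive;
        this.topic = topic;
        this.middle = Collections.unmodifiableList(middle);
        this.generator = generator;
    }

    // Splits "directive.topic[.attribute-or-qualifier...].generator" into its parts.
    // Assumption: no segment contains a dot.
    static ConfigKeySketch parse(String key) {
        String[] parts = key.split("\\.");
        if (parts.length < 3) {
            throw new IllegalArgumentException("Expected at least directive.topic.generator: " + key);
        }
        String directive = parts[0];
        String topic = parts[1];
        String generator = parts[parts.length - 1];
        if (!DIRECTIVES.contains(directive)) {
            throw new IllegalArgumentException("Unknown directive '" + directive + "' in: " + key);
        }
        if (topic.isEmpty()) {
            throw new IllegalArgumentException("Empty topic in: " + key);
        }
        if (!GENERATORS.contains(generator)) {
            throw new IllegalArgumentException("Unknown generator '" + generator + "' in: " + key);
        }
        List<String> middle = new ArrayList<>(Arrays.asList(parts).subList(2, parts.length - 1));
        return new ConfigKeySketch(directive, topic, middle, generator);
    }

    @Override
    public String toString() {
        return "directive=" + directive + ", topic=" + topic
                + ", middle=" + middle + ", generator=" + generator;
    }

    public static void main(String[] args) {
        System.out.println(parse("genkp.customer.with"));
        System.out.println(parse("genv.order.customer_id.matching"));
    }
}
```

Keeping the middle segments as a list avoids guessing whether a given segment is an attribute or a qualifier; a stricter parser would need the project's actual rules for that.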
For further information on data generation configuration options, see the Voluble README as well as the examples in this repo.
External References
- Voluble (basis for this project): https://github.com/MichaelDrogalis/voluble
- Java Faker: https://github.com/DiUS/java-faker
- Java Faker API docs: https://dius.github.io/java-faker/apidocs/
Owner
- Name: Amazon Web Services - Labs
- Login: awslabs
- Kind: organization
- Location: Seattle, WA
- Website: http://amazon.com/aws/
- Repositories: 914
- Profile: https://github.com/awslabs