https://github.com/awslabs/barometer

A tool to automate analytic platform evaluations. Barometer helps customers gather the data points needed for service selection and service configuration for a given workload.


Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary

Keywords

aws, benchmarking, databases
Last synced: 5 months ago

Repository

A tool to automate analytic platform evaluations. Barometer helps customers gather the data points needed for service selection and service configuration for a given workload.

Basic Info
  • Host: GitHub
  • Owner: awslabs
  • License: apache-2.0
  • Language: TypeScript
  • Default Branch: main
  • Size: 51 MB
Statistics
  • Stars: 19
  • Watchers: 4
  • Forks: 2
  • Open Issues: 9
  • Releases: 0
Topics
aws, benchmarking, databases
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme · Changelog · Contributing · License · Code of conduct

README.md

Barometer

A tool to automate analytic platform evaluations

Barometer helps customers gather the data points needed for service selection and service configuration for a given workload. The Barometer tool was created by the AWS Prototyping team (EMEA).


🔰 Description

Barometer deploys a CDK stack that is used to run benchmarking experiments. An experiment is a combination of a platform and a workload, both of which can be defined using the cli-wizard provided by the Barometer tool. See the Quickstart for an example of running an experiment.

🛠 Use cases

  • Comparison of service performance: Redshift vs Redshift Serverless
  • Comparison of configurations: Redshift dc2 vs ra3 node type
  • Performance impact of feature: Redshift AQUA vs Redshift WLM
  • Right tool for the job selection: Athena vs Redshift for your workload
  • Registering your custom platform: Redshift vs My Own Database
  • Registering your custom workload: My own dataset vs Redshift
  • Run benchmarking only on my platform
  • Bring your own workload (dataset, ddl and queries to benchmark)

Barometer supports combinations of these platforms and workloads as experiments.

🎒 Prerequisites

  • Docker: Install the Docker service and the Docker CLI. This tool uses Docker to build images and run containers.
  • A minimum of 2 GB of disk space for building and deploying the Docker image (a quick check follows this list).
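A quick sanity check for these prerequisites; a minimal sketch, assuming a Unix-like host:

```shell
# Confirm the Docker CLI is installed and the daemon is reachable
docker --version
docker info

# Confirm at least 2 GB of free disk space in the current directory
df -h .
```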

🚀 Installing

Clone this repository and run `docker build -t barometer .` in the barometer directory (the root of the git project).
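For example, a minimal sketch of the install steps (the image tag barometer is reused by the deployment commands below):

```shell
# Clone the project and build the Docker image from the repository root
git clone https://github.com/awslabs/barometer.git
cd barometer
docker build -t barometer .
```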

🎮 Deployment

  1. Run one of the commands below to deploy Barometer to your AWS account.

```shell
# Example 1: Passing local AWS credentials to the docker container for deployment (deploying in the eu-west-1 region)
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws barometer deploy eu-west-1

# Example 2: Using an AWS profile (ex: dev) to deploy
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws -e AWS_PROFILE=dev barometer deploy eu-west-1

# Example 3: Passing the AWS profile and region as environment variables
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.aws:/root/.aws -e AWS_PROFILE=dev \
    -e AWS_REGION=eu-west-1 barometer deploy

# Example 4: Using an AWS access key id and secret access key to deploy (with an optional session token - temporary credentials)
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
    -e AWS_ACCESS_KEY_ID= \
    -e AWS_SECRET_ACCESS_KEY= \
    -e AWS_SESSION_TOKEN= \
    barometer deploy eu-west-1
```
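Once the deploy command finishes, you can verify the result from the AWS CLI; a minimal sketch, assuming the stack name BenchmarkingStack mentioned in the Cleanup section:

```shell
# Check that the benchmarking stack reached CREATE_COMPLETE
aws cloudformation describe-stacks --stack-name BenchmarkingStack \
    --region eu-west-1 --query 'Stacks[0].StackStatus'
```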

  2. Run one of the commands below to start the cli-wizard once Barometer is successfully deployed to your AWS account.

```shell
# Example 1: Passing local AWS credentials to the docker container for running the wizard (deployed in the eu-west-1 region)
docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws \
    --name barometer-wizard \
    barometer wizard eu-west-1

# Example 2: Using an AWS profile (ex: dev) to run the wizard
docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws -e AWS_PROFILE=dev \
    --name barometer-wizard \
    barometer wizard eu-west-1

# Example 3: Using an AWS access key id and secret access key to run the wizard (with an optional session token - temporary credentials)
docker run -it -v /var/run/docker.sock:/var/run/docker.sock \
    -e AWS_ACCESS_KEY_ID= \
    -e AWS_SECRET_ACCESS_KEY= \
    -e AWS_SESSION_TOKEN= \
    --name barometer-wizard \
    barometer wizard eu-west-1

# Example 4: Reusing wizard configurations
docker start -ia barometer-wizard

# Example 5: Persisting wizard configurations
docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws \
    -v ~/storage:/build/cli-wizard/storage \
    --name barometer-wizard \
    barometer wizard eu-west-1
```

🎬 Quickstart

Run benchmark only

This option covers the "benchmark your own platform" / "bring your own platform" use cases.

You can directly benchmark any database with this option, available under Manage Experiments > Run benchmarking only. Depending on where the database is hosted, follow the steps below as prerequisites.

If the database and Barometer are in the same VPC

  1. Create a new Secrets Manager secret with values in the JSON format defined below. All properties are case-sensitive and required except dbClusterIdentifier.

```json
{
  "username": "database-user",
  "password": "*******",
  "engine": "redshift",
  "host": "my-database-host.my-domain.com",
  "port": 5439,
  "dbClusterIdentifier": "redshift-cluster-1",
  "dbname": "dev"
}
```

  2. Add a tag to the secret with key ManagedBy and value BenchmarkingStack; this grants Barometer permission to use it (a CLI sketch follows).
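A minimal CLI sketch of steps 1 and 2; the secret name my-database-secret and the local file secret.json are hypothetical placeholders:

```shell
# Create the secret from a local file in the JSON format above and tag it
# so Barometer (BenchmarkingStack) is allowed to read it
# ("my-database-secret" and "secret.json" are placeholders)
aws secretsmanager create-secret \
    --name my-database-secret \
    --secret-string file://secret.json \
    --tags Key=ManagedBy,Value=BenchmarkingStack
```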
  3. Upload your benchmarking queries to the DataBucket (the bucket created by BenchmarkingStack, available as an output) in a new folder with any name (for example: my-benchmarking-queries). Note: the queries can have any name and will be executed in sorted order of their names (see the layout and the upload sketch below).

```
s3://benchmarkingstack-databucket-random-id
|  +-- my-benchmarking-queries
|      |  +-- query1.sql
|      |  +-- query2.sql
```
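For example, a minimal sketch of the upload; the bucket name is the placeholder from the layout above (use the DataBucket name from the BenchmarkingStack outputs):

```shell
# Upload the local query folder to the DataBucket, preserving the folder name
aws s3 cp ./my-benchmarking-queries \
    s3://benchmarkingstack-databucket-random-id/my-benchmarking-queries/ --recursive
```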

  4. Allow a network connection from QueryRunnerSG (available as an output of BenchmarkingStack) to your database security group.

If the database and Barometer are not in the same VPC

In addition to steps 1, 2, and 3 above (both in the same VPC), follow the steps below to establish a VPC peering connection between the BenchmarkingVPC and the VPC where the database is hosted (a CLI sketch follows the list).

  1. Go to the VPC console > Peering connections menu in the left navigation
  2. Create a new peering connection, selecting both VPCs (BenchmarkingVPC and DatabaseVPC)
  3. Accept the peering connection request from the Actions menu
  4. Go to VPC > Route tables and select any route table associated with a BenchmarkingStack subnet
  5. Add a new route with Destination = the CIDR range of the DatabaseVPC and Target = the peering connection id (starts with pcx-)
  6. Repeat steps 4 and 5 for the route table associated with the second BenchmarkingStack subnet
  7. Go to VPC > Route tables and select the route table associated with the DatabaseVPC subnet (if using the default VPC, select the only route table available)
  8. Add a new route with Destination = 10.0.0.0/16 and Target = the peering connection id (starts with pcx-)
  9. Follow step 4 from both in the same VPC above: allow a network connection from QueryRunnerSG to your database security group
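A minimal CLI sketch of the peering steps above; all ids and the DatabaseVPC CIDR (172.31.0.0/16) are hypothetical placeholders:

```shell
# Steps 1-3: create and accept the peering connection between the two VPCs
aws ec2 create-vpc-peering-connection \
    --vpc-id vpc-0benchmarking0 --peer-vpc-id vpc-0database0
aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-0example0

# Steps 4-6: route BenchmarkingStack subnets to the DatabaseVPC CIDR
# (repeat for each BenchmarkingStack route table)
aws ec2 create-route --route-table-id rtb-0benchmarking0 \
    --destination-cidr-block 172.31.0.0/16 \
    --vpc-peering-connection-id pcx-0example0

# Steps 7-8: route the DatabaseVPC back to the BenchmarkingVPC (10.0.0.0/16)
aws ec2 create-route --route-table-id rtb-0database0 \
    --destination-cidr-block 10.0.0.0/16 \
    --vpc-peering-connection-id pcx-0example0
```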

Bring your own workload (BYOW)

You can bring your own workload for benchmarking to Barometer. In this context, a workload is defined as files arranged in a specific structure in your S3 bucket. To bring your own workload, follow the steps below as prerequisites.

  1. Prepare the workload in your S3 bucket. It should follow the folder structure defined below. You can create a folder with the name of your workload (ex: my-workload) at any level in your S3 bucket. The root of your workload folder should have three sub-directories called volumes, ddl and benchmarking-queries.
    1. volumes sub-directory: contains the scale factors for your workload. For example, your workload may have datasets available at 1gb, 50gb and 1tb scales. You can create as many scale factors as you want, with a minimum of one. Within each scale-factor sub-directory there should be a directory matching each table name, with all table data in .parquet format under it.
    2. ddl sub-directory: contains the ddl-scripts to create tables for the platform in question. For example, ddl-scripts for the redshift platform should go under a redshift folder, and ddl specific to mysql should be placed under its own directory matching the platform name. You can place more than one ddl script; they will be executed in order of their names.
    3. benchmarking-queries sub-directory: contains the benchmarking queries for the platform in question. You can place more than one benchmarking-query file; they will be executed in order of their names per user session.

```

# my-workload (can be any name) must follow this convention on the s3 bucket

my-workload
|  +-- volumes
|      |  +-- 1gb
|      |      |  +-- table_name_1
|      |      |      |  +-- file-1.parquet
|      |      |      |  +-- file-2.parquet
|      |      |  +-- table_name_2
|      |      |      |  +-- file-1.parquet
|      |      |      |  +-- file-2.parquet
|  +-- ddl
|      |  +-- redshift
|      |      |  +-- ddl.query1.sql
|      |      |  +-- ddl.query2.sql
|      |  +-- mysql
|      |      |  +-- ddl.query.sql
|  +-- benchmarking-queries
|      |  +-- redshift
|      |      |  +-- query1.sql
|      |      |  +-- query2.sql
|      |  +-- mysql
|      |      |  +-- query1.sql
|      |      |  +-- query2.sql

```

  2. Run the cli-wizard and go to Manage workload > Add new workload to import your workload. The wizard validates the structure and imports the workload if validation succeeds.
  3. The wizard prints a bucket policy while importing your workload. Update your S3 bucket's bucket policy with the printed one (a CLI sketch follows).
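If you prefer the CLI to the console for the bucket policy update, a minimal sketch, assuming you saved the policy printed by the wizard to a local file policy.json (my-bucket is a placeholder):

```shell
# Apply the bucket policy printed by the wizard to your workload bucket
aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json
```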

BYOW sample

In this project, you can find a BYOW example (the custom-workload directory). You can create the same structure as described above by copying these 3 directories (SQL and DDL statements, and the dataset) to your S3 bucket. After this, you can run this workload using the Barometer cli-wizard, configuring it as a "BYOW from S3" workload.
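For example, a minimal sketch that copies the sample to your own bucket (my-bucket is a placeholder):

```shell
# Copy the bundled BYOW sample, preserving the volumes/ddl/benchmarking-queries layout
aws s3 sync ./custom-workload s3://my-bucket/custom-workload/
```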

custom-workload/benchmarking-queries/redshift

  • Contains 5 OLAP-like SQL queries (.sql files).
  • Disables the Redshift query results cache.
  • Tags the sessions for better monitoring.

custom-workload/ddl/redshift

  • Creates three tables: one fact table and two dimensions.
  • Doesn't specify any distribution styles or sort keys; Redshift creates these automatically based on the workload. You're free to change them to analyze their query plans and performance.

custom-workload/volumes/small

  • A small (less than 30 MB) dataset containing the data for the 3 tables above, in Apache Parquet format.

Architecture

User flow

  1. The user deploys the Barometer benchmarking stack
  2. The benchmarking stack creates the infrastructure & Step Functions workflows
  3. The user uses the cli-wizard to define & run experiments, which internally triggers the experiment runner workflow
  4. The workflow deploys, benchmarks & destroys the platform (an additional CloudFormation stack that deploys the service, e.g. a Redshift cluster)
  5. The workflow creates a persistent dashboard registering metrics
  6. The user uses this dashboard to compare benchmarking results

Detailed architecture for Redshift platform

Cleanup

  1. To clean up any platform, delete the stack whose name starts with the platform name. Example: redshift-xyz
  2. Go to the CloudFormation service and delete the stack named BenchmarkingStack (or run cdk destroy from the cdk-stack folder); a CLI sketch follows
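A minimal CLI sketch of both cleanup steps; redshift-xyz is the example platform stack name from step 1:

```shell
# Delete a platform stack left behind by an experiment
aws cloudformation delete-stack --stack-name redshift-xyz

# Delete the main benchmarking stack
# (alternative: run "cdk destroy" from the cdk-stack folder)
aws cloudformation delete-stack --stack-name BenchmarkingStack
```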


Owner

  • Name: Amazon Web Services - Labs
  • Login: awslabs
  • Kind: organization
  • Location: Seattle, WA



Issues and Pull Requests

Last synced: almost 2 years ago

All Time
  • Total issues: 3
  • Total pull requests: 20
  • Average time to close issues: N/A
  • Average time to close pull requests: 10 days
  • Total issue authors: 1
  • Total pull request authors: 3
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 18
Past Year
  • Issues: 0
  • Pull requests: 12
  • Average time to close issues: N/A
  • Average time to close pull requests: 7 days
  • Issue authors: 0
  • Pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 10
Top Authors
Issue Authors
  • anandshah123 (3)
Pull Request Authors
  • dependabot[bot] (17)
  • anandshah123 (1)
  • badogan (1)
Top Labels
Pull Request Labels
  • dependencies (17)
  • javascript (15)

Dependencies

source/cdk-stack/common-functions/jdbc-query-runner/pom.xml maven
  • com.amazonaws:aws-java-sdk-bom 1.12.178 import
  • com.amazon.redshift:redshift-jdbc42 2.1.0.5 provided
  • com.amazonaws.secretsmanager:aws-secretsmanager-jdbc 1.0.7
  • com.amazonaws:aws-java-sdk-cloudwatchmetrics
  • com.amazonaws:aws-java-sdk-s3
  • com.amazonaws:aws-lambda-java-core 1.2.1
  • com.amazonaws:aws-lambda-java-log4j2 1.5.1
  • com.google.code.gson:gson 2.9.0
source/cdk-stack/package-lock.json npm
  • 441 dependencies
source/cdk-stack/package.json npm
  • @aws-cdk/assert 1.151.0 development
  • @types/adm-zip ^0.4.34 development
  • @types/jest ^27.4.1 development
  • @types/node ^17.0.23 development
  • aws-cdk ^1.151.0 development
  • jest ^27.3.1 development
  • ts-jest ^27.1.4 development
  • ts-node ^10.7.0 development
  • typescript ~4.6.3 development
  • @aws-cdk/aws-dynamodb 1.151.0
  • @aws-cdk/aws-ec2 1.151.0
  • @aws-cdk/aws-ecs 1.151.0
  • @aws-cdk/aws-iam 1.151.0
  • @aws-cdk/aws-kms 1.151.0
  • @aws-cdk/aws-lambda 1.151.0
  • @aws-cdk/aws-s3 1.151.0
  • @aws-cdk/aws-sns 1.151.0
  • @aws-cdk/aws-sns-subscriptions 1.151.0
  • @aws-cdk/aws-stepfunctions 1.151.0
  • @aws-cdk/aws-stepfunctions-tasks 1.151.0
  • @aws-cdk/core 1.151.0
  • source-map-support ^0.5.21
source/cli-wizard/package-lock.json npm
  • 636 dependencies
source/cli-wizard/package.json npm
  • @aws-cdk/assert 1.150.0 development
  • @testing-library/jest-dom ^5.14.1 development
  • @testing-library/react ^12.1.0 development
  • @testing-library/user-event ^13.2.1 development
  • @types/inquirer ^8.2.0 development
  • @types/jest ^27.0.1 development
  • @types/node ^17.0.23 development
  • @typescript-eslint/eslint-plugin ^5.17.0 development
  • @typescript-eslint/parser ^5.17.0 development
  • aws-cdk ^1.150.0 development
  • esbuild ^0.12.28 development
  • eslint ^7.32.0 development
  • jest ^27.2.0 development
  • ts-jest ^27.0.5 development
  • ts-node ^10.2.1 development
  • typescript ~4.6.3 development
  • @aws-cdk/aws-athena ^1.150.0
  • @aws-cdk/aws-redshift ^1.150.0
  • @aws-cdk/core 1.150.0
  • @aws-sdk/client-cloudformation ^3.58.0
  • @aws-sdk/client-lambda ^3.58.0
  • @aws-sdk/client-s3 ^3.58.0
  • @aws-sdk/client-sfn ^3.58.0
  • inquirer ^8.2.2
  • joi ^17.4.2
  • open ^8.4.0
  • source-map-support ^0.5.21
  • uuid ^8.3.2
Dockerfile docker
  • alpine 3.15 build
  • maven 3-openjdk-8 build
  • node 17-alpine build
source/cdk-stack/common-functions/jdbc-query-runner/Dockerfile docker
  • public.ecr.aws/lambda/java 8.al2 build
source/cdk-stack/common-functions/costexplorer-integration/requirements.txt pypi
source/cdk-stack/common-functions/create-destory-platform/requirements.txt pypi
source/cdk-stack/common-functions/dashboard-builder/requirements.txt pypi
source/cdk-stack/common-functions/platform-lambda-proxy/requirements.txt pypi
source/cdk-stack/common-functions/stepfn-helpers/requirements.txt pypi