https://github.com/awslabs/barometer

A tool to automate analytic platform evaluations. Barometer helps customers gather the data points needed for service selection and service configuration for a given workload.


Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary

Keywords

aws, benchmarking, databases
Last synced: 5 months ago

Repository

A tool to automate analytic platform evaluations. Barometer helps customers gather the data points needed for service selection and service configuration for a given workload.

Basic Info
  • Host: GitHub
  • Owner: awslabs
  • License: apache-2.0
  • Language: TypeScript
  • Default Branch: main
  • Size: 51 MB
Statistics
  • Stars: 19
  • Watchers: 4
  • Forks: 2
  • Open Issues: 9
  • Releases: 0
Topics
aws, benchmarking, databases
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme · Changelog · Contributing · License · Code of conduct

README.md

Barometer

A tool to automate analytic platform evaluations

Barometer helps customers gather the data points needed for service selection and service configuration for a given workload. The Barometer tool was created by the AWS Prototyping team (EMEA).


🔰 Description

Barometer deploys a CDK stack that is used to run benchmarking experiments. An experiment is a combination of a platform and a workload, both of which can be defined using the cli-wizard provided by the Barometer tool. See the Quickstart for an example of running an experiment.

🛠 Use cases

  • Comparison of service performance: Redshift vs Redshift Serverless
  • Comparison of configurations: Redshift dc2 vs ra3 node type
  • Performance impact of feature: Redshift AQUA vs Redshift WLM
  • Right tool for the job selection: Athena vs Redshift for your workload
  • Registering your custom platform: Redshift vs My Own Database
  • Registering your custom workload: My own dataset vs Redshift
  • Run benchmarking only on my platform
  • Bring your own workload (dataset, ddl and queries to benchmark)

Barometer supports combinations of these platforms and workloads as experiments.

🎒 Prerequisites

  • Docker: Install the Docker service and the Docker CLI. This tool uses Docker to build images and run containers.
  • A minimum of 2 GB of disk space for building and deploying the Docker image (a quick check follows this list).
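A quick sanity check for these prerequisites; a minimal sketch, assuming a Unix-like host:

```shell
# Confirm the Docker CLI is installed and the daemon is reachable
docker --version
docker info

# Confirm at least 2 GB of free disk space in the current directory
df -h .
```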

🚀 Installing

Clone this repository and run `docker build -t barometer .` in the barometer directory (the root of the git project).
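For example, a minimal sketch of the install steps (the image tag barometer is reused by the deployment commands below):

```shell
# Clone the project and build the Docker image from the repository root
git clone https://github.com/awslabs/barometer.git
cd barometer
docker build -t barometer .
```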

🎮 Deployment

  1. Run one of the commands below to deploy Barometer to your AWS account.

```shell
# Example 1: Passing local AWS credentials to the docker container for deployment (deploying in the eu-west-1 region)
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws barometer deploy eu-west-1

# Example 2: Using an AWS profile (ex: dev) to deploy
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws -e AWS_PROFILE=dev barometer deploy eu-west-1

# Example 3: Passing the AWS profile and region as environment variables
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.aws:/root/.aws -e AWS_PROFILE=dev \
    -e AWS_REGION=eu-west-1 barometer deploy

# Example 4: Using an AWS access key id and secret access key to deploy (with an optional session token - temporary credentials)
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
    -e AWS_ACCESS_KEY_ID= \
    -e AWS_SECRET_ACCESS_KEY= \
    -e AWS_SESSION_TOKEN= \
    barometer deploy eu-west-1
```
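Once the deploy command finishes, you can verify the result from the AWS CLI; a minimal sketch, assuming the stack name BenchmarkingStack mentioned in the Cleanup section:

```shell
# Check that the benchmarking stack reached CREATE_COMPLETE
aws cloudformation describe-stacks --stack-name BenchmarkingStack \
    --region eu-west-1 --query 'Stacks[0].StackStatus'
```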

  2. Run one of the commands below to start the cli-wizard once Barometer is successfully deployed to your AWS account.

```shell
# Example 1: Passing local AWS credentials to the docker container for running the wizard (deployed in the eu-west-1 region)
docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws \
    --name barometer-wizard \
    barometer wizard eu-west-1

# Example 2: Using an AWS profile (ex: dev) to run the wizard
docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws -e AWS_PROFILE=dev \
    --name barometer-wizard \
    barometer wizard eu-west-1

# Example 3: Using an AWS access key id and secret access key to run the wizard (with an optional session token - temporary credentials)
docker run -it -v /var/run/docker.sock:/var/run/docker.sock \
    -e AWS_ACCESS_KEY_ID= \
    -e AWS_SECRET_ACCESS_KEY= \
    -e AWS_SESSION_TOKEN= \
    --name barometer-wizard \
    barometer wizard eu-west-1

# Example 4: Reusing wizard configurations
docker start -ia barometer-wizard

# Example 5: Persisting wizard configurations
docker run -it -v /var/run/docker.sock:/var/run/docker.sock -v ~/.aws:/root/.aws \
    -v ~/storage:/build/cli-wizard/storage \
    --name barometer-wizard \
    barometer wizard eu-west-1
```

🎬 Quickstart

Run benchmark only

This option covers the "benchmark your own platform" / "bring your own platform" use cases.

You can directly benchmark any database with this option, available under Manage Experiments > Run benchmarking only. Depending on where the database is hosted, follow the steps below as prerequisites.

If the database and Barometer are in the same VPC

  1. Create a new Secrets Manager secret with values in the JSON format defined below. All properties are case-sensitive and required except dbClusterIdentifier.

```json
{
  "username": "database-user",
  "password": "*******",
  "engine": "redshift",
  "host": "my-database-host.my-domain.com",
  "port": 5439,
  "dbClusterIdentifier": "redshift-cluster-1",
  "dbname": "dev"
}
```

  2. Add a tag to the secret with key ManagedBy and value BenchmarkingStack; this grants Barometer permission to use it (a CLI sketch follows).
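A minimal CLI sketch of steps 1 and 2; the secret name my-database-secret and the local file secret.json are hypothetical placeholders:

```shell
# Create the secret from a local file in the JSON format above and tag it
# so Barometer (BenchmarkingStack) is allowed to read it
# ("my-database-secret" and "secret.json" are placeholders)
aws secretsmanager create-secret \
    --name my-database-secret \
    --secret-string file://secret.json \
    --tags Key=ManagedBy,Value=BenchmarkingStack
```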
  3. Upload your benchmarking queries to the DataBucket (the bucket created by BenchmarkingStack, available as an output) in a new folder with any name (for example: my-benchmarking-queries). Note: the queries can have any name and will be executed in sorted order of their names (see the layout and the upload sketch below).

```
s3://benchmarkingstack-databucket-random-id
|  +-- my-benchmarking-queries
|      |  +-- query1.sql
|      |  +-- query2.sql
```
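For example, a minimal sketch of the upload; the bucket name is the placeholder from the layout above (use the DataBucket name from the BenchmarkingStack outputs):

```shell
# Upload the local query folder to the DataBucket, preserving the folder name
aws s3 cp ./my-benchmarking-queries \
    s3://benchmarkingstack-databucket-random-id/my-benchmarking-queries/ --recursive
```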

  4. Allow a network connection from QueryRunnerSG (available as an output of BenchmarkingStack) to your database security group.

If the database and Barometer are not in the same VPC

In addition to steps 1, 2, and 3 above (both in the same VPC), follow the steps below to establish a VPC peering connection between the BenchmarkingVPC and the VPC where the database is hosted (a CLI sketch follows the list).

  1. Go to the VPC console > Peering connections menu in the left navigation
  2. Create a new peering connection, selecting both VPCs (BenchmarkingVPC and DatabaseVPC)
  3. Accept the peering connection request from the Actions menu
  4. Go to VPC > Route tables and select any route table associated with a BenchmarkingStack subnet
  5. Add a new route with Destination = the CIDR range of the DatabaseVPC and Target = the peering connection id (starts with pcx-)
  6. Repeat steps 4 and 5 for the route table associated with the second BenchmarkingStack subnet
  7. Go to VPC > Route tables and select the route table associated with the DatabaseVPC subnet (if using the default VPC, select the only route table available)
  8. Add a new route with Destination = 10.0.0.0/16 and Target = the peering connection id (starts with pcx-)
  9. Follow step 4 from both in the same VPC above: allow a network connection from QueryRunnerSG to your database security group
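A minimal CLI sketch of the peering steps above; all ids and the DatabaseVPC CIDR (172.31.0.0/16) are hypothetical placeholders:

```shell
# Steps 1-3: create and accept the peering connection between the two VPCs
aws ec2 create-vpc-peering-connection \
    --vpc-id vpc-0benchmarking0 --peer-vpc-id vpc-0database0
aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id pcx-0example0

# Steps 4-6: route BenchmarkingStack subnets to the DatabaseVPC CIDR
# (repeat for each BenchmarkingStack route table)
aws ec2 create-route --route-table-id rtb-0benchmarking0 \
    --destination-cidr-block 172.31.0.0/16 \
    --vpc-peering-connection-id pcx-0example0

# Steps 7-8: route the DatabaseVPC back to the BenchmarkingVPC (10.0.0.0/16)
aws ec2 create-route --route-table-id rtb-0database0 \
    --destination-cidr-block 10.0.0.0/16 \
    --vpc-peering-connection-id pcx-0example0
```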

Bring your own workload (BYOW)

You can bring your own workload for benchmarking to Barometer. In this context, a workload is defined as files arranged in a specific structure in your S3 bucket. To bring your own workload, follow the steps below as prerequisites.

  1. Prepare the workload in your S3 bucket. It should follow the folder structure defined below. You can create a folder with the name of your workload (ex: my-workload) at any level in your S3 bucket. The root of your workload folder should have three sub-directories called volumes, ddl and benchmarking-queries.
    1. volumes sub-directory: contains the scale factors for your workload. For example, your workload may have datasets available at 1gb, 50gb and 1tb scales. You can create as many scale factors as you want, with a minimum of one. Within each scale-factor sub-directory there should be a directory matching each table name, with all table data in .parquet format under it.
    2. ddl sub-directory: contains the ddl-scripts to create tables for the platform in question. For example, ddl-scripts for the redshift platform should go under a redshift folder, and ddl specific to mysql should be placed under its own directory matching the platform name. You can place more than one ddl script; they will be executed in order of their names.
    3. benchmarking-queries sub-directory: contains the benchmarking queries for the platform in question. You can place more than one benchmarking-query file; they will be executed in order of their names per user session.

```

# my-workload (can be any name) must follow this convention on the s3 bucket

my-workload
|  +-- volumes
|      |  +-- 1gb
|      |      |  +-- table_name_1
|      |      |      |  +-- file-1.parquet
|      |      |      |  +-- file-2.parquet
|      |      |  +-- table_name_2
|      |      |      |  +-- file-1.parquet
|      |      |      |  +-- file-2.parquet
|  +-- ddl
|      |  +-- redshift
|      |      |  +-- ddl.query1.sql
|      |      |  +-- ddl.query2.sql
|      |  +-- mysql
|      |      |  +-- ddl.query.sql
|  +-- benchmarking-queries
|      |  +-- redshift
|      |      |  +-- query1.sql
|      |      |  +-- query2.sql
|      |  +-- mysql
|      |      |  +-- query1.sql
|      |      |  +-- query2.sql

```

  2. Run the cli-wizard and go to Manage workload > Add new workload to import your workload. The wizard validates the structure and imports the workload if validation succeeds.
  3. The wizard prints a bucket policy while importing your workload. Update your S3 bucket's bucket policy with the printed one (a CLI sketch follows).
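If you prefer the CLI to the console for the bucket policy update, a minimal sketch, assuming you saved the policy printed by the wizard to a local file policy.json (my-bucket is a placeholder):

```shell
# Apply the bucket policy printed by the wizard to your workload bucket
aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json
```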

BYOW sample

In this project, you can find a BYOW example (the custom-workload directory). You can create the same structure as described above by copying these 3 directories (SQL and DDL statements, and the dataset) to your S3 bucket. After this, you can run this workload using the Barometer cli-wizard, configuring it as a "BYOW from S3" workload.
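For example, a minimal sketch that copies the sample to your own bucket (my-bucket is a placeholder):

```shell
# Copy the bundled BYOW sample, preserving the volumes/ddl/benchmarking-queries layout
aws s3 sync ./custom-workload s3://my-bucket/custom-workload/
```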

custom-workload/benchmarking-queries/redshift

  • Contains 5 OLAP-like SQL queries (.sql files).
  • Disables the Redshift query results cache.
  • Tags the sessions for better monitoring.

custom-workload/ddl/redshift

  • Creates three tables: one fact table and two dimensions.
  • Doesn't specify any distribution styles or sort keys; Redshift creates these automatically based on the workload. You're free to change them to analyze their query plans and performance.

custom-workload/volumes/small

  • A small (less than 30 MB) dataset containing the data for the 3 tables above, in Apache Parquet format.

Architecture

User flow

  1. The user deploys the Barometer benchmarking stack
  2. The benchmarking stack creates the infrastructure & Step Functions workflows
  3. The user uses the cli-wizard to define & run experiments, which internally triggers the experiment runner workflow
  4. The workflow deploys, benchmarks & destroys the platform (an additional CloudFormation stack that deploys the service, e.g. a Redshift cluster)
  5. The workflow creates a persistent dashboard registering metrics
  6. The user uses this dashboard to compare benchmarking results

Detailed architecture for Redshift platform

Cleanup

  1. To clean up any platform, delete the stack whose name starts with the platform name. Example: redshift-xyz
  2. Go to the CloudFormation service and delete the stack named BenchmarkingStack (or run cdk destroy from the cdk-stack folder); a CLI sketch follows
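A minimal CLI sketch of both cleanup steps; redshift-xyz is the example platform stack name from step 1:

```shell
# Delete a platform stack left behind by an experiment
aws cloudformation delete-stack --stack-name redshift-xyz

# Delete the main benchmarking stack
# (alternative: run "cdk destroy" from the cdk-stack folder)
aws cloudformation delete-stack --stack-name BenchmarkingStack
```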


Owner

  • Name: Amazon Web Services - Labs
  • Login: awslabs
  • Kind: organization
  • Location: Seattle, WA



Issues and Pull Requests

Last synced: almost 2 years ago

All Time
  • Total issues: 3
  • Total pull requests: 20
  • Average time to close issues: N/A
  • Average time to close pull requests: 10 days
  • Total issue authors: 1
  • Total pull request authors: 3
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 18
Past Year
  • Issues: 0
  • Pull requests: 12
  • Average time to close issues: N/A
  • Average time to close pull requests: 7 days
  • Issue authors: 0
  • Pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 10
Top Authors
Issue Authors
  • anandshah123 (3)
Pull Request Authors
  • dependabot[bot] (17)
  • anandshah123 (1)
  • badogan (1)
Top Labels
Pull Request Labels
  • dependencies (17)
  • javascript (15)

Dependencies

source/cdk-stack/common-functions/jdbc-query-runner/pom.xml maven
  • com.amazonaws:aws-java-sdk-bom 1.12.178 import
  • com.amazon.redshift:redshift-jdbc42 2.1.0.5 provided
  • com.amazonaws.secretsmanager:aws-secretsmanager-jdbc 1.0.7
  • com.amazonaws:aws-java-sdk-cloudwatchmetrics
  • com.amazonaws:aws-java-sdk-s3
  • com.amazonaws:aws-lambda-java-core 1.2.1
  • com.amazonaws:aws-lambda-java-log4j2 1.5.1
  • com.google.code.gson:gson 2.9.0
source/cdk-stack/package-lock.json npm
  • 441 dependencies
source/cdk-stack/package.json npm
  • @aws-cdk/assert 1.151.0 development
  • @types/adm-zip ^0.4.34 development
  • @types/jest ^27.4.1 development
  • @types/node ^17.0.23 development
  • aws-cdk ^1.151.0 development
  • jest ^27.3.1 development
  • ts-jest ^27.1.4 development
  • ts-node ^10.7.0 development
  • typescript ~4.6.3 development
  • @aws-cdk/aws-dynamodb 1.151.0
  • @aws-cdk/aws-ec2 1.151.0
  • @aws-cdk/aws-ecs 1.151.0
  • @aws-cdk/aws-iam 1.151.0
  • @aws-cdk/aws-kms 1.151.0
  • @aws-cdk/aws-lambda 1.151.0
  • @aws-cdk/aws-s3 1.151.0
  • @aws-cdk/aws-sns 1.151.0
  • @aws-cdk/aws-sns-subscriptions 1.151.0
  • @aws-cdk/aws-stepfunctions 1.151.0
  • @aws-cdk/aws-stepfunctions-tasks 1.151.0
  • @aws-cdk/core 1.151.0
  • source-map-support ^0.5.21
source/cli-wizard/package-lock.json npm
  • 636 dependencies
source/cli-wizard/package.json npm
  • @aws-cdk/assert 1.150.0 development
  • @testing-library/jest-dom ^5.14.1 development
  • @testing-library/react ^12.1.0 development
  • @testing-library/user-event ^13.2.1 development
  • @types/inquirer ^8.2.0 development
  • @types/jest ^27.0.1 development
  • @types/node ^17.0.23 development
  • @typescript-eslint/eslint-plugin ^5.17.0 development
  • @typescript-eslint/parser ^5.17.0 development
  • aws-cdk ^1.150.0 development
  • esbuild ^0.12.28 development
  • eslint ^7.32.0 development
  • jest ^27.2.0 development
  • ts-jest ^27.0.5 development
  • ts-node ^10.2.1 development
  • typescript ~4.6.3 development
  • @aws-cdk/aws-athena ^1.150.0
  • @aws-cdk/aws-redshift ^1.150.0
  • @aws-cdk/core 1.150.0
  • @aws-sdk/client-cloudformation ^3.58.0
  • @aws-sdk/client-lambda ^3.58.0
  • @aws-sdk/client-s3 ^3.58.0
  • @aws-sdk/client-sfn ^3.58.0
  • inquirer ^8.2.2
  • joi ^17.4.2
  • open ^8.4.0
  • source-map-support ^0.5.21
  • uuid ^8.3.2
Dockerfile docker
  • alpine 3.15 build
  • maven 3-openjdk-8 build
  • node 17-alpine build
source/cdk-stack/common-functions/jdbc-query-runner/Dockerfile docker
  • public.ecr.aws/lambda/java 8.al2 build
source/cdk-stack/common-functions/costexplorer-integration/requirements.txt pypi
source/cdk-stack/common-functions/create-destory-platform/requirements.txt pypi
source/cdk-stack/common-functions/dashboard-builder/requirements.txt pypi
source/cdk-stack/common-functions/platform-lambda-proxy/requirements.txt pypi
source/cdk-stack/common-functions/stepfn-helpers/requirements.txt pypi