https://github.com/awslabs/aws-crt-s3-benchmarks

Benchmarking for multiple AWS S3 libraries.


Keywords

aws s3

Last synced: 6 months ago

Repository

Benchmarking for multiple AWS S3 libraries.

Basic Info

Host: GitHub
Owner: awslabs
License: apache-2.0
Language: Python
Default Branch: main
Homepage:
Size: 1.06 MB

Statistics

Stars: 12
Watchers: 12
Forks: 6
Open Issues: 4
Releases: 0

Topics

aws s3

Created over 2 years ago · Last pushed 6 months ago

Metadata Files

Readme Contributing License Code of conduct Notice

aws-crt-s3-benchmarks

This project is for benchmarking different S3 workloads using various languages and S3 clients.

This project is under active development and subject to change.

Running Benchmarks

Requirements

To start:
- Python 3.9+ with pip

On Amazon Linux 2023, a script is provided to install further tools. Otherwise, depending on the language you want to benchmark, you'll need:
- CMake 3.22+
- C99 / C++20 compiler (e.g. gcc, clang)
- JDK 17+ (e.g. corretto, openjdk)
- Maven
- Python C extension headers and libraries (e.g. python3-devel)
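If you are not on Amazon Linux 2023, a quick pre-flight check can save a failed build later. The sketch below, in Python like the repo's own scripts, checks the interpreter version and looks for tools on PATH. The executable names in REQUIRED_TOOLS are assumptions (your compiler may be clang, and you only need the tools for the language you benchmark).

```python
import shutil
import sys

# Executable names for the tools listed above. These names are assumptions:
# your compiler may be "clang" instead of "gcc", and you only need the tools
# for the language you plan to benchmark.
REQUIRED_TOOLS = ["cmake", "gcc", "java", "mvn"]


def python_is_new_enough(minimum=(3, 9)):
    """Return True if the running interpreter is at least `minimum`."""
    return tuple(sys.version_info[:2]) >= minimum


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python 3.9+:", python_is_new_enough())
    print("Missing tools:", missing_tools(REQUIRED_TOOLS) or "none")
```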

To benchmark ALL the workloads, your machine needs 300+ GiB of disk space available, and fast enough internet to upload a terabyte to S3 within your lifetime. But if you're only running 1 workload, you'll upload fewer files and use less disk space.
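To check the disk-space guideline before preparing every workload, something like the following works. It is a minimal sketch using shutil.disk_usage; the 300 GiB default simply mirrors the figure above, and the function names are hypothetical.

```python
import shutil

GIB = 1024 ** 3


def free_gib(path="."):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room_for_all_workloads(path=".", required_gib=300):
    """Compare free space at `path` against the 300+ GiB guideline."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    print(f"{free_gib():.1f} GiB free; enough for ALL workloads: "
          f"{has_room_for_all_workloads()}")
```

Point `path` at the directory you intend to use as FILES_DIR, since that is where the upload files are created.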

Your machine must have AWS credentials, with permission to read and write to an S3 bucket.
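One way to confirm those permissions up front is a small round trip: write a tiny object, read it back, and delete it. This sketch assumes boto3 (already listed in scripts/requirements.txt); the access-check/ key prefix is hypothetical, and the function accepts any client object with the same put_object/get_object/delete_object methods.

```python
import uuid


def check_bucket_access(s3, bucket):
    """Write, read back, and delete a small object to prove read/write access.

    `s3` is expected to be a boto3 S3 client (boto3.client("s3")).
    Returns True if the object read back matches what was written.
    """
    key = f"access-check/{uuid.uuid4()}"  # hypothetical, unique per call
    body = b"hello"
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    try:
        data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    finally:
        # Always clean up, even if the read fails.
        s3.delete_object(Bucket=bucket, Key=key)
    return data == body


if __name__ == "__main__":
    import sys

    import boto3

    bucket_name, region = sys.argv[1], sys.argv[2]
    client = boto3.client("s3", region_name=region)
    print("read/write OK:", check_bucket_access(client, bucket_name))
```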

Get Started

First, clone this repo.

Then install the requirements listed above. On Amazon Linux 2023, you can simply run this script: `./aws-crt-s3-benchmarks/scripts/install-tools-AL2023.py`

Then, install packages needed by the python scripts: `python3 -m pip install -r aws-crt-s3-benchmarks/scripts/requirements.txt`

Prepare S3 Files

Next, run scripts/prep-s3-files.py. This script creates and configures an S3 bucket, puts files in S3 for benchmarks to download, and creates files on disk for benchmarks to upload:

```sh
usage: prep-s3-files.py [-h] --bucket BUCKET --region REGION --files-dir FILES_DIR
                        [--workloads WORKLOADS [WORKLOADS ...]]

Create files (on disk, and in S3 bucket) needed to run the benchmarks

optional arguments:
  -h, --help            show this help message and exit
  --bucket BUCKET       S3 bucket (will be created if necessary)
  --region REGION       AWS region (e.g. us-west-2)
  --files-dir FILES_DIR
                        Root directory for files to upload and download (e.g. ~/files)
  --workloads WORKLOADS [WORKLOADS ...]
                        Path to specific workload.run.json file. If not specified,
                        everything in workloads/ is prepared (uploading 100+ GiB to S3
                        and creating 100+ GiB on disk).
```
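If you drive the setup from another script, the documented flags map directly onto an argument list. This is a sketch with a hypothetical helper, prep_s3_files_argv; the default script path assumes you run it from the directory you cloned into, matching the paths used in Get Started. Note that subprocess does not expand `~`, so the example expands it explicitly.

```python
import os
import subprocess
import sys


def prep_s3_files_argv(bucket, region, files_dir, workloads=None,
                       script="aws-crt-s3-benchmarks/scripts/prep-s3-files.py"):
    """Build the argument list for prep-s3-files.py from its documented flags."""
    argv = [sys.executable, script,
            "--bucket", bucket,
            "--region", region,
            "--files-dir", files_dir]
    if workloads:
        # --workloads accepts one or more paths to *.run.json files.
        argv += ["--workloads", *workloads]
    return argv


if __name__ == "__main__":
    cmd = prep_s3_files_argv("my-test-bucket", "us-west-2",
                             os.path.expanduser("~/files"))
    subprocess.run(cmd, check=True)
```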

This script can be run repeatedly. It skips unnecessary work (e.g. won't upload a file that already exists).
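The "skip unnecessary work" idea can be illustrated for the on-disk half of the job: only (re)write a file when one of the expected size isn't already there. This is a hypothetical illustration of the pattern, not the script's actual logic; comparing sizes is this sketch's own choice of "already done" check, and it fills files with random bytes in 1 MiB chunks.

```python
import os

CHUNK = 1024 * 1024  # write in 1 MiB chunks to keep memory use flat


def ensure_file(path, size_bytes):
    """Create `path` with `size_bytes` of random data, unless a file of exactly
    that size already exists. Returns True if the file was (re)written."""
    if os.path.isfile(path) and os.path.getsize(path) == size_bytes:
        return False  # already prepared: skip the work
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n
    return True
```

Running it twice with the same arguments writes the file once and then returns False, which is what makes repeated runs cheap.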

S3 Clients

Here are the IDs used for various S3 Clients, and the runner you must build to benchmark them:

| S3_CLIENT | Actual S3 Client Used | Language | Benchmark Runner |
|-----------|-----------------------|----------|------------------|
| crt-c | aws-c-s3 | c | runners/s3-benchrunner-c |
| crt-python | aws-crt-python | python | runners/s3-benchrunner-python |
| boto3-crt | boto3 using CRT | python | runners/s3-benchrunner-python |
| boto3-classic | boto3 with pure-python transfer manager | python | runners/s3-benchrunner-python |
| cli-crt | AWS CLI v2 using CRT | python | runners/s3-benchrunner-python |
| cli-classic | AWS CLI v2 with pure-python transfer manager | python | runners/s3-benchrunner-python |
| crt-java | aws-crt-java | java | runners/s3-benchrunner-java |
| sdk-java-client-crt | aws-sdk-java-v2 with CRT based S3AsyncClient | java | runners/s3-benchrunner-java |
| sdk-java-client-classic | aws-sdk-java-v2 with pure-java S3AsyncClient | java | runners/s3-benchrunner-java |
| sdk-java-tm-crt | aws-sdk-java-v2 with CRT based S3TransferManager | java | runners/s3-benchrunner-java |
| sdk-java-tm-classic | aws-sdk-java-v2 with pure-java S3TransferManager | java | runners/s3-benchrunner-java |
| sdk-cpp-client-crt | aws-sdk-cpp with S3CrtClient | cpp | runners/s3-benchrunner-cpp |
| sdk-cpp-client-classic | aws-sdk-cpp with (non-CRT) S3Client | cpp | runners/s3-benchrunner-cpp |
| sdk-cpp-tm-classic | aws-sdk-cpp with (non-CRT) TransferManager | cpp | runners/s3-benchrunner-cpp |
| sdk-rust-tm | aws-s3-transfer-manager-rs | rust | runners/s3-benchrunner-rust |
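When scripting benchmarks across several clients, the table can be encoded as a lookup so you know which runner to build for a given S3_CLIENT. The data below is transcribed from the table; the names RUNNER_CLIENTS and runner_for are hypothetical.

```python
# S3_CLIENT IDs from the table, grouped by the runner that benchmarks them.
RUNNER_CLIENTS = {
    "runners/s3-benchrunner-c": ["crt-c"],
    "runners/s3-benchrunner-python": [
        "crt-python", "boto3-crt", "boto3-classic", "cli-crt", "cli-classic"],
    "runners/s3-benchrunner-java": [
        "crt-java", "sdk-java-client-crt", "sdk-java-client-classic",
        "sdk-java-tm-crt", "sdk-java-tm-classic"],
    "runners/s3-benchrunner-cpp": [
        "sdk-cpp-client-crt", "sdk-cpp-client-classic", "sdk-cpp-tm-classic"],
    "runners/s3-benchrunner-rust": ["sdk-rust-tm"],
}

# Invert the grouping so each client ID maps straight to its runner directory.
RUNNER_FOR_CLIENT = {
    client: runner
    for runner, clients in RUNNER_CLIENTS.items()
    for client in clients
}


def runner_for(s3_client):
    """Return the runner directory to build for an S3_CLIENT ID."""
    try:
        return RUNNER_FOR_CLIENT[s3_client]
    except KeyError:
        raise ValueError(f"unknown S3_CLIENT: {s3_client!r}") from None
```

Grouping by runner keeps the data close to how you build things (one build per runner), while the inverted dict gives constant-time lookups per client.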

Build a Runner

You must build a "runner" for the S3 client you'll be benchmarking. For example, build runners/s3-benchrunner-python to benchmark aws-crt-python, boto3, or AWS CLI.

Run scripts/build-runner.py:
```sh
usage: build-runner.py [-h] --lang {c,python,java} --build-dir BUILD_DIR
                       [--branch BRANCH]

Build a runner and its dependencies

optional arguments:
  -h, --help            show this help message and exit
  --lang {c,python,java}
                        Build s3-benchrunner-
  --build-dir BUILD_DIR
                        Root dir for build artifacts
  --branch BRANCH       Git branch/commit/tag to use when pulling dependencies
```

The last line of output from build-runner.py displays the RUNNER_CMD you'll need in the next step.
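If you automate the build, the RUNNER_CMD can be captured rather than copied by hand. This sketch assumes the command is printed to stdout (not stderr) as the final non-empty line; build_runner and runner_cmd_from_output are hypothetical helpers, and the script path assumes the clone layout from Get Started.

```python
import subprocess
import sys


def runner_cmd_from_output(stdout):
    """Return the last non-empty line of build-runner.py output (RUNNER_CMD)."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("build-runner.py printed nothing")
    return lines[-1]


def build_runner(lang, build_dir,
                 script="aws-crt-s3-benchmarks/scripts/build-runner.py"):
    """Run build-runner.py and return the RUNNER_CMD it reports."""
    result = subprocess.run(
        [sys.executable, script, "--lang", lang, "--build-dir", build_dir],
        check=True, capture_output=True, text=True)
    return runner_cmd_from_output(result.stdout)
```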

NOTE: Each runner has a README.md with more advanced instructions. build-runner.py isn't meant to handle advanced use cases like tweaking dependencies, iterating locally, DEBUG builds, etc.

Run a Benchmark

All runners share the same command-line interface and expect to be run from the FILES_DIR you passed to the prep-s3-files.py script.

```sh
cd FILES_DIR

RUNNER_CMD S3_CLIENT WORKLOAD BUCKET REGION TARGET_THROUGHPUT [--nic name1,name2] [--telemetry]
```

- RUNNER_CMD: Command to launch the runner (e.g. java -jar path/to/runner.jar). This is the last line printed by build-runner.py in the previous step.
- S3_CLIENT: ID of the S3 client to use (see the table above).
- WORKLOAD: Path to a workload .run.json file (see workloads/).
- BUCKET: S3 bucket name (e.g. my-test-bucket).
- REGION: AWS Region (e.g. us-west-2).
- TARGET_THROUGHPUT: Target throughput, in gigabits per second. Floating point is allowed. Enter the EC2 instance type's "Network Bandwidth (Gbps)" (e.g. "100.0" for c5n.18xlarge).
- NETWORK_INTERFACES: Optional; supported by the crt-c runner. A comma-separated list of network interface names, without spaces (e.g. "--nic ens5,ens6").
- TELEMETRY: Optional; supported by the crt-c runner. Pass --telemetry to enable telemetry. Telemetry is saved to ./telemetry/<workload_name>/<current_data_time>/<runNumber>.csv, and stats are written to ./telemetry/<workload_name>/<current_data_time>/stats.txt.
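The shared interface is easy to assemble programmatically. In this sketch (benchmark_argv is a hypothetical helper), RUNNER_CMD is split with shlex because it may contain several words, TARGET_THROUGHPUT is validated as a positive float, and --nic/--telemetry are rejected for clients other than crt-c; that rejection is this sketch's own choice, based on the notes above. The workload path in the usage section is a placeholder.

```python
import os
import shlex
import subprocess


def benchmark_argv(runner_cmd, s3_client, workload, bucket, region,
                   target_throughput_gbps, nics=None, telemetry=False):
    """Assemble the command line shared by all runners."""
    throughput = float(target_throughput_gbps)
    if throughput <= 0:
        raise ValueError("TARGET_THROUGHPUT must be a positive number of Gbps")
    # RUNNER_CMD may be several words, e.g. "java -jar path/to/runner.jar".
    argv = shlex.split(runner_cmd) + [
        s3_client, workload, bucket, region, str(throughput)]
    if nics or telemetry:
        if s3_client != "crt-c":
            raise ValueError("--nic/--telemetry are only used with crt-c here")
        if nics:
            argv += ["--nic", ",".join(nics)]  # comma-separated, no spaces
        if telemetry:
            argv.append("--telemetry")
    return argv


if __name__ == "__main__":
    argv = benchmark_argv("java -jar path/to/runner.jar", "crt-java",
                          os.path.abspath("path/to/workload.run.json"),  # placeholder
                          "my-test-bucket", "us-west-2", 100.0)
    # Runners expect to be launched from FILES_DIR.
    subprocess.run(argv, cwd=os.path.expanduser("~/files"), check=True)
```

Passing an absolute WORKLOAD path avoids surprises, since the working directory changes to FILES_DIR when the runner starts.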

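Most runners should search for AWS credentials the way their language's AWS SDK normally does (e.g. environment variables, the shared ~/.aws/credentials file, or an EC2 instance profile).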

If you want to run multiple workloads (or ALL workloads) in one go, use this helper script: run-benchmarks.py.
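run-benchmarks.py is the supported way to batch runs. If you only need to enumerate the workload files yourself (for example, to pick a subset), a glob is enough. This sketch assumes workloads are *.run.json files under workloads/, possibly in subdirectories; find_workloads is a hypothetical helper.

```python
import glob
import os
import sys


def find_workloads(workloads_dir):
    """Return every *.run.json file under workloads_dir, sorted for a stable order."""
    pattern = os.path.join(workloads_dir, "**", "*.run.json")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    for workload in find_workloads(sys.argv[1] if len(sys.argv) > 1 else "workloads"):
        print(workload)
```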

Authoring New Workloads

See workloads/

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Owner

Name: Amazon Web Services - Labs
Login: awslabs
Kind: organization
Location: Seattle, WA

Website: http://amazon.com/aws/
Repositories: 914
Profile: https://github.com/awslabs

AWS Labs

GitHub Events

Total

Issues event: 2
Watch event: 4
Delete event: 28
Issue comment event: 5
Push event: 109
Pull request event: 47
Pull request review event: 61
Pull request review comment event: 62
Fork event: 3
Create event: 25

Last Year

Issues event: 2
Watch event: 4
Delete event: 28
Issue comment event: 5
Push event: 109
Pull request event: 47
Pull request review event: 61
Pull request review comment event: 62
Fork event: 3
Create event: 25

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 2
Total pull requests: 78
Average time to close issues: 3 days
Average time to close pull requests: 3 days
Total issue authors: 2
Total pull request authors: 7
Average comments per issue: 1.0
Average comments per pull request: 0.01
Merged pull requests: 71
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 2
Pull requests: 18
Average time to close issues: 3 days
Average time to close pull requests: 2 days
Issue authors: 2
Pull request authors: 6
Average comments per issue: 1.0
Average comments per pull request: 0.0
Merged pull requests: 15
Bot issues: 0
Bot pull requests: 2


Top Authors

Issue Authors

graebm (1)
ddn-kums (1)
waahm7 (1)
ekaynar (1)

Pull Request Authors

graebm (78)
waahm7 (18)
TingDaoK (16)
DmitriyMusatkin (3)
aajtodd (2)
sullis (1)
dependabot[bot] (1)
GarrettBeatty (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (1) rust (1)

Dependencies

.github/workflows/ci.yml actions

actions/checkout v4 composite
actions/setup-java v3 composite
actions/setup-python v4 composite

runners/s3-benchrunner-crt-java/pom.xml maven

com.google.code.gson:gson 2.10.1
software.amazon.awssdk.crt:aws-crt [0.26,)

scripts/requirements.txt pypi

autopep8 *
boto3 *
mypy *