https://github.com/awslabs/s3-tables-catalog

The Amazon S3 Tables catalog is a client library that bridges the control plane operations provided by S3 Tables to engines such as Apache Spark and Flink when they are used with the Apache Iceberg table format.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: awslabs
  • License: apache-2.0
  • Language: Java
  • Default Branch: main
  • Homepage:
  • Size: 124 KB
Statistics
  • Stars: 133
  • Watchers: 9
  • Forks: 22
  • Open Issues: 14
  • Releases: 9
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md

Amazon S3 Tables Catalog for Apache Iceberg

The Amazon S3 Tables Catalog for Apache Iceberg is an open-source library that bridges S3 Tables operations to engines like Apache Spark, when used with the Apache Iceberg Open Table Format.

This library can:

  • Translate Apache Iceberg operations such as table discovery, metadata reads, and updates
  • Add and remove tables in Amazon S3 Tables

What are Amazon S3 Tables and table buckets?

Amazon S3 Tables are built for storing tabular data, such as daily purchase transactions, streaming sensor data, or ad impressions. Tabular data represents data in columns and rows, like in a database table. Tabular data is most commonly stored in the Apache Parquet format.

The tabular data in Amazon S3 Tables is stored in a new S3 bucket type: a table bucket, which stores tables as subresources. S3 Tables has built-in support for tables in the Apache Iceberg format. Using standard SQL statements, you can query your tables with query engines that support Apache Iceberg, such as Amazon Athena, Amazon Redshift, and Apache Spark.

Current Status

Amazon S3 Tables Catalog for Apache Iceberg is generally available. We're always interested in feedback on features, performance, and compatibility. Please send feedback by opening a GitHub issue.

If you discover a potential security issue in this project we ask that you notify Amazon Web Services (AWS) Security via our vulnerability reporting page. Please do not create a public GitHub issue.

Getting Started

To get started with Amazon S3 Tables, see Tutorial: Getting started with S3 Tables in the Amazon S3 User Guide.

Configuration

  • <catalog_name> is your Iceberg Spark session catalog name. Replace it with the name of your catalog, and change every reference to it in the configuration keys associated with this catalog. In your code, refer to your Iceberg tables by their fully qualified table name, including the Spark session catalog name, as follows: ...

  • <catalog_name>.warehouse points to the Amazon S3 Tables path.

  • <catalog_name>.catalog-impl = "software.amazon.s3tables.iceberg.S3TablesCatalog". This key is required; it points to the implementation class, as for any custom catalog implementation.
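Because the catalog name appears in every key, it can help to derive all three settings from a single value so they cannot drift apart. The following is a minimal sketch, not part of this library: the class name S3TablesSparkConf and its build method are hypothetical, and the key names and values are taken from the configuration described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of the library) that derives the catalog keys. */
final class S3TablesSparkConf {
    private S3TablesSparkConf() {}

    /**
     * Builds the three spark.sql.catalog.* entries for one catalog name.
     * The returned map keeps insertion order so it is easy to log or diff.
     */
    static Map<String, String> build(String catalogName, String tableBucketArn) {
        if (catalogName == null || catalogName.isEmpty()) {
            throw new IllegalArgumentException("catalogName must not be empty");
        }
        String prefix = "spark.sql.catalog." + catalogName;
        Map<String, String> conf = new LinkedHashMap<>();
        // Registers the catalog name with Spark as an Iceberg catalog.
        conf.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
        // Points Iceberg at the S3 Tables catalog implementation class.
        conf.put(prefix + ".catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog");
        // The warehouse is the Amazon S3 Tables path (the table bucket ARN).
        conf.put(prefix + ".warehouse", tableBucketArn);
        return conf;
    }
}
```

Each entry of the returned map can then be passed to SparkSession.builder().config(key, value) in a loop, so renaming the catalog only requires changing one argument.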

Java Spark app Example

Add the lines below to your pom.xml:

    <dependency>
      <groupId>software.amazon.awssdk</groupId>
      <artifactId>s3tables</artifactId>
      <version>2.29.26</version>
    </dependency>
    <dependency>
      <groupId>software.amazon.s3tables</groupId>
      <artifactId>s3-tables-catalog-for-iceberg</artifactId>
      <version>0.1.8</version>
    </dependency>

Or, if you are using a BOM, just add a dependency on the S3 Tables SDK:

    <dependencyManagement>
      <dependencies>
        <dependency>
          <groupId>software.amazon.awssdk</groupId>
          <artifactId>bom</artifactId>
          <version>2.29.26</version>
          <type>pom</type>
          <scope>import</scope>
        </dependency>
      </dependencies>
    </dependencyManagement>

Or for Gradle:

    dependencies {
        implementation 'software.amazon.awssdk:s3tables:2.29.26'
        implementation 'software.amazon.s3tables:s3-tables-catalog-for-iceberg:0.1.8'
    }

Finally, start a Spark session:

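    import org.apache.spark.sql.SparkSession;

    // Replace <catalog_name> and <TABLE_BUCKET_ARN> with your own values.
    SparkSession spark = SparkSession.builder()
        // Register <catalog_name> as an Iceberg catalog in Spark.
        .config("spark.sql.catalog.<catalog_name>", "org.apache.iceberg.spark.SparkCatalog")
        // Use the S3 Tables catalog implementation.
        .config("spark.sql.catalog.<catalog_name>.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog")
        // The warehouse is the ARN of your table bucket, passed as a string.
        .config("spark.sql.catalog.<catalog_name>.warehouse", "<TABLE_BUCKET_ARN>")
        .getOrCreate();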
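Since the warehouse value must be a table bucket ARN rather than an s3:// path, a quick format check before building the session can catch copy-paste mistakes early. The sketch below is illustrative only: the class TableBucketArn is hypothetical, and the pattern assumes the ARN shape arn:<partition>:s3tables:<region>:<12-digit account ID>:bucket/<bucket name>, with a bucket name of 3 to 63 lowercase letters, digits, or hyphens. It does not confirm that the bucket exists or that you have access to it.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-flight check for the warehouse value (not part of the library). */
final class TableBucketArn {
    // Assumed shape: arn:<partition>:s3tables:<region>:<account>:bucket/<name>
    private static final Pattern ARN = Pattern.compile(
        "arn:aws[a-z-]*:s3tables:[a-z0-9-]+:\\d{12}:bucket/[a-z0-9][a-z0-9-]{1,61}[a-z0-9]");

    private TableBucketArn() {}

    /** Returns true when the value matches the assumed table bucket ARN shape. */
    static boolean looksValid(String arn) {
        return arn != null && ARN.matcher(arn).matches();
    }

    /** Returns the value unchanged, or throws if it does not look like a table bucket ARN. */
    static String requireValid(String arn) {
        if (!looksValid(arn)) {
            throw new IllegalArgumentException(
                "Expected a table bucket ARN for the warehouse setting, got: " + arn);
        }
        return arn;
    }
}
```

Wrapping the warehouse argument in TableBucketArn.requireValid(...) fails fast with a readable message instead of surfacing the problem later as a catalog initialization error.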

Contributions

We welcome contributions to Amazon S3 Tables Catalog for Apache Iceberg! Please see the contributing guidelines for more information on how to report bugs, build from source code, or submit pull requests.

Security

If you discover a potential security issue in this project we ask that you notify Amazon Web Services (AWS) Security via our vulnerability reporting page. Please do not create a public GitHub issue.

License

This project is licensed under the Apache-2.0 License.

Owner

  • Name: Amazon Web Services - Labs
  • Login: awslabs
  • Kind: organization
  • Location: Seattle, WA

AWS Labs

GitHub Events

Total
  • Create event: 26
  • Release event: 8
  • Issues event: 24
  • Watch event: 123
  • Delete event: 9
  • Issue comment event: 45
  • Push event: 30
  • Public event: 1
  • Pull request review comment event: 8
  • Pull request review event: 30
  • Pull request event: 51
  • Fork event: 19
Last Year
  • Create event: 26
  • Release event: 8
  • Issues event: 24
  • Watch event: 123
  • Delete event: 9
  • Issue comment event: 45
  • Push event: 30
  • Public event: 1
  • Pull request review comment event: 8
  • Pull request review event: 30
  • Pull request event: 51
  • Fork event: 19

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 6
  • Total pull requests: 17
  • Average time to close issues: 22 days
  • Average time to close pull requests: 8 days
  • Total issue authors: 6
  • Total pull request authors: 10
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.12
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 17
  • Average time to close issues: 22 days
  • Average time to close pull requests: 8 days
  • Issue authors: 6
  • Pull request authors: 10
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.12
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jhmartin (1)
  • rhasson (1)
  • bhaskar-pv (1)
  • tundraraj (1)
  • harryharanb (1)
  • juzov-billie (1)
  • Samrose-Ahmed (1)
  • Karthilearns (1)
  • stefan-novak-brt (1)
  • yc2984 (1)
  • bloodeagle40234 (1)
  • ebyhr (1)
  • averemee-si (1)
  • Aruun (1)
  • dominicrathbone (1)
Pull Request Authors
  • stubz151 (6)
  • jackye1995 (4)
  • domingosnovo (3)
  • shekharsud (3)
  • sullis (2)
  • matt-huh (2)
  • devinrsmith (2)
  • jamesbornholt (1)
  • crh23 (1)
  • fuziontech (1)
  • krishamzn (1)
  • AndrewMakin (1)
Top Labels
Issue Labels
Pull Request Labels
  • needs work (1)

Packages

  • Total packages: 2
  • Total downloads: unknown
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 14
repo1.maven.org: software.amazon.s3tables:s3-tables-catalog-for-iceberg

Amazon S3 Tables Catalog for Apache Iceberg.

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 25.8%
Forks count: 31.3%
Dependent repos count: 34.0%
Average: 34.9%
Dependent packages count: 48.6%
Last synced: 10 months ago
repo1.maven.org: software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime

Amazon S3 Tables Catalog for Apache Iceberg Runtime Jar.

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Average: 41.3%
Dependent packages count: 48.6%
Last synced: 10 months ago

Dependencies

.github/workflows/gradle-publish.yml actions
  • actions/checkout v4 composite
  • actions/setup-java v4 composite
  • aws-actions/configure-aws-credentials v2 composite
  • gradle/actions/setup-gradle af1da67850ed9a4cedd57bfd976089dd991e2582 composite
.github/workflows/gradle.yml actions
  • actions/checkout v4 composite
  • actions/setup-java v4 composite
  • aws-actions/configure-aws-credentials v2 composite
  • gradle/actions/dependency-submission af1da67850ed9a4cedd57bfd976089dd991e2582 composite
  • gradle/actions/setup-gradle af1da67850ed9a4cedd57bfd976089dd991e2582 composite
build.gradle maven
  • com.github.ben-manes.caffeine:caffeine 2.9.3 implementation
  • org.apache.commons:commons-configuration2 2.11.0 implementation
  • org.apache.iceberg:iceberg-api 1.6.1 implementation
  • org.apache.iceberg:iceberg-aws 1.6.1 implementation
  • org.apache.iceberg:iceberg-bundled-guava 1.6.1 implementation
  • org.apache.iceberg:iceberg-common 1.6.1 implementation
  • org.apache.iceberg:iceberg-core 1.6.1 implementation
  • software.amazon.awssdk:apache-client 2.29.26 implementation
  • software.amazon.awssdk:aws-core 2.29.26 implementation
  • software.amazon.awssdk:dynamodb 2.29.26 implementation
  • software.amazon.awssdk:glue 2.29.26 implementation
  • software.amazon.awssdk:http-client-spi 2.29.26 implementation
  • software.amazon.awssdk:kms 2.29.26 implementation
  • software.amazon.awssdk:s3 2.29.26 implementation
  • software.amazon.awssdk:s3tables 2.29.26 implementation
  • software.amazon.awssdk:sdk-core 2.29.26 implementation
  • software.amazon.awssdk:sts 2.29.26 implementation
  • software.amazon.awssdk:url-connection-client 2.29.26 implementation
  • org.assertj:assertj-core 3.26.3 testImplementation
  • org.junit.jupiter:junit-jupiter-api 5.11.3 testImplementation
  • org.mockito:mockito-core 4.11.0 testImplementation
  • org.mockito:mockito-inline 4.11.0 testImplementation
  • org.mockito:mockito-junit-jupiter 4.11.0 testImplementation
  • org.junit.platform:junit-platform-launcher * testRuntimeOnly