Recent Releases of https://github.com/awslabs/kubeflow-manifests
https://github.com/awslabs/kubeflow-manifests - v1.7.0-aws-b1.0.3
What's Changed
- Terraform related changes
  - Infrastructure and application deployment for Terraform now use aws-eks-blueprints v4.32.1 to mitigate a breaking change caused by the refactoring of the aws-eks-blueprints GitHub repository. (#776)
  - Fix to pass variable mysqlengineversion to RDS module by @elanv in https://github.com/awslabs/kubeflow-manifests/pull/781
  - Increased the default disk space for GPU instances and added a variable to configure the disk size. (#782)
  - Updated the RDS engine version in Terraform deployments. (#785)
  - Migrated to aws-eks-blueprints v5 for Kubernetes Addon installation. For GPU deployments the Nvidia operator will be installed. (#779)
- Helm/Kustomize related changes
  - Updated the AWS Load Balancer Controller’s ALB policy to include AddTag permissions in Create* operations in accordance with the new AWS policy change, by @ananth102 in https://github.com/awslabs/kubeflow-manifests/pull/777
New Contributors
- @elanv made their first contribution in https://github.com/awslabs/kubeflow-manifests/pull/781
Full Changelog: https://github.com/awslabs/kubeflow-manifests/compare/v1.7.0-aws-b1.0.2...v1.7.0-aws-b1.0.3
Published by ananth102 over 2 years ago
https://github.com/awslabs/kubeflow-manifests - v1.7.0-aws-b1.0.2
What's Changed
- Fix Kubeflow on AWS Terraform deployments failing due to breaking changes in the Terraform aws-vpc module
  - The Terraform AWS provider recently deprecated resources and attributes related to EC2 Classic. This broke versions of the aws-vpc module below 5.0.0 when they were used with Terraform AWS provider versions above 5.0.0. This release updates the aws-vpc module and pins the version of the Terraform AWS provider to > 5 and < 6 to prevent such issues in future.
  - Upgraded the Terraform aws-vpc module by @amitkalawat in https://github.com/awslabs/kubeflow-manifests/pull/751
  - Pin Terraform AWS provider to 5.x.x by @ananth102 in https://github.com/awslabs/kubeflow-manifests/pull/755
- Installation script fixes for cognito-rds-s3-static helm deployments by @sagi-shimoni in https://github.com/awslabs/kubeflow-manifests/pull/749
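
The effect of pinning the provider to 5.x.x can be illustrated with a small Python check. This is only a sketch: the function names are hypothetical, and it models the pin as "at least 5.0.0 and below 6.0.0" rather than reproducing Terraform's own constraint parser.

```python
import re


def parse_version(version):
    """Parse a 'MAJOR.MINOR.PATCH' string into a tuple of three ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def within_5x_pin(version):
    """Return True if the version falls inside the assumed 5.x.x range."""
    # Tuples compare element by element, so this is a numeric range check.
    return (5, 0, 0) <= parse_version(version) < (6, 0, 0)


if __name__ == "__main__":
    # Hypothetical provider versions, used only to exercise the check.
    for candidate in ("4.67.0", "5.0.0", "5.31.2", "6.0.0"):
        print(candidate, within_5x_pin(candidate))
```

A pin of this kind accepts minor and patch updates within the 5.x line while refusing the next major version, which is where breaking changes are expected to land.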
New Contributors
- @amitkalawat made their first contribution in https://github.com/awslabs/kubeflow-manifests/pull/751
- @sagi-shimoni made their first contribution in https://github.com/awslabs/kubeflow-manifests/pull/749
Full Changelog: https://github.com/awslabs/kubeflow-manifests/compare/v1.7.0-aws-b1.0.0...v1.7.0-aws-b1.0.2
Published by ananth102 almost 3 years ago
https://github.com/awslabs/kubeflow-manifests - v1.7.0-aws-b1.0.1
What's Changed
- Fixed manifest issue with pipelines affecting all S3 deployments using static credentials by @jsitu777 https://github.com/awslabs/kubeflow-manifests/pull/710, https://github.com/awslabs/kubeflow-manifests/pull/716.
  - If you are using IRSA as PIPELINE_S3_CREDENTIAL_OPTION, you are not affected by this issue.
- Added support for automated EFS deployment for Terraform deployments and updated the EKS Blueprints version to v4.31.0 by @rrrkharse in https://github.com/awslabs/kubeflow-manifests/pull/731
- Fixed the load balancer auto setup script for the case where the load balancer scheme is internal, by @rrrkharse in https://github.com/awslabs/kubeflow-manifests/pull/732
- Documentation for creating additional profiles when using IRSA as PIPELINE_S3_CREDENTIAL_OPTION by @ryansteakley https://github.com/awslabs/kubeflow-manifests/pull/700, https://github.com/awslabs/kubeflow-manifests/pull/722
- Documentation fixes for KServe, RDS-S3 guides https://github.com/awslabs/kubeflow-manifests/pull/720, https://github.com/awslabs/kubeflow-manifests/pull/728
Full Changelog: https://github.com/awslabs/kubeflow-manifests/compare/v1.7.0-aws-b1.0.0...v1.7.0-aws-b1.0.1
Published by surajkota about 3 years ago
https://github.com/awslabs/kubeflow-manifests - v1.7.0-aws-b1.0.0
What’s New
This release offers the following features:
* Added support for Kubeflow v1.7.0. Upstream Kubeflow components versions as listed in components versions table
* Support IAM Roles for Service Accounts (IRSA) for using Amazon S3 as artifact store for Kubeflow Pipelines
  * IRSA can be used to configure Amazon S3 as an artifact store for pipelines. IRSA lets you use temporary credentials to make API requests and scope permissions at the pod level via Kubernetes service accounts. Using IRSA instead of static IAM user credentials to access S3 follows the security best practices of least privilege and credential isolation. (#571, #601, #613, #680, #685)
  * Starting with this release, we are deprecating the use of IAM user/static credentials in favor of IRSA to configure S3 with Kubeflow Pipelines. We highly recommend migrating to IRSA. For more details about this change, refer to GitHub issue #704
* Configure Server side encryption and block public access to S3 bucket used by Kubeflow Pipelines by default as security best practice (https://github.com/awslabs/kubeflow-manifests/pull/517, https://github.com/awslabs/kubeflow-manifests/pull/518)
* Support using IRSA with KServe Inference Services. Use this feature to pull images from private ECR repository or load models directly from S3 bucket.
* Support for using Amazon S3 as an object store backend for TensorBoard. Users can now visualize TensorBoard-compatible logs stored in S3, published by model servers and training jobs (including TrainingJobs run on SageMaker), to track experiment metrics such as loss and accuracy and to visualize the model graph.
* Added ability to annotate the service account using AWSIAMforServiceAccount Plugin. Users can use this feature if their organizational policies restrict them from using profile controller for updating IAM policies.
  * Setting annotateOnly to true in AWSIAMforServiceAccount Plugin will only annotate the service account in user profile and skip mutating the IAM Policy.
* Support configuring Amazon S3 as a remote backend for storing Terraform state (#674)
* Support configuring auto stopping of idle Jupyter Notebook Servers
  * Enabled support for Notebook Culling. Users can save infrastructure costs by configuring a notebook instance to stop if it stays idle for a certain period of time. (#470)
* Updated notebook containers with the latest AWS optimized Deep Learning Containers (DLC) based on Tensorflow 2.12.0 and PyTorch 2.0.0 (#676)
* Updated Training and Inference containers with the latest AWS optimized Deep Learning Containers (DLC) based on Tensorflow 2.12 and PyTorch 2.0. Support for CPU/GPU based single node training, distributed training, and inference. For the latest DLC images, refer to the list of DLC images
* Updated the following drivers to newer versions:
  * FSx CSI Driver to v0.9.0
  * EFS CSI Driver to v1.5.4
  * AWS Load Balancer Controller to v2.4.7
* Updated SageMaker Operator for k8s (ACK) to v1.2.1
  * Training Job resource now supports Managed warm pool, heterogeneous clusters through Instance Groups and Retry Strategy
  * Added support for SageMaker Pipeline and Pipeline Execution
  * Training Job resource now supports Update Operations.
  * Support for Deployment guard rails for Endpoint Resource.
  * Support for Serverless Endpoint for Endpoint Config Resource.
  * Support for retaining AWS resources after CR deletion.
* Supports latest versions of Amazon EKS - eks-compatibility
* Support for Kustomize v5.0.1
* Bugfixes and improvements to the automated scripts
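
To make the IRSA feature above more concrete, the following Python sketch builds a Kubernetes ServiceAccount manifest carrying the `eks.amazonaws.com/role-arn` annotation that EKS uses to associate an IAM role with a service account. The service account name, namespace, and role ARN are hypothetical placeholders, and the sketch does not show how this distribution's profile controller or scripts create the annotation.

```python
import json

# Annotation key that EKS uses to map a service account to an IAM role.
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def irsa_service_account(name, namespace, role_arn):
    """Return a ServiceAccount manifest annotated with an IAM role ARN."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {IRSA_ANNOTATION: role_arn},
        },
    }


# Hypothetical values, for illustration only.
manifest = irsa_service_account(
    name="default-editor",
    namespace="kubeflow-user-example-com",
    role_arn="arn:aws:iam::111122223333:role/example-kfp-s3-role",
)

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Pods that run under a service account annotated this way receive temporary credentials for the referenced role, so no static IAM user keys need to be stored in the cluster.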
Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.7.0-aws-b1.0.0/docs/
Known Issues:
- https://github.com/awslabs/kubeflow-manifests/pull/709
Full Changelog: https://github.com/awslabs/kubeflow-manifests/compare/release-v1.6.1-aws-b1.0.2...release-v1.7.0-aws-b1.0.0
Published by jsitu777 about 3 years ago
https://github.com/awslabs/kubeflow-manifests - v1.6.1-aws-b1.0.2
What's Changed
- Release v1.6.1 aws b1.0.2 by @ryansteakley in https://github.com/awslabs/kubeflow-manifests/pull/666
- Update secrets-csi-driver to v1.3.2 by @ryansteakley in https://github.com/awslabs/kubeflow-manifests/pull/651
- Update EKS Blueprints to v4.28.0 by @ryansteakley in https://github.com/awslabs/kubeflow-manifests/pull/659
Known Issues
- https://github.com/awslabs/kubeflow-manifests/issues/118 (Workaround documented in issue)
Full Changelog: https://github.com/awslabs/kubeflow-manifests/compare/v1.6.1-aws-b1.0.1...v1.6.1-aws-b1.0.2
Published by ryansteakley about 3 years ago
https://github.com/awslabs/kubeflow-manifests - v1.6.1-aws-b1.0.1
What's Changed
- Katib S3-only Helm Path fix by @jsitu777 in https://github.com/awslabs/kubeflow-manifests/pull/507
- Update RDS engine version by @techwithshadab https://github.com/awslabs/kubeflow-manifests/pull/584
- Update terraform-aws-blueprints to v4.12.1 by @ghaering https://github.com/awslabs/kubeflow-manifests/pull/516
Known Issues
- https://github.com/awslabs/kubeflow-manifests/issues/653 (Resolved, please use the latest version)
- #118 (Workaround documented in issue)
Update Existing Kubeflow Installations with RDS or S3 integrations
On February 6th, the Kubernetes project announced changes to k8s.gcr.io, the existing community-owned image registry that hosts its container images. On the 3rd of April 2023, the old registry k8s.gcr.io will be frozen and no further images for Kubernetes and related subprojects will be pushed to it. The Kubernetes community recommends switching to the new registry, registry.k8s.io, as soon as possible. For more information, read the community blog.
Only the Secrets Store CSI Driver in the AWS Distribution of Kubeflow is affected. To update the image registry to point to the new registry.k8s.io, please follow the instructions documented in the GitHub issue comment below.
- #636
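
The registry change amounts to replacing the k8s.gcr.io host prefix with registry.k8s.io in image references. The Python sketch below shows that rewrite on a single image string. The function name and the example image path are illustrative assumptions, and the authoritative update steps remain the ones in the issue comment referenced above.

```python
OLD_REGISTRY = "k8s.gcr.io"
NEW_REGISTRY = "registry.k8s.io"


def migrate_image(image):
    """Rewrite an image reference from the old registry to the new one.

    References that do not start with the old registry host are returned
    unchanged.
    """
    prefix = OLD_REGISTRY + "/"
    if image.startswith(prefix):
        return NEW_REGISTRY + "/" + image[len(prefix):]
    return image


if __name__ == "__main__":
    # Illustrative image reference, not taken from the manifests.
    print(migrate_image("k8s.gcr.io/csi-secrets-store/driver:v1.3.2"))
```

Matching on the host plus a trailing slash avoids rewriting unrelated strings that merely contain the old registry name.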
Full Changelog: https://github.com/awslabs/kubeflow-manifests/compare/v1.6.1-aws-b1.0.0...v1.6.1-aws-b1.0.1
Published by ananth102 over 3 years ago
https://github.com/awslabs/kubeflow-manifests - v1.6.1-aws-b1.0.0
What’s New
This release offers the following features:
* Added support for Kubeflow v1.6.1.
* Component versions as listed in components versions table
* Updated SageMaker operator for k8s (ACK) to version 0.4.5
* Updated notebook containers with the latest deep learning containers based on Tensorflow 2.10.0 and PyTorch 1.12.1 (https://github.com/awslabs/kubeflow-manifests/pull/473)
* Includes all the features from v1.6.0-aws-b1.0.0 (Preview)
* Integration of SageMaker with Kubeflow to run hybrid machine learning workflows using SageMaker Operators for Kubernetes (ACK) and SageMaker Components for Kubeflow Pipelines. Documentation
* Added support for Infrastructure as Code (IaC) 1-click deployment for Kubeflow on AWS using Terraform (preview)
* Added helm support for all supported deployment options (preview)
* Added integration with Prometheus, Amazon Managed Service for Prometheus, and Amazon Managed Grafana to monitor metrics with Kubeflow on AWS. Documentation
* Automated deployment options have been improved to be simplified and more stable (User-friendly make commands, Deterministic install/uninstall etc)
* Integration with AWS Deep Learning Containers to run distributed training and inference workloads
* Supports newer versions of EKS - eks-compatibility
* Added Nvidia GPU support in Terraform (https://github.com/awslabs/kubeflow-manifests/pull/396)
* Enabled KFP Visualizations and Artifact Store with S3 as source (https://github.com/awslabs/kubeflow-manifests/pull/456)
* Bugfixes and improvements to the automated scripts
Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.6.1-aws-b1.0.0/docs/
Known Issues:
- Broken helm path for S3 only deployment #504
- https://github.com/awslabs/kubeflow-manifests/issues/508
- https://github.com/awslabs/kubeflow-manifests/pull/516
Update Existing Kubeflow Installations with RDS or S3 integrations
On February 6th, the Kubernetes project announced changes to k8s.gcr.io, the existing community-owned image registry that hosts its container images. On the 3rd of April 2023, the old registry k8s.gcr.io will be frozen and no further images for Kubernetes and related subprojects will be pushed to it. The Kubernetes community recommends switching to the new registry, registry.k8s.io, as soon as possible. For more information, read the community blog.
Only the Secrets Store CSI Driver in the AWS Distribution of Kubeflow is affected. To update the image registry to point to the new registry.k8s.io, please follow the instructions documented in the GitHub issue comment below.
- #636
Full Changelog: https://github.com/awslabs/kubeflow-manifests/compare/release-v1.6.0-aws-b1.0.0...release-v1.6.1-aws-b1.0.0
Published by akartsky over 3 years ago
https://github.com/awslabs/kubeflow-manifests - v1.6.0-aws-b1.0.0 (Preview)
What’s New
This is a preview release for Kubeflow v1.6. The Kubeflow working groups have identified some regressions in v1.6.0 which will be addressed in v1.6.1. More details can be found here.
This release offers the following features:
* Added support for Kubeflow v1.6.0. Component versions as listed in components versions table
* Integration of SageMaker with Kubeflow to run hybrid machine learning workflows using SageMaker Operators for Kubernetes (ACK) and SageMaker Components for Kubeflow Pipelines. Documentation
* Added helm support for all supported deployment options
* Automated deployment options have been improved to be simplified and more stable
* Added support for Infrastructure as Code (IaC) 1-click deployment for Kubeflow on AWS using Terraform (preview)
  * Terraform stacks added for all supported deployment options
  * Creates a VPC and EKS Cluster
  * Creates S3 buckets, RDS instances, and/or Cognito resources as needed
  * Configures and deploys Kubeflow
  * Configured using EKS Blueprints for improved customizability/extensibility
* Configurable S3 endpoint configuration for S3 and RDS-S3 deployment options, allowing PrivateLink and non-commercial region users to connect to their respective S3 endpoints
* Added integration with Prometheus, Amazon Managed Service for Prometheus, and Amazon Managed Grafana to monitor metrics with Kubeflow on AWS. Documentation
* Updated notebook containers with the latest deep learning containers based on Tensorflow 2.9.1 and PyTorch 1.12 (https://github.com/awslabs/kubeflow-manifests/pull/363)
* Integration with AWS Deep Learning Containers to run distributed training and inference workloads
* Enable usage of HTTPS-only S3 bucket (https://github.com/awslabs/kubeflow-manifests/pull/335)
* Support for EKS - 1.22, 1.23
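
The configurable S3 endpoint listed above matters because the regional S3 hostname differs between AWS partitions. The Python sketch below derives the standard regional endpoint hostname from a region name. The function name is hypothetical, and PrivateLink (VPC endpoint) hostnames are not covered because they are specific to each VPC endpoint and would need to be supplied explicitly.

```python
def s3_endpoint(region):
    """Return the standard regional S3 endpoint hostname for a region.

    China regions (names starting with 'cn-') use the amazonaws.com.cn
    suffix; other regions, including GovCloud, use amazonaws.com.
    """
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"s3.{region}.{suffix}"


if __name__ == "__main__":
    for region in ("us-west-2", "us-gov-west-1", "cn-north-1"):
        print(region, s3_endpoint(region))
```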
This release includes the following bug fixes:
* Re-enable mysql for s3-only pipelines deployment (https://github.com/awslabs/kubeflow-manifests/pull/310)
Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.6.0-aws-b1.0.0/docs/
Known Issues
- #117 (Workaround documented in issue)
- #118 (Workaround documented in issue)
- Following known issues will be fixed in next release:
- https://github.com/kubeflow/pipelines/issues/8256
- https://github.com/kubeflow/kubeflow/issues/6648
- https://github.com/awslabs/kubeflow-manifests/pull/448
- https://github.com/awslabs/kubeflow-manifests/pull/443
- https://github.com/awslabs/kubeflow-manifests/pull/444
- https://github.com/awslabs/kubeflow-manifests/pull/441
- https://github.com/awslabs/kubeflow-manifests/pull/446#pullrequestreview-1127380568
Full Changelog: https://github.com/awslabs/kubeflow-manifests/compare/release-v1.5.1-aws-b1.0.2...release-v1.6.0-aws-b1.0.0
Published by rrrkharse over 3 years ago
https://github.com/awslabs/kubeflow-manifests - v1.5.1-aws-b1.0.2
What’s New
This release includes the following bug fixes merged as part of https://github.com/awslabs/kubeflow-manifests/pull/373:
* Fix S3 bucket name substitution for all S3 related deployments, i.e. Cognito-RDS-S3, RDS-S3, S3 (https://github.com/awslabs/kubeflow-manifests/pull/333)
  * See https://github.com/awslabs/kubeflow-manifests/issues/336 for more details about the issue. This bug was introduced in v1.5.1-aws-b1.0.1.
* Fix the missing mysql resources in S3 only deployment (https://github.com/awslabs/kubeflow-manifests/pull/310)
* Enable usage of HTTPS-only S3 bucket (https://github.com/awslabs/kubeflow-manifests/pull/244)
* Fix for RDS-S3 test (https://github.com/awslabs/kubeflow-manifests/pull/341)
Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.5.1-aws-b1.0.2/docs/
Known Issues
- #117 (Workaround documented in issue)
- #118 (Workaround documented in issue)
- https://github.com/kubeflow/pipelines/issues/7361 (Terminating the pipeline run does not trigger the deletion logic programmed via the signal handled in a component. This affects all components in general. Terminate functionality in SageMaker components for Kubeflow pipelines is also affected. Workaround is to manually stop the training jobs)
Full Changelog: https://github.com/awslabs/kubeflow-manifests/compare/v1.5.1-aws-b1.0.1...v1.5.1-aws-b1.0.2
Published by surajkota over 3 years ago
https://github.com/awslabs/kubeflow-manifests - v1.5.1-aws-b1.0.1
What’s New
This release includes the following bug fixes:
* Fix KServe's ingress gateway (#311)
* Add support for non-root EFS files ownership (#268)
* Fix hardcoded S3 endpoint URL in workflow controller configmap (#257)
* Add CDK created EKS cluster subnet tags for RDS script (#295)
* Doc fixes: #304, #307
Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.5.1-aws-b1.0.1/docs/
Known Issues
- #117 (Workaround documented in issue)
- #118 (Workaround documented in issue)
- https://github.com/kubeflow/pipelines/issues/7361 (Terminating the pipeline run does not trigger the deletion logic programmed via the signal handled in a component. This affects all components in general. Terminate functionality in SageMaker components for Kubeflow pipelines is also affected. Workaround is to manually stop the training jobs)
Full Changelog: https://github.com/awslabs/kubeflow-manifests/compare/v1.5.1-aws-b1.0.0...v1.5.1-aws-b1.0.1
Published by surajkota almost 4 years ago
https://github.com/awslabs/kubeflow-manifests - v1.5.1-aws-b1.0.0
What’s New
This release offers the following features:
* Upgrade Kubeflow components for 1.5.1. Component versions as listed in components versions table
* Access AWS services from Katib using the AWS IAM Roles for Service Accounts (IRSA) integration with Kubeflow Profiles
* Access AWS services from pipeline pods using the AWS IAM Roles for Service Accounts (IRSA) integration with Kubeflow Profiles
* Switch from KFServing to KServe as default serving component. Component guide for model serving over load balancer endpoint using KServe
* AWS optimized Jupyter notebook server images for TensorFlow-2.6.3 and PyTorch-1.11
* Bug fix to remove unused mysql-pod and pv-claim from pipeline component (#222)
Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.5.1-aws-b1.0.0/docs/
Known Issues
- #257
- #117 (Workaround documented in issue)
- #118 (Workaround documented in issue)
- https://github.com/kubeflow/pipelines/issues/7361 (Terminating the pipeline run does not trigger the deletion logic programmed via the signal handled in a component. This affects all components in general. Terminate functionality in SageMaker components for Kubeflow pipelines is also affected. Workaround is to manually stop the training jobs)
Full Changelog: https://github.com/awslabs/kubeflow-manifests/commits/v1.5.1-aws-b1.0.0
Published by surajkota almost 4 years ago
https://github.com/awslabs/kubeflow-manifests - v1.4.1-aws-b1.0.0
What’s New
New Website : https://awslabs.github.io/kubeflow-manifests/
This release offers the following integrations and deployment options:
- Automated Setup Script for Amazon Relational Database Service (RDS) and Amazon S3
- Automated Setup Script for AWS Cognito
- Automated Setup Script for Amazon Elastic File System (EFS)
- Automated Setup Script for Amazon FSx for Lustre
- Automated Setup Script for exposing Kubeflow over Application Load Balancer
- AWS IAM Roles for Service Accounts (IRSA) integration with Kubeflow Profiles with support for Notebook component
- Component Guide for Kubeflow KServe/KFServing on AWS
- Amazon CloudWatch Container Insights Integration to capture EKS logs and metrics
- Source for Kubeflow Notebook containers is now part of the repo
- [Bug Fix] AWS Cognito Logout
Known Issues
- https://github.com/awslabs/kubeflow-manifests/issues/117 (Workaround documented in issue)
- https://github.com/awslabs/kubeflow-manifests/issues/118 (Workaround documented in issue)
- https://github.com/awslabs/kubeflow-manifests/issues/87
Contributors
- @akartsky, @AlexandreBrown, @goswamig, @judyheflin, @mbaijal, @rrrkharse, @ryansteakley, @surajkota, @wenjinsitu
Full Changelog: https://github.com/awslabs/kubeflow-manifests/commits/v1.4.1-aws-b1.0.0
Published by ryansteakley about 4 years ago
https://github.com/awslabs/kubeflow-manifests - v1.3.1-aws-b1.0.0
What's New
This release offers the following integrations and deployment options:
* AWS optimized Jupyter notebook server images based on AWS Deep Learning Containers
* Integration with AWS Application Load Balancer to manage external traffic using the AWS Load Balancer Controller
* Integration with AWS Certificate Manager and AWS Cognito for TLS and authentication
* Integration with Amazon Relational Database Service (RDS) in Pipelines and AutoML (Katib) for persistent metadata store
* Integration with Amazon S3 in Pipelines for persistent artifacts store
* Integration with Amazon EFS CSI driver to manage Amazon Elastic File System (EFS) as persistent workspace or data volumes
* Integration with Amazon FSx CSI driver to manage Lustre file systems as persistent workspace or data volumes
* Detailed end to end deployment guides for a number of deployment options
* Move to Kustomize based installation (no longer use kfctl which was used in Kubeflow-1.2)
* Compatible with EKS v1.19, v1.20 and v1.21
Known Issues
- #62
- #83
- #117
- #118
Contributors
- @akartsky, @AlexandreBrown, @goswamig, @jlbutler, @judyheflin, @mbaijal, @rrrkharse, @ryansteakley, @surajkota
Full Changelog: https://github.com/awslabs/kubeflow-manifests/commits/v1.3.1-aws-b1.0.0
Published by surajkota over 4 years ago