https://github.com/awslabs/mlspace
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (14.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: awslabs
- License: apache-2.0
- Language: TypeScript
- Default Branch: develop
- Size: 8.67 MB
Statistics
- Stars: 23
- Watchers: 4
- Forks: 3
- Open Issues: 7
- Releases: 16
Metadata Files
README.md
MLSpace
What is MLSpace?
MLSpace enables data scientists to leverage the power of Amazon SageMaker through a secure, PKI-enabled portal so they can collaboratively build, train, and deploy machine learning models for mission use cases. MLSpace provides frictionless access to machine learning resources and is especially targeted at individuals and teams without direct access to the AWS platform. In short, MLSpace is an accessible, open source, data science environment for data science teams or communities of any size. It is a serverless application, significantly reducing administrative and application hosting costs.
MLSpace provides users access to selected resources within the Amazon SageMaker service (e.g., Jupyter notebooks, training jobs, endpoints) through a user interface (UI) that mirrors the AWS Management Console. If available in region, MLSpace customers can also access Amazon Ground Truth, Amazon Translate, and Amazon Bedrock. MLSpace also provides project management, data management, and portfolio management features that are not explicitly offered by Amazon SageMaker. These features support the governance and resource management & control of the customer’s data science environment. MLSpace can be installed and used in any region where SageMaker is available.
Deployment Prerequisites
Pre-Deployment Steps
- Set up and have access to an AWS account
- Have your Identity Provider (IdP) information and access
- Optional: Create Notebook & App Policies & Roles in advance if your organization requires pre-approvals
- Optional: Have your VPC information available, if you are using an existing one for your deployment
- Optional: Have your Proxy information available if required
- Note: CDK briefly leverages SSM. Confirm it is approved for use by your organization before beginning
- Note: The MLSpace deployment is optimized for Linux-based environments. If your local environment does not meet this requirement, we recommend provisioning an Amazon EC2 instance running Linux to serve as your deployment platform.
Software
In order to build and deploy MLSpace to your AWS account you will need the following software installed on your machine:
- git
- awscli
- nodejs 18+
- docker or a compatible runtime for generating lambda layers
- cdk (npm install -g cdk)
Additional Information
In addition to the required software you will also need to have the following information:
- AWS account Id and region you'll be deploying MLSpace into (you'll need admin credentials or similar)
- Identity provider (IdP) information including the OIDC endpoint and client name
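Before deploying, it can help to confirm that the OIDC endpoint you collected is reachable. The sketch below builds the standard OpenID Connect discovery document URL for an issuer. It assumes the OIDC endpoint you plan to give MLSpace is the issuer base URL; `discoveryUrl` is a hypothetical helper, not part of MLSpace.

```typescript
// Hypothetical helper, not part of MLSpace.
// Assumption: the OIDC endpoint is the issuer base URL of your IdP.
// OpenID Connect Discovery publishes provider metadata at
// <issuer>/.well-known/openid-configuration.
function discoveryUrl(issuer: string): string {
    // Strip trailing slashes so the path is joined with exactly one "/".
    const base = issuer.replace(/\/+$/, '');
    return `${base}/.well-known/openid-configuration`;
}
```

Opening the resulting URL in a browser should return a JSON document if the endpoint is a valid OIDC issuer.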
Configuring MLSpace
There are two options for configuring MLSpace for deployment:
Option 1 (Recommended) - Configure using the MLSpace Config Wizard
Configure MLSpace using Option 1 if:
- you want to be prompted only for the settings necessary to launch MLSpace with minimal configuration changes; or
- you want to be prompted for the necessary settings as well as values that are commonly modified (VPC configuration, IAM roles, etc.); or
- you want to configure and deploy MLSpace with a generated config file that is not committed to git
After running the MLSpace Config Wizard, a new file will be generated: lib/config.json.
When MLSpace is deployed, it merges the settings in lib/config.json and lib/constants.ts to determine the final configuration, giving precedence to values set in lib/config.json.
Any values left empty while using the MLSpace Config Wizard default to what is set in lib/constants.ts.
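The precedence rule described above can be sketched as follows. This is a minimal illustration, not MLSpace's actual merge code: the `Settings` type and `mergeSettings` function are hypothetical, and the sketch assumes that an "empty" value is an empty string.

```typescript
// Hypothetical shape; the real configuration exposes many more settings.
type Settings = Record<string, string | number | boolean>;

// Merge wizard output (the role played by lib/config.json) over the
// defaults (the role played by lib/constants.ts).
function mergeSettings(defaults: Settings, overrides: Settings): Settings {
    const merged: Settings = { ...defaults };
    for (const key of Object.keys(overrides)) {
        const value = overrides[key];
        // Assumption: values left empty in the wizard are empty strings,
        // and those fall back to the default instead of overriding it.
        if (value !== undefined && value !== '') {
            merged[key] = value;
        }
    }
    return merged;
}
```

With this sketch, an override of `SYSTEM_TAG` replaces the default, while an override left as an empty string keeps the default value.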
The MLSpace Config Wizard can be invoked with the command:
Bash
npm run config
This will prompt you to choose between Basic Config and Advanced Config.
- Basic Config - only prompts for the properties which must be set in order to deploy MLSpace.
- Advanced Config - prompts for required fields as well as optional configurations which are commonly customized.
If selecting Basic Config, the properties you will be prompted for are:
- AWS account ID: the AWS account ID for the account MLSpace will be deployed into
- AWS region: the region that MLSpace resources will be deployed into
- OIDC URL: the OIDC endpoint that will be used for MLSpace authentication
- OIDC Client Name: the OIDC client name that should be used by MLSpace for authentication
If selecting Advanced Config, you will be prompted for the same properties as Basic Config, as well as other optional values. Anything not specified will either use the defaults in constants.ts or be provisioned by MLSpace.
The Advanced Config will ask:
Do you want to use existing VPC? If you answer yes, you will be prompted for:
- VPC Name: if MLSpace is being deployed into an existing VPC this should be the name of that VPC
- VPC ID: if MLSpace is being deployed into an existing VPC this should be the ID of that VPC
- VPC Default Security Group: if MLSpace is being deployed into an existing VPC this should be the default security group of that VPC
Do you want to use existing IAM Roles? If you answer yes, you will be prompted for:
- S3 Reader Role ARN: ARN of an existing IAM role to use for reading from the static website S3 bucket
- Bucket Deployment Role ARN: ARN of an existing IAM role to use for deploying to the static website S3 bucket
- Notebook Role ARN: ARN of an existing IAM role to associate with all notebooks created in MLSpace
- App Role ARN: ARN of an existing IAM role to use for executing the MLSpace lambdas
- System Role ARN: ARN of an existing IAM role to use for executing the MLSpace system lambdas (cleanup and configuration)
- EMR Default Role ARN: ARN of an existing IAM role that will be used as the 'ServiceRole' for all EMR clusters
- EMR EC2 Instance Role ARN: ARN of an existing role that will be used as the 'JobFlowRole' and 'AutoScalingRole' for all EMR clusters
Do you want to modify the banner displayed on MLSpace? If you answer yes, you will be prompted for:
- System Banner Text: the text to display on the system banner displayed at the top and bottom of the MLSpace web application. If set to a blank string no banner will be displayed
- System Banner Background Color: the background color of the system banner if enabled
- System Banner Text Color: the color of the text displayed in the system banner if enabled
Option 2 - Configure by updating lib/constants.ts
Configure MLSpace using Option 2 if:
- the MLSpace Config Wizard doesn't configure all of the settings you need to customize
- you wish to have your configuration changes in a file that's committed to git (note that you will have to resolve conflicts when upgrading MLSpace)
If you are pre-creating roles, you will need to ensure that the following have been properly set in lib/constants.ts: the required role ARNs (APP_ROLE_ARN, NOTEBOOK_ROLE_ARN, and SYSTEM_ROLE_ARN), policy ARNs (ENDPOINT_CONFIG_INSTANCE_CONSTRAINT_POLICY_ARN, JOB_INSTANCE_CONSTRAINT_POLICY_ARN, and KMS_INSTANCE_CONDITIONS_POLICY_ARN), role names (KEY_MANAGER_ROLE_NAME if EXISTING_KMS_MASTER_KEY_ARN is not set), and AWS_ACCOUNT (used to ensure unique S3 bucket names).
You will also need to set OIDC_URL and OIDC_CLIENT_NAME with the correct values based on your chosen IdP. These properties must be set prior to deploying MLSpace.
To see the full list of configurable properties and their descriptions, see the Configurable deployment parameters section.
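When pre-creating roles, a quick format check on the role ARNs can catch copy-and-paste mistakes before deployment. The sketch below is illustrative only: `isRoleArn` and `findInvalidRoleArns` are hypothetical helpers that are not part of MLSpace, and the pattern only checks the general shape of an IAM role ARN, not whether the role exists.

```typescript
// Hypothetical helpers, not part of MLSpace.
// Matches IAM role ARNs such as arn:aws:iam::123456789012:role/my-role,
// including other partitions (for example aws-us-gov) and role paths.
const ROLE_ARN_PATTERN = /^arn:aws[a-z-]*:iam::\d{12}:role\/[\w+=,.@\/-]+$/;

function isRoleArn(value: string): boolean {
    return ROLE_ARN_PATTERN.test(value);
}

// Returns the names of the settings whose values do not look like role ARNs.
function findInvalidRoleArns(roles: Record<string, string>): string[] {
    return Object.keys(roles).filter((name) => !isRoleArn(roles[name] ?? ''));
}
```

For example, passing an object keyed by APP_ROLE_ARN, NOTEBOOK_ROLE_ARN, and SYSTEM_ROLE_ARN returns the keys that still need attention.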
Creating a production optimized web app build
Once configuration has been completed you will also need to create a production build of the web application. You can do this by changing to the web application directory (frontend/) and running:
```Bash
# From project root directory
cd frontend
npm run clean && npm install
```
This will generate a production optimized build of the web application and documentation. The resulting artifacts will be written to the frontend/build/ directory.
There are no web application specific configuration parameters that need to be set as the configuration will be dynamically generated as part of the CDK deployment based on the variables set during the configuration steps, as well as the deployed resources.
Deploying the CDK application
The MLSpace application is a standard CDK application and can be deployed just as any CDK application is deployed:
```Bash
# From project root directory
npm install && cdk bootstrap
```
Once the account has been bootstrapped, you can deploy the application. You can optionally include --require-approval never in the command below if you don't want to confirm changes:
```Bash
# From project root directory
cdk deploy --all
```
Configurable deployment parameters
If the config-helper doesn't provide the level of customization you need for your deployment, you can update the values in lib/constants.ts based on your specific deployment needs. Some of these will directly impact whether new resources are created within your account or whether existing resources (VPC, KMS, Roles, etc) will be leveraged.
Required Parameters
| Variable | Description | Default |
|----------|:-------------:|------:|
| AWS_ACCOUNT | The account number that MLSpace is being deployed into. Used to disambiguate S3 buckets within a region. | - |
| AWS_REGION | The region that MLSpace is being deployed into. This is only needed when you are using an existing VPC or KMS key and EXISTING_KMS_MASTER_KEY_ARN or EXISTING_VPC_ID is set. | - |
| KEY_MANAGER_ROLE_NAME | Name of the IAM role with permissions to manage the KMS Key. If this property is set you _do not_ need to set EXISTING_KMS_MASTER_KEY_ARN. | - |
| OIDC_URL | The OIDC endpoint that will be used for MLSpace authentication | - |
| OIDC_CLIENT_NAME | The OIDC client name that should be used by MLSpace for authentication | - |
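The conditions attached to the required parameters can be expressed as a small pre-deployment check. This is a sketch under stated assumptions: the `DeploymentSettings` interface and `missingSettings` function are hypothetical, the setting names follow the spelling used in lib/constants.ts, and an unset value is assumed to be an empty string.

```typescript
// Hypothetical subset of the settings in lib/constants.ts.
interface DeploymentSettings {
    AWS_ACCOUNT: string;
    AWS_REGION: string;
    OIDC_URL: string;
    OIDC_CLIENT_NAME: string;
    KEY_MANAGER_ROLE_NAME: string;
    EXISTING_KMS_MASTER_KEY_ARN: string;
    EXISTING_VPC_ID: string;
}

// Returns the names of required settings that are still unset.
function missingSettings(s: DeploymentSettings): string[] {
    const missing: string[] = [];
    if (!s.AWS_ACCOUNT) missing.push('AWS_ACCOUNT');
    if (!s.OIDC_URL) missing.push('OIDC_URL');
    if (!s.OIDC_CLIENT_NAME) missing.push('OIDC_CLIENT_NAME');
    // A key manager role name is only needed when no existing KMS key is supplied.
    if (!s.EXISTING_KMS_MASTER_KEY_ARN && !s.KEY_MANAGER_ROLE_NAME) {
        missing.push('KEY_MANAGER_ROLE_NAME');
    }
    // The region is only needed when reusing an existing VPC or KMS key.
    if ((s.EXISTING_KMS_MASTER_KEY_ARN || s.EXISTING_VPC_ID) && !s.AWS_REGION) {
        missing.push('AWS_REGION');
    }
    return missing;
}
```

An empty result means the required values are present; it does not verify that the values themselves are correct.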
Optional Parameters
| Variable | Description | Default |
|----------|:-------------:|------:|
| IDP_ENDPOINT_SSM_PARAM | If set, MLSpace will use the value of this parameter as the `OIDC_URL`. During deployment the value of this parameter will be read from SSM. This value takes precedence over `OIDC_URL` if both are set. | - |
| OIDC_REDIRECT_URL | The redirect URL that should be used after successfully authenticating with the OIDC provider. This will default to the API gateway URL generated by the CDK deployment but can be manually set if you're using custom DNS | - |
| OIDC_VERIFY_SSL | Whether or not calls to the OIDC endpoint specified in the `OIDC_URL` environment variable should validate the server certificate | `true` |
| OIDC_VERIFY_SIGNATURE | Whether or not the lambda authorizer should verify the JWT token signature | `true` |
| ADDITIONAL_LAMBDA_ENVIRONMENT_VARS | A map of key value pairs which will be set as environment variables on every MLSpace lambda | `{}` |
| RESOURCE_TERMINATION_INTERVAL | Interval (in minutes) to run the resource termination cleanup lambda | `60` |
| BACKGROUND_REFRESH_INTERVAL | Interval (in seconds) to run background resource data updates | `60` |
| DATASETS_TABLE_NAME | Dynamo DB table to hold dataset related metadata | `mlspace-datasets` |
| PROJECTS_TABLE_NAME | Dynamo DB table to hold project related metadata | `mlspace-projects` |
| PROJECT_USERS_TABLE_NAME | Dynamo DB table to hold project membership related metadata for users, including permissions and project/user specific IAM role data. | `mlspace-project-users` |
| PROJECT_GROUPS_TABLE_NAME | Dynamo DB table to hold project membership related metadata for groups, including project permissions. | `mlspace-project-groups` |
| GROUPS_TABLE_NAME | Dynamo DB table to hold group related metadata | `mlspace-groups` |
| GROUP_USERS_TABLE_NAME | Dynamo DB table to hold group membership related metadata, including permissions and group/user specific IAM role data. | `mlspace-group-users` |
| GROUPS_MEMBERSHIP_HISTORY_TABLE_NAME | Dynamo DB table to hold group membership history audit data, including when users are added to and removed from groups and which user completed the action. | `mlspace-group-membership-history` |
| GROUP_DATASETS_TABLE_NAME | Dynamo DB table to hold the relationships for what datasets are shared with a group. | `mlspace-group-datasets` |
| USERS_TABLE_NAME | Dynamo DB table to hold user related metadata | `mlspace-users` |
| APP_CONFIGURATION_TABLE_NAME | Dynamo DB table to hold dynamic configuration settings. These are settings that can be modified after the app has been deployed. | `mlspace-app-configuration` |
| CONFIG_BUCKET_NAME | S3 bucket used to store MLSpace configuration files (notebook lifecycle configs, notebook params, etc) | `mlspace-config` |
| DATA_BUCKET_NAME | S3 bucket used to store user uploaded dataset files | `mlspace-datasets` |
| LOGS_BUCKET_NAME | S3 bucket used to store logs from EMR clusters launched in MLSpace and, if configured, MLSpace cloudtrail events | `mlspace-logs` |
| ACCESS_LOGS_BUCKET_NAME | S3 bucket which will store access logs if `ENABLE_ACCESS_LOGGING` is `true` | `mlspace-access-logs` |
| WEBSITE_BUCKET_NAME | S3 bucket used to store the static MLSpace website | `mlspace-website` |
| MLSPACE_LIFECYCLE_CONFIG_NAME | Name of the default lifecycle config that should be used with MLSpace notebooks (will be generated as part of the CDK deployment) | `mlspace-notebook-lifecycle-config` |
| NOTEBOOK_PARAMETERS_FILE_NAME | Filename of the default notebook parameters that is generated as part of the CDK deployment | `mlspace-website` |
| PERMISSIONS_BOUNDARY_POLICY_NAME | Name of the managed policy used as a permissions boundary for Secure User Scoped Roles. If this is not set the default permissions boundary will be created and used | - |
| EXISTING_KMS_MASTER_KEY_ARN | ARN of existing KMS key to use with MLSpace. This key should allow usage by the roles associated with the `NOTEBOOK_ROLE_ARN`, `APP_ROLE_ARN`, and `SYSTEM_ROLE_ARN`. This value takes precedence over `KEY_MANAGER_ROLE_NAME` if both are set. If this property is set you _do not_ need to set `KEY_MANAGER_ROLE_NAME`. | - |
| SYSTEM_TAG | Tag which will be applied to all MLSpace resources created within the AWS account to which MLSpace is deployed | `MLSpace` |
| IAM_RESOURCE_PREFIX | Value prepended to MLSpace Secure User Scoped Roles and policies when `MANAGE_IAM_ROLES` is set to `true` | `MLSpace` |
| MANAGE_IAM_ROLES | This setting determines whether or not MLSpace will utilize unique roles per project/user combinations | `true` |
| NOTIFICATION_DISTRO | Optional email distribution list which will be notified when
Owner
- Name: Amazon Web Services - Labs
- Login: awslabs
- Kind: organization
- Location: Seattle, WA
- Website: http://amazon.com/aws/
- Repositories: 914
- Profile: https://github.com/awslabs
AWS Labs
GitHub Events
Total
- Create event: 39
- Issues event: 1
- Release event: 5
- Watch event: 7
- Delete event: 31
- Issue comment event: 7
- Push event: 58
- Pull request review comment event: 19
- Pull request review event: 46
- Pull request event: 74
- Fork event: 1
Last Year
- Create event: 39
- Issues event: 1
- Release event: 5
- Watch event: 7
- Delete event: 31
- Issue comment event: 7
- Push event: 58
- Pull request review comment event: 19
- Pull request review event: 46
- Pull request event: 74
- Fork event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 2
- Total pull requests: 92
- Average time to close issues: 25 days
- Average time to close pull requests: 4 days
- Total issue authors: 2
- Total pull request authors: 13
- Average comments per issue: 2.0
- Average comments per pull request: 0.46
- Merged pull requests: 71
- Bot issues: 0
- Bot pull requests: 14
Past Year
- Issues: 1
- Pull requests: 23
- Average time to close issues: N/A
- Average time to close pull requests: 12 days
- Issue authors: 1
- Pull request authors: 8
- Average comments per issue: 0.0
- Average comments per pull request: 0.13
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 11
Top Authors
Issue Authors
- KyleLilly (2)
- szotrj (1)
- ruckc (1)
Pull Request Authors
- dustins (82)
- estohlmann (65)
- douglas1850 (50)
- AlejandroRigau (44)
- LightSeekerSC (44)
- KyleLilly (34)
- dependabot[bot] (25)
- github-actions[bot] (11)
- tomjansto (2)
- batzela (1)
- bedanley (1)
- jmharold (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- 1585 dependencies
- @babel/plugin-proposal-private-property-in-object ^7.21.11 development
- @stylistic/eslint-plugin ^1.5.0 development
- @testing-library/jest-dom ^5.16.5 development
- @testing-library/react ^14.0.0 development
- @testing-library/user-event ^14.4.3 development
- @types/jest ^27.5.2 development
- @types/lodash ^4.14.194 development
- @types/node ^16.18.23 development
- @types/react ^18.0.37 development
- @types/react-dom ^18.0.11 development
- @types/redux-mock-store ^1.0.3 development
- @types/redux-persist ^4.3.1 development
- @types/sinon ^10.0.14 development
- @types/uuid ^9.0.1 development
- eslint ^8.38.0 development
- eslint-import-resolver-typescript ^3.5.5 development
- eslint-plugin-import ^2.26.0 development
- eslint-plugin-simple-import-sort ^10.0.0 development
- eslint-plugin-spellcheck 0.0.19 development
- lint-staged ^13.2.1 development
- react-scripts 5.0.1 development
- redux-mock-store ^1.5.4 development
- typescript ^4.9.5 development
- vitepress ^1.0.0-rc.44 development
- @cloudscape-design/components ^3.0.341
- @cloudscape-design/components-themeable ^3.0.546
- @cloudscape-design/design-tokens ^3.0.30
- @cloudscape-design/global-styles ^1.0.10
- @reduxjs/toolkit ^1.9.5
- axios ^0.27.2
- git-repo-info ^2.1.1
- jest-mock-axios ^4.7.0-beta
- lodash 4.17.21
- quill ^1.3.5
- react ^18.2.0
- react-dom ^18.2.0
- react-oidc-context ^2.2.0
- react-quill ^2.0.0-beta.4
- react-redux ^8.0.2
- react-router-dom ^6.10.0
- redux-persist ^6.0.0
- zod ^3.21.4
- 431 dependencies
- @aws-sdk/client-s3 ^3.400.0 development
- @stylistic/eslint-plugin ^1.5.0 development
- @types/aws-lambda 8.10.119 development
- @types/jsonwebtoken ^9.0.2 development
- @types/node * development
- @typescript-eslint/eslint-plugin ^5.36.1 development
- @typescript-eslint/parser ^5.36.1 development
- aws-cdk-lib ^2.93.0 development
- constructs ^10.0.97 development
- esbuild ^0.19.2 development
- eslint ^8.23.0 development
- eslint-import-resolver-typescript ^3.4.1 development
- eslint-plugin-import ^2.26.0 development
- eslint-plugin-simple-import-sort ^7.0.0 development
- eslint-plugin-spellcheck ^0.0.19 development
- lint-staged ^13.0.3 development
- typescript ~4.7.4 development
- aws-cdk-lib ^2.93.0
- PyJWT ==2.6.0
- boto3 ==1.24.94
- cachetools ==5.3.2
- dynamodb-json ==1.3
- moto ==4.0.8
- pyseto ==1.7.8
- cachetools ==5.3.2 test
- pytest ==7.1.2 test
- pytest-cov ==3.0.0 test
- pytest-html ==3.1.1 test
- actions/checkout v4 composite
- actions/setup-node v4 composite
- actions/setup-python v5 composite
- ahmadnassri/action-workflow-queue v1 composite
- aws-actions/configure-aws-credentials v4 composite
- jsdaniell/create-json v1.2.3 composite
- rtCamp/action-slack-notify v2 composite
- actions/checkout v4 composite
- actions/setup-node v4 composite
- actions/setup-python v5 composite
- ahmadnassri/action-workflow-queue v1 composite
- aws-actions/configure-aws-credentials v4 composite
- jsdaniell/create-json v1.2.3 composite
- rtCamp/action-slack-notify v2 composite
- actions/checkout v4 composite
- rtCamp/action-slack-notify v2 composite
- actions/checkout v4 composite
- actions/setup-node v4 composite
- rtCamp/action-slack-notify v2 composite
- actions/checkout v4 composite
- actions/checkout v4 composite
- rtCamp/action-slack-notify v2 composite
- actions/checkout v4 composite
- actions/checkout v4 composite
- actions/setup-node v4 composite
- actions/setup-python v5 composite
- pre-commit/action v3.0.1 composite
- rtCamp/action-slack-notify v2 composite
- actions/checkout v4 composite
- actions/configure-pages v4 composite
- actions/deploy-pages v4 composite
- actions/setup-node v4 composite
- actions/upload-pages-artifact v3 composite
- 263 dependencies
- @cloudscape-design/components ^3.0.341 development
- @percy/cypress ^3.1.2 development
- cypress ^13.13.3 development
- husky ^8.0.0 development
- lint-staged ^13.0.3 development
- typescript ^4.8.2 development
- cypress-file-upload ^5.0.8