https://github.com/converged-computing/flex-aws-topology
Mapping AWS topology API into a fluxion graph
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.7%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Flex AWS Topology
Explore using the AWS topology API
We eventually want to use the AWS Topology API to add cluster topology metadata to fluence. To do this, we can use the Go bindings.
Design
Overview
The Flux Framework "flux-sched" or fluxion project provides modular bindings in different languages for intelligent, graph-based scheduling. When we extend fluxion to a tool or project that warrants logic of this type, we call it a flex! Thus, the project here demonstrates flex-aws-topology, or using fluxion to match some system request to what we know in the AWS topology graph. E.g.,
Do you have an x86 system with this compiler option?
This is a simple use case that doesn't perfectly reflect the OCI container use case, but we need to start somewhere! For this very basic setup we are going to:
- Load the machines into a JSON Graph (called JGF).
- Try doing a query against system metadata
There will eventually be a third component, a container image specification, which we will need to include somewhere here. I am starting simple!
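To make the first step concrete, here is a minimal Go sketch of turning topology records into a JGF-style graph. It assumes a hypothetical `InstanceTopology` record that holds an instance ID and its `NetworkNodes`, ordered from the top layer down. It emits a simplified node/edge layout with paired `contains`/`in` edges. Fluxion's actual JGF carries additional per-node metadata, so treat the field names here as illustrative, not as what this project writes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InstanceTopology is a hypothetical, simplified record mirroring two fields
// from the AWS topology query output.
type InstanceTopology struct {
	InstanceID   string
	NetworkNodes []string // assumed ordered from top layer to bottom layer
}

// Node, Edge, and Graph are a simplified JGF-like shape, not fluxion's exact schema.
type Node struct {
	ID       string            `json:"id"`
	Metadata map[string]string `json:"metadata"`
}

type Edge struct {
	Source   string `json:"source"`
	Target   string `json:"target"`
	Relation string `json:"relation"`
}

type Graph struct {
	Nodes []Node `json:"nodes"`
	Edges []Edge `json:"edges"`
}

// BuildGraph creates a "cluster" root (uid 0), adds each network node and
// instance exactly once, and links parent -> child with "contains" plus the
// reverse child -> parent "in" edge.
func BuildGraph(instances []InstanceTopology) Graph {
	g := Graph{}
	uids := map[string]int{}
	seenEdge := map[[2]int]bool{}

	addNode := func(name, kind string) int {
		if uid, ok := uids[name]; ok {
			return uid // already seen, reuse its uid
		}
		uid := len(uids)
		uids[name] = uid
		g.Nodes = append(g.Nodes, Node{
			ID:       fmt.Sprint(uid),
			Metadata: map[string]string{"name": name, "type": kind},
		})
		return uid
	}
	link := func(parent, child int) {
		key := [2]int{parent, child}
		if seenEdge[key] {
			return // shared network nodes would otherwise duplicate edges
		}
		seenEdge[key] = true
		p, c := fmt.Sprint(parent), fmt.Sprint(child)
		g.Edges = append(g.Edges,
			Edge{Source: p, Target: c, Relation: "contains"},
			Edge{Source: c, Target: p, Relation: "in"})
	}

	root := addNode("cluster", "cluster")
	for _, inst := range instances {
		parent := root
		for _, nn := range inst.NetworkNodes {
			uid := addNode(nn, "network-node")
			link(parent, uid)
			parent = uid
		}
		// The instance hangs off its bottom-layer network node.
		link(parent, addNode(inst.InstanceID, "instance"))
	}
	return g
}

func main() {
	nodes := []string{"nn-ec17a935b39a06f41", "nn-dd9ec3119ca6ea9dc", "nn-a59759166e67e7c02"}
	g := BuildGraph([]InstanceTopology{
		{"i-02125af4faf797399", nodes},
		{"i-0fbbd476a798a3f82", nodes},
	})
	out, _ := json.MarshalIndent(map[string]Graph{"graph": g}, "", "  ")
	fmt.Println(string(out))
}
```

With the two instances from the example output further down, this produces six nodes and five parent/child pairs. That matches the shape of the log there, although the uids are assigned in a different order.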
Concepts
From the above, the following definitions might be useful.
- Flux Framework: a modular framework for putting together a workload manager. It is traditionally used for HPC, but components have been used in other places (e.g., here, Kubernetes, etc.). It is analogous to Kubernetes in that it is modular and used for running batch workloads.
- fluxion: refers to flux-framework/flux-sched and is the scheduler component or module of Flux Framework. There are bindings in several languages, and specifically the Go bindings (server at flux-framework/flux-k8s) assemble into the project "fluence."
- flex: an out-of-tree tool, plugin, or similar project that uses fluxion to flexibly schedule or match some kind of graph-based resources. This project is an example of a flex!
Usage
Build
This demonstrates how to build the bindings. You will need to be in the VSCode developer container environment, or reproduce the same environment on your host. Note that we are currently using this commit, which is a fork of milroy's work, to ensure the module name matches what is added to go.mod (it won't work otherwise). When this is merged, we will update to flux-framework/flux-sched. Below is the make command that builds our final binary, followed by the commands it runs.
```bash
make
```

```console
# This needs to match the flux-sched install and latest commit, for now we are using a fork of milroy's branch
# that has a go.mod updated to match the org name
go get -u github.com/researchapps/flux-sched/resource/reapi/bindings/go/src/fluxcli@86f5bb331342f2883b057920cf58e2c042aef881
go mod tidy
mkdir -p ./bin
GOOS=linux CGO_CFLAGS="-I/opt/flux-sched/resource/reapi/bindings/c" CGO_LDFLAGS="-L/usr/lib -L/opt/flux-sched/resource -lfluxion-resource -L/opt/flux-sched/resource/libjobspec -ljobspec_conv -L/opt/flux-sched/resource/reapi/bindings -lreapi_cli -lflux-idset -lstdc++ -lczmq -ljansson -lhwloc -lboost_system -lflux-hostlist -lboost_graph -lyaml-cpp" go build -ldflags '-w' -o bin/aws-topology src/cmd/main.go
```
The output is generated in bin:
```bash
$ ls bin/
aws-topology
```
Run
1. Paths
Ensure your LD_LIBRARY_PATH is set so the flux-sched (fluxion) libraries can be found.
```bash
export LD_LIBRARY_PATH=/usr/lib:/opt/flux-sched/resource:/opt/flux-sched/resource/reapi/bindings:/opt/flux-sched/resource/libjobspec
```
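Before running, a separate pre-flight check can confirm that the variable includes each directory exported above. This Go sketch uses a hypothetical `MissingLibDirs` helper that is not part of the tool. It only inspects the string and does not verify that the shared libraries actually exist in those directories.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// requiredLibDirs are the directories from the export above.
var requiredLibDirs = []string{
	"/usr/lib",
	"/opt/flux-sched/resource",
	"/opt/flux-sched/resource/reapi/bindings",
	"/opt/flux-sched/resource/libjobspec",
}

// MissingLibDirs (hypothetical helper) returns the required directories that
// do not appear in ldPath. Paths are cleaned so trailing slashes still match.
func MissingLibDirs(ldPath string, required []string) []string {
	present := map[string]bool{}
	for _, dir := range filepath.SplitList(ldPath) {
		if dir != "" {
			present[filepath.Clean(dir)] = true
		}
	}
	var missing []string
	for _, dir := range required {
		if !present[filepath.Clean(dir)] {
			missing = append(missing, dir)
		}
	}
	return missing
}

func main() {
	missing := MissingLibDirs(os.Getenv("LD_LIBRARY_PATH"), requiredLibDirs)
	if len(missing) > 0 {
		fmt.Fprintln(os.Stderr, "LD_LIBRARY_PATH is missing:", strings.Join(missing, ", "))
		os.Exit(1)
	}
	fmt.Println("LD_LIBRARY_PATH includes all required directories")
}
```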
2. Credentials
Ensure you have AWS credentials in your environment. (I figure any cloud scheduler we use is going to have ephemeral environment secrets rather than other kinds of credentials, but we can change this.)
```bash
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxx
export AWS_SESSION_TOKEN=xxxxxxxxxxxxx
```
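A similar pre-flight sketch for credentials reports which of the three variables above are unset or empty. `MissingCredentials` is a hypothetical helper. The AWS SDK reads these variables itself, and `AWS_SESSION_TOKEN` is only required for temporary credentials, so treat this as a convenience check under the ephemeral-secrets assumption above.

```go
package main

import (
	"fmt"
	"os"
)

// credentialVars are the variables exported above.
var credentialVars = []string{"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"}

// MissingCredentials (hypothetical helper) reports which variables are unset
// or empty. getenv is injected so the check can be tested without touching
// the real environment.
func MissingCredentials(getenv func(string) string) []string {
	var missing []string
	for _, name := range credentialVars {
		if getenv(name) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := MissingCredentials(os.Getenv); len(missing) > 0 {
		fmt.Fprintf(os.Stderr, "missing AWS credentials: %v\n", missing)
		os.Exit(1)
	}
	fmt.Println("AWS credentials found in environment")
}
```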
3. Nodes
You can either manually create nodes (be sure to choose an HPC instance family type) or create a mini cluster with the config provided here:
```bash
eksctl create cluster --config-file eksctl-config.yaml
```
That will create a cluster with 2 nodes in a placement group in us-east-2. It takes a while.
4. Run
Here is how to run with an instance ID (note that the instance must be running):
```bash
./bin/aws-topology --instance i-0cd50e305e797dde3
```
And here is how to run with a placement group (the one created above):
```bash
./bin/aws-topology --group eks-efa-testing --region us-east-2
```
To write the JGF to a non-temporary file that you can inspect for debugging, add --file:
```bash
./bin/aws-topology --group eks-efa-testing --region us-east-2 --file ./aws-topology.json
```
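The invocations above use four flags: `--instance`, `--group`, `--region`, and `--file`. Here is a minimal sketch of how such flags could be parsed with Go's standard `flag` package, which accepts both `-flag` and `--flag`. Several details are assumptions for illustration and not the tool's actual code: the `Options` type, the rule that at least one of `--instance` or `--group` is required, and the temporary-file fallback name pattern.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// Options is a hypothetical holder for the flags shown above.
type Options struct {
	Instance string
	Group    string
	Region   string
	File     string
}

// ParseOptions parses args with a dedicated FlagSet so errors are returned
// rather than exiting the process.
func ParseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("aws-topology", flag.ContinueOnError)
	fs.StringVar(&o.Instance, "instance", "", "instance ID to query")
	fs.StringVar(&o.Group, "group", "", "placement group name to query")
	fs.StringVar(&o.Region, "region", "", "AWS region")
	fs.StringVar(&o.File, "file", "", "write the JGF here instead of a temporary file")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Assumed rule: a query needs an instance or a placement group.
	if o.Instance == "" && o.Group == "" {
		return o, errors.New("set --instance or --group")
	}
	return o, nil
}

// OutputPath returns --file if given, otherwise the path of a new temporary file.
func OutputPath(o Options) (string, error) {
	if o.File != "" {
		return o.File, nil
	}
	f, err := os.CreateTemp("", "aws-topology-*.json")
	if err != nil {
		return "", err
	}
	defer f.Close()
	return f.Name(), nil
}

func main() {
	opts, err := ParseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	path, err := OutputPath(opts)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("group=%q instance=%q region=%q -> %s\n", opts.Group, opts.Instance, opts.Region, path)
}
```

Using a `FlagSet` with `ContinueOnError` instead of the global `flag.Parse` keeps parsing testable and lets the caller choose the exit code.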
This prints simple output, and you can also view the generated topology:
Output for JGF
```console
This is the flex aws topology prototype
Match policy: first
Load format: JSON Graph Format (JGF)
Created flex resource graph &{%!s(*fluxcli.ReapiCtx=&{})}
Topology Query Parameters:
{
  DryRun: false,
  GroupNames: ["eks-efa-testing"]
}
{
  Instances: [{
      AvailabilityZone: "us-east-2b",
      GroupName: "eks-efa-testing",
      InstanceId: "i-02125af4faf797399",
      InstanceType: "hpc6a.48xlarge",
      NetworkNodes: ["nn-ec17a935b39a06f41","nn-dd9ec3119ca6ea9dc","nn-a59759166e67e7c02"],
      ZoneId: "use2-az2"
    },{
      AvailabilityZone: "us-east-2b",
      GroupName: "eks-efa-testing",
      InstanceId: "i-0fbbd476a798a3f82",
      InstanceType: "hpc6a.48xlarge",
      NetworkNodes: ["nn-ec17a935b39a06f41","nn-dd9ec3119ca6ea9dc","nn-a59759166e67e7c02"],
      ZoneId: "use2-az2"
    }],
  NextToken: "..."
}
i-02125af4faf797399 is not yet seen, adding with uid 1
nn-ec17a935b39a06f41 is not yet seen, adding with uid 2
nn-dd9ec3119ca6ea9dc is not yet seen, adding with uid 3
nn-a59759166e67e7c02 is not yet seen, adding with uid 4
Creating instance node for i-02125af4faf797399
Creating network node for nn-ec17a935b39a06f41
Creating network node for nn-dd9ec3119ca6ea9dc
Creating network node for nn-a59759166e67e7c02
i-0fbbd476a798a3f82 is not yet seen, adding with uid 5
Creating instance node for i-0fbbd476a798a3f82
Creating node 0 cluster
Creating node 1 i-02125af4faf797399
Creating node 2 nn-ec17a935b39a06f41
Creating node 3 nn-dd9ec3119ca6ea9dc
Creating node 4 nn-a59759166e67e7c02
Creating node 5 i-0fbbd476a798a3f82
Creating edge (4 contains->5) (5 in-> 4)
Creating edge (0 contains->2) (2 in-> 0)
Creating edge (2 contains->3) (3 in-> 2)
Creating edge (3 contains->4) (4 in-> 3)
Creating edge (4 contains->1) (1 in-> 4)
```

Note that we could next add some kind of match: I'm guessing we care about distances in the graph more than attributes. I'll wait to chat with folks more about next steps, because I've accomplished the goal I set out to do. This was immensely satisfying to work on.
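On the idea of distances: in the containment tree shown in the log above (cluster, then network-node layers, then instances), the hop count between two instances follows from how many leading network nodes they share. This Go sketch assumes `NetworkNodes` is ordered from the top layer down. That ordering is consistent with the logged edges: the cluster contains the first network node, and the last network node contains the instances. `HopDistance` is a hypothetical helper, not a fluxion match.

```go
package main

import "fmt"

// HopDistance (hypothetical helper) returns the number of edges between two
// distinct instances in a cluster -> network-node layers -> instance tree,
// given each instance's NetworkNodes ordered from top layer to bottom layer.
func HopDistance(a, b []string) int {
	// In a tree, sharing a node at layer i implies sharing every layer above
	// it, so the common ancestors form a prefix of both lists.
	shared := 0
	for shared < len(a) && shared < len(b) && a[shared] == b[shared] {
		shared++
	}
	// The cluster is at depth 0, an instance is at depth len(nodes)+1, and the
	// lowest common ancestor is at depth shared (the cluster when shared == 0).
	return (len(a) + 1 - shared) + (len(b) + 1 - shared)
}

func main() {
	nodes := []string{"nn-ec17a935b39a06f41", "nn-dd9ec3119ca6ea9dc", "nn-a59759166e67e7c02"}
	fmt.Println(HopDistance(nodes, nodes)) // 2: both instances hang off the same bottom-layer node
}
```

For the two instances in the output above, which report identical `NetworkNodes`, the distance is 2. This is the minimum possible for two distinct instances.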
5. Cleanup
Don't forget to clean up your nodes: they cost money!
```bash
eksctl delete cluster --config-file eksctl-config.yaml
```
License
HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.
See LICENSE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (MIT)
LLNL-CODE-842614
Owner
- Name: Converged Computing
- Login: converged-computing
- Kind: organization
- Website: https://converged-computing.org
- Repositories: 84
- Profile: https://github.com/converged-computing
The best of cloud and high performance computing: technology and community combined.
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 0
proxy.golang.org: github.com/converged-computing/flex-aws-topology
- Homepage: https://github.com/converged-computing/flex-aws-topology
- Documentation: https://pkg.go.dev/github.com/converged-computing/flex-aws-topology#section-documentation
- License: MIT
Dependencies
- fluxrm/flux-core bookworm build
- github.com/aws/aws-sdk-go v1.49.9
- github.com/converged-computing/jsongraph-go v0.0.0-20231221142338-90aad45775bb
- github.com/jmespath/go-jmespath v0.4.0
- github.com/researchapps/flux-sched/resource/reapi/bindings/go v0.0.0-20231222214538-0f33b17f6e79
- k8s.io/api=>k8s.io/api v0.22.3
- k8s.io/apiextensions-apiserver=>k8s.io/apiextensions-apiserver v0.22.3
- k8s.io/apimachinery=>k8s.io/apimachinery v0.22.3
- k8s.io/apiserver=>k8s.io/apiserver v0.22.3
- k8s.io/cli-runtime=>k8s.io/cli-runtime v0.22.3
- k8s.io/client-go=>k8s.io/client-go v0.22.3
- k8s.io/cloud-provider=>k8s.io/cloud-provider v0.22.3
- k8s.io/cluster-bootstrap=>k8s.io/cluster-bootstrap v0.22.3
- k8s.io/code-generator=>k8s.io/code-generator v0.22.3
- k8s.io/component-base=>k8s.io/component-base v0.22.3
- k8s.io/component-helpers=>k8s.io/component-helpers v0.22.3
- k8s.io/controller-manager=>k8s.io/controller-manager v0.22.3
- k8s.io/cri-api=>k8s.io/cri-api v0.22.3
- k8s.io/csi-translation-lib=>k8s.io/csi-translation-lib v0.22.3
- k8s.io/kube-aggregator=>k8s.io/kube-aggregator v0.22.3
- k8s.io/kube-controller-manager=>k8s.io/kube-controller-manager v0.22.3
- k8s.io/kube-proxy=>k8s.io/kube-proxy v0.22.3
- k8s.io/kube-scheduler=>k8s.io/kube-scheduler v0.22.3
- k8s.io/kubectl=>k8s.io/kubectl v0.22.3
- k8s.io/kubelet=>k8s.io/kubelet v0.22.3
- k8s.io/kubernetes=>k8s.io/kubernetes v1.22.3
- k8s.io/legacy-cloud-providers=>k8s.io/legacy-cloud-providers v0.22.3
- k8s.io/metrics=>k8s.io/metrics v0.22.3
- k8s.io/mount-utils=>k8s.io/mount-utils v0.22.3
- k8s.io/pod-security-admission=>k8s.io/pod-security-admission v0.22.3
- k8s.io/sample-apiserver=>k8s.io/sample-apiserver v0.22.3
- github.com/aws/aws-sdk-go v1.49.9
- github.com/converged-computing/jsongraph-go v0.0.0-20231221142338-90aad45775bb
- github.com/davecgh/go-spew v1.1.0
- github.com/jmespath/go-jmespath v0.4.0
- github.com/jmespath/go-jmespath/internal/testify v1.5.1
- github.com/pmezard/go-difflib v1.0.0
- github.com/researchapps/flux-sched/resource/reapi/bindings/go v0.0.0-20231222214538-0f33b17f6e79
- github.com/stretchr/objx v0.1.0
- gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405
- gopkg.in/yaml.v2 v2.2.8