sonar
Tool to profile usage of HPC resources by regularly probing processes.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.0%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 11
- Watchers: 3
- Forks: 5
- Open Issues: 46
- Releases: 4
Topics
Metadata Files
README.md
sonar
Sonar is a tool to profile usage of HPC resources by regularly sampling processes, nodes, queues, and clusters.
Sonar examines /proc and/or runs some diagnostic programs, filters and groups the information, and
prints it to stdout or sends it to a remote collector (via Kafka).

Image: Midjourney, CC BY-NC 4.0
For more about the motivation and design, and other considerations, see doc/DESIGN.md.
Subcommands
Sonar has several subcommands that collect information about nodes, jobs, clusters, and processes and print it on stdout:
- sonar ps takes a snapshot of the currently running processes on the node
- sonar sysinfo extracts hardware information about the node
- sonar slurm extracts information about overall job state from the Slurm databases
- sonar cluster extracts information about partitions and node state from the Slurm databases
Those subcommands are all run-once: Sonar exits after producing output. Additionally, however,
sonar daemon starts Sonar and keeps it memory-resident, running subprograms at intervals specified
by a configuration file. In this mode, exfiltration of data in production normally happens via
Kafka.
Finally, sonar help prints some useful help.
Compilation and installation
In principle you just do this:
- Make sure you have Rust installed (I install Rust through rustup)
- Clone this project
- If building with Kafka support (the default), you must have the OpenSSL development libraries installed, as noted here. On Ubuntu, this is libssl-dev, on Fedora it is openssl-devel.
- Build it: cargo build --release
- The binary is then located at target/release/sonar
- Copy it to wherever it needs to be
In practice it is a little harder:
- The binutils you have need to be new enough for the assembler to understand --gdwarf5 (for Kafka) and some other things (to link the GPU probe libraries)
- Some of the tests in util/ (if you are going to be running those) require go
Some distros, notably RHEL8, have binutils that are too old. You can check by running e.g. as --version;
the major version number is also the version number of binutils. Binutils 2.32 is new
enough for the GPU probe libraries but may not be new enough for Kafka. Binutils 2.40 is known to
work for both. Also see comments in gpuapi/Makefile.
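The version check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `parse_binutils_version` and `new_enough` are hypothetical, the sample strings are made-up examples of what `as --version` might print, and the default minimum of 2.40 is simply the version stated above to work for both Kafka and the GPU probe libraries.

```python
import re


def parse_binutils_version(version_output):
    """Return (major, minor) from the first line of `as --version` output."""
    lines = version_output.splitlines()
    if not lines:
        raise ValueError("empty version output")
    # Take the first "N.M" pair on the first line, e.g. "2.38" or "2.30-108.el8".
    match = re.search(r"(\d+)\.(\d+)", lines[0])
    if match is None:
        raise ValueError("no version number found in: " + lines[0])
    return int(match.group(1)), int(match.group(2))


def new_enough(version_output, minimum=(2, 40)):
    """True if the reported binutils version is at least `minimum`."""
    return parse_binutils_version(version_output) >= minimum


# Illustrative inputs only; real output differs between distros.
print(new_enough("GNU assembler (GNU Binutils) 2.40"))
print(new_enough("GNU assembler version 2.30-108.el8"))
```

To check a real system, pass the text printed by `as --version` to `new_enough`. Pass `minimum=(2, 32)` if only the GPU probe libraries matter.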
Output format options
There are two output formats, old and new. They currently coexist, but the old format will be phased out.
The recommended output format is the "new" JSON format. Use the command line switch --json with
all commands to force this format. Most subcommands currently default to either CSV or an older
JSON format, but in daemon mode, only the new format is available.
Collect processes with sonar ps
It's sensible to run sonar ps every 5 minutes on every compute node.
```console
$ sonar ps --help
...
Options for ps:
  --rollup
      Merge process records that have the same job ID and command name (on systems
      with stable job IDs only)
  --min-cpu-time seconds
      Include records for jobs that have used at least this much CPU time
      [default: none]
  --exclude-system-jobs
      Exclude records for system jobs (uid < 1000)
  --exclude-users user,user,...
      Exclude records whose users match these names [default: none]
  --exclude-commands command,command,...
      Exclude records whose commands start with these names [default: none]
  --min-cpu-percent percentage
      Include records for jobs that have on average used at least this
      percentage of CPU, NOTE THIS IS NONMONOTONIC [default: none]
  --min-mem-percent percentage
      Include records for jobs that presently use at least this percentage of
      real memory, NOTE THIS IS NONMONOTONIC [default: none]
  --lockdir directory
      Create a per-host lockfile in this directory and exit early if the file
      exists on startup [default: none]
  --load
      Print per-cpu and per-gpu load data
  --json
      Format output as new JSON, not CSV
  --cluster name
      Optional cluster name (required for --json) with which to tag output
```
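To make the filter options more concrete, here is a minimal Python sketch of the selection logic as the help text describes it. It is not Sonar's implementation: the function `keep_record`, the record layout, and the sample values are hypothetical, and it assumes that all given filters must pass for a record to be kept.

```python
def keep_record(record, min_cpu_time=None, exclude_users=(), exclude_commands=()):
    """Decide whether a process record passes the filters (assumed to be ANDed)."""
    # --min-cpu-time: keep only records with at least this much CPU time.
    if min_cpu_time is not None and record["cputimesec"] < min_cpu_time:
        return False
    # --exclude-users: drop records whose user matches one of the names.
    if record["user"] in exclude_users:
        return False
    # --exclude-commands: drop records whose command starts with one of the names.
    if any(record["cmd"].startswith(prefix) for prefix in exclude_commands):
        return False
    return True


# Hypothetical records for illustration.
records = [
    {"user": "someone", "cmd": "fish", "cputimesec": 138},
    {"user": "someone", "cmd": "sonar", "cputimesec": 137},
    {"user": "someone", "cmd": "brave", "cputimesec": 3532},
    {"user": "someone", "cmd": "alacritty", "cputimesec": 51},
]
kept = [r["cmd"] for r in records
        if keep_record(r, min_cpu_time=100, exclude_commands=("son",))]
print(kept)
```

With these inputs, `alacritty` falls below the CPU time threshold and `sonar` matches the excluded command prefix, so only `fish` and `brave` remain.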
NOTE that if you use --lockdir, it should name a directory that is cleaned on reboot, such as
/var/run, /run, or a tmpfs, and ideally it is a directory on a disk local to the node, not a
shared disk.
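The reason for this advice becomes clearer with a sketch of how a per-host lockfile typically behaves. The Python below illustrates the general technique of exclusive file creation. It is not Sonar's code, and the file name `sonar-lock.<hostname>` and the function names are hypothetical.

```python
import os
import socket
import tempfile


def try_acquire_lock(lockdir, hostname=None):
    """Create a per-host lockfile; return its path, or None if it already exists."""
    hostname = hostname or socket.gethostname()
    path = os.path.join(lockdir, "sonar-lock." + hostname)  # hypothetical name
    try:
        # O_CREAT | O_EXCL fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return path


def release_lock(path):
    """Remove the lockfile so the next run can proceed."""
    os.remove(path)


with tempfile.TemporaryDirectory() as lockdir:
    first = try_acquire_lock(lockdir, "somehost")   # succeeds
    second = try_acquire_lock(lockdir, "somehost")  # lockfile exists: exit early
    release_lock(first)
    third = try_acquire_lock(lockdir, "somehost")   # succeeds again
    results = (first is not None, second is None, third is not None)
    print(results)
```

If a run crashes before releasing the lock, the stale file blocks every later run until something removes it. A directory that is cleared on reboot bounds that damage.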
Here is an example output (with the default older CSV output format):
```console
$ sonar ps --exclude-system-jobs --min-cpu-time=10 --rollup
v=0.7.0,time=2023-08-10T11:09:41+02:00,host=somehost,cores=8,user=someone,job=0,cmd=fish,cpu%=2.1,cpukib=64400,gpus=none,gpu%=0,gpumem%=0,gpukib=0,cputimesec=138
v=0.7.0,time=2023-08-10T11:09:41+02:00,host=somehost,cores=8,user=someone,job=0,cmd=sonar,cpu%=761,cpukib=372,gpus=none,gpu%=0,gpumem%=0,gpukib=0,cputimesec=137
v=0.7.0,time=2023-08-10T11:09:41+02:00,host=somehost,cores=8,user=someone,job=0,cmd=brave,cpu%=14.6,cpukib=2907168,gpus=none,gpu%=0,gpumem%=0,gpukib=0,cputimesec=3532
v=0.7.0,time=2023-08-10T11:09:41+02:00,host=somehost,cores=8,user=someone,job=0,cmd=alacritty,cpu%=0.8,cpukib=126700,gpus=none,gpu%=0,gpumem%=0,gpukib=0,cputimesec=51
v=0.7.0,time=2023-08-10T11:09:41+02:00,host=somehost,cores=8,user=someone,job=0,cmd=pulseaudio,cpu%=0.7,cpukib=90640,gpus=none,gpu%=0,gpumem%=0,gpukib=0,cputimesec=399
v=0.7.0,time=2023-08-10T11:09:41+02:00,host=somehost,cores=8,user=someone,job=0,cmd=slack,cpu%=3.9,cpukib=716924,gpus=none,gpu%=0,gpumem%=0,gpukib=0,cputimesec=266
```
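Each record in the older CSV format is a comma-separated list of `name=value` fields, so it is easy to consume downstream. The following Python sketch shows one way to parse a line. The function name `parse_record` is hypothetical, and treating the line as standard CSV (so that quoted values containing commas survive) is an assumption about the format.

```python
import csv


def parse_record(line):
    """Split one old-format `sonar ps` line into a dict of field name -> string."""
    # csv.reader copes with quoted fields, in case a value contains a comma.
    fields = next(csv.reader([line]))
    record = {}
    for field in fields:
        # Split on the first '=' only, since a value could itself contain '='.
        key, _, value = field.partition("=")
        record[key] = value
    return record


sample = (
    "v=0.7.0,time=2023-08-10T11:09:41+02:00,host=somehost,cores=8,user=someone,"
    "job=0,cmd=fish,cpu%=2.1,cpukib=64400,gpus=none,gpu%=0,gpumem%=0,gpukib=0,"
    "cputimesec=138"
)
record = parse_record(sample)
print(record["cmd"], float(record["cpu%"]), int(record["cputimesec"]))
```

All values are returned as strings. The caller converts fields such as `cpu%` or `cputimesec` to numbers as needed.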
Collect system information with sonar sysinfo
The sysinfo subcommand collects information about the system and prints it in JSON form on stdout
(this is the default older JSON format):
```console
$ sonar sysinfo
{
  "timestamp": "2024-02-26T00:00:02+01:00",
  "hostname": "ml1.hpc.uio.no",
  "description": "2x14 (hyperthreaded) Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz, 125 GB, 3x NVIDIA GeForce RTX 2080 Ti @ 11GB",
  "cpu_cores": 56,
  "mem_gb": 125,
  "gpu_cards": 3,
  "gpumem_gb": 33
}
```
Typical usage for sysinfo is to run the command after reboot and (for hot-swappable systems and
VMs) once every 24 hours, and to aggregate the information in some database.
The sysinfo subcommand currently has no options.
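As an illustration of the aggregation step, the Python sketch below keeps only the most recent sysinfo record per host. It assumes the older JSON layout shown above. The function name `latest_by_host` is hypothetical, and the second, older record is invented for the example.

```python
import json
from datetime import datetime


def latest_by_host(json_texts):
    """Return {hostname: record}, keeping the record with the newest timestamp."""
    latest = {}
    for text in json_texts:
        info = json.loads(text)
        # Timestamps such as 2024-02-26T00:00:02+01:00 parse as ISO 8601.
        stamp = datetime.fromisoformat(info["timestamp"])
        host = info["hostname"]
        if host not in latest or stamp > latest[host][0]:
            latest[host] = (stamp, info)
    return {host: info for host, (stamp, info) in latest.items()}


newer = json.dumps({
    "timestamp": "2024-02-26T00:00:02+01:00", "hostname": "ml1.hpc.uio.no",
    "cpu_cores": 56, "mem_gb": 125, "gpu_cards": 3, "gpumem_gb": 33,
})
# Hypothetical earlier sample from the same host.
older = json.dumps({
    "timestamp": "2024-02-25T00:00:02+01:00", "hostname": "ml1.hpc.uio.no",
    "cpu_cores": 56, "mem_gb": 125, "gpu_cards": 3, "gpumem_gb": 33,
})
current = latest_by_host([newer, older])
print(current["ml1.hpc.uio.no"]["timestamp"])
```

A real deployment would more likely store every record in a database and query for the newest one, but the comparison is the same.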
Collecting job information with sonar slurm (incomplete)
To be written.
This command exists partly to allow clusters to always push data, partly to collect the data for long-term storage, partly to offload the Slurm database manager during query processing.
Collecting partition and node information with sonar cluster (incomplete)
To be written.
This command exists partly to allow clusters to always push data, partly to collect the data for long-term storage.
Collect and analyze results
Sonar data are used by two other tools:
- JobAnalyzer allows Sonar logs to be queried and analyzed, and provides dashboards, interactive and batch queries, and reporting of system activity, policy violations, hung jobs, and more. It is under active development.
- JobGraph provides high-level plots of system activity. Mapping files for JobGraph can be found in the data folder. Its development has been dormant for some time.
Versions and release procedures
We use semantic versioning. The major version is expected to remain at zero for the foreseeable future, reflecting the experimental nature of Sonar.
At the time of writing we require:
- 2021 edition of Rust
- Rust 1.72.1 (can be found with cargo msrv find)
For all other versioning information, see doc/VERSIONING.md.
Authors
- Radovan Bast
- Mathias Bockwoldt
- Lars T. Hansen
- Henrik Rojas Nagel
How we run sonar on a cluster
See doc/HOWTO-DEPLOY.md.
Similar and related tools (incomplete)
- Reference implementation which serves as inspiration: https://github.com/UNINETTSigma2/appusage
- TACC Stats
- Ganglia Monitoring System
Owner
- Name: NordicHPC
- Login: NordicHPC
- Kind: organization
- Website: https://nordichpc.github.io
- Repositories: 10
- Profile: https://github.com/NordicHPC
Collaboration of Nordic computing facility staff and friends.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: "sonar: Tool to profile usage of HPC resources by regularly probing processes using ps."
message: "If you use this software, please cite it as below."
type: software
authors:
  - given-names: Radovan
    family-names: Bast
    email: radovan.bast@uit.no
    affiliation: UiT The Arctic University of Norway
    orcid: 'https://orcid.org/0000-0003-4498-3806'
  - given-names: Mathias
    family-names: Bockwoldt
  - given-names: Lars T.
    family-names: Hansen
  - given-names: Henrik Rojas
    family-names: Nagel
repository-code: 'https://github.com/NordicHPC/sonar'
url: 'https://github.com/NordicHPC/sonar'
keywords:
  - profiling
  - high-performance computing
  - processes
  - resource usage
license: GPL-3.0
version: '0.6.2'
GitHub Events
Total
- Create event: 38
- Release event: 1
- Issues event: 110
- Watch event: 5
- Delete event: 28
- Member event: 1
- Issue comment event: 229
- Push event: 112
- Pull request review event: 111
- Pull request review comment event: 104
- Pull request event: 105
Last Year
- Create event: 38
- Release event: 1
- Issues event: 110
- Watch event: 5
- Delete event: 28
- Member event: 1
- Issue comment event: 229
- Push event: 112
- Pull request review event: 111
- Pull request review comment event: 104
- Pull request event: 105
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 308
- Total Committers: 2
- Avg Commits per committer: 154.0
- Development Distribution Score (DDS): 0.081
Top Committers
| Name | Email | Commits |
|---|---|---|
| Radovan Bast | b****t@u****m | 283 |
| Mathias Bockwoldt | m****t@u****o | 25 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 174
- Total pull requests: 252
- Average time to close issues: about 2 months
- Average time to close pull requests: 7 days
- Total issue authors: 3
- Total pull request authors: 5
- Average comments per issue: 1.71
- Average comments per pull request: 1.34
- Merged pull requests: 208
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 91
- Pull requests: 133
- Average time to close issues: 25 days
- Average time to close pull requests: 6 days
- Issue authors: 3
- Pull request authors: 3
- Average comments per issue: 0.78
- Average comments per pull request: 1.36
- Merged pull requests: 104
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- lars-t-hansen (152)
- bast (21)
- tveito (1)
Pull Request Authors
- lars-t-hansen (201)
- bast (30)
- mathiasbockwoldt (14)
- 2maz (5)
- benteb (2)
Packages
- Total packages: 5
- Total downloads:
  - pypi: 771 last month
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 10 (may contain duplicates)
- Total versions: 33
- Total maintainers: 1
proxy.golang.org: github.com/nordichpc/sonar/util/formats
- Documentation: https://pkg.go.dev/github.com/nordichpc/sonar/util/formats#section-documentation
- License: other
- Latest release: v0.14.0 (published 5 months ago)
Rankings
proxy.golang.org: github.com/NordicHPC/sonar/util/formats
- Homepage: https://github.com/NordicHPC/sonar
- Documentation: https://pkg.go.dev/github.com/NordicHPC/sonar/util/formats#section-documentation
- License: GPL-3.0
- Latest release: v0.14.0 (published 5 months ago)
Rankings
proxy.golang.org: github.com/NordicHPC/sonar
- Documentation: https://pkg.go.dev/github.com/NordicHPC/sonar#section-documentation
- License: other
- Latest release: v0.14.0 (published 5 months ago)
Rankings
proxy.golang.org: github.com/nordichpc/sonar
- Documentation: https://pkg.go.dev/github.com/nordichpc/sonar#section-documentation
- License: other
- Latest release: v0.14.0 (published 5 months ago)
Rankings
pypi.org: sonar
Sonar: Tool to profile usage of HPC resources by regularly probing processes
- Homepage: https://github.com/NordicHPC/sonar
- Documentation: https://sonar.readthedocs.io/
- License: GNU General Public License v3 (GPLv3)
- Latest release: 0.5.0 (published almost 4 years ago)
Rankings
Maintainers (1)
Dependencies
- actions/checkout v2 composite
- android_system_properties 0.1.5
- autocfg 1.1.0
- bitflags 1.3.2
- bumpalo 3.12.0
- cc 1.0.79
- cfg-if 1.0.0
- chrono 0.4.23
- clap 4.1.4
- clap_derive 4.1.0
- clap_lex 0.3.1
- codespan-reporting 0.11.1
- core-foundation-sys 0.8.3
- cxx 1.0.88
- cxx-build 1.0.88
- cxxbridge-flags 1.0.88
- cxxbridge-macro 1.0.88
- errno 0.2.8
- errno-dragonfly 0.1.2
- heck 0.4.0
- hermit-abi 0.2.6
- hostname 0.3.1
- iana-time-zone 0.1.53
- iana-time-zone-haiku 0.1.1
- io-lifetimes 1.0.4
- is-terminal 0.4.2
- js-sys 0.3.60
- libc 0.2.139
- link-cplusplus 1.0.8
- linux-raw-sys 0.1.4
- log 0.4.17
- match_cfg 0.1.0
- num-integer 0.1.45
- num-traits 0.2.15
- num_cpus 1.15.0
- once_cell 1.17.0
- os_str_bytes 6.4.1
- proc-macro-error 1.0.4
- proc-macro-error-attr 1.0.4
- proc-macro2 1.0.50
- quote 1.0.23
- rustix 0.36.7
- scratch 1.0.3
- strsim 0.10.0
- subprocess 0.2.9
- syn 1.0.107
- termcolor 1.2.0
- time 0.1.45
- unicode-ident 1.0.6
- unicode-width 0.1.10
- version_check 0.9.4
- wait-timeout 0.2.0
- wasi 0.10.0+wasi-snapshot-preview1
- wasm-bindgen 0.2.83
- wasm-bindgen-backend 0.2.83
- wasm-bindgen-macro 0.2.83
- wasm-bindgen-macro-support 0.2.83
- wasm-bindgen-shared 0.2.83
- winapi 0.3.9
- winapi-i686-pc-windows-gnu 0.4.0
- winapi-util 0.1.5
- winapi-x86_64-pc-windows-gnu 0.4.0
- windows-sys 0.42.0
- windows_aarch64_gnullvm 0.42.1
- windows_aarch64_msvc 0.42.1
- windows_i686_gnu 0.42.1
- windows_i686_msvc 0.42.1
- windows_x86_64_gnu 0.42.1
- windows_x86_64_gnullvm 0.42.1
- windows_x86_64_msvc 0.42.1