prmon

Standalone monitor for process resource consumption

https://github.com/hsf/prmon

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    2 of 9 committers (22.2%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.6%) to scientific vocabulary

Keywords

cmake cpp high-energy-physics process-monitor python scientific-computing
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: HSF
  • License: apache-2.0
  • Language: C++
  • Default Branch: main
  • Size: 863 KB
Statistics
  • Stars: 49
  • Watchers: 12
  • Forks: 22
  • Open Issues: 4
  • Releases: 15
Created almost 8 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Code of conduct Citation Authors Zenodo

README.md

Process Monitor (prmon)

The PRocess MONitor is a small standalone program that monitors the resource consumption of a process and its children. This is useful in the context of the WLCG/HSF working group to evaluate the costs and performance of HEP workflows in WLCG. In a previous incarnation (MemoryMonitor) it has been used by ATLAS for some time to gather data on resource consumption by production jobs. One of its most useful features is that it uses smaps to correctly calculate the Proportional Set Size in the group of processes monitored, which is a much better indication of the true memory consumption of a group of processes where children share many pages.

prmon currently runs on Linux machines as it requires access to the /proc interface to process statistics.
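As a rough illustration of the smaps-based Proportional Set Size calculation described above, the following Python sketch sums the `Pss:` entries of each process's `/proc/<pid>/smaps` file. It is not prmon's implementation (prmon is written in C++); the function names and the `proc_root` parameter are hypothetical and exist only for this example.

```python
import re

# Matches per-mapping lines such as "Pss:                 12 kB".
# Related fields like "Pss_Dirty:" are deliberately not matched.
_PSS_LINE = re.compile(r"^Pss:[ \t]+(\d+)[ \t]+kB[ \t]*$", re.MULTILINE)


def pss_kb_from_smaps(smaps_text):
    """Sum every per-mapping Pss entry (in kB) found in one smaps file."""
    return sum(int(value) for value in _PSS_LINE.findall(smaps_text))


def group_pss_kb(pids, proc_root="/proc"):
    """Total PSS in kB over a group of PIDs (hypothetical helper)."""
    total = 0
    for pid in pids:
        try:
            with open(f"{proc_root}/{pid}/smaps") as handle:
                total += pss_kb_from_smaps(handle.read())
        except OSError:
            # The process has exited or its smaps file is not readable.
            continue
    return total
```

Because PSS divides each shared page among the processes that map it, summing it over a process tree avoids the double counting that summing RSS would produce.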

Build and Deployment

Cloning the project

As prmon has dependencies on submodules, clone the project with:

git clone --recurse-submodules https://github.com/HSF/prmon.git

Building the project

Building prmon requires a C++ compiler that fully supports C++11, and CMake version 3.3 or higher. It also depends on nlohmann_json and spdlog, and can use either external system-supplied versions or internal copies provided by submodules.

Building is usually as simple as:

mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=<installdir> -S .. -B .
make -j<number of cores on your machine>
make install

Unless otherwise specified, the default behavior for dependencies is to first try to find an external version and fall back to the internal submodule copy if not found. To explicitly force the use of either, add any of the following configure options:

  • -DUSE_EXTERNAL_NLOHMANN_JSON={TRUE,FALSE,AUTO}
    • ON, TRUE: Force an external version and fail if not found.
    • OFF, FALSE: Require the internal copy be used.
    • AUTO (default): Search for an external version and fall back to the internal copy if not found.
  • -Dnlohmann_json_DIR=/path/to/config
    • The path to the directory containing nlohmann_jsonConfig.cmake. Necessary if nlohmann_json is not installed into CMake's search path.
  • -DUSE_EXTERNAL_SPDLOG={TRUE,FALSE,AUTO}
    • ON, TRUE: Force an external version and fail if not found.
    • OFF, FALSE: Require the internal copy be used.
    • AUTO (default): Search for an external version and fall back to the internal copy if not found.
  • -Dspdlog_DIR=/path/to/config
    • The path to the directory containing spdlogConfig.cmake. Necessary if spdlog is not installed into CMake's search path.

The option -DCMAKE_BUILD_TYPE can switch between all of the standard build types. The default is Release; use RelWithDebInfo if you want debug symbols.

To build a statically linked version of prmon, set the BUILD_STATIC CMake variable to ON (e.g., adding -DBUILD_STATIC=ON to the command line).

Note that in a build environment with CVMFS available the C++ compiler and CMake can be taken by setting up a recent LCG release.

To enable pulling and building the gtest framework as well as tests dependent on gtest, build with -DBUILD_GTESTS=ON.

Creating a package with CPack

A CPack-based package can be created by invoking:

make package

Running the tests

To run the tests of the project, first build it and then invoke

make test

Running the tests requires Python version 3.6 or higher.

Running

The prmon binary is invoked with the following arguments:

prmon [--pid PPP] [--filename prmon.txt] [--json-summary prmon.json] \
      [--log-filename prmon.log] [--interval 30] \
      [--suppress-hw-info] [--units] [--netdev DEV] \
      [--disable MON1] [--level LEV] [--level MON:LEV] \
      [--fast-memmon] \
      [-- prog arg arg ...]

  • --pid the 'mother' PID to monitor (all children in the same process tree are monitored as well)
  • --filename output file for time-stamped monitored values
  • --json-summary output file for summary data written in JSON format
  • --log-filename output file for log messages
  • --interval time, in seconds, between monitoring snapshots
  • --suppress-hw-info flag that turns off hardware information collection
  • --units add information on units for each metric to JSON file
  • --netdev restricts network statistics to one (or more) network devices
  • --disable is used to disable specific monitors (and can be specified multiple times); the default is that prmon monitors everything that it can
    • Note that the wallmon monitor is the only monitor that cannot be disabled
  • --level is used to set the logging level for monitors
    • --level LEV sets the level for all monitors to LEV
    • --level MON:LEV sets the level for monitor MON to LEV
    • The valid levels are trace, debug, info, warn, error, critical
  • --fast-memmon toggles on fast memory monitoring using smaps_rollup
  • -- after this argument the following arguments are treated as a program to invoke and remaining arguments are passed to it; prmon will then monitor this process instead of being given a PID via --pid

prmon will exit with 1 if there is a problem with inconsistent or incomplete arguments. If prmon starts a program itself (using --) then prmon will exit with the exit code of the child process.
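The invocation and exit-code behavior described above can be sketched from Python as follows. This is an illustrative wrapper, not part of prmon: the helper names are hypothetical, only options listed above are used, and it assumes a `prmon` binary is available on `PATH`.

```python
import subprocess


def build_prmon_command(program, filename="prmon.txt",
                        json_summary="prmon.json", interval=30,
                        disable=()):
    """Return an argv list asking prmon to launch and monitor `program`."""
    command = [
        "prmon",
        "--filename", filename,
        "--json-summary", json_summary,
        "--interval", str(interval),
    ]
    for monitor in disable:
        # --disable can be specified multiple times, once per monitor.
        command += ["--disable", monitor]
    # Everything after "--" is the program to invoke and its arguments.
    return command + ["--"] + list(program)


def run_monitored(program, **options):
    """Run `program` under prmon and return prmon's exit code."""
    return subprocess.run(build_prmon_command(program, **options)).returncode
```

For example, `run_monitored(["sleep", "60"], interval=10, disable=["nvidiamon"])` would return the exit code of the monitored program, since prmon propagates it when started with `--`.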

When invoked with -h or --help usage information is printed, as well as a list of all available monitoring components.

Fast Memory Monitoring

When invoked with --fast-memmon prmon uses the smaps_rollup files that contain pre-summed memory information for each monitored process. This is a faster approach compared to the default behavior, where prmon aggregates the results itself by going over each of the monitored processes' mappings one by one.

If the current kernel doesn't support smaps_rollup then the default approach is used. Users should also note that fast memory monitoring might not contain all metrics that the default approach supports, e.g., vmem. In that case, the missing metric will be omitted in the output. If any of these issues are encountered, a relevant message is printed to notify the user.
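prmon performs the fallback itself, but a calling script may want to know in advance whether the kernel exposes smaps_rollup. A minimal sketch, assuming the file lives at `/proc/<pid>/smaps_rollup` (the helper names and the `proc_root` parameter are hypothetical):

```python
import os


def supports_smaps_rollup(pid="self", proc_root="/proc"):
    """Return True if the kernel exposes smaps_rollup for the given process."""
    return os.path.exists(os.path.join(proc_root, str(pid), "smaps_rollup"))


def memmon_flags(pid="self", proc_root="/proc"):
    """Return ["--fast-memmon"] when smaps_rollup is available, else []."""
    return ["--fast-memmon"] if supports_smaps_rollup(pid, proc_root) else []
```

The returned list can be appended to a prmon command line so that the flag is only passed where it has an effect.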

Environment Variables

The PRMON_DISABLE_MONITOR environment variable can be used to specify a comma-separated list of monitor names that will be disabled. This is useful when prmon is being invoked by some other part of a job or workflow, so the user does not have direct access to the command line options used. For example:

export PRMON_DISABLE_MONITOR=nvidiamon
other_code_that_invokes_prmon ...

Disables the nvidiamon monitor.
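When the surrounding workflow is driven from Python rather than a shell, the same variable can be set on the child environment. A minimal sketch (the helper name is hypothetical):

```python
import os


def env_with_disabled_monitors(monitors, base_env=None):
    """Return a copy of the environment with PRMON_DISABLE_MONITOR set."""
    env = dict(os.environ if base_env is None else base_env)
    # prmon expects a comma-separated list of monitor names.
    env["PRMON_DISABLE_MONITOR"] = ",".join(monitors)
    return env
```

The result can be passed as the `env` argument of `subprocess.run` when launching the code that eventually invokes prmon.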

Outputs

The filename output file is plain text, with a line of statistics written every interval seconds. The first line gives the column names.

In the json-summary file values for the maximum and average statistics are given in JSON format. This file is rewritten every interval seconds with the current summary values. Use the --units option to see exactly which units are used for each metric (the value of 1 for a unit means it is a pure number).

In the log-filename output file, log messages (e.g., errors, warnings etc.) are written.

Monitoring of CPU, I/O and memory is reliably accurate, at least to within the sampling time. Monitoring of network I/O is not reliable unless the monitored process is isolated from other processes performing network I/O (it gives an upper bound on the network activity, but the monitoring is per network device as Linux does not give per-process network data by default).
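The time-stamped output file can be loaded with a few lines of Python. The sketch below assumes the columns are separated by whitespace (such as tabs) and that every value is numeric; the function names are hypothetical.

```python
def _number(text):
    """Convert a field to int when possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def read_prmon_text(lines):
    """Parse prmon's time-stamped output into {column name: list of values}."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return {}
    # The first line gives the column names; the rest are samples.
    header, samples = rows[0], rows[1:]
    columns = {name: [] for name in header}
    for sample in samples:
        for name, value in zip(header, sample):
            columns[name].append(_number(value))
    return columns
```

A typical call would be `read_prmon_text(open("prmon.txt"))`, after which each column, for example `pss`, is available as a list with one entry per monitoring snapshot.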

Visualisation

The prmon_plot.py script (Python 3) can be used to plot the outputs of prmon from the time-stamped output file (usually prmon.txt). Some examples include:

  • Memory usage as a function of wall-time:
    prmon_plot.py --input prmon.txt --xvar wtime --yvar vmem,pss,rss,swap --yunit GB

  • Rate of change in memory usage as a function of wall-time:
    prmon_plot.py --input prmon.txt --xvar wtime --yvar vmem,pss,rss,swap --diff --yunit MB

  • Rate of change in CPU usage as a function of wall-time with stacked user and system utilizations:
    prmon_plot.py --input prmon.txt --xvar wtime --yvar utime,stime --yunit SEC --diff --stacked

The plots above, as well as the input prmon.txt file that is used to produce them, can be found under the example-plots folder.

The script allows the user to specify variables, their units, plotting style (stacked vs overlaid), as well as the format of the output image. Use -h for more information.
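To make the "rate of change" plots concrete, the sketch below computes a finite-difference rate between consecutive samples. It is an assumption about what such a rate means, not a copy of what prmon_plot.py does for --diff, and the function name is hypothetical.

```python
def rate_of_change(x_values, y_values):
    """Finite-difference rate dy/dx between consecutive samples.

    Returns a list of (x, rate) pairs, one per interval, placed at the
    interval's right edge.
    """
    rates = []
    for i in range(1, len(x_values)):
        dx = x_values[i] - x_values[i - 1]
        dy = y_values[i] - y_values[i - 1]
        # Skip intervals with no elapsed x to avoid dividing by zero.
        if dx != 0:
            rates.append((x_values[i], dy / dx))
    return rates
```

With wall-time in seconds as `x_values` and a memory column as `y_values`, each rate is the change in memory per second over one monitoring interval.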

Data Compression

The prmon_compress_output.py script (Python 3) can be used to compress the output file while keeping the most relevant information.

The compression algorithm works as follows:

  • For the number of processes, threads, and GPUs, only the measurements that differ from the previous one are kept.
  • For all other metrics, only the measurements that satisfy an interpolation condition are kept.

This latter condition can be summarized as:

  • For any three neighboring (and time-ordered) measurements A, B, and C, B is deleted if the linear interpolation between A and C is consistent with B within a threshold. Otherwise, it is retained. The threshold can be configured via the --precision parameter (default: 0.05, i.e. 5%).

The time index of the final output is the union of the indices kept for each individual time series. Each series will have NA values where a point was deleted at a kept index. Unless the --skip-interpolate parameter is passed, these values are filled by linear interpolation to maintain a consistent number of data points, and the result is rounded to the nearest integer for consistency with the original input.

If the --skip-interpolate parameter is passed, deleted values will be written as empty strings in the output file, and will be interpreted as NA values when imported into Pandas.
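The two retention rules can be sketched in Python as below. This is an illustration only, not the code of prmon_compress_output.py: the function names are hypothetical, "consistent within a threshold" is assumed to mean a relative deviation of at most `precision` with respect to B, each interior point is compared against its original neighbors, and times are assumed to be strictly increasing.

```python
def keep_changes(values):
    """Indices of samples that differ from the previous one (first is kept)."""
    return [i for i, v in enumerate(values) if i == 0 or v != values[i - 1]]


def keep_by_interpolation(times, values, precision=0.05):
    """Indices of samples that are NOT reproduced by linear interpolation
    between their two neighbors, within the assumed relative threshold."""
    last = len(values) - 1
    kept = []
    for i, b in enumerate(values):
        if i == 0 or i == last:
            kept.append(i)  # endpoints have no pair of neighbors
            continue
        t_a, t_b, t_c = times[i - 1], times[i], times[i + 1]
        a, c = values[i - 1], values[i + 1]
        # Value that a straight line from A to C would give at B's time.
        interpolated = a + (c - a) * (t_b - t_a) / (t_c - t_a)
        # Assumed consistency test: relative deviation within `precision`.
        if abs(interpolated - b) > precision * abs(b):
            kept.append(i)
    return kept
```

The final time index would then be the sorted union of the index lists returned for each column.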

Example:

prmon_compress_output.py --input prmon.txt --precision 0.3 --skip-interpolate

Feedback and Contributions

We're very happy to get feedback on prmon as well as suggestions for future development. Please have a look at our Contribution Guide.

Profiling

To build prmon with profiling, set one of the CMake variables PROFILE_GPROF or PROFILE_GPERFTOOLS to ON. This enables GNU gprof profiling or gperftools profiling, respectively. If your gperftools are in a non-standard place, pass a hint to CMake using Gperftools_ROOT_DIR.

Copyright

Copyright (c) 2018-2025 CERN.

Contributors

Thanks goes to these wonderful people (emoji key):

  • Graeme A Stewart
  • Alaettin Serhan Mete
  • Anubhab Das
  • Chuck Atkins
  • Ismail Ryabchuk
  • Chris Burr
  • Riccardo Maganza
  • Miguel Gila
  • Dan Protopopescu

This project follows the all-contributors specification. Contributions of any kind welcome!

Owner

  • Name: HEP Software Foundation
  • Login: HSF
  • Kind: organization
  • Email: hsf-coordination@googlegroups.com

The HEP Software Foundation facilitates coordination and common efforts in high energy physics (HEP) software and computing internationally.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: prmon
date-released: "2018-06-08"
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Graeme Andrew
    family-names: Stewart
    email: graeme.andrew.stewart@cern.ch
    affiliation: CERN
    orcid: 'https://orcid.org/0000-0003-0182-7088'
  - given-names: Alaettin Serhan
    family-names: Mete
    email: alaettin.serhan.mete@cern.ch
    affiliation: Argonne National Laboratory
    orcid: 'https://orcid.org/0000-0002-5508-530X'
doi: 10.5281/zenodo.2554202

GitHub Events

Total
  • Create event: 12
  • Release event: 1
  • Issues event: 5
  • Watch event: 5
  • Delete event: 8
  • Issue comment event: 33
  • Push event: 29
  • Pull request review event: 4
  • Pull request event: 23
  • Fork event: 2
Last Year
  • Create event: 12
  • Release event: 1
  • Issues event: 5
  • Watch event: 5
  • Delete event: 8
  • Issue comment event: 33
  • Push event: 29
  • Pull request review event: 4
  • Pull request event: 23
  • Fork event: 2

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 141
  • Total Committers: 9
  • Avg Commits per committer: 15.667
  • Development Distribution Score (DDS): 0.489
Past Year
  • Commits: 11
  • Committers: 5
  • Avg Commits per committer: 2.2
  • Development Distribution Score (DDS): 0.455
Top Committers
Name (email): commits
  • Graeme A Stewart (g****t@c****h): 72
  • Alaettin Serhan Mete (s****e@g****m): 55
  • quantum-shift (6****t): 8
  • Dan Protopopescu (p****p@c****h): 1
  • Oliver Freyermuth (o****h@g****m): 1
  • Riccardo Maganza (r****a@g****m): 1
  • Chris Burr (c****r): 1
  • Ismail Ryabchuk (9****2): 1
  • Miguel Gila (m****a): 1
Committer Domains (Top 20 + Academic)
cern.ch: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 47
  • Total pull requests: 80
  • Average time to close issues: 4 months
  • Average time to close pull requests: 26 days
  • Total issue authors: 10
  • Total pull request authors: 13
  • Average comments per issue: 2.83
  • Average comments per pull request: 1.8
  • Merged pull requests: 74
  • Bot issues: 0
  • Bot pull requests: 9
Past Year
  • Issues: 2
  • Pull requests: 11
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 20 hours
  • Issue authors: 1
  • Pull request authors: 3
  • Average comments per issue: 13.0
  • Average comments per pull request: 0.36
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 9
Top Authors
Issue Authors
  • graeme-a-stewart (26)
  • amete (10)
  • sciaba (3)
  • eduardo-rodrigues (2)
  • elmsheus (2)
  • amolhj (1)
  • qgl90 (1)
  • vrpascuzzi (1)
  • Aymane-Leyli (1)
  • nikoladze (1)
  • quantum-shift (1)
Pull Request Authors
  • graeme-a-stewart (31)
  • amete (26)
  • allcontributors[bot] (8)
  • quantum-shift (8)
  • olifre (2)
  • chuckatkins (1)
  • elmsheus (1)
  • raghvendra253 (1)
  • chrisburr (1)
  • miguelgila (1)
  • Cossack42 (1)
  • rmaganza (1)
  • CorentinBT (1)
Top Labels
Issue Labels
enhancement (16) question (7) bug (5) tests (4) documentation (3) externals (3) wontfix (1) Continuous Integration (1)
Pull Request Labels
bug (11) enhancement (11) tests (3) documentation (1) externals (1) question (1) work in progress (1)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 1
  • Total dependent repositories: 0
  • Total versions: 7
conda-forge.org: prmon

  • Versions: 7
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Dependent packages count: 28.8%
Dependent repos count: 34.0%
Average: 34.6%
Forks count: 36.7%
Stargazers count: 39.1%
Last synced: 6 months ago

Dependencies

.github/workflows/build-source-release.yml actions
  • actions/checkout v2 composite
  • actions/upload-release-asset v1 composite
.github/workflows/ci.yml actions
  • actions/checkout v2 composite
.github/workflows/clang-format.yml actions
  • HSF/clang-format-action v0.4 composite
  • actions/checkout v2 composite
.github/workflows/flake8.yml actions
  • actions/checkout v2 composite