dfanalyzer

Layered data-flow analysis for HPC I/O.

https://github.com/llnl/dfanalyzer

Science Score: 52.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization llnl has institutional domain (software.llnl.gov)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.0%) to scientific vocabulary

Keywords

deep-learning hpc io-analysis multi-layer
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 2
  • Open Issues: 5
  • Releases: 0
Created 9 months ago · Last pushed 6 months ago

README.md

Data Flow Analyzer

Overview

DFAnalyzer is an open-source tool for analyzing performance data from large-scale workflows on distributed systems. It presents a hierarchical, layer-by-layer summary of an application's execution, from high-level application events down to low-level POSIX calls. For each layer, DFAnalyzer quantifies time, operation counts, and data volume, and calculates key performance metrics like bandwidth and operations per second. It also visualizes the overlap between different layers, helping to characterize and understand complex I/O and compute patterns.

Installation

To install DFAnalyzer through pip (recommended for most users):

```bash
# Ensure runtime dependencies for optional features (e.g., Darshan, Recorder) are installed.
# This might involve using your system's package manager or a tool like Spack.
# Example using Spack to prepare the environment:
#   spack -e tools install

pip install dfanalyzer
```

To install DFAnalyzer from source (for developers or custom builds):

```bash
# 1. Install system dependencies:
# Refer to the "Install system dependencies" step in .github/workflows/ci.yml
# (e.g., build-essential, cmake, libarrow-dev, libhdf5-dev, ninja-build, etc.).
# Alternatively, tools like Spack can help manage these:
#   spack -e tools install
#   module load ninja

# 2. Install Python build dependencies:
python -m pip install --upgrade pip meson-python setuptools wheel

# 3. Install DFAnalyzer from the root of this repository:
# The following command includes optional C++ components (tests and tools).
# The --prefix argument is optional and specifies the installation location.
pip install -e . \
  -Csetup-args="--prefix=$HOME/.local" \
  -Csetup-args="-Denable_tests=true" \
  -Csetup-args="-Denable_tools=true"

# (Optional) Install dependencies for running tests if you plan to contribute or run local tests:
pip install -r tests/requirements.txt
```

Usage

Here's an example of how to run DFAnalyzer using sample data included in the repository:

```bash
# Before running, ensure the sample data is extracted.
# For example, to extract the 'dftracer-dlio' sample used below:
mkdir -p tests/data/extracted
tar -xzf tests/data/dftracer-dlio.tar.gz -C tests/data/extracted

dfanalyzer analyzer/preset=dlio trace_path=tests/data/extracted/dftracer-dlio view_types=[time_range]
```

This command analyzes the traces and prints a high-level summary of the application's execution. Below is a sample of the "Time Period Summary" output:

```bash
Time Period Summary
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Metric                                      ┃ Unit    ┃      Value ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ Job Time                                    │ seconds │     56.695 │
│ Total Count                                 │ count   │     15,901 │
│ Total Files                                 │ count   │         87 │
│ Total Nodes                                 │ count   │          0 │
│ Total Processes                             │ count   │         23 │
│ App Count                                   │ count   │          8 │
│ Training Count                              │ count   │         40 │
│ Compute Count                               │ count   │        200 │
│ Fetch Data Count                            │ count   │        160 │
│ Data Loader Count                           │ count   │        808 │
│ Data Loader Fork Count                      │ count   │         96 │
│ Reader Count                                │ count   │      4,008 │
│ Reader POSIX (Lustre) Count                 │ count   │     10,432 │
│ Reader POSIX (Lustre) Size                  │ MB      │ 111833.161 │
│ Reader POSIX (Lustre) Bandwidth             │ MB/s    │    874.982 │
│ Reader POSIX (Lustre) Avg Transfer Size     │ MB      │     10.720 │
│ Checkpoint Count                            │ count   │          8 │
│ Checkpoint POSIX (Lustre) Count             │ count   │         45 │
│ Checkpoint POSIX (Lustre) Size              │ MB      │      0.011 │
│ Checkpoint POSIX (Lustre) Bandwidth         │ MB/s    │      0.791 │
│ Checkpoint POSIX (Lustre) Avg Transfer Size │ MB      │      0.000 │
│ Other POSIX Count                           │ count   │         96 │
└─────────────────────────────────────────────┴─────────┴────────────┘
```
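As a quick sanity check on how the summary metrics relate to one another, the sketch below recomputes two derived values from the "Reader POSIX (Lustre)" rows above. It assumes the conventional definitions (average transfer size = size / count, bandwidth = size / time); the README does not state DFAnalyzer's exact formulas, and the variable names are illustrative only.

```python
# Values copied from the "Reader POSIX (Lustre)" rows of the sample summary.
count = 10_432            # Reader POSIX (Lustre) Count
size_mb = 111_833.161     # Reader POSIX (Lustre) Size, in MB
bandwidth_mb_s = 874.982  # Reader POSIX (Lustre) Bandwidth, in MB/s

# Assumed relation: average transfer size = total size / number of operations.
avg_transfer_mb = size_mb / count

# Assumed relation: bandwidth = size / time, hence time = size / bandwidth.
implied_io_time_s = size_mb / bandwidth_mb_s

print(f"Avg transfer size: {avg_transfer_mb:.3f} MB")   # 10.720 MB
print(f"Implied I/O time:  {implied_io_time_s:.3f} s")  # 127.812 s
```

Under these assumptions the average transfer size matches the 10.720 MB reported in the summary, and the implied I/O time matches the 127.812 s listed for the same layer in the Layer Breakdown output.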

DFAnalyzer also provides a detailed breakdown of performance metrics for each layer of the application. Here is a snippet of the "Layer Breakdown" section from the same run, which includes the percentage of time each layer overlaps with its parent layer:

```bash
Layer Breakdown (w/ overlap %)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Layer                     ┃       Time (s) ┃           Ops ┃  Ops/sec ┃         Size (MB) ┃ Bandwidth (MB/s) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ App                       │ 441.967 (----) │      8 (----) │    0.018 │                 - │                - │
│ Training                  │ 439.442 (----) │     40 (----) │    0.091 │                 - │                - │
│ Compute                   │ 272.356 (----) │    200 (----) │    0.734 │                 - │                - │
│ Fetch Data                │ 126.179 ( 16%) │    160 ( 25%) │    1.268 │                 - │                - │
│ Data Loader               │ 151.471 ( 45%) │    808 ( 46%) │    5.334 │                 - │                - │
│ Data Loader Fork          │   2.392 (  0%) │     96 (  0%) │   40.135 │                 - │                - │
│ Reader                    │ 299.992 ( 40%) │  4,008 ( 51%) │   13.360 │                 - │                - │
│ Reader POSIX (Lustre)     │ 127.812 ( 45%) │ 10,432 ( 48%) │   81.620 │ 111833.161 ( 46%) │          874.982 │
│ Checkpoint                │   0.014 (  0%) │      8 (  0%) │  571.551 │                 - │                - │
│ Checkpoint POSIX (Lustre) │   0.014 (  0%) │     45 (  0%) │ 3268.686 │      0.011 (  0%) │            0.791 │
│ Other POSIX               │   2.392 (  0%) │     96 (  0%) │   40.135 │      0.000 (----) │                - │
└───────────────────────────┴────────────────┴───────────────┴──────────┴───────────────────┴──────────────────┘
```
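To make the idea of a time-overlap percentage concrete, here is a minimal sketch that measures how much of a child layer's busy time falls inside its parent layer's intervals. This is an assumption about one reasonable definition, not DFAnalyzer's actual implementation; the function names (`merge_intervals`, `overlap_seconds`, `overlap_pct`) and the sample intervals are hypothetical.

```python
def merge_intervals(intervals):
    """Sort (start, end) pairs and fuse any that touch or overlap."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_seconds(a, b):
    """Total time covered by both interval sets (two-pointer sweep)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def overlap_pct(child, parent):
    """Assumed definition: share of the child's busy time spent inside the parent."""
    child_total = sum(end - start for start, end in merge_intervals(child))
    if child_total == 0:
        return 0.0
    return 100.0 * overlap_seconds(child, parent) / child_total


# Hypothetical timestamps in seconds, not taken from the sample trace.
child_events = [(0.0, 4.0), (6.0, 10.0)]
parent_events = [(2.0, 8.0)]
print(f"{overlap_pct(child_events, parent_events):.0f}%")  # 50%
```

Merging intervals first keeps concurrent events from the same layer from being counted twice, which matters when many processes emit events in parallel.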

Further Information

For more details, to report issues, or to contribute to DFAnalyzer, please refer to the following resources:

  • Official DFAnalyzer Documentation: For detailed usage, configuration options, and information about analyzers.
  • Issue Tracker: To report bugs or suggest new features.
  • Contributing Guidelines: For information on how to contribute to the project, including setting up a development environment and coding standards.
  • Citation File: If you use DFAnalyzer in your research, please cite it using the information in this file.

Acknowledgments

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under the DOE Early Career Research Program (LLNL-CONF-862440). Also, this research is supported in part by the National Science Foundation (NSF) under Grants OAC-2104013, OAC-2313154, and OAC-2411318.

Owner

  • Name: Lawrence Livermore National Laboratory
  • Login: LLNL
  • Kind: organization
  • Email: github-admin@llnl.gov
  • Location: Livermore, CA, USA

For over 70 years, the Lawrence Livermore National Laboratory has applied science and technology to make the world a safer place.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite the software and the paper."
title: "WisIO: Automated I/O Bottleneck Detection with Multi-Perspective Views for HPC Workflows"
version: 0.1.0
abstract: "Analyze, visualize, and understand I/O performance issues in HPC workflows."
license: MIT
url: https://github.com/grc-iit/wisio
repository-code: https://github.com/grc-iit/wisio
contact:
  - name: Izzet Yildirim
    email: izzetcyildirim@gmail.com
authors:
  - family-names: Yildirim
    given-names: Izzet
    orcid: https://orcid.org/0000-0003-3513-0764
  - family-names: Devarajan
    given-names: Hariharan
    orcid: https://orcid.org/0000-0001-5625-3494
  - family-names: Kougkas
    given-names: Anthony
    orcid: https://orcid.org/0000-0003-3943-663X
  - family-names: Sun
    given-names: Xian-He
    orcid: https://orcid.org/0000-0002-1093-0792
  - family-names: Mohror
    given-names: Kathryn
    orcid: https://orcid.org/0000-0002-1366-1655
preferred-citation:
  type: conference-paper
  title: "WisIO: Automated I/O Bottleneck Detection with Multi-Perspective Views for HPC Workflows"
  year: 2025
  authors:
    - family-names: Yildirim
      given-names: Izzet
      orcid: https://orcid.org/0000-0003-3513-0764
    - family-names: Devarajan
      given-names: Hariharan
      orcid: https://orcid.org/0000-0001-5625-3494
    - family-names: Kougkas
      given-names: Anthony
      orcid: https://orcid.org/0000-0003-3943-663X
    - family-names: Sun
      given-names: Xian-He
      orcid: https://orcid.org/0000-0002-1093-0792
    - family-names: Mohror
      given-names: Kathryn
      orcid: https://orcid.org/0000-0002-1366-1655
  conference:
    name: "ICS'25: 2025 International Conference on Supercomputing"
    city: "Salt Lake City"
    region: UT
    country: USA
    date-start: 2025-06-08
    date-end: 2025-06-11
  doi: 10.1145/3721145.3725742
  url: https://doi.org/10.1145/3721145.3725742

GitHub Events

Total
  • Issues event: 6
  • Watch event: 3
  • Issue comment event: 9
  • Member event: 3
  • Push event: 25
  • Pull request review event: 18
  • Pull request review comment event: 28
  • Pull request event: 25
  • Fork event: 2
  • Create event: 3
Last Year
  • Issues event: 6
  • Watch event: 3
  • Issue comment event: 9
  • Member event: 3
  • Push event: 25
  • Pull request review event: 18
  • Pull request review comment event: 28
  • Pull request event: 25
  • Fork event: 2
  • Create event: 3

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 17
  • Average time to close issues: 14 days
  • Average time to close pull requests: 3 days
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.47
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 17
  • Average time to close issues: 14 days
  • Average time to close pull requests: 3 days
  • Issue authors: 2
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.47
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hariharan-devarajan (4)
  • rayandrew (1)
Pull Request Authors
  • izzet (15)
  • rayandrew (1)
Top Labels
Issue Labels
Pull Request Labels
enhancement (3)

Dependencies

Dockerfile docker
  • ubuntu 22.04 build
pyproject.toml pypi
  • dask [bag,dataframe,distributed]~=2023.4.0
  • dask_jobqueue ~=0.8.0
  • hydra-core ~=1.3.0
  • inflect ==7.0
  • jinja2 >=3.0
  • matplotlib >=3.6.0
  • numpy ==1.24.3
  • pandas >=2.0
  • portion >=2.4.0
  • pyarrow >=13
  • pyyaml >=5.4
  • rich ==13.6.0
  • scikit-learn ~=1.3.0
  • scipy ~=1.10.0
  • strenum >=0.4
  • venn ==0.1.3
  • zindex_py ==0.0.5