masters-thesis

Monitoring parallel file system usage in a high-performance computer cluster

https://github.com/jaantollander/masters-thesis

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary

Keywords

computer-cluster exploratory-data-analysis high-performance-computing io-behavior lustre monitoring-computer-systems observability parallel-file-system time-series-analysis
Last synced: 6 months ago

Repository

Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

Master's Thesis

Title: Monitoring parallel file system usage in a high-performance computer cluster

Author: Jaan Tollander de Balsch

Supervisor: Prof. Petteri Kaski

Advisor: Dr. Sami Ilvonen

Degree programme: Computer, Communication and Information Sciences

Major: Computer Science

Keywords: monitoring computer systems, observability, computer cluster, high-performance computing, parallel file system, Lustre, I/O behavior, time series analysis, exploratory data analysis

License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

URN: http://urn.fi/URN:NBN:fi:aalto-202303262552

Download the thesis (PDF)

Abstract

Many high-performance computer clusters rely on a system-wide, shared, parallel file system for large storage capacity and bandwidth. A shared file system is available across the entire system, making it user-friendly but prone to problems from heavy use. Such use can cause congestion and slow down or even halt the whole system, harming all users of the parallel file system. In this thesis, we investigate whether monitoring file system usage in a production system at CSC can help identify the causes of slowdowns, such as specific users or jobs. The long-term goal at CSC is to build an automatic, real-time monitoring and warning system that system administrators can use to make decisions on alleviating the slowdowns. Specifically, we monitor the usage of the Lustre parallel file system with the Lustre Jobstats feature in the Puhti cluster, which is a petascale cluster with a diverse user base. We explain the details of the Puhti cluster and our monitoring system that are necessary to understand the Lustre file system usage data. During the thesis work, we discovered data quality issues in Lustre Jobstats. The issues affected identifiers in the data, making some data unreliable and limiting our ability to build an automatic, real-time analysis. Nevertheless, we obtained a data set suitable for exploratory data analysis. We present 24 hours of monitoring data and visualize file system usage patterns at both low and high levels. Furthermore, we show that we can use file system usage data to identify causes of relative changes in I/O trends, particularly large relative increases. Finally, we explore ideas for future work on monitoring file system usage with reliable data from longer periods.

Usage

The thesis shell script converts the Markdown content to PDF via LaTeX. It depends on pandoc, texlive, texlive-latex-extra, texlive-lang-european, and rsvg-convert. We can build the supported document formats by running the thesis script with the following arguments.

bash ./thesis pdf
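
Before running the build, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch and not part of the thesis script: missing_tools is a hypothetical helper, and the executable names are assumptions (in particular pdflatex, since the README names the texlive packages but not the LaTeX engine the script invokes).

```shell
# Hypothetical helper: print every tool given as an argument that is
# not found on the PATH, one per line. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumptions; pdflatex stands in for whichever
# LaTeX engine the thesis script actually calls.
missing=$(missing_tools pandoc pdflatex rsvg-convert)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
fi
```

If the check prints nothing, all of the listed executables were found; otherwise the missing ones are written to standard error.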

We can use the preview command to rerun a build automatically whenever the metadata or content files change. It depends on inotify-tools.

bash ./thesis preview pdf
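
A watch-and-rebuild loop of this kind can be sketched with inotifywait from inotify-tools. This is an illustration of the general technique, not the actual implementation in the thesis script: rebuild_on_change is a hypothetical function, and the watched paths metadata and content are assumed to be directories in the repository root.

```shell
# Hypothetical sketch of a preview loop; the real thesis script may differ.
# Arguments: the build command to run after each detected change.
rebuild_on_change() {
    # inotifywait blocks until one matching event occurs in the watched
    # paths and then exits with status 0, so each iteration is one change.
    # The paths "metadata" and "content" are assumptions.
    while inotifywait -q -r -e modify,create,delete,move metadata content; do
        # Keep watching even if a build fails.
        "$@" || echo "build failed; waiting for the next change" >&2
    done
}

# Example (not executed here):
#   rebuild_on_change bash ./thesis pdf
```

The loop ends when inotifywait exits with a non-zero status, for example when a watched path does not exist.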

Owner

  • Name: Jaan Tollander de Balsch
  • Login: jaantollander
  • Kind: user
  • Location: Finland
  • Company: CSC - IT Center for Science

Citation (CITATION.bib)

@mastersthesis{jaan2023,
  title={Monitoring parallel file system usage in a high-performance computer cluster},
  author={Tollander de Balsch, Jaan},
  year={2023},
  month={February},
  address={Espoo, Finland},
  school={Aalto University},
}

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 502
  • Total Committers: 2
  • Avg Commits per committer: 251.0
  • Development Distribution Score (DDS): 0.022
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name | Email | Commits
Jaan Tollander de Balsch | j****h@c****i | 491
Jaan Tollander de Balsch | j****n@h****m | 11
Committer Domains (Top 20 + Academic)
hey.com: 1, csc.fi: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0