preparing-your-mainframe-data-for-machine-learning

Mainframe Data Wrangling: Preparing Your Mainframe Data for Machine Learning

https://github.com/joshuapowell/preparing-your-mainframe-data-for-machine-learning

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary

Keywords

api-usability data-engineering data-wrangling large-scale-computing machine-learning mainframe

Repository

Basic Info
  • Host: GitHub
  • Owner: joshuapowell
  • License: apache-2.0
  • Language: TeX
  • Default Branch: main
  • Homepage:
  • Size: 178 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

Mainframe Data Wrangling: Preparing Your Data for Use in Machine Learning Models

Mainframe computing continues to drive the global economy, with forty-five of the world's top fifty banks [1] handling critical transaction data through the IBM Z mainframe platform. While recent research highlights the importance of mainframe modernization rather than replacement [2], enterprises struggle to effectively utilize mainframe data for automation and optimization due to data-driven and communication-driven failures [3]. This challenge creates a significant gap between available mainframe capabilities and realized business value [4].

To address this gap, we conducted semi-structured interviews with eighteen participants across three roles: mainframe subject matter experts (SMEs) [n=6], mainframe individual contributor end-users [n=9], and mainframe people manager end-users [n=3]. The study, conducted between [dates], investigated two research questions: [RQ1] What are the primary use cases in which mainframe network and network security data impact mean time to resolution (MTTR) in top fifty banks? [RQ2] What mainframe data sources and methods do end-users employ to resolve these network and network security issues?

Analysis revealed that 91% of participants [5] encountered data quality or completeness issues that impeded network problem resolution. This tutorial demonstrates how end-users can use exploratory data analysis techniques [6], mainframe data APIs [7], and open-source data science tools [8] to prepare data for advanced analytics and machine learning applications. The presented methodology aims to reduce MTTR by addressing the identified data-driven and communication-driven failure points [9], with a specific focus on network security use cases [10-13].
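As a rough illustration of the kind of completeness check that exploratory data analysis usually starts with, the sketch below computes the share of missing values per field and filters out incomplete records. It is a minimal sketch, not the tutorial's own code: the sample records, the field names (`timestamp`, `src_ip`, `bytes`), and the helper functions are all hypothetical and do not correspond to any real mainframe data source or API.

```python
from collections import Counter

# Hypothetical network event records; the field names are illustrative
# assumptions, not a real mainframe schema.
records = [
    {"timestamp": "2024-04-01T10:00:00", "src_ip": "10.0.0.1", "bytes": "512"},
    {"timestamp": "2024-04-01T10:00:05", "src_ip": "", "bytes": "2048"},
    {"timestamp": "", "src_ip": "10.0.0.3", "bytes": None},
    {"timestamp": "2024-04-01T10:00:15", "src_ip": "10.0.0.4", "bytes": "128"},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def missing_rates(rows):
    """Return, for each field, the fraction of rows where it is missing."""
    if not rows:
        return {}
    # Collect every field seen in any row so absent keys count as missing.
    fields = sorted({key for row in rows for key in row})
    missing = Counter()
    for row in rows:
        for field in fields:
            if is_missing(row.get(field)):
                missing[field] += 1
    return {field: missing[field] / len(rows) for field in fields}


def complete_rows(rows):
    """Keep only the rows in which every known field has a value."""
    fields = {key for row in rows for key in row}
    return [
        row for row in rows
        if not any(is_missing(row.get(field)) for field in fields)
    ]


# Report per-field missing rates and how many rows survive the filter.
for field, rate in missing_rates(records).items():
    print(f"{field}: {rate:.0%} missing")
print(f"{len(complete_rows(records))} of {len(records)} rows are complete")
```

Dropping incomplete rows is only one option; depending on the use case, imputing values or flagging the gaps for follow-up with the data owner may be more appropriate.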

Keywords: Data Engineering, Data Wrangling, API Usability, API Onboarding, Mainframe, Large-scale Computing, Machine Learning

References

  1. IBM (International Business Machines Corporation). (2023). IBM 2023 Annual Report.
  2. Wishart-Smith, H. (2024, November 13). Mainframes: the backbone of the worldwide economy. Forbes.
  3. Ryseff, J., de Bruhl, B., & Newberry, S. J. (2024). The Root Causes of Failure for Artificial Intelligence Projects and How They Can Succeed: Avoiding the Anti-Patterns of AI.
  4. IBM z/OS operating system. (Accessed: October 28, 2024). https://www.ibm.com/products/zos
  5. McGregor, S. E. (2022). Practical Python Data Wrangling and Data Quality. http://oreilly.com
  6. Powell, J. I. (2024, April). Internal study. Broadcom Mainframe Software Division.
  7. Alam, A., Bales, R., Dumir, V., Kunze, N., Li, J., Mishra, S., Rivera, E., Wan, M., & Yu, Y. (2024). Turning Data into Insight with Machine Learning for IBM z/OS (First). International Business Machines Corporation.
  8. Broadcom Mainframe Developer Portal. (Accessed: October 28, 2024). https://integration.mainframe.broadcom.com/
  9. Harrell, M. (2024). Mainframe Application Developer Study.
  10. Kanvar, V., Tamilselvam, S., & Raghunath, K. N. (2024, August 8). Enabling communication via APIs for mainframe applications. arXiv.org. https://arxiv.org/abs/2408.04230
  11. Dau, A. T. V., Dao, H. T., Nguyen, A. T., Tran, H. T., Nguyen, P. X., & Bui, N. D. Q. (2024, August 5). XMainframe: a large language model for mainframe modernization. arXiv.org. https://arxiv.org/abs/2408.04660
  12. Raju, J. (2024). Modernizing Mainframe Workloads in Banking: Embracing the Power of Hyperscalers. International Journal of Computer Engineering and Technology (IJCET), 15(5), 366-374.
  13. Raju, J. (2024). AI-Driven Transformation of Mainframe Environments: A Comprehensive Framework for Operational Resilience. International Journal of Engineering and Technology Research (IJETR), 9(2), 420-433.

Owner

  • Name: Joshua Powell
  • Login: joshuapowell
  • Kind: user
  • Location: Pittsburgh, PA
  • Company: @broadcom

Researcher and engineer with deep expertise developing data products

Citation (CITATION.cff)

cff-version: 1.2.0
message: "Powell, J. I. & Broadcom Mainframe Software. (2024, November 6). Mainframe data wrangling: Preparing your mainframe data for use in machine learning models. [Conference Tutorial]. Guide Share Europe (GSE) UK Conference 2024, United Kingdom. DOI: 00.0000/00000000.0000.0000000"
authors:
- family-names: "Powell"
  given-names: "J.I."
  orcid: "https://orcid.org/0000-0002-0894-2399"
title: "Mainframe data wrangling: Preparing your mainframe data for use in machine learning models"
version: 0.0.1
doi: "00.0000/00000000.0000.0000000"
date-released: 2024-11-06
url: "https://github.com/joshuapowell/preparing-your-mainframe-data-for-machine-learning"

GitHub Events

Total
  • Push event: 14
  • Create event: 2
Last Year
  • Push event: 14
  • Create event: 2