https://github.com/cluebbers/using_r_for_hpda

Exploring R for high-performance data analytics, including memory management, GPU computing, parallel processing, benchmarks, case studies, and comparisons with Python.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary

Keywords

benchmarking case-studies data-science gpu-computing high-performance-data-analytics memory-management parallel-processing python-comparison r
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: cluebbers
  • License: MIT
  • Default Branch: main
  • Homepage:
  • Size: 938 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed 9 months ago
Metadata Files
Readme License

README.md

Using R for High-Performance Data Analytics

Overview

This repository contains the seminar report and associated materials for the course "Newest Trends in High-Performance Data Analytics" at Georg-August-Universität Göttingen. The report investigates the use of R in high-performance data analytics (HPDA), focusing on memory management, GPU computing, parallel processing, and benchmarking.

Repository Structure

    ├── README.md
    ├── 2024-03-25_R_HPDA_Luebbers.pdf   # Detailed insights into leveraging R for high-performance data analytics
    └── NTHPDA.Rmd                       # R notebook containing example code and benchmarks

Report Highlights

  • Memory Management: Techniques to optimize R's memory usage for handling large datasets.
  • GPU Computing: Using GPUs to accelerate computations through R packages.
  • Parallel Processing: Methods to perform parallel computations to speed up data processing tasks.
  • Benchmarking: Evaluating the performance of various R functions and comparing them with Python.
  • Leveraging C++: Enhancing R's performance by integrating C++ code.
  • Computational Biology: Using R for high-performance data analysis in genomics and bioinformatics.
  • Comparative Analysis: Evaluating R's performance against Python for various data processing tasks.
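As a rough illustration of three of the topics above (memory use, parallel processing, and benchmarking), the following sketch uses only base R and the bundled `parallel` package. It is not taken from the report: the function `slow_square`, the synthetic inputs, and the choice of two worker processes are assumptions made for this example.

```r
library(parallel)

# --- Memory: same length, different storage type ---------------------------
x_int <- integer(1e6)   # integers are stored in 4 bytes per element
x_dbl <- numeric(1e6)   # doubles are stored in 8 bytes per element
size_int <- as.numeric(utils::object.size(x_int))
size_dbl <- as.numeric(utils::object.size(x_dbl))

# --- Parallel processing: a deliberately slow, hypothetical task -----------
slow_square <- function(i) {
  Sys.sleep(0.01)  # stand-in for real work
  i^2
}
inputs <- 1:20

# mclapply() forks worker processes. Forking is unavailable on Windows,
# where mc.cores must be 1 (the call then runs serially).
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# --- Benchmarking: elapsed wall-clock time via system.time() ---------------
t_serial <- system.time(
  res_serial <- lapply(inputs, slow_square)
)[["elapsed"]]
t_parallel <- system.time(
  res_parallel <- mclapply(inputs, slow_square, mc.cores = n_cores)
)[["elapsed"]]

cat(sprintf("integer vector: %.0f bytes, double vector: %.0f bytes\n",
            size_int, size_dbl))
cat(sprintf("serial: %.2f s, parallel (%d cores): %.2f s\n",
            t_serial, n_cores, t_parallel))
```

Single timings like these are noisy; repeating each expression several times gives a more reliable comparison.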

Code

To run the example code, you need R and the packages listed below. Rendering the notebook from the console also requires the rmarkdown package, which is therefore included in the install command.

  1. Install the required packages

    install.packages(c("forcats", "readr", "dplyr", "tidyr", "ggplot2", "tibble", "devtools", "rmarkdown"))

  2. Download the data

The data can be found here: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/natality/Nat2018us.zip

  3. Knit the R Markdown file

Open the NTHPDA.Rmd file in RStudio and click the "Knit" button to generate the HTML report. Alternatively, run the following command in your R console:

    rmarkdown::render("NTHPDA.Rmd")
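The three steps above can be combined into a single script. The following is a minimal sketch, not part of the repository: the helper names `missing_packages`, `fetch_data`, and `run_all`, the `data/` target directory, and the 600-second timeout are assumptions, and the notebook may expect the unzipped file in a different location. Including `rmarkdown` in the package list is likewise an assumption based on the `rmarkdown::render()` call.

```r
required <- c("forcats", "readr", "dplyr", "tidyr", "ggplot2",
              "tibble", "devtools", "rmarkdown")

data_url <- "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/natality/Nat2018us.zip"

# Return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(utils::installed.packages())]
}

# Download the archive (unless it is already present) and unzip it
# into `dest_dir`. The directory name is an assumption.
fetch_data <- function(url, dest_dir = "data") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(dest_dir, basename(url))
  if (!file.exists(zip_path)) {
    # The default download timeout of 60 s is often too short for large files.
    old <- options(timeout = 600)
    on.exit(options(old), add = TRUE)
    utils::download.file(url, zip_path, mode = "wb")
  }
  utils::unzip(zip_path, exdir = dest_dir)
}

# Run all three steps: install what is missing, fetch the data, knit.
run_all <- function() {
  to_install <- missing_packages(required)
  if (length(to_install) > 0) utils::install.packages(to_install)
  fetch_data(data_url)
  rmarkdown::render("NTHPDA.Rmd")
}
```

Calling `run_all()` from the repository root executes the steps in order. Installing only the missing packages avoids reinstalling ones that are already available.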

Future Work

Making gpuR work :)

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or feedback, please contact Christopher L. Lübbers at c.luebbers@stud.uni-goettingen.de.

Owner

  • Login: cluebbers
  • Kind: user
  • Location: Göttingen
  • Company: University of Göttingen

Studying Applied Data Science. Interested in Natural Language Processing.

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 6
  • Total Committers: 1
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
cluebbers 1****s 6

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0