https://github.com/cluebbers/using_r_for_hpda
Exploring R for high-performance data analytics, including memory management, GPU computing, parallel processing, benchmarks, case studies, and comparisons with Python.
README.md
Using R for High-Performance Data Analytics
Overview
This repository contains the seminar report and associated materials for the course "Newest Trends in High-Performance Data Analytics" at Georg-August-Universität Göttingen. The report investigates the use of R in high-performance data analytics (HPDA), focusing on memory management, GPU computing, parallel processing, and benchmarking.
Repository Structure
├── README.md
├── 2024-03-25_R_HPDA_Luebbers.pdf   # Detailed insights into leveraging R for high-performance data analytics
└── NTHPDA.Rmd                       # R notebook containing example code and benchmarks
Report Highlights
- Memory Management: Techniques to optimize R's memory usage for handling large datasets.
- GPU Computing: Using GPUs to accelerate computations through R packages.
- Parallel Processing: Methods to perform parallel computations to speed up data processing tasks.
- Benchmarking: Evaluating the performance of various R functions and comparing them with Python.
- Leveraging C++: Enhancing R's performance by integrating C++ code.
- Computational Biology: Using R for high-performance data analysis in genomics and bioinformatics.
- Comparative Analysis: Evaluating R's performance against Python for various data processing tasks.
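As a rough illustration of the memory, benchmarking, and parallel-processing themes listed above, the following sketch uses only base R and the bundled `parallel` package. It is not taken from the report or from NTHPDA.Rmd: the toy workload (a sum of squares), the function names, and the choice of two workers are assumptions made for this example.

```r
# Illustrative sketch only; the workload and all names are hypothetical.
library(parallel)  # ships with R, no installation needed

# Toy workload: sum of squares of 1..n, written two ways.
loop_sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2  # explicit R-level loop
  total
}
vectorized_sum_squares <- function(n) {
  x <- as.numeric(seq_len(n))
  sum(x^2)  # one vectorized call
}

n <- 1e5

# Benchmarking: system.time() reports user, system, and elapsed seconds.
loop_time <- system.time(loop_result <- loop_sum_squares(n))
vec_time <- system.time(vec_result <- vectorized_sum_squares(n))
print(rbind(loop = loop_time[1:3], vectorized = vec_time[1:3]))

# Memory: object.size() estimates the footprint of a single object.
x <- as.numeric(seq_len(n))
print(format(object.size(x), units = "auto"))

# Parallel processing: split the index range into two chunks and
# evaluate each chunk on its own worker process.
chunks <- splitIndices(n, 2)
cl <- makeCluster(2)  # two workers is an assumption; adjust to your machine
partial <- tryCatch(
  parLapply(cl, chunks, function(idx) sum(as.numeric(idx)^2)),
  finally = stopCluster(cl)  # always release the workers
)
par_result <- sum(unlist(partial))
```

For a workload this small, the cost of starting workers will likely outweigh any gain; the point is only to show the shape of the calls. Dedicated benchmarking packages give more reliable timings than a single `system.time()` call.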
Code
To run the example code in NTHPDA.Rmd, you need R installed on your system along with the packages listed below.
- Install the required packages
```r
install.packages(c("forcats", "readr", "dplyr", "tidyr", "ggplot2", "tibble", "devtools"))
```
- Download the data
The 2018 U.S. natality data set from the CDC's National Center for Health Statistics can be downloaded here: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/natality/Nat2018us.zip
- Knit the R Markdown file
Open the NTHPDA.Rmd file in RStudio and click the "Knit" button to generate the HTML report.
Alternatively, run the following command in your R console (this requires the rmarkdown package):
```r
rmarkdown::render("NTHPDA.Rmd")
```
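The three steps above could also be scripted. The sketch below is one possible way to do that, not part of the repository: the helper names (`missing_packages`, `fetch_natality`, `run_all`), the `data` destination folder, and the 10-minute download timeout are all assumptions. `rmarkdown` is added to the package list here because `rmarkdown::render()` needs it. Where NTHPDA.Rmd expects the data file to live is not stated in this README, so check the notebook before relying on the `data` folder.

```r
# Hypothetical helper script; names and paths are assumptions.
required <- c("forcats", "readr", "dplyr", "tidyr", "ggplot2",
              "tibble", "devtools", "rmarkdown")

# Return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Download the natality archive unless it is already present.
fetch_natality <- function(dest_dir = "data") {
  url <- "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/natality/Nat2018us.zip"
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(dest_dir, basename(url))
  if (!file.exists(zip_path)) {
    # The default 60-second timeout may be too short for a large archive.
    old <- options(timeout = max(600, getOption("timeout")))
    on.exit(options(old), add = TRUE)
    # mode = "wb" keeps the binary archive intact on Windows.
    download.file(url, zip_path, mode = "wb")
  }
  # Extraction is left to the reader: utils::unzip() may truncate archive
  # members of 4 GB or more, so a system unzip tool can be the safer choice.
  zip_path
}

# Install what is missing, fetch the data, then knit the notebook.
run_all <- function() {
  to_install <- missing_packages(required)
  if (length(to_install) > 0) install.packages(to_install)
  fetch_natality()
  rmarkdown::render("NTHPDA.Rmd")
}
```

Nothing runs until `run_all()` is called from the repository root, so sourcing the file is safe.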
Future Work
Making gpuR work :)
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contact
For any questions or feedback, please contact Christopher L. Lübbers at c.luebbers@stud.uni-goettingen.de.
Owner
- Login: cluebbers
- Kind: user
- Location: Göttingen
- Company: University of Göttingen
- Repositories: 1
- Profile: https://github.com/cluebbers
Studying Applied Data Science. Interested in Natural Language Processing.