https://github.com/chainsawriot/rstyle
The evolution of R programming styles.
Statistics
- Stars: 42
- Watchers: 4
- Forks: 2
- Open Issues: 0
- Releases: 0
rstyle

Citation
Please cite this as: Yen, C.Y., Chang, M.H.W., Chan, C.H. (2019) A Computational Analysis of the Dynamics of R Style Based on 94 Million Lines of Code from All CRAN Packages in the Past 20 Years. Paper presented at the useR! 2019 conference, Toulouse, France. doi:10.31235/osf.io/ts2wq
Preprint of this paper is available here.
Assumptions
- Clone the entire CRAN into the `./cran` subdirectory.[^1]

  ```sh
  rsync -rtlzv --delete cran.r-project.org::CRAN ./cran
  ```

  This requires 220 GB of disk space.
- Create `code.db` using the Makefile (skip this step if you already have `code.db`).
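Because the mirror is so large, it can help to check free space before starting `rsync`. The sketch below is a hypothetical helper, not part of this repository: `has_free_space` and `clone_cran` are illustrative names, and the 220 GB threshold is simply the figure quoted above, which you may need to raise if the mirror has grown since.

```sh
# Succeed if the filesystem holding DIR (arg 1) has at least GB (arg 2) gigabytes free.
has_free_space() {
  dir=$1
  need_gb=$2
  # df -P prints one line per filesystem; -k reports sizes in 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Hypothetical wrapper: clone CRAN into ./cran only if enough space is free.
clone_cran() {
  mkdir -p ./cran
  if has_free_space ./cran 220; then
    rsync -rtlzv --delete cran.r-project.org::CRAN ./cran
  else
    echo "Not enough free space for the CRAN mirror" >&2
    return 1
  fi
}
```

Calling `clone_cran` from the repository root either starts the sync or exits with a message, so a long transfer does not fail halfway through on a full disk.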
Files and dependencies
Key RDS files (in the `data` directory):

- `target_meta.RDS`: packages, with one randomly selected submission per year.
- `pkgsfunctionswithsyntaxfeature.RDS`: package information with syntactic features.
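To illustrate the "one randomly selected submission per year" sampling behind `target_meta.RDS`, here is a minimal sketch. It assumes that "per year" means per package and year, and it assumes a hypothetical tab-separated input of `package<TAB>year<TAB>tarball`. The repository does this in R; this shell version uses size-1 reservoir sampling in `awk` and is not the authors' code.

```sh
# Keep one random tarball per (package, year) from a TSV file (arg 1).
# Optional arg 2 is the random seed (default 42) so results are reproducible.
sample_one_per_year() {
  awk -F '\t' -v seed="${2:-42}" '
    BEGIN { srand(seed) }
    {
      key = $1 "\t" $2
      seen[key]++
      # Reservoir sampling with k = 1: the i-th tarball of a group replaces
      # the current pick with probability 1/i, so every tarball in the group
      # ends up selected with equal probability.
      if (rand() < 1 / seen[key]) pick[key] = $3
    }
    END { for (key in pick) print key "\t" pick[key] }
  ' "$1" | LC_ALL=C sort
}
```

Reservoir sampling needs only one pass and constant memory per group, which matters when the input lists every tarball in the CRAN archive.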
R files:
0prep - collecting data and sampling
- `0prep01extractmetadata.R` (requires: cloned CRAN mirror): extract metadata from tarballs. Generates `target_meta.RDS` and `final_meta.RDS` in the `data` directory.
- `cat code.sql | sqlite3 code.db`: generate the schema of the SQLite database `code.db`. Generates `code.db`.
- `0prep02_dump.R` (requires: cloned CRAN mirror, `target_meta.RDS`): dump the source code, NAMESPACEs and DESCRIPTIONs into `code.db`. Generates `code.db` with data. It is very large (> 20 GB).
- `0prep03extractdesc.R` (requires: cloned CRAN mirror, `target_meta.RDS`): add the text description to `target_meta.RDS` as a column `desc`. Generates `target_meta.RDS` (overwriting it) in the `data` directory.
1functionnames - Analysis of function names
- `1functionnames01extractfunction_name.R` (requires: `code.db`): extract the names of all exported functions from each package. Generates multiple `fxdatayr...RDS` files in the `data` directory.
- `1functionnames02functionname_analysis.R` (requires: `fxdatayr...RDS` files): analyse the style of function names by year. Generates `fxstyleby_year.RDS` in the `data` directory.
- `1functionnames03functionname_vis.R` (requires: `fxstyleby_year.RDS`): visualize the time trends of styles in function names. Generates images. (END)
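The function-name analysis assigns each exported name to a naming style. The sketch below shows one way to do that with regular expressions. The category names (`alllowercase`, `period.separated`, `underscore_separated`, `lowerCamelCase`, `UpperCamelCase`, `other`) are assumed, modeled loosely on common R naming conventions, and may not match the exact scheme used in the R scripts; `matches` and `name_style` are hypothetical helpers.

```sh
# Succeed if NAME (arg 1) fully matches the extended regex (arg 2).
# LC_ALL=C makes [a-z] and [A-Z] mean ASCII letters only.
matches() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq "$2"
}

# Print the naming style of one function name (assumed categories).
# The order of the tests matters: a name such as "mean" is checked as
# all-lowercase before any separator-based style.
name_style() {
  if matches "$1" '^[a-z][a-z0-9]*$'; then
    echo "alllowercase"
  elif matches "$1" '^[a-z][a-z0-9]*(\.[a-z0-9]+)+$'; then
    echo "period.separated"
  elif matches "$1" '^[a-z][a-z0-9]*(_[a-z0-9]+)+$'; then
    echo "underscore_separated"
  elif matches "$1" '^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$'; then
    echo "lowerCamelCase"
  elif matches "$1" '^([A-Z][a-z0-9]+)+$'; then
    echo "UpperCamelCase"
  else
    echo "other"
  fi
}
```

For example, `printf '%s\n' mean read.csv str_detect getValue | while read -r fn; do name_style "$fn"; done | sort | uniq -c` tallies names per style. Grouping such tallies by submission year gives the kind of time trend the analysis script summarizes.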
2syntax - Analysis of style elements
- `2syntax01extractfeatures.R` (requires: `target_meta.RDS`, `code.db`): extract syntactic features. This procedure is both CPU- and I/O-intensive; on a typical i5 computer, it would take a month to run. Generates `syntaxfeature_yr...RDS` files in the `data` directory.
- `2syntax02genpkgsfunctionswithsyntaxfeature.R` (requires: `syntaxfeature_yr...RDS` files): combine all the `.RDS` files into one. Generates `pkgsfunctionswithsyntaxfeature.RDS`.
- `2syntax03_vis.R` (requires: `pkgsfunctionswithsyntaxfeature.RDS`): visualize the time trends of syntactic features. Generates images. (END)
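As a rough illustration of what a syntactic style feature is, the sketch below counts lines using `<-` assignment versus top-level `=` assignment in R source. This is a crude line-based regex approximation chosen for brevity, not the repository's method: it also counts `<-` inside comments and strings, and a faithful analysis would inspect parsed R code instead. `assignment_counts` is a hypothetical name.

```sh
# Count lines using "<-" assignment and lines starting with "name = value"
# in R code read from a file (arg 1) or from standard input ("-" or no arg).
assignment_counts() {
  awk '
    # Any "<-" on the line (also matches "<<-"; crude by design).
    /<-/ { arrow++ }
    # A bare identifier at the start of the line followed by a single "=",
    # which excludes "==" comparisons and named arguments such as f(a = 1).
    /^[ \t]*[A-Za-z.][A-Za-z0-9._]*[ \t]*=[^=]/ { equals++ }
    END { printf "arrow=%d equals=%d\n", arrow, equals }
  ' "${1:--}"
}
```

For example, `printf 'x <- 1\ny = 2\n' | assignment_counts` reports one line of each style.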
3linelength - Analysis of line length
- `3linelength01_extraction.R` (requires: `code.db`): generate `comment_dist.RDS` in the `data` directory.
- `3linelength02_animation.R` (requires: `comment_dist.RDS`): generate a Shiny app.
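A simple line-length statistic is the share of lines exceeding a limit such as 80 characters. The sketch below computes it for one R file. `long_line_share` is a hypothetical helper and the 80-character default is an assumption; the repository's extraction script, which stores its results in `comment_dist.RDS`, may measure something different.

```sh
# Print "long/total": lines longer than MAX characters (arg 2, default 80)
# out of all lines in a file (arg 1) or standard input ("-" or no arg).
long_line_share() {
  awk -v max="${2:-80}" '
    { total++; if (length($0) > max + 0) long++ }
    END { printf "%d/%d\n", long, total }
  ' "${1:--}"
}
```

Note that some `awk` implementations count bytes rather than characters in `length()`, so lines with non-ASCII text may be over-counted.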
4communities - Community-based analysis
- `4communities01extractcran_dependency.R` (requires: `code.db`): extract the dependencies of packages from CRAN. Generates `cran_dependency.RDS`. (END)
- `4communities02buildcran_graph.R` (requires: `cran_dependency.RDS`): build the CRAN dependency graph based on two fields, namely `Imports` and `Suggests`. Generates `crangraph.RDS`. (END)
- `4communities03detectcrancommunityby_walktrap.R` (requires: `crangraph.RDS`): detect CRAN communities using the walktrap algorithm. Generates `commwalktrap.RDS` and `comm_size.RDS`. It also examines the robustness of the identified communities with respect to the choice of random seed. (END)
- `4communities04communitybasedfeaturescorrection.R` (requires: `pkgsfunctionswithsyntaxfeature.RDS`, `commwalktrap.RDS`, `comm_size.RDS`, `commname.csv`): assign a community label to each package, so that package-level summaries of syntax and naming features can be used to analyze style variation among communities. Applies only to the 20 largest communities. Generates `commlargest_feature.RDS`. (END)
- `4communities05viscommunityposterimages.R` (requires: `commlargest_feature.RDS`, `crangraph.RDS`, `commwalktrap.RDS`, `comm_size.RDS`, `namingconvention.csv`): visualize the community-related analysis. (END)
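The dependency graph is built from the `Imports` and `Suggests` fields of each package's DESCRIPTION file. The sketch below turns one DESCRIPTION file into edge lines of the form `package<TAB>dependency<TAB>field`. It is a hypothetical helper (`dependency_edges`), not the repository's R code, and it assumes the standard DESCRIPTION layout in which continuation lines start with whitespace.

```sh
# Print one "package<TAB>dependency<TAB>field" line for each entry in the
# Imports and Suggests fields of a DESCRIPTION file (arg 1).
dependency_edges() {
  awk '
    # "Field: value" opens a new field; indented lines continue the last one.
    /^[A-Za-z0-9._@-]+:/ {
      field = substr($0, 1, index($0, ":") - 1)
      value[field] = substr($0, index($0, ":") + 1)
      next
    }
    /^[ \t]/ { value[field] = value[field] " " $0 }
    END {
      pkg = value["Package"]
      gsub(/[ \t]/, "", pkg)
      split("Imports Suggests", wanted, " ")
      for (w = 1; w <= 2; w++) {
        f = wanted[w]
        n = split(value[f], deps, ",")
        for (i = 1; i <= n; i++) {
          d = deps[i]
          sub(/\(.*\)/, "", d)     # drop version constraints such as (>= 1.0)
          gsub(/[ \t\n]/, "", d)   # trim surrounding whitespace
          if (d != "") printf "%s\t%s\t%s\n", pkg, d, f
        }
      }
    }
  ' "$1"
}
```

Concatenating the output over all packages yields an edge list that a graph library can load before running a community detection algorithm such as walktrap.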
5conversion - Convert key RDS files to csv for preservation
- `5conversion01makecsv.R` (requires: `target_meta.RDS`, `pkgsfunctionswithsyntax_feature.RDS`): convert the RDS files to CSV. (END)
Related projects
- baaugwo: this project depends on this experimental package to extract metadata and dump code from R packages.
How to use Docker to build and launch the Docker instance
Build the Docker image using the provided Dockerfile. It is much faster to build the image inside the `docker` directory, because less data is copied.

```sh
cd docker/ ; docker build -t rstudio/rstyle -f Dockerfile . ; cd ../ ;
```
By default, Docker launches RStudio Server and mounts folders as the root user, which leaves the user `rstudio` (the default user of RStudio Server) without write access.
One solution to this problem is to make Docker launch RStudio Server with the current user's UID:

```sh
docker run -v $(pwd):/home/$USER/rstyle -e USER=$USER -e PASSWORD=xxxx -e USERID=$UID -p 8787:8787 rstudio/rstyle
```
Alternatively, you can launch a development dashboard by executing the following command:

```sh
bash dev-tmux.sh
```
If you are developing under Windows Subsystem for Linux (WSL), you may encounter a problem where Docker cannot see the folder you mounted in the container. In that case, try soft-linking `/mnt/c/` to the root directory as illustrated in this blog post.
Then clone this repository anywhere inside `/c/Users/{YOUR_USERNAME}` and set the `PATH_RSTYLE` environment variable as shown below, so that you can launch the dashboard successfully.

```sh
PATH_RSTYLE=/c/Users/{YOUR_USERNAME}/{PATH_TO_RSTYLE}/rstyle bash dev-tmux.sh
```
Label the names of identified communities by walktrap algorithm
We manually assigned a name to each of the largest identified communities based on its 3 most important package members. We prioritized the importance of packages within a community using the PageRank algorithm.
| Community ID | Community name | Top 3 packages (by PageRank) |
|-------:|:----------------------|:-----------------------------------|
| 6 | base | methods, stats, MASS |
| 4 | Rstudio | testthat, knitr, rmarkdown |
| 28 | Rcpp | Rcpp, tinytest, pinp |
| 3 | Statistical Analysis | survival, Formula, sandwich |
| 9 | Machine Learning | nnet, rpart, randomForest |
| 16 | Geography 1 | sp, rgdal, maptools |
| 15 | GNU | gsl, expint, mnormt |
| 25 | Bioconductor: Graph | graph, Rgraphviz, bnlearn |
| 49 | Text Analysis | tm, SnowballC, NLP |
| 42 | GUI | tcltk, tkrplot, tcltk2 |
| 13 | Infrastructure 1 | rsp, listenv, globals |
| 17 | Numerical Optimization | polynom, magic, numbers |
| 40 | Bioconductor: Genomics | Biostrings, IRanges, S4Vectors |
| 77 | RUnit | RUnit, ADGofTest, fAsianOptions |
| 24 | Survival Analysis | kinship2, CompQuadForm, coxme |
| 2 | Sparse Matrix | slam, ROI, registry |
| 44 | Infrastructure 2 | RGtk2, gWidgetstcltk, gWidgetsRGtk2 |
| 75 | Bioinformatics | limma, affy, marray |
| 37 | IO | RJSONIO, Rook, base64 |
| 45 | rJava | rJava, xlsxjars, openNLP |
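Once PageRank scores are available, picking the top members of each community is a sort-and-filter step. The sketch below assumes the scores have already been computed elsewhere and written to a hypothetical tab-separated file with the layout `comm_id<TAB>package<TAB>pagerank`, with scores in plain decimal notation. `top_members` is an illustrative helper, not part of the repository.

```sh
# Print "comm_id<TAB>pkg1, pkg2, ..." listing the N highest-scoring packages
# (arg 2, default 3) of each community from a TSV file (arg 1).
top_members() {
  n=${2:-3}
  # Sort by community ID, then by PageRank score in descending order.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k3,3nr "$1" |
    awk -F '\t' -v n="$n" '
      # count[$1]++ returns the value before incrementing, so this keeps
      # exactly the first n rows of each community.
      count[$1]++ < n { top[$1] = (top[$1] == "" ? $2 : top[$1] ", " $2) }
      END { for (c in top) print c "\t" top[c] }
    ' | LC_ALL=C sort -n
}
```

The community names in the table above were then assigned manually from lists like these.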
[^1]: CRAN mirror HOWTO/FAQ
Owner
- Login: chainsawriot
- Kind: user
- Location: Germany
- Company: @gesistsa
- Website: http://www.chainsawriot.com
- Repositories: 241
- Profile: https://github.com/chainsawriot
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 16
- Total pull requests: 16
- Average time to close issues: 7 months
- Average time to close pull requests: 4 months
- Total issue authors: 3
- Total pull request authors: 2
- Average comments per issue: 1.44
- Average comments per pull request: 0.13
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- yenchiayi (9)
- chainsawriot (6)
- pymia (1)
Pull Request Authors
- yenchiayi (11)
- pymia (5)