gdsfmt
R Interface to CoreArray Genomic Data Structure (GDS) Files (Development version only)
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 8 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 6 committers (16.7%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: zhengxwen
- Language: C
- Default Branch: master
- Homepage: http://www.bioconductor.org/packages/gdsfmt
- Size: 13.5 MB
Statistics
- Stars: 20
- Watchers: 3
- Forks: 4
- Open Issues: 12
- Releases: 14
Metadata Files
README.md
gdsfmt: R Interface to CoreArray Genomic Data Structure (GDS) files
GNU Lesser General Public License, LGPL-3
Features
This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are portable across platforms and use a hierarchical structure to store multiple scalable array-oriented data sets with metadata. It is suited to large-scale datasets, especially data much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, such as a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes via the parallel package.
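As a minimal sketch of the packed storage and random access described above, the following stores a made-up genotype matrix at 2 bits per value with compression, then reads back a small block. The matrix, its dimensions, the node name `genotype`, and the choice of `compress="ZIP_RA"` are assumptions for illustration only.

```R
library(gdsfmt)

# Temporary file so the sketch leaves nothing behind
fn <- tempfile(fileext=".gds")
f <- createfn.gds(fn)

# Hypothetical genotype matrix: 100 SNPs x 50 samples, values in 0..3
set.seed(1)
geno <- matrix(sample(0:3, 100 * 50, replace=TRUE), nrow=100, ncol=50)

# Store 2 bits per value with compression; closezip=TRUE finishes the
# compression so the node can be read immediately
node <- add.gdsn(f, "genotype", val=geno, storage="bit2",
    compress="ZIP_RA", closezip=TRUE)

# Random access: read a 10 x 5 block starting at row 21, column 11
block <- read.gdsn(node, start=c(21, 11), count=c(10, 5))

closefn.gds(f)
unlink(fn)
```

Only the requested block is returned to R, which is what makes this pattern usable when the full array would not fit in memory.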
Bioconductor:
Release Version: v1.44.1
http://www.bioconductor.org/packages/release/bioc/html/gdsfmt.html
Package Vignettes
http://bioconductor.org/packages/release/bioc/vignettes/gdsfmt/inst/doc/gdsfmt.html
Citations
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.
Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. Bioinformatics. DOI: 10.1093/bioinformatics/btx145.
Package Maintainer
Dr. Xiuwen Zheng
URL
https://bioconductor.org/packages/gdsfmt
https://github.com/zhengxwen/gdsfmt
Installation
Bioconductor repository:
```R
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("gdsfmt")
```
Development version from GitHub (for developers/testers only):
```R
library("devtools")
install_github("zhengxwen/gdsfmt")
```
The `install_github()` approach requires that you build from source, i.e. `make` and compilers must be installed on your system; see the R FAQ for your operating system. You may also need to install dependencies manually.
Copyright Notice
- CoreArray C++ library, LGPL-3 License, 2007-2021, Xiuwen Zheng
- zlib, zlib License, 1995-2017, Jean-loup Gailly and Mark Adler
- LZ4, BSD 2-clause License, 2011-2019, Yann Collet
- liblzma, public domain, 2005-2018, Lasse Collin and other xz contributors
- README
GDS Command-line Tools
In the R environment:
```R
install.packages("getopt", repos="http://cran.r-project.org")
install.packages("optparse", repos="http://cran.r-project.org")
install.packages("crayon", repos="http://cran.r-project.org")

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("gdsfmt")
```
viewgds
viewgds is a shell script written in R (viewgds.R), to view the contents of a GDS file. The R packages gdsfmt, getopt and optparse should be installed before running viewgds, and the package crayon is optional.
Usage: viewgds [options] file
Installation with command line:
```sh
curl -L https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/viewgds.R > viewgds
chmod +x viewgds

# Or
wget -qO- --no-check-certificate https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/viewgds.R > viewgds
chmod +x viewgds
```
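For comparison, a GDS file can also be inspected from within an R session. The sketch below is not how viewgds itself is implemented; it only assumes the gdsfmt functions `openfn.gds()`, `ls.gdsn()` and `index.gdsn()`, and uses a small throwaway file with made-up node names.

```R
library(gdsfmt)

# Build a small throwaway file to inspect
fn <- tempfile(fileext=".gds")
f <- createfn.gds(fn)
add.gdsn(f, "int", val=1:100)
folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "double", val=seq(1, 10, 0.5))
closefn.gds(f)

# Reopen (read-only by default) and look at the hierarchy
f <- openfn.gds(fn)
print(f)                                  # tree view of the file
top <- ls.gdsn(f)                         # names of the top-level nodes
sub <- ls.gdsn(index.gdsn(f, "folder"))   # names inside the folder

closefn.gds(f)
unlink(fn)
```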
diffgds
diffgds is a shell script written in R (diffgds.R), to compare two GDS files. The R packages gdsfmt, getopt and optparse should be installed before running diffgds.
Usage: diffgds [options] file1 file2
Installation with command line:
```sh
curl -L https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/diffgds.R > diffgds
chmod +x diffgds

# Or
wget -qO- --no-check-certificate https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/diffgds.R > diffgds
chmod +x diffgds
```
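A rough idea of what comparing two GDS files involves can be sketched directly in R. This is an illustrative approximation, not the logic of diffgds: it compares only top-level array nodes with the same name, and the helper `make_gds()` and the node names `x` and `y` are hypothetical.

```R
library(gdsfmt)

# Hypothetical helper: write a small GDS file with two array nodes
make_gds <- function(fn, x, y) {
    f <- createfn.gds(fn)
    add.gdsn(f, "x", val=x)
    add.gdsn(f, "y", val=y)
    closefn.gds(f)
    fn
}

fn1 <- make_gds(tempfile(fileext=".gds"), 1:10, c(0.5, 1.5))
fn2 <- make_gds(tempfile(fileext=".gds"), 1:10, c(0.5, 2.5))

f1 <- openfn.gds(fn1)
f2 <- openfn.gds(fn2)

# Compare the nodes present in both files, matched by name
common <- intersect(ls.gdsn(f1), ls.gdsn(f2))
same <- vapply(common, function(nm) {
    identical(read.gdsn(index.gdsn(f1, nm)), read.gdsn(index.gdsn(f2, nm)))
}, logical(1))

closefn.gds(f1)
closefn.gds(f2)
unlink(c(fn1, fn2))
```

Here `same` is a named logical vector with one entry per shared node; folders and nodes present in only one file would need separate handling.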
Examples
```R
library(gdsfmt)

# create a GDS file
f <- createfn.gds("test.gds")

add.gdsn(f, "int", val=1:10000)
add.gdsn(f, "double", val=seq(1, 1000, 0.4))
add.gdsn(f, "character", val=c("int", "double", "logical", "factor"))
add.gdsn(f, "logical", val=rep(c(TRUE, FALSE, NA), 50))
add.gdsn(f, "factor", val=as.factor(c(NA, "AA", "CC")))
add.gdsn(f, "bit2", val=sample(0:3, 1000, replace=TRUE), storage="bit2")

# list and data.frame
add.gdsn(f, "list", val=list(X=1:10, Y=seq(1, 10, 0.25)))
add.gdsn(f, "data.frame", val=data.frame(X=1:19, Y=seq(1, 10, 0.5)))

folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "int", val=1:1000)
add.gdsn(folder, "double", val=seq(1, 100, 0.4))

# show the contents
f

# close the GDS file
closefn.gds(f)
```
File: test.gds (1.1K)
+ [ ]
|--+ int { Int32 10000, 39.1K }
|--+ double { Float64 2498, 19.5K }
|--+ character { Str8 4, 26B }
|--+ logical { Int32,logical 150, 600B } *
|--+ factor { Int32,factor 3, 12B } *
|--+ bit2 { Bit2 1000, 250B }
|--+ list [ list ] *
| |--+ X { Int32 10, 40B }
| \--+ Y { Float64 37, 296B }
|--+ data.frame [ data.frame ] *
| |--+ X { Int32 19, 76B }
| \--+ Y { Float64 19, 152B }
\--+ folder [ ]
   |--+ int { Int32 1000, 3.9K }
   \--+ double { Float64 248, 1.9K }
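Once such a file exists, nodes can be looked up by path and read back. The sketch below rebuilds a reduced version of the file in a temporary location (an assumption, so it does not depend on `test.gds`) and reads both a slice and a whole node.

```R
library(gdsfmt)

# Recreate two of the nodes from the example in a temporary file
fn <- tempfile(fileext=".gds")
f <- createfn.gds(fn)
add.gdsn(f, "int", val=1:10000)
folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "double", val=seq(1, 100, 0.4))
closefn.gds(f)

# Reopen read-only and fetch nodes by path
f <- openfn.gds(fn)
n1 <- index.gdsn(f, "int")
head_vals <- read.gdsn(n1, start=1, count=5)   # first five values only

n2 <- index.gdsn(f, "folder/double")           # path into the folder
dbl <- read.gdsn(n2)                           # whole node
dims <- objdesp.gdsn(n2)$dim                   # dimension from the node description

closefn.gds(f)
unlink(fn)
```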
Also See
pygds: Python interface to CoreArray Genomic Data Structure (GDS) files
jugds.jl: Julia interface to CoreArray Genomic Data Structure (GDS) files
Owner
- Name: Xiuwen Zheng
- Login: zhengxwen
- Kind: user
- Location: Chicago
- Repositories: 13
- Profile: https://github.com/zhengxwen
GitHub Events
Total
- Issues event: 2
- Watch event: 2
- Issue comment event: 6
- Push event: 10
Last Year
- Issues event: 2
- Watch event: 2
- Issue comment event: 6
- Push event: 10
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Xiuwen Zheng | z****n@g****m | 471 |
| x.zheng | x****g@b****8 | 66 |
| d.tenenbaum | d****m@b****8 | 4 |
| Bioconductor Git-SVN Bridge | b****c@b****g | 3 |
| Nathan Weeks | w****s@i****u | 2 |
| m.carlson | m****n@b****8 | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 46
- Total pull requests: 3
- Average time to close issues: 28 days
- Average time to close pull requests: about 2 hours
- Total issue authors: 21
- Total pull request authors: 2
- Average comments per issue: 1.63
- Average comments per pull request: 0.33
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: about 2 months
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 4.5
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- zhengxwen (14)
- sariya (2)
- smgogarten (1)
- bastistician (1)
- 229668880 (1)
- Sebass-DP (1)
- liqg (1)
- The-Jacob-Lopez (1)
- kforner (1)
- rsbivand (1)
- splaisan (1)
- qindan2008 (1)
- connorourke (1)
- rbutleriii (1)
- thierrygosselin (1)
Pull Request Authors
- zhengxwen (2)
- nathanweeks (1)
Packages
- Total packages: 1
- Total downloads: bioconductor 262,938 total
- Total dependent packages: 18
- Total dependent repositories: 0
- Total versions: 9
- Total maintainers: 1
bioconductor.org: gdsfmt
R Interface to CoreArray Genomic Data Structure (GDS) Files
- Homepage: https://github.com/zhengxwen/gdsfmt
- Documentation: https://bioconductor.org/packages/release/bioc/vignettes/gdsfmt/inst/doc/gdsfmt.pdf
- License: LGPL-3
- Latest release: 1.44.1 (published 6 months ago)
Dependencies
- R >= 2.15.0 depends
- methods * depends
- BiocGenerics * suggests
- Matrix * suggests
- RUnit * suggests
- crayon * suggests
- digest * suggests
- knitr * suggests
- markdown * suggests
- parallel * suggests
- rmarkdown * suggests
- actions/checkout v3 composite
- r-lib/actions/setup-r f57f1301a053485946083d7a45022b278929a78a composite