gdsfmt

R Interface to CoreArray Genomic Data Structure (GDS) Files (Development version only)

https://github.com/zhengxwen/gdsfmt

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary

Keywords

bioinformatics gds-format genomics infrastructure r

Keywords from Contributors

bioconductor derfinder rnaseq bioconductor-package fluorescence microscopy particles tracking ncbi-geo feature-detection
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 20
  • Watchers: 3
  • Forks: 4
  • Open Issues: 12
  • Releases: 14
Topics
bioinformatics gds-format genomics infrastructure r
Created over 11 years ago · Last pushed 8 months ago

README.md

gdsfmt: R Interface to CoreArray Genomic Data Structure (GDS) files

License: GNU Lesser General Public License, LGPL-3


Features

This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS files are portable across platforms and use a hierarchical structure to store multiple scalable, array-oriented data sets together with their metadata. The format is suited to large-scale datasets, especially data much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, such as a single-nucleotide polymorphism (SNP) genotype, usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, as supported by the parallel package.
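As a rough illustration of the sub-byte storage and compressed random access described above, the following sketch writes a 2-bit integer array with zlib compression and then reads a small slice of it. The node name `geno`, the temporary file, and the sample values are assumptions made for this example only.

```R
library(gdsfmt)

# Write to a temporary file so the sketch leaves nothing behind
fn <- tempfile(fileext = ".gds")
f <- createfn.gds(fn)

# Genotype-like values in 0..3 fit in 2 bits each; compress them with zlib.
# closezip=TRUE finishes the compressed stream so the node can be read back.
geno <- rep(0:3, times = 2500)
node <- add.gdsn(f, "geno", val = geno, storage = "bit2",
    compress = "ZIP", closezip = TRUE)

# Random access: read 5 values starting at position 101
# without loading the whole array into memory
part <- read.gdsn(node, start = 101, count = 5)
print(part)

closefn.gds(f)
unlink(fn)
```

Other compression methods and storage modes are listed in the `add.gdsn()` help page; which one works best depends on the data.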

Bioconductor:

Release Version: v1.44.1

http://www.bioconductor.org/packages/release/bioc/html/gdsfmt.html

Help Documents

News

Package Vignettes

http://bioconductor.org/packages/release/bioc/vignettes/gdsfmt/inst/doc/gdsfmt.html

Citations

Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.

Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. Bioinformatics. DOI: 10.1093/bioinformatics/btx145.

Package Maintainer

Dr. Xiuwen Zheng

URL

https://bioconductor.org/packages/gdsfmt

https://github.com/zhengxwen/gdsfmt

Installation

  • Bioconductor repository:

        if (!requireNamespace("BiocManager", quietly=TRUE))
            install.packages("BiocManager")
        BiocManager::install("gdsfmt")

  • Development version from GitHub (for developers/testers only):

        library("devtools")
        install_github("zhengxwen/gdsfmt")

    The install_github() approach requires that you build from source, i.e. make and compilers must be installed on your system -- see the R FAQ for your operating system; you may also need to install dependencies manually.

Copyright Notice

  • CoreArray C++ library, LGPL-3 License, 2007-2021, Xiuwen Zheng
  • zlib, zlib License, 1995-2017, Jean-loup Gailly and Mark Adler
  • LZ4, BSD 2-clause License, 2011-2019, Yann Collet
  • liblzma, public domain, 2005-2018, Lasse Collin and other xz contributors
  • README

GDS Command-line Tools

In the R environment:

```R
install.packages("getopt", repos="http://cran.r-project.org")
install.packages("optparse", repos="http://cran.r-project.org")
install.packages("crayon", repos="http://cran.r-project.org")

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("gdsfmt")
```


viewgds

viewgds is a shell script written in R (viewgds.R), to view the contents of a GDS file. The R packages gdsfmt, getopt and optparse should be installed before running viewgds, and the package crayon is optional.

Usage: viewgds [options] file

Installation from the command line:

```sh
curl -L https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/viewgds.R > viewgds
chmod +x viewgds

# Or

wget -qO- --no-check-certificate https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/viewgds.R > viewgds
chmod +x viewgds
```

diffgds

diffgds is a shell script written in R (diffgds.R), to compare two GDS files. The R packages gdsfmt, getopt and optparse should be installed before running diffgds.

Usage: diffgds [options] file1 file2

Installation from the command line:

```sh
curl -L https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/diffgds.R > diffgds
chmod +x diffgds

# Or

wget -qO- --no-check-certificate https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/diffgds.R > diffgds
chmod +x diffgds
```
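For readers who want to compare files from within R rather than from the shell, a comparison can be sketched with gdsfmt calls alone. This is not how diffgds is implemented; `make_gds()` and `same_top_level()` are hypothetical helpers written for this example, and the check only covers top-level array nodes (it does not descend into folders or compare attributes).

```R
library(gdsfmt)

# Hypothetical helper: write a small GDS file holding one integer node
make_gds <- function(vals) {
    fn <- tempfile(fileext = ".gds")
    f <- createfn.gds(fn)
    add.gdsn(f, "int", val = vals)
    closefn.gds(f)
    fn
}

# Hypothetical helper: TRUE if both files have the same top-level node
# names and every such node holds identical values
same_top_level <- function(fn1, fn2) {
    f1 <- openfn.gds(fn1)
    on.exit(closefn.gds(f1), add = TRUE)
    f2 <- openfn.gds(fn2)
    on.exit(closefn.gds(f2), add = TRUE)

    n1 <- ls.gdsn(f1)
    n2 <- ls.gdsn(f2)
    if (!identical(n1, n2)) return(FALSE)

    all(vapply(n1, function(nm) {
        identical(read.gdsn(index.gdsn(f1, nm)),
                  read.gdsn(index.gdsn(f2, nm)))
    }, logical(1)))
}

a <- make_gds(1:10)
b <- make_gds(1:10)
d <- make_gds(10:1)
print(same_top_level(a, b))
print(same_top_level(a, d))
unlink(c(a, b, d))
```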

Examples

```R
library(gdsfmt)

# create a GDS file
f <- createfn.gds("test.gds")

add.gdsn(f, "int", val=1:10000)
add.gdsn(f, "double", val=seq(1, 1000, 0.4))
add.gdsn(f, "character", val=c("int", "double", "logical", "factor"))
add.gdsn(f, "logical", val=rep(c(TRUE, FALSE, NA), 50))
add.gdsn(f, "factor", val=as.factor(c(NA, "AA", "CC")))
add.gdsn(f, "bit2", val=sample(0:3, 1000, replace=TRUE), storage="bit2")

# list and data.frame
add.gdsn(f, "list", val=list(X=1:10, Y=seq(1, 10, 0.25)))
add.gdsn(f, "data.frame", val=data.frame(X=1:19, Y=seq(1, 10, 0.5)))

folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "int", val=1:1000)
add.gdsn(folder, "double", val=seq(1, 100, 0.4))

# show the contents
f

# close the GDS file
closefn.gds(f)
```

    File: test.gds (1.1K)
    +    [  ]
    |--+ int   { Int32 10000, 39.1K }
    |--+ double   { Float64 2498, 19.5K }
    |--+ character   { Str8 4, 26B }
    |--+ logical   { Int32,logical 150, 600B } *
    |--+ factor   { Int32,factor 3, 12B } *
    |--+ bit2   { Bit2 1000, 250B }
    |--+ list   [ list ] *
    |  |--+ X   { Int32 10, 40B }
    |  \--+ Y   { Float64 37, 296B }
    |--+ data.frame   [ data.frame ] *
    |  |--+ X   { Int32 19, 76B }
    |  \--+ Y   { Float64 19, 152B }
    \--+ folder   [  ]
       |--+ int   { Int32 1000, 3.9K }
       \--+ double   { Float64 248, 1.9K }
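Reading the data back is the natural next step after creating a file. The sketch below builds a small file with two of the nodes from the example above, reopens it, and reads whole and partial arrays by path. The temporary file name and the variable names are assumptions made for illustration.

```R
library(gdsfmt)

# Build a small file similar to the one above, in a temporary location
fn <- tempfile(fileext = ".gds")
f <- createfn.gds(fn)
add.gdsn(f, "int", val = 1:10000)
folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "double", val = seq(1, 100, 0.4))
closefn.gds(f)

# Reopen the file (read-only by default)
f <- openfn.gds(fn)

# Read a whole node located by name
int_vals <- read.gdsn(index.gdsn(f, "int"))

# Locate a nested node by path, then read only its first three values
dbl_node <- index.gdsn(f, "folder/double")
dbl_head <- read.gdsn(dbl_node, start = 1, count = 3)

# Inspect the node description (dimension, storage mode, etc.)
desc <- objdesp.gdsn(dbl_node)
print(desc$dim)

closefn.gds(f)
unlink(fn)
```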

Also See

pygds: Python interface to CoreArray Genomic Data Structure (GDS) files

jugds.jl: Julia interface to CoreArray Genomic Data Structure (GDS) files

Owner

  • Name: Xiuwen Zheng
  • Login: zhengxwen
  • Kind: user
  • Location: Chicago

GitHub Events

Total
  • Issues event: 2
  • Watch event: 2
  • Issue comment event: 6
  • Push event: 10
Last Year
  • Issues event: 2
  • Watch event: 2
  • Issue comment event: 6
  • Push event: 10

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 547
  • Total Committers: 6
  • Avg Commits per committer: 91.167
  • Development Distribution Score (DDS): 0.139
Past Year
  • Commits: 15
  • Committers: 1
  • Avg Commits per committer: 15.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Xiuwen Zheng | z****n@g****m | 471 |
| x.zheng | x****g@b****8 | 66 |
| d.tenenbaum | d****m@b****8 | 4 |
| Bioconductor Git-SVN Bridge | b****c@b****g | 3 |
| Nathan Weeks | w****s@i****u | 2 |
| m.carlson | m****n@b****8 | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 46
  • Total pull requests: 3
  • Average time to close issues: 28 days
  • Average time to close pull requests: about 2 hours
  • Total issue authors: 21
  • Total pull request authors: 2
  • Average comments per issue: 1.63
  • Average comments per pull request: 0.33
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: about 2 months
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 4.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • zhengxwen (14)
  • sariya (2)
  • smgogarten (1)
  • bastistician (1)
  • 229668880 (1)
  • Sebass-DP (1)
  • liqg (1)
  • The-Jacob-Lopez (1)
  • kforner (1)
  • rsbivand (1)
  • splaisan (1)
  • qindan2008 (1)
  • connorourke (1)
  • rbutleriii (1)
  • thierrygosselin (1)
Pull Request Authors
  • zhengxwen (2)
  • nathanweeks (1)
Top Labels
Issue Labels
bug (15) enhancement (2) document (1)

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 262,938 total
  • Total dependent packages: 18
  • Total dependent repositories: 0
  • Total versions: 9
  • Total maintainers: 1
bioconductor.org: gdsfmt

R Interface to CoreArray Genomic Data Structure (GDS) Files

  • Versions: 9
  • Dependent Packages: 18
  • Dependent Repositories: 0
  • Downloads: 262,938 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 2.7%
Downloads: 8.2%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 2.15.0 depends
  • methods * depends
  • BiocGenerics * suggests
  • Matrix * suggests
  • RUnit * suggests
  • crayon * suggests
  • digest * suggests
  • knitr * suggests
  • markdown * suggests
  • parallel * suggests
  • rmarkdown * suggests
.github/workflows/r.yml actions
  • actions/checkout v3 composite
  • r-lib/actions/setup-r f57f1301a053485946083d7a45022b278929a78a composite