https://github.com/cumbof/opengdc

An open-source Java tool to automatically extract and convert all clinical and genomic data from the Genomic Data Commons to BED, GTF, CSV, and JSON format

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

bed bioinformatics csv gdc gtf json target tcga
Last synced: 5 months ago

Repository

Statistics
  • Stars: 0
  • Watchers: 5
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Archived
Topics
bed bioinformatics csv gdc gtf json target tcga
Created almost 9 years ago · Last pushed almost 5 years ago
Metadata Files
Readme License

README.md

OpenGDC

OpenGDC is an open-source Java tool for the automatic extraction, extension, and conversion of all the genomic experiments and clinical information from the Genomic Data Commons (GDC) portal (https://gdc.cancer.gov/) into BED, GTF, CSV, JSON, and XML formats.

How to use

OpenGDC is distributed as a NetBeans project. Clone the repository, open it in NetBeans, set the GUI.java class as the main class, and build the project (Java 1.8 or higher is required). Then double-click the generated JAR to launch OpenGDC.

Build a repository

The software includes a built-in mode for creating a repository with all the original publicly available GDC data and their converted versions. To enable this mode, set the UpdateScheduler.java class as the main class of the project and build the JAR. This mode requires a date as its argument, as in the following example:

```
java -jar UpdateScheduler.jar 2020-01-01
```

The specified date is used internally to filter the GDC data, so that only data produced starting from that date (January 1, 2020, in this example) are retrieved. To keep the repository up to date automatically, the easiest solution is to use crontab to run the software once every X days. This can be done with a simple bash script like the following:

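```
#!/bin/bash
# Start date for this run: the last line of the history file
datetime=$( tail -n 1 opengdc-history.txt )
# Retrieve and convert the GDC data produced since that date
java -jar UpdateScheduler.jar "$datetime"
# Record today's date as the start date for the next run
date +%Y-%m-%d >> opengdc-history.txt
```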

This script relies on an external text file, opengdc-history.txt, which keeps track of the last day on which the software was executed: each run reads the start date from the last line of the file and then appends the current date. We recommend initialising opengdc-history.txt with a single line containing a date as far back as possible, ideally before the start of the GDC program, so that the first run builds the repository with all the publicly available data.
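The one-time setup described above can be sketched as follows. This is a minimal example, not part of OpenGDC itself: it assumes the commands are run in the same directory as the update script, uses 2015-01-01 as an arbitrary early date assumed to predate the GDC data, and shows a crontab entry (commented out) whose directory /opt/opengdc and script name opengdc-update.sh are hypothetical.

```shell
# One-time setup (a sketch): seed the history file with an early start date.
# 2015-01-01 is an arbitrary example assumed to predate the GDC data, so the
# first scheduled run should retrieve all publicly available data.
echo "2015-01-01" > opengdc-history.txt

# Show the start date that the next run will pass to UpdateScheduler.jar
tail -n 1 opengdc-history.txt   # prints 2015-01-01

# Example crontab entry (add it with `crontab -e`). It runs the update script at
# 03:00 on days 1, 8, 15, 22 and 29 of each month. The directory /opt/opengdc
# and the script name opengdc-update.sh are hypothetical; the entry changes into
# that directory first because the script uses relative paths.
# 0 3 */7 * * cd /opt/opengdc && ./opengdc-update.sh
```

Because `*/7` in the day-of-month field restarts from day 1 every month, the interval between runs is only approximately seven days; using the day-of-week field instead (for example `0 3 * * 1`, every Monday at 03:00) gives a strictly weekly schedule.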

Credits

Please credit OpenGDC in your manuscript by citing:

Eleonora Cappelli, Fabio Cumbo, Anna Bernasconi, Arif Canakoglu, Stefano Ceri, Marco Masseroli, and Emanuel Weitschek. "OpenGDC: unifying, modeling, integrating cancer genomic data and clinical metadata" Appl. Sci. 2020, 10(18), 6367. https://doi.org/10.3390/app10186367

Owner

  • Name: Fabio Cumbo
  • Login: cumbof
  • Kind: user
  • Location: Cleveland, OH, USA
  • Company: Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic

Ph.D. in Computer Science and Automation Engineering, Postdoctoral Research Fellow @BlankenbergLab, GMI, LRI, Cleveland Clinic, USA

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 10
  • Total pull requests: 131
  • Average time to close issues: 5 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 0.4
  • Average comments per pull request: 0.19
  • Merged pull requests: 121
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • cumbof (4)
  • eleonoracappelli (1)
Pull Request Authors
  • eleonoracappelli (64)
  • cumbof (8)
Top Labels
Issue Labels
enhancement (2) bug (1)
Pull Request Labels