https://github.com/cumbof/opengdc
An open-source Java tool to automatically extract and convert all clinical and genomic data from the Genomic Data Commons to BED, GTF, CSV, and JSON format
Science Score: 39.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cumbof
- License: mit
- Language: Java
- Default Branch: master
- Homepage: http://geco.deib.polimi.it/opengdc/
- Size: 237 MB
Statistics
- Stars: 0
- Watchers: 5
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
OpenGDC
OpenGDC is an open-source Java tool for automatically extracting, extending, and converting all the genomic experiments and clinical information from the Genomic Data Commons (GDC) portal (https://gdc.cancer.gov/) into BED, GTF, CSV, JSON, and XML formats.
How to use
This is a NetBeans project. Clone the repository, load it into NetBeans, set the GUI.java class as the main class, and build the project (JRE 1.8 or higher is required). Then double-click the resulting JAR to start using OpenGDC.
Build a repository
The software includes a built-in mode for creating a repository with all the original publicly available GDC data together with their converted versions. To enable this mode, set the UpdateScheduler.java class as the main class of the project and build the JAR. This mode requires a date as an argument, as in the following example:
```
java -jar UpdateScheduler.jar 2020-01-01
```
The specified date is used internally to filter and retrieve only the GDC data produced starting from that date (January 1, 2020, in this example). To keep the repository up to date automatically, the simplest solution is to use crontab to schedule the execution of the software once every X days. This can be done with a simple bash script like the following one:
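```
#!/bin/bash
# Run from this script's directory so the relative paths resolve under cron
# (assumes UpdateScheduler.jar and opengdc-history.txt are stored next to the script)
cd "$(dirname "$0")" || exit 1
# Date of the last execution: the last line of the history file
datetime=$(tail -n 1 opengdc-history.txt)
# Retrieve and convert the GDC data produced since that date,
# then record today's date only if the update succeeded
java -jar UpdateScheduler.jar "$datetime" && date +%Y-%m-%d >> opengdc-history.txt
```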
This script relies on an external text file, opengdc-history.txt, which keeps track of the last day on which the software was executed. We recommend initializing opengdc-history.txt with a single line containing a date old enough to precede the launch of the GDC program, so that the first run builds the repository with all the publicly available data.
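Where a bash script is inconvenient (for example, on a machine scheduled with something other than cron), the same history-file logic can be sketched in Java. This is a minimal, hypothetical sketch and not part of OpenGDC: the class name OpenGdcUpdateRunner and its helper methods are illustrative, it assumes UpdateScheduler.jar and opengdc-history.txt are in the working directory, and it assumes UpdateScheduler.jar exits with a non-zero status on failure, which the README does not state.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;
import java.util.List;

/**
 * Hypothetical wrapper (not shipped with OpenGDC): reads the last recorded run
 * date from the history file, runs UpdateScheduler.jar with it, and appends a
 * new date only when the update process exits with status 0.
 */
final class OpenGdcUpdateRunner {

    /** Returns the last non-blank line of the history file, parsed as yyyy-MM-dd. */
    static LocalDate lastRunDate(Path history) throws IOException {
        List<String> lines = Files.readAllLines(history, StandardCharsets.UTF_8);
        for (int i = lines.size() - 1; i >= 0; i--) {
            String line = lines.get(i).trim();
            if (!line.isEmpty()) {
                // Throws DateTimeParseException if the line is not an ISO date
                return LocalDate.parse(line);
            }
        }
        throw new IOException("No date found in " + history);
    }

    /** Appends the date as a new line, creating the history file if it does not exist. */
    static void recordRun(Path history, LocalDate date) throws IOException {
        byte[] line = (date + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(history, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path history = Paths.get("opengdc-history.txt");
        LocalDate since = lastRunDate(history);
        // Capture the date before the run starts, so a run crossing midnight
        // does not move the next starting date past data published meanwhile
        LocalDate runDate = LocalDate.now();

        Process update = new ProcessBuilder("java", "-jar", "UpdateScheduler.jar", since.toString())
                .inheritIO() // forward the child's stdout/stderr to this console
                .start();
        int exitCode = update.waitFor();

        if (exitCode != 0) {
            System.err.println("UpdateScheduler.jar exited with " + exitCode + "; history not updated.");
            System.exit(exitCode);
        }
        recordRun(history, runDate);
    }
}
```

Skipping blank lines tolerates a trailing empty line in opengdc-history.txt, which a plain `tail -n 1` would return as an empty date. Recording the start date rather than the end date means consecutive runs may overlap by a day (assuming the date filter is inclusive), which is usually preferable to leaving a gap.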
Links
- Software web-page: http://geco.deib.polimi.it/opengdc/
- Public data repository: ftp://geco.deib.polimi.it/opengdc/
Credits
Please credit OpenGDC in your manuscript by citing:
Eleonora Cappelli, Fabio Cumbo, Anna Bernasconi, Arif Canakoglu, Stefano Ceri, Marco Masseroli, and Emanuel Weitschek. "OpenGDC: unifying, modeling, integrating cancer genomic data and clinical metadata" Appl. Sci. 2020, 10(18), 6367. https://doi.org/10.3390/app10186367
Owner
- Name: Fabio Cumbo
- Login: cumbof
- Kind: user
- Location: Cleveland, OH, USA
- Company: Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic
- Website: https://cumbof.github.io/
- Twitter: cumbofabio
- Repositories: 14
- Profile: https://github.com/cumbof
Ph.D. in Computer Science and Automation Engineering, Postdoctoral Research Fellow @BlankenbergLab, GMI, LRI, Cleveland Clinic, USA
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 10
- Total pull requests: 131
- Average time to close issues: 5 months
- Average time to close pull requests: 1 day
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 0.4
- Average comments per pull request: 0.19
- Merged pull requests: 121
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- cumbof (4)
- eleonoracappelli (1)
Pull Request Authors
- eleonoracappelli (64)
- cumbof (8)