https://github.com/erictleung/gene-fusion-analysis
:stars: Parsing and basic network analysis of fusion genes from the Catalogue of Somatic Mutations in Cancer (COSMIC) database
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: nature.com)
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Statistics
- Stars: 3
- Watchers: 1
- Forks: 4
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Gene Fusion Network Analysis
This directory contains code that cleans gene fusion data from the Catalogue of Somatic Mutations in Cancer (COSMIC) and runs basic statistical and network analyses on the resulting data.
System Specifications
Tested on Mac OS X 10.10.5 Yosemite.
- make
- R 3.2.3
Download, Run Analysis, and Generate Report
```shell
$ git clone git@github.com:erictleung/gene-fusion-analysis.git
$ cd gene-fusion-analysis
$ make
```
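Before running `make`, it can help to confirm that the tools listed under System Specifications are installed. The following is a minimal sketch: `missing_tools` is a hypothetical helper, and checking for `Rscript` assumes the Makefile drives R through it, which this README does not state.

```shell
# Hypothetical helper: print the name of every tool that is NOT on the PATH.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example usage (Rscript is an assumption about how the Makefile calls R):
#   missing_tools make Rscript
```

An empty result means every listed tool was found. It does not check the R version.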
Written Scripts
- report/cosmic_fusion_extraction.Rmd
  - Cleans COSMIC data
  - Writes file results/newDescription.txt with the gene fusion data
  - Takes in Ensembl and functional interaction network data for analysis
  - Performs simple statistics and basic network statistics
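One of the simplest network statistics is the degree of each fusion gene, that is, how many functional interactions it takes part in. The sketch below illustrates that idea only; the actual analysis lives in the .Rmd file and may compute different metrics. `fusion_gene_degree` and `genes.txt` are hypothetical names, and the code assumes the interaction file is tab-separated with one header line and the two partner genes in the first two columns.

```shell
# Hypothetical helper: print "gene<TAB>degree" for each gene in a gene list.
# Degree here is the number of interaction rows that mention the gene.
# Assumptions: one gene symbol per line in the gene list; the interaction
# file is tab-separated, has a header line, and holds the two partner
# genes in columns 1 and 2.
fusion_gene_degree() {
  genes_file=$1
  network_file=$2
  awk -F '\t' '
    NR == FNR { wanted[$1] = 1; next }    # first file: genes of interest
    FNR == 1  { next }                    # second file: skip the header line
    {
      if ($1 in wanted) degree[$1]++
      if ($2 in wanted && $2 != $1) degree[$2]++   # count a self-loop once
    }
    END { for (g in wanted) printf "%s\t%d\n", g, degree[g] }
  ' "$genes_file" "$network_file" | sort
}

# Example usage (genes.txt is a hypothetical list of fusion gene symbols):
#   fusion_gene_degree genes.txt raw-data/FIsInGene_121013_with_annotations.txt
```

Genes that never appear in the network are reported with a degree of 0, which makes it easy to see which fusion partners fall outside the interaction data.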
Notes on Downloaded Files
- raw-data/FIsInGene_121013_with_annotations.txt
  - Description:
    - This is the functional interaction network file from the Reactome website. The gene fusions are overlaid on this network to calculate metrics that categorize the cancerous gene fusions.
    - “Functional interactions (FIs) derived from Reactome, and other pathway and interaction databases.” We downloaded the 2013 version.
  - Source: http://www.reactome.org/pages/download-data/
- bin/ensembl_GRCh37_BioMart_2014.08.29.pl
  - Description:
    - This is a Perl script that Ensembl automatically generated, based on the parameters I set:
      - Associated Gene name
      - Ensembl Transcript ID
      - 5' UTR Start
      - 5' UTR End
      - Exon Chr Start (bp)
      - Exon Chr End (bp)
      - 3' UTR Start
      - 3' UTR End
      - Strand (directionality)
      - Exon Rank in Transcript (which exon number it is)
  - Downloads: raw-data/ensembl_GRCh37_BioMart_2014.08.29.csv
  - Source: http://grch37.ensembl.org/biomart/martview/
- raw-data/CosmicFusionExport_v69_310514.tsv
  - Description:
    - "All gene fusion mutation data from the current release in a tab separated file."
  - Source: cancer.sanger.ac.uk/cancergenome/projects/cosmic/download
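After downloading, a quick way to confirm that a file carries the expected fields (for example, the BioMart attributes listed above) is to print its header one column per line. This is a minimal sketch: `list_columns` is a hypothetical helper, and it assumes the first line of the file is a header and that no header field contains the delimiter itself.

```shell
# Hypothetical helper: print the header fields of a delimited file,
# one per line. $1 is the file, $2 is the single-character delimiter.
# Assumes the first line is a header; quoted fields that contain the
# delimiter are not handled.
list_columns() {
  head -n 1 "$1" | tr "$2" '\n'
}

# Example usage:
#   list_columns raw-data/ensembl_GRCh37_BioMart_2014.08.29.csv ','
#   list_columns raw-data/CosmicFusionExport_v69_310514.tsv '\t'
```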
NOTE: Error in COSMIC Data
If you compile and run the cosmic_fusion_extraction.Rmd analysis as provided, it
will not fail. It will fail, however, if you start from the raw data, because
the COSMIC gene fusion data set (CosmicFusionExport_v69_310514.tsv) is
missing an open bracket on line 11620.
The converted .csv version of the data (CosmicFusionExport_v69_310514.csv)
included in this analysis has been manually edited so that the .Rmd analysis
file runs correctly.
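To locate this kind of problem in a raw export, one option is to flag every line where the number of opening brackets differs from the number of closing brackets. The sketch below is an illustration, not part of the repository: `unbalanced_lines` is a hypothetical helper, and because the note does not say which bracket type is missing, it counts `()`, `[]`, and `{}` together.

```shell
# Hypothetical helper: print the line number of every line whose count of
# opening brackets differs from its count of closing brackets.
# The bracket type is an assumption, so (), [] and {} are all counted.
# Only totals are compared; nesting order is not checked.
unbalanced_lines() {
  awk '{
    line = $0
    # gsub returns the number of replacements, which gives the count
    opens  = gsub(/\(/, "(", line) + gsub(/\[/, "[", line) + gsub(/\{/, "{", line)
    closes = gsub(/\)/, ")", line) + gsub(/\]/, "]", line) + gsub(/\}/, "}", line)
    if (opens != closes) print NR
  }' "$1"
}

# Example usage:
#   unbalanced_lines raw-data/CosmicFusionExport_v69_310514.tsv
```

Any line numbers reported are candidates for manual inspection. A line can be flagged for reasons unrelated to this error, for instance free text that legitimately contains a lone bracket.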
Related Literature
- Wu, Chia-Chin, et al. "Identification of cancer fusion drivers using network fusion centrality." Bioinformatics 29.9 (2013): 1174-1181.
- Wang, Xiao-Song, et al. "An integrative approach to reveal driver gene fusions from paired-end sequencing data in cancer." Nature biotechnology 27.11 (2009): 1005-1011.
Analysis Directory Structure
```
.
├── Makefile
├── README.md
├── bin
│   └── ensembl_GRCh37_BioMart_2014.08.29.pl
├── data
│   └── CosmicFusionExport_v69_310514.csv
├── raw-data
│   ├── CosmicFusionExport_v69_310514.tsv
│   ├── FIsInGene_121013_with_annotations.txt
│   └── ensembl_GRCh37_2014.08.29.csv
├── report
│   ├── cosmic_fusion_extraction.Rmd
│   └── cosmic_fusion_extraction.html
└── results
    └── newDescription.txt

5 directories, 10 files
```
Owner
- Name: Eric Leung
- Login: erictleung
- Kind: user
- Location: New York, NY
- Website: https://erictleung.com
- Repositories: 169
- Profile: https://github.com/erictleung
Data science generalist. Sharing knowledge and optimizing tools for learning and growth. Open-source and open-data advocate. Community learner.
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Eric Leung | e****c@e****m | 7 |
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0