https://github.com/alexslemonade/identifier-refinery
Tools and assets for easy gene identifier conversion
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to ncbi.nlm.nih.gov, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.2%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: AlexsLemonade
- License: other
- Language: R
- Default Branch: master
- Homepage: https://github.com/AlexsLemonade/identifier-refinery
- Size: 15.9 MB
Statistics
- Stars: 2
- Watchers: 6
- Forks: 1
- Open Issues: 0
- Releases: 0
- Created: almost 8 years ago
- Last pushed: over 7 years ago
https://github.com/AlexsLemonade/identifier-refinery/blob/master/
# identifier-refinery

[DOI: 10.5281/zenodo.1322711](https://zenodo.org/record/1322711)

Tools and assets for easy and reproducible gene identifier conversion.

## Methods

This repository is used to build matrices that convert between different gene identifiers. These conversion matrices are built by:

* Randomly choosing raw CEL files from NCBI GEO for a given platform accession code (in `/cels`)
* Reading the CEL header and joining Brainarray (e.g., `hgu133plus2hsensgprobe`) and Bioconductor (e.g., `hgu133plus2.db`) (x, y) coordinates
* Finding intersecting probe identifiers
* Extracting supported identifiers and probe IDs from the Bioconductor package
* Filtering on probe IDs and Ensembl Gene IDs in Brainarray
* Writing the output to a conversion TSV file
* Checking that all output conversion TSV files share the same SHA1

## Repository Contents

### Source Files

The `cels` directory contains raw CEL files taken from GEO. The list of supported platforms is in `supported_microarray_platforms.csv`. Source files can be acquired by running the `acquire_cels.py` script.

### Docker Image

The conversion scripts are run on custom Docker images. Two Dockerfiles are provided in this repository: the `base` image, which installs the required R dependencies, and the `pd` image, which builds the required databases for a given platform.

### Conversion Scripts

The `build_and_convert.py` script builds a unique Docker image for each package, mounts the downloaded CEL files as a volume, runs the gene conversion script `R/gene_convert.R` inside the image, and outputs the master conversion matrix. Output TSV files live in `cels/out/`.

## Reproducing

The entire process can be reproduced by running the following script from a fresh checkout of this repository. It will take some time:

```
$ ./generate_matricies_from_scratch.sh
```

You can also choose to build only a specific platform, e.g.:

```
$ ./generate_matricies_from_scratch.sh celegans
```

## Identifiers

Released assets in this repository are available under the DOI `10.5281/zenodo.1322711`, which can be seen on Zenodo [here](https://zenodo.org/record/1322711).

This accession is up to date as of https://github.com/AlexsLemonade/identifier-refinery/commit/cace2849baf2666f21ec32f5eee6208d6ec19294.

## Related Projects

* [AlexsLemonade/refinebio](https://github.com/AlexsLemonade/refinebio)

## Copyright

`identifier-refinery` output assets are released under a [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/legalcode) license. All code is released under the BSD 3-clause license. Input assets are property of the original providers to NCBI GEO, but may be [freely downloaded and redistributed](https://www.ncbi.nlm.nih.gov/geo/info/disclaimer.html) unless otherwise noted.
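
As an illustration of the final Methods step (confirming that the output conversion TSV files share one SHA1), the check could be sketched in Python roughly as follows. This is a minimal sketch and not the repository's own code: the helper names `sha1_of_file`, `group_by_sha1`, and `all_share_sha1` are hypothetical, and it assumes the files being compared are the outputs for a single platform.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_sha1(tsv_paths):
    """Map each distinct SHA1 digest to the list of files that produced it."""
    groups = {}
    for path in tsv_paths:
        groups.setdefault(sha1_of_file(path), []).append(str(path))
    return groups


def all_share_sha1(tsv_paths):
    """True when at least one file is given and all files have the same digest."""
    return len(group_by_sha1(tsv_paths)) == 1
```

When the check fails, `group_by_sha1` shows which files disagree, which is usually more useful than a bare boolean. The paths passed in could, for example, be collected from the `cels/out/` directory mentioned above.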
Owner
- Name: Alex's Lemonade Stand Foundation
- Login: AlexsLemonade
- Kind: organization
- Website: https://www.alexslemonade.org
- Repositories: 70
- Profile: https://github.com/AlexsLemonade
Childhood Cancer Data Lab of ALSF
Issues and Pull Requests
All Time
- Total issues: 6
- Total pull requests: 4
- Average time to close issues: 14 days
- Average time to close pull requests: 5 days
- Total issue authors: 3
- Total pull request authors: 1
- Average comments per issue: 2.33
- Average comments per pull request: 0.75
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Miserlou (3)
- jaclyn-taroni (2)
- kurtwheeler (1)
Pull Request Authors
- jaclyn-taroni (4)
Dependencies
requirements.txt
pypi
- GEOparse *
- pathlib2 *