create-bridgedb-diseases

Code to create local mapping files for disease IDs

https://github.com/denisesl22/create-bridgedb-diseases


Repository


Basic Info
  • Host: GitHub
  • Owner: DeniseSl22
  • License: other
  • Language: Groovy
  • Default Branch: master
  • Size: 2.47 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed almost 3 years ago

README.md

Create BridgeDb Identity Mapping files

This Groovy script creates a Derby file for BridgeDb [1,2] for use in PathVisio, etc.

The script was tested with Wikidata [3,4] data from November 2019 and is based on the createbridgedbmetabolites repository.

We're grateful to everyone who worked on identifier mappings in this project:

  • http://wikidata.org/

Everyone can contribute ID mappings to Wikidata.

Releases

The files are released via the BridgeDb Website: https://www.bridgedb.org/data/gene_database/

The mapping files are also archived on Figshare: https://figshare.com/search?q=diseases+bridgedb+mapping+database&quick=1

License

This repository: New BSD.

Derby License -> http://db.apache.org/derby/license.html

BridgeDb License -> http://www.bridgedb.org/browser/trunk/LICENSE-2.0.txt

Run the script and test the results

  1. add the jars to your classpath, e.g. on Linux with:

export CLASSPATH=`ls -1 *.jar | tr '\n' ':'`
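The one-liner above leaves a trailing colon and depends on `ls` output. As an alternative, here is a minimal sketch that builds the same classpath with a plain shell loop. The helper name `build_classpath` is hypothetical and not part of this repository, and it assumes the jars sit in a single directory.

```shell
# Join all .jar files in a directory into a colon-separated classpath.
# build_classpath is a hypothetical helper, not part of this repository.
build_classpath() {
  dir="${1:-.}"
  classpath=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue          # skip the unexpanded pattern when no jars exist
    classpath="${classpath:+$classpath:}$jar"
  done
  printf '%s\n' "$classpath"
}

# Export the classpath for the jars in the current directory.
CLASSPATH="$(build_classpath .)"
export CLASSPATH
```

If no jars are found, CLASSPATH ends up empty, which makes a missing download easier to spot before running Groovy.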

  2. make sure the Wikidata files are saved

2.1 ID mappings

A set of SPARQL queries has been compiled and saved in the wikidata/ folder. These queries can be executed manually at http://query.wikidata.org/. They download mappings from Wikidata for OMIM (omim.rq), Disease Ontology (do.rq), UMLS CUI (cui.rq), Orphanet (orpha.rq), and MeSH descriptor IDs (mesh.rq, coming soon).

Alternatively, you can run the queries from the command line with curl:

curl -H "Accept: text/csv" --data-urlencode query@wikidata/omim.rq -G https://query.wikidata.org/bigdata/namespace/wdq/sparql -o omim2wikidata.csv
curl -H "Accept: text/csv" --data-urlencode query@wikidata/do.rq -G https://query.wikidata.org/bigdata/namespace/wdq/sparql -o do2wikidata.csv
curl -H "Accept: text/csv" --data-urlencode query@wikidata/cui.rq -G https://query.wikidata.org/bigdata/namespace/wdq/sparql -o cui2wikidata.csv
curl -H "Accept: text/csv" --data-urlencode query@wikidata/orpha.rq -G https://query.wikidata.org/bigdata/namespace/wdq/sparql -o orpha2wikidata.csv
curl -H "Accept: text/csv" --data-urlencode query@wikidata/mesh.rq -G https://query.wikidata.org/bigdata/namespace/wdq/sparql -o mesh2wikidata.csv
curl -H "Accept: text/csv" --data-urlencode query@wikidata/icd9.rq -G https://query.wikidata.org/bigdata/namespace/wdq/sparql -o icd92wikidata.csv
curl -H "Accept: text/csv" --data-urlencode query@wikidata/icd10.rq -G https://query.wikidata.org/bigdata/namespace/wdq/sparql -o icd102wikidata.csv
curl -H "Accept: text/csv" --data-urlencode query@wikidata/icd11.rq -G https://query.wikidata.org/bigdata/namespace/wdq/sparql -o icd112wikidata.csv
curl -H "Accept: text/csv" --data-urlencode query@wikidata/mondo.rq -G https://query.wikidata.org/bigdata/namespace/wdq/sparql -o mondo2wikidata.csv
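Because the curl calls differ only in the query name, they can be wrapped in a loop. The sketch below is one possible way to do that. The function name `download_mappings` and the `DRY_RUN` switch are assumptions introduced for illustration, and `--fail` is added so that an HTTP error is not silently saved as a CSV file.

```shell
# SPARQL endpoint used by the curl commands in this README.
ENDPOINT="https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# download_mappings is a hypothetical helper, not part of this repository.
# With DRY_RUN set to a non-empty value the curl commands are printed
# instead of executed.
download_mappings() {
  for name in omim do cui orpha mesh icd9 icd10 icd11 mondo; do
    ${DRY_RUN:+echo} curl --fail -H "Accept: text/csv" \
      --data-urlencode "query@wikidata/${name}.rq" \
      -G "$ENDPOINT" -o "${name}2wikidata.csv"
  done
}

# Usage (run from the repository root so that wikidata/*.rq resolves):
#   DRY_RUN=1 download_mappings   # show what would be fetched
#   download_mappings             # fetch all mapping files
```

The dry run is useful for checking the file names before sending nine queries to the public endpoint.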

2.2 Get Disease Labels

With a similar SPARQL query (names.rq) the disease labels (English only) can be downloaded as simple TSV and saved as "names2wikidata.tsv" (note that this file is TAB separated):

curl -H "Accept: text/tab-separated-values" --data-urlencode query@wikidata/names.rq -G https://query.wikidata.org/bigdata/namespace/wdq/sparql -o names2wikidata.tsv
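Since the labels file must be TAB separated, a quick structural check after the download can catch a file that was saved in the wrong format. The following is a minimal sketch. `check_tsv` is a hypothetical helper, and the requirement of at least two columns is an assumption about the output of names.rq, not something this repository states.

```shell
# check_tsv is a hypothetical helper: it succeeds only if every line has the
# same number of TAB-separated fields as the first line, and at least two
# (the two-column minimum is an assumption about names.rq).
check_tsv() {
  awk -F '\t' '
    NR == 1 { n = NF }
    NF != n || NF < 2 {
      printf "line %d: %d field(s), expected %d\n", NR, NF, n
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Usage:
#   check_tsv names2wikidata.tsv && echo "names2wikidata.tsv looks tab separated"
```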

  3. Update the createDerby.groovy file with the new version numbers ("DATASOURCEVERSION" field) and run the script with Groovy: #Update line

export CLASSPATH=`ls -1 *.jar | tr '\n' ':'`
groovy createDerby.groovy
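Before starting the build, it can help to confirm that the downloads from step 2 are actually present. The sketch below checks for the output file names used by the curl commands in this README. `check_inputs` is a hypothetical helper, and whether createDerby.groovy reads every one of these files is an assumption.

```shell
# Hypothetical pre-flight check: report expected download files that are
# missing or empty. The list mirrors the -o names used by the curl commands.
check_inputs() {
  missing=0
  for f in omim2wikidata.csv do2wikidata.csv cui2wikidata.csv \
           orpha2wikidata.csv mesh2wikidata.csv icd92wikidata.csv \
           icd102wikidata.csv icd112wikidata.csv mondo2wikidata.csv \
           names2wikidata.tsv; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: only start the build when all inputs are present.
#   check_inputs && groovy createDerby.groovy
```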

  4. Test the resulting Derby file by opening it in PathVisio

  5. Use the BridgeDb QC tool to compare it with the previous mapping file

The BridgeDb repository has a tool to perform quality control (qc) on ID mapping files:

sh qc.sh old.bridge new.bridge

  6. Upload the data to Figshare and update the following pages:
  • http://www.bridgedb.org/mapping-databases/hmdb-metabolite-mappings/ #Update link
  • http://bridgedb.org/data/gene_database/ #Update link
  7. Tag this repository with the DOI of the latest release.

To ensure we know exactly which repository version was used to generate a specific release, the latest commit used for that release is tagged with the DOI on Figshare. To list all current tags:

git tag

To make a new tag, run:

git tag $DOI

where $DOI is replaced with the DOI of the release.
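To avoid tagging with an unset variable or a mistyped value, the tag command can be guarded by a rough shape check. This is only a sketch: `is_doi` and `tag_release` are hypothetical helpers, and the pattern tests only for a "10." prefix followed by a slash and a suffix, not for a registered DOI.

```shell
# is_doi: rough shape check ("10." prefix, then "/", then a non-empty suffix).
# This is not a full DOI validator.
is_doi() {
  case "$1" in
    10.*/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# tag_release is a hypothetical wrapper around `git tag` that refuses
# values that do not look like a DOI.
tag_release() {
  if ! is_doi "$1"; then
    echo "not a DOI: $1" >&2
    return 1
  fi
  git tag "$1"
}

# Usage, with DOI set to the Figshare DOI of the release:
#   tag_release "$DOI"
```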

  8. Inform downstream projects

At least the following projects need to be informed about the availability of the new mapping database:

  • BridgeDb webservice
  • WikiPathways RDF generation team (Jenkins server)
  • WikiPathways indexer (supporting the WikiPathways web service)

References

  1. http://bridgedb.org/
  2. Van Iersel, M. P., Pico, A. R., Kelder, T., Gao, J., Ho, I., Hanspers, K., Conklin, B. R., Evelo, C. T., Jan. 2010. The BridgeDb framework: standardized access to gene, protein and metabolite identifier mapping services. BMC bioinformatics 11 (1), 5+. http://dx.doi.org/10.1186/1471-2105-11-5
  3. Vrandečić, Denny. "Wikidata: a new platform for collaborative data collection." Proceedings of the 21st International Conference on World Wide Web. ACM, 2012. https://doi.org/10.1145/2187980.2188242
  4. Mietchen D, Hagedorn G, Willighagen E, Rico M, Gómez-Pérez A, Aibar E, Rafes K, Germain C, Dunning A, Pintscher L, Kinzler D (2015) Enabling Open Science: Wikidata for Research (Wiki4R). Research Ideas and Outcomes 1: e7573. https://doi.org/10.3897/rio.1.e7573

Owner

  • Name: De
  • Login: DeniseSl22
  • Kind: user
  • Location: Maastricht
  • Company: UM @BiGCAT-UM

PhD candidate in chem/bio/informatics @UM/BiGCaT

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: BridgeDb
message: 'If you use this software, please cite it as below.'
type: software
authors:
  - given-names: 'Martijn P.'
    family-names: Van Iersel
  - given-names: Alexander R.
    family-names: Pico
  - given-names: Thomas
    family-names: Kelder
  - given-names: Jianjiong
    family-names: Gao
  - given-names: Isaac
    family-names: Ho
  - given-names: Kristina
    family-names: Hanspers
  - given-names: Bruce R.
    family-names: Conklin
  - given-names: Chris T.
    family-names: Evelo
identifiers:
  - type: url
    value: 'https://github.com/bridgedb/BridgeDb'
    description: Source code repository for BridgeDb
  - type: doi
    value: 10.1186/1471-2105-11-5
    description: >-
      Article: The BridgeDb framework: standardized
      access to gene, protein and metabolite
      identifier mapping services
repository-code: 'https://github.com/bridgedb/BridgeDb'
url: 'https://bridgedb.github.io/'
preferred-citation:
  type: article
  authors:
  - given-names: 'Martijn P.'
    family-names: Van Iersel
  - given-names: Alexander R.
    family-names: Pico
  - given-names: Thomas
    family-names: Kelder
  - given-names: Jianjiong
    family-names: Gao
  - given-names: Isaac
    family-names: Ho
  - given-names: Kristina
    family-names: Hanspers
  - given-names: Bruce R.
    family-names: Conklin
  - given-names: Chris T.
    family-names: Evelo
  doi: "10.1186/1471-2105-11-5"
  journal: "BMC Bioinformatics"
  title: "The BridgeDb framework: standardized access to gene, protein and metabolite identifier mapping services"
  issue: 5
  volume: 11
  year: 2010
keywords:
  - identifier mapping
  - Genes
  - Proteins
  - Metabolites
  - Biological data
license: Apache-2.0
version: 3.0.13
date-released: '2022-01-14'

Issues and Pull Requests

All Time
  • Total issues: 3
  • Total pull requests: 3
  • Average time to close issues: over 1 year
  • Average time to close pull requests: 13 days
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 1.33
  • Average comments per pull request: 1.33
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • DeniseSl22 (2)
  • egonw (1)
Pull Request Authors
  • egonw (1)
  • hbasaric (1)