https://github.com/bgeedb/wikidata_bgeedb-bot

The BgeeDB Wikidata bot


Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.7%) to scientific vocabulary
Last synced: 9 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: BgeeDB
  • License: cc0-1.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 461 KB
Statistics
  • Stars: 1
  • Watchers: 5
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md


Wikidata BgeeDB-bot

This bot inserts gene expression data from the Bgee database into Wikidata. Currently, it only considers existing Wikidata gene items identified by Ensembl gene IDs, and Wikidata anatomical entity items (e.g. stomach) that have a cross-reference to the UBERON ontology (including the Cell Ontology). The bot adds "expressed in" statements to Wikidata gene pages. For example, see the "expressed in" statements on the BRAF gene page: https://www.wikidata.org/wiki/Q17853226.

Note that at most 10 "expressed in" statements are added per gene page. They correspond to 10 distinct anatomical entities (terms prefixed with UBERON or CL) in which the gene is expressed.
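The per-gene limit can be illustrated with a short Python sketch. It assumes the input rows are already sorted by descending expression score (as described for the TSV input in the configuration section), so keeping the first 10 distinct entities per gene keeps the highest-ranked ones. The function name select_expressed_in is hypothetical and does not come from the bot's code.

```python
from typing import Iterable

# Limit stated in the README: at most 10 "expressed in" statements per gene.
MAX_STATEMENTS_PER_GENE = 10


def select_expressed_in(
    rows: Iterable[tuple[str, str]], limit: int = MAX_STATEMENTS_PER_GENE
) -> dict[str, list[str]]:
    """Hypothetical helper (not the bot's code): keep the first `limit`
    distinct anatomical entities for each gene.

    Assumes `rows` are (gene_id, anatomical_entity_id) pairs already sorted by
    descending expression score, so the first entities seen for a gene are
    its highest-ranked ones.
    """
    selected: dict[str, list[str]] = {}
    for gene_id, entity_id in rows:
        entities = selected.setdefault(gene_id, [])
        # Skip duplicates and stop adding once the per-gene limit is reached.
        if len(entities) < limit and entity_id not in entities:
            entities.append(entity_id)
    return selected


if __name__ == "__main__":
    rows = [
        ("ENSMUSG00000000001", "0002369"),
        ("ENSMUSG00000000001", "0002369"),  # duplicate entity, ignored
        ("ENSMUSG00000000001", "CL_0000711"),
    ]
    print(select_expressed_in(rows))
    # {'ENSMUSG00000000001': ['0002369', 'CL_0000711']}
```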

Editing and generating the configuration file

The properties.template file contains all the variables that must be set to run this bot:

  • WDUSER: the Wikidata username.
  • WDPASS: the Wikidata password.
  • EXPRESSION_CALLS_FILE: the path of the TSV file containing the "is expressed in" relations to insert into Wikidata, ordered by descending gene expression score. With EasyBgee v14.2, these relations can be generated as a TSV file by executing the SQL query getorderedisexpressedin on the EasyBgee MySQL database. The file has the tab-separated header gene_id uberon_id. UBERON IDs are written without their UBERON: prefix (e.g. UBERON:0002369 becomes 0002369); for IDs that are not prefixed with UBERON:, the : is replaced with _ (e.g. CL:0000711 becomes CL_0000711). For example, an INPUT_BGEE_DATA_TSV file with two entries is shown below:

        gene_id             uberon_id
        ENSMUSG00000000001  0002369
        ENSMUSG00000000001  CL_0000711
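The ID conversion and the file layout described above can be sketched in Python. The helper names to_bgee_uberon_id and read_expression_calls are hypothetical; the sketch only encodes the two rules stated in this README (drop the UBERON: prefix, otherwise replace : with _) and assumes a tab-separated file whose header columns are gene_id and uberon_id.

```python
import csv
from typing import Iterable, Iterator


def to_bgee_uberon_id(term_id: str) -> str:
    """Hypothetical helper: convert an ontology CURIE to the README's ID format.

    'UBERON:0002369' -> '0002369'     (UBERON: prefix removed)
    'CL:0000711'     -> 'CL_0000711'  (':' replaced with '_')
    """
    if term_id.startswith("UBERON:"):
        return term_id[len("UBERON:"):]
    return term_id.replace(":", "_")


def read_expression_calls(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Hypothetical reader: yield (gene_id, uberon_id) pairs from TSV lines.

    Assumes a tab-separated header line 'gene_id<TAB>uberon_id'. Pass an open
    file (opened with newline="") or any iterable of lines.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        yield row["gene_id"], row["uberon_id"]


if __name__ == "__main__":
    print(to_bgee_uberon_id("UBERON:0002369"))  # 0002369
    print(to_bgee_uberon_id("CL:0000711"))      # CL_0000711
    sample = [
        "gene_id\tuberon_id",
        "ENSMUSG00000000001\t0002369",
        "ENSMUSG00000000001\tCL_0000711",
    ]
    for gene_id, uberon_id in read_expression_calls(sample):
        print(gene_id, uberon_id)
```

Keeping the conversion in its own function makes it easy to test the two rules separately from file handling.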

For further information about the variables to set, refer to properties.template. Before executing any make command, this file must be renamed from properties.template to properties.
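The exact syntax of the properties file is not documented here. The following Python sketch assumes simple KEY=VALUE lines with # comments, and shows a check that the file was renamed and that the variables listed above are set. The functions parse_properties and load_properties are hypothetical, and the key names are copied from this README, so check properties.template for the exact spelling.

```python
from pathlib import Path

# Key names as they appear in this README (assumption: verify against
# properties.template).
REQUIRED_KEYS = ("WDUSER", "WDPASS", "EXPRESSION_CALLS_FILE")


def parse_properties(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (assumed format), skipping blanks and # comments."""
    props: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Malformed line: {raw_line!r}")
        props[key.strip()] = value.strip()
    return props


def load_properties(path: Path = Path("properties")) -> dict[str, str]:
    """Load the renamed properties file and check that required keys are set."""
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found: rename properties.template to properties first"
        )
    props = parse_properties(path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not props.get(key)]
    if missing:
        raise KeyError(f"Missing values for: {', '.join(missing)}")
    return props


if __name__ == "__main__":
    example = "# Wikidata credentials\nWDUSER=my_bot\nWDPASS=secret\n"
    print(parse_properties(example))  # {'WDUSER': 'my_bot', 'WDPASS': 'secret'}
```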

After editing the properties file, if pipenv is not installed for your Python 3.10 (or later) interpreter, first run the following make command in the project directory:

    make install_pipenv

Once pipenv is installed (or if it was already installed), run the following make command in the project directory:

    make

REMARK: A temporary file called count.tmp is generated so that execution can be restarted from the last successful Wikidata insertion. To rerun the bot from the beginning, delete this file.
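The restart mechanism can be sketched as a simple checkpoint. This Python sketch assumes that count.tmp stores the number of input rows already inserted; the README does not specify its content. run_with_checkpoint and the insert_statement callback (standing in for the actual Wikidata write) are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable

# File name from the README; the integer-count content is an assumption.
CHECKPOINT = Path("count.tmp")


def read_checkpoint(path: Path = CHECKPOINT) -> int:
    """Return the number of rows already inserted (0 if no checkpoint exists)."""
    if not path.exists():
        return 0
    text = path.read_text(encoding="utf-8").strip()
    return int(text) if text else 0


def run_with_checkpoint(
    rows: Iterable[tuple[str, str]],
    insert_statement: Callable[[str, str], None],
    path: Path = CHECKPOINT,
) -> int:
    """Insert rows, skipping those already done, recording progress after each success.

    `insert_statement` is a hypothetical callback standing in for the Wikidata
    write. If it raises, the checkpoint still holds the last successful
    position, so a rerun resumes from the failed row.
    """
    done = read_checkpoint(path)
    for index, (gene_id, uberon_id) in enumerate(rows):
        if index < done:
            continue  # already inserted in a previous run
        insert_statement(gene_id, uberon_id)
        done = index + 1
        path.write_text(str(done), encoding="utf-8")
    return done
```

Updating the checkpoint after each successful insertion means a rerun repeats at most the row that was in progress when the failure occurred; deleting the file makes read_checkpoint return 0, which restarts from the first row.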

DEPRECATED: Run the make command below to generate the "is expressed in" relations for human and mouse genes from EasyBgee v14.2:

    make get_input_expression_data

The output file is written to the path defined by the EXPRESSION_CALLS_FILE variable.

Owner

  • Name: BgeeDB
  • Login: BgeeDB
  • Kind: organization

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2