GeneScape

GeneScape: A Python package for gene ontology visualization - Published in JOSS (2024)

https://github.com/ialbert/genescape-central

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Sociology Social Sciences - 87% confidence
Last synced: 4 months ago

Repository

Gene Ontology subgraph visualizations

Basic Info
  • Host: GitHub
  • Owner: ialbert
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 118 MB
Statistics
  • Stars: 21
  • Watchers: 3
  • Forks: 0
  • Open Issues: 3
  • Releases: 5
Created about 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Citation

README.md

GeneScape: Gene Function Visualization

GeneScape is a software tool for visualizing gene functions. Users enter a list of genes, and the software draws a subgraph of the Gene Ontology (GO) terms associated with those genes.

GeneScape is a Python-based Shiny application that can be run both at the command line and through a graphical user interface.

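Publication: Albert, I. (2024). GeneScape: A Python package for gene ontology visualization. Journal of Open Source Software, 9(98), 6624. https://doi.org/10.21105/joss.06624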

The Shiny version of the software can be accessed at:

  • https://biostar.shinyapps.io/genescape/

Usage limits may apply to the public interface. For unlimited use, install and run the software locally.

Local installation

Users can also run the program on their system by installing the software via pip:

```console
pip install genescape
```

After installation, the Shiny interface can be started via:

```console
genescape web
```

Open http://localhost:8000 in your browser to see the interface.

Command line use

The program can also be used at the command line to generate images or annotations:

  • genescape tree draws informative Gene Ontology (GO) subgraphs
  • genescape annotate annotates a list of genes with GO functions
  • genescape web provides a web interface for the tree command

What does GeneScape do?

GeneScape works the following way:

  1. It first reads genes from an Input List
  2. Then extracts the Annotations associated with the input genes
  3. Finally, it builds and visualizes the functional subtree based on these Annotations.
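
The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not GeneScape's implementation: the gene names, term IDs, and parent edges are hypothetical toy data (real annotations come from a GAF file and real edges from the GO ontology), and upper-casing gene symbols is an assumption consistent with the upper-case symbols in the annotate output.

```python
# Minimal sketch of the read -> annotate -> build pipeline on toy data.
# All names below (GENE_A, term_x, ...) are hypothetical placeholders.

# Hypothetical gene -> GO term annotations.
ANNOTATIONS = {
    "GENE_A": {"term_x"},
    "GENE_B": {"term_x", "term_y"},
}

# Hypothetical child -> parents edges of a tiny ontology DAG.
PARENTS = {
    "term_x": {"term_p"},
    "term_y": {"term_p"},
    "term_p": {"root"},
    "root": set(),
}


def read_genes(text):
    """Step 1: split whitespace-separated input into unique upper-case symbols."""
    return sorted({token.upper() for token in text.split()})


def ancestors(term, parents):
    """Return the term itself plus every term reachable via parent edges."""
    seen, stack = set(), [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def build_subgraph(genes, annotations, parents):
    """Steps 2 and 3: collect annotated terms, then close them upward to the root."""
    terms = set()
    for gene in genes:
        for term in annotations.get(gene, ()):
            terms |= ancestors(term, parents)
    # Every parent of a collected term is itself collected, so edges stay inside the subgraph.
    edges = {(child, parent) for child in terms for parent in parents.get(child, ())}
    return terms, edges


genes = read_genes("gene_a GENE_B")
terms, edges = build_subgraph(genes, ANNOTATIONS, PARENTS)
print(sorted(terms))  # ['root', 'term_p', 'term_x', 'term_y']
```

Walking upward from each annotated term is what makes even a few genes produce a large tree: every ancestor up to the root joins the subgraph.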

Note: Even short lists of genes (under ten genes) can create large trees. To shrink them, filter by minimum coverage (how many input genes share a function) or by functional patterns (functions whose names match a pattern).

GeneScape will try to find a reasonable coverage threshold when that threshold is not explicitly specified.

Node Labeling

The labels in the graph carry additional information: how many genes in the input carry each function, and how specific that function is in the organism. For example, consider the label:

GO:0004866 endopeptidase inhibitor activity [39] (1/5)

This label indicates that the function endopeptidase inhibitor activity is annotated to 39 of all genes in the original association file (for humans, there are over 19K gene symbols). Thus, the [39] is a characteristic of the organism's annotation: the smaller the number, the more specific the function.

The (1/5) means that 1 of the 5 genes in the input list carries this annotation, so this value is a characteristic of the input list. The mincov (minimum coverage) filter acts on this coverage value and removes functions whose coverage falls below the threshold.
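
Both numbers in a label can be computed from a gene-to-term annotation table. The sketch below is illustrative and its helper names are hypothetical: it counts only direct annotations, and whether GeneScape also propagates counts to ancestor terms is not stated here.

```python
from collections import Counter


def organism_counts(annotations):
    """Count genes per term across the whole association file: the [N] value."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(set(terms))
    return counts


def coverage(term, input_genes, annotations):
    """Count input genes carrying the term: the k in (k/n)."""
    return sum(1 for gene in input_genes if term in annotations.get(gene, ()))


def make_label(go_id, name, organism_count, hits, total):
    """Format a node label in the style 'GO:... name [N] (k/n)'."""
    return f"{go_id} {name} [{organism_count}] ({hits}/{total})"


print(make_label("GO:0004866", "endopeptidase inhibitor activity", 39, 1, 5))
# GO:0004866 endopeptidase inhibitor activity [39] (1/5)
```

In this framing, a mincov filter amounts to keeping a term only when `coverage(term, input_genes, annotations) >= mincov`.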

Node Coloring

The colors in the tree carry additional meaning:

  • Light green nodes represent functions that are present in the input list.
  • Dark green nodes represent functions that are present in the input list and are also leaf nodes in the terminology, that is, the most specific annotation possible.

In both cases, the green color indicates that the function was present in the input list.

Each GO category subtree has its own color:

  • Biological Process (BP)
  • Molecular Function (MF)
  • Cellular Component (CC)

The subtree coloring is meant to help you understand the level of detail and the specificity of the functional terms you visualize.

Fractions such as (1/4) show how many of the input genes carry that function.
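
The coloring rules can be expressed as a small decision function. This is a sketch, not GeneScape's code: the README names only "light green", "dark green", and "a different color" per category, so the specific Graphviz color names for BP, MF, and CC below are hypothetical.

```python
# Hypothetical per-category fill colors: the README does not list exact values.
CATEGORY_COLORS = {"BP": "lightblue", "MF": "lightyellow", "CC": "lightpink"}


def node_color(term, input_terms, children, category):
    """Choose a Graphviz fill color following the rules described above.

    input_terms: terms annotated to the input genes (or given as GO IDs).
    children:    term -> set of child terms in the full ontology.
    category:    "BP", "MF", or "CC".
    """
    if term in input_terms:
        # A term with no children anywhere in the ontology is a leaf: most specific.
        return "darkgreen" if not children.get(term) else "lightgreen"
    return CATEGORY_COLORS[category]
```

Note that leaf status is judged against the whole ontology, not the drawn subgraph, which matches the README's "leaf nodes in the terminology".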

Reducing the tree size

The trees can get huge, even for a small number of genes.

One can greatly reduce the size of the graph by removing functions that are not well represented in the input list or by focusing the graph to contain only functions that match a pattern.

Setting the mincov to 2 or higher is often enough to simplify the graph to a manageable size.

The filtering conditions that users can apply are:

  1. a pattern that matches the Function column
  2. a minimum coverage, that is, the minimum number of input genes that must carry a function
  3. a GO subtree

Filters are applied during the annotation step and will filter the GO terms derived from the gene list.

In the Shiny interface, use the coverage filter to remove functions not well represented in the input list. Recall that coverage represents the number of genes in the input list that carry that function. You can see the counts for each annotation in the Function Annotations box as the first column.
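
A minimal sketch of the first two filters, applied to rows shaped like the annotate CSV output (Coverage, Function, GO, Genes). Case-insensitive regular-expression matching is an assumption, and the GO subtree filter is left out because this README does not say how a subtree is specified.

```python
import re


def apply_filters(rows, pattern=None, mincov=1):
    """Keep rows with Coverage >= mincov whose Function matches pattern (if given)."""
    regex = re.compile(pattern, re.IGNORECASE) if pattern else None
    kept = []
    for row in rows:
        if int(row["Coverage"]) < mincov:
            continue  # not enough input genes carry this function
        if regex is not None and not regex.search(row["Function"]):
            continue  # function name does not match the pattern
        kept.append(row)
    return kept


rows = [
    {"Coverage": "3", "Function": "protein binding"},
    {"Coverage": "2", "Function": "sphingolipid biosynthetic process"},
    {"Coverage": "2", "Function": "cytoplasm"},
]
print([r["Function"] for r in apply_filters(rows, pattern="lipid", mincov=2)])
# ['sphingolipid biosynthetic process']
```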

Command line requirements

The graphviz software must be installed to generate images from the command line. You can install it via conda:

```console
conda install graphviz
```

or via apt or brew.

Those unable to install the graphviz package can save the output as a .dot file:

```console
genescape tree --test -o output.dot
```

Then use an online tool like viz-js to visualize the graph.
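
A .dot file is plain text in the Graphviz DOT language. The sketch below writes a small parent-to-child graph in that format to show what such a file contains; the node names, labels, and edge direction are illustrative assumptions, not GeneScape's exact output.

```python
from pathlib import Path


def to_dot(edges, labels):
    """Render (parent, child) edges and node labels as Graphviz DOT text."""
    lines = ["digraph G {"]
    for node, label in sorted(labels.items()):
        escaped = label.replace('"', '\\"')  # DOT strings use backslash-escaped quotes
        lines.append(f'  "{node}" [label="{escaped}"];')
    for parent, child in sorted(edges):
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


def write_dot(path, edges, labels):
    """Write the DOT text to a file such as output.dot."""
    Path(path).write_text(to_dot(edges, labels))
```

Where Graphviz is installed, a file like this renders with `dot -Tpng output.dot -o output.png`; otherwise its text can be pasted into an online viewer.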

genescape tree

We packaged test data with the software so you can test it like so:

```console
genescape tree --test
```

This generates a tree visualization of the test data.

Reducing the graph size

We can pass the tree visualizer a list of genes or a list of GO IDs, or even a mix of both.

To visualize the relationships between all GO terms, regardless of coverage, run:

```console
genescape tree genes.txt --mincov 1
```

For most gene lists, the resulting functional graph can be very large.

The software will try to find a reasonable coverage threshold for the input genes if no coverage is specified.
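
The README does not describe how GeneScape chooses that threshold. Purely to illustrate the idea, one plausible heuristic is to raise mincov until the number of surviving terms fits a size budget; both the heuristic and the `max_terms` budget below are hypothetical, not GeneScape's method.

```python
def pick_mincov(coverages, max_terms=50):
    """Hypothetical heuristic: the smallest mincov leaving at most max_terms terms.

    coverages: one coverage value (number of input genes) per candidate term.
    """
    mincov = 1
    # Terminates because once mincov exceeds every coverage, zero terms remain.
    while sum(1 for c in coverages if c >= mincov) > max_terms:
        mincov += 1
    return mincov
```

For example, with coverages `[1, 1, 1, 2, 2, 3]` and a budget of 3 terms, this returns 2.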

We can narrow down the visualization in multiple ways. For example, we can select only the terms that match the word lipid:

```console
genescape tree -m lipid --mincov 2 genes.txt
```

When filtered as shown above, the output is much more manageable.
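
When scripting several runs, it can help to assemble these command lines programmatically. The hypothetical helper below uses only the flags that appear in this README (`--test`, `-m`, `--mincov`, `--index`, `-o`); `genescape tree --help` remains the authoritative list.

```python
def tree_command(genes_file=None, mincov=None, match=None, output=None,
                 index=None, test=False):
    """Assemble a genescape tree argument list (hypothetical helper)."""
    cmd = ["genescape", "tree"]
    if test:
        cmd.append("--test")          # use the packaged test data
    if match:
        cmd += ["-m", match]          # keep functions matching a pattern
    if mincov is not None:
        cmd += ["--mincov", str(mincov)]  # minimum coverage threshold
    if index:
        cmd += ["--index", index]     # custom index built with genescape build
    if output:
        cmd += ["-o", output]         # e.g. output.dot
    if genes_file:
        cmd.append(genes_file)
    return cmd
```

With GeneScape installed, the returned list can be passed to `subprocess.run(cmd, check=True)`.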

genescape annotate

The annotator operates on gene names. Suppose you have a list of gene names in the format:

Cyp1a1 Sphk2 Sptlc2 Smpd3

The command:

```console
genescape annotate genelist.txt
```

will produce the output:

```text
Coverage,Function,GO,Genes
3,protein binding,GO:0005515,CYP1A1|SMPD3|SPHK2
2,cytoplasm,GO:0005737,SMPD3|SPHK2
2,mitochondrial inner membrane,GO:0005743,CYP1A1|SPHK2
2,endoplasmic reticulum membrane,GO:0005789,CYP1A1|SPTLC2
2,sphingolipid biosynthetic process,GO:0030148,SPHK2|SPTLC2
2,intracellular membrane-bounded organelle,GO:0043231,CYP1A1|SPHK2
2,sphingosine biosynthetic process,GO:0046512,SPHK2|SPTLC2
```
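
The annotation output is ordinary CSV, so the standard library can load it for further analysis. A minimal sketch, assuming the four columns shown above and `|` as the separator between gene names; capturing the text with `subprocess.run([...], capture_output=True, text=True)` further assumes the CSV is written to standard output.

```python
import csv
import io


def parse_annotations(csv_text):
    """Parse genescape annotate CSV text into dicts with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "coverage": int(row["Coverage"]),
            "function": row["Function"],
            "go": row["GO"],
            "genes": row["Genes"].split("|"),  # genes are pipe-separated
        })
    return rows


sample = """Coverage,Function,GO,Genes
3,protein binding,GO:0005515,CYP1A1|SMPD3|SPHK2
2,cytoplasm,GO:0005737,SMPD3|SPHK2
"""
print(parse_annotations(sample)[0]["genes"])  # ['CYP1A1', 'SMPD3', 'SPHK2']
```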

genescape build

The software currently ships with prebuilt indices for a number of organisms.

To build an index for a different organism, download the GAF association file from the Gene Ontology website.

  • https://geneontology.org/docs/download-go-annotations/

To build the new index use:

```console
genescape build --gaf mydata.gaf.gz --obo go.basic.gz -i mydata.index.gz
```

To use the custom index, pass the -i (--index) option to any of the web, tree, or annotate commands, for example:

```console
genescape web --index mydata.index.gz
```

Run any command with --help to see more options.
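
For orientation, a GAF file is tab-separated with comment lines starting with `!`; in GAF 2.x, column 3 holds the gene symbol, column 4 the qualifier, and column 5 the GO ID. The sketch below reads those columns into a symbol-to-terms map, roughly the information an index needs. It is not GeneScape's build code: skipping NOT-qualified annotations and upper-casing symbols are assumptions.

```python
import gzip
from collections import defaultdict


def read_gaf(lines):
    """Map gene symbol -> set of GO IDs from GAF 2.x lines."""
    gene2go = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, go_id = cols[2], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue  # negated annotation: the gene does NOT have this function
        gene2go[symbol.upper()].add(go_id)
    return gene2go


def read_gaf_file(path):
    """Open a plain or gzip-compressed GAF file and parse it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_gaf(handle)
```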

Odds and ends

It is possible to mix gene and ontology terms. The following is a valid input:

```text
GO:0005488 GO:0005515 Cyp1a1 Sphk2 Sptlc2
```
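
GO identifiers have a fixed shape (`GO:` followed by seven digits), so a mixed list can be split with a regular expression. A sketch of that idea; how GeneScape itself distinguishes the two kinds of input is not described here.

```python
import re

GO_ID = re.compile(r"GO:\d{7}")  # e.g. GO:0005515


def split_input(text):
    """Separate GO identifiers from gene symbols in a whitespace-separated list."""
    go_ids, genes = [], []
    for token in text.split():
        (go_ids if GO_ID.fullmatch(token) else genes).append(token)
    return go_ids, genes


print(split_input("GO:0005488 GO:0005515 Cyp1a1 Sphk2 Sptlc2"))
# (['GO:0005488', 'GO:0005515'], ['Cyp1a1', 'Sphk2', 'Sptlc2'])
```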

Testing

Tests are run via a Makefile as:

```console
make test
```

Additional customizations

The software can be customized by creating a copy of the config.toml file and setting the GENESCAPE_CONFIG environment variable to point to the new configuration file.

In this file, the entries that have an index type are used to build the dropdown menu in the web interface.

Contributing

See CONTRIBUTING.md for information on how to contribute to the development of GeneScape.

Citation:

```bibtex
@article{GeneScape,
  author  = {Albert, Istvan},
  doi     = {10.21105/joss.06624},
  journal = {Journal of Open Source Software},
  month   = jun,
  number  = {98},
  pages   = {6624},
  title   = {{GeneScape: A Python package for gene ontology visualization}},
  url     = {https://joss.theoj.org/papers/10.21105/joss.06624},
  volume  = {9},
  year    = {2024}
}
```

License

genescape is distributed under the terms of the MIT license.

Owner

  • Name: Istvan Albert
  • Login: ialbert
  • Kind: user
  • Location: State College, PA

JOSS Publication

GeneScape: A Python package for gene ontology visualization
Published
June 05, 2024
Volume 9, Issue 98, Page 6624
Authors
Istvan Albert ORCID
Bioinformatics Consulting Center, Pennsylvania State University, United States of America, Department of Biochemistry and Molecular Biology, Pennsylvania State University, United States of America
Editor
AHM Mahfuzur Rahman ORCID
Tags
biology bioinformatics functional analysis

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Albert
  given-names: Istvan
  orcid: "https://orcid.org/0000-0001-8366-984X"
doi: 10.5281/zenodo.11245264
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Albert
    given-names: Istvan
    orcid: "https://orcid.org/0000-0001-8366-984X"
  date-published: 2024-06-05
  doi: 10.21105/joss.06624
  issn: 2475-9066
  issue: 98
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6624
  title: "GeneScape: A Python package for gene ontology visualization"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06624"
  volume: 9
title: "GeneScape: A Python package for gene ontology visualization"

GitHub Events

Total
  • Issues event: 1
  • Watch event: 3
  • Issue comment event: 1
Last Year
  • Issues event: 1
  • Watch event: 3
  • Issue comment event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 324
  • Total Committers: 2
  • Avg Commits per committer: 162.0
  • Development Distribution Score (DDS): 0.006
Past Year
  • Commits: 5
  • Committers: 1
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Istvan Albert i****t@g****m 322
aswathy a****b@g****m 2

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 18
  • Total pull requests: 0
  • Average time to close issues: 3 days
  • Average time to close pull requests: N/A
  • Total issue authors: 4
  • Total pull request authors: 0
  • Average comments per issue: 2.22
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: about 3 hours
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • j-andrews7 (12)
  • sridhar0605 (4)
  • sartoriusana (1)
  • zx8754 (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 28 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 19
  • Total maintainers: 1
pypi.org: genescape
  • Versions: 19
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 28 Last month
Rankings
Dependent packages count: 10.1%
Average: 38.7%
Dependent repos count: 67.4%
Maintainers (1)
Last synced: 4 months ago

Dependencies

pyproject.toml pypi
  • click *
  • networkx *
  • pydot *
  • pygraphviz *
.github/workflows/draft-pdf.yml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
src/genescape/shiny/tree/requirements.txt pypi
  • genescape *