https://github.com/akikuno/tsumugi-dev

Web tool for visualizing phenotype-similarity gene networks

https://github.com/akikuno/tsumugi-dev

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary

Keywords

bioinformatics impc knockout-mouse network phenotype-genotype-information web-tool
Last synced: 5 months ago · JSON representation

Repository

Web tool for visualizing phenotype-similarity gene networks

Basic Info
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 21
  • Releases: 13
Topics
bioinformatics impc knockout-mouse network phenotype-genotype-information web-tool
Created about 4 years ago · Last pushed 6 months ago
Metadata Files
Readme License

README.md

Tsumugi Logo

License Test DOI Contact

日本語版 README はこちら

TSUMUGI (Trait-driven Surveillance for Mutation-based Gene module Identification) is a web tool that leverages knockout (KO) mouse phenotype data from the International Mouse Phenotyping Consortium (IMPC) to extract and visualize gene modules based on phenotypic similarity.

The tool is publicly available online for anyone to use 👇️

🔗 https://larc-tsukuba.github.io/tsumugi/

TSUMUGI derives from the Japanese concept of "weaving together gene groups that form phenotypes."

📖 How to Use TSUMUGI

💬 Top Page

TSUMUGI supports three types of input:

1. Phenotype

When you input a phenotype of interest, TSUMUGI searches for gene groups with similar overall phenotype profiles among genes whose KO mice exhibit that phenotype.
Phenotype names are based on Mammalian Phenotype Ontology (MPO).

List of currently searchable phenotypes in TSUMUGI:
👉 Phenotype List

2. Gene

When you specify a single gene, TSUMUGI searches for other gene groups whose KO mice have similar phenotype profiles to that gene's KO mice.
Gene names follow gene symbols registered in MGI.

List of currently searchable gene names in TSUMUGI:
👉 Gene List

3. Gene List

Accepts input of multiple genes.
Gene lists should be entered separated by line breaks.

[!NOTE] Gene List differs from single Gene input in that it extracts phenotypically similar genes among the genes within the list.

[!CAUTION] If no phenotypically similar genes are found, No similar phenotypes were found among the entered genes. alert will be displayed and processing will stop.

If phenotypically similar genes exceed 200, Too many genes submitted. Please limit the number to 200 or fewer. alert will be displayed and processing will stop to prevent browser overload.

📥 Raw Data Download (TSUMUGI_{version}_raw_data)

You can download raw data of phenotypic similarity between gene pairs (in Gzip-compressed CSV format or Parquet format).

Contents include:

  • Paired gene names (Gene1, Gene2)
  • Phenotypic similarity between pairs (Jaccard Similarity)
  • Number of shared phenotypes between pairs (Number of shared phenotype)
  • List of shared phenotypes between pairs (List of shared phenotype)

[!CAUTION] File size is approximately 50-100MB. Download may take some time.

We recommend using Parquet format when working with Polars or Pandas.
You can load the data as follows:

Polars

```bash

Install Polars and PyArrow using conda

conda create -y -n env-tsumugi polars pyarrow conda activate env-tsumugi ```

```python

Load Parquet file using Polars

import polars as pl dftsumugi = pl.readparquet("TSUMUGI{version}raw_data.parquet") ```

Pandas

```bash

Install Pandas and PyArrow using conda

conda create -y -n env-tsumugi pandas pyarrow conda activate env-tsumugi ```

```python

Load Parquet file using Pandas

import pandas as pd dftsumugi = pd.readparquet("TSUMUGI{version}raw_data.parquet") ```

🌐 Network Visualization

Based on the input, the page transitions and the network is automatically drawn.

[!IMPORTANT] Gene pairs with 2 or more shared abnormal phenotypes AND phenotypic similarity of 0.2 or higher are subject to visualization.

Network Panel

Nodes (Points)

Each node represents one gene.
Clicking displays a list of abnormal phenotypes observed in that gene's KO mice.
You can freely adjust positions by dragging.

Edges (Lines)

Clicking an edge shows details of shared phenotypes.

Control Panel

The left control panel allows you to adjust network display.

Filter by Phenotypic Similarity

The Phenotypes similarity slider allows you to set thresholds for gene pairs displayed in the network based on edge phenotypic similarity (Jaccard coefficient).
Similarity minimum and maximum values are converted to a 1-10 scale, allowing 10-level filtering.

[!NOTE] For details on phenotypic similarity, please see:
👉 🔍 Calculation Method for Phenotypically Similar Gene Groups

Filter by Phenotype Severity

The Phenotype severity slider allows you to adjust node display based on phenotype severity (effect size) in KO mice.
Higher effect sizes indicate stronger phenotypic impact.
This also scales the effect size range to 1-10, allowing 10-level filtering.

[!NOTE] When IMPC phenotype evaluation is binary (present/absent) (e.g., abnormal embryo development: list of binary phenotypes available here) or when gene name is input, the Phenotypes severity slider is not available.

Specify Genotype

You can specify the genotype of KO mice exhibiting phenotypes:

  • Homo: Phenotypes seen in homozygous mice
  • Hetero: Phenotypes seen in heterozygous mice
  • Hemi: Phenotypes seen in hemizygous mice

Specify Sex

You can extract sex-specific phenotypes:

  • Female: Female-specific phenotypes
  • Male: Male-specific phenotypes

Specify Life Stage

You can specify life stages when phenotypes appear:

  • Embryo: Phenotypes appearing during embryonic stage
  • Early: Phenotypes appearing at 0-16 weeks of age
  • Interval: Phenotypes appearing at 17-48 weeks of age
  • Late: Phenotypes appearing at 49+ weeks of age

Markup Panel

Highlight Human Disease-Related Genes (Highlight: Human Disease)

You can highlight genes related to human diseases.
The relationship between KO mice and human diseases uses public data from IMPC Disease Models Portal.

Search Gene Names (Search: Specific Gene)

You can search for gene names included in the network.

Adjust Network Display Style (Layout & Display)

You can adjust the following elements:

  • Network layout (layout)
  • Font size (Font size)
  • Edge thickness (Edge width)
  • Distance between nodes (*Cose layout only) (Node repulsion)

Export

You can export current network images and data in PNG, CSV and GraphML formats.
CSV includes connected component (module) IDs and lists of phenotypes shown by each gene's KO mice.
GraphML is a format compatible with the desktop version of Cytoscape, allowing you to import the network into Cytoscape for further analysis.

🔍 Calculation Method for Phenotypically Similar Gene Groups

Data Source

IMPC dataset uses statistical-results-ALL.csv.gz from Release-23.0.
Information about columns included in the dataset: Data fields

Preprocessing

Extract gene-phenotype pairs where KO mouse phenotype P-values (p_value, female_ko_effect_p_value, or male_ko_effect_p_value) are 0.0001 or below.
- Genotype-specific phenotypes are annotated with homo, hetero, or hemi - Sex-specific phenotypes are annotated with female or male

Phenotypic Similarity Calculation

Jaccard coefficient is used as the phenotypic similarity metric.
This is a similarity measure that expresses the proportion of shared phenotypes as a 0-1 numerical value.

Jaccard(A, B) = |A ∩ B| / |A ∪ B|

For example, suppose gene A and gene B KO mice have the following abnormal phenotypes:

A: {abnormal embryo development, abnormal heart morphology, abnormal kidney morphology} B: {abnormal embryo development, abnormal heart morphology, abnormal lung morphology}

In this case, there are 2 shared phenotypes and 4 total unique phenotypes, so the Jaccard coefficient is calculated as follows:

Jaccard(A, B) = 2 / 4 = 0.5

✉️ Contact

For questions or requests, please feel free to contact us:

Owner

  • Name: Akihiro Kuno
  • Login: akikuno
  • Kind: user
  • Location: Tsukuba, Ibaraki, Japan
  • Company: University of Tsukuba

Bioinformatician working at the Laboratory Animal Resource Center

GitHub Events

Total
  • Create event: 17
  • Release event: 8
  • Issues event: 100
  • Watch event: 2
  • Delete event: 8
  • Member event: 2
  • Issue comment event: 65
  • Push event: 270
  • Pull request event: 20
Last Year
  • Create event: 17
  • Release event: 8
  • Issues event: 100
  • Watch event: 2
  • Delete event: 8
  • Member event: 2
  • Issue comment event: 65
  • Push event: 270
  • Pull request event: 20

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 59
  • Total pull requests: 10
  • Average time to close issues: about 1 month
  • Average time to close pull requests: less than a minute
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.73
  • Average comments per pull request: 0.0
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 57
  • Pull requests: 10
  • Average time to close issues: 30 days
  • Average time to close pull requests: less than a minute
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.72
  • Average comments per pull request: 0.0
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • akikuno (55)
Pull Request Authors
  • akikuno (9)
Top Labels
Issue Labels
request💡 (29) solved✅️ (20) bug🐛 (15) maintenance✨️ (7) UI🎨 (5) project🎯 (2) question❓ (1) enhancement🚀 (1) experiment🧪 (1)
Pull Request Labels