Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary
Last synced: 9 months ago

Repository

SEgene package

Basic Info
  • Host: GitHub
  • Owner: hamamoto-lab
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 1.31 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 15
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License Citation

README.md

SEgene

(For the Japanese version of this README, please see README_ja.md.)

SEgene (pronounced "S-E-gene") is a platform designed to identify and analyze Super-Enhancer-to-gene links (SE-to-gene links) by incorporating the peak-to-gene links approach, a statistical method that uncovers correlations between genes and peak regions. This repository contains tools and scripts for SEgene.

Features

  • Analyze the relationship between super-enhancers and genes
  • Visualize data using graph theory
  • Interactive analysis with Jupyter Notebook

Program Structure

SEgene currently consists of four primary components: P2GL data preparation, P2GL correlation analysis, super-enhancer analysis, and region evaluation analysis. In addition, development versions of some programs are available and under active development; see the Development Versions section for details.

Workflow Overview

The following diagram illustrates the complete workflow of SEgene:

```mermaid
graph TD
    %% --- Definitions -------------------------
    classDef tool fill:#d8eefe,stroke:#1c4b82,stroke-width:1px,rx:10,ry:10
    classDef data fill:#ffffff,stroke:#1c4b82,stroke-dasharray:4 4
    classDef annotation fill:#ffe0b2,stroke:#1c4b82,stroke-dasharray:4 4
    classDef toolLabel font-size:12px,font-weight:bold

    %% --- Data Sources -----------------
    A1["ChIP-seq
    BAM files"]:::data
    A2["Gene expression
    (Salmon TPM)"]:::data
    A3["Region files
    (BED/SAF/merge_SE)"]:::annotation
    A4["Gene annotation
    (GTF)"]:::annotation

    %% --- STEP 1: Data Preparation ---------------
    subgraph Step1["Step 1: P2GL Preparation"]
        direction TB
        P1["SEgene_peakprep"]:::tool
        P2["SEgene_geneprep"]:::tool
        A1 --> P1
        A3 --> P1
        A2 --> P2
        A4 --> P2
    end

    %% --- STEP 2: Peak-to-Gene Analysis --------
    subgraph Step2["Step 2: P2GL Analysis"]
        direction TB
        Q1["Peak signal TSV"]:::data
        Q2["Processed gene TPM"]:::data
        P3["peak_to_gene_links"]:::tool
        P1 --> Q1
        P2 --> Q2
        Q1 & Q2 --> P3
    end

    %% --- STEP 3: SE-to-Gene Analysis -------------
    subgraph Step3["Step 3: SE Analysis"]
        Q3["Peak-gene correlation"]:::data
        P4["SE_to_gene_links"]:::tool
        Q4["SE regions
        (mergeSE.tsv)"]:::data
        P3 --> Q3
        Q3 & Q2 --> P4
        P4 --> Q4
    end

    %% --- STEP 4: Region Analysis ---------------
    subgraph Step4["Step 4: Region Evaluation"]
        P5["SEgene_RegionAnalyzer"]:::tool
        Q5["Analysis results"]:::data
        Q4 --> P5
        P5 --> Q5
    end

    %% --- Click Events --------------
    click P1 "https://github.com/hamamoto-lab/SEgene/tree/main/SEgene_peakprep"
    click P2 "https://github.com/hamamoto-lab/SEgene/tree/main/SEgene_geneprep"
    click P3 "https://github.com/hamamoto-lab/SEgene/tree/main/peak_to_gene_links"
    click P4 "https://github.com/hamamoto-lab/SEgene/tree/main/SE_to_gene_links"
    click P5 "https://github.com/hamamoto-lab/SEgene/tree/main/SEgene_region_analyzer"
```

P2GL Data Preparation

  • SEgene_peakprep
    Quantifies and normalizes signal values from ChIP‑seq data (BAM files) for specified genomic regions
    (supports Standard log2‑CPM, edgeR‑normalized CPM, and BigWig methods).
    For edgeR‑normalized CPM details, see
    SEgene_peakprep/cpm_calcnorm_README.md
  • SEgene_geneprep: Adds genomic region information to RNA-seq data (TPM-TSV files) for P2GL input preparation.
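The standard log2-CPM method listed above scales each region's read count by the sample's library size and log-transforms the result. The sketch below is a minimal illustration under stated assumptions: the hypothetical `log2_cpm` helper takes read counts already computed per region (for example by featureCounts), uses the sum of counts over the listed regions as the library size, and adds a pseudocount of 1. SEgene_peakprep may define the library size and pseudocount differently; its README documents the actual options.

```python
import math
from typing import Dict, List


def log2_cpm(counts: Dict[str, List[int]], pseudocount: float = 1.0) -> Dict[str, List[float]]:
    """Convert raw per-region read counts to log2 counts per million.

    Hypothetical helper, not the SEgene_peakprep API. `counts` maps a sample
    name to raw counts, one per region, with the same region order in every
    sample. Assumption: library size = sum of counts over the listed regions.
    """
    result: Dict[str, List[float]] = {}
    for sample, region_counts in counts.items():
        library_size = sum(region_counts)
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no reads in any region")
        # CPM = count / library size * 1e6; the pseudocount avoids log2(0).
        result[sample] = [
            math.log2(c / library_size * 1e6 + pseudocount) for c in region_counts
        ]
    return result


if __name__ == "__main__":
    raw = {"sample_A": [500, 250, 250], "sample_B": [0, 1000, 3000]}
    for name, values in log2_cpm(raw).items():
        print(name, [round(v, 3) for v in values])
```

Because each sample is divided by its own library size, samples sequenced to different depths become comparable before the signals are correlated with expression.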

P2GL Correlation Analysis

  • peak_to_gene_links: Integrates ChIP-seq data with gene expression data to obtain correlation information between enhancer peaks and gene expression.
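Conceptually, a peak-to-gene link is a correlation, computed across samples, between the signal of a nearby peak and the expression of a gene. The following sketch shows that idea in plain Python. It is not the peak_to_gene_links implementation: the `Peak` and `Gene` classes, `link_peaks_to_genes`, the 250 kb TSS window, and the r >= 0.5 cutoff are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Peak:
    peak_id: str
    chrom: str
    start: int
    end: int
    signal: List[float]  # normalized peak signal, one value per sample


@dataclass
class Gene:
    gene_id: str
    chrom: str
    tss: int
    expression: List[float]  # e.g. log-scaled TPM, same sample order as signal


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; nan if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def link_peaks_to_genes(
    peaks: List[Peak],
    genes: List[Gene],
    window: int = 250_000,  # hypothetical max distance (bp) from peak center to TSS
    min_r: float = 0.5,     # hypothetical correlation cutoff
) -> List[Tuple[str, str, float]]:
    """Return (peak_id, gene_id, r) for cis peaks positively correlated with a gene."""
    links = []
    for gene in genes:
        for peak in peaks:
            if peak.chrom != gene.chrom:
                continue  # only consider peaks on the gene's chromosome
            center = (peak.start + peak.end) // 2
            if abs(center - gene.tss) > window:
                continue  # peak is outside the TSS window
            r = pearson(peak.signal, gene.expression)
            if not math.isnan(r) and r >= min_r:
                links.append((peak.peak_id, gene.gene_id, r))
    return links


peaks = [
    Peak("p1", "chr1", 1_000, 1_200, [1.0, 2.0, 3.0, 4.0]),
    Peak("p2", "chr1", 3_000, 3_400, [4.0, 3.0, 2.0, 1.0]),
    Peak("p3", "chr2", 1_000, 1_200, [1.0, 2.0, 3.0, 4.0]),
]
genes = [Gene("g1", "chr1", 5_000, [2.0, 4.0, 6.0, 8.0])]
print(link_peaks_to_genes(peaks, genes))
```

A real analysis would also need to assess statistical significance (for example, using a null distribution and multiple-testing correction) instead of relying on a fixed correlation cutoff; this sketch omits that step.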

Super-enhancer Analysis

  • SE_to_gene_links: Evaluates and analyzes super-enhancers using the correlation information obtained through P2GL.
  • cli_tools (auxiliary tools): Analyzes correlations between SE regions identified by SE_to_gene_links and gene expression.
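One common way to turn individual peaks into super-enhancer-scale regions is to stitch nearby peaks together, as in the ROSE algorithm, which uses a 12.5 kb stitching distance by default. The sketch below applies that idea to P2GL-linked peaks and ranks the stitched regions by how many distinct genes they are linked to. Whether SE_to_gene_links uses this exact procedure or distance is not stated here, so treat `stitch_linked_peaks`, its input shape, and its parameters as hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical input shape: (chrom, start, end, gene_id) for one P2GL-linked peak.
Link = Tuple[str, int, int, str]
Region = Tuple[str, int, int, Set[str]]


def stitch_linked_peaks(links: List[Link], stitch_distance: int = 12_500) -> List[Region]:
    """Merge linked peaks separated by <= stitch_distance bp into candidate regions.

    Regions are ranked by the number of distinct linked genes (largest first);
    ties keep their sorted order because list.sort() is stable.
    """
    merged: List[list] = []
    for chrom, start, end, gene in sorted(links):
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start - last[2] <= stitch_distance:
            last[2] = max(last[2], end)  # extend the current region
            last[3].add(gene)
        else:
            merged.append([chrom, start, end, {gene}])  # start a new region
    merged.sort(key=lambda region: len(region[3]), reverse=True)
    return [(chrom, start, end, genes) for chrom, start, end, genes in merged]


example_links = [
    ("chr1", 1_000, 2_000, "GENE_A"),
    ("chr1", 10_000, 11_000, "GENE_B"),
    ("chr1", 100_000, 101_000, "GENE_A"),
    ("chr2", 500, 900, "GENE_C"),
]
for region in stitch_linked_peaks(example_links):
    print(region)
```

Ranking by distinct linked genes is only one possible score; signal-based ranking, as in ROSE, would need peak signal values as an additional input.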

Region Evaluation Analysis (Optional)

  • SEgene_RegionAnalyzer: Enables detailed characterization of identified SE regions and their integration with public databases.
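Comparing SE regions with public databases largely reduces to interval overlap queries. The following minimal sketch, with hypothetical function names, indexes reference intervals by chromosome and reports those that overlap a query region, using BED-style 0-based half-open coordinates. For genome-scale data, tools such as `bedtools intersect` (listed under Genomics Tools) are the practical choice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def build_index(reference: List[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group reference intervals by chromosome and sort them by start."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in reference:
        index[chrom].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def find_overlaps(index: Dict[str, List[Tuple[int, int]]], region: Interval) -> List[Tuple[int, int]]:
    """Return reference intervals that share at least one base with `region`."""
    chrom, start, end = region
    hits = []
    for ref_start, ref_end in index.get(chrom, []):
        if ref_start >= end:
            break  # sorted by start, so no later interval can overlap
        if ref_end > start:
            hits.append((ref_start, ref_end))
    return hits


reference = [("chr1", 100, 200), ("chr1", 150, 400), ("chr1", 500, 600), ("chr2", 0, 1000)]
idx = build_index(reference)
print(find_overlaps(idx, ("chr1", 200, 500)))
```

The linear scan keeps the example short; interval trees or the sorted-sweep approach used by bedtools scale better when both sets are large.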

Development Versions

  • SEgene_analyzer: Development version of SEgene_region_analyzer with an enhanced CLI, modern Python packaging, and advanced features including caching and comprehensive testing.
  • SEgene_analyzer_erna: eRNAbase-specific analysis tool derived from SEgene_region_package, providing specialized functionality for eRNA database integration and region overlap analysis, with an enhanced CLI.

Usage

For installation and usage instructions, please refer to the README file of each component.

Changelog

This project follows Semantic Versioning, and major updates are documented in the changelog files.

Refer to these files for information on added features, fixes, and other notable changes in each version.

Libraries and Licenses

This project imports and relies on a variety of open-source libraries. Below is a list of the libraries used and their respective licenses:

Python

R

Python‑R Interop

  • rpy2 – GNU GPL v2+ License
    Provides the Python ↔︎ R interface used by the edgeR‑normalized CPM feature.

Julia

Genomics Tools

  • Bedtools - MIT License
    Bedtools is used for genome arithmetic operations and is accessed via the Python wrapper library PyBedTools.
  • samtools - MIT License
    samtools is used for processing BAM files and collecting alignment statistics.
  • featureCounts - GPL License
    featureCounts is used for counting reads in defined genomic regions.
  • deeptools - BSD License
    deeptools is used for BAM to bigWig conversion and signal extraction in the BigWig method.

For a full list of dependencies of SE_to_gene_links, refer to SE_to_gene_links/environment.yml.

Base Image and Dependency Management

This project uses Miniforge3 for dependency management in SE_to_gene_links. Miniforge3 is a minimal installer for Conda Forge, which provides a community-driven collection of packages for Conda.

Usage Environments (SE_to_gene_links)

  • Docker: The SE_to_gene_links component uses the condaforge/miniforge3 Docker image as its base.
  • Standalone Installation: Miniforge3 can also be installed directly on your local system.

Package Sources

  • Conda Forge
    • Conda Forge provides a community-maintained collection of packages with wide platform support.
    • Licensed under the BSD 3-Clause License.
  • Bioconda
    • A channel for the Conda package manager specializing in bioinformatics software.

Citation

If you use this tool in your research, please cite:

Shinkai, N., Asada, K., Machino, H., Takasawa, K., Takahashi, S., Kouno, N., Komatsu, M., Hamamoto, R., & Kaneko, S. (2025). SEgene identifies links between super enhancers and gene expression across cell types. npj Systems Biology and Applications, 11(1), 49. https://doi.org/10.1038/s41540-025-00533-x

For detailed citation information and additional references, please refer to the CITATION file.

License

This program is released under the MIT License. For more details, please refer to the LICENSE file.

Owner

  • Name: hamamoto-lab
  • Login: hamamoto-lab
  • Kind: organization

Citation (CITATION)

# Citation

**Please cite the following paper when using this repository:**

Shinkai, N., Asada, K., Machino, H., Takasawa, K., Takahashi, S., Kouno, N., Komatsu, M., Hamamoto, R., & Kaneko, S. (2025). SEgene identifies links between super enhancers and gene expression across cell types. *npj Systems Biology and Applications*, 11(1), 49. https://doi.org/10.1038/s41540-025-00533-x

## BibTeX

```bibtex
@article{shinkai2025segene,
  title={SEgene identifies links between super enhancers and gene expression across cell types},
  author={Shinkai, Norio and Asada, Ken and Machino, Hidenori and Takasawa, Ken and Takahashi, Satoshi and Kouno, Nobuji and Komatsu, Masaaki and Hamamoto, Ryuji and Kaneko, Syuzo},
  journal={npj Systems Biology and Applications},
  volume={11},
  number={1},
  pages={49},
  year={2025},
  publisher={Nature Publishing Group},
  doi={10.1038/s41540-025-00533-x}
}
```

## Data Availability

Test sample data and additional resources are available on Figshare: https://doi.org/10.6084/m9.figshare.28171127

# Additional Information

## Methodology

The method implemented in the `peak_to_gene_links` folder codifies the approach described in the following paper:

- Corces MR, Granja JM, Shams S, Louie BH, Seoane JA, Zhou W, Silva TC, Groeneveld C, Wong CK, Cho SW, Satpathy AT, Mumbach MR, Hoadley KA, Robertson AG, Sheffield NC, Felau I, Castro MAA, Berman BP, Staudt LM, Zenklusen JC, Laird PW, Curtis C, Greenleaf WJ, Chang HY; Cancer Genome Atlas Analysis Network. The chromatin accessibility landscape of primary human cancers. *Science*. 2018 Oct 26;362(6413):eaav1898. doi: [10.1126/science.aav1898](https://www.science.org/doi/10.1126/science.aav1898). PMID: 30361341; PMCID: PMC6408149.

## Data Sources

The test sample data on Figshare linked from this repository is derived from the following public dataset:

- **GSE156614**: Available at [NCBI GEO](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE156614).

This dataset was processed to generate the test samples for this repository. It is described in the following publication:

- Li QL, Lin X, Yu YL, Chen L, Hu QX, Chen M, Cao N, Zhao C, Wang CY, Huang CW, Li LY, Ye M, Wu M. Genome-wide profiling in colorectal cancer identifies PHF19 and TBC1D16 as oncogenic super enhancers. *Nat Commun*. 2021 Nov 4;12(1):6407. doi: [10.1038/s41467-021-26600-5](https://www.nature.com/articles/s41467-021-26600-5). PMID: 34737287; PMCID: PMC8568941.

GitHub Events

Total
  • Release event: 12
  • Watch event: 1
  • Push event: 15
  • Public event: 1
  • Create event: 13
Last Year
  • Release event: 12
  • Watch event: 1
  • Push event: 15
  • Public event: 1
  • Create event: 13

Dependencies

SE_to_gene_links/docker/Dockerfile docker
  • condaforge/miniforge3 24.9.2-0 build
SE_to_gene_links/docker-compose.yml docker
  • se_gene latest
peak_to_gene_links/docker/Dockerfile docker
  • ubuntu 20.04 build
SE_to_gene_links/environment.yml pypi
  • japanize-matplotlib ==1.1.3
SE_to_gene_links/pyproject.toml pypi
SE_to_gene_links/setup.py pypi