Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.6%) to scientific vocabulary
Repository
SEgene package
Basic Info
- Host: GitHub
- Owner: hamamoto-lab
- License: mit
- Language: Python
- Default Branch: main
- Size: 1.31 MB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 15
Metadata Files
README.md
SEgene
(For the Japanese version of this README, please see README_ja.md.)
SEgene (pronounced "S-E-gene") is a platform designed to identify and analyze Super-Enhancer-to-gene links (SE-to-gene links) by incorporating the peak-to-gene links approach, a statistical method that uncovers correlations between genes and peak regions. This repository contains tools and scripts for SEgene.
Features
- Analyze the relationship between super-enhancers and genes
- Visualize data using graph theory
- Interactive analysis with Jupyter Notebook
Program Structure
SEgene currently consists of four primary components: P2GL data preparation, P2GL correlation analysis, Super-enhancer analysis, and Region evaluation analysis. Development versions of some programs are also available and under active development. See the Development Versions section for more information.
Workflow Overview
The following diagram illustrates the complete workflow of SEgene:
```mermaid
graph TD
    %% --- Definitions -------------------------
    classDef tool fill:#d8eefe,stroke:#1c4b82,stroke-width:1px,rx:10,ry:10
    classDef data fill:#ffffff,stroke:#1c4b82,stroke-dasharray:4 4
    classDef annotation fill:#ffe0b2,stroke:#1c4b82,stroke-dasharray:4 4
    classDef toolLabel font-size:12px,font-weight:bold

    %% --- Data Sources -----------------
    A1["ChIP-seq<br/>BAM files"]:::data
    A2["Gene expression<br/>(Salmon TPM)"]:::data
    A3["Region files<br/>(BED/SAF/merge_SE)"]:::annotation
    A4["Gene annotation<br/>(GTF)"]:::annotation

    %% --- STEP 1: Data Preparation ---------------
    subgraph Step1["Step 1: P2GL Preparation"]
        direction TB
        P1["SEgene_peakprep"]:::tool
        P2["SEgene_geneprep"]:::tool
        A1 --> P1
        A3 --> P1
        A2 --> P2
        A4 --> P2
    end

    %% --- STEP 2: Peak-to-Gene Analysis --------
    subgraph Step2["Step 2: P2GL Analysis"]
        direction TB
        Q1["Peak signal TSV"]:::data
        Q2["Processed gene TPM"]:::data
        P3["peak_to_gene_links"]:::tool
        P1 --> Q1
        P2 --> Q2
        Q1 & Q2 --> P3
    end

    %% --- STEP 3: SE-to-Gene Analysis -------------
    subgraph Step3["Step 3: SE Analysis"]
        Q3["Peak-gene correlation"]:::data
        P4["SE_to_gene_links"]:::tool
        Q4["SE regions<br/>(mergeSE.tsv)"]:::data
        P3 --> Q3
        Q3 & Q2 --> P4
        P4 --> Q4
    end

    %% --- STEP 4: Region Analysis ---------------
    subgraph Step4["Step 4: Region Evaluation"]
        P5["SEgene_RegionAnalyzer"]:::tool
        Q5["Analysis results"]:::data
        Q4 --> P5
        P5 --> Q5
    end

    %% --- Click Events --------------
    click P1 "https://github.com/hamamoto-lab/SEgene/tree/main/SEgene_peakprep"
    click P2 "https://github.com/hamamoto-lab/SEgene/tree/main/SEgene_geneprep"
    click P3 "https://github.com/hamamoto-lab/SEgene/tree/main/peak_to_gene_links"
    click P4 "https://github.com/hamamoto-lab/SEgene/tree/main/SE_to_gene_links"
    click P5 "https://github.com/hamamoto-lab/SEgene/tree/main/SEgene_region_analyzer"
```
P2GL Data Preparation
- SEgene_peakprep: Quantifies and normalizes signal values from ChIP‑seq data (BAM files) for specified genomic regions (supports Standard log2‑CPM, edgeR‑normalized CPM, and BigWig methods). For edgeR‑normalized CPM details, see SEgene_peakprep/cpm_calcnorm_README.md.
- SEgene_geneprep: Adds genomic region information to RNA-seq data (TPM-TSV files) for P2GL input preparation.
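To make the log2‑CPM idea concrete, here is a minimal sketch of the generic calculation: counts per region are scaled to counts per million mapped reads and then log2-transformed. It is not taken from the SEgene_peakprep source; the function name `log2_cpm`, the pseudocount of 1, and the region keys are all assumptions made for illustration.

```python
import math


def log2_cpm(counts, total_reads, pseudocount=1.0):
    """Convert raw read counts per region into log2(CPM + pseudocount).

    Hypothetical helper: the pseudocount and the dict-based interface
    are illustrative choices, not SEgene_peakprep's actual API.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = 1_000_000 / total_reads  # reads -> counts per million
    return {
        region: math.log2(count * scale + pseudocount)
        for region, count in counts.items()
    }


# Made-up counts for three regions in a single sample
counts = {"chr1:1000-2000": 50, "chr1:5000-6500": 0, "chr2:300-900": 200}
values = log2_cpm(counts, total_reads=2_000_000)
```

The pseudocount keeps regions with zero reads finite (they map to 0 here) while leaving the ordering of regions unchanged.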
P2GL Correlation Analysis
- peak_to_gene_links: Integrates ChIP-seq data with gene expression data to obtain correlation information between enhancer peaks and gene expression.
Super-enhancer Analysis
- SE_to_gene_links: Evaluates and analyzes super-enhancers using the correlation information obtained through P2GL.
- cli_tools (auxiliary tools): Analyzes correlations between SE regions identified by SE_to_gene_links and gene expression.
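One way to picture a graph view of SE-to-gene links is to treat SE regions and genes as nodes, links as edges, and then group everything that is connected. The sketch below does this with the standard library only so that it stays self-contained. It is an assumption about how such a view could be built, not a description of the tool's internals, and the region and gene names are invented.

```python
from collections import defaultdict, deque


def connected_components(links):
    """Group SE regions and genes that are connected through shared links.

    Hypothetical helper: `links` is an iterable of (region, gene) pairs.
    Nodes are tagged so a region and a gene with the same name stay distinct.
    """
    graph = defaultdict(set)
    for region, gene in links:
        graph[("region", region)].add(("gene", gene))
        graph[("gene", gene)].add(("region", region))

    seen, components = set(), []
    for start in list(graph):
        if start in seen:
            continue
        # Breadth-first search from every node not yet visited
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Made-up links: SE1 and SE2 share gene G2, SE3 stands alone with G3
links = [("SE1", "G1"), ("SE1", "G2"), ("SE2", "G2"), ("SE3", "G3")]
components = connected_components(links)
```

With a graph library such as NetworkX, which appears in the dependency list below, the same grouping could be expressed with its built-in component functions.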
Region Evaluation Analysis (Optional)
- SEgene_RegionAnalyzer: Allows for detailed characterization of identified SE regions and their integration with public databases.
Development Versions
- SEgene_analyzer: Development version of SEgene_region_analyzer with an enhanced CLI, modern Python packaging, and advanced features including caching and comprehensive testing.
- SEgene_analyzer_erna: eRNAbase-specific analysis tool derived from SEgene_region_package, providing specialized functionality for eRNA database integration and region overlap analysis with an enhanced CLI.
Usage
For installation and usage instructions, please refer to the respective README files:
- SEgene_peakprep
- SEgene_geneprep
- peak_to_gene_links
- SE_to_gene_links
- SEgene_RegionAnalyzer
- SEgene_analyzer (Development version)
- SEgene_analyzer_erna (Development version)
Changelog
This project follows Semantic Versioning. Major updates are documented in the repository's changelog files; refer to them for information on added features, fixes, and other notable changes in each version.
Libraries and Licenses
This project imports and relies on a variety of open-source libraries. Below is a list of the libraries used and their respective licenses:
Python
- Version: Python 3.10
- Libraries:
- Biopython - Biopython License Agreement
- Pandas - BSD License
- Matplotlib - PSF and BSD License
- Seaborn - BSD License
- Scipy - BSD License
- Statsmodels - BSD License
- PyBedTools - MIT License
- PyRanges - MIT License
- PyGenomeViz - MIT License
- Jupyter - BSD License
- IPython - BSD License
- NetworkX - BSD License
- Pillow - HPND License
- NumPy - BSD License
- PySide6 - LGPL License
- HDF5 - BSD License
- requests - Apache 2.0 License
- urllib3 - MIT License
- japanize-matplotlib - MIT License
- Tornado - Apache 2.0 License
- Traitlets - BSD License
- Pygments - BSD License
- bleach - Apache 2.0 License
- BeautifulSoup4 - MIT License
- Jedi - MIT License
- Prometheus-client - Apache 2.0 License
- DefusedXML - PSF License
- pytz - MIT License
- pyyaml - MIT License
- six - MIT License
- MarkupSafe - BSD License
- Certifi - Mozilla Public License 2.0
- idna - MIT License
- argon2-cffi - MIT License
- zipp - MIT License
R
- Version: R 4.2.2
- Libraries:
- BiocManager - GPL License
- data.table - MPL-2.0 License
- openxlsx - MIT License
- optparse - GPL License
- pbmcapply - MIT License
- stringr - MIT License
- GenomicRanges - GPL License
- rhdf5 - Artistic License 2.0
- edgeR - GPL License
Python‑R Interop
- rpy2 – GNU GPL v2+ License
Provides the Python ↔︎ R interface used by the edgeR‑normalized CPM feature.
Julia
- Version: Julia 1.8.3
- Libraries:
Genomics Tools
- Bedtools - MIT License
Bedtools is used for genome arithmetic operations and is accessed via the Python wrapper library PyBedTools.
- samtools - MIT License
samtools is used for BAM file analysis and gathering statistics.
- featureCounts - GPL License
featureCounts is used for counting reads in defined genomic regions.
- deeptools - BSD License
deeptools is used for BAM to bigWig conversion and signal extraction in the BigWig method.
For a full list of dependencies of SE_to_gene_links, refer to SE_to_gene_links/environment.yml.
Base Image and Dependency Management
This project uses Miniforge3 for dependency management in SE_to_gene_links. Miniforge3 is a minimal installer for Conda Forge, which provides a community-driven collection of packages for Conda.
Usage Environments (SE_to_gene_links)
- Docker: The SE_to_gene_links component uses the condaforge/miniforge3 Docker image as its base.
- Base Image: condaforge/miniforge3
- Licensed under the BSD 3-Clause License.
- Standalone Installation: Miniforge3 can also be installed directly on your local system.
- Installation instructions can be found on the Miniforge GitHub page.
Package Sources
- Conda Forge
- Conda Forge provides a community-maintained collection of packages with wide platform support.
- Licensed under the BSD 3-Clause License.
- Bioconda
- A channel for the Conda package manager specializing in bioinformatics software.
Citation
If you use this tool in your research, please cite:
Shinkai, N., Asada, K., Machino, H., Takasawa, K., Takahashi, S., Kouno, N., Komatsu, M., Hamamoto, R., & Kaneko, S. (2025). SEgene identifies links between super enhancers and gene expression across cell types. npj Systems Biology and Applications, 11(1), 49. https://doi.org/10.1038/s41540-025-00533-x
For detailed citation information and additional references, please refer to the CITATION file.
License
This program is released under the MIT License. For more details, please refer to the LICENSE file.
Owner
- Name: hamamoto-lab
- Login: hamamoto-lab
- Kind: organization
- Repositories: 4
- Profile: https://github.com/hamamoto-lab
Citation (CITATION)
# Citation
**Please cite the following paper when using this repository:**
Shinkai, N., Asada, K., Machino, H., Takasawa, K., Takahashi, S., Kouno, N., Komatsu, M., Hamamoto, R., & Kaneko, S. (2025). SEgene identifies links between super enhancers and gene expression across cell types. *npj Systems Biology and Applications*, 11(1), 49. https://doi.org/10.1038/s41540-025-00533-x
## BibTeX
```bibtex
@article{shinkai2025segene,
title={SEgene identifies links between super enhancers and gene expression across cell types},
author={Shinkai, Norio and Asada, Ken and Machino, Hidenori and Takasawa, Ken and Takahashi, Satoshi and Kouno, Nobuji and Komatsu, Masaaki and Hamamoto, Ryuji and Kaneko, Syuzo},
journal={npj Systems Biology and Applications},
volume={11},
number={1},
pages={49},
year={2025},
publisher={Nature Publishing Group},
doi={10.1038/s41540-025-00533-x}
}
```
## Data Availability
Test sample data and additional resources are available on Figshare: https://doi.org/10.6084/m9.figshare.28171127
# Additional Information
## Methodology
The method implemented in the `peak_to_gene_links` folder is based on the methodology described in the following paper:
- Corces MR, Granja JM, Shams S, Louie BH, Seoane JA, Zhou W, Silva TC, Groeneveld C, Wong CK, Cho SW, Satpathy AT, Mumbach MR, Hoadley KA, Robertson AG, Sheffield NC, Felau I, Castro MAA, Berman BP, Staudt LM, Zenklusen JC, Laird PW, Curtis C, Greenleaf WJ, Chang HY; Cancer Genome Atlas Analysis Network. The chromatin accessibility landscape of primary human cancers. *Science*. 2018 Oct 26;362(6413):eaav1898. doi: [10.1126/science.aav1898](https://www.science.org/doi/10.1126/science.aav1898). PMID: 30361341; PMCID: PMC6408149.
## Data Sources
The test sample data on Figshare linked from this repository is derived from the following public dataset:
- **GSE156614**: Available at [NCBI GEO](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE156614).
This data was processed and used to generate the test samples in this repository, as described in the following publication:
- Li QL, Lin X, Yu YL, Chen L, Hu QX, Chen M, Cao N, Zhao C, Wang CY, Huang CW, Li LY, Ye M, Wu M. Genome-wide profiling in colorectal cancer identifies PHF19 and TBC1D16 as oncogenic super enhancers. *Nat Commun*. 2021 Nov 4;12(1):6407. doi: [10.1038/s41467-021-26600-5](https://www.nature.com/articles/s41467-021-26600-5). PMID: 34737287; PMCID: PMC8568941.
GitHub Events
Total
- Release event: 12
- Watch event: 1
- Push event: 15
- Public event: 1
- Create event: 13
Last Year
- Release event: 12
- Watch event: 1
- Push event: 15
- Public event: 1
- Create event: 13
Dependencies
- condaforge/miniforge3 24.9.2-0 build
- se_gene latest
- ubuntu 20.04 build
- japanize-matplotlib ==1.1.3