gseapy

Gene Set Enrichment Analysis in Python

https://github.com/zqfang/gseapy

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com
  • Committers with academic emails
    2 of 23 committers (8.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary

Keywords

enrichment-analysis gsea python3 rust

Keywords from Contributors

scverse bioinformatics
Last synced: 6 months ago

Repository

Gene Set Enrichment Analysis in Python

Basic Info
  • Host: GitHub
  • Owner: zqfang
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage: http://gseapy.rtfd.io/
  • Size: 101 MB
Statistics
  • Stars: 650
  • Watchers: 11
  • Forks: 131
  • Open Issues: 33
  • Releases: 41
Topics
enrichment-analysis gsea python3 rust
Created about 10 years ago · Last pushed 6 months ago
Metadata Files
Readme License

README.rst

GSEApy
========

GSEApy: Gene Set Enrichment Analysis in Python.
------------------------------------------------

.. image:: https://badge.fury.io/py/gseapy.svg
    :target: https://badge.fury.io/py/gseapy

.. image:: https://img.shields.io/conda/vn/bioconda/GSEApy.svg?style=plastic
    :target: http://bioconda.github.io

.. image:: https://anaconda.org/bioconda/gseapy/badges/downloads.svg   
    :target: https://anaconda.org/bioconda/gseapy

.. image:: https://github.com/zqfang/GSEApy/workflows/GSEApy/badge.svg?branch=master
    :target: https://github.com/zqfang/GSEApy/actions
    :alt: Action Status

.. image:: http://readthedocs.org/projects/gseapy/badge/?version=master
    :target: http://gseapy.readthedocs.io/en/master/?badge=master
    :alt: Documentation Status

.. image:: https://img.shields.io/badge/license-BSD--3--Clause-blue.svg
    :target:  https://img.shields.io/badge/license-BSD--3--Clause-blue.svg

.. image:: https://img.shields.io/pypi/pyversions/gseapy.svg
    :alt: PyPI - Python Version


**Release notes** : https://github.com/zqfang/GSEApy/releases

`Tutorial for scRNA-seq datasets `_

`Tutorial for general usage `_


Citation
------------------------------------
::

    Zhuoqing Fang, Xinyuan Liu, Gary Peltz, GSEApy: a comprehensive package for performing gene set enrichment analysis in Python,
    Bioinformatics, 2022, btac757, https://doi.org/10.1093/bioinformatics/btac757



GSEApy is a Python/Rust implementation of **GSEA** and a wrapper for **Enrichr**.
--------------------------------------------------------------------------------------------

GSEApy can be used for **RNA-seq, ChIP-seq, and microarray** data. It offers convenient GO enrichment analysis and produces **publication-quality figures** in Python.


GSEApy has 7 sub-commands available: ``gsea``, ``prerank``, ``ssgsea``, ``gsva``, ``replot``, ``enrichr``, ``biomart``.


:gsea:    The ``gsea`` module produces `GSEA  `_ results. The input requires a txt file (FPKM, expected counts, TPM, etc.), a cls file, and a gene_sets file in gmt format.
:prerank: The ``prerank`` module produces **Prerank tool** results. The input expects a pre-ranked gene list with correlation values, provided in .rnk format, and a gene_sets file in gmt format. The ``prerank`` module is an API to the `GSEA` pre-rank tool.
:ssgsea: The ``ssgsea`` module performs **single sample GSEA (ssGSEA)** analysis. The input expects a pd.Series (indexed by gene name) or a pd.DataFrame (or a ``GCT`` file) with expression values, and a ``GMT`` file. For multi-sample input, ssGSEA also recognizes the gct format. The ssGSEA enrichment score for a gene set is described by `D. Barbie et al 2009 `_.
:gsva: The ``gsva`` module performs the `GSVA `_ method by `Hänzelmann et al `_. The input is the same as for ``ssgsea``.
:replot: The ``replot`` module reproduces GSEA desktop version results. The only input for GSEApy is the location of the ``GSEA`` desktop output.
:enrichr: The ``enrichr`` module lets you perform gene set enrichment analysis using the ``Enrichr`` API. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr . It runs very fast.
:biomart: The ``biomart`` module helps you convert gene IDs using the BioMart API.
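
As a rough illustration of the ``prerank`` input described above, the sketch below formats a ranked gene list as two tab-separated columns (gene name, ranking metric), which is the usual layout of a GSEA ``.rnk`` file. The helper name ``to_rnk_text`` and the example scores are hypothetical and are not part of GSEApy.

.. code:: python

    def to_rnk_text(scores):
        """Format a {gene: score} mapping as .rnk text, highest score first."""
        # Sort by score (descending); ties are broken by gene name for stable output.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return "".join(f"{gene}\t{score}\n" for gene, score in ranked)


    # Hypothetical correlation values, for illustration only.
    example_scores = {"CSF1": 1.25, "BGN": -0.75, "PTX3": 2.5}
    rnk_text = to_rnk_text(example_scores)
    print(rnk_text)

The resulting text could be saved to a file such as ``gsea_data.rnk`` and passed to ``prerank``.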


Use ``gseapy COMMAND -h`` to see a detailed description of each option for each module.


The full ``GSEA`` is far too extensive to describe here; see the
`GSEA  `_ documentation for more information. All file formats used by GSEApy are identical to those of the ``GSEA`` desktop version.
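
Because the file formats follow the GSEA desktop conventions, a ``gmt`` file is plain tab-separated text: each row holds a gene set name, a description, and then one gene per remaining column. A minimal sketch of reading that layout with the standard library is shown here; ``parse_gmt`` and the example gene sets are hypothetical and are not GSEApy functions.

.. code:: python

    def parse_gmt(text):
        """Parse GMT-formatted text into {gene_set_name: [genes]}."""
        gene_sets = {}
        for line in text.splitlines():
            fields = line.split("\t")
            # Each row: set name, description, then one gene per remaining column.
            if len(fields) < 3:
                continue  # skip blank or malformed rows
            name, _description, *genes = fields
            gene_sets[name] = [g for g in genes if g]
        return gene_sets


    # Hypothetical two-set example; set names and descriptions are placeholders.
    example_gmt = "SET_A\tdemo set A\tCSF1\tBGN\tPTX3\nSET_B\tdemo set B\tEFNA1\tCIB2\n"
    gene_sets = parse_gmt(example_gmt)
    print(gene_sets)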



Why GSEApy
-----------------------------------------------------

I wanted to use pandas to explore my data, but I did not find a convenient tool for
gene set enrichment analysis in Python. So, here are my reasons:

* **Ability to run inside a Python interactive console without having to switch to R!!!**
* User friendly for both wet and dry lab users.
* Produce or reproduce publishable figures.
* Run batch jobs easily.
* Easy to use in a bash shell or your data analysis workflow, e.g. snakemake.


GSEApy vs GSEA(Broad) output
-----------------------------------------------
Using the same data, ``GSEApy`` reproduces results similar to those of ``GSEAPreranked``.


.. image:: docs/Preank.py.vs.broad.jpg
    :width: 400


See more output here: `Example `_


Installation
------------

| Install gseapy package from bioconda or pip.


.. code:: shell

   # if you have conda (MacOS_x86-64 and Linux only)
   $ conda install -c bioconda gseapy
   # Windows and MacOS_ARM64(M1/2-Chip)
   $ pip install gseapy


| If ``pip install`` fails, build from source:

.. code:: shell

   # you need to install Rust first to compile the code
   $ curl https://sh.rustup.rs -sSf | sh -s -- -y
   # add the Rust toolchain to PATH
   $ export PATH="$PATH:$HOME/.cargo/bin"
   # install from the GitHub repository
   $ pip install git+https://github.com/zqfang/gseapy.git#egg=gseapy


Dependency
--------------
* Python 3.7+

Mandatory
~~~~~~~~~

* build
    * Rust: For gseapy > 0.11.0, a Rust compiler is needed
    * setuptools-rust
* run
    * NumPy >= 1.13.0
    * SciPy
    * pandas
    * Matplotlib
    * Requests

Run GSEApy
-----------------


For command line usage:
~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash


  # An example to reproduce figures using replot module.
  $ gseapy replot -i ./Gsea.reports -o test


  # An example to run GSEA using gseapy gsea module
  $ gseapy gsea -d exptable.txt -c test.cls -g gene_sets.gmt -o test

  # An example to run Prerank using gseapy prerank module
  $ gseapy prerank -r gsea_data.rnk -g gene_sets.gmt -o test

  # An example to run ssGSEA using gseapy ssgsea module
  $ gseapy ssgsea -d expression.txt -g gene_sets.gmt -o test

  # An example to run GSVA using gseapy gsva module
  $ gseapy gsva -d expression.txt -g gene_sets.gmt -o test

  # An example to use enrichr api
  # see details for -g input -> ``get_library_name`` 
  $ gseapy enrichr -i gene_list.txt -g KEGG_2016 -o test
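
For batch jobs, the command lines above can be generated programmatically. The following is a minimal sketch, assuming only the ``prerank`` flags shown above (``-r``, ``-g``, ``-o``); the helper ``build_prerank_commands`` and the input file names are hypothetical, and nothing is executed.

.. code:: python

    from pathlib import Path


    def build_prerank_commands(rnk_files, gene_sets, out_root):
        """Return one ``gseapy prerank`` argument list per .rnk file."""
        commands = []
        for rnk in rnk_files:
            # Give each input its own output folder, named after the file stem.
            outdir = Path(out_root) / Path(rnk).stem
            commands.append(
                ["gseapy", "prerank", "-r", str(rnk), "-g", str(gene_sets), "-o", outdir.as_posix()]
            )
        return commands


    # Hypothetical input files; the commands are only printed here.
    commands = build_prerank_commands(["a.rnk", "b.rnk"], "gene_sets.gmt", "results")
    for cmd in commands:
        print(" ".join(cmd))

Each argument list could then be passed to ``subprocess.run(cmd, check=True)`` or written into a workflow rule.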



Run gseapy inside python console:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Prepare the ``expression.txt``, ``gene_sets.gmt`` and ``test.cls`` files required by GSEA, then run:

.. code:: python

    import gseapy

    # run GSEA.
    gseapy.gsea(data='expression.txt', gene_sets='gene_sets.gmt', cls='test.cls', outdir='test')

    # run prerank
    gseapy.prerank(rnk='gsea_data.rnk', gene_sets='gene_sets.gmt', outdir='test')

    # run ssGSEA
    gseapy.ssgsea(data="expression.txt", gene_sets= "gene_sets.gmt", outdir='test')

    # run GSVA
    gseapy.gsva(data="expression.txt", gene_sets= "gene_sets.gmt", outdir='test')

    # An example to reproduce figures using replot module.
    gseapy.replot(indir='./Gsea.reports', outdir='test')


2. If you prefer to use DataFrame, dict, or list objects in an interactive Python console, you could do this.

See details here: `Example `_

.. code:: python


    import pandas as pd
    import gseapy

    # assign a dataframe, and use the Enrichr library data set 'KEGG_2016'
    expression_dataframe = pd.DataFrame()  # placeholder: fill with your expression data

    sample_names = ['A','A','A','B','B','B'] # exactly two groups; any names you like

    # assign the gene_sets parameter an Enrichr library name or a gmt file on your local computer.
    gseapy.gsea(data=expression_dataframe, gene_sets='KEGG_2016', cls=sample_names, outdir='test')

    # prerank tool
    gene_ranked_dataframe = pd.DataFrame()
    gseapy.prerank(rnk=gene_ranked_dataframe, gene_sets='KEGG_2016', outdir='test')

    # ssGSEA
    gseapy.ssgsea(data=expression_dataframe, gene_sets='KEGG_2016', outdir='test')

    # gsva
    gseapy.gsva(data=expression_dataframe, gene_sets='KEGG_2016', outdir='test')
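
The list of class labels passed to ``cls`` carries the same information as a GSEA ``.cls`` file, whose three lines give the sample and class counts, the class names, and one label per sample. A minimal sketch of writing that layout from a label list follows; ``make_cls_text`` is a hypothetical helper, not part of GSEApy.

.. code:: python

    def make_cls_text(sample_labels):
        """Build GSEA .cls text for a two-group (categorical) design."""
        # Keep group names in order of first appearance.
        groups = list(dict.fromkeys(sample_labels))
        if len(groups) != 2:
            raise ValueError("expected exactly two groups")
        header = f"{len(sample_labels)} {len(groups)} 1"
        names = "# " + " ".join(groups)
        labels = " ".join(sample_labels)
        return "\n".join([header, names, labels]) + "\n"


    cls_text = make_cls_text(["A", "A", "A", "B", "B", "B"])
    print(cls_text)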


3. For ``enrichr``, you can pass a list, pd.Series, or pd.DataFrame object, or a txt file (one gene name per row).

.. code:: python

    # assign a list object to enrichr
    gl = ['SCARA3', 'LOC100044683', 'CMBL', 'CLIC6', 'IL13RA1', 'TACSTD2', 'DKKL1', 'CSF1',
         'SYNPO2L', 'TINAGL1', 'PTX3', 'BGN', 'HERC1', 'EFNA1', 'CIB2', 'PMP22', 'TMEM173']

    gseapy.enrichr(gene_list=gl, gene_sets='KEGG_2016', outdir='test')

    # or a txt file path.
    gseapy.enrichr(gene_list='gene_list.txt', gene_sets='KEGG_2016',
                   outdir='test', cutoff=0.05, format='png' )
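
If the gene list lives in memory, the one-gene-per-row text file can be produced with a few lines of standard-library code. This is only a sketch: ``write_gene_list`` is a hypothetical helper, and the whitespace stripping and de-duplication are assumptions about what a clean input should look like, not something GSEApy requires.

.. code:: python

    import tempfile
    from pathlib import Path


    def write_gene_list(genes, path):
        """Write unique, non-empty gene names to ``path``, one per line."""
        # Strip whitespace and drop duplicates while keeping the original order.
        cleaned = list(dict.fromkeys(g.strip() for g in genes if g.strip()))
        Path(path).write_text("\n".join(cleaned) + "\n")
        return cleaned


    with tempfile.TemporaryDirectory() as tmp:
        gene_file = Path(tmp) / "gene_list.txt"
        written = write_gene_list(["SCARA3", "CMBL", " CLIC6 ", "CMBL", ""], gene_file)
        file_lines = gene_file.read_text().splitlines()

    print(file_lines)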


GSEApy supported gene set libraries:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To see the full list of gseapy supported gene set libraries, please click here: `Library `_

Or use the ``get_library_name`` function inside a Python console.

.. code:: python

    # see the full list of the latest Enrichr library names, which can be passed to the -g parameter:
    names = gseapy.get_library_name()

    # show top 20 entries.
    print(names[:20])


   ['Genome_Browser_PWMs',
   'TRANSFAC_and_JASPAR_PWMs',
   'ChEA_2013',
   'Drug_Perturbations_from_GEO_2014',
   'ENCODE_TF_ChIP-seq_2014',
   'BioCarta_2013',
   'Reactome_2013',
   'WikiPathways_2013',
   'Disease_Signatures_from_GEO_up_2014',
   'KEGG_2016',
   'TF-LOF_Expression_from_GEO',
   'TargetScan_microRNA',
   'PPI_Hub_Proteins',
   'GO_Molecular_Function_2015',
   'GeneSigDB',
   'Chromosome_Location',
   'Human_Gene_Atlas',
   'Mouse_Gene_Atlas',
   'GO_Cellular_Component_2015',
   'GO_Biological_Process_2015',
   'Human_Phenotype_Ontology',]
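
Since the full listing is long, it can help to filter it by keyword. The sketch below works on any list of names; ``find_libraries`` is a hypothetical helper, and the sample names are copied from the output above, so the live list may differ.

.. code:: python

    def find_libraries(names, keyword):
        """Return library names containing ``keyword`` (case-insensitive)."""
        needle = keyword.lower()
        return [name for name in names if needle in name.lower()]


    # A few names copied from the listing above.
    sample_names = [
        "ChEA_2013",
        "KEGG_2016",
        "GO_Molecular_Function_2015",
        "GO_Cellular_Component_2015",
        "GO_Biological_Process_2015",
    ]
    go_libraries = find_libraries(sample_names, "go_")
    print(go_libraries)

With GSEApy installed, the same helper could be applied to the result of ``gseapy.get_library_name()``.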



Dev 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: shell


        # test rust extension only 
        cargo test --features=extension-module
        # test whole package
        python setup.py test



Bug Report
~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you would like to report a bug found while using GSEApy, don't hesitate to create an issue on GitHub.


Getting help with GSEApy
------------------------------------

1. See `Frequently Asked Questions `_

2. Visit the documentation site at `Examples `_

3. The GSEApy discussion channel: `Q&A `_ 

Owner

  • Name: Zhuoqing Fang
  • Login: zqfang
  • Kind: user
  • Location: Stanford

Computational genomics and machine learning with graphs

GitHub Events

Total
  • Create event: 8
  • Commit comment event: 2
  • Release event: 8
  • Issues event: 48
  • Watch event: 84
  • Delete event: 3
  • Issue comment event: 79
  • Push event: 55
  • Pull request event: 4
  • Fork event: 12
Last Year
  • Create event: 8
  • Commit comment event: 2
  • Release event: 8
  • Issues event: 48
  • Watch event: 84
  • Delete event: 3
  • Issue comment event: 79
  • Push event: 55
  • Pull request event: 4
  • Fork event: 12

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 1,094
  • Total Committers: 23
  • Avg Commits per committer: 47.565
  • Development Distribution Score (DDS): 0.689
Past Year
  • Commits: 133
  • Committers: 3
  • Avg Commits per committer: 44.333
  • Development Distribution Score (DDS): 0.015
Top Committers
Name Email Commits
Zhuoqing Fang f****g@s****n 340
zqfang f****q@s****u 304
Zhuoqing Fang f****g@1****m 284
zqfang f****8@g****m 95
zqfang m****g@s****u 41
Charles Tapley Hoyt c****t@g****m 6
falexwolf f****f@g****e 3
Dave Lahr d****r@f****m 3
Pearce Kieser 5****r 2
Yuxing Liao y****o@g****m 2
Zhuoqing Fang z****1@o****m 2
austinmckay a****y@s****o 1
Austin McKay a****3@g****m 1
Jacob Kimmel j****l@g****m 1
hsiao yi h****4@g****m 1
sorrge s****e@g****m 1
Fabian Fröhlich f****n@s****m 1
justin finkle j****e@t****m 1
pirakd d****k@g****m 1
engelsdaniel 5****l 1
Fairlie Reese f****e@g****m 1
oreh h****h@g****m 1
Abhijit Deo 7****g 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 173
  • Total pull requests: 22
  • Average time to close issues: 4 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 145
  • Total pull request authors: 17
  • Average comments per issue: 3.01
  • Average comments per pull request: 1.09
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 29
  • Pull requests: 6
  • Average time to close issues: 25 days
  • Average time to close pull requests: 7 days
  • Issue authors: 26
  • Pull request authors: 4
  • Average comments per issue: 2.03
  • Average comments per pull request: 0.67
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hsiaoyi0504 (4)
  • 136s (3)
  • ghost (3)
  • Skychou-source (2)
  • LeilyR (2)
  • loganylchen (2)
  • Nazim0001 (2)
  • solo7773 (2)
  • enryH (2)
  • lxy04 (2)
  • Floreuzan (2)
  • zqfang (2)
  • nargesr (2)
  • sreichl (2)
  • wangjiawen2013 (2)
Pull Request Authors
  • 136s (4)
  • wrb2012 (2)
  • engelsdaniel (2)
  • brenshanny (2)
  • byemaxx (2)
  • Alireza-Majd (2)
  • quasi-deus (2)
  • TheAustinator (1)
  • Pearcekieser (1)
  • cthoyt (1)
  • meereeum (1)
  • fvalle1 (1)
  • abhi-glitchhg (1)
  • pirakd (1)
  • hsiaoyi0504 (1)
Top Labels
Issue Labels
fixed (3) enhancement (3) New Feature (3) question (3) FAQ (3) done (2) bug (2) critical (2) discussion (2) help wanted (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 36,649 last-month
  • Total docker downloads: 1,924
  • Total dependent packages: 41
    (may contain duplicates)
  • Total dependent repositories: 78
    (may contain duplicates)
  • Total versions: 146
  • Total maintainers: 1
pypi.org: gseapy

Gene Set Enrichment Analysis in Python

  • Versions: 46
  • Dependent Packages: 41
  • Dependent Repositories: 78
  • Downloads: 36,649 Last month
  • Docker Downloads: 1,924
Rankings
Dependent packages count: 0.4%
Dependent repos count: 1.7%
Docker downloads count: 1.7%
Average: 2.3%
Downloads: 2.7%
Stargazers count: 2.9%
Forks count: 4.4%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/zqfang/gseapy
  • Versions: 50
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
proxy.golang.org: github.com/zqfang/GSEApy
  • Versions: 50
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago

Dependencies

docs/docs-requirements.txt pypi
  • bioservices *
  • cython *
  • docutils <0.18
  • gseapy *
  • ipykernel *
  • ipython *
  • joblib *
  • matplotlib *
  • nbsphinx *
  • numpy *
  • pandas *
  • pyyaml *
  • requests *
  • scipy *
  • sphinx *
  • sphinx_rtd_theme *
requirements.txt pypi
  • bioservices *
  • joblib *
  • matplotlib >=1.4.3
  • numpy >=1.9.0
  • pandas >=0.16
  • requests *
  • scipy *
setup.py pypi
  • bioservices *
  • joblib *
  • matplotlib *
  • numpy >=1.13.0
  • pandas *
  • requests *
  • scipy *
test-requirements.txt pypi
  • cython * test
  • nose * test
  • numpydoc * test
  • pytest * test
  • pyyaml * test
  • sphinx * test
.github/workflows/test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
.github/workflows/wheels.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v2 composite
  • docker/setup-qemu-action v1 composite
  • joerick/cibuildwheel v2.10.1 composite