https://github.com/cwieder/py-sspa

Single sample pathway analysis tools for omics data

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
✓ Academic publication links: Links to: zenodo.org
✓ Committers with academic emails: 1 of 5 committers (20.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.2%) to scientific vocabulary

Keywords

bioinformatics identifier-mapping metabolomics omics pathway-analysis pathway-databases pathway-enrichment-analysis

Last synced: 5 months ago

Repository

Single sample pathway analysis tools for omics data

Basic Info

Host: GitHub
Owner: cwieder
License: gpl-3.0
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 8.07 MB

Statistics

Stars: 12
Watchers: 2
Forks: 5
Open Issues: 1
Releases: 3

Topics

bioinformatics identifier-mapping metabolomics omics pathway-analysis pathway-databases pathway-enrichment-analysis

Created about 4 years ago · Last pushed over 1 year ago

Metadata Files

Readme Contributing License

sspa

Single sample pathway analysis toolkit

sspa provides a Python interface for metabolomics pathway analysis. In addition to the conventional methods, over-representation analysis (ORA) and gene/metabolite set enrichment analysis (GSEA), it provides a wide range of single-sample pathway analysis (ssPA) methods.

Features

Over-representation analysis
Metabolite set enrichment analysis (based on GSEA)
Single-sample pathway analysis
Compound identifier conversion
Pathway database download (KEGG, Reactome, and PathBank)

Although this package is designed to provide a user-friendly interface for metabolomics pathway analysis, the methods are also applicable to other datatypes such as normalised RNA-seq and proteomics data.

Documentation and tutorials

This README provides a quickstart guide to the package and its functions. For new users we highly recommend following our full walkthrough notebook tutorial available on Google Colab which provides a step-by-step guide to using the package.

Click the link above and save a copy of the Colab notebook to your Google Drive. Alternatively, you can download the notebook from the Colab tutorial as an '.ipynb' file and run it locally using Jupyter Notebook or Jupyter Lab.

Documentation is available on our Read the Docs page. This includes a function API reference.

Quickstart

pip install sspa

Load Reactome pathways:

python reactome_pathways = sspa.process_reactome(organism="Homo sapiens")

Load some example metabolomics data in the form of a pandas DataFrame:

python covid_data_processed = sspa.load_example_data(omicstype="metabolomics", processed=True)

Generate pathway scores using kPCA method

python kpca_scores = sspa.sspa_kpca(reactome_pathways, min_entity=2).fit_transform(covid_data_processed.iloc[:, :-2])

Loading example data

Note that we provide processed and non-processed versions of the COVID example metabolomics dataset (Su et al. 2020, Cell). The processed version (set processed=True) already has ChEBI identifiers as column names, whereas the non-processed version has metabolite names.

python covid_data = sspa.load_example_data(omicstype="metabolomics", processed=False)

Here we demonstrate some simple pre-processing for this dataset in order to enable conventional and ssPA pathway analysis:

```python
import numpy as np

# Keep only metabolites (exclude metadata columns)
covid_values = covid_data.iloc[:, :-2]

# Remove metabolites with too many NA values
data_filt = covid_values.loc[:, covid_values.isin([' ', np.nan, 0]).mean() < 0.5]

# Impute using the median
imputed_mat = data_filt.fillna(data_filt.median())

# Log transform the data
log2_mat = np.log2(imputed_mat)

# Standardise the data (metabolite values) using z-score (mean=0, std=1)
# by subtracting the mean and dividing by the standard deviation
processed_data = (log2_mat - log2_mat.mean(axis=0)) / log2_mat.std(axis=0)
```
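To make the arithmetic of these steps concrete, here is a dependency-free sketch that applies the same sequence (missingness filter, median imputation, log2 transform, z-scoring) to a single metabolite column. The function name `preprocess_column`, the use of `None` as the only missing-value marker, and the toy values are illustrative assumptions, not part of sspa.

```python
import math
import statistics

def preprocess_column(values, max_missing=0.5):
    """Filter, impute, log2-transform and z-score one metabolite column.

    `values` is a list of positive floats, with None marking missing entries.
    Returns None if the column fails the missingness filter.
    """
    n_missing = sum(v is None for v in values)
    # Step 1: keep the column only if fewer than half of its values are missing
    if n_missing / len(values) >= max_missing:
        return None
    # Step 2: impute missing entries with the median of the observed values
    median = statistics.median(v for v in values if v is not None)
    imputed = [median if v is None else v for v in values]
    # Step 3: log2 transform
    logged = [math.log2(v) for v in imputed]
    # Step 4: z-score using the sample standard deviation (ddof=1, the pandas default)
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)
    return [(v - mean) / sd for v in logged]

scaled = preprocess_column([1.0, 2.0, None, 4.0, 8.0])   # kept: 1 of 5 missing
dropped = preprocess_column([1.0, None, None, None])     # dropped: 3 of 4 missing
```

After scaling, the column has mean 0 and sample standard deviation 1, which is what the pandas expression above produces per metabolite.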

Loading pathways

```python
# Pre-loaded pathways
# Reactome v78
reactome_pathways = sspa.process_reactome(organism="Homo sapiens")

# KEGG v98
kegg_human_pathways = sspa.process_kegg(organism="hsa")
```

Load a custom GMT file (extension .gmt or .csv):

python custom_pathways = sspa.process_gmt("wikipathways-20220310-gmt-Homo_sapiens.gmt")
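For readers unfamiliar with the GMT layout: each line conventionally holds a pathway identifier, a description, and then the member identifiers, all tab-separated. The sketch below parses that layout with the standard library only. `parse_gmt` and the example pathway and member IDs are hypothetical, and `sspa.process_gmt` may handle details (such as the returned data structure) differently.

```python
import io

def parse_gmt(handle):
    """Parse tab-separated GMT lines into {pathway_id: (description, members)}."""
    pathways = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and rows without any members
        pathway_id, description, *members = fields
        # Drop empty trailing fields left by padded rows
        pathways[pathway_id] = (description, [m for m in members if m])
    return pathways

# Toy two-pathway file; the identifiers are placeholders, not real entries
example = io.StringIO(
    "PW1\tExample pathway one\tCHEBI:1\tCHEBI:2\tCHEBI:3\n"
    "PW2\tExample pathway two\tCHEBI:2\tCHEBI:4\t\n"
)
gmt = parse_gmt(example)
```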

Download latest version of pathways
```python
# download KEGG latest metabolomics pathways
kegg_mouse_latest = sspa.process_kegg("mmu", download_latest=True, filepath=".")

# download Reactome latest metabolomics pathways
reactome_mouse_latest = sspa.process_reactome("Mus musculus", download_latest=True, filepath=".", omics_type='metabolomics')

# download Pathbank latest metabolomics pathways
pathbank_human_latest = sspa.process_pathbank("Homo sapiens", download_latest=True, filepath=".", omics_type='metabolomics')
```

Download latest version of multi-omics pathways
- For Reactome, users can specify the omics types required via the 'identifiers' argument. Leaving this as None downloads all omics (ChEBI, UniProt, Gene Symbol). Users can specify any combination of ['chebi', 'uniprot', 'gene_symbol'].
- For KEGG, multi-omics pathways are represented by KEGG gene and KEGG compound identifiers.

```python
# download multi-omics pathways from Reactome (ChEBI, UniProt, Gene Symbol)
reactome_human_mo = sspa.process_reactome('Homo sapiens', download_latest=True, filepath=".", omics_type='multiomics', identifiers=['chebi', 'uniprot', 'gene_symbol'])

# download multi-omics pathways from Reactome (ChEBI and UniProt)
reactome_human_mo = sspa.process_reactome('Homo sapiens', download_latest=True, filepath=".", omics_type='multiomics', identifiers=['chebi', 'uniprot'])

# download multi-omics pathways from KEGG (KEGG gene and KEGG compound)
kegg_mouse_latest = sspa.process_kegg("mmu", download_latest=True, filepath=".", omics_type='multiomics')
```

Identifier harmonization

Note: KEGG pathways use KEGG compound IDs, whereas Reactome and Pathbank pathways use ChEBI and UniProt (for proteins).
```python
# download the conversion table
compound_names = processed_data.columns.tolist()
conversion_table = sspa.identifier_conversion(input_type="name", compound_list=compound_names)

# map the identifiers to your dataset
processed_data_mapped = sspa.map_identifiers(conversion_table, output_id_type="ChEBI", matrix=processed_data)
```
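Conceptually, harmonisation amounts to renaming each data column through a lookup table and deciding what to do with names that have no match. The following is a minimal sketch of that idea, assuming unmatched compounds are simply dropped; that policy, the helper `map_columns`, and the placeholder identifiers are assumptions for illustration, so consult the sspa documentation for what `sspa.map_identifiers` actually does.

```python
def map_columns(data, lookup):
    """Rename the keys of `data` via `lookup`, dropping keys with no match."""
    mapped = {}
    for name, column in data.items():
        new_id = lookup.get(name)
        if new_id is None:
            continue  # no identifier found for this compound
        mapped[new_id] = column
    return mapped

# Toy data: compound name -> values per sample
data = {"compound A": [0.1, -0.3], "compound B": [1.2, 0.4], "unknown": [0.0, 0.5]}
lookup = {"compound A": "ID:1", "compound B": "ID:2"}  # placeholder identifiers
mapped = map_columns(data, lookup)
```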

Conventional pathway analysis

Over-representation analysis (ORA)
```python
ora = sspa.sspa_ora(processed_data_mapped, covid_data["Group"], reactome_pathways, 0.05, DA_testtype='ttest', custom_background=None)

# perform ORA
ora_res = ora.over_representation_analysis()

# get t-test results
ora.ttest_res

# obtain list of differential molecules input to ORA
ora.DA_test_res
```
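As background on what ORA computes, a common formulation is a right-tailed hypergeometric test: given a background of N molecules, of which K belong to the pathway, and n differential molecules, of which k fall in the pathway, the p-value is P(X >= k). The sketch below implements that formula with the standard library. `ora_pvalue` is a hypothetical helper, and sspa's own implementation may use a different but related test or apply multiple-testing correction on top.

```python
from math import comb

def ora_pvalue(k, n, K, N):
    """Right-tailed hypergeometric p-value P(X >= k).

    N: background size, K: pathway members in the background,
    n: differential molecules, k: differential molecules in the pathway.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Toy numbers: 100 background molecules, 10 in the pathway,
# 20 differential molecules, 5 of which are pathway members
p = ora_pvalue(k=5, n=20, K=10, N=100)
```

A larger overlap k for the same n, K and N always gives a smaller p-value, which is the sense in which a pathway is "over-represented".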

Gene Set Enrichment Analysis (GSEA), applicable to any type of omics data

python sspa.sspa_gsea(processed_data_mapped, covid_data['Group'], reactome_pathways)
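For intuition about what GSEA measures, the sketch below computes the classic weighted running-sum enrichment score for one pathway: walking down a ranked list, the sum increases at pathway members (weighted by the absolute ranking statistic) and decreases at non-members, and the score is the largest deviation from zero. This is an illustrative re-implementation under stated assumptions, not the code path sspa uses (the package delegates GSEA to GSEApy), and it omits permutation-based significance testing. The function name and toy ranking are hypothetical.

```python
def enrichment_score(ranked, pathway, p=1.0):
    """Running-sum enrichment score for one pathway.

    ranked: list of (molecule, statistic) pairs sorted by statistic, descending.
    pathway: set of molecule identifiers.
    """
    in_set = [mol in pathway for mol, _ in ranked]
    weights = [abs(stat) ** p for _, stat in ranked]
    hit_total = sum(w for w, hit in zip(weights, in_set) if hit)
    n_miss = in_set.count(False)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("pathway must contain some, but not all, ranked molecules")
    running = best = 0.0
    for w, hit in zip(weights, in_set):
        # Step up at pathway members, step down at everything else
        running += w / hit_total if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0)]
es_top = enrichment_score(ranked, {"a", "b"})      # members at the top of the list
es_bottom = enrichment_score(ranked, {"d", "e"})   # members at the bottom
```

A pathway concentrated at the top of the ranking scores close to +1, and one concentrated at the bottom scores close to -1.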

Single sample pathway analysis methods

All ssPA methods now have fit(), transform() and fit_transform() methods for compatibility with scikit-learn. This allows ssPA transformation to be integrated with scikit-learn utilities such as Pipeline and GridSearchCV. For the sspa.sspa_ssClustPA, sspa.sspa_SVD, and sspa.sspa_KPCA methods specifically, the model can be fit on the training data and then used to transform the test data.
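To illustrate the fit/transform convention itself, here is a toy, dependency-free transformer that learns per-molecule means from training data and reuses them when scoring unseen samples. `ToyPathwayScorer` is not an sspa class and its scoring rule (mean of centred member values) is an assumption chosen for brevity; it only shows why fitting on the training set and transforming the test set keeps the two separate.

```python
import statistics

class ToyPathwayScorer:
    """Toy transformer following the fit/transform convention (not part of sspa)."""

    def __init__(self, pathways):
        self.pathways = pathways  # {pathway_id: [molecule, ...]}

    def fit(self, X):
        # X: {molecule: [value per sample]}; learn one mean per molecule
        self.means_ = {mol: statistics.mean(vals) for mol, vals in X.items()}
        return self

    def transform(self, X):
        n_samples = len(next(iter(X.values())))
        scores = {}
        for pw, members in self.pathways.items():
            present = [m for m in members if m in X and m in self.means_]
            if not present:
                continue  # pathway has no measured members
            # Score = mean of member values centred on the *training* means
            scores[pw] = [
                statistics.mean(X[m][i] - self.means_[m] for m in present)
                for i in range(n_samples)
            ]
        return scores

    def fit_transform(self, X):
        return self.fit(X).transform(X)

pathways = {"PW1": ["a", "b"]}
train = {"a": [1.0, 3.0], "b": [2.0, 4.0]}
test = {"a": [2.0], "b": [5.0]}

scorer = ToyPathwayScorer(pathways).fit(train)
test_scores = scorer.transform(test)                          # uses training means
train_scores = ToyPathwayScorer(pathways).fit_transform(train)
```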

```python
# ssclustPA
ssclustpa_res = sspa.sspa_ssClustPA(reactome_pathways, min_entity=2).fit_transform(processed_data_mapped)

# kPCA
kpca_scores = sspa.sspa_kpca(reactome_pathways, min_entity=2).fit_transform(processed_data_mapped)

# z-score (Lee et al. 2008)
zscore_res = sspa.sspa_zscore(reactome_pathways, min_entity=2).fit_transform(processed_data_mapped)

# SVD (PLAGE, Tomfohr et al. 2005)
svd_res = sspa.sspa_svd(reactome_pathways, min_entity=2).fit_transform(processed_data_mapped)

# ssGSEA (Barbie et al. 2009)
ssgsea_res = sspa.sspa_ssGSEA(reactome_pathways, min_entity=2).fit_transform(processed_data_mapped)
```
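As an example of how a single-sample score is formed, the z-score method is usually described as combining the standardised values of a pathway's k measured members into sum(z) / sqrt(k) for each sample. The sketch below computes that quantity for one sample. `zscore_pathway_score` and the toy values are hypothetical, and sspa's implementation may differ in details such as how missing members are handled.

```python
import math

def zscore_pathway_score(sample, members):
    """Combine standardised values of pathway members as sum(z) / sqrt(k)."""
    z = [sample[m] for m in members if m in sample]
    if not z:
        raise ValueError("no pathway members found in sample")
    return sum(z) / math.sqrt(len(z))

# One sample of already z-scored metabolite values (toy numbers)
sample = {"m1": 1.0, "m2": 2.0, "m3": -1.0, "m4": 0.5}
# "m9" is a pathway member that was not measured, so it is ignored
score = zscore_pathway_score(sample, ["m1", "m2", "m3", "m4", "m9"])
```

Here the four measured members sum to 2.5 and k = 4, so the score is 2.5 / 2 = 1.25.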

License

GNU GPL 3.0

Citing us

If you found this package useful, please consider citing us:

```
% ssPA package
@article{Wieder22a,
  author = {Cecilia Wieder and Nathalie Poupin and Clément Frainay and Florence Vinson and Juliette Cooke and Rachel PJ Lai and Jacob G Bundy and Fabien Jourdan and Timothy MD Ebbels},
  doi = {10.5281/ZENODO.6959120},
  month = {8},
  title = {cwieder/py-ssPA: v1.0.4},
  url = {https://zenodo.org/record/6959120},
  year = {2022},
}

% Single-sample pathway analysis in metabolomics
@article{Wieder2022,
  author = {Cecilia Wieder and Rachel P J Lai and Timothy M D Ebbels},
  doi = {10.1186/s12859-022-05005-1},
  issn = {1471-2105},
  issue = {1},
  journal = {BMC Bioinformatics},
  pages = {481},
  title = {Single sample pathway analysis in metabolomics: performance evaluation and application},
  volume = {23},
  url = {https://doi.org/10.1186/s12859-022-05005-1},
  year = {2022},
}
```

Contributing

Read our contributor's guide to get started

Contributors

We are grateful for our contributors who help develop and maintain py-ssPA:
- Maëlick Brochut @mbrochut

News and updates

### [v1.0.2] - 4/12/23
- Enable download of Pathbank pathways (metabolite and protein) via the `process_pathbank()` function

### [v1.0.0] - 25/08/23
- Add compatibility with scikit-learn by implementing `fit()`, `transform()` and `fit_transform()` methods for all ssPA methods. This allows integration of ssPA transformation with scikit-learn utilities such as `Pipeline` and `GridSearchCV`. Specifically for `sspa.sspa_ssClustPA`, `sspa.sspa_SVD`, and `sspa.sspa_KPCA` methods the model can be fit on the training data and the test data is transformed using the fitted model.
- Fixed ID conversion bug in `sspa.map_identifiers()` due to MetaboAnalyst API URL change

### [v0.2.4] - 04/07/23
Enable the download of multi-omics (ChEBI and UniProt) Reactome pathways for multi-omics integration purposes. Set `omics_type='multiomics'` to download:
```
reactome_mouse_latest_mo = sspa.process_reactome("Mus musculus", download_latest=True, filepath=".", omics_type='multiomics')
```

### [v0.2.3] - 23/06/23
- @mbrochut Bug fix in KEGG pathway downloader
- @mbrochut Add tqdm progress bar for long KEGG downloads

### [v0.2.1] - 05/01/23
- Removal of rpy2 dependency for improved compatibility across systems
- Use [GSEApy](https://github.com/zqfang/GSEApy) as backend for GSEA and ssGSEA
- Minor syntax changes
  - `ora.ttest_res` is now `ora.DA_test_res` (as we can implement t-test or MWU tests)
  - `sspa_fgsea()` is now `sspa_gsea()` and uses gseapy as the backend rather than R fgsea
  - `sspa_gsva()` is temporarily deprecated due to the need for rpy2 compatibility; use the [GSVA R package](https://bioconductor.org/packages/release/bioc/html/GSVA.html) instead

Owner

Login: cwieder
Kind: user
Location: London, UK
Company: Imperial College London

Repositories: 15
Profile: https://github.com/cwieder

GitHub Events

Total

Issues event: 3
Watch event: 2
Issue comment event: 1
Push event: 7
Fork event: 1

Last Year

Issues event: 3
Watch event: 2
Issue comment event: 1
Push event: 7
Fork event: 1

Committers

Last synced: over 1 year ago

All Time

Total Commits: 199
Total Committers: 5
Avg Commits per committer: 39.8
Development Distribution Score (DDS): 0.06

Past Year

Commits: 15
Committers: 3
Avg Commits per committer: 5.0
Development Distribution Score (DDS): 0.533

Top Committers

Name	Email	Commits
cwieder	c**9@g**m	187
cwieder	c**9@i**k	5
Maelick	m**t@g**m	3
cwieder	5****r	2
mbrochut	4****t	2

Committer Domains (Top 20 + Academic)

ic.ac.uk: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 2
Total pull requests: 3
Average time to close issues: 5 minutes
Average time to close pull requests: 2 days
Total issue authors: 2
Total pull request authors: 1
Average comments per issue: 0.5
Average comments per pull request: 0.67
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 0
Average time to close issues: 5 minutes
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 0
Average comments per issue: 0.5
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

priyanka-1802 (1)
KCrux (1)

Pull Request Authors

mbrochut (3)

Top Labels

Issue Labels

Pull Request Labels

Dependencies

requirements.txt pypi

numpy >=1.21.4
pandas >=1.3.5
requests >=2.26.0
rpy2 >=3.4.5
scikit-learn >=1.0.1
scipy >=1.7.3
setuptools >=58.0.4
sspa >=0.1.0
statsmodels >=0.13.1

.github/workflows/sspa-docs.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/sspa-tests.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

setup.py pypi

gseapy *
numpy *
pandas *
requests *
scikit-learn *
scipy *
setuptools *
sspa *
statsmodels *
