pytximport

Feature-rich Python implementation of the tximport package for gene count estimation.

https://github.com/complextissue/pytximport

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 8 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.2%) to scientific vocabulary

Keywords

anndata bioinformatics deseq2 python rna rna-seq scverse tximport

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: complextissue
License: gpl-3.0
Language: Python
Default Branch: main
Homepage: https://pytximport.complextissue.com/en/stable/start.html
Size: 108 MB

Statistics

Stars: 36
Watchers: 1
Forks: 2
Open Issues: 0
Releases: 11

Created over 1 year ago · Last pushed 7 months ago

Metadata Files

Readme Contributing Funding License Code of conduct Citation

pytximport

pytximport is a Python package for efficient (gene-)count estimation from transcript quantification files produced by pseudoalignment/quasi-mapping tools such as salmon, kallisto, rsem and others. pytximport is a port of the popular tximport Bioconductor R package.
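
As a conceptual illustration of what gene-level summarization means, the following minimal sketch sums transcript-level counts per gene using a transcript-to-gene map. It is not pytximport's implementation; the function name `summarize_to_gene`, the identifiers, and the values are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical transcript-level counts for one sample (transcript ID -> estimated count).
transcript_counts = {
    "tx1": 10.0,
    "tx2": 5.5,
    "tx3": 2.0,
}

# Hypothetical transcript-to-gene map (transcript ID -> gene ID).
transcript_to_gene = {
    "tx1": "geneA",
    "tx2": "geneA",
    "tx3": "geneB",
}


def summarize_to_gene(counts, tx2gene):
    """Sum transcript-level counts per gene; transcripts missing from the map are skipped."""
    gene_counts = defaultdict(float)
    for transcript_id, count in counts.items():
        gene_id = tx2gene.get(transcript_id)
        if gene_id is not None:
            gene_counts[gene_id] += count
    return dict(gene_counts)


gene_counts = summarize_to_gene(transcript_counts, transcript_to_gene)
print(gene_counts)
```

In practice, the package handles many samples at once and also tracks abundances and transcript lengths, which this sketch leaves out.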

Manuscript & Documentation

The pytximport manuscript can be accessed at: https://doi.org/10.1093/bioinformatics/btae700. Detailed documentation is available at: https://pytximport.complextissue.com.

Installation

The recommended way to install pytximport is through Bioconda:

```bash
mamba install -c bioconda pytximport
```

pytximport can also be installed via pip:

```bash
pip install pytximport
```

While not required, we recommend users also install pyarrow for faster import of tab-separated value-based quantification files:

```bash
mamba install -c conda-forge pyarrow-core
```

or:

```bash
pip install pyarrow
```

Quick Start

You can either import the tximport function in your Python files:

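```python
from pytximport import tximport
from pytximport.utils import create_transcript_gene_map

# Generate a transcript-to-gene map for human transcripts.
transcript_gene_map = create_transcript_gene_map(species="human")

# Paths to the quantification files to import (example paths).
file_paths = ["./sample_1.sf", "./sample_2.sf"]

# Import the salmon quantification files using the map created above.
results = tximport(
    file_paths,
    data_type="salmon",
    transcript_gene_map=transcript_gene_map,
)
```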

Or use it from the command line:

```bash
pytximport -i ./sample_1.sf -i ./sample_2.sf -t salmon -m ./tx2gene_map.tsv -o ./output_counts.csv
```

Common options are:

- `-i`: The path to a quantification file. To provide multiple input files, use `-i input1.sf -i input2.sf ...`.
- `-t`: The type of quantification file, e.g. `salmon`, `kallisto` and others.
- `-m`: The path to the transcript-to-gene map. Either a tab-separated (.tsv) or comma-separated (.csv) file. Expected column names are `transcript_id` and `gene_id`.
- `-o`: The output path to save the resulting counts to.
- `-of`: The format of the output file. Either `csv` or `h5ad`.
- `-ow`: Provide this flag to overwrite an existing file at the output path.
- `-c`: The method to calculate the counts from the abundance. Leave empty to use counts. For differential gene expression analysis, we recommend using `length_scaled_tpm`. For differential transcript expression analysis, we recommend using `scaled_tpm`. For differential isoform usage analysis, we recommend using `dtu_scaled_tpm`.
- `-ir`: Provide this flag to make use of inferential replicates. Will use the median of the inferential replicates.
- `-gl`: Provide this flag when importing gene-level counts from RSEM files.
- `-tx`: Provide this flag to return transcript-level instead of gene-summarized data. Incompatible with gene-level input and `counts_from_abundance=length_scaled_tpm`.
- `--help`: Display all configuration options.
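
To give an intuition for the `-c` option, the sketch below shows one way counts can be derived from abundance. It is modeled on the scaling scheme of the R tximport package for its `scaledTPM` and `lengthScaledTPM` settings; whether pytximport computes `scaled_tpm` and `length_scaled_tpm` in exactly this way is an assumption, and `dtu_scaled_tpm` is not covered. The function name `counts_from_abundance` and the toy data are hypothetical.

```python
def counts_from_abundance(counts, abundance, lengths, method):
    """Rescale abundance so each sample keeps its original total count.

    All three inputs are lists of rows (one row per transcript), and each
    row holds one value per sample.
    """
    if method == "length_scaled_tpm":
        # Weight each transcript's abundance by its mean length across samples.
        scaled = [
            [value * (sum(length_row) / len(length_row)) for value in abundance_row]
            for abundance_row, length_row in zip(abundance, lengths)
        ]
    elif method == "scaled_tpm":
        # Use the abundance as-is; only the library-size rescaling below applies.
        scaled = [list(row) for row in abundance]
    else:
        raise ValueError(f"Unsupported method in this sketch: {method}")

    n_samples = len(counts[0])
    # Per-sample totals of the original counts and of the rescaled abundance.
    count_totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    scaled_totals = [sum(row[j] for row in scaled) for j in range(n_samples)]
    return [
        [row[j] * count_totals[j] / scaled_totals[j] for j in range(n_samples)]
        for row in scaled
    ]


# Hypothetical toy data: two transcripts (rows) in two samples (columns).
counts = [[100.0, 200.0], [300.0, 400.0]]
abundance = [[10.0, 20.0], [30.0, 20.0]]
lengths = [[1000.0, 1000.0], [2000.0, 2000.0]]

scaled_tpm = counts_from_abundance(counts, abundance, lengths, "scaled_tpm")
length_scaled_tpm = counts_from_abundance(counts, abundance, lengths, "length_scaled_tpm")
```

In both variants each sample's column total equals the total of the original counts, so library sizes are preserved while the distribution across transcripts follows the abundance.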

Transcript-to-gene mappings can also be generated from the command line:

```bash
pytximport create-map -i ./data/annotation.gtf -o tx2gene.csv -ow
```

Command options are:

- `-i`: The path to an annotation file in GTF format.
- `-o`: The output path to save the resulting transcript-to-gene mapping to.
- `-ow`: Provide this flag to overwrite an existing file at the output path.
- `--help`: Display all configuration options.
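
To illustrate what such a mapping contains, here is a minimal, standard-library sketch that extracts `transcript_id` and `gene_id` pairs from GTF lines and writes them as CSV with the column names expected by `-m`. It is not the code behind `create-map`; the helper names and the two example GTF records are hypothetical, and the sketch assumes quoted attribute values on `transcript` features.

```python
import csv
import io
import re

# Matches quoted attribute pairs such as: gene_id "G1";
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_transcript_gene_pairs(gtf_lines):
    """Return {transcript_id: gene_id} from the 'transcript' features of a GTF."""
    pairs = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # Skip comment and empty lines.
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # Only transcript records carry the pair we need here.
        attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        if "transcript_id" in attributes and "gene_id" in attributes:
            pairs[attributes["transcript_id"]] = attributes["gene_id"]
    return pairs


def write_map_csv(pairs, handle):
    """Write the mapping with the header transcript_id,gene_id."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["transcript_id", "gene_id"])
    for transcript_id, gene_id in pairs.items():
        writer.writerow([transcript_id, gene_id])


# Hypothetical GTF content (nine tab-separated columns per record).
example_gtf = [
    "#!genome-build example\n",
    "\t".join(["1", "src", "gene", "1", "100", ".", "+", ".", 'gene_id "G1";']) + "\n",
    "\t".join(["1", "src", "transcript", "1", "100", ".", "+", ".",
               'gene_id "G1"; transcript_id "T1";']) + "\n",
    "\t".join(["1", "src", "transcript", "1", "80", ".", "+", ".",
               'gene_id "G1"; transcript_id "T2";']) + "\n",
]

pairs = parse_transcript_gene_pairs(example_gtf)
buffer = io.StringIO()
write_map_csv(pairs, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")` as `handle`.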

Motivation

The tximport package has become a mainstay in the bulk RNA sequencing community and has been used in hundreds of scientific publications. However, its accessibility has remained limited since it requires the R programming language and cannot be used from within Python scripts or the command line. Other tools of the bulk RNA sequencing analysis stack, like DESeq2 (in the form of PyDESeq2), decoupler, liana and others, all have Python versions. Additionally, pseudoalignment tools like salmon and kallisto can be installed via conda and used from the command line. tximport thus constitutes the missing link in many common analysis workflows. pytximport fills this gap and allows these workflows to be run entirely in Python, which is preinstalled on most development machines, and from the command line.

Citation

Please cite both the original publication and this Python implementation:

- Kuehl, M., Wong, M. N., Wanner, N., Bonn, S., & Puelles, V. G. (2024). Gene count estimation with pytximport enables reproducible analysis of bulk RNA sequencing data in Python. Bioinformatics, btae700. https://doi.org/10.1093/bioinformatics/btae700
- Soneson, C., Love, M. I., & Robinson, M. D. (2015). Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research, 4, 1521. https://doi.org/10.12688/f1000research.7563.1

License

The software is provided under the GNU General Public License version 3. Please consult LICENSE for further information.

Differences

Generally, outputs from pytximport correspond to the outputs from tximport within the accuracy allowed by multiple floating point operations and small implementation differences in its dependencies when using the same configuration. If you observe larger discrepancies, please open an issue.
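
If you want to check agreement between the two implementations yourself, a tolerance-based comparison is more appropriate than exact equality. The sketch below shows one way to do this with the standard library; the function name `outputs_match`, the tolerance values, and the example numbers are assumptions for illustration, not thresholds used by the pytximport test suite.

```python
import math


def outputs_match(reference, candidate, rel_tol=1e-6, abs_tol=1e-9):
    """Compare two {gene_id: count} mappings within a floating point tolerance."""
    if reference.keys() != candidate.keys():
        return False  # Different sets of genes can never match.
    return all(
        math.isclose(reference[gene], candidate[gene], rel_tol=rel_tol, abs_tol=abs_tol)
        for gene in reference
    )


# Hypothetical gene-level counts, e.g. exported from the R and Python packages.
r_output = {"geneA": 15.5, "geneB": 2.0}
python_output = {"geneA": 15.5000000001, "geneB": 2.0}
diverging_output = {"geneA": 15.6, "geneB": 2.0}

print(outputs_match(r_output, python_output))
print(outputs_match(r_output, diverging_output))
```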

Beyond this numerical agreement, some differences remain between the packages:

Features unique to pytximport:

- Generating transcript-to-gene maps, either from a BioMart server or an annotation.gtf file. Use `create_transcript_gene_map` or `create_transcript_gene_map_from_annotation` from `pytximport.utils`.
- Command line interface. Type `pytximport --help` into your terminal to explore all options.
- AnnData-support, enabling seamless integration with the scverse.
- SummarizedExperiment-support to represent outputs in familiar Bioconductor data structures available through the BiocPy ecosystem.
- Saving outputs directly to file (use the `output_path` argument).
- Removing transcript versions from both the quantification files and the transcript-to-gene map when `ignore_transcript_version` is provided.
- Post-hoc biotype-filtering using `pytximport.utils.filter_by_biotype`.
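
To make the transcript version removal mentioned above concrete, the following sketch strips a trailing numeric version suffix from a transcript identifier. The exact rule pytximport applies for `ignore_transcript_version` is not described here, so treat the helper `strip_transcript_version` and its behavior as an assumption that only illustrates the idea.

```python
def strip_transcript_version(transcript_id):
    """Drop a trailing '.<digits>' version suffix, if present."""
    base, separator, version = transcript_id.rpartition(".")
    if separator and version.isdigit():
        return base
    return transcript_id  # No version suffix found; return the ID unchanged.


# Illustrative identifiers with and without a version suffix.
versioned = strip_transcript_version("ENST00000456328.2")
unversioned = strip_transcript_version("ENST00000456328")
print(versioned, unversioned)
```

Applying the same normalization to both the quantification files and the transcript-to-gene map keeps their identifiers comparable when the two sources use different annotation versions.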

Features unique to tximport:

- Alevin single-cell RNA-seq data support

Argument order and argument defaults may differ between the implementations.

Contributing

Contributions are welcome. Contributors are asked to follow the Contributor Covenant Code of Conduct.

To set up pytximport for development on your machine, we recommend cloning the dev branch:

```bash
git clone --depth 1 -b dev https://github.com/complextissue/pytximport.git
cd pytximport
pyenv local 3.12
make create-venv
source .venv/source/activate
make install-dev
```

Since pytximport is linted and formatted, the repository contains a list of recommended VS Code extensions in .vscode/extensions.json. If you are using a different editor, please make sure to set up your environment to use the same linters and formatters.

For new features and non-obvious bug fixes, we kindly ask that you create a GitHub issue before submitting a PR.

Running the tests locally

Please follow the steps described in the "Contributing" section. Once you have set up your development environment, you can run the unit tests locally:

```bash
make coverage-report
```

Building the documentation locally

The documentation can be built locally by navigating to the docs folder and running `make html`. This requires that the development requirements of the package as well as the package itself have been installed in the same virtual environment and that pandoc has been installed, e.g. by running `brew install pandoc` on macOS.

Development status

pytximport is still in development and has not yet reached version 1.0.0 in the SemVer versioning scheme. While it should work for almost all use cases and we regularly compare outputs against the R implementation, breaking changes between minor versions may occur. If you encounter any problems, please open a GitHub issue. If you are a Python developer, we welcome pull requests implementing missing features, adding more extensive unit tests and bug fixes.

Data sources

The quantification files used for the unit tests are partly adopted from tximportData which in turn used a subsample of the GEUVADIS data: Lappalainen, T., Sammeth, M., Friedländer, M. R., 't Hoen, P. A., Monlong, J., Rivas, M. A., ... & Dermitzakis, E. T. (2013). Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501(7468), 506-511.

Other test and example files, such as those used in the vignette, are based on the following work: Braun, F., Abed, A., Sellung, D., Rogg, M., Woidy, M., Eikrem, O., ... & Huber, T. B. (2023). Accumulation of α-synuclein mediates podocyte injury in Fabry nephropathy. The Journal of Clinical Investigation, 133(11).

Owner

Name: Complex Tissue lab
Login: complextissue
Kind: organization
Location: Denmark

Website: https://pure.au.dk/portal/en/persons/victor-puelles(98b4de9d-1e23-44c3-9d5c-f5611c35d08a).html
Repositories: 1
Profile: https://github.com/complextissue

Research group led by Prof. Victor Puelles, PhD, at Aarhus University.

GitHub Events

Total

Create event: 3
Issues event: 2
Release event: 2
Watch event: 28
Delete event: 1
Issue comment event: 6
Push event: 25
Pull request event: 16
Fork event: 1

Last Year

Create event: 3
Issues event: 2
Release event: 2
Watch event: 28
Delete event: 1
Issue comment event: 6
Push event: 25
Pull request event: 16
Fork event: 1

Issues and Pull Requests

All Time

Total issues: 2
Total pull requests: 5
Average time to close issues: about 17 hours
Average time to close pull requests: 12 minutes
Total issue authors: 2
Total pull request authors: 2
Average comments per issue: 0.5
Average comments per pull request: 0.2
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 5
Average time to close issues: about 17 hours
Average time to close pull requests: 12 minutes
Issue authors: 2
Pull request authors: 2
Average comments per issue: 0.5
Average comments per pull request: 0.2
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

serenalotreck (1)
jkanche (1)
fredsamhaak (1)

Pull Request Authors

maltekuehl (25)
Zethson (1)
jkanche (1)

Top Labels

Issue Labels

bug (1) enhancement (1)

Pull Request Labels

enhancement (1)

Packages

Total packages: 1
Total downloads:
- pypi: 95 last month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 14
Total maintainers: 1

pypi.org: pytximport

A python implementation of `tximport` to transform transcript into gene counts

Homepage: https://github.com/complextissue/pytximport
Documentation: https://pytximport.complextissue.com/en/stable/
License: GNU General Public License v3 (GPLv3)
Latest release: 0.12.0 (published 11 months ago)

Versions: 14
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 95 Last month

Rankings

Dependent packages count: 10.9%

Average: 36.0%

Dependent repos count: 61.2%

Maintainers (1)

mkuehl

Dependencies

pyproject.toml pypi

anndata *
click *
dask *
flox *
h5py *
numpy *
pandas *
tqdm *
xarray *

.github/workflows/ci.yml actions

actions/cache v3 composite
actions/checkout v3 composite
actions/setup-python v3 composite
codecov/codecov-action v4.0.1 composite

.github/workflows/publish.yml actions

actions/checkout v4 composite
actions/setup-python v3 composite

requirements.frozen.txt pypi

CacheControl ==0.14.0
Jinja2 ==3.1.4
MarkupSafe ==2.1.5
PyYAML ==6.0.2
Pygments ==2.18.0
RapidFuzz ==3.10.0
Sphinx ==8.0.2
adjustText ==1.2.0
alabaster ==1.0.0
anndata ==0.10.9
appnope ==0.1.4
array_api_compat ==1.8
astroid ==3.3.4
asttokens ==2.4.1
attrs ==24.2.0
babel ==2.16.0
bandit ==1.7.10
beautifulsoup4 ==4.12.3
black ==24.8.0
bleach ==6.1.0
build ==1.2.2
cattrs ==24.1.2
certifi ==2024.8.30
cffi ==1.17.1
cfgv ==3.4.0
charset-normalizer ==3.3.2
cleo ==2.1.0
click ==8.1.7
comm ==0.2.2
contourpy ==1.3.0
coverage ==7.6.1
crashtest ==0.4.1
cycler ==0.12.1
debugpy ==1.8.6
decorator ==5.1.1
decoupler ==1.8.0
defusedxml ==0.7.1
distlib ==0.3.8
docrep ==0.3.2
docutils ==0.21.2
dulwich ==0.21.7
executing ==2.1.0
fastjsonschema ==2.20.0
filelock ==3.16.1
flake8 ==7.1.1
flake8-comprehensions ==3.15.0
flake8-docstrings ==1.7.0
flit ==3.9.0
flit_core ==3.9.0
flox ==0.9.13
fonttools ==4.54.1
furo ==2024.8.6
future ==1.0.0
h5py ==3.12.1
identify ==2.6.1
idna ==3.10
imagesize ==1.4.1
inflect ==7.4.0
iniconfig ==2.0.0
installer ==0.7.0
ipykernel ==6.29.5
ipython ==8.27.0
ipywidgets ==8.1.5
isort ==5.13.2
jaraco.classes ==3.4.0
jedi ==0.19.1
joblib ==1.4.2
jsonschema ==4.23.0
jsonschema-specifications ==2023.12.1
jupyter_client ==8.6.3
jupyter_core ==5.7.2
jupyterlab_pygments ==0.3.0
jupyterlab_widgets ==3.0.13
keyring ==24.3.1
kiwisolver ==1.4.7
llvmlite ==0.43.0
markdown-it-py ==3.0.0
matplotlib ==3.9.2
matplotlib-inline ==0.1.7
mccabe ==0.7.0
mdit-py-plugins ==0.4.2
mdurl ==0.1.2
mistune ==3.0.2
more-itertools ==10.5.0
msgpack ==1.1.0
mypy ==1.11.2
mypy-extensions ==1.0.0
myst-parser ==4.0.0
natsort ==8.4.0
nbclient ==0.10.0
nbconvert ==7.16.4
nbformat ==5.10.4
nbsphinx ==0.9.5
nest-asyncio ==1.6.0
nodeenv ==1.9.1
numba ==0.60.0
numpy ==1.26.4
numpy-groupies ==0.11.2
omnipath ==1.0.8
packaging ==24.1
pandas ==2.2.3
pandas-stubs ==2.2.2.240909
pandoc ==2.4
pandocfilters ==1.5.1
parso ==0.8.4
pathspec ==0.12.1
pbr ==6.1.0
pexpect ==4.9.0
pillow ==10.4.0
pkginfo ==1.11.1
platformdirs ==4.3.6
pluggy ==1.5.0
plumbum ==1.8.3
ply ==3.11
poetry ==1.8.3
poetry-core ==1.9.0
poetry-plugin-export ==1.8.0
pre-commit ==3.8.0
prompt_toolkit ==3.0.48
psutil ==6.0.0
ptyprocess ==0.7.0
pure_eval ==0.2.3
pyarrow ==17.0.0
pybiomart ==0.2.0
pycodestyle ==2.12.1
pycparser ==2.22
pydeseq2 ==0.4.11
pydocstyle ==6.3.0
pyflakes ==3.2.0
pyparsing ==3.1.4
pyproject_hooks ==1.2.0
pytest ==8.3.3
python-dateutil ==2.9.0.post0
pytz ==2024.2
pyzmq ==26.2.0
referencing ==0.35.1
requests ==2.32.3
requests-cache ==1.2.1
requests-toolbelt ==1.0.0
rich ==13.8.1
rpds-py ==0.20.0
scikit-learn ==1.5.2
scipy ==1.14.1
seaborn ==0.13.2
shellingham ==1.5.4
six ==1.16.0
snowballstemmer ==2.2.0
soupsieve ==2.6
sphinx-autoapi ==3.3.2
sphinx-autodoc-typehints ==2.4.4
sphinx-basic-ng ==1.0.0b2
sphinx-copybutton ==0.5.2
sphinx-rtd-theme ==0.5.1
sphinx_design ==0.6.1
sphinxcontrib-applehelp ==2.0.0
sphinxcontrib-devhelp ==2.0.0
sphinxcontrib-htmlhelp ==2.1.0
sphinxcontrib-jsmath ==1.0.1
sphinxcontrib-qthelp ==2.0.0
sphinxcontrib-serializinghtml ==2.0.0
stack-data ==0.6.3
stevedore ==5.3.0
threadpoolctl ==3.5.0
tinycss2 ==1.3.0
tokenize-rt ==6.0.0
tomli_w ==1.0.0
tomlkit ==0.13.2
toolz ==0.12.1
tornado ==6.4.1
tqdm ==4.66.5
traitlets ==5.14.3
trove-classifiers ==2024.9.12
typeguard ==4.3.0
types-pytz ==2024.2.0.20240913
typing_extensions ==4.12.2
tzdata ==2024.2
url-normalize ==1.4.3
urllib3 ==2.2.3
virtualenv ==20.26.6
wcwidth ==0.2.13
webencodings ==0.5.1
widgetsnbextension ==4.0.13
wrapt ==1.16.0
xarray ==2024.9.0
xattr ==1.1.0
