https://github.com/biocore/qadabra

Snakemake workflow for comparison of differential abundance ranks

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links: Links to nature.com
✓ Committers with academic emails: 1 of 4 committers (25.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (17.3%) to scientific vocabulary

Keywords

bioinformatics differential-abundance machine-learning metagenomics microbiome pipeline snakemake workflow

Last synced: 9 months ago

Repository

Snakemake workflow for comparison of differential abundance ranks

Basic Info

Host: GitHub
Owner: biocore
License: bsd-3-clause
Language: Python
Default Branch: main
Homepage:
Size: 15.8 MB

Statistics

Stars: 13
Watchers: 5
Forks: 4
Open Issues: 11
Releases: 4

Topics

bioinformatics differential-abundance machine-learning metagenomics microbiome pipeline snakemake workflow

Created about 4 years ago · Last pushed about 2 years ago

Metadata Files

Readme License

Qadabra: Quantitative Analysis of Differential Abundance Ranks

(Pronounced ka-da-bra)

Qadabra is a Snakemake workflow for running and comparing several differential abundance (DA) methods on the same microbiome dataset.

Qadabra reports both FDR-corrected p-values and feature ranks, and it generates visualizations of the differential abundance results.

Schematic

Please note this software is currently a work in progress. Your patience is appreciated as we continue to develop and enhance its features. Please leave an issue on GitHub should you run into any errors.

Installation

Option 1: Pip install from PyPI

pip install qadabra

Qadabra requires the following dependencies:

snakemake
click
biom-format
pandas
numpy
cython
iow
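As a quick sanity check after installation, the sketch below reports which of the listed dependencies are visible to the current Python environment. It assumes the names above are the PyPI distribution names; the helper `installed_versions` is hypothetical and not part of Qadabra.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in the dependency list above (assumed to match PyPI).
REQUIRED = ["snakemake", "click", "biom-format", "pandas", "numpy", "cython", "iow"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'MISSING'}")
```

Any entry reported as MISSING can be installed with pip before running the workflow.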

Check out the tutorial for more in-depth instructions on installation.

Option 2: Install from source (this GitHub repository)

Prerequisites

Before you begin, ensure you have Git and the necessary build tools installed on your system.

Clone the repository:

git clone https://github.com/biocore/qadabra.git

Navigate to the repo root directory, where the setup.py file is located, and then install Qadabra in editable mode:

cd qadabra
pip install -e .

Usage

1. Creating the workflow directory

Qadabra can be used on multiple datasets at once. First, create the workflow directory in which differential abundance analysis will be run with all methods:

qadabra create-workflow --workflow-dest <directory_name>

This command will initialize the workflow, but we still need to point to our dataset(s) of interest.

2. Adding a dataset

We can add datasets one-by-one with the add-dataset command:

qadabra add-dataset \
    --workflow-dest <directory_name> \
    --table <directory_name>/data/table.biom \
    --metadata <directory_name>/data/metadata.tsv \
    --tree <directory_name>/data/my_tree.nwk \
    --name my_dataset \
    --factor-name case_control \
    --target-level case \
    --reference-level control \
    --confounder confounding_variable(s) <confounding_var> \
    --verbose

Let's walk through the arguments provided here, which represent the inputs to Qadabra:

workflow-dest: The location of the workflow that we created earlier
table: Feature table (features by samples) in BIOM format
metadata: Sample metadata in TSV format
tree: Phylogenetic tree in .nwk or other tree format (optional)
name: Name to give this dataset
factor-name: Metadata column to use for differential abundance
target-level: The value in the chosen factor to use as the target
reference-level: The reference level to which we want to compare our target
confounder: Any confounding variable metadata columns (optional)
verbose: Flag to show all preprocessing performed by Qadabra
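Before adding a dataset, it can help to confirm that the metadata file actually contains the factor column and both levels. The following is a minimal sketch using only the standard library; `check_factor_levels` is a hypothetical helper, not a Qadabra function, and the example table is made up for illustration.

```python
import csv
import io


def check_factor_levels(metadata_file, factor_name, target_level, reference_level):
    """Return a list of problems found in a tab-separated metadata table."""
    reader = csv.DictReader(metadata_file, delimiter="\t")
    if factor_name not in (reader.fieldnames or []):
        return [f"column {factor_name!r} not found"]
    # Collect every distinct value that appears in the factor column.
    levels = {row[factor_name] for row in reader}
    return [
        f"level {level!r} not present in column {factor_name!r}"
        for level in (target_level, reference_level)
        if level not in levels
    ]


# Tiny made-up metadata table, for illustration only.
example = io.StringIO("sample_id\tcase_control\nS1\tcase\nS2\tcontrol\n")
print(check_factor_levels(example, "case_control", "case", "control"))  # prints []
```

For a real file, pass an open handle instead, for example `open("metadata.tsv", newline="")`.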

Your dataset should now be added as a line in <directory_name>/config/datasets.tsv.

You can use qadabra add-dataset --help for more details. To add another dataset, just run this command again with the new dataset information.
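When several datasets need to be added, the repeated calls can be generated from a small script. The sketch below only builds and prints the commands, using the flags shown above; the dataset names, paths, and levels are placeholders, `add_dataset_command` is a hypothetical helper, and the `--confounder` option is left out for brevity.

```python
import shlex


def add_dataset_command(workflow_dest, table, metadata, name,
                        factor_name, target_level, reference_level,
                        tree=None, verbose=False):
    """Build the argument list for one `qadabra add-dataset` call."""
    cmd = [
        "qadabra", "add-dataset",
        "--workflow-dest", workflow_dest,
        "--table", table,
        "--metadata", metadata,
        "--name", name,
        "--factor-name", factor_name,
        "--target-level", target_level,
        "--reference-level", reference_level,
    ]
    if tree is not None:  # the tree is optional
        cmd += ["--tree", tree]
    if verbose:
        cmd.append("--verbose")
    return cmd


# Hypothetical datasets; every name, path, and level here is a placeholder.
datasets = [
    {"name": "gut", "table": "data/gut.biom", "metadata": "data/gut.tsv",
     "factor_name": "case_control", "target_level": "case",
     "reference_level": "control"},
    {"name": "oral", "table": "data/oral.biom", "metadata": "data/oral.tsv",
     "factor_name": "case_control", "target_level": "case",
     "reference_level": "control", "tree": "data/oral.nwk"},
]

for ds in datasets:
    cmd = add_dataset_command("my_qadabra", **ds)
    print(shlex.join(cmd))
    # To execute instead of printing: subprocess.run(cmd, check=True)
```

Printing first makes it easy to review each command before running it.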

3. Running the workflow

The previous commands create the workflow directory, <directory_name>, which contains the workflow structure. From the command line, execute the following to start the workflow:

snakemake --use-conda --cores <number of cores preferred> <other options>

Please read the Snakemake documentation for how best to run Snakemake on your system.

When this process is completed, you should have directories figures, results, and log. Each of these directories will have a separate folder for each dataset you added.
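The expected layout can be checked with a few lines of Python. This is a sketch under the assumption that each per-dataset folder is named after the value passed to --name; `missing_outputs` is a hypothetical helper, not part of Qadabra.

```python
from pathlib import Path


def missing_outputs(workflow_dir, dataset_names):
    """List the expected per-dataset output folders that do not exist."""
    workflow_dir = Path(workflow_dir)
    missing = []
    for top in ("figures", "results", "log"):
        for name in dataset_names:
            path = workflow_dir / top / name
            if not path.is_dir():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # "my_qadabra" and "my_dataset" are the example names used in this README.
    for path in missing_outputs("my_qadabra", ["my_dataset"]):
        print(f"missing: {path}")
```

An empty result means every expected folder is present; it does not say whether the files inside are complete.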

4. Generating a report

After Qadabra has finished running, you can generate a Snakemake report of the workflow with the following command:

snakemake --report report.zip

This will create a zipped directory containing the report. Unzip this file and open the report.html file to view the report containing results and visualizations in your browser.
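Unzipping can also be scripted. The sketch below extracts the archive with the standard library and returns the location of any report.html it contains; the internal layout of the archive is not assumed, so the file is found by searching. `extract_report` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def extract_report(zip_path, dest_dir):
    """Extract the archive and return paths of any report.html files inside it."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(dest.rglob("report.html"))


if __name__ == "__main__":
    for html in extract_report("report.zip", "report_unzipped"):
        print(f"open in a browser: {html.resolve()}")
```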

Tutorial

See the tutorial page for a walkthrough of using the Qadabra workflow with a microbiome dataset.

FAQs

Coming soon: An FAQs page of commonly asked questions on the statistics and code pertaining to Qadabra.

Citation

The manuscript for Qadabra is currently in progress. Please cite this GitHub page if Qadabra is used for your analysis. This project is licensed under the BSD-3 License. See the license file for details.

Owner

Name: biocore
Login: biocore
Kind: organization
Location: Cyberspace

Website: http://biocore.github.io/
Repositories: 76
Profile: https://github.com/biocore

Collaboratively developed bioinformatics software.

Committers

Last synced: almost 2 years ago

All Time

Total Commits: 96
Total Committers: 4
Avg Commits per committer: 24.0
Development Distribution Score (DDS): 0.375

Past Year

Commits: 15
Committers: 4
Avg Commits per committer: 3.75
Development Distribution Score (DDS): 0.333

Top Committers

Name	Email	Commits
Gibraan Rahman	g**n@e**u	60
Yang Chen	6****2	33
Gibraan Rahman	g**n@g**m	2
AmandaBirmingham	l**s@i**m	1
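The all-time summary figures can be reproduced from this table. The sketch assumes the Development Distribution Score is defined as one minus the top committer's share of commits, and that the two Gibraan Rahman rows are counted as separate committers, which matches the reported committer count of 4.

```python
# Commit counts from the Top Committers table above.
commits = [60, 33, 2, 1]

total = sum(commits)               # total commits
average = total / len(commits)     # average commits per committer
dds = 1 - max(commits) / total     # assumed DDS definition: 1 - top committer share

print(total, average, dds)  # prints 96 24.0 0.375
```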

Committer Domains (Top 20 + Academic)

imladris.com: 1
eng.ucsd.edu: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 41
Total pull requests: 51
Average time to close issues: 23 days
Average time to close pull requests: 16 days
Total issue authors: 8
Total pull request authors: 3
Average comments per issue: 0.85
Average comments per pull request: 0.14
Merged pull requests: 47
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

gibsramen (3)
411an13 (3)
sherlyn99 (3)
ahdilmore (2)
bdpessem (1)

Pull Request Authors

yangchen2 (9)
gibsramen (5)
AmandaBirmingham (1)

Dependencies

.github/workflows/main.yml actions

actions/checkout v2 composite
conda-incubator/setup-miniconda v2 composite

setup.py pypi

biom-format *
click *
iow *
pandas >=1.0.0
scikit-bio *
