BioProv - A provenance library for bioinformatics workflows

BioProv - A provenance library for bioinformatics workflows - Published in JOSS (2021)

https://github.com/vinisalazar/bioprov

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

biological-data biopython prov provenance python w3c-prov
Last synced: 4 months ago

Repository

A provenance library for bioinformatics workflows 🧬 🔀 📝

Basic Info
Statistics
  • Stars: 14
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 9
Created over 5 years ago · Last pushed about 4 years ago
Metadata Files
Readme Changelog Contributing License

README.md

BioProv - W3C-PROV provenance documents for bioinformatics

| Code  | PyPI Version | lint    | Code style      |
|-------|--------------|---------|-----------------|
| Tests | Build Status | tests   | Coverage Status |
| Docs  | Docs status  | License | binder          |

BioProv is a Python library for W3C-PROV representation of bioinformatics workflows. It enables you to quickly write workflows and to describe relationships between samples, files, users and programs.

Please see the tutorials for a more detailed introduction and visit ReadTheDocs for the complete documentation.

Quickstart

```py
import bioprov as bp

# Create samples and file objects
sample = bp.Sample("mysample")
genome = bp.File("mysample.fasta", "genome")
sample.add_files(genome)

# Create programs
output = sample.files["blastout"] = bp.File("mysample.blast.tsv", "blastout")
blastn = bp.Program(
    "blastn",
    params={"-query": sample.files["genome"], "-db": "mydb.fasta", "-out": output},
)
sample.add_programs(blastn)

# Run programs
sample.run_programs()

# Save your project
proj = bp.Project((sample,), tag="exampleproject")
proj.to_json()

# Create PROV documents
prov = bp.BioProvDocument(proj)

# Save in PROVN or graphical format
prov.write_provn()  # human-readable text format
prov.dot.write_pdf()  # graphical format
```
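To give a feel for what a PROV document records, here is a minimal standard-library sketch that writes the quickstart's relationships as PROV-N text. It is not BioProv's implementation, nor the output BioProv produces. The `ProvSketch` class, the `ex` prefix, the `http://example.org/` namespace and the `user` agent are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvSketch:
    """Toy collector of PROV-N statements; not one of BioProv's classes."""

    prefix: str = "ex"                      # hypothetical namespace prefix
    namespace: str = "http://example.org/"  # hypothetical namespace URI
    statements: List[str] = field(default_factory=list)

    def _qualified(self, name: str) -> str:
        return f"{self.prefix}:{name}"

    def declare(self, kind: str, name: str) -> None:
        """Record an entity, activity or agent declaration."""
        if kind not in {"entity", "activity", "agent"}:
            raise ValueError(f"unknown PROV element type: {kind}")
        self.statements.append(f"{kind}({self._qualified(name)})")

    def relate(self, relation: str, subject: str, obj: str) -> None:
        """Record a binary relation; '-' marks the omitted optional argument."""
        self.statements.append(
            f"{relation}({self._qualified(subject)}, {self._qualified(obj)}, -)"
        )

    def to_provn(self) -> str:
        """Wrap the prefix declaration and statements in a PROV-N document."""
        body = [f"  prefix {self.prefix} <{self.namespace}>"]
        body += [f"  {statement}" for statement in self.statements]
        return "\n".join(["document", *body, "endDocument"])


doc = ProvSketch()
doc.declare("entity", "genome")      # input file from the quickstart
doc.declare("entity", "blastout")    # output file from the quickstart
doc.declare("activity", "blastn")    # the program run
doc.declare("agent", "user")         # hypothetical person running it
doc.relate("used", "blastn", "genome")
doc.relate("wasGeneratedBy", "blastout", "blastn")
doc.relate("wasAssociatedWith", "blastn", "user")
print(doc.to_provn())
```

Files map to entities, program runs to activities, and users to agents. The three relations say which file a run consumed, which file it produced, and who was responsible for it.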

BioProv also has a command-line application to run preset workflows.

```
$ bioprov -h
usage: bioprov [-h] [--show_config | --show_db | --clear_db | -v | -l]
               {genome_annotation,blastn,kaiju} ...

BioProv command-line application. Choose a command to begin.

optional arguments:
  -h, --help     show this help message and exit
  --show_config  Show location of config file.
  --show_db      Show location of database file.
  --clear_db     Clears all records in database.
  -v, --version  Show BioProv version
  -l, --list     List Projects in the BioProv database.

workflows:
  {genome_annotation,blastn,kaiju}
```

BioProv is built with the Biopython and Pandas libraries.

You can import data into BioProv using Pandas objects.

```py
import bioprov as bp
import pandas as pd

# Read csv straight into BioProv
samples = bp.read_csv("mydataframe.tsv", sep="\t", sequencefile_cols="assembly")

# Alternatively, use a pandas DataFrame
df = pd.read_csv("mydataframe.tsv", sep="\t")

# [...] manipulate your df
df["assembly"] = "assembly_directory/" + df["assembly"]

# Now load from your df
project = bp.from_df(df, sequencefile_cols="assembly", source_file="mydataframe.tsv")

# project is a dict-like Project object
sample1 = project['sample1']

# You can also export your samples and their associated files and attributes as a dataframe
project.to_csv()
```
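The calls above suggest a table with one row per sample and an `assembly` column that holds a sequence file path. The sketch below builds such a table with the standard library only and applies the same path-prefixing step in plain Python. The `sample-id` column name, the two sample rows and the helper functions are assumptions made for illustration; the exact layout BioProv expects should be checked against its documentation.

```python
import csv
import tempfile
from pathlib import Path

COLUMNS = ["sample-id", "assembly"]  # "sample-id" is a hypothetical column name


def write_table(path, rows):
    """Write one row per sample as a tab-separated file with a header."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_table(path):
    """Read the table back as a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def prefix_assemblies(rows, directory):
    """Return new rows with `directory` prepended to every assembly path."""
    return [{**row, "assembly": f"{directory}/{row['assembly']}"} for row in rows]


# Hypothetical sample rows
rows = [
    {"sample-id": "sample1", "assembly": "sample1.fasta"},
    {"sample-id": "sample2", "assembly": "sample2.fasta"},
]

with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "mydataframe.tsv"
    write_table(table, prefix_assemblies(rows, "assembly_directory"))
    loaded = read_table(table)

print(loaded[0]["assembly"])
```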

Installation

```sh
# Install from pip
$ pip install bioprov

# Install from conda
$ conda install -c conda-forge -c bioconda bioprov

# Install from source
$ git clone https://github.com/vinisalazar/bioprov && cd bioprov; # download
$ conda env create -f environment.yaml && conda activate bioprov; # install dependencies
$ pip install . && pytest; # install and test
```

Important! Prodigal must be installed to run the BioProv test suite; otherwise the tests will fail.
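Before running `pytest`, it can help to confirm that the Prodigal executable is reachable. The sketch below uses only the standard library and assumes the executable is named `prodigal` and is found through `PATH`. The helper name `find_executable` is hypothetical.

```python
import shutil


def find_executable(name: str = "prodigal"):
    """Return the full path to `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("prodigal not found on PATH; install it before running the tests")
    else:
        print(f"prodigal found at {path}")
```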

Contributions are welcome!

BioProv is in active development and no warranties are provided (please see the License).

Dependencies

BioProv requires the following dependencies to run. Also see the setup and environment files.

  • biopython
  • coolname
  • coveralls
  • dataclasses
  • pandas
  • prodigal
  • prov
  • provstore-api
  • pydot
  • pytest
  • pytest-cov
  • tqdm
  • tinydb
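One quick way to see which of these packages are present in the current environment is to query installed distribution metadata. The sketch below assumes Python 3.8 or later, where `importlib.metadata` is available. It skips `prodigal`, which is an external executable and not a Python distribution, and `dataclasses`, which is part of the standard library from Python 3.7. The helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the dependency list above.
DISTRIBUTIONS = [
    "biopython", "coolname", "coveralls", "pandas", "prov", "provstore-api",
    "pydot", "pytest", "pytest-cov", "tinydb", "tqdm",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, installed in installed_versions(DISTRIBUTIONS).items():
        print(f"{name:15s} {installed or 'MISSING'}")
```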

Owner

  • Name: Vini Salazar
  • Login: vinisalazar
  • Kind: user
  • Location: Melbourne / Floripa

Marine microbiology and open source bioinformatics

JOSS Publication

BioProv - A provenance library for bioinformatics workflows
Published
November 09, 2021
Volume 6, Issue 67, Page 3622
Authors
Vinícius W. Salazar ORCID
Department of Systems and Computer Engineering, COPPE, Federal University of Rio de Janeiro; Institute of Biology, Federal University of Rio de Janeiro
João Vitor Ferreira Cavalcante ORCID
Bioinformatics Multidisciplinary Environment - BioME, IMD, Federal University of Rio Grande do Norte
Daniel de Oliveira ORCID
Institute of Computing, Fluminense Federal University
Fabiano Thompson ORCID
Institute of Biology, Federal University of Rio de Janeiro
Marta Mattoso ORCID
Department of Systems and Computer Engineering, COPPE, Federal University of Rio de Janeiro
Editor
Jacob Schreiber ORCID
Tags
W3C-PROV BioPython pipelines reproducibility PROV JSON

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 733
  • Total Committers: 2
  • Avg Commits per committer: 366.5
  • Development Distribution Score (DDS): 0.065
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name         | Email         | Commits |
|--------------|---------------|---------|
| Vini Salazar | v****s@g****m | 685     |
| jvfe         | j****v@g****m | 48      |
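The all-time figures above can be reproduced from the two commit counts. The sketch assumes the common definition of the Development Distribution Score, one minus the top committer's share of all commits. This page does not state the formula, so treat it as an assumption. The function name is hypothetical.

```python
def development_distribution_score(commit_counts):
    """Return 1 - (top committer's commits / total commits); 0.0 if no commits."""
    total = sum(commit_counts)
    if total == 0:
        return 0.0
    return 1 - max(commit_counts) / total


commits = {"Vini Salazar": 685, "jvfe": 48}
total_commits = sum(commits.values())
average_commits = total_commits / len(commits)
dds = development_distribution_score(list(commits.values()))

print(total_commits, average_commits, round(dds, 3))
```

Under this definition a score near 0 means one committer accounts for almost all commits, which matches the 685 to 48 split in the table.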

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 8
  • Total pull requests: 44
  • Average time to close issues: 4 days
  • Average time to close pull requests: 4 days
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 2.63
  • Average comments per pull request: 0.59
  • Merged pull requests: 43
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jvfe (4)
  • vinisalazar (3)
  • maximtrp (1)
Pull Request Authors
  • vinisalazar (23)
  • jvfe (21)
Top Labels
Issue Labels
  • enhancement (2)
  • good first issue (2)
Pull Request Labels
  • hacktoberfest-accepted (9)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 210 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 26
  • Total maintainers: 1
pypi.org: bioprov

BioProv - Provenance capture for bioinformatics workflows

  • Versions: 26
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 210 Last month
Rankings
Dependent packages count: 10.0%
Stargazers count: 15.6%
Forks count: 19.1%
Average: 19.5%
Dependent repos count: 21.7%
Downloads: 31.0%
Maintainers (1)
Last synced: 4 months ago

Dependencies

environment.yml conda
  • biopython
  • coolname
  • coveralls
  • dataclasses
  • pandas
  • prodigal
  • prov
  • provstore-api
  • pydot
  • pytest
  • pytest-cov
  • tinydb
  • tqdm
docs/requirements.txt pypi
  • myst_parser *
  • sphinx_rtd_theme *
setup.py pypi
  • biopython *
  • coolname *
  • coveralls *
  • dataclasses *
  • pandas *
  • prov *
  • provstore-api *
  • pydot *
  • pytest *
  • pytest-cov *
  • tinydb *
  • tqdm *