https://github.com/clinical-genomics/housekeeper

File data orchestrator

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary

Keywords

data file orchestrator

Keywords from Contributors

bioconda genomics coverage sambamba bioinformatics
Last synced: 5 months ago

Repository

File data orchestrator

Basic Info
  • Host: GitHub
  • Owner: Clinical-Genomics
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 1.61 MB
Statistics
  • Stars: 2
  • Watchers: 10
  • Forks: 0
  • Open Issues: 2
  • Releases: 79
Topics
data file orchestrator
Created over 9 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog License Codeowners

README.md

Housekeeper

Store, tag, fetch, and archive files with ease 🗃

Housekeeper is a tool that aims to provide:

  • a backend for storing versioned bundles of files
  • different interfaces (Python, CLI, REST) for fetching files based on tags
  • a way to backup and retrieve bundles from long-term storage
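
To make the idea of versioned bundles and tag-based fetching concrete, here is a minimal in-memory sketch. It is not Housekeeper's actual API: the class names `Bundle`, `Version`, and `File`, and the helper `files_with_tags`, are hypothetical and only illustrate the concepts listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Set


@dataclass
class File:
    path: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class Version:
    created_at: datetime
    files: List[File] = field(default_factory=list)


@dataclass
class Bundle:
    name: str
    versions: List[Version] = field(default_factory=list)

    def latest(self) -> Version:
        """Return the most recently created version of the bundle."""
        return max(self.versions, key=lambda version: version.created_at)


def files_with_tags(version: Version, tags: Set[str]) -> List[File]:
    """Return the files in `version` that carry every tag in `tags`."""
    return [file for file in version.files if tags <= file.tags]


# Hypothetical example data: one bundle with a single version and two files.
bundle = Bundle("sillyfish")
bundle.versions.append(
    Version(
        created_at=datetime(2017, 6, 15),
        files=[
            File("run1_R1.fastq.gz", {"fastq", "H0HKKALXX"}),
            File("sample.bam", {"bam"}),
        ],
    )
)
matches = files_with_tags(bundle.latest(), {"fastq", "H0HKKALXX"})
print([file.path for file in matches])
```

A file matches only when it carries all requested tags, so adding tags narrows the selection.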

Installation

Housekeeper is written in Python 3.6+ and is available on the Python Package Index (PyPI).

```bash
poetry install
```

If you would like to install the latest development version:

```bash
git clone https://github.com/Clinical-Genomics/housekeeper
cd housekeeper
poetry install
```

Contributing

Housekeeper uses the GitHub flow branching model, as described in our development manual.

Documentation

Command line interface

Config file

Housekeeper supports a basic YAML config. The following options are supported:

```yaml
database: mysql+pymysql://userName:passWord@domain.com/database
root: /path/to/root/dir
```

The root option is used to store files within the Housekeeper context.
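
The `database` value follows the SQLAlchemy URL convention `dialect+driver://user:password@host/database`. As a rough illustration of how the pieces break down, the standard library can split the example URL; this sketch is only for inspecting the string and is not how Housekeeper itself reads its config.

```python
from urllib.parse import urlsplit

# The example value from the config file above.
url = urlsplit("mysql+pymysql://userName:passWord@domain.com/database")

# The scheme holds the SQL dialect and the Python driver, joined by "+".
dialect, _, driver = url.scheme.partition("+")

print("dialect: ", dialect)
print("driver:  ", driver)
print("user:    ", url.username)
print("host:    ", url.hostname)
print("database:", url.path.lstrip("/"))
```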

Command: init

Set up (or reset) the database. This creates all the tables in the database. You can reset an existing database by using the --reset option.

```bash
housekeeper --database "sqlite:///hk.sqlite3" init
Success! New tables: bundle, file, file_tag_link, tag, version
```
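
The behaviour of `init` can be approximated with the standard `sqlite3` module. In the sketch below only the five table names are taken from the output above; every column, the `SCHEMA` mapping, and the `init_db` helper are assumptions made for illustration and do not reflect Housekeeper's real schema.

```python
import sqlite3

# Table names come from the `init` output; all columns are guesses.
SCHEMA = {
    "bundle": "id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL",
    "version": "id INTEGER PRIMARY KEY, bundle_id INTEGER REFERENCES bundle(id), created_at TEXT",
    "file": "id INTEGER PRIMARY KEY, version_id INTEGER REFERENCES version(id), path TEXT NOT NULL",
    "tag": "id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL",
    "file_tag_link": "file_id INTEGER REFERENCES file(id), tag_id INTEGER REFERENCES tag(id)",
}


def init_db(conn, reset=False):
    """Create the tables; with reset=True, drop any existing ones first."""
    if reset:
        for name in SCHEMA:
            conn.execute(f"DROP TABLE IF EXISTS {name}")
    for name, columns in SCHEMA.items():
        conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({columns})")
    conn.commit()
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
print("New tables:", ", ".join(init_db(conn)))
```

Calling `init_db(conn, reset=True)` on a populated database recreates the same tables empty, which mirrors what the --reset option is described as doing.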

Command: include

Include (hard-link) all files of an existing bundle version into Housekeeper and the root path.

```bash
housekeeper include myBundle
```

This only works if the bundle has a single version that can be "imported". To import a specific version of a bundle, use the --version option.
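
Hard-linking means the file under the root path shares its data with the original instead of being copied. The sketch below shows the mechanism with `os.link`. The helper `include_file` and the `root/bundle/version` directory layout are assumptions for illustration, not Housekeeper's actual implementation, and hard links require source and target to be on the same filesystem.

```python
import os
import tempfile
from pathlib import Path


def include_file(source: Path, root: Path, bundle: str, version: str) -> Path:
    """Hard-link `source` into root/bundle/version and return the new path."""
    target_dir = root / bundle / version  # hypothetical layout
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    os.link(source, target)  # second name for the same data, nothing is copied
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "sample.bam"
    source.write_text("data")
    linked = include_file(source, Path(tmp) / "root", "myBundle", "2017-06-15")
    # Both paths now refer to the same underlying file.
    print(os.path.samefile(source, linked))
```

Because both names point at the same data, removing the original path later leaves the copy under the root path intact.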

Command: delete files

Delete files that are no longer on disk like this: `housekeeper delete files --tag fastq --notondisk`

Remove all bam files before a certain date: `housekeeper delete files --tag bam --before 2017-06-15`

Remove fastq files from a flowcell: `housekeeper delete files --tag fastq --tag H0HKKALXX`

It will always ask for confirmation unless you add --yes: `housekeeper delete files --bundle sillyfish --yes`

If you provide neither --tag nor --bundle, which would amount to deleting everything, the command will refuse to run.
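
The selection rules described above can be sketched as a plain filter function. This is an illustrative reading of the documented options, not Housekeeper's code: `FileRecord` and `select_for_deletion` are hypothetical names, and treating repeated --tag values as "file must have all of them" is an assumption based on the flowcell example.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import FrozenSet


@dataclass(frozen=True)
class FileRecord:
    path: str
    bundle: str
    created_at: datetime
    tags: FrozenSet[str]


def select_for_deletion(records, tags=(), bundle=None, before=None, notondisk=False):
    """Return the records matching every given filter."""
    # Mirrors the safeguard: no tag and no bundle would select everything.
    if not tags and bundle is None:
        raise ValueError("refusing to select everything: give a tag or a bundle")
    wanted = set(tags)
    selected = []
    for record in records:
        if not wanted <= record.tags:
            continue  # missing at least one requested tag
        if bundle is not None and record.bundle != bundle:
            continue
        if before is not None and record.created_at >= before:
            continue  # not older than the cutoff date
        if notondisk and Path(record.path).exists():
            continue  # still on disk, so keep it
        selected.append(record)
    return selected


# Hypothetical records; the paths are not expected to exist.
records = [
    FileRecord("/nonexistent/a.fastq.gz", "sillyfish", datetime(2017, 1, 1),
               frozenset({"fastq", "H0HKKALXX"})),
    FileRecord("/nonexistent/b.bam", "sillyfish", datetime(2017, 7, 1),
               frozenset({"bam"})),
]
picked = select_for_deletion(records, tags=["fastq", "H0HKKALXX"])
print([record.path for record in picked])
```

A real command would additionally prompt for confirmation before deleting anything unless --yes is given; that step is left out of the sketch.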

Owner

  • Name: Clinical Genomics
  • Login: Clinical-Genomics
  • Kind: organization
  • Location: Stockholm, Sweden

GitHub Events

Total
  • Create event: 10
  • Release event: 5
  • Issues event: 4
  • Delete event: 5
  • Issue comment event: 32
  • Push event: 22
  • Pull request review comment event: 2
  • Pull request review event: 6
  • Pull request event: 11
Last Year
  • Create event: 10
  • Release event: 5
  • Issues event: 4
  • Delete event: 5
  • Issue comment event: 32
  • Push event: 22
  • Pull request review comment event: 2
  • Pull request review event: 6
  • Pull request event: 11

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 472
  • Total Committers: 16
  • Avg Commits per committer: 29.5
  • Development Distribution Score (DDS): 0.286
Top Committers
Name Email Commits
Robin Andeer r****r@g****m 337
Kenny Billiau K****u@s****e 60
Måns Magnusson m****n@s****e 22
Kenny Billiau k****u@s****e 16
Patrik Grenfeldt p****t@s****e 8
Barry Stokman b****n@s****e 6
Clinical Genomics Bot c****m@g****m 6
Sebastian Diaz j****a@s****e 3
Henrik Stranneheim h****m@s****e 3
Sebastian Allard s****d@s****e 3
barrystokman b****n@g****m 2
Barry Stokman 2****n@u****m 2
hiseq clinical h****l@c****e 1
Sebastian Allard s****s@g****m 1
Vincent Janvid v****d@s****e 1
Mikael Laaksonen m****n@s****e 1
Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 61
  • Total pull requests: 130
  • Average time to close issues: 12 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 23
  • Total pull request authors: 15
  • Average comments per issue: 2.3
  • Average comments per pull request: 2.38
  • Merged pull requests: 109
  • Bot issues: 0
  • Bot pull requests: 4
Past Year
  • Issues: 0
  • Pull requests: 13
  • Average time to close issues: N/A
  • Average time to close pull requests: 4 days
  • Issue authors: 0
  • Pull request authors: 4
  • Average comments per issue: 0
  • Average comments per pull request: 2.69
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • seallard (14)
  • moonso (7)
  • diitaz93 (5)
  • robinandeer (4)
  • Vince-janv (3)
  • emiliaol (3)
  • islean (3)
  • henrikstranneheim (2)
  • barrystokman (2)
  • emmser (2)
  • ChrOertlin (2)
  • annaengstrom (2)
  • Mropat (2)
  • moahaegglund (2)
  • karlnyr (1)
Pull Request Authors
  • henrikstranneheim (40)
  • seallard (37)
  • islean (16)
  • moonso (11)
  • Vince-janv (10)
  • ingkebil (9)
  • patrikgrenfeldt (5)
  • diitaz93 (4)
  • dependabot[bot] (3)
  • ChrOertlin (2)
  • mikaell (2)
  • pbiology (1)
  • Mropat (1)
  • beatrizsavinhas (1)
  • barrystokman (1)
Top Labels
Issue Labels
  • Bug (10)
  • Effort S (9)
  • Gain S (9)
  • Enhancement (9)
  • Urgency S (4)
  • Effort M (3)
  • Needs Refinement (3)
  • Refactoring (3)
  • Gain L (2)
  • Good first issue (2)
  • Urgency M (1)
  • on hold (1)
  • Gain M (1)
Pull Request Labels
  • Enhancement (13)
  • Project Task (6)
  • Effort S (1)
  • Gain S (1)
  • Urgency S (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 1,826 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 2
  • Total versions: 108
  • Total maintainers: 4
pypi.org: housekeeper

File data orchestrator

  • Versions: 108
  • Dependent Packages: 1
  • Dependent Repositories: 2
  • Downloads: 1,826 Last month
Rankings
Dependent packages count: 4.7%
Downloads: 4.9%
Dependent repos count: 11.6%
Average: 15.8%
Stargazers count: 27.8%
Forks count: 29.8%
Maintainers (4)
Last synced: 6 months ago

Dependencies

requirements-dev.txt pypi
  • pytest * development
  • pytest-mock * development
requirements.txt pypi
  • Alchy *
  • Click <7
  • SQLAlchemy *
  • coloredlogs *
  • marshmallow *
  • pyyaml *
  • rich *
.github/workflows/build_and_publish.yml actions
  • actions/checkout v2.6.0 composite
  • actions/setup-python v2 composite
  • docker/build-push-action v3.2.0 composite
  • pypa/gh-action-pypi-publish master composite
.github/workflows/coveralls.yml actions
  • actions/checkout v2.6.0 composite
  • actions/setup-python v4.3.1 composite
.github/workflows/pythonapp.yml actions
  • actions/checkout v2.6.0 composite
  • actions/setup-python v4.3.1 composite
  • samuelmeuli/lint-action v1 composite
Dockerfile docker
  • python 3.7-slim build