Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: molinerisLab
- Language: Python
- Default Branch: main
- Size: 408 KB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 2
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Data Analysis Project Management
_____________________________________
DAP is a project management tool suited to bioinformatics, or to data analysis projects in general, with a focus on Snakemake workflows.
DAP encapsulates some conventions on how to organize Snakemake projects and offers a set of tools to manage them.
DAP conventions aim to:
* Facilitate project versioning:
  * creation and editing of versions;
  * isolation of version-specific logic and configuration.
* Push toward sustainable use of conda environments:
  * one project <--> one conda environment;
  * the environment is stored locally inside the project tree;
  * the environment is automatically activated when the user enters the project directory.
Usage
Installation
Linux and Mac OSX are supported.
Install dap with conda
DAP can be installed from Anaconda (https://anaconda.org/molinerislab/dap) with the following command:
conda install -c molinerislab dap
We advise installing it in the base environment or in a dedicated one. This environment is needed to create new projects.
Create a new project
dap create creates a new project in the current working directory. It creates:
* A project directory structure.
* A set of template Snakefiles and config files.
* A local conda environment.
* A git repository.
dap create [--source_env=MyEnv.yaml] [--remote-git-repo=https://..] ProjectName ProjectVersion
- ProjectName: the name of the project, also used for the project directory and its git repository.
- ProjectVersion: initial version of the project.
- [--source_env=MyEnv.yaml]: optional, YAML file of an existing conda environment, which is cloned into the project environment. If not specified, a new, empty project environment is created.
- [--remote-git-repo=https://..]: optional, URL of a remote repository to which the project's git repository is connected.
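When project creation is scripted (for example from a Python setup script), the usage line maps directly onto an argument list. A minimal sketch, assuming only the flags shown in the usage line; the helper name build_dap_create_argv is hypothetical and not part of DAP:

```python
from typing import List, Optional


def build_dap_create_argv(
    project_name: str,
    project_version: str,
    source_env: Optional[str] = None,
    remote_git_repo: Optional[str] = None,
) -> List[str]:
    """Assemble the argv for `dap create` (hypothetical helper, not a DAP API)."""
    argv = ["dap", "create"]
    # Optional flags go before the positional arguments, as in the usage line.
    if source_env is not None:
        argv.append(f"--source_env={source_env}")
    if remote_git_repo is not None:
        argv.append(f"--remote-git-repo={remote_git_repo}")
    # Positional arguments: ProjectName, then ProjectVersion.
    argv += [project_name, project_version]
    return argv
```

The list can be passed to subprocess.run(..., check=True) from the directory in which the project should be created, since dap create works in the current working directory.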
Work inside a project
Once the project is created with dap create, direnv is used to automatically set up the workspace context upon user entering the project directory.
* Upon first entering the project directory, the user needs to authorize direnv with direnv allow.
* Once direnv is allowed, every time the user enters the project:
* PRJ_ROOT will point to the root of the project.
* The system PATH will include PRJ_ROOT/workflow/scripts.
* The project conda environment is activated.
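The first two effects can be approximated as a transformation of the shell environment. A minimal sketch, assuming the scripts directory is prepended to PATH (DAP's actual .envrc may differ); the helper project_environment is hypothetical, and conda activation is not modeled:

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def project_environment(prj_root: str, base_env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of base_env with PRJ_ROOT set and the scripts dir on PATH.

    Hypothetical helper approximating the direnv setup; it does not
    activate the project conda environment.
    """
    env = dict(base_env)
    root = Path(prj_root).resolve()
    env["PRJ_ROOT"] = str(root)
    scripts = str(root / "workflow" / "scripts")
    old_path = env.get("PATH", "")
    # Assumption: prepend, so project scripts shadow same-named commands on PATH.
    env["PATH"] = scripts + os.pathsep + old_path if old_path else scripts
    return env
```

Because scripts in PRJ_ROOT/workflow/scripts end up on PATH, Snakemake rules can call them by name instead of by relative path.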
Create new version
dap clone creates a new version of the project by cloning an existing one. It must be executed inside the project directory.
dap clone [--link-All-Data] SourceVersion NewVersion
* SourceVersion: Name of the version to be cloned.
* NewVersion: Name of the new version.
* [--link-All-Data]: if enabled, links to files outside the project are copied as well, e.g. links to datasets.
The command creates a new directory workspaces/{NewVersion}. There, for each link inside workspaces/{SourceVersion}:
* If the link refers to a non-version-specific file, the link is copied and the original file is not changed.
* If the link refers to a version-specific file, the file is copied under an updated name, and a link to the new file is created.
By convention, the names of version-specific files end with _{VersionName}.
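The link-handling rule can be sketched in a few lines of Python. This is an illustration of the convention, not DAP's code: it assumes flat version names without dots or '/', creates absolute symlinks, treats every link as if --link-All-Data were enabled, and uses the hypothetical helpers renamed_for_version and clone_version:

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def renamed_for_version(name: str, old: str, new: str) -> Optional[str]:
    """Return the name for version `new` if `name` is specific to `old`, else None."""
    tag = f"_{old}"
    if name.endswith(tag):  # e.g. Snakefile_versioned_v1
        return name[: -len(tag)] + f"_{new}"
    stem, dot, ext = name.rpartition(".")
    if dot and stem.endswith(tag):  # e.g. config_v1.yaml
        return stem[: -len(tag)] + f"_{new}." + ext
    return None  # not version-specific, e.g. config_global.yaml


def clone_version(workspaces: Path, old: str, new: str) -> None:
    """Sketch of the `dap clone` link rule (hypothetical helper)."""
    src_dir, dst_dir = workspaces / old, workspaces / new
    dst_dir.mkdir(parents=True)
    for link in src_dir.iterdir():
        if not link.is_symlink():
            continue  # this sketch only handles symbolic links
        target = Path(os.path.normpath(link.parent / os.readlink(link)))
        new_name = renamed_for_version(target.name, old, new)
        if new_name is None:
            # Shared file: the new link points at the same original file.
            os.symlink(target, dst_dir / link.name)
        else:
            # Version-specific file: copy it under the new name and link the copy.
            new_target = target.with_name(new_name)
            shutil.copy2(target, new_target)
            os.symlink(new_target, dst_dir / link.name)
```

The naming convention is what makes the rule mechanical: whether a file is shared or version-specific can be decided from its name alone.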
Keep conda environments updated for git users
DAP projects use a locally stored conda environment, which can be found at workflow/env/env.
The local environment is not included in the git repository by default, but it can be reconstructed from the env.yaml file. This can cause reproducibility issues: as the user develops the pipeline, new packages are installed, and if these packages are not carefully tracked, new users cloning the repository will find a broken pipeline.
To help manage environments, DAP offers two commands:
* dap export-env: exports current environment into an env.yaml file, which is included in the git repository.
* dap build-env: builds the environment from the env.yaml file.
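The failure mode described above is a package that is installed locally but missing from env.yaml. A minimal sketch of such a drift check, assuming conda-style dependency strings (optionally with a channel:: prefix) have already been read from env.yaml; the helpers package_name and untracked_packages are hypothetical and not part of DAP:

```python
from typing import Iterable, Set


def package_name(spec: str) -> str:
    """Extract the name from a conda spec such as 'numpy=1.26.4=py311h' or 'conda-forge::pandas>=2'."""
    if "::" in spec:
        spec = spec.split("::", 1)[1]  # drop the channel prefix
    for i, ch in enumerate(spec):
        if ch in "=<>! ":
            return spec[:i].strip()  # cut at the first version operator
    return spec.strip()


def untracked_packages(exported_specs: Iterable[str], installed: Iterable[str]) -> Set[str]:
    """Names installed in the local environment but absent from the exported specs."""
    tracked = {package_name(s) for s in exported_specs}
    return set(installed) - tracked
```

The installed names could come, for instance, from conda list --json -p workflow/env/env; a non-empty result suggests running dap export-env before committing.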
Make tests
To push toward sustainable workflow definitions, DAP encourages users to build tests for their versions.
Tests have two objectives: to validate the pipeline and to serve as examples for new users.
Each test should have these characteristics:
* It runs with a single command.
* It requires no configuration before running.
* It is self-contained: there is no need to provide data.
* It is as fast and lightweight as possible.
DAP provides the following command:
dap make-test <template_version> <test_name>
This command creates a test using a given version as a template. It works similarly to dap clone but with some differences:
* Links to files outside the workflow are not allowed. Tests should be ready to run on different machines after git clone.
* Files are copied: while standard version cloning does not copy files, tests are self-contained, so the pipeline's input files are copied to the test directory.
Warning: files inside the tests directory are included in the git repository. It is recommended to set up tests with small input files.
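The two differences from dap clone can be sketched as follows. This is not DAP's implementation: the helper make_test_sketch is hypothetical, and it assumes that links into the workflow directory are kept as relative links (so they survive a git clone) while regular files are copied:

```python
import os
import shutil
from pathlib import Path


def _is_within(path: Path, root: Path) -> bool:
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def make_test_sketch(version_dir: Path, test_dir: Path, workflow_dir: Path) -> None:
    """Build a self-contained test directory from a version (hypothetical helper)."""
    workflow_dir = workflow_dir.resolve()
    entries = sorted(version_dir.iterdir())
    # Validate first, so nothing is written if any link points outside the workflow.
    for entry in entries:
        if entry.is_symlink() and not _is_within(entry.resolve(), workflow_dir):
            raise ValueError(f"{entry.name} links outside the workflow: {entry.resolve()}")
    test_dir.mkdir(parents=True)
    test_root = test_dir.resolve()
    for entry in entries:
        dest = test_dir / entry.name
        if entry.is_symlink():
            # Relative link, so it still resolves after cloning on another machine.
            os.symlink(os.path.relpath(entry.resolve(), test_root), dest)
        elif entry.is_dir():
            shutil.copytree(entry, dest)  # input directories are copied
        else:
            shutil.copy2(entry, dest)  # input files are copied
```

Validating before writing anything keeps a rejected test from leaving a half-built directory inside the git-tracked tests tree.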
Dap clone and sub-versions
SourceVersion and NewVersion may refer to subfolders inside the dataset/ directory, using the usual '/' syntax. For example, a version can be named humans/v1. In this case the following operations are allowed:
* dap clone humans/v1 humans/v2 --> clones the humans/v1 version into humans/v2.
* dap clone humans/v1 baldmonkeys/v1 --> clones the humans/v1 version into baldmonkeys/v1. If the baldmonkeys directory does not exist, it is created.
* dap clone humans baldmonkeys --> clones the entire humans directory into the new directory. Every version inside the humans directory is cloned.
These operations are not allowed:
* dap clone humans humans/v3 --> a directory cannot be cloned into one of its own subdirectories.
* dap clone humans/v1 humans --> a directory cannot be cloned into its parent (or any ancestor) directory.
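Both forbidden cases reduce to a single ancestry test between the two paths. A minimal sketch of that check using pathlib; the helper check_clone is hypothetical, and DAP may perform additional checks not shown here:

```python
from pathlib import PurePosixPath


def check_clone(source: str, new: str) -> None:
    """Raise ValueError for the forbidden sub-version clone cases (hypothetical helper)."""
    src, dst = PurePosixPath(source), PurePosixPath(new)
    if src in dst.parents:
        # e.g. humans -> humans/v3: target lies inside the source directory.
        raise ValueError(f"cannot clone {src} into its own subdirectory {dst}")
    if dst in src.parents:
        # e.g. humans/v1 -> humans: target is an ancestor of the source.
        raise ValueError(f"cannot clone {src} into its ancestor {dst}")
```

Comparing whole path components via .parents, rather than string prefixes, avoids treating humans2 as a subdirectory of humans.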
The DAP tree
The directory structure of a DAP project is made of two main components in its root:
* The workflow directory, containing the entire project's logic and configuration, both global and version-specific. It is not where the user stores input files and results, nor where the user works.
* The results directory, where versions are kept and where the user running the workflow works.

The workflow directory has sub-directories for the configuration files, environment, rules and scripts.
The results directory has sub-directories for all the versions created. Inside each version:
* Snakefile is a symbolic link to workflows/rules/Snakefile.
* Snakefile_versioned.sk is a symbolic link to workflows/rules/Snakefile_versioned_{VERSION_NAME}.
* config_global.yaml is a symbolic link to workflows/config/config_global.yaml.
* config.yaml is a symbolic link to workflows/config/config_{VERSION_NAME}.yaml.
In short, for rules and configuration, the user finds both the global and the version-specific files inside the version's directory. These files are links to files stored in the workflow directory; the original files are managed by DAP.
Version-specific rules and configurations always override global ones.
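For configuration, "override" can be read as a recursive merge in which version-specific values win. A minimal sketch of those semantics on already-loaded YAML dictionaries; merge_config is a hypothetical helper illustrating the precedence rule, not DAP's or Snakemake's actual mechanism:

```python
from typing import Any, Dict, Mapping


def merge_config(global_cfg: Mapping[str, Any], version_cfg: Mapping[str, Any]) -> Dict[str, Any]:
    """Return global_cfg overridden by version_cfg; inputs are not modified."""
    merged: Dict[str, Any] = dict(global_cfg)
    for key, value in version_cfg.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Version-specific value replaces the global one.
            merged[key] = value
    return merged
```

With a recursive merge, a version only needs to state the keys it changes; everything else is inherited from config_global.yaml.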
Convert from old dap projects
The directory structure of DAP has been updated, renaming some directories.
To work with projects created by earlier versions of DAP, the dap convert command is offered. It updates the directory structure and the symbolic links inside the project.
Owner
- Name: molinerisLab
- Login: molinerisLab
- Kind: organization
- Location: Italy
- Repositories: 2
- Profile: https://github.com/molinerisLab
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: DAP
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Marco
    family-names: Masera
    email: marco.masera@unito.it
    affiliation: Università di Torino
  - given-names: Chiara
    family-names: Cicconetti
    email: chiara.cicconetti@unito.it
    affiliation: Università di Torino
  - given-names: Ivan
    family-names: Molineris
    email: ivan.molineris@unito.it
    affiliation: Università di Torino
    orcid: 'https://orcid.org/0000-0003-2102-0804'
repository-code: 'https://github.com/molinerisLab/dap'
license: AGPL-3.0
GitHub Events
Total
- Watch event: 1
- Push event: 18
- Create event: 1
Last Year
- Watch event: 1
- Push event: 18
- Create event: 1