https://github.com/clavellab/maldipickr_manuscript

Worfklow for the data analysis in the maldipickr manuscript

https://github.com/clavellab/maldipickr_manuscript

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary
Last synced: 6 months ago · JSON representation

Repository

Worfklow for the data analysis in the maldipickr manuscript

Basic Info
  • Host: GitHub
  • Owner: ClavelLab
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Size: 62.5 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 8 months ago · Last pushed 8 months ago
Metadata Files
Readme License

README.md

workflow for maldipickr manuscript

This repository contains the code for the comparison of dereplication approaches as part of the manuscript describing the R package maldipickr (CRAN) using data at DOI.

Setup the environment for the workflow

These analyses were conducted in R 4.3.1 and Python 3.9.13 orchestrated from Rstudio. We recommend setting up R and specific versions using rig, and getting Rstudio from Posit. We also use renv for reproducible environment, which can be installed in R with install.packages("renv") and uv as a very fast Python package manager (as one of the tool in the benchmark uses Python).

  1. Make sure you have installed uv
  2. Open Rstudio and create a new project via "File > New Project..."
  3. Select "Version Control" and then "Git"
    1. Type https://github.com/ClavelLab/maldipickr_manuscript in Repository URL.
    2. Make sure the project is going to be created in the correct subdirectory on your computer, or else edit accordingly
    3. Click on "Create project"

If you comfortable with the command line and git, clone the repository either with SSH or HTTPS in a suitable location.

  1. Rstudio warns you that One or more packages recorded in the lockfile are not installed because a couple of R packages and dependencies are needed.
    1. Install the dependencies by typing renv::restore() in the Console and agree to the installation of the packages.
    2. Check that all dependencies are set by typing renv::status() in the Console where you should have No issues found

Run the workflow

Our analysis workflow is orchestrated by targets and can be run with the following command in the R console:

r targets::tar_make()

Overview of the workflow

This is the dependency graph of the different objects (i.e., targets) generated in R during the workflow.

mermaid graph TD x09f7694bb41893f8(["spede_code"]):::uptodate --> x7bc2998b7f8060b7(["spede_SPeDE_50<br>SPeDE 50"]):::uptodate xd5f24f93e9b6cd94(["spede_peaks"]):::uptodate --> x7bc2998b7f8060b7(["spede_SPeDE_50<br>SPeDE 50"]):::uptodate x4ab60fcd8b424f69(["spede_regrids"]):::uptodate --> x7bc2998b7f8060b7(["spede_SPeDE_50<br>SPeDE 50"]):::uptodate xf9a08efad4262897(["plot_dereplication"]):::uptodate --> x1b66e2a14a4c2014(["plot_dereplication_file"]):::uptodate x2cb78647d2229b81(["raw_biotyper_report_Biotyper<br>Biotyper"]):::uptodate --> x680a30458e74d9f2(["biotyper_report_Biotyper<br>Biotyper"]):::uptodate x7959b9683ac10b2a(["processed"]):::uptodate --> x92b7df9a3a4d38d2(["fm_interpolated"]):::uptodate x09f7694bb41893f8(["spede_code"]):::uptodate --> xae7b7dec01a37aa8(["spede_SPeDE_20<br>SPeDE 20"]):::uptodate xd5f24f93e9b6cd94(["spede_peaks"]):::uptodate --> xae7b7dec01a37aa8(["spede_SPeDE_20<br>SPeDE 20"]):::uptodate x4ab60fcd8b424f69(["spede_regrids"]):::uptodate --> xae7b7dec01a37aa8(["spede_SPeDE_20<br>SPeDE 20"]):::uptodate x13ba1745f23420fa(["clustering_metrics"]):::uptodate --> x1cce4c61ac164fbe(["metrics_results_tableS2"]):::uptodate x1d78c1c05eb8916a(["biotyper_report_clean_Biotyper<br>Biotyper"]):::uptodate --> xf2c733ac318b10e4(["picked_Biotyper<br>Biotyper"]):::uptodate x56a09f0edff10d0b(["sim_interpolated"]):::uptodate --> x6700a2551983a10c(["df_interpolated_maldipickr_79<br>maldipickr 79"]):::uptodate xf170f0c05dad2497(["clusters_maldipickr_79<br>maldipickr 79"]):::uptodate --> xddae61cd0a8f32e2(["picked_maldipickr_79<br>maldipickr 79"]):::uptodate x84822056ace4a9ce(["all_results_clean"]):::uptodate --> x13ba1745f23420fa(["clustering_metrics"]):::uptodate x7bc2998b7f8060b7(["spede_SPeDE_50<br>SPeDE 50"]):::uptodate --> x3b4a990b8c4b1912(["import_SPeDE_50<br>SPeDE 50"]):::uptodate xa63f30966279f049(["all_results"]):::uptodate --> x84822056ace4a9ce(["all_results_clean"]):::uptodate x6ab4545b996732f6(["isolate_table"]):::uptodate --> x84822056ace4a9ce(["all_results_clean"]):::uptodate x92b7df9a3a4d38d2(["fm_interpolated"]):::uptodate --> x56a09f0edff10d0b(["sim_interpolated"]):::uptodate x7d7b20e1a4c4ab31(["spectra_raw_noempty"]):::uptodate --> x5952e22b88c5a1e3(["spede_export"]):::uptodate x680a30458e74d9f2(["biotyper_report_Biotyper<br>Biotyper"]):::uptodate --> x1d78c1c05eb8916a(["biotyper_report_clean_Biotyper<br>Biotyper"]):::uptodate x09f7694bb41893f8(["spede_code"]):::uptodate --> x4ab60fcd8b424f69(["spede_regrids"]):::uptodate x5952e22b88c5a1e3(["spede_export"]):::uptodate --> x4ab60fcd8b424f69(["spede_regrids"]):::uptodate x7d7b20e1a4c4ab31(["spectra_raw_noempty"]):::uptodate --> x7959b9683ac10b2a(["processed"]):::uptodate xf2c733ac318b10e4(["picked_Biotyper<br>Biotyper"]):::uptodate --> xe6379debfdbe70f5(["results_Biotyper<br>Biotyper"]):::uptodate xd39b83c3fc857720(["spectra_raw"]):::uptodate --> x54be4f566da53823(["checks"]):::uptodate x09f7694bb41893f8(["spede_code"]):::uptodate --> xd5f24f93e9b6cd94(["spede_peaks"]):::uptodate x5952e22b88c5a1e3(["spede_export"]):::uptodate --> xd5f24f93e9b6cd94(["spede_peaks"]):::uptodate xddae61cd0a8f32e2(["picked_maldipickr_79<br>maldipickr 79"]):::uptodate --> x2fd3b756b6877101(["results_maldipickr_79<br>maldipickr 79"]):::uptodate x9a22b5e4d6937d40(["picked_maldipickr_92<br>maldipickr 92"]):::uptodate --> x22c57e7856533937(["results_maldipickr_92<br>maldipickr 92"]):::uptodate xacab802d0132ae58(["picked_SPeDE_50<br>SPeDE 50"]):::uptodate --> x3ee9d3cb8fb6b540(["results_SPeDE_50<br>SPeDE 50"]):::uptodate x934e30aad527086b(["isolate_table_file"]):::uptodate --> x6ab4545b996732f6(["isolate_table"]):::uptodate x54be4f566da53823(["checks"]):::uptodate --> x7d7b20e1a4c4ab31(["spectra_raw_noempty"]):::uptodate xd39b83c3fc857720(["spectra_raw"]):::uptodate --> x7d7b20e1a4c4ab31(["spectra_raw_noempty"]):::uptodate xfc3540f0c3e8645a(["spede_archive"]):::uptodate --> x09f7694bb41893f8(["spede_code"]):::uptodate xc0345d3922b7bdde(["picked_SPeDE_20<br>SPeDE 20"]):::uptodate --> xf1e15e217f13b69f(["results_SPeDE_20<br>SPeDE 20"]):::uptodate x83095fc2dc6147af(["import_SPeDE_20<br>SPeDE 20"]):::uptodate --> xc0345d3922b7bdde(["picked_SPeDE_20<br>SPeDE 20"]):::uptodate x3b4a990b8c4b1912(["import_SPeDE_50<br>SPeDE 50"]):::uptodate --> xacab802d0132ae58(["picked_SPeDE_50<br>SPeDE 50"]):::uptodate x6700a2551983a10c(["df_interpolated_maldipickr_79<br>maldipickr 79"]):::uptodate --> xf170f0c05dad2497(["clusters_maldipickr_79<br>maldipickr 79"]):::uptodate x7959b9683ac10b2a(["processed"]):::uptodate --> xf170f0c05dad2497(["clusters_maldipickr_79<br>maldipickr 79"]):::uptodate x76d85119dbdc6832(["df_interpolated_maldipickr_92<br>maldipickr 92"]):::uptodate --> x78e1a6127723e5ec(["clusters_maldipickr_92<br>maldipickr 92"]):::uptodate x7959b9683ac10b2a(["processed"]):::uptodate --> x78e1a6127723e5ec(["clusters_maldipickr_92<br>maldipickr 92"]):::uptodate x136e4e85e6851637(["raw_data"]):::uptodate --> xd39b83c3fc857720(["spectra_raw"]):::uptodate xae7b7dec01a37aa8(["spede_SPeDE_20<br>SPeDE 20"]):::uptodate --> x83095fc2dc6147af(["import_SPeDE_20<br>SPeDE 20"]):::uptodate x922a991fdf2e9cd1(["raw_data_archive"]):::uptodate --> x136e4e85e6851637(["raw_data"]):::uptodate x56a09f0edff10d0b(["sim_interpolated"]):::uptodate --> x76d85119dbdc6832(["df_interpolated_maldipickr_92<br>maldipickr 92"]):::uptodate x84822056ace4a9ce(["all_results_clean"]):::uptodate --> x4150f5fc1f91eae4(["clustering_results_tableS3"]):::uptodate xe6379debfdbe70f5(["results_Biotyper<br>Biotyper"]):::uptodate --> xa63f30966279f049(["all_results"]):::uptodate x2fd3b756b6877101(["results_maldipickr_79<br>maldipickr 79"]):::uptodate --> xa63f30966279f049(["all_results"]):::uptodate x22c57e7856533937(["results_maldipickr_92<br>maldipickr 92"]):::uptodate --> xa63f30966279f049(["all_results"]):::uptodate xf1e15e217f13b69f(["results_SPeDE_20<br>SPeDE 20"]):::uptodate --> xa63f30966279f049(["all_results"]):::uptodate x3ee9d3cb8fb6b540(["results_SPeDE_50<br>SPeDE 50"]):::uptodate --> xa63f30966279f049(["all_results"]):::uptodate x84822056ace4a9ce(["all_results_clean"]):::uptodate --> xf9a08efad4262897(["plot_dereplication"]):::uptodate x78e1a6127723e5ec(["clusters_maldipickr_92<br>maldipickr 92"]):::uptodate --> x9a22b5e4d6937d40(["picked_maldipickr_92<br>maldipickr 92"]):::uptodate classDef uptodate stroke:#000000,color:#555358,fill:#f0f0c9; classDef none stroke:#000000,color:#000000,fill:#94a4ac;

Owner

  • Name: The Clavel lab
  • Login: ClavelLab
  • Kind: organization
  • Location: Germany

This is the official GitHub account for the research group of Prof. Thomas Clavel.

GitHub Events

Total
  • Push event: 5
  • Create event: 2
Last Year
  • Push event: 5
  • Create event: 2

Dependencies

requirements.txt pypi
  • llvmlite ==0.39.1
  • numba ==0.56.0
  • numpy ==1.22.0
  • pandas ==1.4.4
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • six ==1.16.0