pocpbenchmark_manuscript

Code and manuscript for benchmarking proteins alignment tools for improved genus delineation using the Percentage Of Conserved Proteins (POCP)

https://github.com/clavellab/pocpbenchmark_manuscript

Repository

Basic Info
  • Host: GitHub
  • Owner: ClavelLab
  • License: gpl-3.0
  • Language: TeX
  • Default Branch: main
  • Size: 2.11 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

Analyses workflow for POCP benchmark manuscript

pocpbenchmark_manuscript contains the workflow for analyzing data produced by our benchmark ClavelLab/pocpbenchmark. We set out to compare protein alignment tools for improved genus delineation using the Percentage Of Conserved Proteins (POCP).

A preprint of our work is available at bioRxiv:

Robust genome-based delineation of bacterial genera. Charlie Pauvert, Thomas C.A. Hitch, Thomas Clavel. bioRxiv 2025.03.17.643616; doi: https://doi.org/10.1101/2025.03.17.643616

Set up the environment for the workflow

These analyses were conducted in R 4.3.1 with RStudio. We recommend installing R, including specific R versions, with rig, and getting RStudio from Posit. We also use renv for a reproducible environment; it can be installed in R with install.packages("renv").

  1. Open RStudio and create a new project via "File > New Project..."
  2. Select "Version Control" and then "Git"
    1. Type https://github.com/ClavelLab/pocpbenchmark_manuscript in Repository URL.
    2. Make sure the project will be created in the correct subdirectory on your computer, or else edit the location accordingly.
    3. Click on "Create project"

If you are comfortable with the command line and git, you can instead clone the repository with SSH or HTTPS into a suitable location.

  1. RStudio warns you that "One or more packages recorded in the lockfile are not installed", because several R packages and their dependencies are needed.
    1. Install the dependencies by typing renv::restore() in the Console and agree to the installation of the packages.
    2. Check that all dependencies are set by typing renv::status() in the Console, which should report "No issues found".
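The two Console steps above can also be run as one short script. This is a minimal sketch, not a file shipped with the repository: it assumes you run it from the project root, where renv keeps its default lockfile renv.lock, and it only warns on an R version mismatch because a different version is not necessarily a problem.

```r
# Sketch: restore the locked R packages from the project root.
# Warn if the running R differs from the version used for the analyses.
if (getRversion() != "4.3.1") {
  warning("The analyses were run with R 4.3.1; this session uses R ", getRversion())
}

lockfile <- "renv.lock"  # renv's default lockfile name
if (file.exists(lockfile)) {
  # Install renv only when it is missing (explicit CRAN mirror for non-interactive use)
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv", repos = "https://cloud.r-project.org")
  }
  renv::restore(prompt = FALSE)  # install the recorded package versions without asking
  renv::status()                 # report any mismatch between library and lockfile
} else {
  message("No ", lockfile, " found: run this from the root of the project")
}
```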

Our analysis workflow is orchestrated by targets and is composed of two subworkflows.
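In targets, each subworkflow is a "project" that is selected by the TAR_PROJECT environment variable and looked up in the targets configuration file. The sketch below is an illustration rather than part of the repository: it assumes the default configuration file name _targets.yaml, and it prints the script and data store that each of the two projects used in this README points to.

```r
# Sketch: list the script and store behind each targets project.
# Assumes the projects are defined in the default _targets.yaml at the project root.
projects <- c("prepare_pocpbenchmark_data", "analyze_pocpbenchmark_data")
if (requireNamespace("targets", quietly = TRUE) && file.exists("_targets.yaml")) {
  for (p in projects) {
    script <- targets::tar_config_get("script", project = p)
    store <- targets::tar_config_get("store", project = p)
    cat(p, ": script =", script, "| store =", store, "\n")
  }
}
```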

Prepare the data for the analysis

[!NOTE] You can skip to the next section if you want to start the workflow from already prepared files!

  1. Download the raw output files of the benchmark workflow using the "Download all" button at https://doi.org/10.5281/zenodo.14974869
  2. Uncompress the zip archive within your project.
  3. Create a data_benchmark folder within your project.
  4. Move all the zip files downloaded from Zenodo (benchmark-gtdb-f__*.zip) to data_benchmark.
  5. Ensure the two CSV files are at the root of your project.
  6. Run the workflow with the following command:

```r
Sys.setenv(TAR_PROJECT = "prepare_pocpbenchmark_data")
targets::tar_make()
```
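Steps 3 to 5 can also be scripted before running the command above. This is a minimal sketch, not part of the repository: it assumes the archive from step 2 was extracted at the project root, so that the benchmark-gtdb-f__*.zip files and the two CSV files sit there.

```r
# Sketch: move the per-family benchmark archives into data_benchmark/.
# Assumes the Zenodo "Download all" archive was extracted at the project root.
dir.create("data_benchmark", showWarnings = FALSE)
zips <- list.files(".", pattern = "^benchmark-gtdb-f__.*\\.zip$")
moved <- file.rename(zips, file.path("data_benchmark", zips))
stopifnot(all(moved))  # fail loudly if any archive could not be moved

# The two CSV files must stay at the project root
csvs <- list.files(".", pattern = "\\.csv$")
message(length(zips), " archive(s) moved; ", length(csvs), " CSV file(s) at the root")
```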

Analyze the data and build the manuscript

If you skipped the first workflow, you need to download the cleaned and formatted POCP/POCPu values and metadata tables for analysis from https://doi.org/10.5281/zenodo.14975029. These are the files you would have generated in the previous section.

  1. Run the workflow with the following command:

```r
Sys.setenv(TAR_PROJECT = "analyze_pocpbenchmark_data")
targets::tar_make()
```
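To confirm that the analysis run completed, you can ask targets which targets are still outdated. This is a sketch rather than a step of the official workflow; it assumes the default _targets.yaml configuration and uses targets::tar_outdated(), which returns the names of targets that the next tar_make() would rebuild.

```r
# Sketch: check that every target of the analysis project is up to date.
Sys.setenv(TAR_PROJECT = "analyze_pocpbenchmark_data")
if (requireNamespace("targets", quietly = TRUE) && file.exists("_targets.yaml")) {
  # Targets that the next tar_make() would rebuild; empty once the run succeeded
  outdated <- targets::tar_outdated()
  if (length(outdated) == 0) message("All targets are up to date") else print(outdated)
}
```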

The manuscript is then available in the _manuscript folder, both as an HTML document (index.html) and as a docx document. The figures are generated in the figures folder.
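A quick way to check these outputs from the R Console is sketched below. It is an illustration, not part of the repository: the README does not give the docx file name, so the sketch simply matches any .docx file in _manuscript.

```r
# Sketch: check for the outputs named above.
html_ok <- file.exists(file.path("_manuscript", "index.html"))
docx <- list.files("_manuscript", pattern = "\\.docx$")  # name not given in the README
figures <- list.files("figures")
message("HTML manuscript: ", if (html_ok) "found" else "missing")
message("docx manuscript(s): ", length(docx))
message("figure files: ", length(figures))
```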

Owner

  • Name: The Clavel lab
  • Login: ClavelLab
  • Kind: organization
  • Location: Germany

This is the official GitHub account for the research group of Prof. Thomas Clavel.

Citation (CITATION.cff)

cff-version: 1.2.0
message: Please cite the following work when using this software.
title: Code and manuscript for benchmarking proteins alignment tools for improved genus delineation using the Percentage Of Conserved Proteins (POCP)
url: https://github.com/ClavelLab/pocpbenchmark_manuscript
authors:
    - family-names: Pauvert
      given-names: Charlie
      orcid: https://orcid.org/0000-0001-9832-2507
preferred-citation:
  type: article
  authors:
    - family-names: Pauvert
      given-names: Charlie
      orcid: https://orcid.org/0000-0001-9832-2507
    - family-names: Hitch
      given-names: Thomas C.A.
      orcid: https://orcid.org/0000-0003-2244-7412
    - family-names: Clavel
      given-names: Thomas
      orcid: https://orcid.org/0000-0002-7229-5595
  doi: 10.1101/2025.03.17.643616
  identifiers:
    - type: doi
      value: 10.1101/2025.03.17.643616
    - type: url
      value: http://dx.doi.org/10.1101/2025.03.17.643616
  title: Robust genome-based delineation of bacterial genera
  url: http://dx.doi.org/10.1101/2025.03.17.643616
  database: Crossref
  date-published: 2025-03-17
  year: 2025
  month: 3
  publisher:
    name: Cold Spring Harbor Laboratory
