sum-buddy

Generate and save checksums for all (or certain) contents of given directory.

https://github.com/imageomics/sum-buddy

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
    Organization imageomics has institutional domain (imageomics.osu.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary

Keywords

checksums csv deduplication file-verification images metadata verifier
Last synced: 7 months ago

Repository

Generate and save checksums for all (or certain) contents of given directory.

Basic Info
  • Host: GitHub
  • Owner: Imageomics
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 68.4 KB
Statistics
  • Stars: 3
  • Watchers: 9
  • Forks: 0
  • Open Issues: 6
  • Releases: 3
Topics
checksums csv deduplication file-verification images metadata verifier
Created almost 2 years ago · Last pushed 9 months ago
Metadata Files
Readme License Citation Zenodo

README.md

sum-buddy

Command-line package to generate a CSV with filepath, filename, and checksum for contents of a given directory or a single file.

Requirements

Python 3.10+

Installation

```bash
pip install sum-buddy
```

How it Works

Command Line Usage

```
usage: sum-buddy [-h] [-o OUTPUT_FILE] [-i IGNORE_FILE | -H] [-a ALGORITHM] input_path

Generate CSV with filepath, filename, and checksums for all files in a given directory (or a single file)

positional arguments:
  input_path            File or directory to traverse for files

options:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Filepath for the output CSV file
  -i IGNORE_FILE, --ignore-file IGNORE_FILE
                        Filepath for the ignore patterns file
  -H, --include-hidden  Include hidden files
  -a ALGORITHM, --algorithm ALGORITHM
                        Hash algorithm to use (default: md5; available: ripemd160, sha3_224, sha512_224, blake2b, sha384, sha256, sm3, sha3_256, shake_256, sha512, sha1, sha224, md5, md5-sha1, sha3_384, sha3_512, sha512_256, shake_128, blake2s)
  -l LENGTH, --length LENGTH
                        Length of the digest for SHAKE (required) or BLAKE (optional) algorithms in bytes
```
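The `-l`/`--length` option mirrors how Python's hashlib handles variable-length digests. The following is a minimal sketch that calls hashlib directly (it is not sum-buddy's own code): SHAKE algorithms need a length when the digest is requested, while BLAKE2 accepts an optional `digest_size` at construction.

```python
import hashlib

data = b"example"

# SHAKE is an extendable-output function: the digest length (in bytes)
# must be supplied when the digest is requested.
shake_digest = hashlib.shake_128(data).hexdigest(16)

# BLAKE2 has a default digest size (64 bytes for blake2b) that can
# optionally be shortened at construction time.
blake_default = hashlib.blake2b(data).hexdigest()
blake_short = hashlib.blake2b(data, digest_size=16).hexdigest()

# Hex strings are twice as long as the digest in bytes.
print(len(shake_digest), len(blake_default), len(blake_short))  # prints: 32 128 32
```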

Note: The available algorithms are determined by those available to hashlib and may vary depending on your system and OpenSSL version, so the set shown on your system with sum-buddy -h may be different from above. At a minimum, it should include: {blake2s, blake2b, md5, sha1, sha224, sha256, sha384, sha512, sha3_224, sha3_256, sha3_384, sha3_512, shake_128, shake_256}, which is given by hashlib.algorithms_guaranteed.
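To see which algorithms your own system offers before running sum-buddy, you can query hashlib directly. This short sketch only uses the standard library and separates the guaranteed set from any system-specific extras:

```python
import hashlib

# Algorithms that hashlib guarantees on every platform.
guaranteed = hashlib.algorithms_guaranteed

# Algorithms available in this interpreter (depends on the OpenSSL build).
available = hashlib.algorithms_available

# Extras offered only by this system's OpenSSL build, if any.
extras = sorted(available - guaranteed)

print("guaranteed:", sorted(guaranteed))
print("system-specific:", extras)
```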

CLI Examples

  • Basic Usage:

    ```bash
    sum-buddy examples/example_content/
    ```

    Output:

    ```console
    filepath,filename,md5
    examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    ```

  • Output to File:

    ```bash
    sum-buddy --output-file examples/checksums.csv examples/example_content/
    ```

    Output:

    ```console
    Calculating md5 checksums on examples/example_content/: 100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1552.01it/s]
    md5 checksums for examples/example_content/ written to examples/checksums.csv
    ```

    ```bash
    cat examples/checksums.csv
    ```

    Output:

    ```console
    filepath,filename,md5
    examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    ```

  • Ignore Contents Based on Patterns:

    ```bash
    sum-buddy --output-file examples/checksums.csv --ignore-file examples/.sbignore_except_txt examples/example_content/
    ```

    Output:

    ```console
    Calculating md5 checksums on examples/example_content/: 100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 1845.48it/s]
    md5 checksums for examples/example_content/ written to examples/checksums.csv
    ```

    ```bash
    cat examples/checksums.csv
    ```

    Output:

    ```console
    filepath,filename,md5
    examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    examples/example_content/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    examples/example_content/dir/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    ```

  • Include Hidden Files:

    ```bash
    sum-buddy --output-file examples/checksums.csv --include-hidden examples/example_content/
    ```

    Output:

    ```console
    Calculating md5 checksums on examples/example_content/: 100%|████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 2101.35it/s]
    md5 checksums for examples/example_content/ written to examples/checksums.csv
    ```

    ```bash
    cat examples/checksums.csv
    ```

    Output:

    ```console
    filepath,filename,md5
    examples/example_content/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
    examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    examples/example_content/.hidden_dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
    examples/example_content/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    examples/example_content/dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
    examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    examples/example_content/dir/.hidden_dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
    examples/example_content/dir/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
    ```

If only a target directory is passed, the default settings are to ignore hidden files and directories (those that begin with a .), use the md5 algorithm, and print output to stdout, which can be piped (|).
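Because the CSV goes to stdout by default, it can be fed straight into a small script. As one illustration, the sketch below groups rows by checksum to flag duplicate files. The function name `find_duplicates` is hypothetical and not part of sum-buddy; it assumes only the `filepath` and `md5` column names shown in the example output.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(csv_file, checksum_column="md5"):
    """Group file paths by checksum and keep groups with more than one file."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[checksum_column]].append(row["filepath"])
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


# Sample rows copied from the Basic Usage output.
sample = io.StringIO(
    "filepath,filename,md5\n"
    "examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df\n"
    "examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df\n"
)

duplicates = find_duplicates(sample)
for digest, paths in duplicates.items():
    print(digest, paths)
```

To use it on piped output, pass `sys.stdin` instead of the `io.StringIO` sample. If a different algorithm was chosen with `-a`, set `checksum_column` to match the header of the third column.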

To include all files and directories, including hidden ones, use the --include-hidden (or -H) option.
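For intuition, the default hidden-file rule (skip anything whose name begins with a `.`) can be approximated with a plain directory walk. This is a minimal sketch under that assumption, not sum-buddy's actual implementation; `visible_files` is a hypothetical helper.

```python
import os
import tempfile


def visible_files(root):
    """Yield file paths under root, skipping names that start with '.'."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if not name.startswith("."):
                yield os.path.join(dirpath, name)


# Build a small throwaway tree with hidden and visible entries to try it on.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "dir", ".hidden_dir"))
    for parts in (
        ("file.txt",),
        (".hidden_file",),
        ("dir", "file.txt"),
        ("dir", ".hidden_dir", "file.txt"),
    ):
        open(os.path.join(root, *parts), "w").close()
    found = sorted(os.path.relpath(p, root) for p in visible_files(root))

print(found)
```

Only the two visible `file.txt` entries survive the walk; the file inside `.hidden_dir` is skipped because its parent directory is pruned.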

To ignore files based on patterns, use the --ignore-file (or -i) option with the path to a file containing patterns to ignore. The ignore file is interpreted the same way git interprets a .gitignore file, using the implementation from pathspec.

You may explore the filtering capabilities of the --ignore-file option by using the provided example files under examples/ and pointing at examples/example_content. The expected CSV output files are provided in examples/expected_outputs/.

The bash script examples/run_examples runs all the examples; it was used to generate the files in examples/expected_outputs/.

Python Package Usage

We expose three functions to be used in your Python code:

  • get_checksums: Works like the CLI.
  • gather_file_paths: Returns a list of file paths according to ignore patterns.
  • checksum_file: Returns the checksum of a single file.

```python
from sumbuddy import get_checksums, gather_file_paths, checksum_file

input_path = "examples/example_content"
output_file = "examples/checksums.csv"
include_hidden = True  # Optional
ignore_file = "examples/.sbignore_except_txt"  # Optional
alg = "md5"  # Optional, possible inputs include list elements returned by hashlib.algorithms_available

# To generate checksums and save to a CSV file
get_checksums(input_path, output_file, ignore_file=ignore_file, algorithm=alg)
# or get_checksums(input_path, output_file, include_hidden=include_hidden)
# or get_checksums(input_path, output_file)
# outputs status bar followed by
# Checksums written to examples/checksums.csv

# To gather a list of file paths according to ignore/include patterns
file_paths = gather_file_paths(input_path, ignore_file=ignore_file)
# or file_paths = gather_file_paths(input_path, include_hidden=include_hidden)
# or file_paths = gather_file_paths(input_path)

# To calculate the checksum of a single file
checksum = checksum_file("examples/example_content/file.txt", algorithm=alg)
# or checksum = checksum_file("examples/example_content/file.txt")
```
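For readers curious what a single-file checksum involves, here is a minimal sketch using only hashlib. It is an assumption about the general technique (hashing in chunks), not a copy of sum-buddy's checksum_file; the name `checksum_sketch` and the `chunk_size` parameter are hypothetical.

```python
import hashlib
import os
import tempfile


def checksum_sketch(path, algorithm="md5", chunk_size=65536):
    """Hash a file in fixed-size chunks so large files are not read into memory at once."""
    # Note: SHAKE algorithms would also need a length passed to hexdigest().
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# An empty file has the MD5 d41d8cd98f00b204e9800998ecf8427e, the same
# value listed for the .hidden_file entries in the CLI examples.
with tempfile.TemporaryDirectory() as tmp_dir:
    empty_path = os.path.join(tmp_dir, "empty_file")
    open(empty_path, "wb").close()
    empty_md5 = checksum_sketch(empty_path)

print(empty_md5)
```

Reading in chunks keeps memory use flat regardless of file size, which matters when checksumming large image collections.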

Development

To develop the package further:

  1. Clone the repository and create a branch.
  2. Install with dev dependencies:

     ```bash
     pip install -e ".[dev]"
     ```

  3. Install the pre-commit hook:

     ```bash
     pre-commit install
     pre-commit autoupdate # optionally update
     ```

  4. Run tests:

     ```bash
     pytest
     ```

Owner

  • Name: Imageomics Institute
  • Login: Imageomics
  • Kind: organization

Citation (CITATION.cff)

abstract: "A command-line package to generate CSV with filepath, filename, checksum for all contents of given directory."
authors:
- family-names: "Thompson"
  given-names: "Matthew J."
  orcid: "https://orcid.org/0000-0003-0583-8585"
- family-names: "Campolongo"
  given-names: "Elizabeth G."
  orcid: "https://orcid.org/0000-0003-0846-2413"
- family-names: "Duan"
  given-names: "Zoe"
  orcid: "https://orcid.org/0000-0002-8547-5907"
- family-names: "Lapp"
  given-names: "Hilmar"
  orcid: "https://orcid.org/0000-0001-9107-0714"
cff-version: 1.2.0
date-released: "2025-04-03"
identifiers:
  - description: "The GitHub release URL of tag v1.0.0."
    type: url
    value: "https://github.com/Imageomics/sum-buddy/releases/tag/v1.0.0"
  - description: "The GitHub URL of the commit tagged with v1.0.0."
    type: url
    value: "https://github.com/Imageomics/sum-buddy/tree/5dfc39dcd05cc17b9ccb67fc7f3364ce30d082e7"
keywords:
  - imageomics
  - metadata
  - CSV
  - images
  - verifier
  - checksums
  - file-verification
  - deduplication
license: MIT
message: "If you use this software, please cite it using the metadata from this file."
repository-code: "https://github.com/Imageomics/sum-buddy"
title: "Sum Buddy"
version: "1.0.0"
doi: "10.5281/zenodo.15133037"
type: software

GitHub Events

Total
  • Create event: 9
  • Release event: 1
  • Issues event: 3
  • Watch event: 1
  • Delete event: 6
  • Issue comment event: 3
  • Member event: 1
  • Push event: 22
  • Pull request review comment event: 9
  • Pull request review event: 11
  • Pull request event: 11
Last Year
  • Create event: 9
  • Release event: 1
  • Issues event: 3
  • Watch event: 1
  • Delete event: 6
  • Issue comment event: 3
  • Member event: 1
  • Push event: 22
  • Pull request review comment event: 9
  • Pull request review event: 11
  • Pull request event: 11

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 9 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 2
pypi.org: sum-buddy

A command-line package to generate CSV with filepath, filename, checksum for all contents of given directory.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 9 Last month
Rankings
Dependent packages count: 9.4%
Average: 31.0%
Dependent repos count: 52.7%
Maintainers (2)
Last synced: 8 months ago

Dependencies

pyproject.toml pypi
  • tqdm *