sum-buddy
Generate and save checksums for all (or certain) contents of given directory.
Science Score: 75.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ✓ Institutional organization owner: Organization imageomics has institutional domain (imageomics.osu.edu)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.7%) to scientific vocabulary
Statistics
- Stars: 3
- Watchers: 9
- Forks: 0
- Open Issues: 6
- Releases: 3
Metadata Files
README.md
sum-buddy 
Command-line package to generate a CSV with filepath, filename, and checksum for contents of a given directory or a single file.
Requirements
Python 3.10+
Installation
```bash
pip install sum-buddy
```
How it Works
Command Line Usage
```
usage: sum-buddy [-h] [-o OUTPUT_FILE] [-i IGNORE_FILE | -H] [-a ALGORITHM] input_path

Generate CSV with filepath, filename, and checksums for all files in a given directory (or a single file)

positional arguments:
  input_path            File or directory to traverse for files

options:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Filepath for the output CSV file
  -i IGNORE_FILE, --ignore-file IGNORE_FILE
                        Filepath for the ignore patterns file
  -H, --include-hidden  Include hidden files
  -a ALGORITHM, --algorithm ALGORITHM
                        Hash algorithm to use (default: md5; available: ripemd160, sha3_224, sha512_224, blake2b, sha384, sha256, sm3, sha3_256, shake_256, sha512, sha1, sha224, md5, md5-sha1, sha3_384, sha3_512, sha512_256, shake_128, blake2s)
  -l LENGTH, --length LENGTH
                        Length of the digest for SHAKE (required) or BLAKE (optional) algorithms in bytes
```
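The `--length` option mirrors how Python's `hashlib` treats variable-length digests. The sketch below calls `hashlib` directly rather than sum-buddy, and how the package wires `--length` internally is not shown in this README. It only illustrates why SHAKE needs a length while BLAKE2 merely accepts one.

```python
import hashlib

data = b"example content"

# SHAKE algorithms have no fixed output size, so a length in bytes is required.
shake_hex = hashlib.shake_128(data).hexdigest(16)

# BLAKE2b defaults to a 64-byte digest and accepts an optional shorter size.
blake_default_hex = hashlib.blake2b(data).hexdigest()
blake_short_hex = hashlib.blake2b(data, digest_size=16).hexdigest()

# Hex strings use two characters per byte.
print(len(shake_hex), len(blake_default_hex), len(blake_short_hex))  # prints: 32 128 32
```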
Note: The available algorithms are determined by those available to `hashlib` and may vary depending on your system and OpenSSL version, so the set shown on your system with `sum-buddy -h` may differ from the one above. At a minimum, it should include `{blake2s, blake2b, md5, sha1, sha224, sha256, sha384, sha512, sha3_224, sha3_256, sha3_384, sha3_512, shake_128, shake_256}`, which is given by `hashlib.algorithms_guaranteed`.
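To see which algorithm names are portable and which depend on the local OpenSSL build, the two `hashlib` sets mentioned in the note can be compared directly. This is a small standalone check, not part of sum-buddy.

```python
import hashlib

# Algorithms every Python build provides, regardless of platform.
guaranteed = hashlib.algorithms_guaranteed

# Algorithms usable on this interpreter, including those supplied by OpenSSL.
available = hashlib.algorithms_available

# Names that depend on the local OpenSSL build (may be empty).
platform_specific = sorted(available - guaranteed)

print("guaranteed:", sorted(guaranteed))
print("platform-specific:", platform_specific)
```

Checksums meant to be verified on other machines are safest with an algorithm from the guaranteed set.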
CLI Examples
Basic Usage:
```bash
sum-buddy examples/example_content/
```
Output:
```console
filepath,filename,md5
examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
```
Output to File:
```bash
sum-buddy --output-file examples/checksums.csv examples/example_content/
```
Output:
```console
Calculating md5 checksums on examples/example_content/: 100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1552.01it/s]
md5 checksums for examples/example_content/ written to examples/checksums.csv
```
```bash
cat examples/checksums.csv
```
Output:
```console
filepath,filename,md5
examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
```
Ignore Contents Based on Patterns:
```bash
sum-buddy --output-file examples/checksums.csv --ignore-file examples/.sbignore_except_txt examples/example_content/
```
Output:
```console
Calculating md5 checksums on examples/example_content/: 100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 1845.48it/s]
md5 checksums for examples/example_content/ written to examples/checksums.csv
```
```bash
cat examples/checksums.csv
```
Output:
```console
filepath,filename,md5
examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
```
Include Hidden Files:
```bash
sum-buddy --output-file examples/checksums.csv --include-hidden examples/example_content/
```
Output:
```console
Calculating md5 checksums on examples/example_content/: 100%|████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 2101.35it/s]
md5 checksums for examples/example_content/ written to examples/checksums.csv
```
```bash
cat examples/checksums.csv
```
Output:
```console
filepath,filename,md5
examples/example_content/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/.hidden_dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
examples/example_content/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/.hidden_dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
examples/example_content/dir/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
```
If only a target directory is passed, the default settings are to ignore hidden files and directories (those that begin with a .), use the md5 algorithm, and print output to stdout, which can be piped (|).
To include all files and directories, including hidden ones, use the --include-hidden (or -H) option.
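The default hidden-file rule (skip anything whose name begins with a `.`) can be illustrated with a short standard-library sketch. This is an assumption about the behaviour described above, not sum-buddy's actual implementation, and `visible_files` is a hypothetical helper name.

```python
import os
import tempfile


def visible_files(root):
    """Return files under root, skipping hidden files and hidden directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if not name.startswith("."):
                found.append(os.path.join(dirpath, name))
    return found


# Build a small tree resembling examples/example_content to try it on.
with tempfile.TemporaryDirectory() as root:
    for rel in ("file.txt", ".hidden_file", "dir/file.txt", ".hidden_dir/file.txt"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w"):
            pass
    relative = [os.path.relpath(p, root).replace(os.sep, "/") for p in visible_files(root)]

print(relative)  # prints: ['file.txt', 'dir/file.txt']
```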
To ignore files based on patterns, use the `--ignore-file` (or `-i`) option with the path to a file containing patterns to ignore. The ignore file is interpreted the same way git interprets a `.gitignore` file, using the implementation from `pathspec`.
You may explore the filtering capabilities of the --ignore-file option by using the provided example files under examples/ and pointing at examples/example_content. The expected CSV output files are provided in examples/expected_outputs/.
The bash script `examples/run_examples` runs all the examples; it was used to generate the files in `expected_outputs`.
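Because the output is plain CSV with `filepath`, `filename`, and a checksum column, it can be post-processed with standard tools. As a minimal sketch, the rows below (copied from the "Include Hidden Files" example and trimmed to four) are grouped by checksum to spot files with identical content. `group_by_checksum` is a hypothetical helper, not part of sum-buddy.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the "Include Hidden Files" example output.
csv_text = """filepath,filename,md5
examples/example_content/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
"""


def group_by_checksum(lines, column="md5"):
    """Map each checksum to the file paths that share it."""
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row[column]].append(row["filepath"])
    return dict(groups)


groups = group_by_checksum(io.StringIO(csv_text))

# Keep only checksums shared by more than one file.
duplicates = {digest: paths for digest, paths in groups.items() if len(paths) > 1}

for digest, paths in duplicates.items():
    print(digest, len(paths))
```

To work on piped output instead, `sys.stdin` could be passed in place of the `StringIO` object.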
Python Package Usage
We expose three functions to be used in your Python code:
- get_checksums: Works like the CLI.
- gather_file_paths: Returns a list of file paths according to ignore patterns.
- checksum_file: Returns the checksum of a single file.
```python
from sumbuddy import get_checksums, gather_file_paths, checksum_file

input_path = "examples/example_content"
output_file = "examples/checksums.csv"
include_hidden = True  # Optional
ignore_file = "examples/.sbignore_except_txt"  # Optional
alg = "md5"  # Optional, possible inputs include list elements returned by hashlib.algorithms_available

# To generate checksums and save to a CSV file
get_checksums(input_path, output_file, ignore_file=ignore_file, algorithm=alg)
# or get_checksums(input_path, output_file, include_hidden=include_hidden)
# or get_checksums(input_path, output_file)

# outputs status bar followed by
# Checksums written to examples/checksums.csv

# To gather a list of file paths according to ignore/include patterns
file_paths = gather_file_paths(input_path, ignore_file=ignore_file)
# or file_paths = gather_file_paths(input_path, include_hidden=include_hidden)
# or file_paths = gather_file_paths(input_path)

# To calculate the checksum of a single file
checksum = checksum_file("examples/example_content/file.txt", algorithm=alg)
# or checksum = checksum_file("examples/example_content/file.txt")
```
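For readers curious what a single-file checksum involves, the following is a minimal sketch built on `hashlib` alone. It is not the package's `checksum_file` implementation; `sketch_checksum_file`, its `length` and `chunk_size` parameters, and the chunked-reading strategy are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def sketch_checksum_file(path, algorithm="md5", length=None, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    # SHAKE digests need an explicit length in bytes; other algorithms take none.
    if algorithm.startswith("shake"):
        return hasher.hexdigest(length)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty.txt")
    with open(empty, "wb"):
        pass
    empty_md5 = sketch_checksum_file(empty)
    shake_hex = sketch_checksum_file(empty, "shake_128", length=8)

print(empty_md5)  # md5 of empty input: d41d8cd98f00b204e9800998ecf8427e
print(len(shake_hex))  # prints: 16
```

The empty-file digest equals the value listed for `.hidden_file` in the CLI examples, which is the MD5 of zero bytes of input.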
Development
To develop the package further:
- Clone the repository and create a branch
- Install with dev dependencies:
```bash
pip install -e ".[dev]"
```
- Install pre-commit hook:
```bash
pre-commit install
pre-commit autoupdate # optionally update
```
- Run tests:
```bash
pytest
```
Owner
- Name: Imageomics Institute
- Login: Imageomics
- Kind: organization
- Website: https://imageomics.osu.edu
- Twitter: imageomics
- Repositories: 4
- Profile: https://github.com/Imageomics
Citation (CITATION.cff)
abstract: "A command-line package to generate CSV with filepath, filename, checksum for all contents of given directory."
authors:
- family-names: "Thompson"
given-names: "Matthew J."
orcid: "https://orcid.org/0000-0003-0583-8585"
- family-names: "Campolongo"
given-names: "Elizabeth G."
orcid: "https://orcid.org/0000-0003-0846-2413"
- family-names: "Duan"
given-names: "Zoe"
orcid: "https://orcid.org/0000-0002-8547-5907"
- family-names: "Lapp"
given-names: "Hilmar"
orcid: "https://orcid.org/0000-0001-9107-0714"
cff-version: 1.2.0
date-released: "2025-04-03"
identifiers:
- description: "The GitHub release URL of tag v1.0.0."
type: url
value: "https://github.com/Imageomics/sum-buddy/releases/tag/v1.0.0"
- description: "The GitHub URL of the commit tagged with v1.0.0."
type: url
value: "https://github.com/Imageomics/sum-buddy/tree/5dfc39dcd05cc17b9ccb67fc7f3364ce30d082e7"
keywords:
- imageomics
- metadata
- CSV
- images
- verifier
- checksums
- file-verification
- deduplication
license: MIT
message: "If you use this software, please cite it using the metadata from this file."
repository-code: "https://github.com/Imageomics/sum-buddy"
title: "Sum Buddy"
version: "1.0.0"
doi: "10.5281/zenodo.15133037"
type: software
GitHub Events
Total
- Create event: 9
- Release event: 1
- Issues event: 3
- Watch event: 1
- Delete event: 6
- Issue comment event: 3
- Member event: 1
- Push event: 22
- Pull request review comment event: 9
- Pull request review event: 11
- Pull request event: 11
Last Year
- Create event: 9
- Release event: 1
- Issues event: 3
- Watch event: 1
- Delete event: 6
- Issue comment event: 3
- Member event: 1
- Push event: 22
- Pull request review comment event: 9
- Pull request review event: 11
- Pull request event: 11
Packages
- Total packages: 1
- Total downloads:
  - pypi: 9 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 2
pypi.org: sum-buddy
A command-line package to generate CSV with filepath, filename, checksum for all contents of given directory.
- Homepage: https://github.com/Imageomics/sum-buddy
- Documentation: https://sum-buddy.readthedocs.io/
- License: MIT License
- Latest release: 1.0.0 (published 12 months ago)
Dependencies
- tqdm *