Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 9 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.8%) to scientific vocabulary
Repository
Fast Single-Pass ISCC Data-Code & Instance Code
Basic Info
- Host: GitHub
- Owner: bio-codes
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://sum.iscc.codes
- Size: 1.87 MB
Statistics
- Stars: 8
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
iscc-sum
A blazing-fast ISCC Data-Code and Instance-Code hashing tool built in Rust with Python bindings. Delivers 50-130x faster performance than reference implementations, processing data at over 1 GB/s.
Originally created to handle massive microscopic imaging datasets where existing tools were too slow.
Project Status
Version 0.1.0 — Initial release for Data-Code and Instance-Code generation.
[!WARNING] This package is under active development, and breaking changes may be released at any time. Be sure to pin to specific versions if you're using this package in a production environment.
Performance
- 950-1050 MB/s processing speed (vs 7-8 MB/s reference)
- 50-130x faster than existing implementations
- Consistent performance on multi-GB files
Ideal for large-scale data processing: microscopic imaging, video files, scientific datasets.
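To put the throughput figures above in perspective, here is a rough back-of-the-envelope estimate. The 10 GB file size and the use of range midpoints are illustrative assumptions, not measurements from the project.

```python
# Rough processing-time estimate from the throughput figures quoted above.
# The file size and the choice of range midpoints are illustrative assumptions.

def seconds_to_process(size_mb: float, rate_mb_per_s: float) -> float:
    """Return the time needed to process size_mb megabytes at a constant rate."""
    return size_mb / rate_mb_per_s

size_mb = 10_000                 # hypothetical 10 GB file (decimal megabytes)
fast_rate = (950 + 1050) / 2     # midpoint of the quoted iscc-sum range, MB/s
reference_rate = (7 + 8) / 2     # midpoint of the quoted reference range, MB/s

fast = seconds_to_process(size_mb, fast_rate)
slow = seconds_to_process(size_mb, reference_rate)
print(f"iscc-sum:  {fast:.0f} s")
print(f"reference: {slow / 60:.1f} min")
```

Actual timings will depend on storage speed and hardware, so treat the result as an order-of-magnitude guide only.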
Installation
Python Package
The recommended way to install the iscc-sum CLI tool is using uv:
```bash
uv tool install iscc-sum
```
Note: To install uv, run: curl -LsSf https://astral.sh/uv/install.sh | sh (or see other installation methods)
Usage
Command Line Interface
The iscc-sum command provides checksum generation and verification functionality similar to standard tools
like md5sum or sha256sum, but using ISCC (International Standard Content Code) checksums.
Basic Usage
```bash
# Generate checksum for a file
iscc-sum document.pdf
# Output: ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY *document.pdf

# Generate checksums for multiple files
iscc-sum *.txt

# Read from standard input
echo "Hello, World!" | iscc-sum
cat document.txt | iscc-sum
```
[!NOTE] By default, this tool creates ISCC-CODEs of SubType WIDE, introduced for large-scale secure checksum support with data similarity matching capabilities. This SubType is not yet part of the ISO 24138:2024 standard but is supported by the latest version of the Iscc-Core reference implementation. For ISO 24138:2024 conformant ISCC-CODEs, use the --narrow flag in the CLI tool.
Checksum Verification
```bash
# Create a checksum file
iscc-sum *.txt > checksums.txt

# Verify checksums
iscc-sum -c checksums.txt
# Output:
# file1.txt: OK
# file2.txt: OK

# Verify with quiet mode (only show failures)
iscc-sum -c -q checksums.txt
```
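When verification runs inside a script, the per-file output can be summarised with a few lines of Python. This is a minimal sketch: only the `OK` status appears in the examples above, so treating every other status as a failure is an assumption, and `summarize_check_output` is a hypothetical helper name.

```python
# Sketch: summarise `iscc-sum -c` output of the form "<name>: <status>".
# Only "OK" is shown in the examples; treating any other status as a
# failure is an assumption of this sketch.

def summarize_check_output(output: str) -> tuple[list[str], list[str]]:
    """Split verification output into (ok_files, other_files)."""
    ok, other = [], []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split on the last ": " so file names containing colons survive.
        name, sep, status = line.rpartition(": ")
        if not sep:
            continue  # not a "<name>: <status>" line
        (ok if status.strip() == "OK" else other).append(name)
    return ok, other

sample = "file1.txt: OK\nfile2.txt: OK\n"
ok_files, other_files = summarize_check_output(sample)
print(ok_files, other_files)
```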
Output Formats
```bash
# Default format (GNU style)
iscc-sum file.txt
# ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY *file.txt

# BSD-style format
iscc-sum --tag file.txt
# ISCC (file.txt) = ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY

# Narrow format (128-bit)
iscc-sum --narrow file.txt
# ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HU *file.txt

# Show component codes
iscc-sum --units file.txt
# ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY *file.txt
# ISCC:EAAW4BQTJSTJSHAI27AJSAGMGHNUKSKRTK3E6OZ5CXUS57SWQZXJQ
# ISCC:IABXF3ZHYL6O6PM5P2HGV677CS3RBHINZSXEJCITE3WNOTQ2CYXRA

# Process entire directory as single unit
iscc-sum --tree /path/to/project
# ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY */path/to/project/
```
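Scripts that consume this output need to handle both line layouts. The sketch below derives its patterns purely from the GNU-style and BSD-style example lines above; `parse_checksum_line` is a hypothetical helper, and real output may include variations (for example, escaping of unusual file names) that it does not cover.

```python
import re

# Patterns derived from the two example layouts shown above. ISCC codes are
# matched as base32 text (A-Z, 2-7) after the "ISCC:" prefix.
GNU_LINE = re.compile(r"^(ISCC:[A-Z2-7]+) \*(.+)$")
BSD_LINE = re.compile(r"^ISCC \((.+)\) = (ISCC:[A-Z2-7]+)$")

def parse_checksum_line(line: str) -> tuple[str, str]:
    """Return (filename, iscc_code) for a GNU- or BSD-style checksum line."""
    line = line.rstrip("\n")
    if m := GNU_LINE.match(line):
        return m.group(2), m.group(1)
    if m := BSD_LINE.match(line):
        return m.group(1), m.group(2)
    raise ValueError(f"unrecognised checksum line: {line!r}")

code = "ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY"
print(parse_checksum_line(f"{code} *file.txt"))
print(parse_checksum_line(f"ISCC (file.txt) = {code}"))
```

Both calls return the same `(filename, code)` pair, so downstream code does not need to know which format produced the line.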
Similarity Matching
Find files with similar content:
```bash
# Find similar files (default threshold: 12 bits)
iscc-sum --similar *.jpg
# Output:
# photo1.jpg
#   ~8  photo2.jpg
#   ~12 photo3.jpg

# Adjust similarity threshold
iscc-sum --similar --threshold 6 *.pdf
```
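The threshold is a Hamming distance, that is, the number of bit positions in which two codes differ. The following sketch illustrates the comparison on raw byte strings; how the tool extracts the comparable bits from a Data-Code is not shown here, so the equal-length byte inputs are an assumption. The inclusive `<=` comparison follows the example output, where a file at distance 12 is listed under the default threshold of 12.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    # XOR leaves a 1 bit wherever the inputs differ; count those bits per byte.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def is_similar(a: bytes, b: bytes, threshold: int = 12) -> bool:
    """Return True when the distance is within the threshold (inclusive)."""
    return hamming_distance(a, b) <= threshold

# Two hypothetical 2-byte digests that differ in 12 bit positions.
d1, d2 = b"\x00\x00", b"\xff\x0f"
print(hamming_distance(d1, d2), is_similar(d1, d2), is_similar(d1, d2, threshold=6))
```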
Complete Options
```bash
iscc-sum --help  # Show all available options

Options:
  -c, --check    Read checksums from files and check them
  --narrow       Generate shorter 128-bit checksums
  --tag          Create a BSD-style checksum
  --units        Show Data-Code and Instance-Code components
  -z, --zero     End each output line with NUL
  --similar      Find files with similar Data-Codes
  --threshold    Hamming distance threshold for similarity (default: 12)
  -t, --tree     Process directory as single unit with combined checksum
  -q, --quiet    Don't print OK for each verified file
  --status       Don't output anything, exit code shows success
  -w, --warn     Warn about improperly formatted lines
  --strict       Exit non-zero for improperly formatted lines
```
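The `--status` option is convenient for automation because the result is carried by the exit code alone. Here is a minimal sketch of calling it from Python. It assumes `iscc-sum` is on `PATH` and that exit code 0 means every checksum matched, as with `md5sum` and `sha256sum`; the helper names are hypothetical.

```python
import subprocess

def build_verify_command(checksum_file: str) -> list[str]:
    """Build an `iscc-sum -c --status` invocation for one checksum file."""
    return ["iscc-sum", "-c", "--status", checksum_file]

def checksums_ok(checksum_file: str) -> bool:
    """Return True when verification exits with status 0.

    Assumes `iscc-sum` is installed and follows the usual convention
    that a zero exit code means all checksums matched.
    """
    result = subprocess.run(build_verify_command(checksum_file), check=False)
    return result.returncode == 0

print(build_verify_command("checksums.txt"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.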
Python API
Quick Start
Generate ISCC-SUM codes for files:
```pycon
>>> from iscc_sum import code_iscc_sum
>>> # Generate extended ISCC-SUM for a file
>>> result = code_iscc_sum("LICENSE", wide=True)
>>> result.iscc
'ISCC:K4AA2G6UMXGFJAO6ZOMIFZIYO6LYMOBT7Q6JDI3Z75IJWQY5WH372QA'
>>> result.datahash
'1e203833fc3c91a379ff509b431db1f7fd40dea69a6614249f420ec62398957087b1'
>>> result.filesize
11357
```
Streaming API
For large files or streaming data, use the processor classes:
```python
from iscc_sum import IsccSumProcessor

processor = IsccSumProcessor()
with open("large_file.bin", "rb") as f:
    while chunk := f.read(1024 * 1024):  # Read in 1MB chunks
        processor.update(chunk)

result = processor.result(wide=False, add_units=True)
print(f"ISCC: {result.iscc}")
print(f"Units: {result.units}")  # Individual Data-Code and Instance-Code
```
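The same chunked-update pattern works for any binary stream, not only files on disk. The sketch below factors the loop into a reusable helper. To keep it runnable without the extension installed, it uses `hashlib.sha256` as a stand-in for the processor; `feed_stream` and `SupportsUpdate` are hypothetical names, and the only thing assumed about `IsccSumProcessor` is the `update(bytes)` method shown above.

```python
import hashlib
import io
from typing import BinaryIO, Protocol

class SupportsUpdate(Protocol):
    """Anything with an update(bytes) method, as in the example above."""
    def update(self, data: bytes) -> None: ...

def feed_stream(processor: SupportsUpdate, stream: BinaryIO,
                chunk_size: int = 1024 * 1024) -> int:
    """Feed a binary stream to the processor in chunks; return bytes read."""
    total = 0
    while chunk := stream.read(chunk_size):
        processor.update(chunk)
        total += len(chunk)
    return total

# Stand-in demonstration: hashlib.sha256 replaces IsccSumProcessor here.
data = b"x" * 2_500_000
hasher = hashlib.sha256()
bytes_read = feed_stream(hasher, io.BytesIO(data))
print(bytes_read, hasher.hexdigest() == hashlib.sha256(data).hexdigest())
```

Because only one chunk is held in memory at a time, peak memory use stays near `chunk_size` regardless of the input size.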
Development
Prerequisites
- Rust (latest stable) - Install from rustup.rs
- Python 3.10+
- UV (for Python dependency management) - Install from astral.sh/uv
Quick Setup
```bash
# Clone the repository
git clone https://github.com/bio-codes/iscc-sum.git
cd iscc-sum

# Install Python dependencies
uv sync --all-extras

# Setup Rust development components
uv run poe setup

# Build Python extension and run all checks
uv run poe all
```
Development Commands
All development tasks are managed through poethepoet:
```bash
# One-time setup (installs Rust components)
uv run poe setup

# Pre-commit checks (format, lint, test everything)
uv run poe all

# Individual commands
uv run poe format      # Format all code (Rust + Python)
uv run poe test        # Run all tests (Rust + Python)
uv run poe typecheck   # Run Python type checking
uv run poe rust-build  # Build Rust binary
uv run poe build-ext   # Build Python extension

# Check if Rust toolchain is properly installed
uv run poe check-rust
```
Manual Setup (if needed)
```bash
# Install all dependencies including dev dependencies
uv sync --all-extras

# Install Rust components manually
rustup component add rustfmt clippy

# Build Rust extension for Python
uv run maturin develop

# Run tests manually
cargo test      # Rust tests
uv run pytest   # Python tests
```
Building
```bash
# Build Rust binary (creates isum executable)
cargo build --release

# Build Python wheels
maturin build --release
```
Funding
This work was supported through the Open Science Clusters’ Action for Research and Society (OSCARS) European project under grant agreement Nº101129751.
See: BIO-CODES project (Enhancing AI-Readiness of Bioimaging Data with Content-Based Identifiers).
License
This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.
Citation
If you use ISCC-SUM in your research, please cite:
```bibtex
@software{pan2025isccsum,
  title = {BIO-CODES/ISCC-SUM: High-Performance ISCC Generation for Bioimaging Data - OSCARS Project},
  author = {Pan, Titusz},
  year = 2025,
  month = jul,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.16541262},
  url = {https://doi.org/10.5281/zenodo.16541262},
  note = {Supported by OSCARS (Open Science Clusters' Action for Research and Society) under European Commission grant agreement Nº101129751},
  version = {0.1.0}
}
```
Owner
- Name: bio-codes
- Login: bio-codes
- Kind: organization
- Repositories: 1
- Profile: https://github.com/bio-codes
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
type: software
authors:
  - family-names: Pan
    given-names: Titusz
    email: tp@iscc.io
    orcid: https://orcid.org/0000-0002-0521-4214
  - family-names: Etzrodt
    given-names: Martin
    email: etzrodt.martin@gmail.com
    orcid: https://orcid.org/0000-0003-1928-3904
title: "BIO-CODES/ISCC-SUM: High-Performance ISCC Generation for Bioimaging Data - OSCARS Project"
doi: 10.5281/zenodo.16541262
version: 0.1.0
date-released: 2025-07-28
url: "https://github.com/bio-codes/iscc-sum"
repository-code: "https://github.com/bio-codes/iscc-sum"
abstract: >-
  A high-performance ISCC Data-Code and Instance-Code hashing tool built with Python including a rust extension.
  Delivers 50-130x faster performance than reference implementations, processing data at over 1 GB/s.
  Originally created to handle massive microscopic imaging datasets where existing tools were too slow.
keywords:
  - iscc
  - identifier
  - provenance
  - hash
  - checksum
  - content-hash
  - similarity-hash
  - deduplication
license: Apache-2.0
funding:
  - name: "OSCARS - Open Science Clusters' Action for Research and Society"
    award: "101129751"
GitHub Events
Total
- Release event: 4
- Watch event: 6
- Delete event: 1
- Push event: 68
- Pull request event: 7
- Create event: 7
Last Year
- Release event: 4
- Watch event: 6
- Delete event: 1
- Push event: 68
- Pull request event: 7
- Create event: 7
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- etzm (4)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: pypi 30 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
pypi.org: iscc-sum
High-performance ISCC Data-Code and Instance-Code hashing
- Homepage: https://github.com/bio-codes/iscc-sum
- Documentation: https://iscc-sum.readthedocs.io/
- License: Apache-2.0
- Latest release: 0.1.0 (published 8 months ago)
Rankings
Maintainers (1)
Dependencies
- Swatinem/rust-cache v2 composite
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- astral-sh/setup-uv v3 composite
- dtolnay/rust-toolchain stable composite
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/upload-artifact v4 composite
- dtolnay/rust-toolchain stable composite
- pypa/cibuildwheel v3.0.0 composite
- pypa/gh-action-pypi-publish release/v1 composite
- softprops/action-gh-release v1 composite
- aho-corasick 1.1.3
- anstream 0.6.19
- anstyle 1.0.11
- anstyle-parse 0.2.7
- anstyle-query 1.1.3
- anstyle-wincon 3.0.9
- arrayref 0.3.9
- arrayvec 0.7.6
- assert_cmd 2.0.17
- autocfg 1.4.0
- base32 0.5.1
- bitflags 2.9.1
- blake3 1.8.2
- bstr 1.12.0
- cc 1.2.27
- cfg-if 1.0.1
- clap 4.5.40
- clap_builder 4.5.40
- clap_derive 4.5.40
- clap_lex 0.7.5
- colorchoice 1.0.4
- constant_time_eq 0.3.1
- crossbeam-deque 0.8.6
- crossbeam-epoch 0.9.18
- crossbeam-utils 0.8.21
- difflib 0.4.0
- doc-comment 0.3.3
- either 1.15.0
- errno 0.3.12
- fastrand 2.3.0
- float-cmp 0.10.0
- getrandom 0.3.3
- globset 0.4.16
- heck 0.5.0
- hex 0.4.3
- indoc 2.0.6
- is_terminal_polyfill 1.70.1
- libc 0.2.172
- linux-raw-sys 0.9.4
- log 0.4.27
- memchr 2.7.5
- memoffset 0.9.1
- normalize-line-endings 0.3.0
- num-traits 0.2.19
- once_cell 1.21.3
- once_cell_polyfill 1.70.1
- portable-atomic 1.11.1
- predicates 3.1.3
- predicates-core 1.0.9
- predicates-tree 1.0.12
- proc-macro2 1.0.95
- pyo3 0.25.1
- pyo3-build-config 0.25.1
- pyo3-ffi 0.25.1
- pyo3-macros 0.25.1
- pyo3-macros-backend 0.25.1
- quote 1.0.40
- r-efi 5.2.0
- rayon 1.10.0
- rayon-core 1.12.1
- regex 1.11.1
- regex-automata 0.4.9
- regex-syntax 0.8.5
- rustix 1.0.7
- same-file 1.0.6
- serde 1.0.219
- serde_derive 1.0.219
- shlex 1.3.0
- strsim 0.11.1
- syn 2.0.103
- target-lexicon 0.13.2
- tempfile 3.20.0
- termtree 0.5.1
- tinyvec 1.9.0
- tinyvec_macros 0.1.1
- unicode-ident 1.0.18
- unicode-normalization 0.1.24
- unindent 0.2.4
- utf8parse 0.2.2
- wait-timeout 0.2.1
- walkdir 2.5.0
- wasi 0.14.2+wasi-0.2.4
- winapi-util 0.1.9
- windows-sys 0.59.0
- windows-targets 0.52.6
- windows_aarch64_gnullvm 0.52.6
- windows_aarch64_msvc 0.52.6
- windows_i686_gnu 0.52.6
- windows_i686_gnullvm 0.52.6
- windows_i686_msvc 0.52.6
- windows_x86_64_gnu 0.52.6
- windows_x86_64_gnullvm 0.52.6
- windows_x86_64_msvc 0.52.6
- wit-bindgen-rt 0.39.0
- xxhash-rust 0.8.15
- assert_cmd 2.0 development
- predicates 3.1 development
- tempfile 3.10 development
- base32 0.5.0
- blake3 1.8.2
- clap 4.5
- globset 0.4
- hex 0.4.3
- pyo3 0.25.1
- rayon 1.10.0
- unicode-normalization 0.1
- walkdir 2.5
- xxhash-rust 0.8.15
- blake3 >=1.0.5
- click >=8.0.0
- pathspec >=0.12.1
- universal-pathlib >=0.2.6
- xxhash >=3.5.0
- babel 2.17.0
- backrefs 5.8
- bandit 1.8.5
- blake3 1.0.5
- build 1.2.2.post1
- certifi 2025.6.15
- charset-normalizer 3.4.2
- click 8.2.1
- colorama 0.4.6
- coverage 7.9.1
- exceptiongroup 1.3.0
- fsspec 2025.5.1
- ghp-import 2.1.0
- idna 3.10
- importlib-metadata 8.7.0
- iniconfig 2.1.0
- iscc-sum 0.1.0
- jinja2 3.1.6
- markdown 3.8.1
- markdown-it-py 3.0.0
- markupsafe 3.0.2
- maturin 1.8.7
- mdformat 0.7.22
- mdformat-footnote 0.1.1
- mdformat-gfm 0.4.1
- mdformat-gfm-alerts 2.0.0
- mdformat-mkdocs 4.3.0
- mdformat-tables 1.0.0
- mdit-py-plugins 0.4.2
- mdurl 0.1.2
- mergedeep 1.3.4
- mkdocs 1.6.1
- mkdocs-get-deps 0.2.0
- mkdocs-glightbox 0.4.0
- mkdocs-material 9.6.14
- mkdocs-material-extensions 1.3.1
- more-itertools 10.7.0
- mypy 1.16.1
- mypy-extensions 1.1.0
- packaging 25.0
- paginate 0.5.7
- pastel 0.2.1
- pathspec 0.12.1
- pbr 6.1.1
- platformdirs 4.3.8
- pluggy 1.6.0
- poethepoet 0.35.0
- pyfakefs 5.8.0
- pygments 2.19.1
- pymdown-extensions 10.15
- pyproject-hooks 1.2.0
- pytest 8.4.1
- pytest-cov 6.2.1
- python-dateutil 2.9.0.post0
- pyyaml 6.0.2
- pyyaml-env-tag 1.1
- requests 2.32.4
- rich 14.0.0
- ruff 0.12.0
- setuptools 80.9.0
- six 1.17.0
- stevedore 5.4.1
- tomli 2.2.1
- typing-extensions 4.14.0
- universal-pathlib 0.2.6
- urllib3 2.5.0
- watchdog 6.0.0
- wcwidth 0.2.13
- xxhash 3.5.0
- zipp 3.23.0