wacksy

An experimental library for writing WACZ files

https://github.com/bodleian/wacksy

Science Score: 52.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization bodleian has institutional domain (www.bodleian.ox.ac.uk)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary

Keywords

cdxj save-the-internet wacz warc web-archive
Last synced: 10 months ago

Repository


Basic Info
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 1
  • Open Issues: 13
  • Releases: 4
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Changelog License Citation

README.md

Wacksy


An experimental Rust library for ~~reading and~~ writing ᴡᴀᴄᴢ files.

Install

With cargo installed, run the following command in your project directory:

cargo add wacksy

Example

This library provides two main ᴀᴘɪ functions. from_file() takes a ᴡᴀʀᴄ file and returns a structured representation of a ᴡᴀᴄᴢ object. zip() takes a ᴡᴀᴄᴢ object and zips it up to a byte array using rawzip.

```rust
use std::{error::Error, fs, path::Path};
use wacksy::WACZ; // assumed import path; check the crate documentation

fn main() -> Result<(), Box<dyn Error>> {
    let warc_file_path = Path::new("example.warc.gz"); // set path to your ᴡᴀʀᴄ file
    let wacz_object = WACZ::from_file(warc_file_path)?; // index the ᴡᴀʀᴄ and create a ᴡᴀᴄᴢ object
    let zipped_wacz: Vec<u8> = wacz_object.zip()?; // zip up the ᴡᴀᴄᴢ
    fs::write("example.wacz", zipped_wacz)?; // write out to file
    Ok(())
}
```

See the documentation for more details.

Background

According to Ed Summers, a ᴡᴀᴄᴢ file is "really just a ᴢɪᴘ file that contains ᴡᴀʀᴄ data and metadata at predictable file locations."[^code4lib_talk]

The example in the spec outlines what a ᴡᴀᴄᴢ file should contain:

```
archive
└── data.warc.gz
datapackage.json
datapackage-digest.json
indexes
└── index.cdx.gz
pages
└── pages.jsonl
```
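Because the locations are predictable, a list of ᴢɪᴘ entry names can be checked against the layout above. The following is a minimal sketch using only the standard library. `missing_entries` and `EXPECTED_ENTRIES` are hypothetical names for illustration and are not part of wacksy, and the entry names are copied from the spec example, so a real ᴡᴀᴄᴢ file may name its ᴡᴀʀᴄ and index files differently.

```rust
/// Entry paths copied from the example layout in the spec (illustrative only).
const EXPECTED_ENTRIES: [&str; 5] = [
    "archive/data.warc.gz",
    "datapackage.json",
    "datapackage-digest.json",
    "indexes/index.cdx.gz",
    "pages/pages.jsonl",
];

/// Hypothetical helper, not part of wacksy: returns the expected entries
/// that do not appear in `entry_names`, in the order listed above.
fn missing_entries(entry_names: &[&str]) -> Vec<&'static str> {
    EXPECTED_ENTRIES
        .iter()
        .copied()
        .filter(|expected| !entry_names.contains(expected))
        .collect()
}

fn main() {
    // In practice these names would come from reading the ᴢɪᴘ central directory.
    let listing = ["datapackage.json", "archive/data.warc.gz", "pages/pages.jsonl"];
    for path in missing_entries(&listing) {
        println!("missing: {path}");
    }
}
```

With the sample listing, the helper reports `datapackage-digest.json` and `indexes/index.cdx.gz` as missing. Matching on exact names keeps the sketch short; a more tolerant check might only require that the `archive`, `indexes`, and `pages` directories are non-empty.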

[^code4lib_talk]: For more discussion of the concept, see the talk "Web Archives in Digital Repositories" by Ilya Kreymer and Ed Summers at Code4Lib 2022.

Similar libraries

License

MIT © Bodleian Libraries and contributors

Owner

  • Name: Bodleian Libraries
  • Login: bodleian
  • Kind: organization
  • Location: Oxford, UK

The Bodleian Libraries of the University of Oxford

Citation (CITATION.cff)

cff-version: 1.2.0
title: Wacksy
message: >-
  'If you use this software, please cite it using the
  metadata from this file.'
references:
  - authors:
    - name: Webrecorder
    title: 'Web Archive Collection Zipped (WACZ)'
    type: standard
    version: 1.1.1
    date-published: '2021-06-03'
    url: 'https://specs.webrecorder.net/wacz/1.1.1/'
type: software
authors:
  - given-names: Pierre
    family-names: Marshall
    email: 'pierre.marshall@bodleian.ox.ac.uk'
    affiliation: Oxford University
    orcid: 'https://orcid.org/0000-0001-9245-7670'
repository-code: 'https://github.com/bodleian/wacksy'
abstract: 'An experimental library for writing WACZ files.'
keywords:
  - wacz
  - warc
  - cdxj
  - archive
license: MIT
version: '0.0.2'
date-released: '2025-08-06'

GitHub Events

Total
  • Issues event: 9
  • Delete event: 12
  • Issue comment event: 10
  • Push event: 37
  • Pull request event: 3
  • Create event: 13
Last Year
  • Issues event: 9
  • Delete event: 12
  • Issue comment event: 10
  • Push event: 37
  • Pull request event: 3
  • Create event: 13

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 6
  • Total pull requests: 2
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 9 days
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.5
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 6
  • Pull requests: 2
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 9 days
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.5
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • extua (5)
Pull Request Authors
  • github-actions[bot] (1)
  • extua (1)
Top Labels
Issue Labels
bug (1)
Pull Request Labels
refactor (1)