https://github.com/acquire-project/acquire-driver-zarr
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: acquire-project
- License: apache-2.0
- Language: C++
- Default Branch: main
- Size: 36.3 MB
Statistics
- Stars: 6
- Watchers: 1
- Forks: 5
- Open Issues: 26
- Releases: 56
Metadata Files
README.md
Acquire Zarr Driver
This is an Acquire Driver that supports chunked streaming to zarr.
Installing Dependencies
This driver uses the following libraries:
- blosc v1.21.5
- nlohmann-json v3.11.3
We prefer using vcpkg for dependency management, as it integrates well with CMake. Below are instructions for installing vcpkg locally and configuring it to fetch and compile the necessary dependencies (the steps are taken from this vcpkg guide).
- git clone https://github.com/microsoft/vcpkg.git
- cd vcpkg && ./bootstrap-vcpkg.sh
- Add export commands to your shell's profile script (e.g., ~/.bashrc or ~/.zshrc):
  - export VCPKG_ROOT=/path/to/vcpkg
  - export PATH=$VCPKG_ROOT:$PATH
  - On Windows, add VCPKG_ROOT and the PATH entry as environment variables.
- Select the default CMake preset before building (consider deleting your build directory first):
  - cmake --preset=default -B /path/to/build
  - (Alternatively, from the build directory, run cmake --preset=default /path/to/source.)
- If you're building this project on Windows, you might need to specify your compiler triplet. This ensures that all dependencies are built as static libraries. You can specify the triplet during the preset selection process:
  - cmake --preset=default -DVCPKG_TARGET_TRIPLET=x64-windows-static ...
Devices
Storage
- Zarr
- ZarrBlosc1ZstdByteShuffle
- ZarrBlosc1Lz4ByteShuffle
- ZarrV3
- ZarrV3Blosc1ZstdByteShuffle
- ZarrV3Blosc1Lz4ByteShuffle
Using the Zarr storage device
Zarr has additional capabilities relative to the basic storage devices, namely chunking, compression, and multiscale storage.
To compress while streaming, you can use one of the ZarrBlosc1* devices.
Chunking is configured using storage_properties_set_chunking_props() when configuring your video stream.
Multiscale storage can be enabled or disabled by calling storage_properties_set_enable_multiscale() when configuring
the video stream.
For the Zarr v3 version of each device, you can use the ZarrV3* devices.
Note: Zarr v3 is not yet supported by ome-zarr-py, so you will not be able to read multiscale metadata from the resulting dataset.
Zarr v3 is supported by zarr-python, but you will need to set two environment variables to work with it:
bash
export ZARR_V3_EXPERIMENTAL_API=1
export ZARR_V3_SHARDING=1
You can also set these variables in your Python script:
```python
import os

# these MUST come before importing zarr
os.environ["ZARR_V3_EXPERIMENTAL_API"] = "1"
os.environ["ZARR_V3_SHARDING"] = "1"

import zarr
```
Configuring the output array
You will need to specify the shape of the output array when configuring your video stream.
The StorageProperties object has a field acquisition_dimensions, which is defined like so:
```c
struct storage_properties_dimensions_s
{
    // The dimensions of the output array.
    struct StorageDimension* data;

    // The number of dimensions in the output array.
    size_t size;
};
```
Observe that this struct contains a pointer to a struct StorageDimension array, as well as a size_t field size.
Let's look at the StorageDimension struct.
This struct has the following fields:
```c
struct StorageDimension
{
    // the name of the dimension as it appears in the metadata, e.g.,
    // "x", "y", "z", "c", "t"
    struct String name;

    // the type of dimension, e.g., spatial, channel, time
    enum DimensionType kind;

    // the expected size of the full output array along this dimension
    uint32_t array_size_px;

    // the size of a chunk along this dimension
    uint32_t chunk_size_px;

    // the number of chunks in a shard along this dimension
    uint32_t shard_size_chunks;
};
```
Each of your output dimensions should have a corresponding StorageDimension struct.
The order of these dimensions matters: the first dimension in the array will be the fastest-varying dimension as you
acquire.
Then the next dimension will be the next-fastest varying, and so on.
The last dimension will be the slowest-varying, i.e., the append dimension.
The first two dimensions should represent the width and height of the frame, respectively.
The array_size_px for these dimensions should match the width and height of the frame, and the kind field should
be DimensionType_Space. The rest of the dimensions should match the order of acquisition.
You can configure chunking and sharding for each dimension by setting the chunk_size_px and shard_size_chunks
fields, respectively.
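The ordering rules above can be checked before a stream is configured. The following is a minimal Python sketch and not part of the driver: `Dim` is a hypothetical stand-in that mirrors the fields of `StorageDimension`, the string values for `kind` are placeholders for the `DimensionType` enum, and `check_dimensions` is an illustrative helper that encodes only the rules stated in this section.

```python
from dataclasses import dataclass


@dataclass
class Dim:
    """Hypothetical mirror of the C StorageDimension struct (illustration only)."""
    name: str
    kind: str               # placeholder for enum DimensionType: "space", "channel", "time"
    array_size_px: int
    chunk_size_px: int
    shard_size_chunks: int


def check_dimensions(dims, frame_width, frame_height):
    """Return a list of problems found; an empty list means the stated rules hold."""
    if len(dims) < 2:
        return ["need at least the two frame dimensions (width, height)"]
    problems = []
    # The first two dimensions must be the frame width and height, in that order.
    for dim, expected, label in ((dims[0], frame_width, "width"),
                                 (dims[1], frame_height, "height")):
        if dim.kind != "space":
            problems.append(f"{dim.name}: {label} dimension must be spatial")
        if dim.array_size_px != expected:
            problems.append(f"{dim.name}: array_size_px must equal frame {label}")
    return problems


dims = [
    Dim("x", "space", 1920, 480, 4),
    Dim("y", "space", 1080, 270, 4),
    Dim("c", "channel", 3, 1, 1),
    Dim("t", "time", 0, 100, 1),   # slowest-varying (append) dimension goes last
]
print(check_dimensions(dims, 1920, 1080))
```

A check like this only catches ordering mistakes early; the driver itself remains the authority on what it accepts.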
In general, you should not manipulate this struct, or the array, directly.
Instead, there are helper functions that can be used to initialize the array and set up each of the storage dimensions,
given a pointer to the StorageProperties struct that contains them:
```c
/// Initializes StorageProperties, allocating string storage on the heap
/// and filling out the struct fields.
/// @returns 0 when `bytes_of_out` is not large enough, otherwise 1.
/// @param[out] out The constructed StorageProperties object.
/// @param[in] first_frame_id (unused; aiming for future file rollover
///                           support)
/// @param[in] filename A c-style null-terminated string. The file to create
///                     for streaming.
/// @param[in] bytes_of_filename Number of bytes in the `filename` buffer
///                              including the terminating null.
/// @param[in] metadata A c-style null-terminated string. Metadata string
///                     to save along side the created file.
/// @param[in] bytes_of_metadata Number of bytes in the `metadata` buffer
///                              including the terminating null.
/// @param[in] pixel_scale_um The pixel scale or size in microns.
/// @param[in] dimension_count The number of dimensions in the storage
///                            array. Each of the @p dimension_count
///                            dimensions will be initialized to zero.
int storage_properties_init(struct StorageProperties* out,
                            uint32_t first_frame_id,
                            const char* filename,
                            size_t bytes_of_filename,
                            const char* metadata,
                            size_t bytes_of_metadata,
                            struct PixelScale pixel_scale_um,
                            uint8_t dimension_count);

/// @brief Set the value of the StorageDimension struct at index `index` in
///        `out`.
/// @param[out] out The StorageProperties struct containing the
///                 StorageDimension array.
/// @param[in] index The index of the dimension to set.
/// @param[in] name The name of the dimension.
/// @param[in] bytes_of_name The number of bytes in the name buffer.
///                          Should include the terminating NULL.
/// @param[in] kind The type of dimension.
/// @param[in] array_size_px The size of the array along this dimension.
/// @param[in] chunk_size_px The size of a chunk along this dimension.
/// @param[in] shard_size_chunks The number of chunks in a shard along this
///                              dimension.
/// @returns 1 on success, otherwise 0
int storage_properties_set_dimension(struct StorageProperties* out,
                                     int index,
                                     const char* name,
                                     size_t bytes_of_name,
                                     enum DimensionType kind,
                                     uint32_t array_size_px,
                                     uint32_t chunk_size_px,
                                     uint32_t shard_size_chunks);
```
You will need to call storage_properties_init() before calling storage_properties_set_dimension().
You can find the implementation of these functions in the acquire-common library.
Example
Let's define some terms:
[Figure: A collection of frames.]
A tile is a contiguous section, or region of interest, of a frame.
[Figure: A collection of frames, divided into tiles.]
A chunk is nothing more than some number of stacked tiles from subsequent frames, with each tile in a chunk having the same ROI in its respective frame.
[Figure: A collection of frames, divided into tiles. A single chunk has been highlighted in red.]
A shard is a collection of chunks within a single file or S3 bucket.
[Figure: Shards aggregate chunks within individual files or S3 buckets.]
Suppose you have a video stream with the following dimensions:
- Width: 1920 px
- Height: 1080 px
- Channels: 3
- Time: 1000 frames (per channel)
You want to divide your frames into 4 x 4 tiles of size 480 x 270, and you want each channel to be stored in separate chunks. You also want to aggregate 100 time points into a single chunk. You will end up with 4 x 4 = 16 chunks of size 480 x 270 x 1 x 100 for each channel, for a total of 48 chunks for every 100 time points. Suppose further that you want to aggregate the 16 chunks of each channel into a single shard. Configuring your storage dimensions might look like this:
```cpp
#include <cstring> // strlen

struct StorageProperties props = { 0 };

const char filename[] = "myvideo.zarr";
const char external_metadata[] = R"({"my":"metadata"})";
const struct PixelScale sample_spacing_um = { 1, 1 };

storage_properties_init(&props,
                        0, // first frame id
                        (char*)filename,
                        strlen(filename) + 1,
                        (char*)external_metadata,
                        sizeof(external_metadata),
                        sample_spacing_um,
                        4); // number of dimensions

// width
storage_properties_set_dimension(
  &props,
  0,
  "x",                 // name
  2,                   // number of bytes in the name, including the null terminator
  DimensionType_Space, // type of the dimension
  1920,                // full size of the dimension
  480,                 // number of pixels in a chunk
  4);                  // aggregate all 4 chunks into a shard along this dimension

// height
storage_properties_set_dimension(
  &props,
  1,
  "y",                 // name
  2,                   // number of bytes in the name, including the null terminator
  DimensionType_Space, // type of the dimension
  1080,                // full size of the dimension
  270,                 // number of pixels in a chunk
  4);                  // aggregate all 4 chunks into a shard along this dimension

// channels
storage_properties_set_dimension(
  &props,
  2,
  "c",                   // name
  2,                     // number of bytes in the name, including the null terminator
  DimensionType_Channel, // type of the dimension
  3,                     // full size of the dimension
  1,                     // number of channels in a chunk (1 channel per chunk)
  1);                    // one single-channel chunk per shard along this dimension

// time
storage_properties_set_dimension(
  &props,
  3,
  "t",                // name
  2,                  // number of bytes in the name, including the null terminator
  DimensionType_Time, // type of the dimension
  0,                  // append dimension; we don't know the full size a priori
  100,                // number of time points in a chunk
  1);                 // one 100-timepoint chunk per shard along this dimension
```
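The chunk and shard counts quoted for this example can be reproduced with a few lines of arithmetic. This is a small Python sketch for checking the numbers only; it does not call the driver. Rounding up with `ceil` for partially filled chunks is an assumption (the example divides evenly, so it has no effect here), and the time dimension is limited to a single 100-frame span because its full size is not known in advance.

```python
from math import ceil, prod

# (name, array_size_px, chunk_size_px, shard_size_chunks) taken from the example above.
# "t" is the append dimension, so we only consider one 100-time-point span.
dims = [
    ("x", 1920, 480, 4),
    ("y", 1080, 270, 4),
    ("c", 3, 1, 1),
    ("t", 100, 100, 1),
]

# Number of chunks along each dimension (assumes partial chunks round up).
chunks_per_dim = [ceil(size / chunk) for _, size, chunk, _ in dims]
# Number of shards along each dimension.
shards_per_dim = [
    ceil(n_chunks / per_shard)
    for n_chunks, (_, _, _, per_shard) in zip(chunks_per_dim, dims)
]

chunk_shape = tuple(chunk for _, _, chunk, _ in dims)
total_chunks = prod(chunks_per_dim)
total_shards = prod(shards_per_dim)
chunks_per_shard = prod(per_shard for _, _, _, per_shard in dims)

print(chunk_shape)       # (480, 270, 1, 100)
print(chunks_per_dim)    # [4, 4, 3, 1]
print(total_chunks)      # 48
print(total_shards)      # 3
print(chunks_per_shard)  # 16
```

With these settings, each shard holds the 16 spatial chunks of one channel, giving three shards per 100 time points.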
Compression
Compression is done via Blosc. The supported codecs are lz4 and zstd, which are selected by choosing the ZarrBlosc1Lz4ByteShuffle and ZarrBlosc1ZstdByteShuffle devices, respectively. For a comparison of these codecs, please refer to the Blosc docs.
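Because the codec and the Zarr version are both encoded in the device name, picking a device amounts to assembling a string. The helper below is a hypothetical Python sketch (`storage_device_name` is not part of Acquire); it only reproduces the six device names listed in the Devices section of this README.

```python
def storage_device_name(zarr_v3=False, codec=None):
    """Hypothetical helper: build one of the storage device names listed in this README.

    codec is None (uncompressed), "zstd", or "lz4".
    """
    suffixes = {
        None: "",
        "zstd": "Blosc1ZstdByteShuffle",
        "lz4": "Blosc1Lz4ByteShuffle",
    }
    if codec not in suffixes:
        raise ValueError(f"unsupported codec: {codec!r}")
    prefix = "ZarrV3" if zarr_v3 else "Zarr"
    return prefix + suffixes[codec]


print(storage_device_name(codec="zstd"))               # ZarrBlosc1ZstdByteShuffle
print(storage_device_name(zarr_v3=True, codec="lz4"))  # ZarrV3Blosc1Lz4ByteShuffle
```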
Configuring multiscale
In order to enable or disable multiscale storage for your video stream, you can call
storage_properties_set_enable_multiscale() on your StorageProperties object after calling
storage_properties_init() on it.
c
int
storage_properties_set_enable_multiscale(struct StorageProperties* out,
uint8_t enable);
To enable, pass 1 as the enable parameter, and to disable, pass 0.
If enabled, the Zarr writer will write a pyramid of frames, with each level of the pyramid halving each dimension of the previous level, until the dimensions are less than or equal to a single tile.
Example
Suppose your frame size is 1920 x 1080, with a tile size of 384 x 216. Then the sequence of levels will have dimensions 1920 x 1080, 960 x 540, 480 x 270, and 240 x 135.
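The sequence of levels in this example can be reproduced with a short loop. This Python sketch is an illustration of the halving rule as described here, not the driver's implementation: `pyramid_levels` is a hypothetical name, it assumes integer halving, and it assumes the pyramid stops once both the width and the height fit within a single tile.

```python
def pyramid_levels(width, height, tile_width, tile_height):
    """List (width, height) per level, halving until the frame fits in one tile.

    Assumptions: integer (floor) halving; stop when BOTH dimensions are <= the tile.
    """
    levels = [(width, height)]
    while width > tile_width or height > tile_height:
        width, height = width // 2, height // 2
        levels.append((width, height))
    return levels


print(pyramid_levels(1920, 1080, 384, 216))
# [(1920, 1080), (960, 540), (480, 270), (240, 135)]
```

The 480 x 270 level is still wider than the 384-pixel tile, which is why one more level (240 x 135) is written.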
Owner
- Name: Acquire Project
- Login: acquire-project
- Kind: organization
- Location: United States of America
- Repositories: 1
- Profile: https://github.com/acquire-project
Focusing on multicamera video streaming for microscopy
GitHub Events
Total
- Create event: 47
- Issues event: 4
- Release event: 24
- Watch event: 1
- Delete event: 39
- Issue comment event: 24
- Push event: 25
- Pull request review event: 13
- Pull request review comment event: 1
- Pull request event: 86
Last Year
- Create event: 47
- Issues event: 4
- Release event: 24
- Watch event: 1
- Delete event: 39
- Issue comment event: 24
- Push event: 25
- Pull request review event: 13
- Pull request review comment event: 1
- Pull request event: 86
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 31
- Average time to close issues: about 22 hours
- Average time to close pull requests: 4 days
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.61
- Merged pull requests: 8
- Bot issues: 1
- Bot pull requests: 31
Past Year
- Issues: 1
- Pull requests: 31
- Average time to close issues: about 22 hours
- Average time to close pull requests: 4 days
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.61
- Merged pull requests: 8
- Bot issues: 1
- Bot pull requests: 31
Top Authors
Issue Authors
- aliddell (8)
- nclack (1)
- talonchandler (1)
- jeskesen (1)
- chris-delg (1)
- shlomnissan (1)
Pull Request Authors
- dependabot[bot] (59)
- aliddell (52)
- shlomnissan (7)
- jeskesen (2)
- andy-sweet (1)