sparsestack

Memory efficient stack of multiple 2D sparse arrays.

https://github.com/matchms/sparsestack

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: matchms
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 257 KB
Statistics
  • Stars: 4
  • Watchers: 3
  • Forks: 5
  • Open Issues: 8
  • Releases: 13
Created over 3 years ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md


sparsestack logo

Memory efficient stack of multiple 2D sparse arrays.

sparsestack-overview-figure

Installation

Requirements

Python 3.10 or higher (the pyproject.toml dependency list specifies >=3.10,<3.13)

Pip Install

Install with pip: `pip install sparsestack`

First code example

```python
import numpy as np
from sparsestack import StackedSparseArray

# Create some fake data
scores1 = np.random.random((12, 10))
scores1[scores1 < 0.9] = 0  # make "sparse"
scores2 = np.random.random((12, 10))
scores2[scores2 < 0.75] = 0  # make "sparse"
sparsestack = StackedSparseArray(12, 10)
sparsestack.add_dense_matrix(scores1, "scores_1")

# Add second scores and filter
sparsestack.add_dense_matrix(scores2, "scores_2", join_type="left")

# Scores can be accessed using (limited) slicing capabilities
sparsestack[3, 4]  # => scores_1 and scores_2 at position row=3, col=4
sparsestack[3, :]  # => tuple with row, col, scores for all entries in row=3
sparsestack[:, 2]  # => tuple with row, col, scores for all entries in col=2
sparsestack[3, :, 0]  # => tuple with row, col, scores_1 for all entries in row=3
sparsestack[3, :, "scores_1"]  # => same as the one before

# Scores can also be converted to a dense numpy array:
scores2_after_merge = sparsestack.to_array("scores_2")
```
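The thresholding step in the example above can be inspected with NumPy alone. The following is a minimal sketch, not part of sparsestack: it uses a fixed random seed (an assumption made only for repeatability) and extracts the coordinates and values of the remaining non-zero entries, which is the `row`, `col`, `data` form that the README describes for `.add_sparse_data(row, col, data)`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
scores = rng.random((12, 10))
scores[scores < 0.9] = 0  # same thresholding as in the example above

# Coordinates and values of the remaining non-zero entries
row, col = np.nonzero(scores)
data = scores[row, col]

# Fraction of positions that still hold a value
density = data.size / scores.size
print(f"{data.size} of {scores.size} entries are non-zero ({density:.0%})")
```

Only the non-zero entries need to be stored, which is where the memory saving of a sparse representation comes from.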

Adding data to a sparsestack-array

Sparsestack provides three options to add data to a new layer.

1) `.add_dense_matrix(input_array)`
Adds all non-zero elements of `input_array` to the sparsestack. Depending on the chosen `join_type`, either all such values are added (`join_type="outer"` or `join_type="right"`), or only those at positions already present in the underlying layers (`"left"` or `"inner"` join).

2) `.add_sparse_matrix(input_coo_matrix)`
Expects a COO-style matrix (e.g. from scipy) that has the attributes `.row`, `.col` and `.data`. The join type can again be specified using `join_type`.

3) `.add_sparse_data(row, col, data)`
Does essentially the same as `.add_sparse_matrix(input_coo_matrix)`, but can be more flexible in some cases because `row`, `col` and `data` are separate input arguments.
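The effect of the join type on a new layer can be pictured as a filter on (row, col) positions. The following is a minimal sketch in plain Python, not sparsestack's implementation: the helper `select_new_entries` is hypothetical, and it models only what the description above states, namely which entries of the new layer are kept. It does not model what happens to entries already stored in the stack, and it does not distinguish "outer" from "right" or "left" from "inner".

```python
def select_new_entries(existing_positions, new_entries, join_type="left"):
    """Return the entries of a new layer that would be kept (hypothetical helper).

    existing_positions: set of (row, col) tuples already stored in the stack.
    new_entries: dict mapping (row, col) -> non-zero value of the new layer.
    """
    if join_type in ("outer", "right"):
        # All non-zero values of the new layer are added
        return dict(new_entries)
    if join_type in ("left", "inner"):
        # Only values at positions that already exist are added
        return {pos: val for pos, val in new_entries.items()
                if pos in existing_positions}
    raise ValueError(f"Unknown join_type: {join_type!r}")


# Made-up illustration data
existing = {(0, 1), (2, 3)}
new_layer = {(0, 1): 0.8, (1, 1): 0.95}

kept_left = select_new_entries(existing, new_layer, "left")
kept_outer = select_new_entries(existing, new_layer, "outer")
print(kept_left)
print(kept_outer)
```

With a "left" join, the value at (1, 1) is dropped because no earlier layer has an entry there. With an "outer" join, both values are kept.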

Accessing data from sparsestack-array

The collected sparse data can be accessed in multiple ways.

1) Slicing
sparsestack allows multiple types of slicing (see also the code example above).
```python
sparsestack[3, 4]  # => tuple with all scores at position row=3, col=4
sparsestack[3, :]  # => tuple with row, col, scores for all entries in row=3
sparsestack[:, 2]  # => tuple with row, col, scores for all entries in col=2
sparsestack[3, :, 0]  # => tuple with row, col, scores_1 for all entries in row=3
sparsestack[3, :, "scores_1"]  # => same as the one before
```

2) `.to_array()`
Creates and returns a dense numpy array of size `.shape`. To get a dense numpy array of a single layer only, use `.to_array(name="layerX")`.
Careful: converting to a dense array loses the sparse nature of the data, and all empty positions in the stack are filled with zeros.

3) `.to_coo(name="layerX")`
Returns a scipy sparse COO matrix of the specified layer.
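To make the COO format and the dense-conversion caveat concrete, the following sketch uses scipy directly rather than sparsestack. The values are made-up illustration data. It builds a small COO matrix, reads the `.row`, `.col` and `.data` attributes mentioned above, and converts it to a dense array, where every position without a stored value becomes zero.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Three non-zero values in a 4 x 5 matrix (made-up illustration data)
row = np.array([0, 1, 3])
col = np.array([2, 2, 4])
data = np.array([0.91, 0.97, 0.93])
layer = coo_matrix((data, (row, col)), shape=(4, 5))

# The COO attributes: one coordinate pair and one value per stored entry
print(layer.row, layer.col, layer.data)

# Dense conversion: all 20 positions are materialized, 17 of them as zeros
dense = layer.toarray()
print(dense.shape, np.count_nonzero(dense))
```

The sparse form stores 3 values plus their coordinates, while the dense form stores all 20 positions, so the gap grows quickly for large, mostly empty matrices.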

Owner

  • Name: matchms
  • Login: matchms
  • Kind: organization

Citation (CITATION.cff)

# YAML 1.2
---
abstract: "Memory efficient stack of multiple 2D sparse arrays."
authors:
  -
    affiliation: "Centre for Digitalisation and Digitality, University of Applied Sciences Düsseldorf"
    family-names: Huber
    given-names: Florian
    orcid: https://orcid.org/0000-0002-3535-9406

cff-version: 1.2.0
license: "MIT Licence"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/florian-huber/sparsestack"
title: sparsestack

GitHub Events

Total
  • Create event: 7
  • Release event: 7
  • Issues event: 6
  • Watch event: 1
  • Delete event: 2
  • Issue comment event: 5
  • Member event: 1
  • Push event: 15
  • Pull request review event: 1
  • Pull request event: 28
Last Year
  • Create event: 7
  • Release event: 7
  • Issues event: 6
  • Watch event: 1
  • Delete event: 2
  • Issue comment event: 5
  • Member event: 1
  • Push event: 15
  • Pull request review event: 1
  • Pull request event: 28

Dependencies

.github/workflows/CI_build.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/CI_publish_pypi.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
  • pypa/gh-action-pypi-publish master composite
pyproject.toml pypi
  • decorator ^5.1.1 develop
  • isort ^5.13.2 develop
  • poetry-bumpversion ^0.3.2 develop
  • prospector ^1.12.1 develop
  • pytest ^8.3.3 develop
  • pytest-cov ^6.0.0 develop
  • testfixtures ^8.3.0 develop
  • yapf ^0.40.2 develop
  • numba ^0.60.0
  • numpy >1.24
  • python >=3.10,<3.13
  • scipy ^1.14.1