cottoncandy

cottoncandy: scientific python package for easy cloud storage - Published in JOSS (2018)

https://github.com/gallantlab/cottoncandy

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    5 of 21 committers (23.8%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

cloud-storage google-drive numpy-arrays python s3

Scientific Fields

Mathematics Computer Science - 84% confidence
Engineering Computer Science - 60% confidence
Last synced: 4 months ago

Repository

sugar for s3

Basic Info
Statistics
  • Stars: 35
  • Watchers: 14
  • Forks: 17
  • Open Issues: 23
  • Releases: 7
Created over 9 years ago · Last pushed 4 months ago
Metadata Files
Readme License

README.md

Welcome to cottoncandy!

sugar for s3

https://gallantlab.github.io/cottoncandy

What is cottoncandy?

A Python scientific library for storing and accessing numpy array data on S3. Arrays are uploaded directly from memory and downloaded directly into memory, so you don't have to download an array to disk and then load it from disk into your Python session.
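The in-memory round trip described above can be illustrated without any cloud service. The following is a minimal sketch, not cottoncandy's own implementation: it serializes an array to a bytes buffer with NumPy's `np.save` and reads it back with `np.load`, never touching the disk. The helper names `array_to_bytes` and `bytes_to_array` are hypothetical; the bytes stand in for what would be sent to or received from S3.

```python
import io

import numpy as np


def array_to_bytes(arr):
    """Serialize an array to bytes in memory (no temporary file)."""
    buffer = io.BytesIO()
    np.save(buffer, arr, allow_pickle=False)
    return buffer.getvalue()


def bytes_to_array(payload):
    """Rebuild an array from bytes held in memory."""
    return np.load(io.BytesIO(payload), allow_pickle=False)


arr = np.random.randn(100)
payload = array_to_bytes(arr)       # these bytes could be the body of an upload
restored = bytes_to_array(payload)  # and this is what a download would decode
assert np.array_equal(arr, restored)
```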

This library relies heavily on boto3.

Try it out!

Jupyter Notebook examples using cottoncandy to

Directly from the repo:

Clone the repo from GitHub and do the usual python install from the command line

$ git clone https://github.com/gallantlab/cottoncandy.git
$ cd cottoncandy
$ sudo python setup.py install

With pip:

$ pip install cottoncandy

Configuration file

Upon first use, cottoncandy will create a configuration file. This configuration file allows you to enter your S3 and Google Drive credentials and set many other options. See the default configuration file.

The configuration file is created the first time you import cottoncandy and it is stored under:

  • Linux: ~/.config/cottoncandy/options.cfg
  • macOS: ~/Library/Application Support/cottoncandy/options.cfg
  • Windows (not supported): C:\Users\<username>\AppData\Local\<AppAuthor>\cottoncandy\options.cfg
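The per-platform locations above can be expressed as a small lookup. This is a sketch only: `default_config_path` is a hypothetical helper, not part of cottoncandy, and it covers only the two supported platforms listed above.

```python
import os
import sys


def default_config_path(platform=None):
    """Return the options.cfg location listed above for the given platform.

    `platform` follows the convention of `sys.platform` ('linux', 'darwin', ...).
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith('linux'):
        base = os.path.join('~', '.config')
    elif platform == 'darwin':  # macOS
        base = os.path.join('~', 'Library', 'Application Support')
    else:
        # Windows is listed as not supported, so no path is guessed here.
        raise NotImplementedError('no known location for %r' % platform)
    return os.path.expanduser(os.path.join(base, 'cottoncandy', 'options.cfg'))


print(default_config_path('linux'))
```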

By default, cottoncandy sets object and bucket permissions to authenticated-read. If you wish to keep all your objects private, modify your configuration file and set default_acl = private. See AWS ACL overview for more information on S3 permissions.
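Editing the option by hand works, but the change can also be scripted. The sketch below assumes the configuration file is INI-style and readable by Python's standard `configparser` (an assumption based on its `key = value` syntax). `set_option` is a hypothetical helper, and the `[basic]` section name in the demo file is made up for illustration, so check your own options.cfg for the real layout.

```python
import configparser
import os
import tempfile


def set_option(cfg_path, option, value):
    """Set `option` to `value` in every section that already defines it.

    Returns the names of the sections that were changed.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    changed = []
    for section in parser.sections():
        if parser.has_option(section, option):
            parser.set(section, option, value)
            changed.append(section)
    with open(cfg_path, 'w') as handle:
        parser.write(handle)
    return changed


# Demo on a throwaway file; the section name is hypothetical.
demo_path = os.path.join(tempfile.mkdtemp(), 'options.cfg')
with open(demo_path, 'w') as handle:
    handle.write('[basic]\ndefault_acl = authenticated-read\n')

print(set_option(demo_path, 'default_acl', 'private'))
```

Note that `configparser` drops comments when it rewrites a file, so keep a backup of the original before running something like this on a real configuration.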

Advanced (for admins): One can customize the cottoncandy system install by cloning the repo and modifying defaults.cfg. For example, one can set the default encryption key across the system for all users (key = SoMeEncypTionKey). When a user first uses cottoncandy, this default value will be copied to their personal configuration file. Note, however, that the user can still overwrite that value.

Getting started

Set up the connection (the endpoint, access key, and secret key can be specified in the configuration file instead):

```python
import cottoncandy as cc

# The key values below are placeholders
cci = cc.get_interface('mybucket',
                       ACCESS_KEY='FAKEACCESSKEYTEXT',
                       SECRET_KEY='FAKESECRETKEYTEXT',
                       endpoint_url='https://s3.amazonaws.com')
```

Storing numpy arrays

```python
import numpy as np

arr = np.random.randn(100)
s3_response = cci.upload_raw_array('myarray', arr)
arr_down = cci.download_raw_array('myarray')
assert np.allclose(arr, arr_down)
```

Storing dask arrays

```python
import numpy as np

arr = np.random.randn(100, 600, 1000)
s3_response = cci.upload_dask_array('testdim', arr, axis=-1)

dask_object = cci.download_dask_array('testdim')
dask_object  # displays as a dask.array
dask_slice = dask_object[..., :200]
dask_slice   # still a dask.array

downloaded_data = np.asarray(dask_slice)  # this downloads the array
downloaded_data.shape  # (100, 600, 200)
```
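To see why slicing the dask array before converting it avoids downloading everything, consider an array stored as separate chunks along its last axis. The sketch below is a conceptual illustration in plain NumPy, not cottoncandy's or dask's code; the chunk size of 100 and the helper `chunks_needed` are assumptions made for the example.

```python
import numpy as np


def chunks_needed(stop, chunk_size, n_total):
    """Indices of the chunks that overlap the slice [0:stop] along one axis."""
    stop = min(stop, n_total)
    n_chunks = -(-stop // chunk_size)  # ceiling division
    return list(range(n_chunks))


# A smaller array than in the example above, to keep the demo light.
arr = np.random.randn(10, 60, 1000)
chunk_size = 100  # assumed chunk length along the last axis

# Each chunk stands in for one separately stored object.
chunks = np.split(arr, arr.shape[-1] // chunk_size, axis=-1)

# Only the first two chunks are required to build arr[..., :200].
wanted = chunks_needed(200, chunk_size, arr.shape[-1])
partial = np.concatenate([chunks[i] for i in wanted], axis=-1)[..., :200]

assert partial.shape == (10, 60, 200)
assert np.array_equal(partial, arr[..., :200])
```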

Command-line search

```python
cci.glob('/path/to/*/file01*.grp/image_data')
# ['/path/to/my/file01a.grp/image_data',
#  '/path/to/my/file01b.grp/image_data',
#  '/path/to/your/file01a.grp/image_data',
#  '/path/to/your/file01b.grp/image_data']

cci.glob('/path/to/my/file02*.grp/*')
# ['/path/to/my/file02a.grp/image_data',
#  '/path/to/my/file02a.grp/text_data',
#  '/path/to/my/file02b.grp/image_data',
#  '/path/to/my/file02b.grp/text_data']
```
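The wildcard search can be mimicked locally to get a feel for which object names a pattern selects. This sketch uses Python's standard `fnmatch` module on a hard-coded list of names; it is an illustration only, and cottoncandy's own matching rules may differ (for example, `fnmatch` lets `*` match across `/`). The helper `local_glob` is hypothetical.

```python
import fnmatch

# Hypothetical object names, modelled on the listing above.
object_names = [
    '/path/to/my/file01a.grp/image_data',
    '/path/to/my/file01b.grp/image_data',
    '/path/to/my/file02a.grp/image_data',
    '/path/to/my/file02a.grp/text_data',
    '/path/to/your/file01a.grp/image_data',
    '/path/to/your/file01b.grp/image_data',
]


def local_glob(names, pattern):
    """Return the names matching a shell-style wildcard pattern, sorted."""
    return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))


matches = local_glob(object_names, '/path/to/*/file01*.grp/image_data')
for name in matches:
    print(name)
```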

File system-like object browsing

```python
import cottoncandy as cc

# The key values below are placeholders
browser = cc.get_browser('mybucketname',
                         ACCESS_KEY='FAKEACCESSKEYTEXT',
                         SECRET_KEY='FAKESECRETKEYTEXT',
                         endpoint_url='https://s3.amazonaws.com')

# In an interactive session, completing `browser.sweet_project.sub` offers:
#   browser.sweet_project.sub01_awesome_analysis_DOT_grp
#   browser.sweet_project.sub02_awesome_analysis_DOT_grp

browser.sweet_project.sub01_awesome_analysis_DOT_grp
# ... (sub01_awesome_analysis.grp: 3 keys)>

browser.sweet_project.sub01_awesome_analysis_DOT_grp.result_model01
```
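Attribute-style browsing of this kind can be built on Python's `__getattr__` and `__dir__` hooks. The sketch below is a toy model over a nested dictionary, not cottoncandy's browser: the class `ToyBrowser` and its contents are hypothetical, and the only detail borrowed from the example above is that a `.` in a name is replaced by a `DOT` marker (written `_DOT_` here) so that the name is a valid Python attribute.

```python
class ToyBrowser:
    """Expose a nested dict of names as chained attributes."""

    def __init__(self, tree):
        self._tree = tree

    @staticmethod
    def _to_attr(name):
        # Dots are not allowed in attribute names, so spell them out.
        return name.replace('.', '_DOT_')

    def __dir__(self):
        # Interactive shells use this list to offer tab completions.
        return [self._to_attr(name) for name in self._tree]

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails.
        if attr.startswith('_'):
            raise AttributeError(attr)
        for name, child in self._tree.items():
            if self._to_attr(name) == attr:
                return ToyBrowser(child) if isinstance(child, dict) else child
        raise AttributeError(attr)


tree = {'sweet_project': {'sub01_awesome_analysis.grp': {'result_model01': 'array data'}}}
browser = ToyBrowser(tree)
print(dir(browser.sweet_project))
print(browser.sweet_project.sub01_awesome_analysis_DOT_grp.result_model01)
```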

Connection settings (S3 only)

cottoncandy allows users to modify connection settings via botocore. For example, the user can define the connection timeout for downloads and the number of times to retry dropped S3 requests.

    from botocore.client import Config

    config = Config(connect_timeout=60, read_timeout=60, retries=dict(max_attempts=10))
    cci = cc.get_interface('my_bucket_name', config=config)

Google Drive backend

cottoncandy can also use Google Drive as a back-end. This requires a client_secrets.json file in your ~/.config/cottoncandy folder and the pydrive package.

See the Google Drive setup instructions for more details.

```python
import cottoncandy as cc

cci = cc.get_interface(backend='gdrive')
```

Contributing

  • If you find any issues with cottoncandy, please report them by submitting an issue on GitHub.
  • If you wish to contribute, please submit a pull request. Include information as to how you ran the tests and the full output log if possible. Running tests on AWS can incur costs.

Cite as

Nunez-Elizalde AO, Gao JS, Zhang T, Gallant JL (2018). cottoncandy: scientific python package for easy cloud storage. Journal of Open Source Software, 3(28), 890, https://doi.org/10.21105/joss.00890

Owner

  • Name: gallantlab
  • Login: gallantlab
  • Kind: organization

JOSS Publication

cottoncandy: scientific python package for easy cloud storage
Published
August 24, 2018
Volume 3, Issue 28, Page 890
Authors
Anwar O. Nunez-Elizalde
Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
James S. Gao
Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
Tianjiao Zhang
Program in Bioengineering, UCSF and UC Berkeley, CA, USA
Jack L. Gallant
Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA; Program in Bioengineering, UCSF and UC Berkeley, CA, USA; Department of Psychology, University of California, Berkeley, CA, USA
Editor
Thomas J. Leeper
Tags
S3 cloud storage

GitHub Events

Total
  • Issues event: 3
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 6
  • Push event: 5
  • Pull request event: 11
  • Create event: 4
Last Year
  • Issues event: 3
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 6
  • Push event: 5
  • Pull request event: 11
  • Create event: 4

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 338
  • Total Committers: 21
  • Avg Commits per committer: 16.095
  • Development Distribution Score (DDS): 0.473
Past Year
  • Commits: 4
  • Committers: 3
  • Avg Commits per committer: 1.333
  • Development Distribution Score (DDS): 0.5
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Anwar Nunez-Elizalde | a****z@g****m | 178 |
| Tianjiao Zhang | t****g@b****u | 54 |
| Tom Dupré la Tour | t****r@m****g | 32 |
| MarkLescroart | m****t@b****u | 12 |
| Storm Slivkoff | s****f@g****m | 7 |
| fatma | f****u@g****m | 7 |
| carson | c****n@c****b | 6 |
| cchen23 | c****7@g****m | 6 |
| Sara Popham | s****m@b****u | 5 |
| robert | g****o@g****m | 5 |
| Alex Huth | a****h@b****u | 5 |
| Carson McNeil | c****l@g****m | 4 |
| Jen Holmberg | 8****g | 3 |
| arokem | a****m@g****m | 3 |
| Matteo Visconti di Oleggio Castello | m****c@b****u | 3 |
| carson | c****n@n****b | 3 |
| Aditya Vaidya | 6****8 | 1 |
| Michael Oliver | m****r@g****m | 1 |
| Thomas J. Leeper | t****r@g****m | 1 |
| Ubuntu | u****u@i****l | 1 |
| moflo | g****b@m****e | 1 |

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 47
  • Total pull requests: 60
  • Average time to close issues: 5 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 15
  • Total pull request authors: 18
  • Average comments per issue: 1.57
  • Average comments per pull request: 1.15
  • Merged pull requests: 39
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 2
  • Pull requests: 17
  • Average time to close issues: 5 months
  • Average time to close pull requests: 3 months
  • Issue authors: 2
  • Pull request authors: 4
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.88
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • anwarnunez (27)
  • alexhuth (7)
  • chrisgorgo (1)
  • majure (1)
  • jamesgao (1)
  • cmcneil (1)
  • spopham (1)
  • vsoch (1)
  • kroq-gar78 (1)
  • mvdoc (1)
  • jenholmberg (1)
  • r-b-g-b (1)
  • the-moliver (1)
  • DeepakSahoo-Reflektion (1)
  • TomDLT (1)
Pull Request Authors
  • kroq-gar78 (8)
  • mvdoc (8)
  • anwarnunez (7)
  • fatmai (5)
  • marklescroart (5)
  • cmcneil (4)
  • arokem (3)
  • r-b-g-b (3)
  • eickenberg (3)
  • spopham (2)
  • jenholmberg (2)
  • dependabot[bot] (2)
  • TomDLT (2)
  • cchen23 (2)
  • sslivkoff (1)
Top Labels
Issue Labels
  • enhancement (9)
  • bug (5)
  • help wanted (2)
  • question (2)
Pull Request Labels
  • dependencies (2)
  • github_actions (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 60 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 7
  • Total maintainers: 4
pypi.org: cottoncandy

sugar for S3

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 60 Last month
Rankings
Forks count: 9.1%
Dependent packages count: 10.1%
Stargazers count: 11.0%
Average: 16.1%
Dependent repos count: 21.6%
Downloads: 28.5%
Maintainers (4)
Last synced: 4 months ago

Dependencies

requirements.txt pypi
  • boto3 >=1.2.3
  • cloudpickle >=0.2.2
  • dask *
  • numcodecs >=0.5.5
  • numpy >=1.6.0
  • pycrypto >=2.6.1
  • pydrive >=1.3.1
  • python-dateutil >=2.7.3
  • scipy >=0.9.0
  • six >=1.11.0
  • toolz >=0.7.4
  • wheel >=0.31.1
setup.py pypi
  • PyDrive *
  • boto3 *
  • botocore *
  • pycrypto *
  • python-dateutil *
  • six *
.github/workflows/run_tests.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite