py-dataset
Python package of dataset (https://github.com/caltechlibrary/dataset) for working with JSON objects as collections on disc
Science Score: 62.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
- ✓ Institutional organization owner: Organization caltechlibrary has institutional domain (www.library.caltech.edu)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: caltechlibrary
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://caltechlibrary.github.io/py_dataset
- Size: 408 MB
Statistics
- Stars: 2
- Watchers: 5
- Forks: 1
- Open Issues: 3
- Releases: 13
Metadata Files
README.md
py_dataset
py_dataset is a Python wrapper for the dataset command line tools. Starting with the dataset 2.2.x release, it replaces the deprecated libdataset, a C shared library.
This package wraps all dataset operations such as initialization of collections, creation, reading, updating and deleting JSON objects in the collection. Some of its enhanced features include the ability to generate data frames as well as the ability to import and export JSON objects to and from CSV files.
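To make the CSV round trip concrete, here is a minimal standard-library sketch of the idea. It does not call py_dataset; the helper names `objects_to_csv` and `csv_to_objects` and the sample records are assumptions made for illustration only. See the package docs for the actual import and export functions.

```python
import csv
import io


def objects_to_csv(objects, columns):
    """Render a dict of key -> JSON object as CSV text, one row per object."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["key"] + columns)
    for key in sorted(objects):
        obj = objects[key]
        # Missing attributes become empty cells.
        writer.writerow([key] + [obj.get(col, "") for col in columns])
    return buf.getvalue()


def csv_to_objects(text):
    """Read CSV text back into a dict of key -> object (all values are strings)."""
    objects = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = row.pop("key")
        objects[key] = dict(row)
    return objects


# Hypothetical sample records, standing in for objects read from a collection.
records = {
    "one": {"title": "First", "year": 2020},
    "two": {"title": "Second", "year": 2021},
}
text = objects_to_csv(records, ["title", "year"])
print(text)
print(csv_to_objects(text))
```

Note that CSV carries no type information, so numbers come back as strings unless the importer converts them.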
py_dataset is released under a BSD-style license.
Features
dataset supports
- Basic storage actions (create, read, update and delete)
- Listing of collection keys (including filtering and sorting)
- Import/export of CSV files
- The ability to reshape data by performing simple object joins
- The ability to create data frames from collections based on key lists and dot paths into the stored JSON objects
See docs for details.
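As a rough illustration of what "dot paths into the JSON objects" means, the following plain-Python sketch resolves paths such as `.author.family` and builds a frame-like list of rows from a key list. It is not the py_dataset frame API; `get_path`, `build_frame`, and the sample objects are hypothetical names introduced for this example.

```python
def get_path(obj, dot_path):
    """Follow a dot path such as '.author.family' into nested dicts and lists.

    Returns None when any step of the path is missing.
    """
    current = obj
    for part in dot_path.strip(".").split("."):
        if part == "":
            continue
        if isinstance(current, dict):
            if part not in current:
                return None
            current = current[part]
        elif isinstance(current, list):
            try:
                current = current[int(part)]
            except (ValueError, IndexError):
                return None
        else:
            return None
    return current


def build_frame(objects, keys, dot_paths, labels):
    """Build one row per key, with one labeled column per dot path."""
    rows = []
    for key in keys:
        row = {}
        for path, label in zip(dot_paths, labels):
            row[label] = get_path(objects[key], path)
        rows.append(row)
    return rows


# Hypothetical objects, standing in for records stored in a collection.
objects = {
    "k1": {"title": "A", "author": {"family": "Smith"}},
    "k2": {"title": "B"},
}
frame = build_frame(objects, ["k1", "k2"], [".title", ".author.family"], ["title", "family"])
print(frame)
```

Each row holds only the selected attributes, which is what makes a frame convenient to pass on to tabular tools.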
Limitations of dataset
dataset has many limitations; some are listed below:
- it is not a multi-process, multi-user data store (it's files on "disc" without locking)
- it is not a replacement for a repository management system
- it is not a general purpose database system
- it does not supply version control on collections or objects
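Because a collection is plain files without locking, callers that share one collection between processes have to coordinate writes themselves. One possible approach, sketched here with the standard library only, is an advisory lock file created with `O_EXCL`. The `collection_lock` helper and the `.lock` suffix are assumptions for this example and are not part of dataset or py_dataset. A crashed process would leave a stale lock file behind, so treat this as a starting point rather than a complete solution.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def collection_lock(c_name, timeout=5.0, poll=0.05):
    """Hold an advisory lock file next to the collection while the body runs."""
    lock_path = c_name + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {c_name}")
            time.sleep(poll)
    try:
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)


# Demonstration in a temporary directory; no real collection is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_name = os.path.join(tmp, "demo.ds")
    with collection_lock(demo_name) as lock_path:
        held = os.path.exists(lock_path)
        # Writes to the collection would go here.
    released = not os.path.exists(lock_path)
print(held, released)
```

The lock is only advisory: it protects the collection only if every writer agrees to take it first.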
Install
Available via pip (`pip install py_dataset`) or by downloading this repo and
running `python setup.py install`. This repo includes dataset shared C libraries
compiled for Windows, Mac, and Linux, and the appropriate library will be used
automatically.
Quick Tutorial
This module provides the functionality of the dataset command line tool as a Python 3.10 module. Once installed, try out the following commands to see if everything is in order (or to get familiar with dataset).
The "#" comments don't have to be typed in; they are there to explain the commands as you type them. Start the tour by launching Python 3 in interactive mode.
shell
python3
Then run the following Python commands.
```python
from py_dataset import dataset

# Almost all the commands require the collection name as the first parameter,
# so we store that name in c_name for convenience.
c_name = "a_tour_of_dataset.ds"
# Let's create our dataset collection. We use the method called
# 'init'; it returns True on success or False otherwise.
dataset.init(c_name)
# Let's check whether our collection exists: True if it exists,
# False if it doesn't.
dataset.status(c_name)
# Let's count the records in our collection (should be zero)
cnt = dataset.count(c_name)
print(cnt)
# Let's read all the keys in the collection (should be an empty list)
keys = dataset.keys(c_name)
print(keys)
# Now let's add a record to our collection. To create a record we need to know
# the collection name (e.g. c_name), the key (must be a string) and have a
# record (i.e. a dict literal or variable)
key = "one"
record = {"one": 1}
# If create returns False, we can check the last error message
# with the 'error_message' method
if not dataset.create(c_name, key, record):
    print(dataset.error_message())
# Let's count and list the keys in our collection, we should see a count of '1' and a key of 'one'
dataset.count(c_name)
keys = dataset.keys(c_name)
print(keys)
# We can read the record we stored using the 'read' method.
new_record, err = dataset.read(c_name, key)
if err != '':
    print(err)
else:
    print(new_record)
# Let's modify new_record and update the record in our collection
new_record["two"] = 2
if not dataset.update(c_name, key, new_record):
    print(dataset.error_message())
# Let's print out the record we stored using the read method.
# read returns a tuple, so we print its first element.
print(dataset.read(c_name, key)[0])
# Now let's query the collection.
sql_stmt = f'''select src from {c_name} order by created desc'''
print(dataset.query(c_name, sql_stmt))
# Finally we can remove (delete) a record from our collection
if not dataset.delete(c_name, key):
    print(dataset.error_message())
# We should now have a count of zero records
cnt = dataset.count(c_name)
print(cnt)
```
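The create and update steps in the tour can be combined into a small "create or update" helper. The sketch below relies only on the calls shown above (`keys`, `create`, `update`, `error_message`) and on their return values as the tour describes them. The `upsert` function and the `FakeDataset` stand-in are hypothetical names added so the example runs without a collection on disc; with the real package you would pass the imported `dataset` module as `ds`.

```python
def upsert(ds, c_name, key, record):
    """Create the record if the key is new, otherwise update it.

    `ds` is expected to behave like the `dataset` module used in the tour.
    Raises RuntimeError with the last error message if the call fails.
    """
    if key in ds.keys(c_name):
        ok = ds.update(c_name, key, record)
    else:
        ok = ds.create(c_name, key, record)
    if not ok:
        raise RuntimeError(ds.error_message())
    return True


class FakeDataset:
    """In-memory stand-in for demonstration only; it is not part of py_dataset."""

    def __init__(self):
        self._data = {}
        self._err = ""

    def keys(self, c_name):
        return list(self._data)

    def create(self, c_name, key, record):
        if key in self._data:
            self._err = f"{key} already exists"
            return False
        self._data[key] = dict(record)
        return True

    def update(self, c_name, key, record):
        if key not in self._data:
            self._err = f"{key} not found"
            return False
        self._data[key] = dict(record)
        return True

    def error_message(self):
        return self._err


ds = FakeDataset()
upsert(ds, "demo.ds", "one", {"one": 1})            # first call creates
upsert(ds, "demo.ds", "one", {"one": 1, "two": 2})  # second call updates
print(ds.keys("demo.ds"))
```

Listing all keys on every call is simple but slow for large collections; it is used here only because `keys` is one of the calls the tour demonstrates.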
Owner
- Name: Caltech Library
- Login: caltechlibrary
- Kind: organization
- Email: helpdesk@library.caltech.edu
- Location: Pasadena, CA 91125
- Website: https://www.library.caltech.edu/
- Repositories: 84
- Profile: https://github.com/caltechlibrary
We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
type: software
title: py_dataset
abstract: "A command line tool for working with JSON documents on local disc"
authors:
  - family-names: Doiel
    given-names: Robert
    orcid: https://orcid.org/0000-0003-0900-6903
    email: rsdoiel@caltech.edu
  - family-names: Morrell
    given-names: Thomas E
    orcid: https://orcid.org/0000-0001-9266-5146
    email: tmorrell@caltech.edu
contacts:
  - family-names: Doiel
    given-names: R. S.
    orcid: https://orcid.org/0000-0003-0900-6903
    email: rsdoiel@caltech.edu
  - family-names: Morrell
    given-names: Thomas E
    orcid: https://orcid.org/0000-0001-9266-5146
    email: tmorrell@caltech.edu
repository-code: "https://github.com/caltechlibrary/py_dataset"
version: 2.2.3.1
date-released: 2025-04-17
license-url: "https://github.com/caltechlibrary/py_dataset/blob/main/LICENSE"
keywords:
  - GitHub
  - metadata
  - data
  - software
  - json
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"type": "SoftwareSourceCode",
"codeRepository": "https://github.com/caltechlibrary/py_dataset",
"author": [
{
"id": "https://orcid.org/0000-0003-0900-6903",
"type": "Person",
"givenName": "Robert",
"familyName": "Doiel",
"affiliation": {
"@type": "Organization",
"name": "Caltech Library"
},
"email": "rsdoiel@caltech.edu"
},
{
"id": "https://orcid.org/0000-0001-9266-5146",
"type": "Person",
"givenName": "Thomas E",
"familyName": "Morrell",
"affiliation": {
"@type": "Organization",
"name": "Caltech Library"
},
"email": "tmorrell@caltech.edu"
}
],
"maintainer": [
{
"id": "https://orcid.org/0000-0003-0900-6903",
"type": "Person",
"givenName": "R. S.",
"familyName": "Doiel",
"affiliation": {
"@type": "Organization",
"name": "Caltech Library"
},
"email": "rsdoiel@caltech.edu"
},
{
"id": "https://orcid.org/0000-0001-9266-5146",
"type": "Person",
"givenName": "Thomas E",
"familyName": "Morrell",
"affiliation": {
"@type": "Organization",
"name": "Caltech Library"
},
"email": "tmorrell@caltech.edu"
}
],
"dateCreated": "2017-06-18",
"dateModified": "2025-04-17",
"datePublished": "2025-04-17",
"description": "A command line tool for working with JSON documents on local disc",
"funder": [
"Caltech Library"
],
"keywords": [
"GitHub",
"metadata",
"data",
"software",
"json"
],
"name": "py_dataset",
"license": "https://github.com/caltechlibrary/py_dataset/blob/main/LICENSE",
"programmingLanguage": [
"Python3"
],
"softwareRequirements": [
"dataset >= 2.2.3"
],
"version": "2.2.3.1",
"developmentStatus": "active",
"issueTracker": "https://github.com/caltechlibrary/py_dataset/issues",
"downloadUrl": "https://github.com/caltechlibrary/py_dataset/releases",
"releaseNotes": "This patch adds missing dsquery support.",
"copyrightYear": 2025,
"copyrightHolder": "California Institute of Technology"
}
GitHub Events
Total
- Release event: 3
- Watch event: 1
- Delete event: 1
- Push event: 16
- Create event: 4
Last Year
- Release event: 3
- Watch event: 1
- Delete event: 1
- Push event: 16
- Create event: 4
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 93
- Total Committers: 3
- Avg Commits per committer: 31.0
- Development Distribution Score (DDS): 0.226
Top Committers
| Name | Email | Commits |
|---|---|---|
| R. S. Doiel | r****l@g****m | 72 |
| Tom Morrell | t****l@c****u | 20 |
| tmorrell | t****l@u****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 14
- Total pull requests: 2
- Average time to close issues: 26 days
- Average time to close pull requests: 8 days
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.5
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- rsdoiel (9)
- tmorrell (4)
Pull Request Authors
- rsdoiel (2)
Packages
- Total packages: 1
- Total downloads:
  - pypi 458 last-month
- Total docker downloads: 81
- Total dependent packages: 1
- Total dependent repositories: 1
- Total versions: 7
- Total maintainers: 1
pypi.org: py-dataset
A command line tool for working with JSON documents on local disc
- Homepage: https://github.com/caltechlibrary/py_dataset
- Documentation: https://py-dataset.readthedocs.io/
- License: https://data.caltech.edu/license
- Latest release: 1.0.1, published over 4 years ago
Dependencies
- EndBug/add-and-commit v7 composite
- actions/checkout v2 composite
- caltechlibrary/codemeta2cff main composite