https://github.com/mverleg/pyjson_tricks
Extra features for Python's JSON: comments, order, numpy, pandas, datetimes, and many more! Simple but customizable.
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails (3 of 12 committers, 25.0%, from academic institutions)
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 13.0%, to scientific vocabulary)
Keywords
Keywords from Contributors
Repository
Extra features for Python's JSON: comments, order, numpy, pandas, datetimes, and many more! Simple but customizable.
Basic Info
Statistics
- Stars: 160
- Watchers: 10
- Forks: 24
- Open Issues: 9
- Releases: 0
Topics
Metadata Files
README.md
JSON tricks (python)
The pyjson-tricks package brings several pieces of functionality to Python's handling of json files:
- Store and load numpy arrays in human-readable format.
- Store and load class instances, both generic and customized.
- Store and load date/times as a dictionary (including timezone).
- Preserve map order {} using OrderedDict.
- Allow for comments in json files by starting lines with #.
- Sets, complex numbers, Decimal, Fraction, enums, compression, duplicate keys, pathlib Paths, bytes...
As well as compression and disallowing duplicate keys.
- Code: https://github.com/mverleg/pyjson_tricks
- Documentation: http://json-tricks.readthedocs.org/en/latest/
- PIP: https://pypi.python.org/pypi/json_tricks
Several keys of the format __keyname__ have special meanings, and more
might be added in future releases.
If you're considering JSON-but-with-comments as a config file format, have a look at HJSON, it might be more appropriate. For other purposes, keep reading!
Thanks for all the Github stars⭐!
Installation and use
You can install using:
``` bash
pip install json-tricks
```
Decoding of some data types needs the corresponding package to be
installed, e.g. numpy for arrays, pandas for dataframes and pytz
for timezone-aware datetimes.
You can import the usual json functions dump(s) and load(s), as well as a separate comment removal function, as follows:
``` python
from json_tricks import dump, dumps, load, loads, strip_comments
```
The exact signatures of these and other functions are in the documentation.
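For example, a minimal round trip through a file could look like this (the filename and data are just illustrative):

``` python
from json_tricks import dump, load

data = {'values': [1, 2, 3], 'label': 'example'}

# write to a file, then read it back
with open('data.json', 'w') as fp:
    dump(data, fp, indent=4)

with open('data.json', 'r') as fp:
    restored = load(fp)

assert restored == data
```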
Quite a few older versions of Python are supported; for an up-to-date list, see the automated tests.
Features
Numpy arrays
When not compressed, the array is encoded in a sort-of-readable, flexible and portable format, like so:
``` python
arr = arange(0, 10, 1, dtype=uint8).reshape((2, 5))
print(dumps({'mydata': arr}))
```
this yields:
``` javascript
{
    "mydata": {
        "dtype": "uint8",
        "shape": [2, 5],
        "Corder": true,
        "__ndarray__": [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
    }
}
```
which will be converted back to a numpy array when using
json_tricks.loads. Note that the memory order (Corder) is only
stored in v3.1 and later and for arrays with at least 2 dimensions.
As you see, this uses the magic key __ndarray__. Don't use
__ndarray__ as a dictionary key unless you're trying to make a numpy
array (and know what you're doing).
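A round trip, assuming numpy is installed, might look like this:

``` python
import numpy as np
from json_tricks import dumps, loads

arr = np.arange(0, 10, 1, dtype=np.uint8).reshape((2, 5))

encoded = dumps({'mydata': arr})   # produces the __ndarray__ structure shown above
decoded = loads(encoded)           # reconstructs the numpy array

assert isinstance(decoded['mydata'], np.ndarray)
assert np.array_equal(decoded['mydata'], arr)
assert decoded['mydata'].dtype == arr.dtype
```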
Numpy scalars are also serialized (v3.5+). They are represented by the closest python primitive type. A special representation was not feasible, because Python's json implementation serializes some numpy types as primitives, without consulting custom encoders. If you want to preserve the exact numpy type, use encode_scalars_inplace.
There is also a compressed format (thanks claydugo for fix). From
the next major release, this will be default when using compression.
For now, you can use it as:
``` python
dumps(data, compression=True, properties={'ndarray_compact': True})
```
This compressed format encodes the array data in base64, with gzip
compression for the array, unless 1) compression has little effect for
that array, or 2) the whole file is already compressed. If you only want
compact format for large arrays, pass the number of elements to
ndarray_compact.
Example:
``` python
data = [linspace(0, 10, 9), array([pi, exp(1)])]
dumps(data, compression=False, properties={'ndarray_compact': 8})
```
``` javascript
[{
   "__ndarray__": "b64.gz:H4sIAAAAAAAC/2NgQAZf7CE0iwOE5oPSIlBaEkrLQegGRShfxQEAz7QFikgAAAA=",
   "dtype": "float64",
   "shape": [9]
 }, {
   "__ndarray__": [3.141592653589793, 2.718281828459045],
   "dtype": "float64",
   "shape": [2]
 }]
```
Class instances
json_tricks can serialize class instances.
If the class behaves normally (not dynamically generated, no __new__ or
__metaclass__ magic, etc.) and all its attributes are serializable,
then this should work by default.
``` python
# json_tricks/test_class.py
class MyTestCls:
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

cls_instance = MyTestCls(s='ub', dct={'7': 7})
json = dumps(cls_instance, indent=4)
cls_instance_again = loads(json)
```
You'll get your instance back. Here the json looks like this:
``` javascript
{
    "__instance_type__": [
        "json_tricks.test_class",
        "MyTestCls"
    ],
    "attributes": {
        "s": "ub",
        "dct": {
            "7": 7
        }
    }
}
```
As you can see, this stores the module and class name. The class must be
importable from the same module when decoding (and should not have
changed). If it isn't, you have to manually provide a dictionary to
cls_lookup_map when loading in which the class name can be looked up.
Note that if the class is imported, then globals() is such a
dictionary (so try loads(json, cls_lookup_map=globals())). Also note
that if the class is defined in the 'top' script (that you're calling
directly), then this isn't a module and the import part cannot be
extracted. Only the class name will be stored; it can then only be
deserialized in the same script, or if you provide cls_lookup_map.
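Continuing the MyTestCls example above, providing the lookup explicitly might look like this (the map is simply class name to class):

``` python
from json_tricks import loads

# if the stored module path cannot be imported, map the class name to the class yourself
cls_instance_again = loads(json, cls_lookup_map={'MyTestCls': MyTestCls})

# or, if the class is defined or imported in the current script:
cls_instance_again = loads(json, cls_lookup_map=globals())
```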
Note that this also works with __slots__ without having to do anything
(thanks to koffie and dominicdoty), which encodes like this (custom
indentation):
``` javascript
{
    "__instance_type__": ["module.path", "ClassName"],
    "slots": {"slotattr": 37},
    "attributes": {"dictattr": 42}
}
```
If the instance doesn't serialize automatically, or if you want custom
behaviour, then you can implement __json_encode__(self) and
__json_decode__(self, **attributes) methods, like so:
``` python
class CustomEncodeCls:
    def __init__(self):
        self.relevant = 42
        self.irrelevant = 37

    def __json_encode__(self):
        # should return primitive, serializable types like dict, list, int, string, float...
        return {'relevant': self.relevant}

    def __json_decode__(self, **attrs):
        # should initialize all properties; note that __init__ is not called implicitly
        self.relevant = attrs['relevant']
        self.irrelevant = 12
```
As you've seen, this uses the magic key __instance_type__. Don't use
__instance_type__ as a dictionary key unless you know what you're
doing.
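Continuing the CustomEncodeCls example, a round trip might behave like this (if the class isn't importable at decode time, add cls_lookup_map=globals() as described above):

``` python
from json_tricks import dumps, loads

obj = CustomEncodeCls()
obj.irrelevant = 99                # modified, but not stored by __json_encode__

json_str = dumps(obj)              # calls __json_encode__
restored = loads(json_str)         # calls __json_decode__

assert restored.relevant == 42
assert restored.irrelevant == 12   # set by __json_decode__, not restored from json
```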
Date, time, datetime and timedelta
Date, time, datetime and timedelta objects are stored as dictionaries of "day", "hour", "millisecond" etc keys, for each nonzero property.
The timezone name is also stored if it is set, as is DST (thanks eumir).
You'll need to have pytz installed to use timezone-aware date/times,
it's not needed for naive date/times.
``` javascript
{
    "__datetime__": null,
    "year": 1988,
    "month": 3,
    "day": 15,
    "hour": 8,
    "minute": 3,
    "second": 59,
    "microsecond": 7,
    "tzinfo": "Europe/Amsterdam"
}
```
This approach was chosen over timestamps for readability and consistency
between date and time, and over a single string to prevent parsing
problems and reduce dependencies. Note that if primitives=True,
date/times are encoded as ISO 8601, but they won't be restored
automatically.
Don't use __date__, __time__, __datetime__, __timedelta__ or
__tzinfo__ as dictionary keys unless you know what you're doing, as
they have special meaning.
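A small round-trip sketch, assuming pytz is installed for the timezone-aware case:

``` python
from datetime import datetime, timedelta
import pytz
from json_tricks import dumps, loads

dt = pytz.timezone('Europe/Amsterdam').localize(
    datetime(1988, 3, 15, 8, 3, 59, 7))

encoded = dumps(dt)        # produces the __datetime__ dictionary shown above
restored = loads(encoded)  # a timezone-aware datetime again
print(restored)

# timedeltas round-trip as well, without any extra dependency
assert loads(dumps(timedelta(days=2, seconds=30))) == timedelta(days=2, seconds=30)
```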
Order
Given an ordered dictionary like this (see the tests for a longer one):
``` python
from collections import OrderedDict

ordered = OrderedDict((
    ('elephant', None),
    ('chicken', None),
    ('tortoise', None),
))
```
Converting to json and back will preserve the order:
``` python
from json_tricks import dumps, loads

json = dumps(ordered)
ordered = loads(json, preserve_order=True)
```
where preserve_order=True is added for emphasis; it can be left out
since it's the default.
As a note on performance,
both dicts and OrderedDicts have the same scaling for getting and
setting items (O(1)). In Python versions before 3.5, OrderedDicts were
implemented in Python rather than C, so were somewhat slower; since
Python 3.5 both are implemented in C. In summary, you should have no
scaling problems and probably no performance problems at all, especially
in Python 3. Python 3.6+ preserves order of dictionaries by default
making this redundant, but this is an implementation detail that should
not be relied on.
Comments
Warning: in the next major version, comment parsing will be opt-in, not
default anymore (for performance reasons). Update your code now to pass
ignore_comments=True explicitly if you want comment parsing.
This package uses # and // for comments, which seem to be the most
common conventions, though only the latter is valid javascript.
For example, you could call loads on the following string:
``` javascript
{ # "comment 1
    "hello": "Wor#d", "Bye": "\"M#rk\"", "yes\\\"": 5,# comment" 2
    "quote": "\"th#t's\" what she said", // comment "3"
    "list": [1, 1, "#", "\"", "\\", 8], "dict": {"q": 7} #" comment 4 with quotes
}
// comment 5
```
And it would return the de-commented version:
``` javascript
{
    "hello": "Wor#d", "Bye": "\"M#rk\"", "yes\\\"": 5,
    "quote": "\"th#t's\" what she said",
    "list": [1, 1, "#", "\"", "\\", 8], "dict": {"q": 7}
}
```
Since comments aren't stored in the Python representation of the data, loading and then saving a json file will remove the comments (it also likely changes the indentation).
The implementation of comments is a bit crude, which means that there are some exceptional cases that aren't handled correctly (#57).
It is also not very fast. For that reason, if ignore_comments wasn't
explicitly set to True, then json-tricks first tries to parse without
ignoring comments. If that fails, then it will automatically re-try
with comment handling. This makes the no-comment case faster at the cost
of the comment case, so if you are expecting comments make sure to set
ignore_comments to True.
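For example, loading a commented document might look like this (the data is just illustrative):

``` python
from json_tricks import loads, strip_comments

text = """
{
    "answer": 42,  # an inline comment
    // another comment style
    "name": "example"
}
"""

data = loads(text, ignore_comments=True)
assert data == {'answer': 42, 'name': 'example'}

# or strip the comments first and feed the result to any json parser
clean = strip_comments(text)
```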
Other features
- Special floats like NaN, Infinity and -0 using the allow_nan=True argument (non-standard json, may not decode in other implementations).
- Sets are serializable and can be loaded. By default the set json representation is sorted, to have a consistent representation.
- Save and load complex numbers (py3), with 1+2j serializing as {'__complex__': [1, 2]}.
- Save and load Decimal and Fraction (including NaN, infinity, -0 for Decimal).
- Save and load Enum (thanks to Jenselme), either built-in in python3.4+, or with the enum34 package in earlier versions. IntEnum needs encode_intenums_inplace.
- json_tricks allows for gzip compression using the compression=True argument (off by default).
- json_tricks can check for duplicate keys in maps by setting allow_duplicates to False. These are kind of allowed, but are handled inconsistently between json implementations. In Python, for dict and OrderedDict, duplicate keys are silently overwritten.
- Save and load pathlib.Path objects (e.g., the current path, Path('.'), serializes as {"__pathlib__": "."}) (thanks to bburan).
- Save and load bytes (python 3+ only), which will be encoded as utf8 if that is valid, or as base64 otherwise. Base64 is always used if primitives are requested. Serialized as {"__bytes_b64__": "aGVsbG8="} vs {"__bytes_utf8__": "hello"}.
- Save and load slices (thanks to claydugo).
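A short round-trip sketch covering a few of these features (sets, complex numbers, Decimal and Fraction):

``` python
from decimal import Decimal
from fractions import Fraction
from json_tricks import dumps, loads

data = {
    'a_set': {3, 1, 2},            # stored sorted, restored as a set
    'a_complex': 1 + 2j,           # stored as {'__complex__': [1.0, 2.0]}
    'a_decimal': Decimal('0.1'),   # kept exact, not converted to float
    'a_fraction': Fraction(1, 3),
}

restored = loads(dumps(data))

assert restored == data
assert isinstance(restored['a_set'], set)
assert isinstance(restored['a_decimal'], Decimal)
```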
Preserve type vs use primitive
By default, types are encoded such that they can be restored to their
original type when loaded with json-tricks. Example encodings in this
documentation refer to that format.
You can also choose to store things as their closest primitive type (e.g. arrays and sets as lists, decimals as floats). This may be desirable if you don't care about the exact type, or you are loading the json in another language (which doesn't restore python types). It's also smaller.
To forego metadata and store primitives instead, pass primitives=True to
dump(s). This is available in version 3.8 and later. Example:
``` python
data = [
    arange(0, 10, 1, dtype=int).reshape((2, 5)),
    datetime(year=2017, month=1, day=19, hour=23, minute=00, second=00),
    1 + 2j,
    Decimal(42),
    Fraction(1, 3),
    MyTestCls(s='ub', dct={'7': 7}),  # see the Class instances section above
    set(range(7)),
]

# Encode with metadata to preserve types when decoding
print(dumps(data))
```
``` javascript
// (comments added and indenting changed)
[
    // numpy array
    {
        "__ndarray__": [
            [0, 1, 2, 3, 4],
            [5, 6, 7, 8, 9]],
        "dtype": "int64",
        "shape": [2, 5],
        "Corder": true
    },
    // datetime (naive)
    {
        "__datetime__": null,
        "year": 2017,
        "month": 1,
        "day": 19,
        "hour": 23
    },
    // complex number
    {
        "__complex__": [1.0, 2.0]
    },
    // decimal & fraction
    {
        "__decimal__": "42"
    },
    {
        "__fraction__": true,
        "numerator": 1,
        "denominator": 3
    },
    // class instance
    {
        "__instance_type__": [
            "tests.test_class",
            "MyTestCls"
        ],
        "attributes": {
            "s": "ub",
            "dct": {"7": 7}
        }
    },
    // set
    {
        "__set__": [0, 1, 2, 3, 4, 5, 6]
    }
]
```
``` python
# Encode as primitive types; simpler, but loses type information
print(dumps(data, primitives=True))
```
``` javascript
// (comments added and indentation changed)
[
// numpy array
[[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]],
// datetime (naive)
"2017-01-19T23:00:00",
// complex number
[1.0, 2.0],
// decimal & fraction
42.0,
0.3333333333333333,
// class instance
{
"s": "ub",
"dct": {"7": 7}
},
// set
[0, 1, 2, 3, 4, 5, 6]
]
```
Note that valid json is produced either way: json-tricks stores meta data as normal json, but other packages probably won't interpret it.
Usage & contributions
Code is under Revised BSD License so you can use it for most purposes including commercially.
Contributions are very welcome! Bug reports, feature suggestions and code contributions help this project become more useful for everyone! There is a short contribution guide.
Contributors not yet mentioned: janLo (performance boost).
Tests
Tests are run automatically for commits to the repository, for all supported Python versions.
To run the tests manually for your version, see this guide.
Owner
- Name: Mark Verleg
- Login: mverleg
- Kind: user
- Location: Earth
- Website: https://markv.nl
- Repositories: 137
- Profile: https://github.com/mverleg
GitHub Events
Total
- Issues event: 4
- Watch event: 8
- Issue comment event: 6
- Pull request event: 2
- Fork event: 1
Last Year
- Issues event: 4
- Watch event: 8
- Issue comment event: 6
- Pull request event: 2
- Fork event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Mark | m****y@g****m | 143 |
| mark | m****k@r****i | 122 |
| Clay Dugo | c****y@r****m | 9 |
| mulan | m****k@m****n | 7 |
| Julien Enselme | j****e@c****m | 7 |
| Maarten Derickx | m****n@m****l | 4 |
| Yaroslav Halchenko | d****n@o****m | 2 |
| Mark Harfouche | m****e@g****m | 2 |
| Brad Buran | b****n@a****u | 2 |
| Steve Kowalik | s****n@w****g | 1 |
| Jan Losinski | l****i@w****e | 1 |
| Erwan | e****r@c****r | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 75
- Total pull requests: 32
- Average time to close issues: 8 months
- Average time to close pull requests: 4 days
- Total issue authors: 33
- Total pull request authors: 13
- Average comments per issue: 2.49
- Average comments per pull request: 1.38
- Merged pull requests: 26
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 4
- Average time to close issues: 20 days
- Average time to close pull requests: 12 days
- Issue authors: 2
- Pull request authors: 3
- Average comments per issue: 3.0
- Average comments per pull request: 0.75
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mverleg (36)
- hmaarrfk (5)
- yarikoptic (2)
- cfobel (2)
- ruancomelli (1)
- Arakkun (1)
- dominicdoty (1)
- papalotis (1)
- liuzhe-lz (1)
- Laksh1997 (1)
- samuelgprice (1)
- Miserlou (1)
- MiaoDX (1)
- skewty (1)
- minerharry (1)
Pull Request Authors
- mverleg (11)
- hmaarrfk (8)
- claydugo (4)
- zeha (2)
- koffie (2)
- janLo (1)
- s-t-e-v-e-n-k (1)
- Jenselme (1)
- bburan (1)
- yarikoptic (1)
- peircej (1)
- erwanp (1)
- dominicdoty (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 2
- Total downloads: pypi 180,797 last-month
- Total docker downloads: 6,648
- Total dependent packages: 36 (may contain duplicates)
- Total dependent repositories: 237 (may contain duplicates)
- Total versions: 65
- Total maintainers: 1
pypi.org: json-tricks
Extra features for Python's JSON: comments, order, numpy, pandas, datetimes, and many more! Simple but customizable.
- Homepage: https://github.com/mverleg/pyjson_tricks
- Documentation: https://json-tricks.readthedocs.io/
- License: Revised BSD License (LICENSE.txt)
- Latest release: 3.17.3 (published over 2 years ago)
Rankings
Maintainers (1)
conda-forge.org: json_tricks
The pyjson-tricks package brings several pieces of functionality to python handling of json files: (1) store and load numpy arrays in human-readable format; (2) store and load class instances both generic and customized; (3) store and load date/times as a dictionary (including timezone); (4) preserve map order {} using OrderedDict; (5) allow for comments in json files by starting lines with #; (6) sets, complex numbers, decimal, fraction, enums, compression, duplicate keys, etc. As well as compression and disallowing duplicate keys.
- Homepage: https://github.com/mverleg/pyjson_tricks
- License: BSD-3-Clause
- Latest release: 3.16.1 (published over 3 years ago)
Rankings
Dependencies
- actions/checkout v1 composite
- actions/setup-python v2 composite