Visions

Visions: An Open-Source Library for Semantic Data - Published in JOSS (2020)

https://github.com/dylan-profiler/visions

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: Links to joss.theoj.org
○ Committers with academic emails
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords

data-analysis data-science hacktoberfest numpy pandas python spark type-inference type-system

Keywords from Contributors

energy-systems meshes cryptocurrencies blackhole gravitational-lenses bioinformatics simulations bayesian-statistics graph-generation hydrology

Scientific Fields

Sociology Social Sciences - 87% confidence

Last synced: 6 months ago

Repository

Type System for Data Analysis in Python

Basic Info

Host: GitHub
Owner: dylan-profiler
License: other
Language: Python
Default Branch: develop
Homepage: https://dylan-profiler.github.io/visions/visions/getting_started/usage/types.html
Size: 37.9 MB

Statistics

Stars: 213
Watchers: 6
Forks: 19
Open Issues: 18
Releases: 17

Topics

data-analysis data-science hacktoberfest numpy pandas python spark type-inference type-system

Created about 6 years ago · Last pushed about 1 year ago

Metadata Files

Readme License

README.md

And these visions of data types, they kept us up past the dawn.

The Semantic Data Library

Visions provides a set of tools for defining and using semantic data types.

[x] Semantic type detection & inference on sequence data.
[x] Automated data processing.
[x] Completely customizable. Visions makes it easy to build and modify semantic data types for domain-specific purposes.
[x] Out-of-the-box support for multiple backend implementations including pandas, spark, numpy, and python.
[x] A robust set of default types and typesets covering the most common use cases.

Check out the complete documentation at https://dylan-profiler.github.io/visions.

Installation

Source code is available on github and binary installers via pip.

```
# Pip
pip install visions
```

Complete installation instructions (including extras) are available in the docs.

Quick Start Guide

If you want to play immediately, check out the examples folder. Otherwise, let's get some data:

```python
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df.head(2)
```

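| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |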

The most important abstraction in visions is the Type: each Type represents a semantic notion about data. You have access to a range of well-tested types like Integer, Float, and Files covering the most common software development use cases. Types can be bundled together into typesets. Behind the scenes, visions builds a traversable graph for any collection of types.

```python
from visions import types, typesets

# StandardSet is the basic builtin typeset; CompleteSet, used here, covers additional types
typeset = typesets.CompleteSet()
typeset.plot_graph()
```

Note: Plots require pygraphviz to be installed.

Because of the special relationship between types these graphs can be used to detect the type of your data or infer a more appropriate one.

```python
# Detection looks like this
typeset.detect_type(df)

# While inference looks like this
typeset.infer_type(df)

# Inference works well even if we monkey with the data, say by converting everything to strings
typeset.infer_type(df.astype(str))
# {
#     'PassengerId': Integer,
#     'Survived': Integer,
#     'Pclass': Integer,
#     'Name': String,
#     'Sex': String,
#     'Age': Float,
#     'SibSp': Integer,
#     'Parch': Integer,
#     'Ticket': String,
#     'Fare': Float,
#     'Cabin': String,
#     'Embarked': String
# }
```
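To make the difference between detection and inference concrete, here is a toy sketch that uses only the standard library. It is not how visions is implemented: the `CONTAINS` and `RELATIONS` tables, the string type names, and both functions are hypothetical stand-ins that only borrow the vocabulary used above. Detection asks which type the values currently belong to, while inference keeps following graph edges for as long as a lossless cast is possible.

```python
# Toy illustration only: hypothetical names, not the visions implementation.

def _parses_as(fn, value):
    """Return True if fn(value) succeeds, e.g. int("3")."""
    try:
        fn(value)
        return True
    except (TypeError, ValueError):
        return False

# Nodes of the graph: each type has a containment test over a whole sequence.
CONTAINS = {
    "Integer": lambda vs: all(isinstance(v, int) and not isinstance(v, bool) for v in vs),
    "Float": lambda vs: all(isinstance(v, float) for v in vs),
    "String": lambda vs: all(isinstance(v, str) for v in vs),
}

# Edges of the graph: (source, target, "is the cast lossless?", cast).
RELATIONS = [
    ("String", "Integer", lambda vs: all(_parses_as(int, v) for v in vs),
     lambda vs: [int(v) for v in vs]),
    ("String", "Float", lambda vs: all(_parses_as(float, v) for v in vs),
     lambda vs: [float(v) for v in vs]),
    ("Float", "Integer", lambda vs: all(v.is_integer() for v in vs),
     lambda vs: [int(v) for v in vs]),
]

def detect_type(values):
    """Detection: report the type the values belong to right now."""
    for name, contains in CONTAINS.items():
        if contains(values):
            return name
    return "Generic"

def infer_type(values):
    """Inference: start at the detected type and follow edges while one applies."""
    current = detect_type(values)
    moved = True
    while moved:
        moved = False
        for source, target, applies, cast in RELATIONS:
            if source == current and applies(values):
                values, current, moved = cast(values), target, True
                break
    return current, values
```

Under these assumptions, `detect_type(["1.0", "2.0"])` reports `String`, while `infer_type` walks from String to Float to Integer and returns the converted values along with the final type.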

Visions solves many of the most common problems working with tabular data. For example, sequences of Integers are still recognized as integers whether they have trailing decimal 0's from being cast to float, missing values, or something else altogether. Much of this cleaning is performed automatically, providing nicely cleaned and processed data as well.

    cleaned_df = typeset.cast_to_inferred(df)
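The integer example above can be sketched in plain Python. This is a hedged illustration of the idea, not the cleaning logic visions applies: `is_missing` and `cast_integers` are hypothetical helpers. They treat `None` and NaN as missing and accept floats only when they carry no fractional part.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing values (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def cast_integers(values):
    """Cast integer-like values to int, keeping missing entries as None.

    Raises ValueError if any present value has a fractional part,
    because the cast would then lose information.
    """
    result = []
    for value in values:
        if is_missing(value):
            result.append(None)
        elif float(value).is_integer():
            result.append(int(value))
        else:
            raise ValueError(f"{value!r} is not integer-like")
    return result

# Integers that were cast to float (to make room for NaN) are recovered.
cleaned = cast_integers([1.0, 2.0, float("nan"), 4.0])
```

Here `cleaned` holds the integers 1, 2 and 4, with `None` in place of the missing entry.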

This is only a small taste of everything visions can do including building your own domain specific types and typesets so please check out the API documentation or the examples/ directory for more info!

Supported frameworks

Thanks to its dispatch-based implementation, Visions is able to exploit framework-specific capabilities offered by libraries like pandas and spark. Currently it works with the following backends by default.

Pandas (feature complete)
Numpy (boolean, complex, date time, float, integer, string, time deltas, objects)
Spark (boolean, categorical, date, date time, float, integer, numeric, object, string)
Python (string, float, integer, date time, time delta, boolean, categorical, object, complex - other datatypes are untested)

If you're using pandas it will also take advantage of parallelization tools like swifter if available.

It also offers a simple annotation based API for registering new implementations as needed. For example, if you wished to extend the categorical data type to include a Dask specific implementation you might do something like

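```python
import dask.dataframe  # import the submodule explicitly so dask.dataframe.Series resolves
from pandas.api import types as pdt
from visions.types.categorical import Categorical


# Register a Dask-specific containment check for the Categorical type
@Categorical.contains_op.register
def categorical_contains(series: dask.dataframe.Series, state: dict) -> bool:
    return pdt.is_categorical_dtype(series.dtype)
```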
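The registration pattern itself can be tried without visions or Dask installed. The sketch below uses the standard library's `functools.singledispatch` as a stand-in for the dispatch machinery. This is an assumption made for illustration, since the dependency list names `multimethod`. `contains_integer` and `ColumnStore` are hypothetical names, not part of any library.

```python
from functools import singledispatch

@singledispatch
def contains_integer(sequence) -> bool:
    """Fallback: no implementation registered for this backend."""
    raise NotImplementedError(f"No backend registered for {type(sequence).__name__}")

@contains_integer.register
def _(sequence: list) -> bool:
    # Plain-Python backend: inspect every element (bool is excluded on purpose).
    return all(isinstance(v, int) and not isinstance(v, bool) for v in sequence)

class ColumnStore:
    """Hypothetical third-party container that records its own dtype."""
    def __init__(self, values, dtype):
        self.values = values
        self.dtype = dtype

@contains_integer.register
def _(sequence: ColumnStore) -> bool:
    # Framework-specific backend: trust the container's dtype metadata
    # instead of scanning the values.
    return sequence.dtype == "int64"
```

Calling `contains_integer([1, 2, 3])` runs the list implementation, and `contains_integer(ColumnStore([1, 2], "int64"))` runs the `ColumnStore` one. Supporting a new backend means registering one more function rather than editing existing code.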

Contributing and support

Contributions to visions are welcome. For more information, please visit the community contributions page and join us on Slack. The GitHub issue tracker is used for reporting bugs, feature requests and support questions.

Also, please check out some of the other companies and packages using visions including:

If you're currently using visions or would like to be featured here please let us know.

Acknowledgements

This package is part of the dylan-profiler project. The package is a core component of pandas-profiling. More information can be found here. This work was partially supported by SIDN Fonds.

Owner

Name: dylan-profiler
Login: dylan-profiler
Kind: organization

Repositories: 4
Profile: https://github.com/dylan-profiler

DYLAN: Tools for effective data analysis

JOSS Publication

Visions: An Open-Source Library for Semantic Data

Published

April 13, 2020

DOI

10.21105/joss.02145

Volume 5, Issue 48, Page 2145

Authors

Simon Brugman

Radboud University

Ian Eaves

Independent

Editor

Matthew Sottile

GitHub Events

Total

Create event: 7
Release event: 2
Issues event: 3
Watch event: 6
Delete event: 5
Issue comment event: 2
Push event: 24
Pull request review event: 3
Pull request review comment event: 4
Pull request event: 21

Last Year

Create event: 7
Release event: 2
Issues event: 3
Watch event: 6
Delete event: 5
Issue comment event: 2
Push event: 24
Pull request review event: 3
Pull request review comment event: 4
Pull request event: 21

Committers

Last synced: 7 months ago

All Time

Total Commits: 906
Total Committers: 12
Avg Commits per committer: 75.5
Development Distribution Score (DDS): 0.371

Past Year

Commits: 10
Committers: 2
Avg Commits per committer: 5.0
Development Distribution Score (DDS): 0.1

Top Committers

Name	Email	Commits
simon_graphkite	s**n@g**m	570
Ian Eaves	i**s@g**m	254
GitHub Action	a**n@g**m	72
dependabot-preview[bot]	2****]	2
lgtm-com[bot]	4****]	1
dependabot[bot]	4****]	1
Gustavo Camargo	g****1	1
Erik Cederstrand	e**k@c**k	1
Dan Houghton	d**n@g**m	1
Charles-Meldhine Madi Mnemoi	6****6	1
Arfon Smith	a****n	1
Aarni Koskela	a**x@i**i	1

Committer Domains (Top 20 + Academic)

iki.fi: 1 cederstrand.dk: 1 github.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 31
Total pull requests: 98
Average time to close issues: about 2 months
Average time to close pull requests: about 1 month
Total issue authors: 18
Total pull request authors: 10
Average comments per issue: 1.58
Average comments per pull request: 0.96
Merged pull requests: 85
Bot issues: 1
Bot pull requests: 7

Past Year

Issues: 1
Pull requests: 20
Average time to close issues: about 22 hours
Average time to close pull requests: 12 minutes
Issue authors: 1
Pull request authors: 1
Average comments per issue: 2.0
Average comments per pull request: 0.1
Merged pull requests: 18
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

sbrugman (8)
ieaves (6)
majidaldo (2)
cstabnick (1)
dependabot-preview[bot] (1)
seshurajup (1)
lahwaacz (1)
irvinktang (1)
fkiraly (1)
nv-rliu (1)
PraJaL55 (1)
sterlinm (1)
hubutui (1)
ttpro1995 (1)
cmnemoi (1)

Pull Request Authors

sbrugman (47)
ieaves (42)
dependabot[bot] (3)
lgtm-com[bot] (2)
dependabot-preview[bot] (2)
akx (1)
cmnemoi (1)
ecederstrand (1)
dah33 (1)
gcamargo1 (1)

Top Labels

Issue Labels

enhancement (13) bug (13) good first issue (1) wontfix (1)

Pull Request Labels

dependencies (5)

Packages

Total packages: 3
Total downloads:
- pypi 1,072,345 last-month
Total docker downloads: 511,361

Total dependent packages: 10
(may contain duplicates)
Total dependent repositories: 748
(may contain duplicates)
Total versions: 50
Total maintainers: 2

pypi.org: visions

Visions

Documentation: https://dylan-profiler.github.io/visions
License: BSD License
Latest release: 0.8.1
published about 1 year ago

Versions: 30
Dependent Packages: 6
Dependent Repositories: 722
Downloads: 1,072,345 Last month
Docker Downloads: 511,361

Rankings

Downloads: 0.3%

Dependent repos count: 0.5%

Docker downloads count: 0.9%

Dependent packages count: 1.6%

Average: 2.8%

Stargazers count: 5.0%

Forks count: 8.4%

Maintainers (2)

ieaves sbrugman

Last synced: 6 months ago

conda-forge.org: visions

Homepage: https://github.com/dylan-profiler/visions
License: BSD-4-Clause
Latest release: 0.7.5
published about 4 years ago

Versions: 12
Dependent Packages: 2
Dependent Repositories: 13

Rankings

Dependent repos count: 9.8%

Dependent packages count: 19.6%

Average: 23.4%

Stargazers count: 27.6%

Forks count: 36.7%

Last synced: 6 months ago

anaconda.org: visions

Visions provides an extensible suite of tools to support common data analysis operations including type inference on unknown data, casting data types and automated data summarization.

Homepage: https://github.com/dylan-profiler/visions
License: BSD-4-Clause
Latest release: 0.8.1
published 11 months ago

Versions: 8
Dependent Packages: 2
Dependent Repositories: 13

Rankings

Dependent packages count: 20.4%

Average: 35.8%

Dependent repos count: 36.0%

Stargazers count: 39.7%

Forks count: 47.1%

Last synced: 6 months ago

Dependencies

requirements.txt pypi

attrs >=19.3.0
multimethod >=1.4
networkx >=2.4
numpy *
pandas >=0.25.3
tangled_up_in_unicode >=0.0.4

requirements_dev.txt pypi

IPython * development
Sphinx-copybutton * development
black >=20.8b1 development
isort >=5.0.9 development
mypy >=0.770 development
nbsphinx * development
recommonmark >=0.6.0 development
setuptools >=46.1.3 development
sphinx-autodoc-typehints >=1.10.3 development
sphinx_rtd_theme >=0.4.3 development
wheel >=0.34.2 development

requirements_test.txt pypi

Pillow * test
big_o >=0.10.1 test
black >=19.10b0 test
check-manifest >=0.41 test
imagehash * test
isort >=5.0.9 test
matplotlib * test
mypy >=0.800 test
pandas * test
pre-commit * test
pyarrow >=1.0.1 test
pydot * test
pyspark * test
pytest >=5.2.0 test
pytest-spark >=0.6.0 test
shapely * test
twine >=3.1.1 test

.github/workflows/ci.yml actions

actions/cache v1 composite
actions/checkout v2 composite
actions/setup-python v1 composite
ad-m/github-push-action master composite

.github/workflows/pypi.yml actions

actions/cache v1 composite
actions/checkout v2 composite
actions/setup-python v1 composite
pypa/gh-action-pypi-publish master composite

.github/workflows/tests.yml actions

actions/checkout v2 composite
actions/setup-python v1 composite

requirements_spark.txt pypi

setup.py pypi
