Visions
Visions: An Open-Source Library for Semantic Data - Published in JOSS (2020)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ✓ DOI references (found 4 DOI reference(s) in README and JOSS metadata)
- ✓ Academic publication links (links to: joss.theoj.org)
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata (published in Journal of Open Source Software)
Keywords
Keywords from Contributors
Scientific Fields
Repository
Type System for Data Analysis in Python
Basic Info
- Host: GitHub
- Owner: dylan-profiler
- License: other
- Language: Python
- Default Branch: develop
- Homepage: https://dylan-profiler.github.io/visions/visions/getting_started/usage/types.html
- Size: 37.9 MB
Statistics
- Stars: 213
- Watchers: 6
- Forks: 19
- Open Issues: 18
- Releases: 17
Topics
Metadata Files
README.md

And these visions of data types, they kept us up past the dawn.
The Semantic Data Library
Visions provides a set of tools for defining and using semantic data types.
[x] Semantic type detection & inference on sequence data.
[x] Automated data processing
[x] Completely customizable. Visions makes it easy to build and modify semantic data types for domain-specific purposes.
[x] Out-of-the-box support for multiple backend implementations, including pandas, spark, numpy, and python.
[x] A robust set of default types and typesets covering the most common use cases.
Check out the complete documentation here.
Installation
Source code is available on GitHub, and binary installers are available via pip.
```
# Pip
pip install visions
```
Complete installation instructions (including extras) are available in the docs.
Quick Start Guide
If you want to play immediately, check out the examples folder. Otherwise,
let's get some data:
```python
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df.head(2)
```
| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
The most important abstractions in visions are Types - these represent semantic notions about data. You have access to
a range of well-tested types like Integer, Float, and Files covering the most common software development use
cases.
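To make that concrete, here is a minimal sketch of the membership-style checks that types support (assuming the default pandas backend; exact results can vary by visions version):

```python
import pandas as pd
from visions import Integer, String

# A sketch: types behave like sets, so semantic questions become membership tests.
pd.Series([1, 2, 3]) in Integer    # True: an integer sequence
pd.Series([1, 2, 3]) in String     # False: not a string sequence
pd.Series(["a", "b"]) in String    # True
```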
Types can be bundled together into typesets. Behind the scenes, visions builds a traversable graph for any collection
of types.
```python
from visions import types, typesets

# CompleteSet is the most complete builtin typeset (StandardSet is the basic one)
typeset = typesets.CompleteSet()
typeset.plot_graph()
```
Note: Plots require pygraphviz to be installed.
Because of the special relationship between types, these graphs can be used to detect the type of your data or infer a more appropriate one.
```python
# Detection looks like this
typeset.detect_type(df)

# While inference looks like this
typeset.infer_type(df)

# Inference works well even if we monkey with the data, say by converting everything to strings
typeset.infer_type(df.astype(str))
{
    'PassengerId': Integer,
    'Survived': Integer,
    'Pclass': Integer,
    'Name': String,
    'Sex': String,
    'Age': Float,
    'SibSp': Integer,
    'Parch': Integer,
    'Ticket': String,
    'Fare': Float,
    'Cabin': String,
    'Embarked': String
}
```
Visions solves many of the most common problems encountered when working with tabular data. For example, sequences of Integers are still
recognized as integers whether they have trailing decimal zeros from being cast to float, missing values, or something
else altogether. Much of this cleaning is performed automatically, providing nicely cleaned and processed data as well.
```python
cleaned_df = typeset.cast_to_inferred(df)
```
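Building on the inference example above, here is a minimal sketch of that round trip (the resulting dtypes depend on your visions version and backend):

```python
# Sketch: cast every column to str, then let the typeset infer and cast back.
messy = df.astype(str)
recovered = typeset.cast_to_inferred(messy)
print(recovered.dtypes)  # numeric columns come back as numeric dtypes
```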
This is only a small taste of everything visions can do, including building your own domain-specific types and typesets, so please check out the API documentation or the examples/ directory for more info!
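As a hedged sketch of that customization, a small domain-specific typeset can be assembled directly from the builtin types (this assumes `VisionsTypeset` accepts a set of type classes, as the builtin typesets do):

```python
from visions import VisionsTypeset, Generic, Boolean, Integer, Float, String

# Sketch: a minimal typeset containing only the types this domain cares about.
# Generic acts as the root type every other type relates back to.
my_typeset = VisionsTypeset({Generic, Boolean, Integer, Float, String})
my_typeset.infer_type(df)
```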
Supported frameworks
Thanks to its dispatch-based implementation, Visions is able to exploit framework-specific capabilities offered by
libraries like pandas and spark. Currently, it works with the following backends by default:
- Pandas (feature complete)
- Numpy (boolean, complex, date time, float, integer, string, time deltas, objects)
- Spark (boolean, categorical, date, date time, float, integer, numeric, object, string)
- Python (string, float, integer, date time, time delta, boolean, categorical, object, complex - other datatypes are untested)
If you're using pandas it will also take advantage of parallelization tools like swifter if available.
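As a rough sketch of that dispatch in action (assuming the numpy and pure-Python backends are enabled in your install; the exact accepted container types and results may differ by version):

```python
import numpy as np
from visions.typesets import CompleteSet

typeset = CompleteSet()

# The same typeset inspects non-pandas sequences by dispatching on their type.
typeset.detect_type(np.array([1.0, 2.0, np.nan]))  # e.g. Float
typeset.detect_type([True, False, True])           # e.g. Boolean
```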
It also offers a simple annotation-based API for registering new implementations as needed. For example, if you wished to extend the categorical data type to include a Dask-specific implementation, you might do something like:
```python
from visions.types.categorical import Categorical
from pandas.api import types as pdt
import dask.dataframe

@Categorical.contains_op.register
def categorical_contains(series: dask.dataframe.Series, state: dict) -> bool:
    return pdt.is_categorical_dtype(series.dtype)
```
Contributing and support
Contributions to visions are welcome. For more information, please visit the community
contributions page and join us on Slack. The
GitHub issues tracker is used for reporting bugs, feature
requests, and support questions.
Also, please check out some of the other companies and packages using visions.
If you're currently using visions or would like to be featured here, please let us know.
Acknowledgements
This package is part of the dylan-profiler project. The package is a core component of pandas-profiling. More information can be found here. This work was partially supported by SIDN Fonds.

Owner
- Name: dylan-profiler
- Login: dylan-profiler
- Kind: organization
- Repositories: 4
- Profile: https://github.com/dylan-profiler
DYLAN: Tools for effective data analysis
JOSS Publication
Visions: An Open-Source Library for Semantic Data
Tags
data types, data workflows, data integration, machine learning
GitHub Events
Total
- Create event: 7
- Release event: 2
- Issues event: 3
- Watch event: 6
- Delete event: 5
- Issue comment event: 2
- Push event: 24
- Pull request review event: 3
- Pull request review comment event: 4
- Pull request event: 21
Last Year
- Create event: 7
- Release event: 2
- Issues event: 3
- Watch event: 6
- Delete event: 5
- Issue comment event: 2
- Push event: 24
- Pull request review event: 3
- Pull request review comment event: 4
- Pull request event: 21
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| simon_graphkite | s****n@g****m | 570 |
| Ian Eaves | i****s@g****m | 254 |
| GitHub Action | a****n@g****m | 72 |
| dependabot-preview[bot] | 2****] | 2 |
| lgtm-com[bot] | 4****] | 1 |
| dependabot[bot] | 4****] | 1 |
| Gustavo Camargo | g****1 | 1 |
| Erik Cederstrand | e****k@c****k | 1 |
| Dan Houghton | d****n@g****m | 1 |
| Charles-Meldhine Madi Mnemoi | 6****6 | 1 |
| Arfon Smith | a****n | 1 |
| Aarni Koskela | a****x@i****i | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 31
- Total pull requests: 98
- Average time to close issues: about 2 months
- Average time to close pull requests: about 1 month
- Total issue authors: 18
- Total pull request authors: 10
- Average comments per issue: 1.58
- Average comments per pull request: 0.96
- Merged pull requests: 85
- Bot issues: 1
- Bot pull requests: 7
Past Year
- Issues: 1
- Pull requests: 20
- Average time to close issues: about 22 hours
- Average time to close pull requests: 12 minutes
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 2.0
- Average comments per pull request: 0.1
- Merged pull requests: 18
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sbrugman (8)
- ieaves (6)
- majidaldo (2)
- cstabnick (1)
- dependabot-preview[bot] (1)
- seshurajup (1)
- lahwaacz (1)
- irvinktang (1)
- fkiraly (1)
- nv-rliu (1)
- PraJaL55 (1)
- sterlinm (1)
- hubutui (1)
- ttpro1995 (1)
- cmnemoi (1)
Pull Request Authors
- sbrugman (47)
- ieaves (42)
- dependabot[bot] (3)
- lgtm-com[bot] (2)
- dependabot-preview[bot] (2)
- akx (1)
- cmnemoi (1)
- ecederstrand (1)
- dah33 (1)
- gcamargo1 (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 3
- Total downloads: pypi 1,072,345 last-month
- Total docker downloads: 511,361
- Total dependent packages: 10 (may contain duplicates)
- Total dependent repositories: 748 (may contain duplicates)
- Total versions: 50
- Total maintainers: 2
pypi.org: visions
Visions
- Documentation: https://dylan-profiler.github.io/visions
- License: BSD License
- Latest release: 0.8.1 (published 11 months ago)
Rankings
conda-forge.org: visions
- Homepage: https://github.com/dylan-profiler/visions
- License: BSD-4-Clause
- Latest release: 0.7.5 (published about 4 years ago)
Rankings
anaconda.org: visions
Visions provides an extensible suite of tools to support common data analysis operations including type inference on unknown data, casting data types and automated data summarization.
- Homepage: https://github.com/dylan-profiler/visions
- License: BSD-4-Clause
- Latest release: 0.8.1 (published 9 months ago)
Rankings
Dependencies
- attrs >=19.3.0
- multimethod >=1.4
- networkx >=2.4
- numpy *
- pandas >=0.25.3
- tangled_up_in_unicode >=0.0.4
- IPython * development
- Sphinx-copybutton * development
- black >=20.8b1 development
- isort >=5.0.9 development
- mypy >=0.770 development
- nbsphinx * development
- recommonmark >=0.6.0 development
- setuptools >=46.1.3 development
- sphinx-autodoc-typehints >=1.10.3 development
- sphinx_rtd_theme >=0.4.3 development
- wheel >=0.34.2 development
- Pillow * test
- big_o >=0.10.1 test
- black >=19.10b0 test
- check-manifest >=0.41 test
- imagehash * test
- isort >=5.0.9 test
- matplotlib * test
- mypy >=0.800 test
- pandas * test
- pre-commit * test
- pyarrow >=1.0.1 test
- pydot * test
- pyspark * test
- pytest >=5.2.0 test
- pytest-spark >=0.6.0 test
- shapely * test
- twine >=3.1.1 test
- actions/cache v1 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
- ad-m/github-push-action master composite
- actions/cache v1 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
- pypa/gh-action-pypi-publish master composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
