https://github.com/ggupta2005/data.understand

Repository for generating insights like value distribution, class imbalance for tabular datasets

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.3%) to scientific vocabulary

Keywords

data-science dataset jupyter-notebook pdf-generation

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: ggupta2005
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 186 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 4
Releases: 0

Topics

data-science dataset jupyter-notebook pdf-generation

Created over 3 years ago · Last pushed about 2 years ago

Metadata Files

Readme License

data-understand

Motivation

As data scientists and machine learning engineers, we repeatedly perform the same exploratory tasks: loading a dataset into a pandas dataframe, inspecting its rows and columns, visualizing value distributions, finding feature correlations, and checking for imbalances in the data. These tasks are repetitive and usually mean creating several jupyter notebooks, each of which has to be maintained separately with its own path to the input dataset. What if one tool could take the location of your dataset and generate that boilerplate logic for you, so that you only have to run it to learn the same insights? All you need to do is install this tool in your local Python environment and run it from the command line.

Installation

You can install the package data-understand from PyPI using the following command:

pip install data-understand

Usage

Once you have installed the tool locally, you can look at the various options of the CLI tool:

```
data_understand -h

========================================================================================================================
usage: data_understand [-h] [-f FILE_NAME] [-t TARGET_COLUMN] [-p] [-j]

data.understand CLI

options:
  -h, --help            show this help message and exit
  -f FILE_NAME, --file_name FILE_NAME
                        Directory path to CSV file
  -t TARGET_COLUMN, --target_column TARGET_COLUMN
                        Target column name
  -p, --generate_pdf    Generate PDF file for understanding of data
  -j, --generate_jupyter_notebook
                        Generate jupyter notebook file for understanding of data
```
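The help text above implies a small `argparse` interface. The following is a minimal sketch of a parser that would accept the same options; it is not the package's actual source. The underscore spellings of the long flags (`--file_name`, `--target_column`, `--generate_pdf`, `--generate_jupyter_notebook`) are an assumption based on the `FILE_NAME` and `TARGET_COLUMN` metavars in the usage line, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild a parser that mirrors the documented help text (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="data_understand", description="data.understand CLI"
    )
    # Flag spellings are assumed from the metavars shown in the usage line.
    parser.add_argument("-f", "--file_name", help="Directory path to CSV file")
    parser.add_argument("-t", "--target_column", help="Target column name")
    parser.add_argument(
        "-p", "--generate_pdf", action="store_true",
        help="Generate PDF file for understanding of data",
    )
    parser.add_argument(
        "-j", "--generate_jupyter_notebook", action="store_true",
        help="Generate jupyter notebook file for understanding of data",
    )
    return parser


# Parse the short-flag form of a typical invocation.
args = build_parser().parse_args(
    ["-f", "adult_dataset.csv", "-t", "income", "-p", "-j"]
)
print(vars(args))
```

Both output switches are plain boolean flags here, so omitting `-p` or `-j` leaves the corresponding value at `False`.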

Notebook and PDF report generation

To generate both the PDF report and the jupyter notebook, execute the following CLI command:

```
data_understand --file_name adult_dataset.csv --target_column income --generate_pdf --generate_jupyter_notebook

========================================================================================================================
The parsed arguments are:-
file_name: adult_dataset.csv
target_column: income
generate_pdf: True
generate_jupyter_notebook: True

Time taken: 0.0 min 0.0012356000000863787 sec

Generating PDF report and jupyter notebook

Generating PDF report for the dataset in adult_dataset.csv
Successfully generated PDF report for the dataset in adult_dataset.csv at adult_dataset.csv.pdf

Time taken: 0.0 min 7.363417799999979 sec

========================================================================================================================
Generating jupyter notebook for the dataset in adult_dataset.csv
Successfully generated jupyter notebook for the dataset in adult_dataset.csv at adult_dataset.csv.ipynb

Time taken: 0.0 min 0.053841799999986506 sec

Successfully generated PDF report and jupyter notebook

Time taken: 0.0 min 7.485209299999951 sec
```

This generates the jupyter notebook and the PDF report in the same directory as your dataset. You can execute the cells in the jupyter notebook to generate insights and graphs on the fly, or read through the PDF report to learn about various aspects of your dataset.
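If you want to drive the tool from a script, the sketch below builds the command line and predicts where the outputs should land. It rests on two assumptions. The output names are inferred from the sample log, which shows `.pdf` and `.ipynb` appended to the full CSV file name. The long flags are assumed to be spelled with underscores, matching the `FILE_NAME` and `TARGET_COLUMN` metavars in the help text. `build_command` and `expected_outputs` are hypothetical helpers, not part of the package.

```python
from pathlib import Path
from typing import List


def build_command(csv_path: str, target_column: str) -> List[str]:
    """Argument list for one run that requests both the PDF and the notebook."""
    return [
        "data_understand",
        "--file_name", csv_path,
        "--target_column", target_column,
        "--generate_pdf",
        "--generate_jupyter_notebook",
    ]


def expected_outputs(csv_path: str) -> List[Path]:
    """Predicted output paths, assuming the suffix is appended to the CSV file name."""
    dataset = Path(csv_path)
    return [dataset.with_name(dataset.name + ext) for ext in (".pdf", ".ipynb")]


command = build_command("adult_dataset.csv", "income")
outputs = expected_outputs("adult_dataset.csv")
print(" ".join(command))
print([str(path) for path in outputs])

# To actually run the tool (requires data-understand to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Checking that each predicted path exists after the run is a cheap way to confirm that both artifacts were produced.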

Repos using `data-understand` to generate notebooks and PDF reports

understanding-datasets

Owner

Name: Gaurav Gupta
Login: ggupta2005
Kind: user
Location: Seattle
Company: Microsoft Corp

Website: https://ggupta2005.wixsite.com/gaurav-gupta/
Repositories: 18
Profile: https://github.com/ggupta2005

I am a machine learning engineer at Azure Machine Learning.

Issues and Pull Requests

Last synced: 11 months ago

Packages

Total packages: 1
Total downloads:
- pypi 22 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 7
Total maintainers: 1

pypi.org: data-understand

Utility package for generating insights for datasets

Homepage: https://github.com/ggupta2005/data.understand
Documentation: https://data-understand.readthedocs.io/
License: MIT License
Latest release: 0.0.6
published almost 3 years ago

Versions: 7
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 22 Last month

Rankings

Dependent packages count: 7.2%

Downloads: 19.9%

Average: 27.6%

Forks count: 30.3%

Stargazers count: 39.2%

Dependent repos count: 41.2%

Maintainers (1)

ggupta2005

Last synced: 11 months ago

Dependencies

.github/workflows/python-e2e-tests.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/python-linting.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/python-unit-tests.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/release-data-understand.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite
pypa/gh-action-pypi-publish release/v1 composite

requirements-linting.txt pypi

flake8 *
flake8-breakpoint *
flake8-bugbear *
flake8-builtins *
flake8-docstrings *
flake8-pytest-style *
isort *

requirements-test.txt pypi

PyPDF2 * test
ipykernel * test
nbclient * test
nbconvert * test
pytest * test
pytest-xdist * test
rai_test_utils >=0.3.0 test
requests * test
scikit-learn * test

requirements.txt pypi

fpdf2 *
matplotlib *
nbformat *
numpy *
pandas <2.0.0
raiutils *
seaborn *

setup.py pypi

line.strip *

.github/workflows/python-safety-check.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/python-twine-check.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

pyproject.toml pypi
