https://github.com/ploomber/soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ○
    .zenodo.json file
  • ○
    DOI references
  • ○
    Academic publication links
  • ○
    Committers with academic emails
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Keywords

data-engineering data-science jupyter jupyter-notebooks machine-learning mlops workflow

Keywords from Contributors

sequences projection interactive serializer measurement cycles packaging charts network-simulation archival
Last synced: 5 months ago

Repository

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊

Basic Info
  • Host: GitHub
  • Owner: ploomber
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://ploomber.io
  • Size: 517 KB
Statistics
  • Stars: 79
  • Watchers: 8
  • Forks: 20
  • Open Issues: 15
  • Releases: 0
Archived
Topics
data-engineering data-science jupyter jupyter-notebooks machine-learning mlops workflow
Created over 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Contributing License

README.md

Soorgeon

[!TIP] Deploy AI apps for free on Ploomber Cloud!

Convert monolithic Jupyter notebooks into Ploomber pipelines.

https://user-images.githubusercontent.com/989250/150660392-559eca67-b630-4ef2-b660-4f5ddb5a8d65.mp4

3-minute video tutorial.

Note: Soorgeon is in alpha, help us make it better.

Install

Compatible with Python 3.7 and higher.

```sh
pip install soorgeon
```

Usage

[Optional] Testing if the notebook runs

Before refactoring, you can optionally test if the original notebook or script runs without exceptions:

```sh
# works with ipynb files
soorgeon test path/to/notebook.ipynb

# and notebooks in percent format
soorgeon test path/to/notebook.py
```
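
For reference, a notebook in percent format is a plain Python script whose cells are delimited by `# %%` comments, a convention used by Jupytext and several editors. The sketch below is illustrative only: the headings, variable names, and data are made up and do not come from the Soorgeon examples.

```python
# %% [markdown]
# ## Load data

# %%
# Illustrative data; a real notebook would typically read a file here.
values = [1, 2, 3, 4]

# %% [markdown]
# ## Summarize

# %%
total = sum(values)
mean = total / len(values)
print(f"total={total}, mean={mean}")
```

Because the file is ordinary Python, it can also be run directly with the interpreter; the `# %%` markers are just comments.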

Optionally, set the path to the output notebook:

```sh
soorgeon test path/to/notebook.ipynb path/to/output.ipynb

soorgeon test path/to/notebook.py path/to/output.ipynb
```
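
If you have several notebooks to check, the two invocations above can be generated in a loop. This is a minimal sketch, not part of Soorgeon: `build_test_commands`, the `-output.ipynb` naming scheme, and the file names are hypothetical choices made for illustration. It only builds and prints the commands.

```python
import shlex
from pathlib import Path


def build_test_commands(notebooks, output_dir=None):
    """Return one `soorgeon test` command (as an argument list) per notebook."""
    commands = []
    for nb in map(Path, notebooks):
        cmd = ["soorgeon", "test", str(nb)]
        if output_dir is not None:
            # Optional second argument: path to the output notebook.
            # The "<name>-output.ipynb" pattern is an assumption of this sketch.
            out = Path(output_dir) / f"{nb.stem}-output.ipynb"
            cmd.append(str(out))
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_test_commands(["eda.ipynb", "train.py"], output_dir="tested"):
        # Print only; pass each list to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Keeping each command as a list of arguments avoids shell quoting problems when a notebook path contains spaces.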

Refactoring

To refactor your notebook:

```sh
# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv

# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py

# use alternative serializer (cloudpickle or dill) if notebook
# contains variables that cannot be serialized using pickle
soorgeon refactor nb.ipynb --serializer cloudpickle
soorgeon refactor nb.ipynb --serializer dill
```
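
To see why the `--serializer` option can matter, it helps to know what the standard `pickle` module rejects. The sketch below uses only the standard library and does not call Soorgeon; `can_pickle` is a hypothetical helper written for this illustration. Libraries such as cloudpickle and dill are designed to handle objects like the lambda shown here.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Plain data structures serialize without trouble.
print(can_pickle({"rows": [1, 2, 3]}))  # True

# pickle stores functions by importable name; a lambda has none, so it fails.
square = lambda x: x ** 2
print(can_pickle(square))  # False
```

If a notebook keeps lambdas, locally defined functions, or similar objects in variables that one section creates and a later section uses, that is the situation the alternative serializers are meant for.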

To learn more, check out our guide.

Cleaning

Soorgeon has a clean command that applies black to .ipynb and .py files:

soorgeon clean path/to/notebook.ipynb

or

soorgeon clean path/to/script.py

Linting

Soorgeon has a lint command that applies flake8:

soorgeon lint path/to/notebook.ipynb

or

soorgeon lint path/to/script.py
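
The clean and lint commands can be chained over several files. The following is a rough sketch rather than an official workflow: `clean_and_lint` is a hypothetical helper, and running clean before lint is an assumption (formatting first tends to reduce style warnings). By default it only returns the commands it would run.

```python
import shutil
import subprocess


def clean_and_lint(paths, dry_run=True):
    """Run `soorgeon clean` then `soorgeon lint` on each path.

    Returns the commands in the order they were (or would be) run.
    """
    commands = []
    for path in paths:
        for action in ("clean", "lint"):
            commands.append(["soorgeon", action, str(path)])
    if not dry_run:
        if shutil.which("soorgeon") is None:
            raise RuntimeError("soorgeon is not installed or not on PATH")
        for cmd in commands:
            # check=False: this sketch does not assume which exit code
            # `soorgeon lint` returns when it reports issues.
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    for cmd in clean_and_lint(["path/to/notebook.ipynb", "path/to/script.py"]):
        print(" ".join(cmd))
```

Call it with `dry_run=False` to execute the commands once Soorgeon is installed.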

Examples

```sh
git clone https://github.com/ploomber/soorgeon
```

Exploratory data analysis notebook:

```sh
cd soorgeon/examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build
```

Machine learning notebook:

```sh
cd soorgeon/examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build
```

To learn more, check out our guide.

Community

About Ploomber

Ploomber is a big community of data enthusiasts pushing the boundaries of Data Science and Machine Learning tooling.

Whatever your skillset is, you can contribute to our mission. So whether you're a beginner or an experienced professional, you're welcome to join us on this journey!

Click here to learn how you can contribute to Ploomber.

Owner

  • Name: Ploomber
  • Login: ploomber
  • Kind: organization
  • Email: contact@ploomber.io

We develop tools to streamline Data Science.

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 279
  • Total Committers: 12
  • Avg Commits per committer: 23.25
  • Development Distribution Score (DDS): 0.201
Past Year
  • Commits: 4
  • Committers: 1
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Eduardo Blancas Reyes | g****b@b****o | 223 |
| shuyang | 9****n@m****m | 25 |
| Xilin | 3****4 | 7 |
| e1ha | h****6@g****m | 5 |
| Ido M | m****o@g****m | 5 |
| dependabot[bot] | 4****] | 3 |
| Rod | r****h@f****m | 3 |
| Neelasha Sen | n****n@g****m | 3 |
| Daniel Blancas | e****s@g****m | 2 |
| grnnja | g****a@g****m | 1 |
| WSShawn | 5****n | 1 |
| Jose Ramirez | j****7 | 1 |
Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 60
  • Total pull requests: 33
  • Average time to close issues: 2 months
  • Average time to close pull requests: 27 days
  • Total issue authors: 5
  • Total pull request authors: 12
  • Average comments per issue: 2.48
  • Average comments per pull request: 1.7
  • Merged pull requests: 29
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • edublancas (51)
  • idomic (6)
  • grnnja (1)
  • Createdd (1)
  • Wxl19980214 (1)
Pull Request Authors
  • Wxl19980214 (9)
  • 94rain (7)
  • rrhg (4)
  • neelasha23 (4)
  • dependabot[bot] (3)
  • edublancas (2)
  • edblancas (2)
  • grnnja (1)
  • idomic (1)
  • WSShawn (1)
  • jramirez857 (1)
  • e1ha (1)
Top Labels
Issue Labels
good first issue (6) low priority (3) med effort (2) high priority (2) enhancement (2) bug (1) low effort (1)
Pull Request Labels
dependencies (3)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 1,592 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 4
  • Total versions: 20
  • Total maintainers: 2
pypi.org: soorgeon

Convert monolithic Jupyter notebooks into maintainable pipelines.

  • Versions: 20
  • Dependent Packages: 0
  • Dependent Repositories: 4
  • Downloads: 1,592 Last month
Rankings
Dependent repos count: 7.6%
Stargazers count: 8.1%
Forks count: 8.4%
Average: 9.0%
Dependent packages count: 9.6%
Downloads: 11.4%
Maintainers (2)
Last synced: 6 months ago

Dependencies

_kaggle/requirements.lock.txt pypi
  • Jinja2 ==3.0.3
  • Keras-Preprocessing ==1.1.2
  • Markdown ==3.3.6
  • MarkupSafe ==2.0.1
  • Pillow ==9.0.1
  • PyYAML ==6.0
  • Werkzeug ==2.0.3
  • absl-py ==1.0.0
  • astunparse ==1.6.3
  • bleach ==4.1.0
  • bokeh ==2.4.2
  • cachetools ==5.0.0
  • certifi ==2021.10.8
  • charset-normalizer ==2.0.12
  • colorcet ==3.0.0
  • cycler ==0.11.0
  • flatbuffers ==2.0
  • fonttools ==4.29.1
  • gast ==0.4.0
  • google-auth ==2.6.0
  • google-auth-oauthlib ==0.4.6
  • google-pasta ==0.2.0
  • grpcio ==1.43.0
  • h5py ==3.6.0
  • holoviews ==1.14.7
  • hvplot ==0.7.3
  • idna ==3.3
  • importlib-metadata ==4.11.0
  • joblib ==1.1.0
  • keras ==2.7.0
  • kiwisolver ==1.3.2
  • libclang ==13.0.0
  • matplotlib ==3.5.1
  • numpy ==1.22.0
  • oauthlib ==3.2.0
  • opt-einsum ==3.3.0
  • packaging ==21.3
  • pandas ==1.4.1
  • panel ==0.12.6
  • param ==1.12.0
  • patsy ==0.5.2
  • plotly ==5.6.0
  • protobuf ==3.19.4
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pyct ==0.4.8
  • pyparsing ==3.0.7
  • python-dateutil ==2.8.2
  • pytz ==2021.3
  • pyviz-comms ==2.1.0
  • requests ==2.27.1
  • requests-oauthlib ==1.3.1
  • rsa ==4.8
  • scikit-learn ==1.0.2
  • scipy ==1.7.3
  • six ==1.16.0
  • statsmodels ==0.12.0
  • tenacity ==8.0.1
  • tensorboard ==2.8.0
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • tensorflow ==2.7.2
  • tensorflow-estimator ==2.7.0
  • tensorflow-io-gcs-filesystem ==0.24.0
  • termcolor ==1.1.0
  • threadpoolctl ==3.1.0
  • tornado ==6.1
  • tqdm ==4.62.3
  • typing_extensions ==4.1.0
  • urllib3 ==1.26.8
  • webencodings ==0.5.1
  • wrapt ==1.13.3
  • yellowbrick ==1.4
  • zipp ==3.7.0
_kaggle/requirements.txt pypi
  • hvplot *
  • plotly *
  • scipy ==1.7.3
  • statsmodels ==0.12
  • tensorflow *
  • yellowbrick *
.github/workflows/ci.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
pyproject.toml pypi
setup.py pypi