https://github.com/ploomber/soorgeon
Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ploomber
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://ploomber.io
- Size: 517 KB
Statistics
- Stars: 79
- Watchers: 8
- Forks: 20
- Open Issues: 15
- Releases: 0
Metadata Files
README.md
Soorgeon
[!TIP] Deploy AI apps for free on Ploomber Cloud!

Convert monolithic Jupyter notebooks into Ploomber pipelines.
https://user-images.githubusercontent.com/989250/150660392-559eca67-b630-4ef2-b660-4f5ddb5a8d65.mp4
Note: Soorgeon is in alpha; help us make it better.
Install
Compatible with Python 3.7 and higher.
```sh
pip install soorgeon
```
Usage
[Optional] Testing if the notebook runs
Before refactoring, you can optionally test if the original notebook or script runs without exceptions:
```sh
# works with ipynb files
soorgeon test path/to/notebook.ipynb

# and notebooks in percent format
soorgeon test path/to/notebook.py
```
Optionally, set the path to the output notebook:
```sh
soorgeon test path/to/notebook.ipynb path/to/output.ipynb

soorgeon test path/to/notebook.py path/to/output.ipynb
```
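For readers unfamiliar with the percent format mentioned above, here is a minimal sketch of what such a script can look like. The file contents, section headings, and variable names are invented for illustration. The only convention relied on is the standard one: `# %%` starts a code cell and `# %% [markdown]` starts a Markdown cell.

```python
# %% [markdown]
# ## Load data

# %%
values = [3, 1, 2]  # stand-in for reading a real dataset

# %% [markdown]
# ## Clean data

# %%
cleaned = sorted(values)

# %% [markdown]
# ## Summarize

# %%
total = sum(cleaned)
print(total)
```

Because the cell markers are ordinary comments, the file is still a valid Python script and can be run directly with `python`.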
Refactoring
To refactor your notebook:
```sh
# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv

# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py

# use alternative serializer (cloudpickle or dill) if notebook
# contains variables that cannot be serialized using pickle
soorgeon refactor nb.ipynb --serializer cloudpickle
soorgeon refactor nb.ipynb --serializer dill
```
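To see why an alternative serializer may be needed, the sketch below uses only the standard library to check whether an object survives `pickle.dumps`. The helper `can_pickle` and the example objects are hypothetical and are not part of Soorgeon. They only illustrate the kind of variable (here, a function defined inside another function) that the default `pickle` module rejects.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_scaler(factor):
    """Build a closure; functions defined locally cannot be pickled."""
    def scale(x):
        return x * factor
    return scale


plain_data = {"rows": [1, 2, 3]}  # built-in containers pickle fine
scaler = make_scaler(2)           # local function: pickle refuses it

print(can_pickle(plain_data))
print(can_pickle(scaler))
```

If a check like this fails for a variable that is passed between notebook sections, that is the situation where `--serializer cloudpickle` or `--serializer dill` is worth trying.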
To learn more, check out our guide.
Cleaning
Soorgeon has a `clean` command that applies black to `.ipynb` and `.py` files:

```sh
soorgeon clean path/to/notebook.ipynb
```

or

```sh
soorgeon clean path/to/script.py
```
Linting
Soorgeon has a `lint` command that can apply flake8:

```sh
soorgeon lint path/to/notebook.ipynb
```

or

```sh
soorgeon lint path/to/script.py
```
Examples
```sh
git clone https://github.com/ploomber/soorgeon
```
Exploratory data analysis notebook:
```sh
cd soorgeon/examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build
```
Machine learning notebook:
```sh
cd soorgeon/examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build
```
To learn more, check out our guide.
Community
About Ploomber
Ploomber is a big community of data enthusiasts pushing the boundaries of Data Science and Machine Learning tooling.
Whatever your skillset is, you can contribute to our mission. So whether you're a beginner or an experienced professional, you're welcome to join us on this journey!
Owner
- Name: Ploomber
- Login: ploomber
- Kind: organization
- Email: contact@ploomber.io
- Website: https://ploomber.io/
- Twitter: ploomber
- Repositories: 35
- Profile: https://github.com/ploomber
We develop tools to streamline Data Science.
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Eduardo Blancas Reyes | g****b@b****o | 223 |
| shuyang | 9****n@m****m | 25 |
| Xilin | 3****4 | 7 |
| e1ha | h****6@g****m | 5 |
| Ido M | m****o@g****m | 5 |
| dependabot[bot] | 4****] | 3 |
| Rod | r****h@f****m | 3 |
| Neelasha Sen | n****n@g****m | 3 |
| Daniel Blancas | e****s@g****m | 2 |
| grnnja | g****a@g****m | 1 |
| WSShawn | 5****n | 1 |
| Jose Ramirez | j****7 | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 60
- Total pull requests: 33
- Average time to close issues: 2 months
- Average time to close pull requests: 27 days
- Total issue authors: 5
- Total pull request authors: 12
- Average comments per issue: 2.48
- Average comments per pull request: 1.7
- Merged pull requests: 29
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- edublancas (51)
- idomic (6)
- grnnja (1)
- Createdd (1)
- Wxl19980214 (1)
Pull Request Authors
- Wxl19980214 (9)
- 94rain (7)
- rrhg (4)
- neelasha23 (4)
- dependabot[bot] (3)
- edublancas (2)
- edblancas (2)
- grnnja (1)
- idomic (1)
- WSShawn (1)
- jramirez857 (1)
- e1ha (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 1,592 last month
- Total dependent packages: 0
- Total dependent repositories: 4
- Total versions: 20
- Total maintainers: 2
pypi.org: soorgeon
Convert monolithic Jupyter notebooks into maintainable pipelines.
- Homepage: https://github.com/ploomber/soorgeon
- Documentation: https://soorgeon.readthedocs.io/
- License: apache-2.0
- Latest release: 0.0.20 (published over 1 year ago)
Dependencies
- Jinja2 ==3.0.3
- Keras-Preprocessing ==1.1.2
- Markdown ==3.3.6
- MarkupSafe ==2.0.1
- Pillow ==9.0.1
- PyYAML ==6.0
- Werkzeug ==2.0.3
- absl-py ==1.0.0
- astunparse ==1.6.3
- bleach ==4.1.0
- bokeh ==2.4.2
- cachetools ==5.0.0
- certifi ==2021.10.8
- charset-normalizer ==2.0.12
- colorcet ==3.0.0
- cycler ==0.11.0
- flatbuffers ==2.0
- fonttools ==4.29.1
- gast ==0.4.0
- google-auth ==2.6.0
- google-auth-oauthlib ==0.4.6
- google-pasta ==0.2.0
- grpcio ==1.43.0
- h5py ==3.6.0
- holoviews ==1.14.7
- hvplot ==0.7.3
- idna ==3.3
- importlib-metadata ==4.11.0
- joblib ==1.1.0
- keras ==2.7.0
- kiwisolver ==1.3.2
- libclang ==13.0.0
- matplotlib ==3.5.1
- numpy ==1.22.0
- oauthlib ==3.2.0
- opt-einsum ==3.3.0
- packaging ==21.3
- pandas ==1.4.1
- panel ==0.12.6
- param ==1.12.0
- patsy ==0.5.2
- plotly ==5.6.0
- protobuf ==3.19.4
- pyasn1 ==0.4.8
- pyasn1-modules ==0.2.8
- pyct ==0.4.8
- pyparsing ==3.0.7
- python-dateutil ==2.8.2
- pytz ==2021.3
- pyviz-comms ==2.1.0
- requests ==2.27.1
- requests-oauthlib ==1.3.1
- rsa ==4.8
- scikit-learn ==1.0.2
- scipy ==1.7.3
- six ==1.16.0
- statsmodels ==0.12.0
- tenacity ==8.0.1
- tensorboard ==2.8.0
- tensorboard-data-server ==0.6.1
- tensorboard-plugin-wit ==1.8.1
- tensorflow ==2.7.2
- tensorflow-estimator ==2.7.0
- tensorflow-io-gcs-filesystem ==0.24.0
- termcolor ==1.1.0
- threadpoolctl ==3.1.0
- tornado ==6.1
- tqdm ==4.62.3
- typing_extensions ==4.1.0
- urllib3 ==1.26.8
- webencodings ==0.5.1
- wrapt ==1.13.3
- yellowbrick ==1.4
- zipp ==3.7.0
- hvplot *
- plotly *
- scipy ==1.7.3
- statsmodels ==0.12
- tensorflow *
- yellowbrick *
- actions/checkout v2 composite
- actions/setup-python v2 composite