https://github.com/sematic-ai/sematic
An open-source ML pipeline development platform
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.7%) to scientific vocabulary
Statistics
- Stars: 991
- Watchers: 12
- Forks: 63
- Open Issues: 131
- Releases: 61
Metadata Files
README.md

The open-source Continuous Machine Learning Platform
Build ML pipelines with only Python, run on your laptop, or in the cloud.

Sematic is an open-source ML development platform. It lets ML engineers and data scientists write arbitrarily complex end-to-end pipelines in simple Python and execute them on their local machine, in a cloud VM, or on a Kubernetes cluster to leverage cloud resources.
Sematic is based on learnings gathered at top self-driving car companies. It enables chaining data processing jobs (e.g. Apache Spark) with model training (e.g. PyTorch, TensorFlow), or any other arbitrary Python business logic, into type-safe, traceable, reproducible end-to-end pipelines that can be monitored and visualized in a modern web dashboard.
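The paragraph above describes pipeline steps as type-safe and traceable. The following is a hedged, stdlib-only illustration of those two ideas; it is not Sematic's SDK. The sketch assumes a hypothetical `traced_step` decorator. The decorator checks arguments and return values against a function's type hints at run time, then appends each call's inputs and output to an in-memory `LINEAGE` list. `StepRecord`, `LINEAGE`, and the three example steps are also invented for this sketch.

```python
import inspect
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, Dict, List, get_type_hints


@dataclass
class StepRecord:
    """One lineage entry: which step ran, its bound inputs, and its output."""
    step: str
    inputs: Dict[str, Any]
    output: Any


# Hypothetical in-memory lineage log; a real platform would persist it.
LINEAGE: List[StepRecord] = []


def _check(label: str, value: Any, expected: Any) -> None:
    # Only plain classes are enforced; typing generics such as typing.List[int] are skipped.
    if isinstance(expected, type) and not isinstance(value, expected):
        raise TypeError(
            f"{label}: expected {expected.__name__}, got {type(value).__name__}"
        )


def traced_step(func: Callable) -> Callable:
    """Hypothetical decorator (not Sematic's SDK): type-check and record each call."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        # Fail early: validate inputs before the step body runs.
        for name, value in bound.arguments.items():
            if name in hints:
                _check(f"{func.__name__}({name})", value, hints[name])
        result = func(*bound.args, **bound.kwargs)
        if "return" in hints:
            _check(f"{func.__name__} return", result, hints["return"])
        # Lineage: record (here, just append) the step's inputs and output.
        LINEAGE.append(StepRecord(func.__name__, dict(bound.arguments), result))
        return result

    return wrapper


@traced_step
def load_data(n: int) -> list:
    return list(range(n))


@traced_step
def total(values: list) -> int:
    return sum(values)


@traced_step
def pipeline(n: int) -> int:
    # Steps chain like ordinary Python calls; the nested steps are recorded too.
    return total(load_data(n))
```

Calling `pipeline(4)` returns 6. It leaves three records in `LINEAGE`, in the order `load_data`, `total`, `pipeline`, because a parent step finishes after its children. Calling `pipeline("4")` raises `TypeError` before any step body runs.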
Read our documentation and join our Discord channel.
Why Sematic
- Easy onboarding – no deployment or infrastructure needed to get started, simply install Sematic locally and start exploring.
- Local-to-cloud parity – run the same code on your local laptop and on your Kubernetes cluster.
- End-to-end traceability – all pipeline artifacts are persisted, tracked, and visualizable in a web dashboard.
- Access heterogeneous compute – customize required resources for each pipeline step to optimize your performance and cloud footprint (CPUs, memory, GPUs, Spark cluster, etc.).
- Reproducibility – rerun your pipelines from the UI with guaranteed reproducibility of results.
Getting Started
To get started locally, simply install Sematic in your Python environment:
```shell
$ pip install sematic
```
Start the local web dashboard:
```shell
$ sematic start
```
Run an example pipeline:
```shell
$ sematic run examples/mnist/pytorch
```
Create a new boilerplate project:
```shell
$ sematic new my_new_project
```
Or from an existing example:
```shell
$ sematic new my_new_project --from examples/mnist/pytorch
```
Then run it with:
```shell
$ python3 -m my_new_project
```
To deploy Sematic to Kubernetes and leverage cloud resources, see our documentation.
Features
- Lightweight Python SDK – define arbitrarily complex end-to-end pipelines
- Pipeline nesting – arbitrarily nest pipelines into larger pipelines
- Dynamic graphs – Python-defined graphs allow for iterations, conditional branching, etc.
- Lineage tracking – all inputs and outputs of all steps are persisted and tracked
- Runtime type checking – fail early when a step's inputs or outputs do not match its type annotations
- Web dashboard – monitor, track, and visualize pipelines in a modern web UI
- Artifact visualization – visualize all inputs and outputs of all steps in the web dashboard
- Local execution – run pipelines on your local machine without any deployment necessary
- Cloud orchestration – run pipelines on Kubernetes to access GPUs and other cloud resources
- Heterogeneous compute resources – run different steps on different machines (e.g. CPUs, memory, GPU, Spark, etc.)
- Helm chart deployment – install Sematic on your Kubernetes cluster
- Pipeline reruns – rerun pipelines from the UI from an arbitrary point in the graph
- Step caching – cache expensive pipeline steps for faster iteration
- Step retry – recover from transient failures with step retries
- Metadata and collaboration – tags, source code visualization, docstrings, notes, etc.
- Numerous integrations – see below
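Step caching and step retry, listed above, both wrap a single step call. The following is a minimal, stdlib-only sketch of one common way to implement them; it is not Sematic's implementation. The sketch assumes a hypothetical `cached_retrying_step` decorator. The decorator keys an in-memory cache on a SHA-256 hash of the pickled inputs and retries on a caller-chosen exception type with optional exponential backoff. The decorator name, its parameters, and the `flaky_square` example are invented for illustration.

```python
import hashlib
import pickle
import time
from functools import wraps
from typing import Any, Callable, Dict, Tuple, Type


def cached_retrying_step(
    retries: int = 2,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    backoff_seconds: float = 0.0,
) -> Callable[[Callable], Callable]:
    """Hypothetical decorator (not Sematic's API): cache by input hash, retry transient errors."""
    cache: Dict[str, Any] = {}  # in-memory only; a real platform would persist artifacts

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Cache key: the step's qualified name plus a hash of its pickled inputs.
            payload = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            key = hashlib.sha256(payload).hexdigest()
            if key in cache:
                return cache[key]  # cache hit: skip the expensive step entirely
            for attempt in range(retries + 1):
                try:
                    result = func(*args, **kwargs)
                    break
                except retry_on:
                    if attempt == retries:
                        raise  # out of retries: surface the last error
                    # Exponential backoff between attempts (0 s by default).
                    time.sleep(backoff_seconds * (2 ** attempt))
            cache[key] = result
            return result

        return wrapper

    return decorator


calls = {"count": 0}


@cached_retrying_step(retries=2)
def flaky_square(x: int) -> int:
    # Hypothetical step: fails twice with a transient error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return x * x
```

The first call, `flaky_square(5)`, returns 25 after three attempts. A second `flaky_square(5)` is served from the cache without incrementing `calls["count"]`. Hashing pickled bytes keeps the sketch short, but it has a caveat: equal objects that pickle differently get different keys. A production cache would need a more deliberate serialization.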
Integrations
- Apache Spark – on-demand Spark clusters inside your Kubernetes cluster
- Ray – on-demand in-cluster Ray resources
- Snowflake – easily query your data warehouse (other warehouses supported too)
- Plotly, Matplotlib – visualize plot artifacts in the web dashboard
- Pandas – visualize dataframe artifacts in the dashboard
- Grafana – embed Grafana panels in the web dashboard
- Bazel – integrate with your Bazel build system
- Helm chart – deploy to Kubernetes with our Helm chart
- Git – track Git information in the web dashboard
Community and resources
Learn more about Sematic and get in touch with the following resources:
Contribute!
To contribute to Sematic, check out open issues tagged "good first issue", and get in touch with us on Discord. You can find instructions on how to get your development environment set up in our developer docs. If you'd like to add an example, you may also find this guide helpful.

Owner
- Name: Sematic
- Login: sematic-ai
- Kind: organization
- Location: United States of America
- Website: https://sematic.dev
- Twitter: SematicAI
- Repositories: 5
- Profile: https://github.com/sematic-ai
Prototype-to-production ML in days, not weeks.
GitHub Events
Total
- Create event: 25
- Issues event: 10
- Release event: 1
- Watch event: 36
- Delete event: 27
- Issue comment event: 5
- Push event: 58
- Pull request review comment event: 14
- Pull request event: 26
- Pull request review event: 27
- Fork event: 3
Last Year
- Create event: 25
- Issues event: 10
- Release event: 1
- Watch event: 36
- Delete event: 27
- Issue comment event: 5
- Push event: 58
- Pull request review comment event: 14
- Pull request event: 26
- Pull request review event: 27
- Fork event: 3
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Emmanuel Turlay | e****l@s****i | 393 |
| augray | a****y | 267 |
| tscurtu | t****r@s****v | 172 |
| Chance An | a****i@g****m | 90 |
| chance-sematic | 1****c | 59 |
| Sash Nagarkar | s****5@g****m | 30 |
| Jai Chopra | j****a@g****m | 16 |
| Kamalesh Palanisamy | k****0@g****m | 5 |
| Kaushil Kundalia | 3****4 | 4 |
| Vinay Varma | v****9@g****m | 2 |
| Emmanuel Turlay | e****y@e****n | 1 |
| Aaron Roney | t****x@g****m | 1 |
| Anurag Kanungo | 4****o | 1 |
| Erik Kandalík | 3****k | 1 |
| Matteo Destro | m****t@g****m | 1 |
| idow09 | i****9@g****m | 1 |
| jmalicki | j****i@g****m | 1 |
| v-pwais | 1****s | 1 |
| Brian Calvert | b****n@g****m | 1 |
| KatkaG | k****a@g****m | 1 |
| Siddharth Gupta | s****s@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 100
- Total pull requests: 129
- Average time to close issues: 7 months
- Average time to close pull requests: 16 days
- Total issue authors: 12
- Total pull request authors: 14
- Average comments per issue: 0.54
- Average comments per pull request: 0.26
- Merged pull requests: 117
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 19
- Average time to close issues: about 1 month
- Average time to close pull requests: 3 days
- Issue authors: 1
- Pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 17
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- augray (65)
- tscurtu (14)
- neutralino1 (6)
- nvinayvarma189 (2)
- eafpres (2)
- snoshy (2)
- labeldevops (2)
- allenwang-git (1)
- aapope (1)
- jaichopra (1)
- kenziehong (1)
- chance-sematic (1)
Pull Request Authors
- augray (64)
- neutralino1 (59)
- jaichopra (6)
- chance-sematic (5)
- pwais (4)
- snoshy (3)
- tscurtu (3)
- ZPerling (2)
- v-pwais (2)
- ayush9096 (2)
- bcalvert-graft (2)
- swastiksadyal (2)
- nvinayvarma189 (2)
Packages
- Total packages: 2
- Total downloads: pypi, 157 last month
- Total docker downloads: 9,044
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 0 (may contain duplicates)
- Total versions: 139
- Total maintainers: 5
proxy.golang.org: github.com/sematic-ai/sematic
- Documentation: https://pkg.go.dev/github.com/sematic-ai/sematic#section-documentation
- License: apache-2.0
- Latest release: v0.41.0 (published about 1 year ago)
pypi.org: sematic
Sematic ML orchestration tool
- Documentation: https://sematic.readthedocs.io/
- License: apache-2.0
- Latest release: 0.41.0 (published about 1 year ago)
Maintainers (5)
Dependencies
- 1344 dependencies
- @types/dagre ^0.7.47 development
- @types/plotly.js ^1.54.22 development
- @types/react-copy-to-clipboard ^5.0.2 development
- @types/react-plotly.js ^2.5.0 development
- @types/react-syntax-highlighter ^15.5.1 development
- @emotion/react ^11.9.0
- @emotion/styled ^11.8.1
- @fontsource/roboto ^4.5.5
- @glideapps/glide-data-grid ^4.1.0
- @mui/icons-material ^5.6.2
- @mui/lab ^5.0.0-alpha.82
- @mui/material ^5.7.0
- @testing-library/jest-dom ^5.16.4
- @testing-library/react ^13.2.0
- @testing-library/user-event ^13.5.0
- @types/jest ^27.5.0
- @types/node ^16.11.34
- @types/react ^18.0.9
- @types/react-dom ^18.0.3
- dagre ^0.8.5
- javascript-time-ago ^2.3.13
- plotly.js-cartesian-dist ^2.12.1
- react ^18.1.0
- react-copy-to-clipboard ^5.1.0
- react-dom ^18.1.0
- react-error-boundary ^3.1.4
- react-flow-renderer ^10.3.1
- react-icons ^4.4.0
- react-markdown ^8.0.3
- react-medium-image-zoom ^4.4.3
- react-plotly.js ^2.5.1
- react-router-dom ^6.3.0
- react-scripts 5.0.1
- react-syntax-highlighter ^15.5.0
- react-time-ago ^7.1.9
- socket.io-client ^4.5.1
- source-map-explorer ^2.5.2
- typescript ^4.6.4
- web-vitals ^2.1.4
- boto3-stubs *
- data-science-types *
- docutils ==0.18.1
- flake8 *
- flask *
- kubernetes-stubs *
- m2r *
- mistune ==0.8.4
- mypy >=0.950
- pandas-stubs *
- pip-tools *
- pytest *
- snowflake-connector-python *
- sqlalchemy *
- types-PyYAML *
- types-psycopg2 *
- types-python-dateutil *
- types-requests *
- myst-parser *
- sphinx *
- sphinx-press-theme *
- SQLAlchemy ==1.4.36
- boto3 *
- click *
- cloudpickle *
- flask *
- flask-cors *
- flask-socketio *
- gunicorn *
- ipython ==8.2.0
- kubernetes *
- matplotlib *
- numpy *
- pandas *
- pandas-stubs *
- plotly *
- psycopg2-binary *
- pyarrow *
- pytest ==7.1.1
- python-dateutil *
- pyyaml *
- requests *
- scikit-learn *
- seaborn *
- setuptools ==58.1.0
- snowflake-connector-python *
- statsmodels *
- testing-postgresql *
- torch *
- torchmetrics *
- torchvision *
- werkzeug *
- xgboost *
- 105 dependencies
- matplotlib *
- numpy *
- pandas *
- seaborn *
- sklearn *
- statsmodels *
- xgboost *
- pandas *
- plotly *
- sklearn *
- torch *
- torchmetrics *
- torchvision *
- pandas *
- pyarrow *
- snowflake-connector-python *