https://github.com/sematic-ai/sematic

An open-source ML pipeline development platform

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Keywords

ai data-science machine-learning ml ml-ops ml-pipeline ml-pipelines mlops pipeline python python3
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: sematic-ai
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 20.2 MB
Statistics
  • Stars: 991
  • Watchers: 12
  • Forks: 63
  • Open Issues: 131
  • Releases: 61
Created almost 4 years ago · Last pushed about 1 year ago
Metadata Files
Readme, License, Code of conduct

README.md

The open-source Continuous Machine Learning Platform

Build ML pipelines with only Python. Run them on your laptop or in the cloud.

Supported Python versions: 3.9, 3.10, 3.11, 3.12, 3.13

Sematic is an open-source ML development platform. It lets ML Engineers and Data Scientists write arbitrarily complex end-to-end pipelines with simple Python and execute them on their local machine, in a cloud VM, or on a Kubernetes cluster to leverage cloud resources.

Sematic is based on lessons learned at top self-driving car companies. It enables chaining data processing jobs (e.g. Apache Spark) with model training (e.g. PyTorch, TensorFlow), or any other arbitrary Python business logic, into type-safe, traceable, reproducible end-to-end pipelines that can be monitored and visualized in a modern web dashboard.

Read our documentation and join our Discord channel.

Why Sematic

  • Easy onboarding – no deployment or infrastructure is needed to get started; simply install Sematic locally and start exploring.
  • Local-to-cloud parity – run the same code on your laptop and on your Kubernetes cluster.
  • End-to-end traceability – all pipeline artifacts are persisted, tracked, and visualizable in a web dashboard.
  • Access heterogeneous compute – customize required resources for each pipeline step to optimize your performance and cloud footprint (CPUs, memory, GPUs, Spark cluster, etc.).
  • Reproducibility – rerun your pipelines from the UI with guaranteed reproducibility of results.
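
To make the traceability point concrete, here is a minimal, self-contained sketch of tracking the inputs and output of every step in a chained pipeline. It does not use the Sematic SDK: `Tracker`, `RunRecord`, and the three step functions are hypothetical names invented for illustration, and an in-memory list stands in for persistent artifact storage.

```python
import functools
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class RunRecord:
    """One executed step: its name, inputs, and output (hypothetical structure)."""
    name: str
    inputs: Dict[str, Any]
    output: Any


@dataclass
class Tracker:
    """In-memory stand-in for a persistent artifact store (an assumption)."""
    records: List[RunRecord] = field(default_factory=list)

    def step(self, func: Callable) -> Callable:
        """Wrap a function so that every call is recorded after it returns."""
        @functools.wraps(func)
        def wrapper(**kwargs: Any) -> Any:
            output = func(**kwargs)
            self.records.append(RunRecord(func.__name__, dict(kwargs), output))
            return output
        return wrapper


tracker = Tracker()


@tracker.step
def load_data(size: int) -> List[int]:
    return list(range(size))


@tracker.step
def train(data: List[int], learning_rate: float) -> float:
    # Placeholder "model": a scaled mean of the data.
    return learning_rate * sum(data) / len(data)


@tracker.step
def pipeline(size: int, learning_rate: float) -> float:
    # The outer pipeline is itself a tracked step that chains the inner ones.
    data = load_data(size=size)
    return train(data=data, learning_rate=learning_rate)


result = pipeline(size=4, learning_rate=0.5)
for record in tracker.records:
    print(record.name, record.inputs, "->", record.output)
```

Inner steps are recorded before the outer pipeline because a record is appended only once a call returns. A real platform would persist these records and key them by run, which this sketch leaves out.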

Getting Started

To get started locally, simply install Sematic in your Python environment:

    $ pip install sematic

Start the local web dashboard:

    $ sematic start

Run an example pipeline:

    $ sematic run examples/mnist/pytorch

Create a new boilerplate project:

    $ sematic new my_new_project

Or from an existing example:

    $ sematic new my_new_project --from examples/mnist/pytorch

Then run it with:

    $ python3 -m my_new_project

To deploy Sematic to Kubernetes and leverage cloud resources, see our documentation.

Features

  • Lightweight Python SDK – define arbitrarily complex end-to-end pipelines
  • Pipeline nesting – arbitrarily nest pipelines into larger pipelines
  • Dynamic graphs – Python-defined graphs allow for iterations, conditional branching, etc.
  • Lineage tracking – all inputs and outputs of all steps are persisted and tracked
  • Runtime type-checking – fail early when values do not match their declared types
  • Web dashboard – monitor, track, and visualize pipelines in a modern web UI
  • Artifact visualization – visualize all inputs and outputs of all steps in the web dashboard
  • Local execution – run pipelines on your local machine without any deployment necessary
  • Cloud orchestration – run pipelines on Kubernetes to access GPUs and other cloud resources
  • Heterogeneous compute resources – run different steps on different machines (CPUs, memory, GPUs, Spark, etc.)
  • Helm chart deployment – install Sematic on your Kubernetes cluster
  • Pipeline reruns – rerun pipelines from the UI from an arbitrary point in the graph
  • Step caching – cache expensive pipeline steps for faster iteration
  • Step retry – recover from transient failures with step retries
  • Metadata and collaboration – tags, source code visualization, docstrings, notes, etc.
  • Numerous integrations – see the Integrations section below
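
Three of the features above (runtime type-checking, step caching, and step retry) can be illustrated with a small plain-Python decorator. This is a sketch of the general ideas only, not Sematic's implementation or API: `checked_step`, its `cache` and `retries` parameters, and the choice to treat `RuntimeError` as a transient failure are all assumptions made for the example. It only checks parameters annotated with plain classes.

```python
import functools
import inspect
from typing import Callable, Dict, Tuple


def checked_step(cache: bool = False, retries: int = 0) -> Callable:
    """Hypothetical decorator: type-check arguments, cache results, retry failures."""

    def decorate(func: Callable) -> Callable:
        signature = inspect.signature(func)
        results: Dict[Tuple, object] = {}  # in-memory cache keyed by arguments

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            # Fail early: compare each argument with its annotation before running.
            for name, value in bound.arguments.items():
                expected = signature.parameters[name].annotation
                if expected is inspect.Parameter.empty or not isinstance(expected, type):
                    continue  # unannotated or non-class annotation: skip the check
                if not isinstance(value, expected):
                    raise TypeError(
                        f"{func.__name__}: '{name}' expected {expected.__name__}, "
                        f"got {type(value).__name__}"
                    )
            key = tuple(sorted(bound.arguments.items()))
            if cache and key in results:
                return results[key]  # cache hit: skip the expensive call
            attempt = 0
            while True:
                try:
                    output = func(*args, **kwargs)
                    break
                except RuntimeError:
                    # Assumed transient; re-raise once the retry budget is spent.
                    attempt += 1
                    if attempt > retries:
                        raise
            if cache:
                results[key] = output
            return output

        return wrapper

    return decorate


calls = {"count": 0}


@checked_step(cache=True, retries=2)
def scale(value: float, factor: int) -> float:
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("simulated transient failure")
    return value * factor


print(scale(1.5, 2))   # first attempt fails, the retry succeeds
print(scale(1.5, 2))   # served from the cache, the function body is not re-run
print(calls["count"])  # number of times the function body actually ran
```

Calling `scale("1.5", 2)` raises `TypeError` before the function body runs, which is the fail-early behavior the feature list describes.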

Integrations

  • Apache Spark – on-demand in-cluster Spark cluster
  • Ray – on-demand in-cluster Ray resources
  • Snowflake – easily query your data warehouse (other warehouses supported too)
  • Plotly, Matplotlib – visualize plot artifacts in the web dashboard
  • Pandas – visualize dataframe artifacts in the dashboard
  • Grafana – embed Grafana panels in the web dashboard
  • Bazel – integrate with your Bazel build system
  • Helm chart – deploy to Kubernetes with our Helm chart
  • Git – track git information in the web dashboard

Community and resources

Learn more about Sematic and get in touch through our documentation and our Discord channel.

Contribute!

To contribute to Sematic, check out open issues tagged "good first issue", and get in touch with us on Discord. You can find instructions on how to get your development environment set up in our developer docs. If you'd like to add an example, you may also find this guide helpful.

Owner

  • Name: Sematic
  • Login: sematic-ai
  • Kind: organization
  • Location: United States of America

Prototype-to-production ML in days not weeks.

GitHub Events

Total
  • Create event: 25
  • Issues event: 10
  • Release event: 1
  • Watch event: 36
  • Delete event: 27
  • Issue comment event: 5
  • Push event: 58
  • Pull request review comment event: 14
  • Pull request event: 26
  • Pull request review event: 27
  • Fork event: 3
Last Year
  • Create event: 25
  • Issues event: 10
  • Release event: 1
  • Watch event: 36
  • Delete event: 27
  • Issue comment event: 5
  • Push event: 58
  • Pull request review comment event: 14
  • Pull request event: 26
  • Pull request review event: 27
  • Fork event: 3

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 1,049
  • Total Committers: 21
  • Avg Commits per committer: 49.952
  • Development Distribution Score (DDS): 0.625
Past Year
  • Commits: 21
  • Committers: 3
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.143
Top Committers
Name                  Email            Commits
Emmanuel Turlay       e****l@s****i    393
augray                a****y           267
tscurtu               t****r@s****v    172
Chance An             a****i@g****m    90
chance-sematic        1****c           59
Sash Nagarkar         s****5@g****m    30
Jai Chopra            j****a@g****m    16
Kamalesh Palanisamy   k****0@g****m    5
Kaushil Kundalia      3****4           4
Vinay Varma           v****9@g****m    2
Emmanuel Turlay       e****y@e****n    1
Aaron Roney           t****x@g****m    1
Anurag Kanungo        4****o           1
Erik Kandalík         3****k           1
Matteo Destro         m****t@g****m    1
idow09                i****9@g****m    1
jmalicki              j****i@g****m    1
v-pwais               1****s           1
Brian Calvert         b****n@g****m    1
KatkaG                k****a@g****m    1
Siddharth Gupta       s****s@g****m    1
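
The All Time figures can be reproduced from the per-committer counts listed above. The sketch below assumes that the Development Distribution Score is one minus the top committer's share of all commits. The page does not state that formula, but it matches the reported 0.625.

```python
# Commit counts copied from the Top Committers list (21 entries).
commits = [393, 267, 172, 90, 59, 30, 16, 5, 4, 2] + [1] * 11

total = sum(commits)
committers = len(commits)
average = total / committers

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits) / total

print(total, committers, round(average, 3), round(dds, 3))
```

Under the same assumption, the Past Year score of 0.143 over 21 commits would correspond to a single committer accounting for 18 of them. That split is inferred from the score and is not listed on the page.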

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 100
  • Total pull requests: 129
  • Average time to close issues: 7 months
  • Average time to close pull requests: 16 days
  • Total issue authors: 12
  • Total pull request authors: 14
  • Average comments per issue: 0.54
  • Average comments per pull request: 0.26
  • Merged pull requests: 117
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 19
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 days
  • Issue authors: 1
  • Pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 17
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • augray (65)
  • tscurtu (14)
  • neutralino1 (6)
  • nvinayvarma189 (2)
  • eafpres (2)
  • snoshy (2)
  • labeldevops (2)
  • allenwang-git (1)
  • aapope (1)
  • jaichopra (1)
  • kenziehong (1)
  • chance-sematic (1)
Pull Request Authors
  • augray (64)
  • neutralino1 (59)
  • jaichopra (6)
  • chance-sematic (5)
  • pwais (4)
  • snoshy (3)
  • tscurtu (3)
  • ZPerling (2)
  • v-pwais (2)
  • ayush9096 (2)
  • bcalvert-graft (2)
  • swastiksadyal (2)
  • nvinayvarma189 (2)
Top Labels
Issue Labels
enhancement (40), bug (27), ui (27), usability (22), observability (18), infrastructure (15), good first issue (14), tech debt (13), types (12), housekeeping (5), reliability (4), ci (4), db (4), wip (4), compatibility (3), documentation (3), cli (3), question (3), testing (3), examples (2), build (2), security (2), scaling (1)
Pull Request Labels
enhancement (36), ui (12), documentation (8), wip (6), bug (6), do-not-merge (4), examples (4), cli (1), usability (1), reliability (1), tech debt (1), infrastructure (1), housekeeping (1), release (1)

Packages

  • Total packages: 2
  • Total downloads: 157 last month (PyPI)
  • Total docker downloads: 9,044
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 139
  • Total maintainers: 5
proxy.golang.org: github.com/sematic-ai/sematic
  • Versions: 62
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
pypi.org: sematic

Sematic ML orchestration tool

  • Versions: 77
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 157 Last month
  • Docker Downloads: 9,044
Rankings
Docker downloads count: 2.2%
Downloads: 7.0%
Dependent packages count: 7.3%
Average: 14.3%
Dependent repos count: 40.8%
Last synced: 6 months ago
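
The "Average" line in each Rankings block appears to be the arithmetic mean of the other rankings listed for that registry. That reading is an assumption, since the page does not define the figure, but the short check below reproduces both reported values.

```python
from statistics import mean

# Percentile rankings copied from the two package listings above.
rankings = {
    "proxy.golang.org": {"dependent_packages": 5.4, "dependent_repos": 5.8},
    "pypi.org": {
        "docker_downloads": 2.2,
        "downloads": 7.0,
        "dependent_packages": 7.3,
        "dependent_repos": 40.8,
    },
}

# Assumed definition: average = unweighted mean of the listed rankings.
averages = {
    registry: round(mean(values.values()), 1)
    for registry, values in rankings.items()
}
print(averages)
```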

Dependencies

sematic/ui/package-lock.json npm
  • 1344 dependencies
sematic/ui/package.json npm
  • @types/dagre ^0.7.47 development
  • @types/plotly.js ^1.54.22 development
  • @types/react-copy-to-clipboard ^5.0.2 development
  • @types/react-plotly.js ^2.5.0 development
  • @types/react-syntax-highlighter ^15.5.1 development
  • @emotion/react ^11.9.0
  • @emotion/styled ^11.8.1
  • @fontsource/roboto ^4.5.5
  • @glideapps/glide-data-grid ^4.1.0
  • @mui/icons-material ^5.6.2
  • @mui/lab ^5.0.0-alpha.82
  • @mui/material ^5.7.0
  • @testing-library/jest-dom ^5.16.4
  • @testing-library/react ^13.2.0
  • @testing-library/user-event ^13.5.0
  • @types/jest ^27.5.0
  • @types/node ^16.11.34
  • @types/react ^18.0.9
  • @types/react-dom ^18.0.3
  • dagre ^0.8.5
  • javascript-time-ago ^2.3.13
  • plotly.js-cartesian-dist ^2.12.1
  • react ^18.1.0
  • react-copy-to-clipboard ^5.1.0
  • react-dom ^18.1.0
  • react-error-boundary ^3.1.4
  • react-flow-renderer ^10.3.1
  • react-icons ^4.4.0
  • react-markdown ^8.0.3
  • react-medium-image-zoom ^4.4.3
  • react-plotly.js ^2.5.1
  • react-router-dom ^6.3.0
  • react-scripts 5.0.1
  • react-syntax-highlighter ^15.5.0
  • react-time-ago ^7.1.9
  • socket.io-client ^4.5.1
  • source-map-explorer ^2.5.2
  • typescript ^4.6.4
  • web-vitals ^2.1.4
requirements/ci-requirements.txt pypi
  • boto3-stubs *
  • data-science-types *
  • docutils ==0.18.1
  • flake8 *
  • flask *
  • kubernetes-stubs *
  • m2r *
  • mistune ==0.8.4
  • mypy >=0.950
  • pandas-stubs *
  • pip-tools *
  • pytest *
  • snowflake-connector-python *
  • sqlalchemy *
  • types-PyYAML *
  • types-psycopg2 *
  • types-python-dateutil *
  • types-requests *
requirements/docs-requirements.txt pypi
  • myst-parser *
  • sphinx *
  • sphinx-press-theme *
requirements/requirements.in pypi
  • SQLAlchemy ==1.4.36
  • boto3 *
  • click *
  • cloudpickle *
  • flask *
  • flask-cors *
  • flask-socketio *
  • gunicorn *
  • ipython ==8.2.0
  • kubernetes *
  • matplotlib *
  • numpy *
  • pandas *
  • pandas-stubs *
  • plotly *
  • psycopg2-binary *
  • pyarrow *
  • pytest ==7.1.1
  • python-dateutil *
  • pyyaml *
  • requests *
  • scikit-learn *
  • seaborn *
  • setuptools ==58.1.0
  • snowflake-connector-python *
  • statsmodels *
  • testing-postgresql *
  • torch *
  • torchmetrics *
  • torchvision *
  • werkzeug *
  • xgboost *
requirements/requirements.txt pypi
  • 105 dependencies
sematic/examples/liver_cirrhosis/requirements.txt pypi
  • matplotlib *
  • numpy *
  • pandas *
  • seaborn *
  • sklearn *
  • statsmodels *
  • xgboost *
sematic/examples/mnist/pytorch/requirements.txt pypi
  • pandas *
  • plotly *
  • sklearn *
  • torch *
  • torchmetrics *
  • torchvision *
sematic/types/types/snowflake/requirements.txt pypi
  • pandas *
  • pyarrow *
  • snowflake-connector-python *