https://github.com/featureform/featureform

The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    3 of 32 committers (9.4%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary

Keywords

data-quality data-science embeddings embeddings-similarity feature-engineering feature-store hacktoberfest machine-learning ml mlops python vector-database

Keywords from Contributors

transformation diffusion distributed charts sequencing agents image-generation cryptocurrencies image2image stable-diffusion
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: featureform
  • License: mpl-2.0
  • Language: Go
  • Default Branch: main
  • Homepage: https://www.featureform.com
  • Size: 217 MB
Statistics
  • Stars: 1,941
  • Watchers: 15
  • Forks: 98
  • Open Issues: 128
  • Releases: 30
Topics
data-quality data-science embeddings embeddings-similarity feature-engineering feature-store hacktoberfest machine-learning ml mlops python vector-database
Created over 5 years ago · Last pushed 8 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md

featureform

What is Featureform?

Featureform is a virtual feature store. It enables data scientists to define, manage, and serve their ML model's features. Featureform sits atop your existing infrastructure and orchestrates it to work like a traditional feature store. By using Featureform, a data science team can solve the following organizational problems:

  • Enhance Collaboration: Featureform ensures that transformations, features, labels, and training sets are defined in a standardized form, so they can easily be shared, re-used, and understood across the team.
  • Organize Experimentation: The days of untitled_128.ipynb are over. Transformations, features, and training sets can be pushed from notebooks to a centralized feature repository with metadata like name, variant, lineage, and owner.
  • Facilitate Deployment: Once a feature is ready to be deployed, Featureform will orchestrate your data infrastructure to make it ready in production. Using the Featureform API, you won't have to worry about the idiosyncrasies of your heterogeneous infrastructure (beyond their transformation language).
  • Increase Reliability: Featureform enforces that all features, labels, and training sets are immutable. This allows them to safely be re-used among data scientists without worrying about logic changing. Furthermore, Featureform's orchestrator will handle retry logic and attempt to resolve other common distributed system problems automatically.
  • Preserve Compliance: With built-in role-based access control, audit logs, and dynamic serving rules, your compliance logic can be enforced directly by Featureform.
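
The immutability and variant ideas in the list above can be pictured with a small sketch. This is not Featureform's API: `FeatureDefinition`, `Registry`, and the field values are hypothetical names chosen for illustration. It only shows the general pattern of freezing a definition once it is registered and publishing changed logic under a new variant.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical record; frozen=True blocks attribute reassignment."""
    name: str
    variant: str
    owner: str
    transformation: str


class Registry:
    """Hypothetical in-memory repository keyed by (name, variant)."""

    def __init__(self):
        self._resources = {}

    def register(self, definition):
        key = (definition.name, definition.variant)
        existing = self._resources.get(key)
        if existing is not None and existing != definition:
            # Changed logic must be published as a new variant instead.
            raise ValueError(f"{key} is already registered and cannot be changed")
        self._resources[key] = definition
        return definition

    def get(self, name, variant):
        return self._resources[(name, variant)]


registry = Registry()
v1 = registry.register(
    FeatureDefinition("avg_purchase_price", "v1", "alice", "mean(price) per user")
)

try:
    v1.transformation = "median(price) per user"
except FrozenInstanceError:
    print("definitions cannot be edited in place")

# New logic is registered under a new variant; v1 stays untouched.
registry.register(
    FeatureDefinition("avg_purchase_price", "v2", "alice", "median(price) per user")
)
```

Because a `(name, variant)` pair always resolves to the same definition in this sketch, anyone re-using `v1` can rely on its logic not changing underneath them.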

Figure: A virtual feature store's architecture

Why is Featureform unique?

Use your existing data infrastructure. Featureform does not replace your existing infrastructure. Rather, it turns that infrastructure into a feature store. Because Featureform is infrastructure-agnostic, teams can pick the right data infrastructure to solve their processing problems, while Featureform provides a feature store abstraction above it. Featureform orchestrates and manages transformations rather than computing them itself; the computations are offloaded to the organization's existing data infrastructure. In this way, Featureform is more akin to a framework and workflow than an additional piece of data infrastructure.

Designed for both single data scientists and large enterprise teams. Whether you're a single data scientist or part of a large enterprise organization, Featureform allows you to document and push your transformation, feature, and training set definitions to a centralized repository. It works everywhere from a laptop to a large heterogeneous cloud deployment.

  • A single data scientist working locally: The days of untitled_128.ipynb, df_final_final_7, and hundreds of undocumented versions of datasets are over. A data scientist working in a notebook can push transformation, feature, and training set definitions to a centralized, local repository.
  • A single data scientist with a production deployment: Register your PySpark transformations and let Featureform orchestrate your data infrastructure from Spark to Redis, and monitor both the infrastructure and the data.
  • A data science team: Share, re-use, and learn from each other's transformations, features, and training sets. Featureform standardizes how machine learning resources are defined and provides an interface for search and discovery. It also maintains a history of changes, allows for different variants of features, and enforces immutability to resolve the most common cases of failure when sharing resources.
  • A data science organization: An enterprise will have a variety of different rules around access control of their data and features. The rules may be based on the data scientist’s role, the model’s category, or dynamically based on a user’s input data (e.g. they are in Europe and subject to GDPR). All of these rules can be specified, and Featureform will enforce them. Data scientists can be sure to comply with the organization’s governance rules without modifying their workflow.

Native embeddings support. Featureform was built from the ground up with embeddings in mind. It supports vector databases as both inference and training stores. Transformer models can be used as transformations, so that embedding tables can be versioned and reliably regenerated. We even created and open-sourced a popular vector database, Embeddinghub.

Open-source. Featureform is free to use under the Mozilla Public License 2.0.


The Featureform Abstraction

Figure: The components of a feature

In reality, the feature’s definition is split across different pieces of infrastructure: the data source, the transformations, the inference store, the training store, and all their underlying data infrastructure. However, a data scientist will think of a feature in its logical form, something like: “a user’s average purchase price”. Featureform allows data scientists to define features in their logical form through transformations, providers, labels, and training set resources. Featureform will then orchestrate the actual underlying components to achieve the data scientists' desired state.
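
To make the "logical form" concrete, here is a minimal sketch of the example feature, "a user's average purchase price", written as plain Python. It does not use Featureform's API, and in a real deployment the computation would be offloaded to existing data infrastructure rather than run in-process. The function name `average_purchase_price` and the sample rows are hypothetical.

```python
from collections import defaultdict


def average_purchase_price(transactions):
    """Return {user_id: mean purchase price} for (user_id, price) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user_id, price in transactions:
        totals[user_id] += price
        counts[user_id] += 1
    return {user_id: totals[user_id] / counts[user_id] for user_id in totals}


# Hypothetical sample rows, made up for illustration only.
transactions = [("alice", 10.0), ("alice", 30.0), ("bob", 5.0)]
averages = average_purchase_price(transactions)
print(averages)  # {'alice': 20.0, 'bob': 5.0}
```

The data scientist reasons about this one function, while the source table, the transformation engine, and the inference and training stores that materialize it remain separate pieces of infrastructure.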

How to use Featureform

Featureform can be run locally on files or in Kubernetes with your existing infrastructure.

Kubernetes

Featureform on Kubernetes can be used to connect to your existing cloud infrastructure and can also be run locally on Minikube.

To see how to run it in the cloud, follow our Kubernetes deployment guide.

To try Featureform in a single Docker container, follow our Docker quickstart guide.



Contributing

  • To contribute to Featureform, please check out the Contribution docs.
  • Welcome to our community! Join us on Slack.


Report Issues

Please help us by reporting any issues you may have while using Featureform.


Owner

  • Name: Featureform
  • Login: featureform
  • Kind: organization
  • Location: United States of America

We turn features into a first-class component of the ML process.

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 3,753
  • Total Committers: 32
  • Avg Commits per committer: 117.281
  • Development Distribution Score (DDS): 0.699
Past Year
  • Commits: 122
  • Committers: 9
  • Avg Commits per committer: 13.556
  • Development Distribution Score (DDS): 0.803
Top Committers
Name | Email | Commits
sdreyer | s****g@f****m | 1,130
Sam Inloes | s****m@f****m | 780
Simba Khadder | s****a@f****m | 354
Steffi Tan | s****w@b****u | 319
Riddhi Bagadiaa | r****b@b****u | 234
ahmadnazeri | 4****i | 166
Ali Olfat | a****i@f****m | 118
Erik Eppel | e****k@f****m | 97
Ksshiraja Bagadiaa | 9****a | 97
anthonylasso | 3****o | 90
GitHub Actions Bot | | 73
Simba Khadder | s****a@f****o | 69
dependabot[bot] | 4****] | 56
shabbyjoon | 3****n | 46
ff-kamal | k****l@f****m | 24
Mikiko Bazeley | M****l | 21
saadhvi umesh | s****i@s****l | 21
saadhvi umesh | s****7@b****u | 20
RedLeader16 | 3****6 | 16
Zubeen | z****y@g****m | 4
GitHub Actions Bot | u****n | 3
joshcolts18 | 1****8 | 2
jerempy | 9****y | 2
imanthorpe | i****e | 2
Pakhi | 3****1 | 2
josephrocca | 1****a | 1
Samuel Lampa | s****a@r****m | 1
Rushabh Patel | 5****1 | 1
Jason | j****e@y****m | 1
Eric O. Korman | e****n@g****m | 1
and 2 more...
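
The all-time figures above can be reproduced with a short calculation. This sketch assumes the commonly used definition of the Development Distribution Score, one minus the top committer's share of all commits. The page itself does not state the formula, so treat it as an assumption that happens to match the reported value. The function name is hypothetical; the inputs are the counts listed above.

```python
def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    return 1 - top_committer_commits / total_commits


total_commits = 3753        # "Total Commits" (all time)
total_committers = 32       # "Total Committers" (all time)
top_committer_commits = 1130  # sdreyer, the top entry in the committer list

all_time_dds = development_distribution_score(top_committer_commits, total_commits)
avg_commits = total_commits / total_committers

print(round(all_time_dds, 3))  # 0.699
print(round(avg_commits, 3))   # 117.281
```

A score near 0 would mean one person wrote almost every commit; 0.699 indicates that roughly 70% of commits came from committers other than the most active one.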
Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 122
  • Total pull requests: 758
  • Average time to close issues: 6 months
  • Average time to close pull requests: 16 days
  • Total issue authors: 44
  • Total pull request authors: 32
  • Average comments per issue: 0.71
  • Average comments per pull request: 0.98
  • Merged pull requests: 523
  • Bot issues: 1
  • Bot pull requests: 60
Past Year
  • Issues: 10
  • Pull requests: 230
  • Average time to close issues: 6 days
  • Average time to close pull requests: 9 days
  • Issue authors: 8
  • Pull request authors: 8
  • Average comments per issue: 0.1
  • Average comments per pull request: 1.7
  • Merged pull requests: 165
  • Bot issues: 1
  • Bot pull requests: 7
Top Authors
Issue Authors
  • sdreyer (18)
  • RedLeader16 (10)
  • aolfat (10)
  • shabbyjoon (8)
  • ahmadnazeri (8)
  • epps (7)
  • Sami1309 (6)
  • anthonylasso (6)
  • Anntey (5)
  • TokuiNico (3)
  • samuell (2)
  • SanRehmo (1)
  • fumoboy007 (1)
  • ajordan-apixio (1)
  • nikolay-pavlnk (1)
Pull Request Authors
  • epps (182)
  • aolfat (167)
  • anthonylasso (120)
  • sdreyer (109)
  • ahmadnazeri (95)
  • dependabot[bot] (94)
  • RiddhiBagadiaa (64)
  • ff-kamal (59)
  • simba-git (40)
  • RedLeader16 (7)
  • gingerwizard (6)
  • pushkarmoi (6)
  • ihkap11 (4)
  • nfx (2)
  • Atry (2)
Top Labels
Issue Labels
bug (53) Hacktoberfest (9) enhancement (4) good first issue (4) documentation (1)
Pull Request Labels
dependencies (94) python (33) javascript (31) go (29) invalid (3) Hacktoberfest (3) bug (3) autocreated (2) documentation (1) github_actions (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 287 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 196
  • Total maintainers: 1
proxy.golang.org: github.com/featureform/featureform
  • Versions: 41
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 7.0%
Average: 8.2%
Dependent repos count: 9.3%
Last synced: 6 months ago
pypi.org: featureform

Package for the Featureform Feature Store

  • Versions: 155
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 287 Last month
Rankings
Downloads: 7.0%
Dependent packages count: 10.1%
Average: 12.9%
Dependent repos count: 21.6%
Last synced: 6 months ago

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v1 composite
  • actions/checkout v1 composite
  • actions/setup-python v2 composite
.github/workflows/client-ci.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/dashboard-workflow.yml actions
  • actions/checkout v1 composite
.github/workflows/deploy-helm.yml actions
  • actions/checkout v2 composite
  • docker/build-push-action v3 composite
  • docker/login-action v2 composite
  • docker/setup-buildx-action v2 composite
  • jsdaniell/create-json 1.1.2 composite
.github/workflows/deploy-python.yml actions
  • actions/checkout v2 composite
  • actions/setup-node v3 composite
.github/workflows/e2e.yml actions
  • actions/checkout v2 composite
  • actions/download-artifact v3 composite
  • actions/setup-go v2 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • celinekurpershoek/link-checker master composite
  • docker/build-push-action v3 composite
  • docker/setup-buildx-action v2 composite
  • jsdaniell/create-json 1.1.2 composite
.github/workflows/link-check.yml actions
  • actions/checkout master composite
  • gaurav-nelson/github-action-markdown-link-check v1 composite
.github/workflows/linter.yml actions
  • actions/checkout v2 composite
  • actions/checkout v3 composite
  • actions/setup-go v2 composite
  • psf/black stable composite
.github/workflows/pr-notify.yml actions
  • actions/checkout v2 composite
.github/workflows/reset-cluster.yml actions
.github/workflows/testing.yml actions
  • actions/checkout v2 composite
  • actions/download-artifact v3 composite
  • actions/setup-go v2 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • codecov/codecov-action v2 composite
  • getong/redis-action v1 composite
  • jsdaniell/create-json 1.1.2 composite
  • cassandra * docker
  • postgres * docker
  • redis * docker
  • redis/redis-stack * docker
Dockerfile docker
  • base latest build
  • golang 1.18 build
  • node 16-alpine build
api/Dockerfile docker
  • alpine latest build
  • golang 1.18-alpine build
backup/Dockerfile docker
  • golang 1.18-alpine build
charts/data_loader/Dockerfile docker
  • golang 1.18-alpine build
coordinator/Dockerfile docker
  • featureformcom/coordinator_base latest build
  • golang 1.18-alpine build
coordinator/scheduletest/Dockerfile docker
  • golang 1.18-alpine build
dashboard/Dockerfile docker
  • base latest build
  • node 16-alpine build
dashboard/package-lock.json npm
  • 765 dependencies
dashboard/package.json npm
  • @next/eslint-plugin-next ^13.4.1 development
  • @testing-library/dom ^9.2.0 development
  • @testing-library/jest-dom ^5.16.4 development
  • @wojtekmaj/enzyme-adapter-react-17 ^0.6.3 development
  • enzyme ^3.11.0 development
  • enzyme-to-json ^3.5.0 development
  • eslint ^8.40.0 development
  • eslint-plugin-jest ^27.2.1 development
  • eslint-plugin-react ^7.32.2 development
  • husky ^8.0.3 development
  • jest ^28.1.3 development
  • jest-canvas-mock ^2.5.0 development
  • jest-environment-jsdom ^28.1.3 development
  • prettier ^2.0.5 development
  • prettier-plugin-organize-imports ^3.2.2 development
  • run-script-os ^1.1.6 development
  • typescript ^3.9.7 development
  • @emotion/react ^11.11.0
  • @emotion/styled ^11.11.0
  • @mui/icons-material ^5.11.16
  • @mui/material ^5.13.0
  • @mui/styles ^5.5.0
  • @mui/x-data-grid ^5.17.16
  • @reduxjs/toolkit ^1.6.2
  • @testing-library/jest-dom ^4.2.4
  • @testing-library/react ^12.1.5
  • @testing-library/user-event ^14.4.3
  • chart.js ^2.9.4
  • chartjs-color ^2.1.0
  • chartjs-plugin-datasource-prometheus ^1.0.11
  • deferred ^0.7.11
  • faker ^5.5.3
  • immer ^9.0.15
  • jspdf ^2.5.1
  • jspdf-autotable ^3.5.28
  • material-table ^2.0.3
  • meilisearch ^0.31.1
  • moment ^2.29.2
  • momentjs ^2.0.0
  • next ^12.2.4-canary.8
  • prometheus-query ^3.2.5
  • react ^17.0.2
  • react-chartjs-2 ^2.11.2
  • react-dom ^17.0.2
  • react-keydown ^1.9.12
  • react-loader-spinner ^4.0.0
  • react-router ^6.8.2
  • react-router-dom ^5.3.0
  • react-select ^5.4.0
  • react-syntax-highlighter ^13.5.3
  • sql-formatter ^12.2.3
  • styled-components ^5.3.6
dashboard/yarn.lock npm
  • 741 dependencies
backup/requirements.txt pypi
  • PyYAML ==6.0
  • beautifulsoup4 ==4.11.2
  • cachetools ==5.3.0
  • click ==8.1.3
  • google-auth ==2.16.0
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • rsa ==4.9
  • six ==1.16.0
  • soupsieve ==2.3.2.post1
client/doc-requirements.txt pypi
  • mkdocs ==1.5.2
  • mkdocs-autorefs ==0.5.0
  • mkdocs-gen-files ==0.5.0
  • mkdocs-literate-nav ==0.6.0
  • mkdocs-material ==9.2.5
  • mkdocs-material-extensions ==1.1.1
  • mkdocs-section-index ==0.3.5
  • mkdocstrings ==0.22.0
  • mkdocstrings-python ==1.6.0
client/pyproject.toml pypi