lace

A probabilistic ML tool for science

https://github.com/promised-ai/lace

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.4%) to scientific vocabulary
Last synced: 6 months ago

Repository

A probabilistic ML tool for science

Basic Info
  • Host: GitHub
  • Owner: promised-ai
  • License: other
  • Language: HTML
  • Default Branch: master
  • Homepage:
  • Size: 30.6 MB
Statistics
  • Stars: 127
  • Watchers: 5
  • Forks: 9
  • Open Issues: 11
  • Releases: 4
Created about 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog Contributing License Citation Codeowners

README.md

Lace: A Probabilistic Machine Learning tool for Scientific Discovery



Documentation: User guide | Rust API | Python API |
Installation: Rust | Python | CLI



Lace is a probabilistic cross-categorization engine written in Rust with an optional interface to Python. Unlike traditional machine learning methods, which learn some function mapping inputs to outputs, Lace learns a joint probability distribution over your dataset, which enables users to...

  • predict or compute likelihoods of any number of features conditioned on any number of other features
  • identify, quantify, and attribute uncertainty from variance in the data, epistemic uncertainty in the model, and missing features
  • determine which variables are predictive of which others
  • determine which records/rows are similar to which others on the whole or given a specific context
  • simulate and manipulate synthetic data
  • work natively with missing data and make inferences about missingness (missing not-at-random)
  • work with continuous and categorical data natively, without transformation
  • identify anomalies, errors, and inconsistencies within the data
  • edit, backfill, and append data without retraining

and more, all in one place, without any explicit model building.
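Under the hood, cross-categorization partitions the columns into views and, within each view, partitions the rows into categories; Lace keeps several such posterior samples (states). As a rough illustration of how a question like "which rows are similar" can be answered from that structure, the sketch below assumes a hypothetical `State` type and computes, per state, the fraction of views in which two rows share a category, then averages over states. Passing `wrt` restricts the computation to the views that hold particular columns, which is one way to read "similar given a specific context". `State`, `row_similarity`, and the toy data are illustrative assumptions, not Lace's API, and Lace's own definition may weight views differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    """A hypothetical posterior sample (state) of a cross-categorization model."""

    col_view: dict[str, int]  # column name -> view index
    row_cat: dict[int, list[int]]  # view index -> category of each row in that view


def row_similarity(states: list[State], a: int, b: int,
                   wrt: Optional[list[str]] = None) -> float:
    """Average, over states, of the fraction of views in which rows a and b
    share a category. If `wrt` is given, only the views containing those
    columns are considered."""
    scores = []
    for state in states:
        if wrt is None:
            views = set(state.col_view.values())
        else:
            views = {state.col_view[col] for col in wrt}
        shared = sum(state.row_cat[v][a] == state.row_cat[v][b] for v in views)
        scores.append(shared / len(views))
    return sum(scores) / len(scores)


# Two hypothetical states over three columns (x, y, z) and three rows
states = [
    State(col_view={"x": 0, "y": 0, "z": 1},
          row_cat={0: [0, 0, 1], 1: [0, 1, 1]}),
    State(col_view={"x": 0, "y": 1, "z": 1},
          row_cat={0: [0, 0, 0], 1: [1, 0, 0]}),
]

print(row_similarity(states, 0, 1))             # 0.5 (overall)
print(row_similarity(states, 0, 1, wrt=["x"]))  # 1.0 (in the context of x)
print(row_similarity(states, 0, 1, wrt=["z"]))  # 0.0 (in the context of z)
```

Rows 0 and 1 always share a category in the view containing `x` but never in the view containing `z`, so their overall similarity lands in between.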

```python
import pandas as pd
import lace

# Create an engine from a dataframe
df = pd.read_csv("animals.csv", index_col=0)
engine = lace.Engine.from_df(df)

# Fit a model to the dataframe over 5000 steps of the fitting procedure
engine.update(5000)

# Show the statistical structure of the data -- which features are likely
# dependent (predictive) on each other
engine.clustermap("depprob", zmin=0, zmax=1)
```

Figure: Animals dataset dependence probability
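Under the cross-categorization model, the dependence probability between two columns can be read as the fraction of posterior samples (states) in which the two columns are assigned to the same view. A minimal sketch, assuming the column-to-view assignments are available as plain dictionaries (a hypothetical representation; the column names and assignments are made up):

```python
from itertools import combinations


def dependence_probability(col_views: list[dict[str, int]], a: str, b: str) -> float:
    """Fraction of states in which columns a and b are assigned to the same view."""
    return sum(state[a] == state[b] for state in col_views) / len(col_views)


# Hypothetical column-to-view assignments from four posterior states
col_views = [
    {"swims": 0, "flippers": 0, "flies": 1},
    {"swims": 0, "flippers": 0, "flies": 0},
    {"swims": 1, "flippers": 1, "flies": 0},
    {"swims": 0, "flippers": 1, "flies": 1},
]

# Pairwise values of the kind a dependence-probability clustermap displays
cols = ["swims", "flippers", "flies"]
for a, b in combinations(cols, 2):
    print(a, b, dependence_probability(col_views, a, b))
```

View indices are only labels within a single state, which is why the comparison is made state by state rather than across states.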

The Problem

The goal of Lace is to bridge part of the massive chasm between standard machine learning (ML) methods, like deep learning and random forests, and statistical methods, like probabilistic programming languages. We wanted to develop a machine that allows users to experience the joy of discovery, and indeed optimizes for it.

Short version

Standard, optimization-based ML methods don't help you learn about your data. Probabilistic programming tools assume you already have learned a lot about your data. Neither approach is optimized for what we think is the most important part of data science: the science part, asking and answering questions.

Long version

Standard ML methods are easy to use. You can throw data into a random forest and start predicting with little thought. These methods attempt to learn a function f(x) -> y that maps inputs x to outputs y. This ease of use comes at a cost. Generally, f(x) does not reflect the reality of the process that generated your data; instead, it was chosen by whoever developed the approach to be expressive enough to achieve the optimization goal. This renders most standard ML completely uninterpretable and unable to yield sensible uncertainty estimates.

At the other extreme are probabilistic tools like probabilistic programming languages (PPLs). A user specifies a model to a PPL in terms of a hierarchy of probability distributions with parameters θ. The PPL then uses a procedure (normally Markov chain Monte Carlo) to learn about the posterior distribution of the parameters given the data, p(θ|x). PPLs are all about interpretability and uncertainty quantification, but they place a number of pretty steep requirements on the user. PPL users must specify the model themselves from scratch, meaning they must know (or at least guess) the model. PPL users must also know how to specify such a model in a way that is compatible with the underlying inference procedure.

Example use cases

  • Combine data sources and understand how they interact. For example, we may wish to predict cognitive decline from demographics, survey or task performance, EKG data, and other clinical data. Combined, this data would typically be very sparse (most patients will not have all fields filled in), and it is difficult to know how to explicitly model the interaction of these data layers. In Lace, we would just concatenate the layers and run them through.
  • Understand the amount and causes of uncertainty over time. For example, a farmer may wish to understand the likelihood of achieving a specific yield over the growing season. As the season progresses, new weather data can be added to the prediction in the form of conditions. Uncertainty can be visualized as variance in the prediction, disagreement between posterior samples, or multi-modality in the predictive distribution (see this blog post for more information on uncertainty).
  • Data quality control. Use surprisal to find anomalous data in the table and use -logp to identify anomalies before they enter the table. Because Lace creates a model of the data, we can also contrive methods to find data that are inconsistent with that model, which we have used to good effect in error finding.
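Surprisal is the negative log likelihood of an observed value, -log p(x), under the model's predictive distribution: the less probable the value, the higher its surprisal. The sketch below assumes, purely for illustration, that a column's predictive distribution is a two-component Gaussian mixture with made-up parameters, and ranks a few hypothetical values by surprisal. It is not Lace's implementation. It works in log space with the log-sum-exp trick, because a value many standard deviations from every component would otherwise underflow to p = 0 and make the logarithm fail.

```python
import math


def log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Natural log of the normal density N(x | mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def surprisal(x: float, weights: list[float], mus: list[float],
              sigmas: list[float]) -> float:
    """-log p(x) under a Gaussian mixture, computed with log-sum-exp."""
    terms = [math.log(w) + log_normal_pdf(x, m, s)
             for w, m, s in zip(weights, mus, sigmas)]
    top = max(terms)
    log_p = top + math.log(sum(math.exp(t - top) for t in terms))
    return -log_p


# Hypothetical predictive distribution for an orbital-period column (minutes):
# most mass near 100 minutes, a smaller component near 1436 minutes
weights = [0.8, 0.2]
mus = [100.0, 1436.0]
sigmas = [10.0, 5.0]

# Hypothetical observed values, ranked from most to least surprising
periods = {"sat-a": 98.0, "sat-b": 1436.1, "sat-c": 700.0}
ranked = sorted(periods,
                key=lambda k: surprisal(periods[k], weights, mus, sigmas),
                reverse=True)
print(ranked)  # ['sat-c', 'sat-b', 'sat-a']
```

Scoring a candidate value with the same function before it is written to the table corresponds to the -logp check described above.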

Who should not use Lace

There are a number of use cases for which Lace is not suited:

  • Non-tabular data such as images and text
  • Highly optimizing specific predictions
    • Lace would rather over-generalize than overfit.

Quick start

Installation

Lace requires Rust.

To install the CLI:

$ cargo install --locked lace-cli

To install pylace:

$ pip install pylace

Examples

Lace comes with two pre-fit example data sets: Satellites and Animals.

```python
>>> from lace.examples import Satellites
>>> engine = Satellites()

# Predict the class of orbit given the satellite has a 75-minute
# orbital period and that it has a missing value of geosynchronous
# orbit longitude, and return epistemic uncertainty via Jensen-
# Shannon divergence.
>>> engine.predict(
...     'Class_of_Orbit',
...     given={
...         'Period_minutes': 75.0,
...         'longitude_radians_of_geo': None,
...     },
... )
('LEO', 0.023981898950561048)

# Find the top 10 most surprising (anomalous) orbital periods in
# the table
>>> engine.surprisal('Period_minutes') \
...     .sort('surprisal', descending=True) \
...     .head(10)
shape: (10, 3)
┌─────────────────────────────────────┬────────────────┬───────────┐
│ index                               ┆ Period_minutes ┆ surprisal │
│ ---                                 ┆ ---            ┆ ---       │
│ str                                 ┆ f64            ┆ f64       │
╞═════════════════════════════════════╪════════════════╪═══════════╡
│ Wind (International Solar-Terres... ┆ 19700.45       ┆ 11.019368 │
│ Integral (INTErnational Gamma-Ra... ┆ 4032.86        ┆ 9.556746  │
│ Chandra X-Ray Observatory (CXO)     ┆ 3808.92        ┆ 9.477986  │
│ Tango (part of Cluster quartet, ... ┆ 3442.0         ┆ 9.346999  │
│ ...                                 ┆ ...            ┆ ...       │
│ Salsa (part of Cluster quartet, ... ┆ 3418.2         ┆ 9.338377  │
│ XMM Newton (High Throughput X-ra... ┆ 2872.15        ┆ 9.13493   │
│ Geotail (Geomagnetic Tail Labora... ┆ 2474.83        ┆ 8.981458  │
│ Interstellar Boundary EXplorer (... ┆ 0.22           ┆ 8.884579  │
└─────────────────────────────────────┴────────────────┴───────────┘
```

And similarly in rust:

```rust,noplayground
use lace::prelude::*;
use lace::examples::Example;

fn main() {
    // In Rust, you can create an Engine or an Oracle. The Oracle is an
    // immutable version of an Engine; it has the same inference functions as
    // the Engine, but you cannot train or edit data.
    let mut engine = Example::Satellites.engine().unwrap();

    // Predict the class of orbit given the satellite has a 75-minute
    // orbital period and that it has a missing value of geosynchronous
    // orbit longitude, and return epistemic uncertainty via Jensen-
    // Shannon divergence.
    let prediction = engine
        .predict(
            "Class_of_Orbit",
            &Given::Conditions(vec![
                ("Period_minutes", Datum::Continuous(75.0)),
                ("longitude_radians_of_geo", Datum::Missing),
            ]),
            Some(PredictUncertaintyType::JsDivergence),
            None,
        )
        .unwrap();

    // Print the predicted value and its uncertainty
    println!("{prediction:?}");
}
```
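Both `predict` calls above ask for epistemic uncertainty via Jensen-Shannon divergence. The intuition: each posterior sample (state) produces its own predictive distribution, and the more those distributions disagree, the larger their divergence. The sketch below computes an equal-weight Jensen-Shannon divergence (in bits) for hypothetical per-state categorical predictions and takes the argmax of their average as the point prediction. The probabilities are made up, and Lace's exact weighting and normalization may differ from this sketch.

```python
import math


def entropy(p: list[float]) -> float:
    """Shannon entropy in bits; terms with zero probability contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(dists: list[list[float]]) -> float:
    """Equal-weight Jensen-Shannon divergence of several categorical
    distributions defined over the same categories."""
    n = len(dists)
    mixture = [sum(col) / n for col in zip(*dists)]
    return entropy(mixture) - sum(entropy(p) for p in dists) / n


# Hypothetical per-state predictive distributions for an orbit class
classes = ["LEO", "MEO", "GEO"]
per_state = [
    [0.90, 0.08, 0.02],
    [0.85, 0.10, 0.05],
    [0.30, 0.10, 0.60],  # this state disagrees with the others
]

# Point prediction: the most probable class under the averaged distribution
n = len(per_state)
mixture = [sum(col) / n for col in zip(*per_state)]
prediction = classes[max(range(len(classes)), key=lambda i: mixture[i])]
print(prediction, js_divergence(per_state))
```

Identical per-state distributions give a divergence of 0, and two states that put all their mass on different classes give 1 bit, the maximum for two states.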

Fitting a model

To fit a model to your own data, you can use the CLI:

```console
$ lace run --csv my-data.csv -n 1000 my-data.lace
```

...or initialize an engine from a file or dataframe.

```python
import pandas as pd  # Lace supports polars as well
from lace import Engine

engine = Engine.from_df(pd.read_csv("my-data.csv", index_col=0))
engine.update(1000)
engine.save("my-data.lace")
```

You can monitor the progress of the training using diagnostic plots:

```python
from lace.plot import diagnostics

diagnostics(engine)
```

Figure: Animals MCMC convergence
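The diagnostic plots are the primary tool here. As a complementary numeric check, one could compute the classic Gelman-Rubin statistic (R-hat) over per-state traces, treating each state as an independent chain (an assumption for this sketch). This is a standard MCMC diagnostic swapped in for illustration, not something `lace.plot.diagnostics` is claimed to compute, and the traces below are made-up numbers. Values close to 1 suggest the chains agree; values well above 1 (a common rule of thumb is 1.1) suggest they have not mixed.

```python
import math
import statistics


def gelman_rubin(chains: list[list[float]]) -> float:
    """Classic (non-split) Gelman-Rubin potential scale reduction factor.

    `chains` holds one equal-length trace per chain, e.g. a per-state
    score recorded at each update step (hypothetical input)."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    # Between-chain variance of the chain means, scaled by chain length
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    # Average within-chain (sample) variance
    within = statistics.fmean(statistics.variance(c) for c in chains)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


mixed = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]      # overlapping chains
stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]  # chains far apart
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Real traces would be far longer than four steps; the toy values only show that well-separated chains push R-hat far above 1.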

License

Lace is licensed under the Business Source License v1.1, which restricts commercial use. See LICENSE for full details.

If you would like a license for commercial use, please contact lace@redpoll.ai

Academic use

Lace is free for academic use. Please cite Lace according to the CITATION.cff metadata.

Owner

  • Name: promised-ai
  • Login: promised-ai
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: 'Lace: Bayesian Tabular Analysis for Scientific Discovery'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Baxter
    family-names: Eaves
    name-suffix: Jr
    email: bax@redpoll.ai
    affiliation: Redpoll
  - given-names: Michael
    family-names: Schmidt
    email: schmidt@redpoll.ai
    affiliation: Redpoll
  - given-names: Ken
    family-names: Swanson
    email: ken.swanson@redpoll.ai
    affiliation: Redpoll
identifiers:
  - type: url
    value: 'https://github.com/promised-ai/lace'
    description: Github repository
repository-code: 'https://github.com/promised-ai/lace'
url: 'https://lace.dev'
abstract: >-
  Lace is a probabilistic cross-categorization engine
  written in rust.
keywords:
  - Bayesian
  - Machine Learning
license: BUSL-1.1
version: 0.8.0
date-released: '2024-02-07'

GitHub Events

Total
  • Watch event: 22
  • Push event: 4
  • Fork event: 1
  • Create event: 1
Last Year
  • Watch event: 22
  • Push event: 4
  • Fork event: 1
  • Create event: 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 32
  • Total pull requests: 199
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 4 days
  • Total issue authors: 10
  • Total pull request authors: 4
  • Average comments per issue: 1.34
  • Average comments per pull request: 0.17
  • Merged pull requests: 174
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • BaxterEaves (9)
  • schmidmt (6)
  • firekg (3)
  • amifalk (1)
  • TempTemperson (1)
  • perplexes (1)
  • thomasaarholt (1)
  • zamazan4ik (1)
Pull Request Authors
  • Swandog (73)
  • BaxterEaves (69)
  • schmidmt (44)
  • TempTemperson (2)
Top Labels
Issue Labels
enhancement (12) Python (6) bug (4) Rust (2) documentation (1)
Pull Request Labels
enhancement (19) Python (18) bug (3) Rust (1) chore (1)

Packages

  • Total packages: 11
  • Total downloads:
    • cargo 112,481 total
    • pypi 1,135 last-month
  • Total dependent packages: 29
    (may contain duplicates)
  • Total dependent repositories: 8
    (may contain duplicates)
  • Total versions: 96
  • Total maintainers: 6
crates.io: lace_utils

Miscellaneous utilities for Lace and shared libraries

  • Versions: 5
  • Dependent Packages: 6
  • Dependent Repositories: 1
  • Downloads: 9,267 Total
Rankings
Dependent packages count: 5.4%
Dependent repos count: 16.6%
Stargazers count: 17.3%
Average: 18.3%
Forks count: 21.2%
Downloads: 30.8%
Maintainers (3)
Last synced: 7 months ago
pypi.org: pylace

A probabilistic programming ML tool for science

  • Versions: 14
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1,135 Last month
Rankings
Dependent packages count: 6.6%
Average: 18.6%
Dependent repos count: 30.6%
Maintainers (3)
Last synced: 7 months ago
crates.io: lace_data

Data definitions and data container definitions for Lace

  • Versions: 5
  • Dependent Packages: 5
  • Dependent Repositories: 1
  • Downloads: 8,908 Total
Rankings
Dependent packages count: 6.2%
Dependent repos count: 16.6%
Stargazers count: 17.3%
Average: 18.7%
Forks count: 21.2%
Downloads: 32.0%
Maintainers (3)
Last synced: 7 months ago
crates.io: lace_stats

Contains component model and hyperprior specifications

  • Versions: 8
  • Dependent Packages: 5
  • Dependent Repositories: 1
  • Downloads: 11,782 Total
Rankings
Dependent packages count: 6.2%
Dependent repos count: 16.6%
Stargazers count: 17.3%
Average: 18.8%
Forks count: 21.2%
Downloads: 32.7%
Maintainers (3)
Last synced: 7 months ago
crates.io: lace_consts

Default constants for Lace

  • Versions: 5
  • Dependent Packages: 4
  • Dependent Repositories: 1
  • Downloads: 8,887 Total
Rankings
Dependent packages count: 7.4%
Dependent repos count: 16.6%
Stargazers count: 17.3%
Average: 18.8%
Forks count: 21.2%
Downloads: 31.6%
Maintainers (3)
Last synced: 7 months ago
crates.io: lace_codebook

Contains the Lace codebook specification as well as utilities for generating defaults

  • Versions: 12
  • Dependent Packages: 3
  • Dependent Repositories: 1
  • Downloads: 15,256 Total
Rankings
Dependent packages count: 9.2%
Dependent repos count: 16.6%
Stargazers count: 17.3%
Average: 19.7%
Forks count: 21.2%
Downloads: 34.2%
Maintainers (3)
Last synced: 7 months ago
crates.io: lace_geweke

Geweke tester for Lace

  • Versions: 7
  • Dependent Packages: 2
  • Dependent Repositories: 1
  • Downloads: 9,803 Total
Rankings
Dependent packages count: 12.2%
Dependent repos count: 16.6%
Stargazers count: 17.3%
Average: 21.1%
Forks count: 21.2%
Downloads: 38.1%
Maintainers (3)
Last synced: 7 months ago
crates.io: lace_cc

Core of the Lace cross-categorization engine library

  • Versions: 12
  • Dependent Packages: 2
  • Dependent Repositories: 1
  • Downloads: 14,706 Total
Rankings
Dependent packages count: 12.2%
Dependent repos count: 16.6%
Stargazers count: 17.3%
Forks count: 21.2%
Average: 21.3%
Downloads: 39.1%
Maintainers (3)
Last synced: 7 months ago
crates.io: lace_metadata

Archive of the metadata (savefile) formats for Lace. In charge of versioning and conversion.

  • Versions: 12
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 14,152 Total
Rankings
Dependent repos count: 16.6%
Stargazers count: 17.3%
Dependent packages count: 18.2%
Forks count: 21.2%
Average: 23.2%
Downloads: 42.6%
Maintainers (3)
Last synced: 7 months ago
crates.io: lace

A probabilistic cross-categorization engine

  • Versions: 13
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 15,910 Total
Rankings
Dependent repos count: 29.3%
Dependent packages count: 33.8%
Average: 53.7%
Downloads: 97.9%
Maintainers (3)
Last synced: 7 months ago
crates.io: lace-cli

A probabilistic cross-categorization engine

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 3,810 Total
Rankings
Dependent repos count: 30.8%
Dependent packages count: 36.1%
Average: 55.1%
Downloads: 98.4%
Maintainers (1)
Last synced: 7 months ago

Dependencies

.github/workflows/deploy-gh-pages.yaml actions
  • actions/checkout v3 composite
  • actions/configure-pages v3 composite
  • actions/deploy-pages v1 composite
  • actions/upload-pages-artifact v1 composite
  • dtolnay/rust-toolchain stable composite
.github/workflows/python-build-test.yaml actions
  • PyO3/maturin-action v1 composite
  • Swatinem/rust-cache v2 composite
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • dtolnay/rust-toolchain stable composite
.github/workflows/rust-build-test.yaml actions
  • Swatinem/rust-cache v2 composite
  • actions-rs/cargo v1 composite
  • actions/checkout v3 composite
  • dtolnay/rust-toolchain stable composite
book/lace_preprocess_mdbook_yaml/Cargo.lock cargo
  • 363 dependencies
book/lace_preprocess_mdbook_yaml/Cargo.toml cargo
lace/Cargo.lock cargo
  • 328 dependencies
lace/Cargo.toml cargo
  • approx 0.5.1 development
  • criterion 0.5 development
  • indoc 2.0.3 development
  • once_cell 1.13.0 development
  • plotly 0.8 development
  • tempfile 3.4 development
  • bincode 1
  • clap 4.3.17
  • ctrlc 3.2.1
  • dirs 5
  • env_logger 0.10
  • flate2 1.0.23
  • indexmap 2.0.0
  • indicatif 0.17.0
  • itertools 0.11
  • lace_cc 0.2.0
  • lace_codebook 0.2.0
  • lace_consts 0.1.4
  • lace_data 0.1.2
  • lace_geweke 0.1.2
  • lace_metadata 0.2.0
  • lace_stats 0.1.3
  • lace_utils 0.1.2
  • log 0.4
  • maplit 1
  • num 0.4
  • polars 0.33
  • rand 0.8
  • rand_distr 0.4
  • rand_xoshiro 0.6
  • rayon 1.5
  • regex 1
  • serde 1
  • serde_json 1
  • serde_yaml 0.9.4
  • special 0.10
  • thiserror 1.0.19
  • toml 0.7
lace/lace_cc/Cargo.toml cargo
  • approx 0.5.1 development
  • criterion 0.5 development
  • indoc 2.0.3 development
  • enum_dispatch 0.3.10
  • indicatif 0.17.0
  • itertools 0.11
  • lace_codebook 0.2.0
  • lace_consts 0.1.4
  • lace_data 0.1.2
  • lace_geweke 0.1.2
  • lace_stats 0.1.2
  • lace_utils 0.1.2
  • once_cell 1
  • rand 0.8
  • rand_xoshiro 0.6
  • rayon 1.5
  • serde 1
  • special 0.10
  • thiserror 1.0.19
lace/lace_codebook/Cargo.toml cargo
  • indoc 2 development
  • tempfile 3.3.0 development
  • flate2 1.0.23
  • lace_consts 0.1.4
  • lace_data 0.1.2
  • lace_stats 0.1.4
  • lace_utils 0.1.2
  • maplit 1
  • polars 0.33
  • rand 0.8.5
  • rayon 1.5
  • serde 1
  • serde_yaml 0.9.4
  • thiserror 1.0.11
lace/lace_consts/Cargo.toml cargo
lace/lace_data/Cargo.toml cargo
  • approx 0.5.1 development
  • criterion 0.5 development
  • rand 0.8 development
  • serde_json 1 development
  • lace_utils 0.1.2
  • regex 1
  • serde 1
  • thiserror 1.0.19
lace/lace_geweke/Cargo.toml cargo
lace/lace_metadata/Cargo.toml cargo
  • tempfile 3 development
  • bincode 1
  • dirs 5
  • hex 0.4
  • lace_cc 0.2.0
  • lace_codebook 0.2.0
  • lace_data 0.1.2
  • lace_stats 0.1.4
  • log 0.4
  • once_cell 1
  • rand_xoshiro 0.6
  • rayon 1.5
  • serde 1
  • serde_json 1
  • serde_yaml 0.9.4
  • thiserror 1.0.19
  • toml 0.7
lace/lace_stats/Cargo.toml cargo
  • approx 0.5.1 development
  • criterion 0.5 development
  • maplit 1 development
  • rand_distr 0.4 development
  • serde_json 1 development
  • itertools 0.11
  • lace_consts 0.1.4
  • lace_data 0.1.2
  • lace_utils 0.1.2
  • rand 0.8
  • rand_xoshiro 0.6
  • regex 1.6.0
  • serde 1
  • special 0.10
  • thiserror 1.0.11
lace/lace_utils/Cargo.toml cargo
  • approx 0.5.1 development
  • rand 0.8
lace/Dockerfile docker
  • alpine 3.11 build
  • rust 1.42-alpine build