lace

A probabilistic ML tool for science

https://github.com/promised-ai/lace

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.4%) to scientific vocabulary

Last synced: 10 months ago

Repository

A probabilistic ML tool for science

Basic Info

Host: GitHub
Owner: promised-ai
License: other
Language: HTML
Default Branch: master
Homepage:
Size: 30.6 MB

Statistics

Stars: 127
Watchers: 5
Forks: 9
Open Issues: 11
Releases: 4

Created over 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog Contributing License Citation Codeowners

Lace: A Probabilistic Machine Learning tool for Scientific Discovery

Documentation: User guide | Rust API | Python API

Installation: Rust | Python | CLI

Contents: Problem | Quick start | License

Lace is a probabilistic cross-categorization engine written in Rust with an optional interface to Python. Unlike traditional machine learning methods, which learn some function mapping inputs to outputs, Lace learns a joint probability distribution over your dataset, which enables users to...

- predict or compute likelihoods of any number of features conditioned on any number of other features
- identify, quantify, and attribute uncertainty from variance in the data, epistemic uncertainty in the model, and missing features
- determine which variables are predictive of which others
- determine which records/rows are similar to which others on the whole or given a specific context
- simulate and manipulate synthetic data
- work natively with missing data and make inferences about missingness (missing not-at-random)
- work with continuous and categorical data natively, without transformation
- identify anomalies, errors, and inconsistencies within the data
- edit, backfill, and append data without retraining

and more, all in one place, without any explicit model building.

```python
import pandas as pd
import lace

# Create an engine from a dataframe
df = pd.read_csv("animals.csv", index_col=0)
engine = lace.Engine.from_df(df)

# Fit a model to the dataframe over 5000 steps of the fitting procedure
engine.update(5000)

# Show the statistical structure of the data -- which features are likely
# dependent (predictive) on each other
engine.clustermap("depprob", zmin=0, zmax=1)
```

Animals dataset dependence probability
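As a rough illustration of what a dependence-probability matrix summarizes, the sketch below assumes (this is not stated above) that each posterior sample assigns every column to a group of mutually dependent columns, and that the dependence probability of two columns is the fraction of samples that place them in the same group. The column names and assignments are made up, and `dependence_probability` is a hypothetical helper, not part of the Lace API.

```python
from itertools import combinations

# Hypothetical posterior samples: each maps a column name to a group id.
# Columns sharing a group id within a sample are treated as dependent.
samples = [
    {"swims": 0, "flippers": 0, "furry": 1},
    {"swims": 0, "flippers": 0, "furry": 0},
    {"swims": 1, "flippers": 1, "furry": 0},
    {"swims": 0, "flippers": 1, "furry": 1},
]


def dependence_probability(samples, col_a, col_b):
    """Fraction of samples in which col_a and col_b share a group."""
    together = sum(1 for s in samples if s[col_a] == s[col_b])
    return together / len(samples)


# Pairwise dependence probabilities over all column pairs.
depprob = {
    pair: dependence_probability(samples, *pair)
    for pair in combinations(sorted(samples[0]), 2)
}
for (a, b), p in depprob.items():
    print(f"{a:>8} ~ {b:<8} {p:.2f}")
```

Under this assumption, a value near 1 means nearly every sample groups the two columns together, and a value near 0 means almost none do.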

The Problem

The goal of Lace is to fill some of the massive chasm between standard machine learning (ML) methods, like deep learning and random forests, and statistical methods, like probabilistic programming languages. We wanted to develop a machine that allows users to experience the joy of discovery, and indeed optimizes for it.

Short version

Standard, optimization-based ML methods don't help you learn about your data. Probabilistic programming tools assume you have already learned a lot about your data. Neither approach is optimized for what we think is the most important part of data science, the science part: asking and answering questions.

Long version

Standard ML methods are easy to use. You can throw data into a random forest and start predicting with little thought. These methods attempt to learn a function f(x) -> y that maps inputs x to outputs y. This ease of use comes at a cost. Generally, f(x) does not reflect the reality of the process that generated your data, but was instead chosen by whoever developed the approach to be sufficiently expressive to better achieve the optimization goal. This renders most standard ML completely uninterpretable and unable to yield sensible uncertainty estimates.
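For contrast with a fixed f(x) -> y mapping, here is a minimal sketch (toy numbers, not Lace code) of what a joint distribution offers: the same table answers conditional queries in either direction. The variables `rain` and `wet` and the helper `conditional` are invented for illustration.

```python
# Toy joint distribution p(rain, wet) over two binary variables.
joint = {
    (True, True): 0.27,
    (True, False): 0.03,
    (False, True): 0.14,
    (False, False): 0.56,
}


def conditional(joint, target_ix, given_ix, given_value):
    """Return p(target | given == given_value) as a dict over target values."""
    rows = {k: p for k, p in joint.items() if k[given_ix] == given_value}
    norm = sum(rows.values())
    out = {}
    for k, p in rows.items():
        out[k[target_ix]] = out.get(k[target_ix], 0.0) + p / norm
    return out


# Forward query: p(wet | rain=True)
p_wet_given_rain = conditional(joint, target_ix=1, given_ix=0, given_value=True)
# Reverse query from the same table: p(rain | wet=True)
p_rain_given_wet = conditional(joint, target_ix=0, given_ix=1, given_value=True)

print(f"p(wet | rain)  = {p_wet_given_rain[True]:.3f}")
print(f"p(rain | wet)  = {p_rain_given_wet[True]:.3f}")
```

A model trained only to map `rain` to `wet` could answer the first query but not the second without being retrained.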

On the other extreme you have probabilistic tools like probabilistic programming languages (PPLs). A user specifies a model to a PPL in terms of a hierarchy of probability distributions with parameters θ. The PPL then uses a procedure (normally Markov Chain Monte Carlo) to learn about the posterior distribution of the parameters given the data p(θ|x). PPLs are all about interpretability and uncertainty quantification, but they place a number of pretty steep requirements on the user. PPL users must specify the model themselves from scratch, meaning they must know (or at least guess) the model. PPL users must also know how to specify such a model in a way that is compatible with the underlying inference procedure.
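As a minimal sketch of the workflow just described (hand-written, not tied to any particular PPL), the code below fixes a model by hand, a Normal likelihood with unit variance and a Normal(0, 10) prior on the mean θ, and uses random-walk Metropolis, one simple MCMC method, to draw samples from p(θ|x). The data, prior, and step size are arbitrary choices for illustration.

```python
import math
import random


def log_posterior(theta, data, prior_sd=10.0):
    """Unnormalized log p(theta | data): Normal(0, prior_sd) prior,
    Normal(theta, 1) likelihood."""
    log_prior = -0.5 * (theta / prior_sd) ** 2
    log_lik = sum(-0.5 * (x - theta) ** 2 for x in data)
    return log_prior + log_lik


def metropolis(data, n_steps=5000, step_sd=0.5, seed=0):
    """Random-walk Metropolis sampler for theta."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, data)
    draws = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, p(proposal) / p(theta)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


data = [1.8, 2.4, 2.1, 1.6, 2.6]
draws = metropolis(data)
kept = draws[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean of theta ~ {posterior_mean:.2f}")
```

Even in this one-parameter case the user had to choose the likelihood, the prior, and the sampler settings, which is the burden the paragraph above describes.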

Example use cases

- Combine data sources and understand how they interact. For example, we may wish to predict cognitive decline from demographics, survey or task performance, EKG data, and other clinical data. Combined, this data would typically be very sparse (most patients will not have all fields filled in), and it is difficult to know how to explicitly model the interaction of these data layers. In Lace, we would just concatenate the layers and run them through.
- Understand the amount and causes of uncertainty over time. For example, a farmer may wish to understand the likelihood of achieving a specific yield over the growing season. As the season progresses, new weather data can be added to the prediction in the form of conditions. Uncertainty can be visualized as variance in the prediction, disagreement between posterior samples, or multi-modality in the predictive distribution (see this blog post for more information on uncertainty).
- Data quality control. Use surprisal to find anomalous data in the table and use -logp to identify anomalies before they enter the table. Because Lace creates a model of the data, we can also contrive methods to find data that are inconsistent with that model, which we have used to good effect in error finding.
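A minimal sketch of the surprisal idea from the data quality use case, assuming a deliberately simple stand-in model (a single Normal fit to one column) rather than the model Lace learns: surprisal is the negative log likelihood, -log p(x), so values the model finds improbable score high. The data and helper names are invented for illustration.

```python
import math
import statistics


def normal_logp(x, mean, sd):
    """Log density of Normal(mean, sd) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def surprisal(values):
    """Return (value, -log p(value)) pairs, most surprising first."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    scored = [(x, -normal_logp(x, mean, sd)) for x in values]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical orbital periods in minutes; one entry looks like a data error.
periods = [95.2, 96.1, 94.8, 97.3, 95.9, 0.22, 96.4]
ranked = surprisal(periods)
for value, s in ranked[:3]:
    print(f"{value:>8.2f}  surprisal={s:.2f}")
```

The same scoring applied to a candidate value before it is inserted corresponds to the -logp check mentioned above.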

Who should not use Lace

There are a number of use cases for which Lace is not suited:

- Non-tabular data such as images and text
- Highly optimizing specific predictions
  - Lace would rather over-generalize than overfit.

Quick start

Installation

Lace requires Rust.

To install the CLI:

$ cargo install --locked lace-cli

To install pylace:

$ pip install pylace

Examples

Lace comes with two pre-fit example data sets: Satellites and Animals.

```python
>>> from lace.examples import Satellites
>>> engine = Satellites()

# Predict the class of orbit given the satellite has a 75-minute
# orbital period and that it has a missing value of geosynchronous
# orbit longitude, and return epistemic uncertainty via Jensen-
# Shannon divergence.
>>> engine.predict(
...     'Class_of_Orbit',
...     given={
...         'Period_minutes': 75.0,
...         'longitude_radians_of_geo': None,
...     },
... )
('LEO', 0.023981898950561048)

# Find the top 10 most surprising (anomalous) orbital periods in
# the table
>>> engine.surprisal('Period_minutes') \
...     .sort('surprisal', reverse=True) \
...     .head(10)
shape: (10, 3)
┌─────────────────────────────────────┬────────────────┬───────────┐
│ index                               ┆ Period_minutes ┆ surprisal │
│ ---                                 ┆ ---            ┆ ---       │
│ str                                 ┆ f64            ┆ f64       │
╞═════════════════════════════════════╪════════════════╪═══════════╡
│ Wind (International Solar-Terres... ┆ 19700.45       ┆ 11.019368 │
│ Integral (INTErnational Gamma-Ra... ┆ 4032.86        ┆ 9.556746  │
│ Chandra X-Ray Observatory (CXO)     ┆ 3808.92        ┆ 9.477986  │
│ Tango (part of Cluster quartet, ... ┆ 3442.0         ┆ 9.346999  │
│ ...                                 ┆ ...            ┆ ...       │
│ Salsa (part of Cluster quartet, ... ┆ 3418.2         ┆ 9.338377  │
│ XMM Newton (High Throughput X-ra... ┆ 2872.15        ┆ 9.13493   │
│ Geotail (Geomagnetic Tail Labora... ┆ 2474.83        ┆ 8.981458  │
│ Interstellar Boundary EXplorer (... ┆ 0.22           ┆ 8.884579  │
└─────────────────────────────────────┴────────────────┴───────────┘
```

And similarly in Rust:

```rust,noplayground
use lace::prelude::*;
use lace::examples::Example;

fn main() {
    // In Rust, you can create an Engine or an Oracle. The Oracle is an
    // immutable version of an Engine; it has the same inference functions as
    // the Engine, but you cannot train or edit data.
    let engine = Example::Satellites.engine().unwrap();

    // Predict the class of orbit given the satellite has a 75-minute
    // orbital period and that it has a missing value of geosynchronous
    // orbit longitude, and return epistemic uncertainty via Jensen-
    // Shannon divergence.
    let prediction = engine
        .predict(
            "Class_of_Orbit",
            &Given::Conditions(vec![
                ("Period_minutes", Datum::Continuous(75.0)),
                ("longitude_radians_of_geo", Datum::Missing),
            ]),
            Some(PredictUncertaintyType::JsDivergence),
            None,
        )
        .unwrap();

    // Print the predicted value together with its uncertainty.
    println!("{prediction:?}");
}
```
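Both examples above report epistemic uncertainty as a Jensen-Shannon divergence. As a rough sketch of that quantity (a generic textbook computation, not Lace's implementation), the code below takes predictive distributions from several hypothetical posterior samples and measures how much they disagree: the entropy of the averaged distribution minus the average of the individual entropies. The distributions are made up.

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def js_divergence(dists):
    """Generalized Jensen-Shannon divergence with equal weights."""
    n = len(dists)
    mixture = [sum(d[i] for d in dists) / n for i in range(len(dists[0]))]
    return entropy(mixture) - sum(entropy(d) for d in dists) / n


# Predictive distributions over orbit classes (LEO, MEO, GEO) from three
# hypothetical posterior samples.
agree = [[0.90, 0.05, 0.05], [0.88, 0.07, 0.05], [0.91, 0.04, 0.05]]
disagree = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.05, 0.05, 0.90]]

print(f"agreeing samples:    {js_divergence(agree):.4f}")
print(f"disagreeing samples: {js_divergence(disagree):.4f}")
```

The divergence is zero when all samples make the same prediction and grows as they diverge, which is why a small value such as the one returned in the Python example can be read as low epistemic uncertainty.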

Fitting a model

To fit a model to your own data, you can use the CLI:

$ lace run --csv my-data.csv -n 1000 my-data.lace

...or initialize an engine from a file or dataframe.

```python
import pandas as pd  # Lace supports polars as well
from lace import Engine

# Build an engine from the CSV, fit it for 1000 steps, and save it
engine = Engine.from_df(pd.read_csv("my-data.csv", index_col=0))
engine.update(1000)
engine.save("my-data.lace")
```

You can monitor the progress of the training using diagnostic plots

```python
from lace.plot import diagnostics

diagnostics(engine)
```

Animals MCMC convergence

License

Lace is licensed under the Business Source License v1.1, which restricts commercial use. See LICENSE for full details.

If you would like a license for commercial use, please contact lace@redpoll.ai

Academic use

Lace is free for academic use. Please cite Lace according to the CITATION.cff metadata.

Owner

Name: promised-ai
Login: promised-ai
Kind: organization

Repositories: 1
Profile: https://github.com/promised-ai

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: 'Lace: Bayesian Tabular Analysis for Scientific Discovery'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Baxter
    family-names: Eaves
    name-suffix: Jr
    email: bax@redpoll.ai
    affiliation: Redpoll
  - given-names: Michael
    family-names: Schmidt
    email: schmidt@redpoll.ai
    affiliation: Redpoll
  - given-names: Ken
    family-names: Swanson
    email: ken.swanson@redpoll.ai
    affiliation: Redpoll
identifiers:
  - type: url
    value: 'https://github.com/promised-ai/lace'
    description: Github repository
repository-code: 'https://github.com/promised-ai/lace'
url: 'https://lace.dev'
abstract: >-
  Lace is a probabilistic cross-categorization engine
  written in rust.
keywords:
  - Bayesian
  - Machine Learning
license: BUSL-1.1
version: 0.8.0
date-released: '2024-02-07'

GitHub Events

Total

Watch event: 22
Push event: 4
Fork event: 1
Create event: 1

Last Year

Watch event: 22
Push event: 4
Fork event: 1
Create event: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 32
Total pull requests: 199
Average time to close issues: about 2 months
Average time to close pull requests: 4 days
Total issue authors: 10
Total pull request authors: 4
Average comments per issue: 1.34
Average comments per pull request: 0.17
Merged pull requests: 174
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

BaxterEaves (9)
schmidmt (6)
firekg (3)
amifalk (1)
TempTemperson (1)
perplexes (1)
thomasaarholt (1)
zamazan4ik (1)

Pull Request Authors

Swandog (73)
BaxterEaves (69)
schmidmt (44)
TempTemperson (2)

Top Labels

Issue Labels

enhancement (12) Python (6) bug (4) Rust (2) documentation (1)

Pull Request Labels

enhancement (19) Python (18) bug (3) Rust (1) chore (1)

Packages

Total packages: 11
Total downloads:
- cargo 112,481 total
- pypi 1,135 last-month

Total dependent packages: 29
(may contain duplicates)
Total dependent repositories: 8
(may contain duplicates)
Total versions: 96
Total maintainers: 6

crates.io: lace_utils

Miscellaneous utilities for Lace and shared libraries

Homepage: https://www.lace.dev/
Documentation: https://docs.rs/lace_utils/
License: BUSL-1.1
Latest release: 0.3.0
published over 2 years ago

Versions: 5
Dependent Packages: 6
Dependent Repositories: 1
Downloads: 9,267 Total

Rankings

Dependent packages count: 5.4%

Dependent repos count: 16.6%

Stargazers count: 17.3%

Average: 18.3%

Forks count: 21.2%

Downloads: 30.8%

Maintainers (3)

BaxterEaves schmidmt Swandog

Last synced: 11 months ago

pypi.org: pylace

A probabilistic programming ML tool for science

Documentation: https://pylace.readthedocs.io/
License: BUSL-1.1
Latest release: 0.8.0
published almost 2 years ago

Versions: 14
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 1,135 Last month

Rankings

Dependent packages count: 6.6%

Average: 18.6%

Dependent repos count: 30.6%

Maintainers (3)

schmidmt KenSwanson bbbaaaxxx

Last synced: 11 months ago

crates.io: lace_data

Data definitions and data container definitions for Lace

Homepage: https://www.lace.dev/
Documentation: https://docs.rs/lace_data/
License: BUSL-1.1
Latest release: 0.3.0
published over 2 years ago

Versions: 5
Dependent Packages: 5
Dependent Repositories: 1
Downloads: 8,908 Total

Rankings

Dependent packages count: 6.2%

Dependent repos count: 16.6%

Stargazers count: 17.3%

Average: 18.7%

Forks count: 21.2%

Downloads: 32.0%

Maintainers (3)

BaxterEaves schmidmt Swandog

Last synced: 11 months ago

crates.io: lace_stats

Contains component model and hyperprior specifications

Homepage: https://www.lace.dev/
Documentation: https://docs.rs/lace_stats/
License: BUSL-1.1
Latest release: 0.4.0
published almost 2 years ago

Versions: 8
Dependent Packages: 5
Dependent Repositories: 1
Downloads: 11,782 Total

Rankings

Dependent packages count: 6.2%

Dependent repos count: 16.6%

Stargazers count: 17.3%

Average: 18.8%

Forks count: 21.2%

Downloads: 32.7%

Maintainers (3)

BaxterEaves schmidmt Swandog

Last synced: 11 months ago

crates.io: lace_consts

Default constants for Lace

Homepage: https://www.lace.dev/
Documentation: https://docs.rs/lace_consts/
License: BUSL-1.1
Latest release: 0.2.1
published over 2 years ago

Versions: 5
Dependent Packages: 4
Dependent Repositories: 1
Downloads: 8,887 Total

Rankings

Dependent packages count: 7.4%

Dependent repos count: 16.6%

Stargazers count: 17.3%

Average: 18.8%

Forks count: 21.2%

Downloads: 31.6%

Maintainers (3)

BaxterEaves schmidmt Swandog

Last synced: 11 months ago

crates.io: lace_codebook

Contains the Lace codebook specification as well as utilities for generating defaults

Homepage: https://www.lace.dev/
Documentation: https://docs.rs/lace_codebook/
License: BUSL-1.1
Latest release: 0.7.0
published almost 2 years ago

Versions: 12
Dependent Packages: 3
Dependent Repositories: 1
Downloads: 15,256 Total

Rankings

Dependent packages count: 9.2%

Dependent repos count: 16.6%

Stargazers count: 17.3%

Average: 19.7%

Forks count: 21.2%

Downloads: 34.2%

Maintainers (3)

BaxterEaves schmidmt Swandog

Last synced: 11 months ago

crates.io: lace_geweke

Geweke tester for Lace

Homepage: https://www.lace.dev/
Documentation: https://docs.rs/lace_geweke/
License: BUSL-1.1
Latest release: 0.4.0
published almost 2 years ago

Versions: 7
Dependent Packages: 2
Dependent Repositories: 1
Downloads: 9,803 Total

Rankings

Dependent packages count: 12.2%

Dependent repos count: 16.6%

Stargazers count: 17.3%

Average: 21.1%

Forks count: 21.2%

Downloads: 38.1%

Maintainers (3)

BaxterEaves schmidmt Swandog

Last synced: 11 months ago

crates.io: lace_cc

Core of the Lace cross-categorization engine library

Homepage: https://www.lace.dev/
Documentation: https://docs.rs/lace_cc/
License: BUSL-1.1
Latest release: 0.7.0
published almost 2 years ago

Versions: 12
Dependent Packages: 2
Dependent Repositories: 1
Downloads: 14,706 Total

Rankings

Dependent packages count: 12.2%

Dependent repos count: 16.6%

Stargazers count: 17.3%

Forks count: 21.2%

Average: 21.3%

Downloads: 39.1%

Maintainers (3)

BaxterEaves schmidmt Swandog

Last synced: 11 months ago

crates.io: lace_metadata

Archive of the metadata (savefile) formats for Lace. In charge of versioning and conversion.

Homepage: https://www.lace.dev/
Documentation: https://docs.rs/lace_metadata/
License: BUSL-1.1
Latest release: 0.7.0
published almost 2 years ago

Versions: 12
Dependent Packages: 1
Dependent Repositories: 1
Downloads: 14,152 Total

Rankings

Dependent repos count: 16.6%

Stargazers count: 17.3%

Dependent packages count: 18.2%

Forks count: 21.2%

Average: 23.2%

Downloads: 42.6%

Maintainers (3)

BaxterEaves schmidmt Swandog

Last synced: 11 months ago

crates.io: lace

A probabilistic cross-categorization engine

Homepage: https://www.lace.dev/
Documentation: https://docs.rs/lace/
License: BUSL-1.1
Latest release: 0.8.0
published almost 2 years ago

Versions: 13
Dependent Packages: 1
Dependent Repositories: 0
Downloads: 15,910 Total

Rankings

Dependent repos count: 29.3%

Dependent packages count: 33.8%

Average: 53.7%

Downloads: 97.9%

Maintainers (3)

BaxterEaves schmidmt Swandog

Last synced: 11 months ago

crates.io: lace-cli

A probabilistic cross-categorization engine

Homepage: https://www.lace.dev/
Documentation: https://docs.rs/lace-cli/
License: BUSL-1.1
Latest release: 0.7.0
published over 2 years ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 3,810 Total

Rankings

Dependent repos count: 30.8%

Dependent packages count: 36.1%

Average: 55.1%

Downloads: 98.4%

Maintainers (1)

Swandog

Last synced: 11 months ago

Dependencies

.github/workflows/deploy-gh-pages.yaml actions

actions/checkout v3 composite
actions/configure-pages v3 composite
actions/deploy-pages v1 composite
actions/upload-pages-artifact v1 composite
dtolnay/rust-toolchain stable composite

.github/workflows/python-build-test.yaml actions

PyO3/maturin-action v1 composite
Swatinem/rust-cache v2 composite
actions/checkout v3 composite
actions/download-artifact v3 composite
actions/setup-python v4 composite
actions/upload-artifact v3 composite
dtolnay/rust-toolchain stable composite

.github/workflows/rust-build-test.yaml actions

Swatinem/rust-cache v2 composite
actions-rs/cargo v1 composite
actions/checkout v3 composite
dtolnay/rust-toolchain stable composite

book/lace_preprocess_mdbook_yaml/Cargo.lock cargo

363 dependencies

book/lace_preprocess_mdbook_yaml/Cargo.toml cargo

lace/Cargo.lock cargo

328 dependencies

lace/Cargo.toml cargo

approx 0.5.1 development
criterion 0.5 development
indoc 2.0.3 development
once_cell 1.13.0 development
plotly 0.8 development
tempfile 3.4 development
bincode 1
clap 4.3.17
ctrlc 3.2.1
dirs 5
env_logger 0.10
flate2 1.0.23
indexmap 2.0.0
indicatif 0.17.0
itertools 0.11
lace_cc 0.2.0
lace_codebook 0.2.0
lace_consts 0.1.4
lace_data 0.1.2
lace_geweke 0.1.2
lace_metadata 0.2.0
lace_stats 0.1.3
lace_utils 0.1.2
log 0.4
maplit 1
num 0.4
polars 0.33
rand 0.8
rand_distr 0.4
rand_xoshiro 0.6
rayon 1.5
regex 1
serde 1
serde_json 1
serde_yaml 0.9.4
special 0.10
thiserror 1.0.19
toml 0.7

lace/lace_cc/Cargo.toml cargo

approx 0.5.1 development
criterion 0.5 development
indoc 2.0.3 development
enum_dispatch 0.3.10
indicatif 0.17.0
itertools 0.11
lace_codebook 0.2.0
lace_consts 0.1.4
lace_data 0.1.2
lace_geweke 0.1.2
lace_stats 0.1.2
lace_utils 0.1.2
once_cell 1
rand 0.8
rand_xoshiro 0.6
rayon 1.5
serde 1
special 0.10
thiserror 1.0.19

lace/lace_codebook/Cargo.toml cargo

indoc 2 development
tempfile 3.3.0 development
flate2 1.0.23
lace_consts 0.1.4
lace_data 0.1.2
lace_stats 0.1.4
lace_utils 0.1.2
maplit 1
polars 0.33
rand 0.8.5
rayon 1.5
serde 1
serde_yaml 0.9.4
thiserror 1.0.11

lace/lace_consts/Cargo.toml cargo

lace/lace_data/Cargo.toml cargo

approx 0.5.1 development
criterion 0.5 development
rand 0.8 development
serde_json 1 development
lace_utils 0.1.2
regex 1
serde 1
thiserror 1.0.19

lace/lace_geweke/Cargo.toml cargo

lace/lace_metadata/Cargo.toml cargo

tempfile 3 development
bincode 1
dirs 5
hex 0.4
lace_cc 0.2.0
lace_codebook 0.2.0
lace_data 0.1.2
lace_stats 0.1.4
log 0.4
once_cell 1
rand_xoshiro 0.6
rayon 1.5
serde 1
serde_json 1
serde_yaml 0.9.4
thiserror 1.0.19
toml 0.7

lace/lace_stats/Cargo.toml cargo

approx 0.5.1 development
criterion 0.5 development
maplit 1 development
rand_distr 0.4 development
serde_json 1 development
itertools 0.11
lace_consts 0.1.4
lace_data 0.1.2
lace_utils 0.1.2
rand 0.8
rand_xoshiro 0.6
regex 1.6.0
serde 1
special 0.10
thiserror 1.0.11

lace/lace_utils/Cargo.toml cargo

approx 0.5.1 development
rand 0.8

lace/Dockerfile docker

alpine 3.11 build
rust 1.42-alpine build
