lmfusion

A framework for building AI applications that reference physical data.

https://github.com/davidimprovz/lmfusion

Science Score: 26.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file (found)
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low: 13.0%)

Keywords

ai geospatial llm python
Last synced: 6 months ago

Repository

A framework for building AI applications that reference physical data.

Basic Info
  • Host: GitHub
  • Owner: davidimprovz
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 20.5 KB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 3
  • Releases: 0
Topics
ai geospatial llm python
Created almost 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md


LMFusion

An open framework for building large-model applications that can reference physical data. LMFusion aims to make it simple to incorporate physical reality into reasoning applications (e.g., AI agents). It consists of a set of classes for structured geo-referencing, retrieval, and model fine-tuning using geospatial data, plus integrations with services useful for loading, working with, and visualizing this data.

Use LMFusion to make geospatial data (e.g., satellite imagery, maps, vector data) easier to access and use when building applications around large AI models that need to reason about the world.

Use cases (notebooks coming soon!)

Why LMFusion?

This module is inspired by Yann LeCun's (Meta) writings on model grounding, in which he highlights why models require a mechanism for physically grounded reasoning. While large models perform well at next-token prediction, they lack access to physical information and can therefore fail when asked for real-world predictions. This remains an open problem. We assume we are still in the early stages of modern AI, with many breakthroughs to come.

Significant advances have been made in the programmatic use of models. Projects we're familiar with include LangChain, LlamaIndex, and DSPy, which were among the first to explore how to incorporate LLMs into structured pipelines. More recent engineering efforts have focused on letting LLMs control real systems by giving them internet access, enforcing structured outputs, and enabling "tool calling". However, none of these frameworks has addressed physical grounding.

Several startups we are aware of are working on physical models and reasoning (see Archetype AI and Danti). These companies are closed source and are building limited-scope products for the markets they serve rather than releasing frameworks for reuse. LMFusion is focused on the latter.

To understand what we mean by "limited scope", consider the spectrum of data necessary for grounding model reasoning. On one end of the spectrum is static content: information about physical systems that doesn't change, such as physics and mathematical models. On the other end is dynamic data: any real-time information about the physical world sensed by cameras, accelerometers, and similar instruments. To date, we know of no open architectures that let reasoning applications incorporate these data simply and reliably.

More to the point, little progress has been made on geospatial data, which sits between static and dynamic (e.g., known locations, sizes/shapes of objects, earth observation data, and characteristics of geographic areas). To be clear, great strides have been made in making these data accessible and programmable at scale; notable examples include Google Earth Engine, Microsoft Planetary Computer, and NASA's Planetary Data System. However, outside of academic research, we have not seen a module that ties these into applications that seek to reason over them using large models. We think there's tremendous value in making this possible.
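As a toy illustration of the "in-between" geospatial facts described above (known locations and extents), a grounding layer can answer distance queries from stored coordinates instead of asking a model to guess. The sketch below uses only the Python standard library; the place names and coordinates are illustrative assumptions, not LMFusion data or API:

```python
import math

# Hypothetical store of known locations (name -> (lat, lon) in degrees).
# These coordinates are illustrative assumptions for this sketch.
KNOWN_PLACES = {
    "denver": (39.7392, -104.9903),
    "boulder": (40.0150, -105.2705),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # mean Earth radius 6371 km

def distance_between(place_a, place_b):
    """Ground a 'how far apart are X and Y?' question in stored coordinates."""
    return haversine_km(KNOWN_PLACES[place_a], KNOWN_PLACES[place_b])

print(f"{distance_between('denver', 'boulder'):.1f} km")
```

A real framework would swap the dictionary for imagery catalogs and vector stores, but the principle is the same: the answer comes from referenced physical data, not from the model's parameters.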

Large Models v. Frameworks

There has been some discussion in academia about whether RAG frameworks will survive the advance of language models. The argument is that RAG may become irrelevant in light of super-intelligent, large-context models.

In our view, these arguments do not appreciate the challenge of grounding models in physical reality. The world is always changing and producing new data. Case in point: open-source earth observation satellites generate imagery on the order of terabytes per day. With the cost of training a new large foundation model running to many millions (billions?) of dollars, it does not yet seem feasible to dismiss the importance of supplementing large models with new data.

Moreover, we observe that the smaller, quantized foundation models typically used by the open-source community are not as competitive as their larger parents, which were built by corporations with multi-billion-dollar budgets. We therefore assume the open-source community has even more to gain from an open solution to physical data grounding.

Getting Started (tbd)

pip install lmfusion

Call for Developers

If you'd like to help us build, we welcome contributions. Please reach out. You can also check our RFQs and submit a pull request.

©2024 Improvz, Inc. All Rights Reserved.

Owner

  • Name: dimprovz
  • Login: davidimprovz
  • Kind: user
  • Company: Improvz

Full-stack python ai dev freeing the world from drudgery one function at a time. Tip me: 0x499266ABF6580aDCA5675f640fF621e27ee5Bb43

Citation (CITATION.cff)

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
  • Public event: 1
  • Fork event: 1
Last Year
  • Watch event: 1
  • Push event: 1
  • Public event: 1
  • Fork event: 1