ontogpt

LLM-based ontological extraction tools, including SPIRES

https://github.com/monarch-initiative/ontogpt

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
✓ Committers with academic emails: 5 of 29 committers (17.2%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.5%) to scientific vocabulary

Keywords

ai chat-gpt data-modeling gpt-3 information-extraction language-models large-language-models linkml llm monarchinitiative named-entity-recognition ner nlp oaklib obofoundry relation-extraction

Keywords from Contributors

biocuration biopragmatics bioregistry

Last synced: 10 months ago

Repository

LLM-based ontological extraction tools, including SPIRES

Basic Info

Host: GitHub
Owner: monarch-initiative
License: bsd-3-clause
Language: Jupyter Notebook
Default Branch: main
Homepage: https://monarch-initiative.github.io/ontogpt/
Size: 81.6 MB

Statistics

Stars: 720
Watchers: 18
Forks: 98
Open Issues: 74
Releases: 50

Topics

ai chat-gpt data-modeling gpt-3 information-extraction language-models large-language-models linkml llm monarchinitiative named-entity-recognition ner nlp oaklib obofoundry relation-extraction

Created over 3 years ago · Last pushed 11 months ago

Metadata Files

Readme License Citation

OntoGPT

Introduction

OntoGPT is a Python package for extracting structured information from text with large language models (LLMs), instruction prompts, and ontology-based grounding.

For more details, please see the full documentation.

Quick Start

OntoGPT runs on the command line, though there's also a minimal web app interface (see the Web Application section below).

Ensure you have Python 3.9 or greater installed.
Install with pip:

```bash
pip install ontogpt
```
Set your OpenAI API key:

```bash
runoak set-apikey -e openai <your openai api key>
```
See the list of all OntoGPT commands:

```bash
ontogpt --help
```
Try a simple example of information extraction:

```bash
echo "One treatment for high blood pressure is carvedilol." > example.txt
ontogpt extract -i example.txt -t drug
```

OntoGPT will retrieve the necessary ontologies and print the results to the command line. All extracted objects appear under the extracted_object heading.
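To drive the same extraction from a Python script, a minimal sketch is shown below. It assumes the ontogpt CLI is installed and on PATH, and it only uses the -i, -t, and --model options described in this README; build_extract_command and run_extraction are hypothetical helpers, not part of OntoGPT's Python API.

```python
"""Minimal sketch: script the Quick Start example from Python.

Assumes the ontogpt CLI is installed and on PATH. The helpers below are
hypothetical and are not part of OntoGPT's own API.
"""
import subprocess
from pathlib import Path
from typing import List, Optional


def build_extract_command(
    input_path: Path, template: str, model: Optional[str] = None
) -> List[str]:
    """Return argv for `ontogpt extract -i <file> -t <template> [--model <name>]`."""
    cmd = ["ontogpt", "extract", "-i", str(input_path), "-t", template]
    if model is not None:
        # --model takes a name from the first column of `ontogpt list-models`
        cmd += ["--model", model]
    return cmd


def run_extraction(text: str, template: str, workdir: Path) -> str:
    """Write text to example.txt in workdir, run ontogpt on it, return stdout."""
    input_path = workdir / "example.txt"
    input_path.write_text(text + "\n", encoding="utf-8")
    result = subprocess.run(
        build_extract_command(input_path, template),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ontogpt exits non-zero
    )
    return result.stdout
```

For example, run_extraction("One treatment for high blood pressure is carvedilol.", "drug", Path(".")) returns the same output the shell example prints. Passing the command as a list rather than a single string avoids shell quoting problems with file paths.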

Web Application

There is a bare-bones web application for running OntoGPT and viewing results.

First, install the required dependencies with pip by running the following command:

```bash
pip install ontogpt[web]
```

Then run this command to start the web application:

```bash
web-ontogpt
```

NOTE: We do not recommend hosting this web app publicly without authentication.

Model APIs

OntoGPT uses the litellm package (https://litellm.vercel.app/) to interface with LLMs.

This means most APIs are supported, including OpenAI, Azure, Anthropic, Mistral, Replicate, and others.

To find the name of a model, run ontogpt list-models and pass the name from the first column to the --model option.

In most cases, you will also need to set the API key for the corresponding service, as shown above:

```bash
runoak set-apikey -e anthropic-key <your anthropic api key>
```

Some endpoints, such as OpenAI models through Azure, require setting additional details. These may be set similarly:

```bash
runoak set-apikey -e azure-key <your azure api key>
runoak set-apikey -e azure-base <your azure endpoint url>
runoak set-apikey -e azure-version <your azure api version, e.g. "2023-05-15">
```

These details may also be set as environment variables as follows:

```bash
export AZURE_API_KEY="my-azure-api-key"
export AZURE_API_BASE="https://example-endpoint.openai.azure.com"
export AZURE_API_VERSION="2023-05-15"
```
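Before a run, it can help to confirm that these variables are actually set. The sketch below assumes only the three variable names shown above; missing_azure_settings is a hypothetical helper, and it checks environment variables only, so it cannot see keys stored with runoak set-apikey.

```python
"""Sketch: report which Azure settings are missing from the environment.

The variable names come from the OntoGPT README; the helper itself is
hypothetical and is not part of OntoGPT or litellm. Keys stored with
`runoak set-apikey` are NOT visible here.
"""
import os
from typing import List, Mapping, Optional

AZURE_SETTINGS = ("AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION")


def missing_azure_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of Azure variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in AZURE_SETTINGS if not env.get(name)]
```

Calling missing_azure_settings() with no argument inspects the current process environment; passing a dictionary makes the check easy to exercise without touching real credentials.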

Open Models

Open LLMs may be retrieved and run through the ollama package (https://ollama.com/).

You will need to install ollama (see the GitHub repo), and you may need to start it as a service with a command like ollama serve or sudo systemctl start ollama.

Then retrieve a model with ollama pull <modelname>, e.g., ollama pull llama3.

To use the model in OntoGPT, prefix its name with ollama/ and pass it to the --model option, e.g., --model ollama/llama3.

Some ollama models may not be listed by ontogpt list-models, but the full list of downloaded LLMs can be seen with the ollama list command.
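The ollama/ prefix can also be applied programmatically. This sketch assumes that ollama list prints a header row followed by one model per line, with the model name (possibly including a tag such as :latest) in the first whitespace-separated column. Both helper names are hypothetical, and whether OntoGPT accepts a tagged name is also an assumption.

```python
"""Sketch: turn `ollama list` output into values for OntoGPT's --model option.

Assumes a header row, then one model per line with the name in the first
column. Helper names are hypothetical.
"""
import subprocess
from typing import List


def ollama_model_ids(listing: str) -> List[str]:
    """Prefix each model name in `ollama list` output with 'ollama/'."""
    lines = [line for line in listing.splitlines() if line.strip()]
    # Skip the header row; keep the first column of every remaining row.
    # Assumption: tagged names such as llama3:latest work after the prefix.
    return ["ollama/" + line.split()[0] for line in lines[1:]]


def local_ollama_model_ids() -> List[str]:
    """Run `ollama list` (requires ollama to be installed) and parse its output."""
    out = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    ).stdout
    return ollama_model_ids(out)
```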

Evaluations

OntoGPT's functions have been evaluated on test data. Please see the full documentation for details on these evaluations and how to reproduce them.

Related Projects

TALISMAN, a tool for generating summaries of functions enriched within a gene set. TALISMAN uses OntoGPT to work with LLMs.

Tutorials and Presentations

Presentation: "Staying grounded: assembling structured biological knowledge with help from large language models" - presented by Harry Caufield as part of the AgBioData Consortium webinar series (September 2023)
- Slides
- Video
Presentation: "Transforming unstructured biomedical texts with large language models" - presented by Harry Caufield as part of the BOSC track at ISMB/ECCB 2023 (July 2023)
- Slides
- Video
Presentation: "OntoGPT: A framework for working with ontologies and large language models" - talk by Chris Mungall at Joint Food Ontology Workgroup (May 2023)
- Slides
- Video

Citation

The information extraction approach used in OntoGPT, SPIRES, is described further in: Caufield JH, Hegde H, Emonet V, Harris NL, Joachimiak MP, Matentzoglu N, et al. Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. Bioinformatics, Volume 40, Issue 3, March 2024, btae104, https://doi.org/10.1093/bioinformatics/btae104.

Acknowledgements

This project is part of the Monarch Initiative. We also gratefully acknowledge Bosch Research for their support of this research project.

Owner

Name: Monarch Initiative
Login: monarch-initiative
Kind: organization
Location: Globally-distributed team (see https://monarchinitiative.org/page/team)

Website: https://github.com/monarch-initiative/monarch-app/blob/master/README.md#about-monarch
Repositories: 118
Profile: https://github.com/monarch-initiative

Cross-species disease discovery and diagnosis

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use OntoGPT, please cite it as follows."
authors:
- family-names: "Caufield"
  given-names: "J. Harry"
  orcid: "https://orcid.org/0000-0001-5705-7831"
- family-names: "Hegde"
  given-names: "Harshad"
  orcid: "https://orcid.org/0000-0002-2411-565X"
- family-names: "Emonet"
  given-names: "Vincent"
  orcid: "https://orcid.org/0000-0002-1501-1082"
- family-names: "Harris"
  given-names: "Nomi L."
  orcid: "https://orcid.org/0000-0001-6315-3707"
- family-names: "Joachimiak"
  given-names: "Marcin P."
  orcid: "https://orcid.org/0000-0001-8175-045X"
- family-names: "Matentzoglu"
  given-names: "Nicolas"
  orcid: "https://orcid.org/0000-0002-7356-1779"
- family-names: "Kim"
  given-names: "HyeongSik"
  orcid: "https://orcid.org/0000-0002-3002-9838"
- family-names: "Moxon"
  given-names: "Sierra A.T."
  orcid: "https://orcid.org/0000-0002-8719-7760"
- family-names: "Reese"
  given-names: "Justin T."
  orcid: "https://orcid.org/0000-0002-2170-2250"
- family-names: "Haendel"
  given-names: "Melissa A."
  orcid: "https://orcid.org/0000-0001-9114-8737"
- family-names: "Robinson"
  given-names: "Peter N."
  orcid: "https://orcid.org/0000-0002-0736-9199"
- family-names: "Mungall"
  given-names: "Christopher J."
  orcid: "https://orcid.org/0000-0002-6601-2165"
title: "OntoGPT"
version: 0.3.8
date-released: 2024-02-08
url: "https://github.com/monarch-initiative/ontogpt"
doi: 10.5281/zenodo.7894107
preferred-citation:
  type: article
  authors:
  - family-names: "Caufield"
    given-names: "J. Harry"
    orcid: "https://orcid.org/0000-0001-5705-7831"
  - family-names: "Hegde"
    given-names: "Harshad"
    orcid: "https://orcid.org/0000-0002-2411-565X"
  - family-names: "Emonet"
    given-names: "Vincent"
    orcid: "https://orcid.org/0000-0002-1501-1082"
  - family-names: "Harris"
    given-names: "Nomi L."
    orcid: "https://orcid.org/0000-0001-6315-3707"
  - family-names: "Joachimiak"
    given-names: "Marcin P."
    orcid: "https://orcid.org/0000-0001-8175-045X"
  - family-names: "Matentzoglu"
    given-names: "Nicolas"
    orcid: "https://orcid.org/0000-0002-7356-1779"
  - family-names: "Kim"
    given-names: "HyeongSik"
    orcid: "https://orcid.org/0000-0002-3002-9838"
  - family-names: "Moxon"
    given-names: "Sierra A.T."
    orcid: "https://orcid.org/0000-0002-8719-7760"
  - family-names: "Reese"
    given-names: "Justin T."
    orcid: "https://orcid.org/0000-0002-2170-2250"
  - family-names: "Haendel"
    given-names: "Melissa A."
    orcid: "https://orcid.org/0000-0001-9114-8737"
  - family-names: "Robinson"
    given-names: "Peter N."
    orcid: "https://orcid.org/0000-0002-0736-9199"
  - family-names: "Mungall"
    given-names: "Christopher J."
    orcid: "https://orcid.org/0000-0002-6601-2165"
  doi: "10.1093/bioinformatics/btae104"
  journal: "Bioinformatics"
  title: "Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning"
  year: 2024
  url: https://doi.org/10.1093/bioinformatics/btae104

GitHub Events

Total

Create event: 40
Commit comment event: 2
Release event: 8
Issues event: 44
Watch event: 119
Delete event: 29
Issue comment event: 81
Push event: 114
Pull request review comment event: 1
Pull request review event: 2
Pull request event: 67
Fork event: 18

Last Year

Create event: 40
Commit comment event: 2
Release event: 8
Issues event: 44
Watch event: 119
Delete event: 29
Issue comment event: 81
Push event: 114
Pull request review comment event: 1
Pull request review event: 2
Pull request event: 67
Fork event: 18

Committers

Last synced: about 1 year ago

All Time

Total Commits: 1,632
Total Committers: 29
Avg Commits per committer: 56.276
Development Distribution Score (DDS): 0.192

Past Year

Commits: 556
Committers: 9
Avg Commits per committer: 61.778
Development Distribution Score (DDS): 0.025

Top Committers

Name	Email	Commits
caufieldjh	j**d@g**m	1,318
cmungall	c**m@b**g	112
Harshad Hegde	h**b@g**m	41
Justin Reese	j**e@g**m	39
AgranyaGitHub	a**4@g**m	38
Bill Duncan	w****n	16
marcin p. joachimiak	4****n	14
Agranya Ketha	a**3@b**u	9
Yaroslav Halchenko	d**n@o**m	6
Krishna Chaitanya Bandi	9****i	6
diatomsRcool	a**n@g**m	5
Nomi Harris	n****s	3
SLotreck	l**s@m**u	3
Sierra Taylor Moxon	s**r@g**m	3
Jim Balhoff	j**m@b**g	2
Nico Matentzoglu	n**u@g**m	2
Sujay Patil	s**l@g**m	2
Leonardo local Kubuntu 22.04	l**4@g**m	2
Andrew Su	a**u@s**u	1
Daiki Sakai	1****d	1
Daniel Bauer	d**r@s**e	1
Gabe Reder	g**r@g**m	1
Justin Reese	j**b@g**m	1
Mark A. Miller	M**M@l**v	1
Patrick Kalita	p**a@l**v	1
Tyler T. Procko	3****0	1
Patrick Golden	p**n@e**m	1
Vincent Emonet	v**t@g**m	1
unknown	t**k@g**m	1
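The summary figures in this section can be reproduced from the raw counts. A minimal sketch, assuming DDS is computed as 1 minus the top committer's share of all commits. With caufieldjh's 1,318 commits out of 1,632, that assumption yields the listed 0.192.

```python
"""Sketch: reproduce the committer statistics listed above.

Assumption: DDS = 1 - (top committer's commits / total commits).
"""


def avg_commits(total_commits: int, committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / committers


def development_distribution_score(top_committer_commits: int, total_commits: int) -> float:
    """0 means one person made every commit; values near 1 mean commits are spread out."""
    return 1 - top_committer_commits / total_commits


# All-time figures from the Committers section
print(round(avg_commits(1632, 29), 3))                       # 56.276
print(round(development_distribution_score(1318, 1632), 3))  # 0.192
```

The same avg_commits call with the past-year figures (556 commits, 9 committers) gives the listed 61.778.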

Committer Domains (Top 20 + Academic)

lbl.gov: 2 email.unc.com: 1 senckenberg.de: 1 scripps.edu: 1 balhoff.org: 1 msu.edu: 1 onerussian.com: 1 berkeley.edu: 1 berkeleybop.org: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 206
Total pull requests: 239
Average time to close issues: 2 months
Average time to close pull requests: 8 days
Total issue authors: 49
Total pull request authors: 20
Average comments per issue: 1.94
Average comments per pull request: 0.84
Merged pull requests: 221
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 34
Pull requests: 63
Average time to close issues: 9 days
Average time to close pull requests: 1 day
Issue authors: 13
Pull request authors: 6
Average comments per issue: 1.06
Average comments per pull request: 0.56
Merged pull requests: 56
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

caufieldjh (107)
cmungall (8)
serenalotreck (5)
kluo9 (4)
yy20716 (4)
timalamenciak (3)
rebeccaito (3)
vemonet (3)
peiyaoli (3)
justaddcoffee (3)
dosumis (3)
nlharris (3)
kevinschaper (3)
Vishal-Joshi (2)
lzt5269 (2)

Pull Request Authors

caufieldjh (266)
cmungall (26)
realmarcin (8)
justaddcoffee (5)
serenalotreck (4)
hrshdhgd (4)
leokim-l (2)
yarikoptic (2)
ptgolden (2)
nlharris (2)
dnlbauer (2)
gkreder (2)
timalamenciak (2)
k-bandi (2)
andrewsu (1)

Top Labels

Issue Labels

enhancement (40) bug (21) documentation (11) question (9) template (8) annotation (3) evaluation (2) wontfix (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 791 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 48
Total maintainers: 3

pypi.org: ontogpt

OntoGPT is a Python package for extracting structured information from text with large language models (LLMs), instruction prompts, and ontology-based grounding.

Documentation: https://ontogpt.readthedocs.io/
License: BSD-3
Latest release: 1.0.17
published 11 months ago

Versions: 48
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 791 Last month

Rankings

Dependent packages count: 10.0%

Downloads: 14.2%

Average: 15.3%

Dependent repos count: 21.7%
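The Average row appears to be the arithmetic mean of the other three rankings. This is an assumption, but it reproduces the listed value:

```python
# Sketch: assumes "Average" is the plain mean of the three percentile rankings above.
rankings = {
    "dependent_packages_count": 10.0,
    "downloads": 14.2,
    "dependent_repos_count": 21.7,
}
average = sum(rankings.values()) / len(rankings)
print(f"{average:.1f}%")  # 15.3%
```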

Maintainers (3)

cmungall hrshdhgd jharrycaufield

Last synced: 11 months ago

Dependencies

.github/workflows/deploy-docs.yml actions

actions/checkout v3 composite
actions/setup-python v3 composite
snok/install-poetry v1.3 composite

.github/workflows/pypi-publish.yml actions

actions/checkout v3.0.2 composite
actions/setup-python v4 composite
pypa/gh-action-pypi-publish v1.5.0 composite

.github/workflows/qc.yml actions

actions/checkout v3.0.2 composite
actions/setup-python v4 composite
snok/install-poetry v1.3.1 composite

poetry.lock pypi

332 dependencies

pyproject.toml pypi

mkdocs-mermaid2-plugin ^0.6.0 develop
pytest ^7.1.2 develop
setuptools >=65.5.0 develop
tox ^3.25.1 develop
Jinja2 ^3.1.2
SQLAlchemy ^1.4.32, !=1.4.46
aiohttp ^3.8.4
airium ^0.2.5
beautifulsoup4 ^4.11.1
bioc ^2.0.post5
cachier ^2.1.0
class-resolver >=0.4.2
click ^8.1.3
eutils ^0.6.0
fastapi ^0.88.0
gilda ^0.10.3
gpt4 ^0.0.1
greenlet !=2.0.2
httpx ^0.23.3
inflect ^6.0.2
inflection ^0.5.1
jsonlines ^3.1.0
langchain ^0.0.167
linkml ^1.4.10
linkml-owl ^0.2.7
linkml-runtime ^1.5.3
myst-parser ^0.18.1
nlpcloud ^1.0.39
oaklib ^0.5.6
openai ^0.27.4
pygpt4all ^1.1.0
python >=3.9,<3.9.7 || >3.9.7,<4.0
python-multipart ^0.0.5
recipe-scrapers ^14.35.0
requests-cache ^1.0.1
semsimian >=0.1.13
sphinx ^5.3.0
sphinx-autodoc-typehints ^1.19.4
sphinx-click ^4.3.0
sphinx-rtd-theme ^1.0.0
streamlit ^1.22.0
textract *
tiktoken ^0.3.3
uvicorn ^0.20.0
wikipedia ^1.4.0
wikipedia-api ^0.5.8
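The python entry above is a Poetry constraint with two ranges joined by ||: it accepts Python 3.9.0 up to, but not including, 3.9.7, and anything above 3.9.7 and below 4.0. In other words, 3.9.7 alone is excluded. A minimal sketch of that check for plain (major, minor, micro) versions follows; satisfies_python_constraint is a hypothetical helper, not Poetry's own resolver, and it ignores pre-release versions.

```python
"""Sketch: evaluate the constraint ">=3.9,<3.9.7 || >3.9.7,<4.0".

Hypothetical helper; pre-releases and other Poetry semantics are not modeled.
"""
import sys
from typing import Tuple

Version = Tuple[int, int, int]


def satisfies_python_constraint(version: Version) -> bool:
    """Return True if a (major, minor, micro) tuple falls in either range."""
    first_range = (3, 9, 0) <= version < (3, 9, 7)   # >=3.9,<3.9.7
    second_range = (3, 9, 7) < version < (4, 0, 0)   # >3.9.7,<4.0
    return first_range or second_range


current: Version = (sys.version_info.major, sys.version_info.minor, sys.version_info.micro)
print(current, satisfies_python_constraint(current))
```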
