contextgem

ContextGem: Effortless LLM extraction from documents

https://github.com/rafidsarker/contextgem

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (3.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: rafidsarker
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://contextgem.dev/
  • Size: 20.8 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 8 months ago · Last pushed 8 months ago
Metadata Files
Readme, Changelog, Contributing, License, Code of conduct, Citation, Security

README.md

Identifi Refactor

This example project demonstrates how to refactor a PDF extraction tool using patterns from ContextGem.

Installation

    pip install contextgem google-generativeai faiss-cpu pdfminer.six openai

Running Extraction

    python run_extraction.py ./sample.pdf config/extractionSchema.yaml output.json

The resulting output.json contains an array of extraction results with confidence scores and page locations.
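
As a minimal sketch of consuming that file: the README does not document the field names, so the `confidence` and `page` keys below are assumptions, and the helper names `load_results` and `filter_by_confidence` are hypothetical. Under those assumptions, low-confidence results could be filtered out like this:

```python
import json
from pathlib import Path


def load_results(path):
    """Load the array of extraction results from a JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of extraction results")
    return data


def filter_by_confidence(results, threshold=0.8):
    """Keep results whose (assumed) 'confidence' value meets the threshold."""
    # 'confidence' is a hypothetical key name; adjust it to the real output schema.
    # Results without the key are treated as having confidence 0.0.
    return [r for r in results if r.get("confidence", 0.0) >= threshold]
```

Calling `filter_by_confidence(load_results("output.json"))` would then return only the entries at or above the threshold. If the actual schema uses different key names, only the lookup inside `filter_by_confidence` needs to change.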

Running Classification

    python run_classification.py output.json config/classificationOutputSchema.yaml classified.json --pdf ./sample.pdf

Both commands accept --model to specify the Gemini model ID.
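
The scripts themselves are not shown here, so the following is only an illustrative sketch of how a command line shaped like the extraction invocation above could be parsed with `argparse`. The function name `build_parser`, the argument names, and the `None` default for `--model` are assumptions, not the project's actual code.

```python
import argparse


def build_parser():
    """Build a parser mirroring: run_extraction.py PDF SCHEMA OUTPUT [--model ID]."""
    parser = argparse.ArgumentParser(
        description="Run extraction on a PDF (illustrative sketch)."
    )
    parser.add_argument("pdf_path", help="path to the input PDF")
    parser.add_argument("schema_path", help="path to the YAML extraction schema")
    parser.add_argument("output_path", help="path of the JSON file to write")
    # The default is an assumption: None means "let the script pick its own model".
    parser.add_argument("--model", default=None, help="Gemini model ID to use")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["./sample.pdf", "config/extractionSchema.yaml", "output.json"]
)
print(args.pdf_path, args.model)
```

Leaving `--model` optional keeps the three positional paths as the only required input, which matches the invocations shown above, where no model is passed.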

Owner

  • Login: rafidsarker
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Shcherbak
    given-names: Sergii
    email: sergii@shcherbak.ai
title: "ContextGem: Effortless LLM extraction from documents"
date-released: 2025-04-02
url: "https://github.com/shcherbak-ai/contextgem"

GitHub Events

Total
  • Push event: 1
  • Pull request event: 2
  • Create event: 1
Last Year
  • Push event: 1
  • Pull request event: 2
  • Create event: 1

Dependencies

.github/workflows/bandit-security.yml actions
  • actions/cache v4 composite
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • snok/install-poetry v1 composite
.github/workflows/ci-tests.yml actions
  • actions/cache v4 composite
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • schneegans/dynamic-badges-action v1.7.0 composite
  • snok/install-poetry v1 composite
.github/workflows/codeql.yml actions
  • actions/cache v4 composite
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • github/codeql-action/analyze v3 composite
  • github/codeql-action/init v3 composite
  • snok/install-poetry v1 composite
.github/workflows/contributor-agreement-check.yml actions
  • actions/checkout v4 composite
  • actions/github-script v7 composite
.github/workflows/daily-import-test.yml actions
  • actions/setup-python v5 composite
.github/workflows/docs.yml actions
  • actions/cache v4 composite
  • actions/checkout v4 composite
  • actions/deploy-pages v4 composite
  • actions/setup-python v5 composite
  • actions/upload-pages-artifact v3 composite
  • snok/install-poetry v1 composite
dev/requirements/requirements.dev.txt pypi
  • accessible-pygments ==0.0.5 development
  • aiohappyeyeballs ==2.6.1 development
  • aiohttp ==3.12.13 development
  • aiolimiter ==1.2.1 development
  • aiosignal ==1.3.2 development
  • alabaster ==0.7.16 development
  • annotated-types ==0.7.0 development
  • anyio ==4.9.0 development
  • argcomplete ==3.6.2 development
  • async-timeout ==5.0.1 development
  • attrs ==25.3.0 development
  • babel ==2.17.0 development
  • bandit ==1.8.3 development
  • beautifulsoup4 ==4.13.4 development
  • black ==25.1.0 development
  • build ==1.2.2.post1 development
  • certifi ==2025.6.15 development
  • cfgv ==3.4.0 development
  • charset-normalizer ==3.4.2 development
  • click ==8.2.1 development
  • colorama ==0.4.6 development
  • coloredlogs ==15.0.1 development
  • commitizen ==4.8.3 development
  • coverage ==7.9.1 development
  • decli ==0.6.3 development
  • distlib ==0.3.9 development
  • distro ==1.9.0 development
  • docutils ==0.21.2 development
  • exceptiongroup ==1.3.0 development
  • fastjsonschema ==2.21.1 development
  • filelock ==3.18.0 development
  • flatbuffers ==25.2.10 development
  • frozenlist ==1.7.0 development
  • fsspec ==2025.5.1 development
  • h11 ==0.16.0 development
  • hf-xet ==1.1.3 development
  • httpcore ==1.0.9 development
  • httpx ==0.28.1 development
  • httpx-aiohttp ==0.1.6 development
  • huggingface-hub ==0.33.0 development
  • humanfriendly ==10.0 development
  • identify ==2.6.12 development
  • idna ==3.10 development
  • imagesize ==1.4.1 development
  • importlib-metadata ==8.7.0 development
  • iniconfig ==2.1.0 development
  • isort ==6.0.1 development
  • jinja2 ==3.1.6 development
  • jiter ==0.10.0 development
  • jsonschema ==4.24.0 development
  • jsonschema-specifications ==2025.4.1 development
  • jupyter-core ==5.8.1 development
  • litellm ==1.71.1 development
  • loguru ==0.7.3 development
  • lxml ==5.4.0 development
  • markdown-it-py ==3.0.0 development
  • markupsafe ==3.0.2 development
  • mdurl ==0.1.2 development
  • mpmath ==1.3.0 development
  • multidict ==6.4.4 development
  • mypy-extensions ==1.1.0 development
  • nbformat ==5.10.4 development
  • nodeenv ==1.9.1 development
  • numpy ==2.2.6 development
  • onnxruntime ==1.22.0 development
  • openai ==1.86.0 development
  • packaging ==25.0 development
  • pathspec ==0.12.1 development
  • pbr ==6.1.1 development
  • pip ==25.1.1 development
  • pip-tools ==7.4.1 development
  • platformdirs ==4.3.8 development
  • pluggy ==1.6.0 development
  • pre-commit ==4.2.0 development
  • prompt-toolkit ==3.0.51 development
  • propcache ==0.3.2 development
  • protobuf ==6.31.1 development
  • pydantic ==2.11.7 development
  • pydantic-core ==2.33.2 development
  • pydata-sphinx-theme ==0.15.4 development
  • pygments ==2.19.1 development
  • pyproject-hooks ==1.2.0 development
  • pyreadline3 ==3.5.4 development
  • pytest ==8.4.0 development
  • pytest-cov ==6.2.1 development
  • pytest-recording ==0.13.4 development
  • python-dotenv ==1.1.0 development
  • python-ulid ==3.0.0 development
  • pywin32 ==310 development
  • pyyaml ==6.0.2 development
  • questionary ==2.1.0 development
  • referencing ==0.36.2 development
  • regex ==2024.11.6 development
  • requests ==2.32.4 development
  • rich ==14.0.0 development
  • rpds-py ==0.25.1 development
  • setuptools ==80.9.0 development
  • sniffio ==1.3.1 development
  • snowballstemmer ==3.0.1 development
  • soupsieve ==2.7 development
  • sphinx ==7.4.7 development
  • sphinx-autodoc-typehints ==2.3.0 development
  • sphinx-book-theme ==1.1.4 development
  • sphinx-copybutton ==0.5.2 development
  • sphinx-design ==0.6.1 development
  • sphinx-sitemap ==2.6.0 development
  • sphinxcontrib-applehelp ==2.0.0 development
  • sphinxcontrib-devhelp ==2.0.0 development
  • sphinxcontrib-htmlhelp ==2.1.0 development
  • sphinxcontrib-jsmath ==1.0.1 development
  • sphinxcontrib-qthelp ==2.0.0 development
  • sphinxcontrib-serializinghtml ==2.0.0 development
  • sphinxext-opengraph ==0.9.1 development
  • stevedore ==5.4.1 development
  • sympy ==1.14.0 development
  • termcolor ==3.1.0 development
  • tiktoken ==0.9.0 development
  • tokenizers ==0.21.1 development
  • tomli ==2.2.1 development
  • tomlkit ==0.13.3 development
  • tqdm ==4.67.1 development
  • traitlets ==5.14.3 development
  • typing-extensions ==4.14.0 development
  • typing-inspection ==0.4.1 development
  • urllib3 ==1.26.20 development
  • urllib3 ==2.4.0 development
  • vcrpy ==7.0.0 development
  • virtualenv ==20.31.2 development
  • wcwidth ==0.2.13 development
  • wheel ==0.45.1 development
  • win32-setctime ==1.2.0 development
  • wrapt ==1.17.2 development
  • wtpsplit-lite ==0.2.0 development
  • yarl ==1.20.1 development
  • zipp ==3.23.0 development
dev/requirements/requirements.main.txt pypi
  • aiohappyeyeballs ==2.6.1 development
  • aiohttp ==3.12.13 development
  • aiolimiter ==1.2.1 development
  • aiosignal ==1.3.2 development
  • annotated-types ==0.7.0 development
  • anyio ==4.9.0 development
  • async-timeout ==5.0.1 development
  • attrs ==25.3.0 development
  • certifi ==2025.6.15 development
  • charset-normalizer ==3.4.2 development
  • click ==8.2.1 development
  • colorama ==0.4.6 development
  • coloredlogs ==15.0.1 development
  • distro ==1.9.0 development
  • exceptiongroup ==1.3.0 development
  • filelock ==3.18.0 development
  • flatbuffers ==25.2.10 development
  • frozenlist ==1.7.0 development
  • fsspec ==2025.5.1 development
  • h11 ==0.16.0 development
  • hf-xet ==1.1.3 development
  • httpcore ==1.0.9 development
  • httpx ==0.28.1 development
  • httpx-aiohttp ==0.1.6 development
  • huggingface-hub ==0.33.0 development
  • humanfriendly ==10.0 development
  • idna ==3.10 development
  • importlib-metadata ==8.7.0 development
  • jinja2 ==3.1.6 development
  • jiter ==0.10.0 development
  • jsonschema ==4.24.0 development
  • jsonschema-specifications ==2025.4.1 development
  • litellm ==1.71.1 development
  • loguru ==0.7.3 development
  • lxml ==5.4.0 development
  • markupsafe ==3.0.2 development
  • mpmath ==1.3.0 development
  • multidict ==6.4.4 development
  • numpy ==2.2.6 development
  • onnxruntime ==1.22.0 development
  • openai ==1.86.0 development
  • packaging ==25.0 development
  • propcache ==0.3.2 development
  • protobuf ==6.31.1 development
  • pydantic ==2.11.7 development
  • pydantic-core ==2.33.2 development
  • pyreadline3 ==3.5.4 development
  • python-dotenv ==1.1.0 development
  • python-ulid ==3.0.0 development
  • pyyaml ==6.0.2 development
  • referencing ==0.36.2 development
  • regex ==2024.11.6 development
  • requests ==2.32.4 development
  • rpds-py ==0.25.1 development
  • sniffio ==1.3.1 development
  • sympy ==1.14.0 development
  • tiktoken ==0.9.0 development
  • tokenizers ==0.21.1 development
  • tqdm ==4.67.1 development
  • typing-extensions ==4.14.0 development
  • typing-inspection ==0.4.1 development
  • urllib3 ==1.26.20 development
  • urllib3 ==2.4.0 development
  • win32-setctime ==1.2.0 development
  • wtpsplit-lite ==0.2.0 development
  • yarl ==1.20.1 development
  • zipp ==3.23.0 development
poetry.lock pypi
  • 135 dependencies
pyproject.toml pypi
  • bandit ^1.8.3 develop
  • black ^25.1.0 develop
  • commitizen ^4.5.1 develop
  • coverage ^7.6.12 develop
  • isort ^6.0.1 develop
  • nbformat ^5.10.4 develop
  • pip-tools ^7.4.1 develop
  • pre-commit ^4.1.0 develop
  • pytest ^8.3.4 develop
  • pytest-cov ^6.0.0 develop
  • pytest-recording ^0.13.4 develop
  • python-dotenv ^1.0.1 develop
  • sphinx >=7.0.0,<8.0.0 develop
  • sphinx-autodoc-typehints <3.0.0 develop
  • sphinx-book-theme ^1.1.4 develop
  • sphinx-copybutton ^0.5.2 develop
  • sphinx-design ^0.6.1 develop
  • sphinx-sitemap ^2.6.0 develop
  • sphinxext-opengraph ^0.9.1 develop
  • aiolimiter (>=1.2.1,<2.0.0)
  • jinja2 (>=3.1.5,<4.0.0)
  • litellm (>=1.68.0,<1.71.2)
  • loguru (>=0.7.3,<0.8.0)
  • lxml (>=5.4.0,<6.0.0)
  • pydantic (>=2.10.6,<3.0.0)
  • python-ulid (>=3.0.0,<4.0.0)
  • wtpsplit-lite (>=0.2.0,<0.3.0)