contextgem
ContextGem: Effortless LLM extraction from documents
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (3.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: rafidsarker
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://contextgem.dev/
- Size: 20.8 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Identifi Refactor
This example project demonstrates how to refactor a PDF extraction tool using patterns from ContextGem.
Installation
bash
pip install contextgem google-generativeai faiss-cpu pdfminer.six openai
Running Extraction
bash
python run_extraction.py ./sample.pdf config/extractionSchema.yaml output.json
The resulting output.json contains an array of extraction results with
confidence scores and page locations.
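As a rough illustration of consuming that file, the sketch below loads a results array and filters it by confidence. The README does not document the record layout, so the keys `field`, `value`, `confidence`, and `page` are assumptions made for this example, as are the helper names and the sample data.

```python
import json
from pathlib import Path


def load_results(path):
    """Load the extraction results array from a JSON file such as output.json."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of extraction results")
    return data


def filter_confident(results, threshold=0.8):
    """Keep results whose (assumed) 'confidence' key meets the threshold."""
    return [r for r in results if r.get("confidence", 0.0) >= threshold]


def group_by_page(results):
    """Group results by their (assumed) 'page' key."""
    pages = {}
    for r in results:
        pages.setdefault(r.get("page"), []).append(r)
    return pages


# Made-up records standing in for the contents of output.json.
sample = [
    {"field": "invoice_number", "value": "INV-001", "confidence": 0.95, "page": 1},
    {"field": "total", "value": "120.00", "confidence": 0.62, "page": 2},
    {"field": "date", "value": "2025-04-02", "confidence": 0.88, "page": 1},
]

confident = filter_confident(sample, threshold=0.8)
for page, items in group_by_page(confident).items():
    print(page, [item["field"] for item in items])
```

With a real run, `load_results("output.json")` would replace the in-memory `sample`, provided the actual keys match the assumed ones.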
Running Classification
bash
python run_classification.py output.json config/classificationOutputSchema.yaml classified.json --pdf ./sample.pdf
Both commands accept --model to specify the Gemini model ID.
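The scripts themselves are not shown here, so the following is only a sketch of how a command line shaped like the extraction command could be declared with `argparse`. The argument names `pdf`, `schema`, and `output`, the `build_parser` helper, and the `None` default for `--model` are assumptions, not the project's actual code.

```python
import argparse


def build_parser():
    """Build a parser mirroring: run_extraction.py PDF SCHEMA OUTPUT [--model ID]."""
    parser = argparse.ArgumentParser(
        description="Extract fields from a PDF (illustrative sketch)."
    )
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("schema", help="path to the extraction schema YAML")
    parser.add_argument("output", help="path for the JSON results")
    # None means "let the script pick its own default model" in this sketch.
    parser.add_argument("--model", default=None, help="Gemini model ID")
    return parser


# Parse an explicit argument list instead of sys.argv so the example
# behaves the same wherever it is run; the model ID is a placeholder.
args = build_parser().parse_args(
    ["./sample.pdf", "config/extractionSchema.yaml", "output.json",
     "--model", "example-model-id"]
)
print(args.pdf, args.schema, args.output, args.model)
```

Keeping `--model` optional lets both commands share one convention: omit the flag for the default model, or pass an ID to override it.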
Owner
- Login: rafidsarker
- Kind: user
- Repositories: 1
- Profile: https://github.com/rafidsarker
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Shcherbak
given-names: Sergii
email: sergii@shcherbak.ai
title: "ContextGem: Effortless LLM extraction from documents"
date-released: 2025-04-02
url: "https://github.com/shcherbak-ai/contextgem"
GitHub Events
Total
- Push event: 1
- Pull request event: 2
- Create event: 1
Last Year
- Push event: 1
- Pull request event: 2
- Create event: 1
Dependencies
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- snok/install-poetry v1 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- schneegans/dynamic-badges-action v1.7.0 composite
- snok/install-poetry v1 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- github/codeql-action/analyze v3 composite
- github/codeql-action/init v3 composite
- snok/install-poetry v1 composite
- actions/checkout v4 composite
- actions/github-script v7 composite
- actions/setup-python v5 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/deploy-pages v4 composite
- actions/setup-python v5 composite
- actions/upload-pages-artifact v3 composite
- snok/install-poetry v1 composite
- accessible-pygments ==0.0.5 development
- aiohappyeyeballs ==2.6.1 development
- aiohttp ==3.12.13 development
- aiolimiter ==1.2.1 development
- aiosignal ==1.3.2 development
- alabaster ==0.7.16 development
- annotated-types ==0.7.0 development
- anyio ==4.9.0 development
- argcomplete ==3.6.2 development
- async-timeout ==5.0.1 development
- attrs ==25.3.0 development
- babel ==2.17.0 development
- bandit ==1.8.3 development
- beautifulsoup4 ==4.13.4 development
- black ==25.1.0 development
- build ==1.2.2.post1 development
- certifi ==2025.6.15 development
- cfgv ==3.4.0 development
- charset-normalizer ==3.4.2 development
- click ==8.2.1 development
- colorama ==0.4.6 development
- coloredlogs ==15.0.1 development
- commitizen ==4.8.3 development
- coverage ==7.9.1 development
- decli ==0.6.3 development
- distlib ==0.3.9 development
- distro ==1.9.0 development
- docutils ==0.21.2 development
- exceptiongroup ==1.3.0 development
- fastjsonschema ==2.21.1 development
- filelock ==3.18.0 development
- flatbuffers ==25.2.10 development
- frozenlist ==1.7.0 development
- fsspec ==2025.5.1 development
- h11 ==0.16.0 development
- hf-xet ==1.1.3 development
- httpcore ==1.0.9 development
- httpx ==0.28.1 development
- httpx-aiohttp ==0.1.6 development
- huggingface-hub ==0.33.0 development
- humanfriendly ==10.0 development
- identify ==2.6.12 development
- idna ==3.10 development
- imagesize ==1.4.1 development
- importlib-metadata ==8.7.0 development
- iniconfig ==2.1.0 development
- isort ==6.0.1 development
- jinja2 ==3.1.6 development
- jiter ==0.10.0 development
- jsonschema ==4.24.0 development
- jsonschema-specifications ==2025.4.1 development
- jupyter-core ==5.8.1 development
- litellm ==1.71.1 development
- loguru ==0.7.3 development
- lxml ==5.4.0 development
- markdown-it-py ==3.0.0 development
- markupsafe ==3.0.2 development
- mdurl ==0.1.2 development
- mpmath ==1.3.0 development
- multidict ==6.4.4 development
- mypy-extensions ==1.1.0 development
- nbformat ==5.10.4 development
- nodeenv ==1.9.1 development
- numpy ==2.2.6 development
- onnxruntime ==1.22.0 development
- openai ==1.86.0 development
- packaging ==25.0 development
- pathspec ==0.12.1 development
- pbr ==6.1.1 development
- pip ==25.1.1 development
- pip-tools ==7.4.1 development
- platformdirs ==4.3.8 development
- pluggy ==1.6.0 development
- pre-commit ==4.2.0 development
- prompt-toolkit ==3.0.51 development
- propcache ==0.3.2 development
- protobuf ==6.31.1 development
- pydantic ==2.11.7 development
- pydantic-core ==2.33.2 development
- pydata-sphinx-theme ==0.15.4 development
- pygments ==2.19.1 development
- pyproject-hooks ==1.2.0 development
- pyreadline3 ==3.5.4 development
- pytest ==8.4.0 development
- pytest-cov ==6.2.1 development
- pytest-recording ==0.13.4 development
- python-dotenv ==1.1.0 development
- python-ulid ==3.0.0 development
- pywin32 ==310 development
- pyyaml ==6.0.2 development
- questionary ==2.1.0 development
- referencing ==0.36.2 development
- regex ==2024.11.6 development
- requests ==2.32.4 development
- rich ==14.0.0 development
- rpds-py ==0.25.1 development
- setuptools ==80.9.0 development
- sniffio ==1.3.1 development
- snowballstemmer ==3.0.1 development
- soupsieve ==2.7 development
- sphinx ==7.4.7 development
- sphinx-autodoc-typehints ==2.3.0 development
- sphinx-book-theme ==1.1.4 development
- sphinx-copybutton ==0.5.2 development
- sphinx-design ==0.6.1 development
- sphinx-sitemap ==2.6.0 development
- sphinxcontrib-applehelp ==2.0.0 development
- sphinxcontrib-devhelp ==2.0.0 development
- sphinxcontrib-htmlhelp ==2.1.0 development
- sphinxcontrib-jsmath ==1.0.1 development
- sphinxcontrib-qthelp ==2.0.0 development
- sphinxcontrib-serializinghtml ==2.0.0 development
- sphinxext-opengraph ==0.9.1 development
- stevedore ==5.4.1 development
- sympy ==1.14.0 development
- termcolor ==3.1.0 development
- tiktoken ==0.9.0 development
- tokenizers ==0.21.1 development
- tomli ==2.2.1 development
- tomlkit ==0.13.3 development
- tqdm ==4.67.1 development
- traitlets ==5.14.3 development
- typing-extensions ==4.14.0 development
- typing-inspection ==0.4.1 development
- urllib3 ==1.26.20 development
- urllib3 ==2.4.0 development
- vcrpy ==7.0.0 development
- virtualenv ==20.31.2 development
- wcwidth ==0.2.13 development
- wheel ==0.45.1 development
- win32-setctime ==1.2.0 development
- wrapt ==1.17.2 development
- wtpsplit-lite ==0.2.0 development
- yarl ==1.20.1 development
- zipp ==3.23.0 development
- aiohappyeyeballs ==2.6.1 development
- aiohttp ==3.12.13 development
- aiolimiter ==1.2.1 development
- aiosignal ==1.3.2 development
- annotated-types ==0.7.0 development
- anyio ==4.9.0 development
- async-timeout ==5.0.1 development
- attrs ==25.3.0 development
- certifi ==2025.6.15 development
- charset-normalizer ==3.4.2 development
- click ==8.2.1 development
- colorama ==0.4.6 development
- coloredlogs ==15.0.1 development
- distro ==1.9.0 development
- exceptiongroup ==1.3.0 development
- filelock ==3.18.0 development
- flatbuffers ==25.2.10 development
- frozenlist ==1.7.0 development
- fsspec ==2025.5.1 development
- h11 ==0.16.0 development
- hf-xet ==1.1.3 development
- httpcore ==1.0.9 development
- httpx ==0.28.1 development
- httpx-aiohttp ==0.1.6 development
- huggingface-hub ==0.33.0 development
- humanfriendly ==10.0 development
- idna ==3.10 development
- importlib-metadata ==8.7.0 development
- jinja2 ==3.1.6 development
- jiter ==0.10.0 development
- jsonschema ==4.24.0 development
- jsonschema-specifications ==2025.4.1 development
- litellm ==1.71.1 development
- loguru ==0.7.3 development
- lxml ==5.4.0 development
- markupsafe ==3.0.2 development
- mpmath ==1.3.0 development
- multidict ==6.4.4 development
- numpy ==2.2.6 development
- onnxruntime ==1.22.0 development
- openai ==1.86.0 development
- packaging ==25.0 development
- propcache ==0.3.2 development
- protobuf ==6.31.1 development
- pydantic ==2.11.7 development
- pydantic-core ==2.33.2 development
- pyreadline3 ==3.5.4 development
- python-dotenv ==1.1.0 development
- python-ulid ==3.0.0 development
- pyyaml ==6.0.2 development
- referencing ==0.36.2 development
- regex ==2024.11.6 development
- requests ==2.32.4 development
- rpds-py ==0.25.1 development
- sniffio ==1.3.1 development
- sympy ==1.14.0 development
- tiktoken ==0.9.0 development
- tokenizers ==0.21.1 development
- tqdm ==4.67.1 development
- typing-extensions ==4.14.0 development
- typing-inspection ==0.4.1 development
- urllib3 ==1.26.20 development
- urllib3 ==2.4.0 development
- win32-setctime ==1.2.0 development
- wtpsplit-lite ==0.2.0 development
- yarl ==1.20.1 development
- zipp ==3.23.0 development
- 135 dependencies
- bandit ^1.8.3 develop
- black ^25.1.0 develop
- commitizen ^4.5.1 develop
- coverage ^7.6.12 develop
- isort ^6.0.1 develop
- nbformat ^5.10.4 develop
- pip-tools ^7.4.1 develop
- pre-commit ^4.1.0 develop
- pytest ^8.3.4 develop
- pytest-cov ^6.0.0 develop
- pytest-recording ^0.13.4 develop
- python-dotenv ^1.0.1 develop
- sphinx >=7.0.0,<8.0.0 develop
- sphinx-autodoc-typehints <3.0.0 develop
- sphinx-book-theme ^1.1.4 develop
- sphinx-copybutton ^0.5.2 develop
- sphinx-design ^0.6.1 develop
- sphinx-sitemap ^2.6.0 develop
- sphinxext-opengraph ^0.9.1 develop
- aiolimiter (>=1.2.1,<2.0.0)
- jinja2 (>=3.1.5,<4.0.0)
- litellm (>=1.68.0,<1.71.2)
- loguru (>=0.7.3,<0.8.0)
- lxml (>=5.4.0,<6.0.0)
- pydantic (>=2.10.6,<3.0.0)
- python-ulid (>=3.0.0,<4.0.0)
- wtpsplit-lite (>=0.2.0,<0.3.0)