https://github.com/centre-for-humanities-computing/lex-db
A repository for interacting with the lex database for the Lex AI project.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: centre-for-humanities-computing
- Language: Python
- Default Branch: main
- Size: 153 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 4
- Releases: 2
Metadata Files
README.md
lex-db
A repository for interacting with the lex database for the Lex AI project. This project provides a wrapper around a SQLite database to enable querying encyclopedia articles via API requests, supporting both vector (semantic) search and full-text/keyword search.
Features
- SQLite database access with `sqlite-vec` for vector search
- Full-text search capabilities using FTS5
- FastAPI-based REST API with automatic OpenAPI documentation
- Hybrid querying via metadata filtering and text search
- Vector index management and semantic search
Requirements
- Python 3.12+
- uv (from Astral) for package management
- SQLite with the `sqlite-vec` extension available (required for vector search)
Installation
Clone the repository:
```bash
git clone https://github.com/centre-for-humanities-computing/lex-db.git
cd lex-db
```
Install dependencies using Make:
```bash
make install
```
Create a `.env` file (or modify the existing one) to set the database path:
```
# Database settings
DATABASE_URL=PATH/TO/DB_FILE.db
```
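The database setting is simply a path to a SQLite file. As a minimal sketch, assuming the variable is named `DATABASE_URL` and holds a plain file path, the configured database can be opened with the standard library as shown below. The helper `connect_from_env` is hypothetical; the project itself lists `python-dotenv` and `pydantic-settings` as dependencies and likely loads the `.env` file through them, whereas this sketch only reads a variable that is already present in the process environment.

```python
import os
import sqlite3


def connect_from_env(var_name: str = "DATABASE_URL") -> sqlite3.Connection:
    """Open the SQLite file named by an environment variable (hypothetical helper)."""
    path = os.environ.get(var_name)
    if not path:
        raise RuntimeError(f"{var_name} is not set; export it or load your .env file")
    # sqlite3.connect() silently creates a missing file, so check first to
    # avoid querying an empty database by mistake.
    if path != ":memory:" and not os.path.exists(path):
        raise FileNotFoundError(path)
    return sqlite3.connect(path)
```

Failing loudly on a missing path is a deliberate choice here: an empty, freshly created database would otherwise make every search look like it returned no hits.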
Usage
Scripts
- `create_fts_index.py`: Creates a full-text search index on a specified column in a table.
  ```bash
  uv run src/scripts/create_fts_index.py <table_name> <column_name>
  ```
- `create_vector_index.py`: Creates a new vector index for semantic search on a given column.
  ```bash
  uv run src/scripts/create_vector_index.py <table_name> <column_name>
  ```
- `update_vector_indexes.py`: Populates vector indexes with embeddings using OpenAI or another embedding provider.
  ```bash
  uv run src/scripts/update_vector_indexes.py
  ```
⚠️ Note: While `create_openai_embedding_batches.py` and `add_batch_embeddings_to_index.py` exist for batch processing, they are not recommended due to reliability issues with the OpenAI Batch API. Use `update_vector_indexes.py` instead.
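For readers new to FTS5, the following is a rough sketch of what indexing a single column involves, using only Python's built-in `sqlite3` module (this requires an SQLite build with FTS5 enabled, which most are). The index naming scheme (`<table>_<column>_fts`), the copy-based population, and the helper names `create_fts_index` and `search` are assumptions for illustration; `create_fts_index.py` may work differently.

```python
import re
import sqlite3

# Identifiers cannot be bound as SQL parameters, so restrict them to safe names.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_fts_index(conn: sqlite3.Connection, table: str, column: str) -> str:
    """Build an FTS5 index over table.column (hypothetical naming scheme)."""
    for name in (table, column):
        if not _IDENT.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    fts_table = f"{table}_{column}_fts"
    conn.execute(f"DROP TABLE IF EXISTS {fts_table}")
    conn.execute(f"CREATE VIRTUAL TABLE {fts_table} USING fts5({column})")
    # Copy the text and keep the source rowid so hits map back to the original rows.
    conn.execute(
        f"INSERT INTO {fts_table}(rowid, {column}) "
        f"SELECT rowid, {column} FROM {table}"
    )
    conn.commit()
    return fts_table


def search(conn: sqlite3.Connection, fts_table: str, query: str, limit: int = 10):
    """Return (rowid, bm25 score) pairs; FTS5 gives better matches lower scores."""
    return conn.execute(
        f"SELECT rowid, bm25({fts_table}) AS score FROM {fts_table} "
        f"WHERE {fts_table} MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()
```

Copying the text into the FTS table keeps the sketch simple at the cost of storing it twice; FTS5 also supports external-content tables that avoid the duplication but need triggers to stay in sync.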
Running the API Server
Start the FastAPI server:
```bash
make run
```
The server binds to `0.0.0.0:8000` (all network interfaces); from the same machine it is reachable at http://localhost:8000.
API Endpoints
List Database Tables
- `GET /api/tables`: Returns a list of all tables in the database.
- Example response:
```json
{ "tables": ["articles", "vector_index", "metadata"] }
```
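A table listing like this is the kind of information SQLite exposes through its `sqlite_master` catalog. A minimal sketch of producing the same response shape with the standard library follows; the helper name `list_tables` is hypothetical, and whether the real endpoint hides internal or FTS shadow tables is not documented, so this version only skips SQLite's own `sqlite_*` tables.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return {"tables": [...]} in the same shape as GET /api/tables."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        # '!' escapes the underscore, which LIKE would otherwise treat as a wildcard.
        "WHERE type = 'table' AND name NOT LIKE 'sqlite!_%' ESCAPE '!' "
        "ORDER BY name"
    ).fetchall()
    return {"tables": [name for (name,) in rows]}
```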
Filter Articles by Metadata
- `GET /api/articles`: Retrieve articles filtered by ID or full-text search query.
- Supports optional query parameters:
  - `query`: Text-based search in article content.
  - `ids`: Filter by article IDs (supports a comma-separated string, a JSON array, or a repeated `ids` parameter).
  - `limit`: Maximum number of results (1–100, default: 50).
Examples:
- By IDs only (comma-separated):
GET /api/articles?ids=1,2,5
- By IDs (repeated parameters):
GET /api/articles?ids=1&ids=2&ids=5
- By IDs and text search:
GET /api/articles?query=Rundetårn&ids=1,2&limit=10
- Full-text search only:
GET /api/articles?query=Denmark
Response:
Returns structured search results including matched articles with metadata and scores.
Vector Search
- `POST /api/vector-search/indexes/{index_name}/query`: Perform semantic search on a specific vector index.
- Path parameter:
  - `index_name`: Name of the vector index to search.
- Request body (JSON):
```json
{ "query_text": "What is the capital of Denmark?", "top_k": 5 }
```
- `query_text`: The search query (required).
- `top_k`: Number of top results to return (optional, default: 5).
Example Request:
```http
POST /api/vector-search/indexes/article_embeddings/query
Content-Type: application/json

{ "query_text": "Scandinavian history", "top_k": 3 }
```
Response:
Returns a list of semantically similar documents with metadata and similarity scores.
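The same request can be sent from Python without extra dependencies. This is a sketch, assuming a local server on port 8000; `build_vector_search_request` and `vector_search` are hypothetical helper names, and the results are returned as decoded JSON because their exact fields are not specified beyond "documents with metadata and similarity scores".

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_vector_search_request(
    base_url: str, index_name: str, query_text: str, top_k: int = 5
) -> Request:
    """Prepare POST /api/vector-search/indexes/{index_name}/query."""
    # Percent-encode the index name so it is safe as a single path segment.
    url = (
        f"{base_url.rstrip('/')}/api/vector-search/indexes/"
        f"{quote(index_name, safe='')}/query"
    )
    body = json.dumps({"query_text": query_text, "top_k": top_k}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vector_search(base_url: str, index_name: str, query_text: str, top_k: int = 5) -> Any:
    """Send the request and decode the JSON results (requires a running server)."""
    request = build_vector_search_request(base_url, index_name, query_text, top_k)
    with urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect or test without a live server.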
List All Vector Indexes
- `GET /api/vector-search/indexes`: Retrieve metadata for all available vector indexes.
- Example response:
```json
[
  {
    "index_name": "article_embeddings",
    "embedding_model": "text-embedding-3-small",
    "dimension": 1536,
    "created_at": "2025-04-05T12:00:00Z"
  }
]
```
Get Metadata for a Specific Vector Index
- `GET /api/vector-search/indexes/{index_name}`: Retrieve metadata for a specific vector index.
- Path parameter:
  - `index_name`: The name of the vector index.
- Returns details such as the embedding model used, the dimension, and the creation timestamp.
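On the client side, it can help to turn this metadata into a typed object, for instance to check that a query embedding has the index's `dimension` before searching. The sketch below is a hypothetical client-side model whose fields are copied from the example response above; whether every index returns exactly these fields is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class VectorIndexInfo:
    """Hypothetical client-side view of one vector index metadata entry."""

    index_name: str
    embedding_model: str
    dimension: int
    created_at: datetime

    @classmethod
    def from_json(cls, item: dict[str, Any]) -> "VectorIndexInfo":
        return cls(
            index_name=item["index_name"],
            embedding_model=item["embedding_model"],
            dimension=int(item["dimension"]),
            # Map a trailing "Z" to "+00:00" so older Pythons also parse it as UTC.
            created_at=datetime.fromisoformat(item["created_at"].replace("Z", "+00:00")),
        )
```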
API Documentation
Once the server is running, access auto-generated API documentation at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
The OpenAPI 3.1 specification is available at `/openapi/openapi.yaml`.
You can generate clients in various languages using OpenAPI Generator:
```bash
openapi-generator-cli generate -i openapi/openapi.yaml -g <language> -o ./client
```
Replace `<language>` with your target generator (e.g., `python`, `typescript-fetch`, `java`).
Development
This project uses a Makefile to streamline development tasks:
| Command | Description |
|-------------------------|-----------|
| make install | Install dependencies |
| make run | Start the API server |
| make lint | Format code and fix lint issues (using Ruff) |
| make lint-check | Check formatting and linting without applying fixes |
| make static-type-check | Run static type checking with Mypy |
| make test | Run tests using Pytest |
| make pr | Run all pre-PR checks (linting, type checking, testing) |
| make help | Show all available commands |
License
N/A
Owner
- Name: Center for Humanities Computing Aarhus
- Login: centre-for-humanities-computing
- Kind: organization
- Email: chcaa@cas.au.dk
- Location: Aarhus, Denmark
- Website: https://chc.au.dk/
- Repositories: 130
- Profile: https://github.com/centre-for-humanities-computing
GitHub Events
Total
- Create event: 1
- Release event: 1
- Issues event: 6
- Push event: 3
- Public event: 1
- Pull request event: 2
Last Year
- Create event: 1
- Release event: 1
- Issues event: 6
- Push event: 3
- Public event: 1
- Pull request event: 2
Dependencies
- fastapi >=0.110.0
- mypy >=1.15.0
- openai >=1.79.0
- pydantic >=2.6.0
- pydantic-settings >=2.1.0
- pytest >=8.3.5
- pytest-cov >=6.1.1
- python-dotenv >=1.0.0
- ruff >=0.11.9
- sentence-transformers >=4.1.0
- setuptools >=80.7.1
- sqlite-utils >=3.38
- sqlite-vec >=0.1.6
- tiktoken >=0.9.0
- types-pyyaml >=6.0.12.20250516
- types-setuptools >=80.8.0.20250521
- uvicorn >=0.27.0
- annotated-types 0.7.0
- anyio 4.9.0
- certifi 2025.4.26
- charset-normalizer 3.4.2
- click 8.2.0
- click-default-group 1.2.4
- colorama 0.4.6
- coverage 7.8.0
- distro 1.9.0
- fastapi 0.115.12
- filelock 3.18.0
- fsspec 2025.3.2
- h11 0.16.0
- httpcore 1.0.9
- httpx 0.28.1
- huggingface-hub 0.31.4
- idna 3.10
- iniconfig 2.1.0
- jinja2 3.1.6
- jiter 0.10.0
- joblib 1.5.0
- lex-db 0.1.0
- markupsafe 3.0.2
- mpmath 1.3.0
- mypy 1.15.0
- mypy-extensions 1.1.0
- networkx 3.4.2
- numpy 2.2.6
- nvidia-cublas-cu12 12.6.4.1
- nvidia-cuda-cupti-cu12 12.6.80
- nvidia-cuda-nvrtc-cu12 12.6.77
- nvidia-cuda-runtime-cu12 12.6.77
- nvidia-cudnn-cu12 9.5.1.17
- nvidia-cufft-cu12 11.3.0.4
- nvidia-cufile-cu12 1.11.1.6
- nvidia-curand-cu12 10.3.7.77
- nvidia-cusolver-cu12 11.7.1.2
- nvidia-cusparse-cu12 12.5.4.2
- nvidia-cusparselt-cu12 0.6.3
- nvidia-nccl-cu12 2.26.2
- nvidia-nvjitlink-cu12 12.6.85
- nvidia-nvtx-cu12 12.6.77
- openai 1.79.0
- packaging 25.0
- pillow 11.2.1
- pluggy 1.5.0
- pydantic 2.11.4
- pydantic-core 2.33.2
- pydantic-settings 2.9.1
- pytest 8.3.5
- pytest-cov 6.1.1
- python-dateutil 2.9.0.post0
- python-dotenv 1.1.0
- pyyaml 6.0.2
- regex 2024.11.6
- requests 2.32.3
- ruff 0.11.9
- safetensors 0.5.3
- scikit-learn 1.6.1
- scipy 1.15.3
- sentence-transformers 4.1.0
- setuptools 80.7.1
- six 1.17.0
- sniffio 1.3.1
- sqlite-fts4 1.0.3
- sqlite-utils 3.38
- sqlite-vec 0.1.6
- starlette 0.46.2
- sympy 1.14.0
- tabulate 0.9.0
- threadpoolctl 3.6.0
- tiktoken 0.9.0
- tokenizers 0.21.1
- torch 2.7.0
- tqdm 4.67.1
- transformers 4.51.3
- triton 3.3.0
- types-pyyaml 6.0.12.20250516
- types-setuptools 80.8.0.20250521
- typing-extensions 4.13.2
- typing-inspection 0.4.0
- urllib3 2.4.0
- uvicorn 0.34.2