canada-labour-research-assistant
The Canada Labour Research Assistant (CLaRA) is a privacy-first LLM-powered RAG AI assistant proposing Easily Verifiable Direct Quotations (EVDQ) to mitigate hallucinations in answering questions about Canadian labour laws, standards, and regulations. It works entirely offline and locally, guaranteeing the confidentiality of your conversations.
https://github.com/pierreolivierbonin/canada-labour-research-assistant
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 7
- Watchers: 2
- Forks: 3
- Open Issues: 20
- Releases: 1
Topics
Metadata Files
README.md
Canada Labour Research Assistant (CLaRA)
An LLM-powered assistant that directly quotes retrieved passages
Key Features • Quick start • Use Case & Portability • Telemetry & API Calls • Contributions • Acknowledgements
The Canada Labour Research Assistant (CLaRA) is a privacy-first LLM-powered research assistant that directly quotes sources to mitigate hallucinations and construct context-grounded answers to questions about Canadian labour laws, standards, and regulations. It can be run locally and without any Internet connection, thus guaranteeing the confidentiality of your conversations.
CLaRA comes in two builds:
- one running on an Ollama serving backend, suitable for experimentation or low user numbers.
- one running on a vLLM serving backend, suitable for use cases requiring more scalability.
Key Features
✅ Retrieval-Augmented Generation (RAG) to infuse context in each query (see the sketch after this list).
✅ Chunking strategy to improve question answering.
✅ Metadata leveraging to improve question answering and make the information easily verifiable.
✅ Reranking to prioritize relevant sources when detecting a query mentioning legal provisions.
✅ Dynamic context window allocation to prevent source document chunks from getting truncated, and manage memory efficiently.
✅ Performance optimizations to reduce latency (database caching, tokenizer caching, response streaming).
✅ Runs locally on CPU and/or consumer-grade GPUs, suitable for small and medium enterprises/organizations.
✅ Production-Ready for multiple scenarios with two builds offered out-of-the-box (Ollama or vLLM).
✅ Runs offline with no Internet connection required (see instructions further below).
✅ Guaranteed confidentiality as a result of local-and-offline runtime mode.
✅ Minimalist set of base dependencies for more portability and resilience (see pyproject.toml).
✅ Bring-Your-Own-Model with Ollama and vLLM (see each backend's list of supported pre-trained models).
✅ Bring-Your-Own-Inference-Provider and easily switch between two inference modes (local vs. remote) in the UI.
✅ RAG-enabled conversation history that includes previous document chunks for deeper context and research.
✅ UI Databases Dropdown to easily swap between databases on-the-fly.
✅ On-the-Fly LoRA Adapters for your Fine-Tuned Models. With vLLM, simply pass the path to your fine-tuned LoRA adapter.
✅ 3 runtime modes: normal, evaluation (to assess the LLM answers), or profiling (to track performance).
✅ Evaluation mode (still in early development) lets you measure the quality of the generated responses.
✅ Profiling mode provides analytics to measure the impact of each function call and of each component added to or removed from the architecture.
✅ Streamlined Installation process in one easy step (*Ollama build only; we've streamlined the installation of the vLLM build nonetheless, see quick start - build #2 below).
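For a concrete picture of how these pieces fit together, here is a minimal sketch of a retrieve-then-quote loop. It is illustrative only and is not CLaRA's actual code: the embedding model, database path, collection name, prompt, and LLM are all placeholders, and the real system adds chunk-aware context-window allocation, metadata handling, and reranking before the prompt is built.

```python
# Minimal RAG sketch (illustration only; model names, paths, and the prompt
# are placeholders, not CLaRA's actual implementation).
import chromadb
import ollama
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")      # placeholder embedding model
client = chromadb.PersistentClient(path="chroma_db")    # placeholder database path
collection = client.get_or_create_collection("labour")  # placeholder collection name

def answer(question: str, k: int = 5) -> str:
    # 1. Embed the query and retrieve the k most similar document chunks.
    query_embedding = embedder.encode(question).tolist()
    hits = collection.query(query_embeddings=[query_embedding], n_results=k)
    chunks = hits["documents"][0]

    # 2. Build a context-grounded prompt that asks for direct quotations.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the sources below and quote them verbatim.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate the answer with a locally served model (Ollama here).
    response = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]
```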
Quick Start
How to set up this system for 100% local-and-off-the-Internet inference
Because models require tokenizers, and because the open-source models we use both for document embedding and for LLM inference are hosted on [Hugging Face](https://huggingface.co/), libraries like `sentence-transformers` pull the models and tokenizers on the first call and then save a copy in a local cache to speed up future runs (see e.g. the definition of the `SentenceTransformer` class). Once you have downloaded the models, you can keep using these libraries locally, without any Internet connection. The same can be done with the LLM's tokenizer to avoid unnecessary external calls (this is what we've done with the [.tokenizers](.tokenizers) folder). For LLM inference, you can do the same thing: download the LLM, store its main files, then run the system completely offline.
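As a concrete illustration of this pattern, the snippet below downloads an embedding model once, saves a local copy, and then reloads it fully offline; the model name and folder paths are placeholders rather than the project's actual configuration.

```python
# Sketch of the offline pattern described above; the model name and paths are
# placeholders, not CLaRA's actual configuration.
import os

# Must be set before importing the Hugging Face libraries. Use "0" (or leave it
# unset) once to allow the initial download, then "1" to stay fully offline.
os.environ["HF_HUB_OFFLINE"] = "1"

from sentence_transformers import SentenceTransformer

# One-time, online step: download and save a local copy of the embedding model.
#   SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").save("models/embedder")

# Every later run: load strictly from the local copy, no Internet connection required.
model = SentenceTransformer("models/embedder")
embeddings = model.encode(["What does the Canada Labour Code say about overtime pay?"])
```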
Build #1 - Ollama local server & (optional) remote server
#### Preliminary Steps

Ensure you have Ollama installed and a bash terminal available. Then, clone this repo and cd into the new directory:

```sh
git clone https://github.com/pierreolivierbonin/Canada-Labour-Research-Assistant.git
cd canada-labour-research-assistant
```

#### All-in-one setup

Run `./full_install_pipeline_ollama.sh`. If you prefer to do it one step at a time:

#### Step 1

Install the virtual environment by running the following command in your bash terminal:

```sh
./setup/ollama_build/install_venv.sh
```

#### Step 2

Make sure your virtual environment is activated. Then, create the database by running the following command in a terminal:

```sh
./setup/create_or_update_database.sh
```

#### Step 3

You are now ready to launch the application with:

```sh
./run_app_ollama.sh
```

You can now enter the mode of your choice in the console. The default mode is 'local', *i.e.* **local mode**, which uses your machine to run the application, thereby protecting your privacy and data. Should you want to use **remote mode** and take advantage of third-party compute for larger models and workloads, you can do so and switch between the two modes on-the-fly through the UI's toggle button. Please note that **the privacy of your conversations will no longer be guaranteed** if you do so. To enable **remote mode**, simply add the necessary credentials in `.streamlit/secrets.toml`, following the format below:

> authorization = "..."
> api_url = "..."
Build #2 - vLLM local server & (optional) remote server
#### Preliminary Step

For Windows users: [install WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) to have a Linux kernel. Then, install the drivers as appropriate to run [GPU paravirtualization on WSL-Ubuntu](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0). If you intend to use LoRA adapters, install `jq` by running `sudo apt-get install jq`.

#### All-in-one setup

Run `./full_install_pipeline_vllm.sh`. If you prefer to do it one step at a time:

#### Step #1

Install the virtual environment by running:

```sh
source ./setup/vllm_build/install_venv.sh
```

#### Step #2

Activate your virtual environment with `source .venv/bin/activate`, then run:

```sh
source ./setup/create_or_update_database.sh
```

#### Step #3

Launch the application with:

```sh
source ./run_app_vllm.sh
```

By default, **local mode** runs and uses your machine to run the application, thereby protecting your privacy and data.

**Please note:** while running on WSL, vLLM sometimes has trouble releasing memory once you shut down or close your terminal. To make sure your memory is released, run `wsl --shutdown` in another terminal.

Should you want to use **remote mode** and take advantage of third-party compute for larger models and workloads, you can do so and switch between the two modes on-the-fly through the UI's toggle button. Please note that **the privacy of your conversations will no longer be guaranteed** if you do so. To enable **remote mode**, simply add the necessary credentials in `.streamlit/secrets.toml`, following the format below:

> authorization = "..."
> api_url = "..."
Database Creation Explained & How to Create Your Own Knowledge Base
The application can be customized for your own use case by creating new databases. To add a new database:

- Create a JSON configuration file in the `collections/` folder
- Update `VectorDBDataFiles.included_databases` in `db_config.py` to include your database
- Run the database creation script; your database will be automatically created and included in the app

See below for detailed instructions.

Refer to `collections/example.json` for a template config file, or `collections/examplewithcomments.txt` for a detailed commented example with path format explanations.
Configuration

- Create or edit database configuration files in the `collections/` folder. Each database is defined by a JSON file in this directory.
- Configure database metadata in your JSON file (a hypothetical example appears after this list):
  - `name`: The database identifier
  - `is_default`: Set to `true` to make this database the default selection in the UI
  - `save_html`: Set to `true` to save HTML content locally
  - `languages`: List of language codes (e.g., `["en", "fr"]`) that your database supports
  - `ressource_name`: Dictionary mapping language codes to display names for the UI (e.g., `{"en": "Labour", "fr": "Travail"}`)
- Add your data sources using these supported formats, organized by language:
  - Web pages: Add URLs under the `"page"` key. Each web page entry is an array with format `["NAME", "URL", depth]`, where:
    - `depth = 0`: Extract only the page itself
    - `depth = 1`: Extract the page and all links within it
    - `depth = 2`: Extract the page, all links within it, and links within those links (2 levels deep)
    - The maximum depth limit is 2
  - Legal pages: Add law URLs under the `"law"` key as arrays with format `["name", "URL"]`
  - IPG pages: Add IPG URLs under the `"ipg"` key, organized by language
  - PDF files: Add URLs or local file/folder paths under the `"pdf"` key, organized by language. Local paths can be anywhere on your computer, using OS-appropriate formats.
  - Page blacklist: Add URLs to exclude under the `"page_blacklist"` key, organized by language
  - Note: Data sources must be organized by language codes (e.g., `"en"`, `"fr"`). You can support one or more languages per database.
- Add your database to the application by updating `VectorDBDataFiles.included_databases` in `db_config.py`. Add your database name (it must match the `"name"` field in your JSON file) to the list. For example:

```python
included_databases = ["labour", "equity", "transport", "your_new_database"]
```
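For illustration only, a hypothetical `collections/your_new_database.json` might look roughly like the snippet below, written here as a small Python script that emits the JSON file. The field names follow the list above, but the exact nesting of language codes and source keys may differ; treat `collections/example.json` as the authoritative template. All names and URLs are placeholders.

```python
# Hypothetical config example: field names follow the documentation above, but
# collections/example.json remains the authoritative schema. Names and URLs
# below are placeholders.
import json

config = {
    "name": "your_new_database",
    "is_default": False,
    "save_html": True,
    "languages": ["en", "fr"],
    "ressource_name": {"en": "Your topic", "fr": "Votre sujet"},
    "page": {
        "en": [["Example page", "https://example.ca/en/some-topic", 1]],
        "fr": [["Page d'exemple", "https://example.ca/fr/un-sujet", 1]],
    },
    "pdf": {
        "en": ["https://example.ca/en/guide.pdf", "/home/user/Documents/local_file.pdf"],
        "fr": [],
    },
    "page_blacklist": {
        "en": ["https://example.ca/en/some-topic/irrelevant-page"],
        "fr": [],
    },
}

with open("collections/your_new_database.json", "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)
```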
Supported Data Sources
- External PDFs: Direct URLs to PDF files
- Local PDFs: Absolute file paths to local PDFs anywhere on your computer (supports folder paths to include all PDFs in a directory). Use OS-appropriate path formats (e.g., `C:/Documents/file.pdf` on Windows, `/home/user/Documents/file.pdf` on Linux/Mac)
- Web pages: URLs to web content (supports blacklisting specific pages)

Important: PDF files can be located anywhere on your computer (including the application folder, but avoid the `static/` folder as it's managed automatically). Just specify the paths; the database script will automatically import and process them.
Building Your Database
Once you have created your JSON configuration file and updated `VectorDBDataFiles.included_databases`, run the database creation script:

```bash
./setup/create_or_update_database.sh
```
This script will automatically:
1. Process all databases listed in `VectorDBDataFiles.included_databases`
2. Extract content from all configured sources in your JSON files
3. Create vector databases for RAG (Retrieval-Augmented Generation), as sketched conceptually below
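Conceptually, step 3 boils down to chunking each extracted document, embedding the chunks, and storing them alongside their metadata in a Chroma collection. The sketch below illustrates that idea only; it is not the project's actual script, and the chunking strategy, embedding model, paths, and names are placeholders.

```python
# Conceptual sketch of the vector-database step (not the actual script; the
# chunking strategy, embedding model, and names are placeholders).
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("models/embedder")        # local embedding model
client = chromadb.PersistentClient(path="chroma_db")     # on-disk vector database
collection = client.get_or_create_collection("your_new_database")

def add_document(doc_id: str, text: str, source_url: str, chunk_size: int = 1000) -> None:
    # Naive fixed-size chunking stands in for the project's real chunking strategy.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    collection.add(
        ids=[f"{doc_id}-{n}" for n in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        metadatas=[{"source": source_url, "chunk": n} for n in range(len(chunks))],
    )
```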
File Management
- PDF files are automatically downloaded to the `static/` folder for offline access
- Static files are accessible via `app/static/...` URLs within the application
- Note: Removing a database JSON file doesn't delete its files from the `static/` folder
Use Case and Portability
The solution is designed so you can easily verify the information used by the LLM to construct its responses. To do so, 'direct quotations' mode will format and highlight relevant passages taken from the sources. You can click on these passages to directly go to the source and validate the information.
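CLaRA's keywords and references point to longest-common-subsequence (LCS) string matching for tying quotations back to their sources. Purely for illustration, the sketch below does something similar with Python's standard `difflib` instead of the project's own matcher; the sample text and threshold are placeholders.

```python
# Illustration only: locate a generated quotation inside a retrieved source
# chunk so it can be highlighted. CLaRA's own (LCS-based) matcher may differ.
from difflib import SequenceMatcher

def locate_quote(quote: str, source_chunk: str, threshold: float = 0.9):
    """Return the (start, end) character span of the best match for `quote`
    in `source_chunk`, or None if the overlap is below `threshold`."""
    matcher = SequenceMatcher(None, quote, source_chunk, autojunk=False)
    match = matcher.find_longest_match(0, len(quote), 0, len(source_chunk))
    if match.size / max(len(quote), 1) < threshold:
        return None
    return match.b, match.b + match.size

span = locate_quote(
    "every employee is entitled to and shall be granted a vacation",
    "Subject to this Division, every employee is entitled to and shall be "
    "granted a vacation of at least two weeks ...",
)
print(span)  # (start, end) range to highlight in the source chunk, or None
```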
Using the current webcrawling configuration, you can create the following distinct databases and swap between them in the UI. Each database includes these documents:

Labour Database:
* Canada Labour Code (CLC)
* Canada Labour Standards and Regulations (CLSR)
* Interpretations, Policies, and Guidelines (IPGs)
* Canada webpages on topics covering: labour standards, occupational health and safety, etc.

Equity Database:
* Workplace equity, etc.

Transport Database:
* Acts and regulations related to transport.
Telemetry and API Calls
In an effort to ensure the highest standards of privacy protection, we have tested and confirmed that the system works offline, without any required Internet connection, thus guaranteeing your conversations remain private.
In addition, we have researched and taken the following measures (summarized in code form after this list):
* ChromaDB allows you to disable telemetry, and we've done just that by following the instructions here.
* Ollama does not have any telemetry. See this explainer.
* Streamlit allows you to disable telemetry, and we've done just that by setting gatherUsageStats to 'false'. See this explainer.
* vLLM allows opting out of telemetry via the DO_NOT_TRACK environment variable, and we've done just that. See the docs.
* Hugging Face allows disabling calls to its website via the HF_HUB_OFFLINE environment variable, and we've done just that. See this PR.
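For reference, most of these opt-outs boil down to environment variables that must be set before the corresponding libraries are imported; a minimal sketch is shown below. Streamlit's `gatherUsageStats` flag is typically set in `.streamlit/config.toml` instead.

```python
# Sketch only: telemetry opt-outs expressed as environment variables. These
# must be set before the corresponding libraries are imported.
import os

os.environ["ANONYMIZED_TELEMETRY"] = "False"  # ChromaDB: disable anonymized telemetry
os.environ["DO_NOT_TRACK"] = "1"              # vLLM: opt out of usage stats collection
os.environ["HF_HUB_OFFLINE"] = "1"            # Hugging Face Hub: no outbound calls
```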
Roadmap
See the open issues for a full list of proposed features (and known issues).
Contributions
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Acknowledgements
Special thanks to Hadi Hojjati @hhojjati98 for the stimulating discussions, brainstorming sessions, and general advice. Both of us greatly appreciated them.
We would like to thank everyone who participates in conducting open research as well as sharing knowledge and code.
In particular, we are grateful to the creators and contributors who made it possible to build CLaRA:
Webcrawling & Preprocessing
- Webcrawling and HTML processing: Beautiful Soup
- PDF files content extraction: PyMuPDF
Backend
- GPU Paravirtualization: NVIDIA
- Llama3.2-Instruct model: Meta
- Vector database: Chroma
- LLM inference serving: Ollama & vLLM
- Embedding models: SentenceTransformers and Hugging Face
Frontend
- User Interface: Streamlit
References
We are grateful to, and would like to acknowledge, the AI research community. In particular, we drew ideas and inspiration from the following papers, articles, and conference presentations:
Bengio, Yoshua. "Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?". Presentation given at the World Summit AI Canada on April 16 (2025).
Bengio, Yoshua, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann et al. "Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?." arXiv preprint arXiv:2502.15657 (2025).
He, Jia, Mukund Rungta, David Koleczek, Arshdeep Sekhon, Franklin X. Wang, and Sadid Hasan. "Does Prompt Formatting Have Any Impact on LLM Performance?." Online. https://arxiv.org/abs/2411.10541 arXiv:2411.10541 (2024).
Laban, Philippe, Tobias Schnabel, Paul N. Bennett, and Marti A. Hearst. "SummaC: Re-visiting NLI-based models for inconsistency detection in summarization." Transactions of the Association for Computational Linguistics 10 (2022): 163-177. Arxiv: https://arxiv.org/abs/2111.09525. Repository: https://github.com/tingofurro/summac
Lin, Chin-Yew, and Franz Josef Och. "Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics." In Proceedings of the 42nd annual meeting of the association for computational linguistics (ACL-04), pp. 605-612. https://aclanthology.org/P04-1077.pdf. 2004.
Wikipedia. "ROUGE (metric)." Online. https://en.wikipedia.org/wiki/ROUGE_(metric). 2023.
Wikipedia. "Longest common subsequence". Online. https://en.wikipedia.org/wiki/Longestcommonsubsequence. 2025.
Yeung, Matt. "Deterministic Quoting: Making LLMs Safer for Healthcare." Online. https://mattyyeung.github.io/deterministic-quoting (2024).
Citation
If you draw inspiration or use this solution, please cite the following work:
```bibtex
@misc{clara-2025,
  author       = {Bonin, Pierre-Olivier and Allard, Marc-André},
  title        = {Canada Labour Research Assistant (CLaRA)},
  howpublished = {\url{https://github.com/pierreolivierbonin/Canada-Labour-Research-Assistant}},
  year         = {2025},
}
```
Contact
Owner
- Name: Pierre-Olivier Bonin, Ph.D.
- Login: pierreolivierbonin
- Kind: user
- Location: Montreal, Qc (Canada)
- Website: https://www.linkedin.com/in/pierreolivierbonin/
- Repositories: 2
- Profile: https://github.com/pierreolivierbonin
Data science, machine learning, and automation
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Canada Labour Research Agent (CLaRA)
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Pierre-Olivier
    family-names: Bonin
  - given-names: Marc-André
    family-names: Allard
repository-code: >-
  https://github.com/pierreolivierbonin/Canada-Labour-Research-Assistant
abstract: >-
  The Canada Labour Research Assistant (CLaRA) is an
  LLM-powered research assistant that can directly quote
  sources using retrieval-augmented generation to answer
  questions about a wide range of topics related to Canadian
  labour laws, standards, and regulations. All made with
  open-source software.
keywords:
- direct quotations
- question-answering
- labour
- lcs-algorithm
- streamlit
- sentence-transformers
- chromadb
- llm
- llm inference
- local llm
- llm-serving
- retrieval-augmented generation
- ollama
- string-matching
- rag chatbot
- source referencing
- metadata
license: MIT
version: '1.0'
date-released: '2025-05-08'
GitHub Events
Total
- Create event: 7
- Issues event: 5
- Watch event: 2
- Delete event: 1
- Issue comment event: 35
- Member event: 1
- Push event: 78
- Public event: 1
- Pull request review event: 24
- Pull request review comment event: 9
- Pull request event: 14
- Fork event: 2
Last Year
- Create event: 7
- Issues event: 5
- Watch event: 2
- Delete event: 1
- Issue comment event: 35
- Member event: 1
- Push event: 78
- Public event: 1
- Pull request review event: 24
- Pull request review comment event: 9
- Pull request event: 14
- Fork event: 2
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Pierre-Olivier Bonin, Ph.D. | 3****n | 8 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 13
- Total pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 23 minutes
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 0.46
- Average comments per pull request: 4.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 13
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 23 minutes
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 0.46
- Average comments per pull request: 4.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- pierreolivierbonin (13)
- marca116 (4)
Pull Request Authors
- marca116 (8)
- pierreolivierbonin (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- beautifulsoup4 ==4.13.4
- chromadb ==1.0.12
- flashinfer-python ==0.2.5
- llmcompressor ==0.5.1
- nltk ==3.9.1
- ollama ==0.5.1
- protobuf ==3.20.3
- pymupdf4llm ==0.0.24
- sentence-transformers ==4.1.0
- streamlit ==1.45.1
- vllm ==0.9.0.1
- beautifulsoup4 ==4.13.4
- chromadb ==1.0.12
- nltk ==3.9.1
- ollama ==0.4.2
- protobuf ==3.20.3
- pymupdf4llm ==0.0.24
- sentence-transformers ==3.0.1
- sentencepiece ==0.2.0
- streamlit ==1.45.1
- summac ==0.0.4
- torch ==2.6.0+cu124
- transformers >=4.8.1
- Deprecated ==1.2.18
- GitPython ==3.1.44
- Jinja2 ==3.1.6
- MarkupSafe ==3.0.2
- PyMuPDF ==1.26.0
- PyPika ==0.48.9
- PyYAML ==6.0.2
- Pygments ==2.19.1
- accelerate ==1.7.0
- aiohappyeyeballs ==2.6.1
- aiohttp ==3.12.7
- aiosignal ==1.3.2
- airportsdata ==20250523
- altair ==5.5.0
- annotated-types ==0.7.0
- anyio ==4.9.0
- asgiref ==3.8.1
- astor ==0.8.1
- attrs ==25.3.0
- backoff ==2.2.1
- bcrypt ==4.3.0
- beautifulsoup4 ==4.13.4
- blake3 ==1.0.5
- blinker ==1.9.0
- build ==1.2.2.post1
- cachetools ==5.5.2
- certifi ==2025.4.26
- charset-normalizer ==3.4.2
- chromadb ==1.0.12
- click ==8.2.1
- cloudpickle ==3.1.1
- coloredlogs ==15.0.1
- compressed-tensors ==0.9.4
- cupy-cuda12x ==13.4.1
- datasets ==3.6.0
- depyf ==0.18.0
- dill ==0.3.8
- diskcache ==5.6.3
- distro ==1.9.0
- dnspython ==2.7.0
- durationpy ==0.10
- einops ==0.8.1
- email_validator ==2.2.0
- fastapi ==0.115.9
- fastapi-cli ==0.0.7
- fastrlock ==0.8.3
- filelock ==3.18.0
- flashinfer-python ==0.2.5
- flatbuffers ==25.2.10
- frozenlist ==1.6.0
- fsspec ==2025.3.0
- gguf ==0.17.0
- gitdb ==4.0.12
- google-auth ==2.40.2
- googleapis-common-protos ==1.70.0
- grpcio ==1.72.1
- h11 ==0.16.0
- hf-xet ==1.1.2
- httpcore ==1.0.9
- httptools ==0.6.4
- httpx ==0.28.1
- huggingface-hub ==0.32.3
- humanfriendly ==10.0
- idna ==3.10
- importlib_metadata ==8.4.0
- importlib_resources ==6.5.2
- interegular ==0.3.3
- jiter ==0.10.0
- joblib ==1.5.1
- jsonschema ==4.24.0
- jsonschema-specifications ==2025.4.1
- kubernetes ==32.0.1
- lark ==1.2.2
- llguidance ==0.7.26
- llmcompressor ==0.5.1
- llvmlite ==0.44.0
- lm-format-enforcer ==0.10.11
- loguru ==0.7.3
- markdown-it-py ==3.0.0
- mdurl ==0.1.2
- mistral_common ==1.5.6
- mmh3 ==5.1.0
- mpmath ==1.3.0
- msgpack ==1.1.0
- msgspec ==0.19.0
- multidict ==6.4.4
- multiprocess ==0.70.16
- narwhals ==1.41.0
- nest-asyncio ==1.6.0
- networkx ==3.5
- ninja ==1.11.1.4
- nltk ==3.9.1
- numba ==0.61.2
- numpy ==1.26.4
- nvidia-cublas-cu12 ==12.6.4.1
- nvidia-cuda-cupti-cu12 ==12.6.80
- nvidia-cuda-nvrtc-cu12 ==12.6.77
- nvidia-cuda-runtime-cu12 ==12.6.77
- nvidia-cudnn-cu12 ==9.5.1.17
- nvidia-cufft-cu12 ==11.3.0.4
- nvidia-cufile-cu12 ==1.11.1.6
- nvidia-curand-cu12 ==10.3.7.77
- nvidia-cusolver-cu12 ==11.7.1.2
- nvidia-cusparse-cu12 ==12.5.4.2
- nvidia-cusparselt-cu12 ==0.6.3
- nvidia-ml-py ==12.575.51
- nvidia-nccl-cu12 ==2.26.2
- nvidia-nvjitlink-cu12 ==12.6.85
- nvidia-nvtx-cu12 ==12.6.77
- oauthlib ==3.2.2
- ollama ==0.5.1
- onnxruntime ==1.22.0
- openai ==1.82.1
- opencv-python-headless ==4.11.0.86
- opentelemetry-api ==1.27.0
- opentelemetry-exporter-otlp ==1.27.0
- opentelemetry-exporter-otlp-proto-common ==1.27.0
- opentelemetry-exporter-otlp-proto-grpc ==1.27.0
- opentelemetry-exporter-otlp-proto-http ==1.27.0
- opentelemetry-instrumentation ==0.48b0
- opentelemetry-instrumentation-asgi ==0.48b0
- opentelemetry-instrumentation-fastapi ==0.48b0
- opentelemetry-proto ==1.27.0
- opentelemetry-sdk ==1.27.0
- opentelemetry-semantic-conventions ==0.48b0
- opentelemetry-semantic-conventions-ai ==0.4.9
- opentelemetry-util-http ==0.48b0
- orjson ==3.10.18
- outlines ==0.1.11
- outlines_core ==0.1.26
- overrides ==7.7.0
- packaging ==24.2
- pandas ==2.2.3
- partial-json-parser ==0.2.1.1.post5
- pillow ==11.2.1
- posthog ==4.2.0
- prometheus-fastapi-instrumentator ==7.1.0
- prometheus_client ==0.22.1
- propcache ==0.3.1
- protobuf ==3.20.3
- psutil ==7.0.0
- py-cpuinfo ==9.0.0
- pyarrow ==20.0.0
- pyasn1 ==0.6.1
- pyasn1_modules ==0.4.2
- pycountry ==24.6.1
- pydantic ==2.11.5
- pydantic_core ==2.33.2
- pydeck ==0.9.1
- pymupdf4llm ==0.0.24
- pynvml ==12.0.0
- pyproject_hooks ==1.2.0
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.1.0
- python-json-logger ==3.3.0
- python-multipart ==0.0.20
- pytz ==2025.2
- pyzmq ==26.4.0
- ray ==2.46.0
- referencing ==0.36.2
- regex ==2024.11.6
- requests ==2.32.3
- requests-oauthlib ==2.0.0
- rich ==14.0.0
- rich-toolkit ==0.14.7
- rpds-py ==0.25.1
- rsa ==4.9.1
- safetensors ==0.5.3
- scikit-learn ==1.6.1
- scipy ==1.15.3
- sentence-transformers ==4.1.0
- sentencepiece ==0.2.0
- setuptools ==79.0.1
- shellingham ==1.5.4
- six ==1.17.0
- smmap ==5.0.2
- sniffio ==1.3.1
- soupsieve ==2.7
- starlette ==0.45.3
- streamlit ==1.45.1
- sympy ==1.14.0
- tenacity ==9.1.2
- threadpoolctl ==3.6.0
- tiktoken ==0.9.0
- tokenizers ==0.21.1
- toml ==0.10.2
- torch ==2.7.0
- torchaudio ==2.7.0
- torchvision ==0.22.0
- tornado ==6.5.1
- tqdm ==4.67.1
- transformers ==4.52.4
- triton ==3.3.0
- typer ==0.16.0
- typing-inspection ==0.4.1
- typing_extensions ==4.14.0
- tzdata ==2025.2
- urllib3 ==2.4.0
- uv ==0.7.9
- uvicorn ==0.34.3
- uvloop ==0.21.0
- vllm ==0.9.0.1
- watchdog ==6.0.0
- watchfiles ==1.0.5
- websocket-client ==1.8.0
- websockets ==15.0.1
- wrapt ==1.17.2
- xformers ==0.0.30
- xgrammar ==0.1.19
- xxhash ==3.5.0
- yarl ==1.20.0
- zipp ==3.22.0