canada-labour-research-assistant
The Canada Labour Research Assistant (CLaRA) is a privacy-first LLM-powered RAG AI assistant proposing Easily Verifiable Direct Quotations (EVDQ) to mitigate hallucinations in answering questions about Canadian labour laws, standards, and regulations. It works entirely offline and locally, guaranteeing the confidentiality of your conversations.
https://github.com/pierreolivierbonin/canada-labour-research-assistant
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 7
- Watchers: 2
- Forks: 3
- Open Issues: 20
- Releases: 1
Topics
Metadata Files
README.md
Canada Labour Research Assistant (CLaRA)
An LLM-powered assistant that directly quotes retrieved passages
Key Features • Quick start • Use Case & Portability • Telemetry & API Calls • Contributions • Acknowledgements
The Canada Labour Research Assistant (CLaRA) is a privacy-first LLM-powered research assistant that directly quotes sources to mitigate hallucinations and construct context-grounded answers to questions about Canadian labour laws, standards, and regulations. It can be run locally and without any Internet connection, thus guaranteeing the confidentiality of your conversations.
CLaRA comes in two builds:
- one running on an Ollama serving backend, suitable for experimentation or low user numbers.
- one running on a vLLM serving backend, suitable for use cases requiring more scalability.
Key Features
✅ Retrieval-Augmented Generation (RAG) to infuse context in each query (see the sketch after this list).
✅ Chunking strategy to improve question answering.
✅ Metadata leveraging to improve question answering and make the information easily verifiable.
✅ Reranking to prioritize relevant sources when detecting a query mentioning legal provisions.
✅ Dynamic context window allocation to prevent source document chunks from getting truncated, and manage memory efficiently.
✅ Performance optimizations to reduce latency (database caching, tokenizer caching, response streaming).
✅ Runs locally on CPU and/or consumer-grade GPUs, suitable for small and medium enterprises/organizations.
✅ Production-Ready for multiple scenarios with two builds offered out-of-the-box (Ollama or vLLM).
✅ Runs offline with no Internet connection required (see instructions further below).
✅ Guaranteed confidentiality as a result of local-and-offline runtime mode.
✅ Minimalist set of base dependencies for more portability and resilience (see pyproject.toml).
✅ Bring-Your-Own-Model with Ollama and vLLM (see each backend's list of supported pre-trained models).
✅ Bring-Your-Own-Inference-Provider and easily switch between two inference modes (local vs. remote) in the UI.
✅ RAG-enabled conversation history that includes previous document chunks for deeper context and research.
✅ UI Databases Dropdown to easily swap between databases on-the-fly.
✅ On-the-Fly LoRA Adapters for your Fine-Tuned Models. With vLLM, simply pass the path to your fine-tuned LoRA adapter.
✅ 3 runtime modes: normal, evaluation (to assess the LLM answers), or profiling (to track performance).
✅ Evaluation mode (still in early development) lets you measure the quality of the generated responses.
✅ Profiling mode provides analytics to measure the impact of each function call and of each component added to or removed from the architecture.
✅ Streamlined Installation process in one easy step (*Ollama build only; we've streamlined the installation of the vLLM build nonetheless, see quick start - build #2 below).
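For a concrete picture of how these pieces fit together, here is a minimal sketch of a retrieve-then-quote loop. It is illustrative only and is not CLaRA's actual code: the embedding model, database path, collection name, prompt, and LLM are all placeholders, and the real system adds chunk-aware context-window allocation, metadata handling, and reranking before the prompt is built.

```python
# Minimal RAG sketch (illustration only; model names, paths, and the prompt
# are placeholders, not CLaRA's actual implementation).
import chromadb
import ollama
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")      # placeholder embedding model
client = chromadb.PersistentClient(path="chroma_db")    # placeholder database path
collection = client.get_or_create_collection("labour")  # placeholder collection name

def answer(question: str, k: int = 5) -> str:
    # 1. Embed the query and retrieve the k most similar document chunks.
    query_embedding = embedder.encode(question).tolist()
    hits = collection.query(query_embeddings=[query_embedding], n_results=k)
    chunks = hits["documents"][0]

    # 2. Build a context-grounded prompt that asks for direct quotations.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the sources below and quote them verbatim.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate the answer with a locally served model (Ollama here).
    response = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]
```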
Quick Start
How to set up this system for 100% local-and-off-the-Internet inference
Because models require tokenizers, and because the open-source models we use both for document embedding and for LLM inference are hosted on [Hugging Face](https://huggingface.co/), libraries like `sentence-transformers` pull the models and tokenizers on the first call and then save a copy in a local cache to speed up future runs (see e.g. the definition of the `SentenceTransformer` class). Once you have downloaded the models, you can keep using these libraries locally, without any Internet connection. The same can be done with the LLM's tokenizer to avoid unnecessary external calls (this is what we've done with the [.tokenizers](.tokenizers) folder). For LLM inference, you can do the same thing: download the LLM, store its main files, then run the system completely offline.
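As a concrete illustration of this pattern, the snippet below downloads an embedding model once, saves a local copy, and then reloads it fully offline; the model name and folder paths are placeholders rather than the project's actual configuration.

```python
# Sketch of the offline pattern described above; the model name and paths are
# placeholders, not CLaRA's actual configuration.
import os

# Must be set before importing the Hugging Face libraries. Use "0" (or leave it
# unset) once to allow the initial download, then "1" to stay fully offline.
os.environ["HF_HUB_OFFLINE"] = "1"

from sentence_transformers import SentenceTransformer

# One-time, online step: download and save a local copy of the embedding model.
#   SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").save("models/embedder")

# Every later run: load strictly from the local copy, no Internet connection required.
model = SentenceTransformer("models/embedder")
embeddings = model.encode(["What does the Canada Labour Code say about overtime pay?"])
```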
Build #1 - Ollama local server & (optional) remote server
#### Preliminary Steps

Ensure you have Ollama installed and a bash terminal available. Then, clone this repo and cd into the new directory:

```sh
git clone https://github.com/pierreolivierbonin/Canada-Labour-Research-Assistant.git
cd canada-labour-research-assistant
```

#### All-in-one setup

Run `./full_install_pipeline_ollama.sh`. If you prefer to do it one step at a time:

#### Step 1

Install the virtual environment by running the following command in your bash terminal:

```sh
./setup/ollama_build/install_venv.sh
```

#### Step 2

Make sure your virtual environment is activated. Then, create the database by running the following command in a terminal:

```sh
./setup/create_or_update_database.sh
```

#### Step 3

You are now ready to launch the application with:

```sh
./run_app_ollama.sh
```

You can now enter the mode of your choice in the console. The default mode is 'local', *i.e.* **local mode**, which uses your machine to run the application, thereby protecting your privacy and data. Should you want to use **remote mode** and take advantage of third-party compute for larger models and workloads, you can do so and switch between the two modes on-the-fly through the UI's toggle button. Please note that **the privacy of your conversations will no longer be guaranteed** if you do so. To enable **remote mode**, simply add the necessary credentials in `.streamlit/secrets.toml`, following the format below:

> authorization = "..."
> api_url = "..."
Build #2 - vLLM local server & (optional) remote server
#### Preliminary Step

For Windows users: [install WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) to have a Linux kernel. Then, install the drivers as appropriate to run [GPU paravirtualization on WSL-Ubuntu](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0). If you intend to use LoRA adapters, install `jq` by running `sudo apt-get install jq`.

#### All-in-one setup

Run `./full_install_pipeline_vllm.sh`. If you prefer to do it one step at a time:

#### Step #1

Install the virtual environment by running:

```sh
source ./setup/vllm_build/install_venv.sh
```

#### Step #2

Activate your virtual environment with `source .venv/bin/activate`, then run:

```sh
source ./setup/create_or_update_database.sh
```

#### Step #3

Launch the application with:

```sh
source ./run_app_vllm.sh
```

By default, **local mode** runs and uses your machine to run the application, thereby protecting your privacy and data.

**Please note:** while running on WSL, vLLM sometimes has trouble releasing memory once you shut down or close your terminal. To make sure your memory is released, run `wsl --shutdown` in another terminal.

Should you want to use **remote mode** and take advantage of third-party compute for larger models and workloads, you can do so and switch between the two modes on-the-fly through the UI's toggle button. Please note that **the privacy of your conversations will no longer be guaranteed** if you do so. To enable **remote mode**, simply add the necessary credentials in `.streamlit/secrets.toml`, following the format below:

> authorization = "..."
> api_url = "..."
Database Creation Explained & How to Create Your Own Knowledge Base
The application can be customized for your own use case by creating new databases. To add a new database:

- Create a JSON configuration file in the `collections/` folder
- Update `VectorDBDataFiles.included_databases` in `db_config.py` to include your database
- Run the database creation script; your database will be automatically created and included in the app

See below for detailed instructions.

Refer to `collections/example.json` for a template config file, or `collections/examplewithcomments.txt` for a detailed commented example with path format explanations.
Configuration

- Create or edit database configuration files in the `collections/` folder. Each database is defined by a JSON file in this directory.
- Configure database metadata in your JSON file (a hypothetical example appears after this list):
  - `name`: The database identifier
  - `is_default`: Set to `true` to make this database the default selection in the UI
  - `save_html`: Set to `true` to save HTML content locally
  - `languages`: List of language codes (e.g., `["en", "fr"]`) that your database supports
  - `ressource_name`: Dictionary mapping language codes to display names for the UI (e.g., `{"en": "Labour", "fr": "Travail"}`)
- Add your data sources using these supported formats, organized by language:
  - Web pages: Add URLs under the `"page"` key. Each web page entry is an array with format `["NAME", "URL", depth]`, where:
    - `depth = 0`: Extract only the page itself
    - `depth = 1`: Extract the page and all links within it
    - `depth = 2`: Extract the page, all links within it, and links within those links (2 levels deep)
    - The maximum depth limit is 2
  - Legal pages: Add law URLs under the `"law"` key as arrays with format `["name", "URL"]`
  - IPG pages: Add IPG URLs under the `"ipg"` key, organized by language
  - PDF files: Add URLs or local file/folder paths under the `"pdf"` key, organized by language. Local paths can be anywhere on your computer, using OS-appropriate formats.
  - Page blacklist: Add URLs to exclude under the `"page_blacklist"` key, organized by language
  - Note: Data sources must be organized by language codes (e.g., `"en"`, `"fr"`). You can support one or more languages per database.
- Add your database to the application by updating `VectorDBDataFiles.included_databases` in `db_config.py`. Add your database name (it must match the `"name"` field in your JSON file) to the list. For example:

```python
included_databases = ["labour", "equity", "transport", "your_new_database"]
```
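For illustration only, a hypothetical `collections/your_new_database.json` might look roughly like the snippet below, written here as a small Python script that emits the JSON file. The field names follow the list above, but the exact nesting of language codes and source keys may differ; treat `collections/example.json` as the authoritative template. All names and URLs are placeholders.

```python
# Hypothetical config example: field names follow the documentation above, but
# collections/example.json remains the authoritative schema. Names and URLs
# below are placeholders.
import json

config = {
    "name": "your_new_database",
    "is_default": False,
    "save_html": True,
    "languages": ["en", "fr"],
    "ressource_name": {"en": "Your topic", "fr": "Votre sujet"},
    "page": {
        "en": [["Example page", "https://example.ca/en/some-topic", 1]],
        "fr": [["Page d'exemple", "https://example.ca/fr/un-sujet", 1]],
    },
    "pdf": {
        "en": ["https://example.ca/en/guide.pdf", "/home/user/Documents/local_file.pdf"],
        "fr": [],
    },
    "page_blacklist": {
        "en": ["https://example.ca/en/some-topic/irrelevant-page"],
        "fr": [],
    },
}

with open("collections/your_new_database.json", "w", encoding="utf-8") as f:
    json.dump(config, f, ensure_ascii=False, indent=2)
```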
Supported Data Sources
- External PDFs: Direct URLs to PDF files
- Local PDFs: Absolute file paths to local PDFs anywhere on your computer (supports folder paths to include all PDFs in a directory). Use OS-appropriate path formats (e.g., `C:/Documents/file.pdf` on Windows, `/home/user/Documents/file.pdf` on Linux/Mac)
- Web pages: URLs to web content (supports blacklisting specific pages)

Important: PDF files can be located anywhere on your computer (including the application folder, but avoid the `static/` folder as it's managed automatically). Just specify the paths; the database script will automatically import and process them.
Building Your Database
Once you have created your JSON configuration file and updated `VectorDBDataFiles.included_databases`, run the database creation script:

```bash
./setup/create_or_update_database.sh
```
This script will automatically:
1. Process all databases listed in `VectorDBDataFiles.included_databases`
2. Extract content from all configured sources in your JSON files
3. Create vector databases for RAG (Retrieval-Augmented Generation), as sketched conceptually below
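Conceptually, step 3 boils down to chunking each extracted document, embedding the chunks, and storing them alongside their metadata in a Chroma collection. The sketch below illustrates that idea only; it is not the project's actual script, and the chunking strategy, embedding model, paths, and names are placeholders.

```python
# Conceptual sketch of the vector-database step (not the actual script; the
# chunking strategy, embedding model, and names are placeholders).
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("models/embedder")        # local embedding model
client = chromadb.PersistentClient(path="chroma_db")     # on-disk vector database
collection = client.get_or_create_collection("your_new_database")

def add_document(doc_id: str, text: str, source_url: str, chunk_size: int = 1000) -> None:
    # Naive fixed-size chunking stands in for the project's real chunking strategy.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    collection.add(
        ids=[f"{doc_id}-{n}" for n in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        metadatas=[{"source": source_url, "chunk": n} for n in range(len(chunks))],
    )
```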
File Management
- PDF files are automatically downloaded to the `static/` folder for offline access
- Static files are accessible via `app/static/...` URLs within the application
- Note: Removing a database JSON file doesn't delete its files from the `static/` folder
Use Case and Portability
The solution is designed so you can easily verify the information used by the LLM to construct its responses. To do so, 'direct quotations' mode will format and highlight relevant passages taken from the sources. You can click on these passages to directly go to the source and validate the information.
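CLaRA's keywords and references point to longest-common-subsequence (LCS) string matching for tying quotations back to their sources. Purely for illustration, the sketch below does something similar with Python's standard `difflib` instead of the project's own matcher; the sample text and threshold are placeholders.

```python
# Illustration only: locate a generated quotation inside a retrieved source
# chunk so it can be highlighted. CLaRA's own (LCS-based) matcher may differ.
from difflib import SequenceMatcher

def locate_quote(quote: str, source_chunk: str, threshold: float = 0.9):
    """Return the (start, end) character span of the best match for `quote`
    in `source_chunk`, or None if the overlap is below `threshold`."""
    matcher = SequenceMatcher(None, quote, source_chunk, autojunk=False)
    match = matcher.find_longest_match(0, len(quote), 0, len(source_chunk))
    if match.size / max(len(quote), 1) < threshold:
        return None
    return match.b, match.b + match.size

span = locate_quote(
    "every employee is entitled to and shall be granted a vacation",
    "Subject to this Division, every employee is entitled to and shall be "
    "granted a vacation of at least two weeks ...",
)
print(span)  # (start, end) range to highlight in the source chunk, or None
```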
Using the current webcrawling configuration, you can create the following distinct databases and swap between them in the UI. Each database includes these documents:

Labour Database:
* Canada Labour Code (CLC)
* Canada Labour Standards and Regulations (CLSR)
* Interpretations, Policies, and Guidelines (IPGs)
* Canada webpages on topics covering: labour standards, occupational health and safety, etc.

Equity Database:
* Workplace equity, etc.

Transport Database:
* Acts and regulations related to transport.
Telemetry and API Calls
In an effort to ensure the highest standards of privacy protection, we have tested and confirmed that the system works offline, without any required Internet connection, thus guaranteeing your conversations remain private.
In addition, we have researched and taken the following measures (summarized in code form after this list):
* ChromaDB allows you to disable telemetry, and we've done just that by following the instructions here.
* Ollama does not have any telemetry. See this explainer.
* Streamlit allows you to disable telemetry, and we've done just that by setting gatherUsageStats to 'false'. See this explainer.
* vLLM allows opting out of telemetry via the DO_NOT_TRACK environment variable, and we've done just that. See the docs.
* Hugging Face allows disabling calls to its website via the HF_HUB_OFFLINE environment variable, and we've done just that. See this PR.
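For reference, most of these opt-outs boil down to environment variables that must be set before the corresponding libraries are imported; a minimal sketch is shown below. Streamlit's `gatherUsageStats` flag is typically set in `.streamlit/config.toml` instead.

```python
# Sketch only: telemetry opt-outs expressed as environment variables. These
# must be set before the corresponding libraries are imported.
import os

os.environ["ANONYMIZED_TELEMETRY"] = "False"  # ChromaDB: disable anonymized telemetry
os.environ["DO_NOT_TRACK"] = "1"              # vLLM: opt out of usage stats collection
os.environ["HF_HUB_OFFLINE"] = "1"            # Hugging Face Hub: no outbound calls
```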
Roadmap
See the open issues for a full list of proposed features (and known issues).
Contributions
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Acknowledgements
Special thanks to Hadi Hojjati @hhojjati98 for the stimulating discussions, brainstorming sessions, and general advice. Both of us greatly appreciated them.
We would like to thank everyone who participates in conducting open research as well as sharing knowledge and code.
In particular, we are grateful to the creators and contributors who made it possible to build CLaRA:
Webcrawling & Preprocessing
- Webcrawling and HTML processing: Beautiful Soup
- PDF files content extraction: PyMuPDF
Backend
- GPU Paravirtualization: NVIDIA
- Llama3.2-Instruct model: Meta
- Vector database: Chroma
- LLM inference serving: Ollama & vLLM
- Embedding models: SentenceTransformers and Hugging Face
Frontend
- User Interface: Streamlit
References
We are grateful to, and would like to acknowledge, the AI research community. In particular, we drew ideas and inspiration from the following papers, articles, and conference presentations:
Bengio, Yoshua. "Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?". Presentation given at the World Summit AI Canada on April 16 (2025).
Bengio, Yoshua, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann et al. "Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?." arXiv preprint arXiv:2502.15657 (2025).
He, Jia, Mukund Rungta, David Koleczek, Arshdeep Sekhon, Franklin X. Wang, and Sadid Hasan. "Does Prompt Formatting Have Any Impact on LLM Performance?." Online. https://arxiv.org/abs/2411.10541 arXiv:2411.10541 (2024).
Laban, Philippe, Tobias Schnabel, Paul N. Bennett, and Marti A. Hearst. "SummaC: Re-visiting NLI-based models for inconsistency detection in summarization." Transactions of the Association for Computational Linguistics 10 (2022): 163-177. Arxiv: https://arxiv.org/abs/2111.09525. Repository: https://github.com/tingofurro/summac
Lin, Chin-Yew, and Franz Josef Och. "Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics." In Proceedings of the 42nd annual meeting of the association for computational linguistics (ACL-04), pp. 605-612. https://aclanthology.org/P04-1077.pdf. 2004.
Wikipedia. "ROUGE (metric)." Online. https://en.wikipedia.org/wiki/ROUGE_(metric). 2023.
Wikipedia. "Longest common subsequence". Online. https://en.wikipedia.org/wiki/Longestcommonsubsequence. 2025.
Yeung, Matt. "Deterministic Quoting: Making LLMs Safer for Healthcare." Online. https://mattyyeung.github.io/deterministic-quoting (2024).
Citation
If you draw inspiration or use this solution, please cite the following work:
```bibtex
@misc{clara-2025,
  author       = {Bonin, Pierre-Olivier and Allard, Marc-André},
  title        = {Canada Labour Research Assistant (CLaRA)},
  howpublished = {\url{https://github.com/pierreolivierbonin/Canada-Labour-Research-Assistant}},
  year         = {2025},
}
```
Contact
Owner
- Name: Pierre-Olivier Bonin, Ph.D.
- Login: pierreolivierbonin
- Kind: user
- Location: Montreal, Qc (Canada)
- Website: https://www.linkedin.com/in/pierreolivierbonin/
- Repositories: 2
- Profile: https://github.com/pierreolivierbonin
Data science, machine learning, and automation
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Canada Labour Research Agent (CLaRA)
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Pierre-Olivier
    family-names: Bonin
  - given-names: Marc-André
    family-names: Allard
repository-code: >-
  https://github.com/pierreolivierbonin/Canada-Labour-Research-Assistant
abstract: >-
  The Canada Labour Research Assistant (CLaRA) is an
  LLM-powered research assistant that can directly quote
  sources using retrieval-augmented generation to answer
  questions about a wide range of topics related to Canadian
  labour laws, standards, and regulations. All made with
  open-source software.
keywords:
- direct quotations
- question-answering
- labour
- lcs-algorithm
- streamlit
- sentence-transformers
- chromadb
- llm
- llm inference
- local llm
- llm-serving
- retrieval-augmented generation
- ollama
- string-matching
- rag chatbot
- source referencing
- metadata
license: MIT
version: '1.0'
date-released: '2025-05-08'
GitHub Events
Total
- Create event: 7
- Issues event: 5
- Watch event: 2
- Delete event: 1
- Issue comment event: 35
- Member event: 1
- Push event: 78
- Public event: 1
- Pull request review event: 24
- Pull request review comment event: 9
- Pull request event: 14
- Fork event: 2
Last Year
- Create event: 7
- Issues event: 5
- Watch event: 2
- Delete event: 1
- Issue comment event: 35
- Member event: 1
- Push event: 78
- Public event: 1
- Pull request review event: 24
- Pull request review comment event: 9
- Pull request event: 14
- Fork event: 2
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Pierre-Olivier Bonin, Ph.D. | 3****n | 8 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 13
- Total pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 23 minutes
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 0.46
- Average comments per pull request: 4.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 13
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 23 minutes
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 0.46
- Average comments per pull request: 4.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- pierreolivierbonin (13)
- marca116 (4)
Pull Request Authors
- marca116 (8)
- pierreolivierbonin (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- beautifulsoup4 ==4.13.4
- chromadb ==1.0.12
- flashinfer-python ==0.2.5
- llmcompressor ==0.5.1
- nltk ==3.9.1
- ollama ==0.5.1
- protobuf ==3.20.3
- pymupdf4llm ==0.0.24
- sentence-transformers ==4.1.0
- streamlit ==1.45.1
- vllm ==0.9.0.1
- beautifulsoup4 ==4.13.4
- chromadb ==1.0.12
- nltk ==3.9.1
- ollama ==0.4.2
- protobuf ==3.20.3
- pymupdf4llm ==0.0.24
- sentence-transformers ==3.0.1
- sentencepiece ==0.2.0
- streamlit ==1.45.1
- summac ==0.0.4
- torch ==2.6.0+cu124
- transformers >=4.8.1
- Deprecated ==1.2.18
- GitPython ==3.1.44
- Jinja2 ==3.1.6
- MarkupSafe ==3.0.2
- PyMuPDF ==1.26.0
- PyPika ==0.48.9
- PyYAML ==6.0.2
- Pygments ==2.19.1
- accelerate ==1.7.0
- aiohappyeyeballs ==2.6.1
- aiohttp ==3.12.7
- aiosignal ==1.3.2
- airportsdata ==20250523
- altair ==5.5.0
- annotated-types ==0.7.0
- anyio ==4.9.0
- asgiref ==3.8.1
- astor ==0.8.1
- attrs ==25.3.0
- backoff ==2.2.1
- bcrypt ==4.3.0
- beautifulsoup4 ==4.13.4
- blake3 ==1.0.5
- blinker ==1.9.0
- build ==1.2.2.post1
- cachetools ==5.5.2
- certifi ==2025.4.26
- charset-normalizer ==3.4.2
- chromadb ==1.0.12
- click ==8.2.1
- cloudpickle ==3.1.1
- coloredlogs ==15.0.1
- compressed-tensors ==0.9.4
- cupy-cuda12x ==13.4.1
- datasets ==3.6.0
- depyf ==0.18.0
- dill ==0.3.8
- diskcache ==5.6.3
- distro ==1.9.0
- dnspython ==2.7.0
- durationpy ==0.10
- einops ==0.8.1
- email_validator ==2.2.0
- fastapi ==0.115.9
- fastapi-cli ==0.0.7
- fastrlock ==0.8.3
- filelock ==3.18.0
- flashinfer-python ==0.2.5
- flatbuffers ==25.2.10
- frozenlist ==1.6.0
- fsspec ==2025.3.0
- gguf ==0.17.0
- gitdb ==4.0.12
- google-auth ==2.40.2
- googleapis-common-protos ==1.70.0
- grpcio ==1.72.1
- h11 ==0.16.0
- hf-xet ==1.1.2
- httpcore ==1.0.9
- httptools ==0.6.4
- httpx ==0.28.1
- huggingface-hub ==0.32.3
- humanfriendly ==10.0
- idna ==3.10
- importlib_metadata ==8.4.0
- importlib_resources ==6.5.2
- interegular ==0.3.3
- jiter ==0.10.0
- joblib ==1.5.1
- jsonschema ==4.24.0
- jsonschema-specifications ==2025.4.1
- kubernetes ==32.0.1
- lark ==1.2.2
- llguidance ==0.7.26
- llmcompressor ==0.5.1
- llvmlite ==0.44.0
- lm-format-enforcer ==0.10.11
- loguru ==0.7.3
- markdown-it-py ==3.0.0
- mdurl ==0.1.2
- mistral_common ==1.5.6
- mmh3 ==5.1.0
- mpmath ==1.3.0
- msgpack ==1.1.0
- msgspec ==0.19.0
- multidict ==6.4.4
- multiprocess ==0.70.16
- narwhals ==1.41.0
- nest-asyncio ==1.6.0
- networkx ==3.5
- ninja ==1.11.1.4
- nltk ==3.9.1
- numba ==0.61.2
- numpy ==1.26.4
- nvidia-cublas-cu12 ==12.6.4.1
- nvidia-cuda-cupti-cu12 ==12.6.80
- nvidia-cuda-nvrtc-cu12 ==12.6.77
- nvidia-cuda-runtime-cu12 ==12.6.77
- nvidia-cudnn-cu12 ==9.5.1.17
- nvidia-cufft-cu12 ==11.3.0.4
- nvidia-cufile-cu12 ==1.11.1.6
- nvidia-curand-cu12 ==10.3.7.77
- nvidia-cusolver-cu12 ==11.7.1.2
- nvidia-cusparse-cu12 ==12.5.4.2
- nvidia-cusparselt-cu12 ==0.6.3
- nvidia-ml-py ==12.575.51
- nvidia-nccl-cu12 ==2.26.2
- nvidia-nvjitlink-cu12 ==12.6.85
- nvidia-nvtx-cu12 ==12.6.77
- oauthlib ==3.2.2
- ollama ==0.5.1
- onnxruntime ==1.22.0
- openai ==1.82.1
- opencv-python-headless ==4.11.0.86
- opentelemetry-api ==1.27.0
- opentelemetry-exporter-otlp ==1.27.0
- opentelemetry-exporter-otlp-proto-common ==1.27.0
- opentelemetry-exporter-otlp-proto-grpc ==1.27.0
- opentelemetry-exporter-otlp-proto-http ==1.27.0
- opentelemetry-instrumentation ==0.48b0
- opentelemetry-instrumentation-asgi ==0.48b0
- opentelemetry-instrumentation-fastapi ==0.48b0
- opentelemetry-proto ==1.27.0
- opentelemetry-sdk ==1.27.0
- opentelemetry-semantic-conventions ==0.48b0
- opentelemetry-semantic-conventions-ai ==0.4.9
- opentelemetry-util-http ==0.48b0
- orjson ==3.10.18
- outlines ==0.1.11
- outlines_core ==0.1.26
- overrides ==7.7.0
- packaging ==24.2
- pandas ==2.2.3
- partial-json-parser ==0.2.1.1.post5
- pillow ==11.2.1
- posthog ==4.2.0
- prometheus-fastapi-instrumentator ==7.1.0
- prometheus_client ==0.22.1
- propcache ==0.3.1
- protobuf ==3.20.3
- psutil ==7.0.0
- py-cpuinfo ==9.0.0
- pyarrow ==20.0.0
- pyasn1 ==0.6.1
- pyasn1_modules ==0.4.2
- pycountry ==24.6.1
- pydantic ==2.11.5
- pydantic_core ==2.33.2
- pydeck ==0.9.1
- pymupdf4llm ==0.0.24
- pynvml ==12.0.0
- pyproject_hooks ==1.2.0
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.1.0
- python-json-logger ==3.3.0
- python-multipart ==0.0.20
- pytz ==2025.2
- pyzmq ==26.4.0
- ray ==2.46.0
- referencing ==0.36.2
- regex ==2024.11.6
- requests ==2.32.3
- requests-oauthlib ==2.0.0
- rich ==14.0.0
- rich-toolkit ==0.14.7
- rpds-py ==0.25.1
- rsa ==4.9.1
- safetensors ==0.5.3
- scikit-learn ==1.6.1
- scipy ==1.15.3
- sentence-transformers ==4.1.0
- sentencepiece ==0.2.0
- setuptools ==79.0.1
- shellingham ==1.5.4
- six ==1.17.0
- smmap ==5.0.2
- sniffio ==1.3.1
- soupsieve ==2.7
- starlette ==0.45.3
- streamlit ==1.45.1
- sympy ==1.14.0
- tenacity ==9.1.2
- threadpoolctl ==3.6.0
- tiktoken ==0.9.0
- tokenizers ==0.21.1
- toml ==0.10.2
- torch ==2.7.0
- torchaudio ==2.7.0
- torchvision ==0.22.0
- tornado ==6.5.1
- tqdm ==4.67.1
- transformers ==4.52.4
- triton ==3.3.0
- typer ==0.16.0
- typing-inspection ==0.4.1
- typing_extensions ==4.14.0
- tzdata ==2025.2
- urllib3 ==2.4.0
- uv ==0.7.9
- uvicorn ==0.34.3
- uvloop ==0.21.0
- vllm ==0.9.0.1
- watchdog ==6.0.0
- watchfiles ==1.0.5
- websocket-client ==1.8.0
- websockets ==15.0.1
- wrapt ==1.17.2
- xformers ==0.0.30
- xgrammar ==0.1.19
- xxhash ==3.5.0
- yarl ==1.20.0
- zipp ==3.22.0