olaw

AI + Legal APIs: A Tool-Based Retrieval Augmented Generation Workbench for Legal AI UX Research.

https://github.com/harvard-lil/olaw

Science Score: 52.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization harvard-lil has institutional domain (lil.law.harvard.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Repository


Basic Info
Statistics
  • Stars: 66
  • Watchers: 9
  • Forks: 16
  • Open Issues: 1
  • Releases: 3
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Open Legal AI Workbench (OLAW)

AI + Legal APIs: A Tool-Based Retrieval Augmented Generation Workbench for Legal AI UX Research.

More info:
- "Cracking the justice barrier: announcing the Open Legal AI Workbench". Mar 08 2024 - lil.law.harvard.edu

https://github.com/harvard-lil/olaw/assets/625889/65dd61db-42f8-490b-a737-0612d97c5c81

Video: OLAW’s chatbot retrieving court opinions from the CourtListener API to help answer a legal question. Information is interpreted by the AI model, which may make mistakes.




Concept

OLAW is a tool-based Retrieval Augmented Generation (RAG) workbench for legal AI UX research. It consists of a customizable chatbot that can use legal APIs to augment its responses.

The goal of this project is to simplify and streamline experimentation with API-based RAG in legal contexts by:
- Keeping it simple: The tool should be easy to operate, modify and interpret.
- Being highly customizable and modular: Adding a tool to this workbench should be as simple as possible.
- Being open and collaborative: A lot of this work generally happens behind the scenes. This project aims to amplify collaborative research on the uses of AI in legal contexts.

The focus here is on ease of access and experimentation, as opposed to overall performance or production-readiness.

Tool-based RAG?

There are as many "flavors" of RAG as there are implementations of it. This workbench focuses on a tool-based approach, in which the LLM is indirectly given access to APIs as a way to augment its responses.

This process takes place in three steps:
1. Upon receiving a message from the user, the pipeline asks the LLM to analyze the message to:
   - Detect if it contains a legal question
   - Use a prompt to determine where to look for additional information (search target)
   - Use that same prompt to generate a search statement to use against the search target
2. Upon identifying a search suggestion:
   - The UI presents the search suggestion to the user and asks for confirmation.
3. Upon confirmation from the user:
   - The pipeline performs the suggested search against the search target ...
   - ... and uses the results as additional context when asking the LLM to answer the user's question
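The three steps above can be sketched as a small orchestration function. This is only an illustration, not OLAW's actual code: `run_pipeline` and its four callables are hypothetical names, and the fallback of answering without extra context when no search is suggested or confirmed is an assumption.

```python
from typing import Callable, Optional


def run_pipeline(
    message: str,
    extract: Callable[[str], dict],
    confirm: Callable[[dict], bool],
    search: Callable[[dict], dict],
    complete: Callable[[str, Optional[dict]], str],
) -> str:
    """Hypothetical orchestration of the three steps (not OLAW's own code)."""
    # Step 1: ask the LLM whether the message contains a legal question and,
    # if so, which search target and search statement to use.
    suggestion = extract(message)
    has_suggestion = bool(
        suggestion.get("search_statement") and suggestion.get("search_target")
    )

    # Step 2: present the suggestion to the user and ask for confirmation.
    if not has_suggestion or not confirm(suggestion):
        # Assumption: without a confirmed search, the LLM answers as-is.
        return complete(message, None)

    # Step 3: run the search and pass the results as additional context.
    results = search(suggestion)
    return complete(message, results)
```

Passing the steps in as callables keeps the sketch independent of any specific model or API, which makes each step easy to swap out while experimenting.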



Installation

OLAW requires the following machine-level dependencies to be installed:
- Python 3.11+
- Python Poetry

Use the following commands to clone the project and install its dependencies:

```bash
# MacOS / Linux / WSL
git clone https://github.com/harvard-lil/olaw.git
cd olaw
poetry install
```

The workbench itself doesn't have specific hardware requirements. If you would like to use Ollama for local inference with open-source language models, be sure to check their system requirements.



Configuring the application

This program uses environment variables to handle settings. Copy .env.example into a new .env file and edit it as needed.

```bash
cp .env.example .env
```

See details for individual settings in .env.example.

A few notes:
- OLAW can interact with both the OpenAI API and Ollama for local inference.
  - Both can be used at the same time, but at least one is needed.
  - By default, the program will try to communicate with Ollama's API at http://localhost:11434.
  - It is also possible to use OpenAI's client to interact with compatible providers, such as HuggingFace's Message API or vLLM. To do so, set values for both OPENAI_BASE_URL and OPENAI_COMPATIBLE_MODEL environment variables.
- Prompts can be edited directly in the configuration file.
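As a rough illustration of the "set values for both" rule for OpenAI-compatible providers, the check below reads the two variables named above and rejects a half-configured setup. The function name `openai_compatible_settings` is hypothetical, and OLAW itself may handle a missing value differently.

```python
import os
from typing import Mapping, Optional


def openai_compatible_settings(
    env: Mapping[str, str] = os.environ,
) -> Optional[dict]:
    """Return the OpenAI-compatible provider settings, or None if unused.

    Hypothetical helper: raises if only one of the two variables is set.
    """
    base_url = env.get("OPENAI_BASE_URL", "").strip()
    model = env.get("OPENAI_COMPATIBLE_MODEL", "").strip()

    if not base_url and not model:
        return None  # Feature not in use.
    if not (base_url and model):
        raise ValueError(
            "Set both OPENAI_BASE_URL and OPENAI_COMPATIBLE_MODEL, or neither."
        )
    return {"base_url": base_url, "model": model}
```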



Starting the server

The following command will start the OLAW (development) server on port 5000.

```bash
poetry run flask run
# Note: Use --port to use a different port
```



Recommended models

While this pipeline can in theory be run against a wide variety of text generation models, there are two key constraints to keep in mind when picking an LLM:
- The size of the context window. The target model needs to be able to handle long input, as the pipeline may pull additional context from APIs it has access to.
- Ability to reliably return JSON data. This feature is used by the /api/extract-search-statement route.

We have tested this software with the following models:
- OpenAI: openai/gpt-4-turbo-preview (128K tokens context)
- Ollama: Any version of ollama/mixtral (32K tokens context + sliding window)

We have observed performance with openai/gpt-4-turbo-preview during our initial tests, using the default prompts.
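To illustrate why reliable JSON output matters, here is a minimal sketch of defensive parsing of a model's reply. It assumes the reply should be a JSON object with `search_statement` and `search_target` keys, as returned by /api/extract-search-statement; the helper name `parse_search_suggestion` is hypothetical and this is not OLAW's own parsing logic.

```python
import json

EMPTY_SUGGESTION = {"search_statement": "", "search_target": ""}


def parse_search_suggestion(raw: str) -> dict:
    """Parse an LLM reply, falling back to empty values on malformed output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # The model replied with prose or broken JSON.
        return dict(EMPTY_SUGGESTION)

    if not isinstance(data, dict):
        return dict(EMPTY_SUGGESTION)

    statement = data.get("search_statement") or ""
    target = data.get("search_target") or ""
    if not isinstance(statement, str) or not isinstance(target, str):
        return dict(EMPTY_SUGGESTION)

    return {"search_statement": statement, "search_target": target}
```

A model that frequently falls into the fallback branch is likely a poor fit for this pipeline, whatever the size of its context window.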



Interacting with the Web UI

Once the server is started, the application's web UI should be available at http://localhost:5000.

The interface automatically handles a basic chat history, allowing for few-shot / chain-of-thought prompting.



Interacting with the API

OLAW comes with a REST API that can be used to interact programmatically with the workbench.

New to REST APIs? See this tutorial.

[GET] /api/models

Returns a list of available models as JSON.

Sample output

```json
[
  "openai/gpt-4-vision-preview",
  "openai/gpt-4-0613",
  "openai/gpt-4-0125-preview",
  "openai/gpt-4-turbo-preview",
  "openai/gpt-4",
  "openai/gpt-4-1106-preview",
  "ollama/llama2:13b",
  "ollama/llama2:13b-chat-fp16",
  "ollama/llama2:70b",
  "ollama/llama2:70b-chat-fp16",
  "ollama/llama2:7b",
  "ollama/llama2:latest",
  "ollama/mistral:7b",
  "ollama/mistral:7b-instruct-fp16",
  "ollama/mistral:7b-instruct-v0.2-fp16",
  "ollama/mixtral:8x7b-instruct-v0.1-fp16",
  "ollama/mixtral:8x7b-instruct-v0.1-q6_K",
  "ollama/mixtral:latest",
  "ollama/phi:2.7b-chat-v2-fp16"
]
```
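A minimal client sketch for this route, using only the standard library. It assumes the server runs at the default http://localhost:5000; `fetch_models` and `group_by_provider` are hypothetical helper names, and the grouping simply relies on the `provider/model` naming visible in the sample output.

```python
import json
from urllib.request import urlopen


def fetch_models(base_url: str = "http://localhost:5000") -> list[str]:
    """Call [GET] /api/models and return the decoded JSON list."""
    with urlopen(f"{base_url}/api/models") as response:
        return json.load(response)


def group_by_provider(models: list[str]) -> dict[str, list[str]]:
    """Group names such as "ollama/mixtral:latest" by their provider prefix."""
    grouped: dict[str, list[str]] = {}
    for name in models:
        provider, _, model = name.partition("/")
        grouped.setdefault(provider, []).append(model)
    return grouped
```

With a running server, `group_by_provider(fetch_models())` would give one list of model names per provider.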

[POST] /api/extract-search-statement

Uses the search statement extraction prompt to:
- Detect if the user asked a question that requires pulling information from a legal database
- If so, transform said question into a search statement that can be run against a known search target (See SEARCH_TARGETS).

Returns a JSON object containing search_statement and search_target. These properties can be empty.

Sample input

```json
{
  "model": "ollama/mixtral",
  "temperature": 0.0,
  "message": "Tell me everything you know about Miranda v. Arizona (1966)"
}
```

**Notes:**
- `temperature` is optional.

Sample output

```json
{
  "search_statement": "caseName:(\"Miranda v. Arizona\") AND dateFiled:[1966-01-01 TO 1966-12-31]",
  "search_target": "courtlistener"
}
```
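The request above could be built and sent from Python roughly as follows. This is a sketch under assumptions: the server is reachable at http://localhost:5000, and `build_extract_request` and `extract_search_statement` are hypothetical helper names.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen


def build_extract_request(
    message: str,
    model: str,
    temperature: Optional[float] = None,
    base_url: str = "http://localhost:5000",
) -> Request:
    """Build the POST request for /api/extract-search-statement."""
    payload = {"model": model, "message": message}
    if temperature is not None:  # `temperature` is optional.
        payload["temperature"] = temperature
    return Request(
        f"{base_url}/api/extract-search-statement",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_search_statement(message: str, model: str, **kwargs) -> dict:
    """Send the request; both returned properties can be empty."""
    with urlopen(build_extract_request(message, model, **kwargs)) as response:
        return json.load(response)
```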

[POST] /api/search

Performs search using what /api/extract-search-statement returned.

Returns a JSON object with search results indexed by SEARCH_TARGET.

Sample input

```json
{
  "search_statement": "caseName:(\"Miranda v. Arizona\") AND dateFiled:[1966-01-01 TO 1966-12-31]",
  "search_target": "courtlistener"
}
```

Sample output

```json
{
  "courtlistener": [
    {
      "absolute_url": "https://www.courtlistener.com/opinion/107252/miranda-v-arizona/",
      "case_name": "Miranda v. Arizona",
      "court": "Supreme Court of the United States",
      "date_filed": "1966-06-13T00:00:00-07:00",
      "id": 107252,
      "ref_tag": 1,
      "status": "Precedential",
      "text": "..."
    },
    {
      "absolute_url": "https://www.courtlistener.com/opinion/8976604/miranda-v-arizona/",
      "case_name": "Miranda v. Arizona",
      "court": "Supreme Court of the United States",
      "date_filed": "1969-10-13T00:00:00-07:00",
      "id": 8968349,
      "ref_tag": 2,
      "status": "Precedential",
      "text": "..."
    },
    {
      "absolute_url": "https://www.courtlistener.com/opinion/8962758/miranda-v-arizona/",
      "case_name": "Miranda v. Arizona",
      "court": "Supreme Court of the United States",
      "date_filed": "1965-11-22T00:00:00-08:00",
      "id": 8953989,
      "ref_tag": 3,
      "status": "Precedential",
      "text": "..."
    }
  ]
}
```
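When inspecting search results by hand, a short listing keyed by `ref_tag` can be easier to scan than the raw JSON. The helper below is a hypothetical convenience, not part of OLAW; it only assumes the record keys shown in the sample output.

```python
def list_results(results: dict) -> list[str]:
    """Return one summary line per record, across all search targets."""
    lines = []
    for target, records in results.items():
        for record in records:
            # Keep only the date part of an ISO timestamp such as
            # "1966-06-13T00:00:00-07:00".
            date = str(record.get("date_filed", ""))[:10]
            lines.append(
                f"[{record.get('ref_tag')}] {record.get('case_name')} "
                f"({date}), {target}: {record.get('absolute_url')}"
            )
    return lines
```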

[POST] /api/complete

Passes messages and context to target LLM and starts streaming text completion. Returns raw text, streamed.

Sample input

```json
{
  "message": "Tell me everything you know about Miranda v. Arizona (1966)",
  "model": "openai/gpt-4-turbo-preview",
  "temperature": 0.0,
  "max_tokens": 4000,
  "search_results": {
    "courtlistener": [...]
  },
  "history": [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "How may I help you?"}
  ]
}
```

**Notes:**
- `temperature` is optional.
- `max_tokens` is optional.
- `history` must be an array of objects containing `role` and `content` keys. `role` can be either `user` or `assistant`.
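Because this route streams raw text, a client has to read the response incrementally. The sketch below validates `history` according to the notes above and then yields text as it arrives. It assumes a server at http://localhost:5000 and a UTF-8 encoded response; `build_complete_payload` and `stream_completion` are hypothetical names.

```python
import codecs
import json
from typing import Iterator, Optional
from urllib.request import Request, urlopen

ALLOWED_ROLES = {"user", "assistant"}


def build_complete_payload(
    message: str,
    model: str,
    search_results: dict,
    history: list,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> dict:
    """Assemble the JSON body for /api/complete, checking `history` first."""
    for entry in history:
        if "role" not in entry or "content" not in entry:
            raise ValueError("History entries need 'role' and 'content' keys.")
        if entry["role"] not in ALLOWED_ROLES:
            raise ValueError("'role' must be either 'user' or 'assistant'.")

    payload = {
        "message": message,
        "model": model,
        "search_results": search_results,
        "history": history,
    }
    if temperature is not None:  # Optional.
        payload["temperature"] = temperature
    if max_tokens is not None:  # Optional.
        payload["max_tokens"] = max_tokens
    return payload


def stream_completion(
    payload: dict, base_url: str = "http://localhost:5000"
) -> Iterator[str]:
    """Yield pieces of the streamed completion as they arrive."""
    request = Request(
        f"{base_url}/api/complete",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Incremental decoding avoids splitting multi-byte characters
    # that straddle two network reads (assumes UTF-8 output).
    decoder = codecs.getincrementaldecoder("utf-8")()
    with urlopen(request) as response:
        while True:
            chunk = response.read1(1024)
            if not chunk:
                break
            yield decoder.decode(chunk)
```

A caller could then print the answer as it is generated with `for piece in stream_completion(payload): print(piece, end="", flush=True)`.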



Adding new tools

This section of the documentation describes the process of making OLAW understand and use additional search targets beyond the CourtListener API.

1. Declare a new search target

Edit the [`SEARCH_TARGETS`](/olaw/search_targets/__init__.py) list under [`olaw/search_targets/__init__.py`](/olaw/search_targets/__init__.py) to declare a new search target.

Let's call this new target `casedotlaw`.

```python
SEARCH_TARGETS = ["courtlistener", "casedotlaw"]
```

2. Edit search statement extraction prompt

Edit `EXTRACT_SEARCH_STATEMENT_PROMPT` in your `.env` file to let the LLM know how to write search statements for this new tool.

This prompt is used by `/api/extract-search-statement`, which is then able to output objects as follows:

```json
{
  "search_statement": "(Platform-specific search statement based on user question)",
  "search_target": "casedotlaw"
}
```

The process of designing a performant prompt for that task generally requires a few iterations.

3. Add handling logic

Add a file under the `olaw/search_targets/` folder, named after your search target. In this case: `casedotlaw.py`.

This file must contain a class inheriting from `SearchTarget`, which defines 1 property and 1 static method:
- `RESULTS_DATA_FORMAT` determining how search results data is structured
- `search()` containing logic for returning search results

You may refer to [`courtlistener.py` as an example](/olaw/search_targets/courtlistener.py).

You will also need to edit [`olaw/search_targets/__init__.py`](/olaw/search_targets/__init__.py) as follows:
- Import `casedotlaw.py`
- Edit `route_search()` to account for that new target
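The shape of step 3 might look like the sketch below. Several parts are assumptions made so the example runs on its own: the `SearchTarget` class here is a stand-in for OLAW's real base class, the `search()` and `route_search()` signatures are guesses, and the keys in `RESULTS_DATA_FORMAT` are borrowed from the CourtListener sample output. Refer to `courtlistener.py` for the actual contract.

```python
class SearchTarget:
    """Stand-in for OLAW's base class (assumption, for illustration only)."""

    RESULTS_DATA_FORMAT: dict = {}

    @staticmethod
    def search(search_statement: str) -> list:
        raise NotImplementedError


class CaseDotLaw(SearchTarget):
    """Hypothetical handler that would live in casedotlaw.py."""

    # Assumed structure of a single search result.
    RESULTS_DATA_FORMAT = {
        "id": "",
        "case_name": "",
        "absolute_url": "",
        "text": "",
        "ref_tag": 0,
    }

    @staticmethod
    def search(search_statement: str) -> list:
        # A real implementation would query the target's API here.
        # This placeholder returns a single canned record instead.
        record = dict(CaseDotLaw.RESULTS_DATA_FORMAT)
        record["case_name"] = f"(placeholder result for: {search_statement})"
        record["ref_tag"] = 1
        return [record]


SEARCH_TARGETS = ["courtlistener", "casedotlaw"]


def route_search(search_target: str, search_statement: str) -> list:
    """Dispatch a search to the matching handler (hypothetical signature)."""
    if search_target not in SEARCH_TARGETS:
        raise ValueError(f"Unknown search target: {search_target}")
    if search_target == "casedotlaw":
        return CaseDotLaw.search(search_statement)
    raise NotImplementedError(f"{search_target} is not covered by this sketch")
```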



Getting Involved

This project is collaborative at its core and we warmly welcome feedback and contributions.

  • The issues tab is a good place to start to report bugs, suggest features or volunteer to contribute to the codebase on a specific issue.
  • Don't hesitate to use the discussions tab to ask more general questions about this project.



Cite this repository

Cargnelutti, M., & Cushman, J. (2024). Open Legal AI Workbench (OLAW) (Version 0.0.1) [Computer software]

See also:
- Our citation file
- The "Cite this repository" button in the About section of this repository.



Disclaimer

The Library Innovation Lab is an organization based at the Harvard Law School Library. We are a cross-functional group of software developers, librarians, lawyers, and researchers doing work at the edges of technology and digital information.

Our work is rooted in library principles including longevity, authenticity, reliability, and privacy. Any work that we produce takes these principles as a primary lens. However, due to the nature of exploration and a desire to prototype our work with real users, we do not guarantee service or performance at the level of production-grade software for all of our releases. This includes this project, which is an experimental boilerplate released under the MIT License.

Open Legal AI Workbench is an experimental tool for evaluating legal retrieval software and should not be used for legal advice.


Owner

  • Name: Harvard Library Innovation Laboratory
  • Login: harvard-lil
  • Kind: organization
  • Email: lil@law.harvard.edu
  • Location: Cambridge, MA

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Cargnelutti
    given-names: Matteo
  - family-names: Cushman
    given-names: Jack
title: "Open Legal AI Workbench (OLAW)"
version: 0.0.1
date-released: 2024-03-06

GitHub Events

Total
  • Watch event: 48
  • Issue comment event: 1
  • Push event: 1
  • Pull request review event: 1
  • Pull request event: 2
  • Fork event: 11
Last Year
  • Watch event: 48
  • Issue comment event: 1
  • Push event: 1
  • Pull request review event: 1
  • Pull request event: 2
  • Fork event: 11

Dependencies

poetry.lock pypi
  • annotated-types 0.6.0
  • anyio 4.2.0
  • black 23.12.1
  • blinker 1.7.0
  • certifi 2024.2.2
  • charset-normalizer 3.3.2
  • click 8.1.7
  • colorama 0.4.6
  • deprecated 1.2.14
  • distro 1.9.0
  • flake8 6.1.0
  • flask 3.0.2
  • flask-limiter 3.5.1
  • h11 0.14.0
  • html2text 2020.1.16
  • httpcore 1.0.3
  • httpx 0.25.2
  • idna 3.6
  • importlib-resources 6.1.1
  • itsdangerous 2.1.2
  • jinja2 3.1.3
  • limits 3.9.0
  • markdown-it-py 3.0.0
  • markupsafe 2.1.5
  • mccabe 0.7.0
  • mdurl 0.1.2
  • mypy-extensions 1.0.0
  • ollama 0.1.6
  • openai 1.12.0
  • ordered-set 4.1.0
  • packaging 23.2
  • pathspec 0.12.1
  • platformdirs 4.2.0
  • pycodestyle 2.11.1
  • pydantic 2.6.1
  • pydantic-core 2.16.2
  • pyflakes 3.1.0
  • pygments 2.17.2
  • python-dotenv 1.0.1
  • requests 2.31.0
  • rich 13.7.0
  • sniffio 1.3.0
  • tqdm 4.66.2
  • typing-extensions 4.9.0
  • urllib3 2.2.0
  • werkzeug 3.0.1
  • wrapt 1.16.0
pyproject.toml pypi
  • black ^23.10.1 develop
  • flake8 ^6.1.0 develop
  • click ^8.1.7
  • flask ^3.0.0
  • flask-limiter ^3.5.1
  • html2text ^2020.1.16
  • ollama ^0.1.6
  • openai ^1.11.1
  • python ^3.11
  • python-dotenv ^1.0.0
  • requests ^2.31.0