llm_citations

Exploration around llm citations with structured queries + context free grammar

https://github.com/prestonblackburn/llm_citations

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary
Last synced: 9 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: PrestonBlackburn
  • Language: Python
  • Default Branch: main
  • Size: 11.7 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago

README.md

LLM Citations Exploration

A repo to do some digging into a few different options for citations. Citations can give users more confidence in the model's response by making it easier to validate the response. We'll compare a few different options for citation generation, but all of the options assume that a number of documents have first been retrieved by a RAG workflow.


The options for citation generation we'll test are:
1. Vanilla plain text responses (and cross our fingers)
2. Structured output for citations
3. Custom context free grammar rules
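
To make option 2 more concrete, here is a minimal sketch of a structured response whose `citations` field is restricted to the retrieved documents. The names `DocumentEnum`, `response`, and `citations` appear in `citation_streaming.py`; the helper functions, the `DOC_n` member names, and `Syllabus.md` are hypothetical, and the repo itself may build its structure differently (it depends on pydantic, which this stdlib-only sketch avoids).

```python
import json
from enum import Enum
from typing import Dict, List, Type


def build_document_enum(doc_names: List[str]) -> Type[Enum]:
    # Hypothetical helper: one enum member per retrieved document,
    # so a citation can only name a document that was actually retrieved.
    members = {f"DOC_{i}": name for i, name in enumerate(doc_names, start=1)}
    return Enum("DocumentEnum", members)


def build_response_schema(document_enum: Type[Enum]) -> Dict:
    # JSON Schema for a response with a free-text answer and enum-restricted citations.
    return {
        "type": "object",
        "properties": {
            "response": {"type": "string"},
            "citations": {
                "type": "array",
                "items": {"type": "string", "enum": [d.value for d in document_enum]},
            },
        },
        "required": ["response", "citations"],
        "additionalProperties": False,
    }


def parse_citations(raw_json: str, document_enum: Type[Enum]) -> List[Enum]:
    # Raises ValueError if the model cites a document that is not in the enum.
    payload = json.loads(raw_json)
    return [document_enum(name) for name in payload["citations"]]
```

For example, `parse_citations('{"response": "Worth 2 points [1]", "citations": ["DesignToolsGrading.md"]}', build_document_enum(["DesignToolsGrading.md", "Syllabus.md"]))` returns the matching enum member, while an unknown file name raises `ValueError`.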


See the associated blog post for more info: (TBD)

Testing the approaches

Get the local model: `python get_models.py`

Run locally: `python chat.py`

As part of the process I found out that the outlines and guidance libraries can't enforce CFGs with the OpenAI API, since token-level grammar enforcement needs lower-level access to the model's inference results. Guidance can work around this with soft constraint enforcement, but that isn't true CFG enforcement. I still wanted to test CFGs, so I ended up using a local model with the outlines library.
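
To illustrate why token-level enforcement needs access to the model's internals, here is a toy sketch of token masking: at each step, only candidate tokens that keep the output a valid prefix of the target format are allowed. Everything here is an assumption for illustration. The citation-marker format (`[1]`, `[1, 2]`) is simple enough for a hand-written state machine, the vocabulary is made up, and this is not how outlines implements grammar enforcement. A real CFG would need a parser in place of `advance`.

```python
from typing import List, Optional

DIGITS = "0123456789"


def advance(state: str, ch: str) -> Optional[str]:
    # One step of a hypothetical state machine for a single citation marker.
    if state == "start":
        return "open" if ch == "[" else None
    if state == "open":
        return "num" if ch in DIGITS else None
    if state == "num":
        if ch in DIGITS:
            return "num"
        if ch == ",":
            return "comma"
        if ch == "]":
            return "done"
        return None
    if state == "comma":
        if ch == " ":
            return "space"
        return "num" if ch in DIGITS else None
    if state == "space":
        return "num" if ch in DIGITS else None
    return None  # "done": nothing may follow the closing bracket


def state_after(text: str, state: str = "start") -> Optional[str]:
    # Feed text through the state machine; None means the text is not a valid prefix.
    for ch in text:
        next_state = advance(state, ch)
        if next_state is None:
            return None
        state = next_state
    return state


def allowed_tokens(prefix: str, vocab: List[str]) -> List[str]:
    # Keep only the tokens that leave the output a valid prefix of a citation marker.
    state = state_after(prefix)
    if state is None:
        return []
    return [tok for tok in vocab if state_after(tok, state) is not None]
```

With a local model, a mask like this can be applied to the next-token scores before sampling. A hosted chat API that only returns sampled text gives no place to apply it, which matches the limitation described above.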

Approaches breakdown

Simple Results

Small sample only: I didn't want to spend enough to run a more representative comparison.

n = 10

| Strategy          | Model        | Avg TTFT (s) | Avg Response Time (s) | Avg Completion Tokens |
| ----------------- | ------------ | ------------ | --------------------- | --------------------- |
| Standard          | gpt-4.1-mini | 0.039        | 2.89                  | 124.0                 |
| Structured Output | gpt-4.1-mini | 0.012        | 2.66                  | 114.5                 |

Outlines doesn't support streaming for the OpenAI API yet
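
The README doesn't show how the timings above were collected. As a rough sketch, time to first token (TTFT) and total response time could be measured for any async stream like this. `fake_token_stream` is a hypothetical stand-in for a real streaming response, and the metric names are assumptions.

```python
import asyncio
import time
from typing import AsyncGenerator, Dict, Optional


async def fake_token_stream(
    n_tokens: int = 5, delay_s: float = 0.01
) -> AsyncGenerator[str, None]:
    # Hypothetical stand-in for a streaming LLM response.
    for i in range(n_tokens):
        await asyncio.sleep(delay_s)
        yield f"tok{i} "


async def time_stream(stream: AsyncGenerator[str, None]) -> Dict[str, float]:
    # TTFT is the delay until the first chunk arrives;
    # response time is the delay until the stream is exhausted.
    start = time.perf_counter()
    ttft: Optional[float] = None
    n_chunks = 0
    async for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start
        n_chunks += 1
    total = time.perf_counter() - start
    return {
        "ttft_s": ttft if ttft is not None else total,
        "response_time_s": total,
        "chunks": float(n_chunks),
    }
```

Calling `asyncio.run(time_stream(fake_token_stream()))` returns one sample; averaging over n runs gives figures comparable in shape to the table, though not in value.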

Owner

  • Name: Preston Blackburn
  • Login: PrestonBlackburn
  • Kind: user

Citation (citation_streaming.py)

# Playing around with streaming responses, structure, and post processing for linking

import os
import asyncio
import time
from typing import List, Union, Callable, AsyncGenerator, Dict
import re
import logging
from openai import AsyncOpenAI
from openai.types.chat import ParsedChatCompletion

# few shot prompt construction
from few_shot_examples import (
    mock_query_vector_db, 
    get_context, 
    get_structures_for_rag_docs,
    get_few_shot_examples,
)

_logger = logging.getLogger(__name__)


client = AsyncOpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

async def parse_plaintext_citations(text:str) -> List[str]:
    # Example parsing the responses for the cited docs
    pattern = r"\[\s*(?:\d+\s*(?:,\s*\d+\s*)*)?\]"
    matches = re.findall(pattern, text) # will block
    # ex: ['[1]', '[2, 3]', '[]', '[10,20,30]']
    return matches

async def get_citation_idx(citation: str) -> List[int]:
    match_list = citation.split(",")
    match_vals = []
    for match_val in match_list:
        try:
            match_vals.append(int(match_val.replace("[", "").replace("]", "")))
        except ValueError:
            # empty citation ("[]") or a non-numeric entry: skip it
            continue
    # ex: '[2, 3]' -> [2, 3]
    return match_vals


async def parse_final_response(
        structured_response:"ResponseWithCitations", 
        document_map: dict
    ) -> str:
    # also need document names + chunks for reference and to be used in the template
    # NOTE: document_map is not used yet; the reference below is a hardcoded placeholder
    text = structured_response.response

    citations = await parse_plaintext_citations(text)

    # dedupe (order preserved) so a repeated marker isn't wrapped twice or given two templates
    for citation in dict.fromkeys(citations):
        citation_idx = await get_citation_idx(citation)
        if not citation_idx:
            continue  # skip empty markers such as "[]"
        citation_id = "-".join(str(idx) for idx in citation_idx)
        citation_text = f"""<sup class="text-blue-600 cursor-pointer citation" data-citation-id="citation-{citation_id}">{citation}</sup>"""
        text = text.replace(citation, citation_text)

        file_name = "**DesignToolsGrading.md**"
        citation_content = """### Grading Criteria
The project is worth a maximum of 2 points. You can receive partial credit..."""
        reference = f"{file_name}\n{citation_content}"
        template = f"""<template id="citation-{citation_id}"> {reference} </template>"""
        text += template

    return f"----FINAL PARSED RESPONSE----\n {text}"


# goal is to get something like:
# {"[1]": "doc1.txt", "[1, 2]": "doc1.txt, doc2.txt", "[3]": "doc3.txt"}
# then do a simple find and replace
# the only issue is if bracketed citations are already present, but probably ok...

# we'll return a html response to make it easy to construct the links
# <sup class="text-blue-600 cursor-pointer citation" data-citation-id="citation-1">[1]</sup> 
# and the citation info gets closed in a template tag:
#   <template id="citation-1">
#     **DesignToolsGrading.md**
#     ### Grading Criteria
    
#     The project is worth a maximum of 2 points. You can receive partial credit...
#   </template>


async def openai_structured_streamer(
        user_query: str, 
        prompt_context: str, 
        few_shot_examples: List[Dict[str, str]],
        ResponseFormat: "ResponseWithCitations"
        ) -> AsyncGenerator[Union[str, ParsedChatCompletion], None]:
    # Incrementally processes the streaming response to strip out json formatting
    expected_json_start = '{"response":'
    # currently unused: only the leading '{"response":' is stripped, the tail is streamed as-is
    expected_json_end = 'citations=[<DocumentEnum' # we don't need to show the unformatted output
    buffer = expected_json_start
    stripped_response = False
    messages = []
    messages.append({"role": "system", "content": prompt_context})
    messages.extend(few_shot_examples)
    messages.append({"role": "user", "content": user_query})

    async with client.beta.chat.completions.stream(
        model="gpt-4.1-mini",
        messages=messages,
        response_format=ResponseFormat,
        temperature=0.0
    ) as stream:
        async for event in stream:
            if event.type == "content.delta":
                if event.parsed is not None:
                    # we can stream the json response, but it will be more difficult to parse incrementally
                    if not stripped_response:
                        if buffer.startswith(event.delta):
                            buffer = buffer[len(event.delta):]
                            continue
                        else:
                            stripped_response = True
                    if stripped_response:
                        yield event.delta

            elif event.type == "content.done":
                pass
                # print("content.done")
            elif event.type == "error":
                _logger.error("Error in stream: %s", event.error)

        # we need the final completion to actually parse results;
        # fetch it while the stream context is still open
        final_completion = await stream.get_final_completion()
    # print("Final Parsed Citations: ", final_completion.choices[0].message.parsed.citations)
    # print("Final Parsed Citation Type: ", type(final_completion.choices[0].message.parsed.citations[0]))
    _logger.debug(f"FINAL COMPLETION: {final_completion}")
    yield final_completion

async def stream_aggregator(
        streamer: AsyncGenerator[Union[str, ParsedChatCompletion], None]
    ) -> AsyncGenerator[Union[str, "ResponseWithCitations"], None]:
    text_stream = ""
    async for token in streamer:
        if isinstance(token, str):
            text_stream += token
            yield text_stream
    
        if isinstance(token, ParsedChatCompletion):
            # last iteration is actually the structured output
            yield await parse_final_response(token.choices[0].message.parsed, {})


async def structured_query(
        user_query: str, 
        structured_streamer:Callable[
            [
                str, # user query
                str, # sys prompt
                List[Dict[str, str]], # few shot examples
                "ResponseWithCitations" # response format structure
            ], 
            AsyncGenerator[Union[str, ParsedChatCompletion], None]]
    ) -> None:  # events are printed, not yielded (the yield below is commented out)
    user_uuid = "a1"
    documents = await mock_query_vector_db(user_query, user_uuid)
    ResponseWithCitations = await get_structures_for_rag_docs(documents)
    context = await get_context(documents)
    few_shot_examples = await get_few_shot_examples()

    streamer = structured_streamer(user_query, context, few_shot_examples, ResponseWithCitations)
    aggregator = stream_aggregator(streamer)
    async for event in aggregator:
        print(event)
        # yield event

# Also, if no citations show up anywhere, append citations to end of text based on RAG docs
# Or maybe just note all of the referenced files somewhere else on the page?


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    user_query = "What is the grading criteria for the design tools project?"
    asyncio.run(structured_query(user_query, openai_structured_streamer))
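
The "simple find and replace" idea from the comments above can also be sketched with `re.sub` and a callback, which wraps each marker exactly once and emits one template per distinct citation. This is a standalone sketch and not the repo's implementation: the integer-keyed `document_map`, the trimmed-down HTML attributes, and `Syllabus.md` are assumptions.

```python
import html
import re
from typing import Dict, List

# Matches non-empty bracketed citations such as [1] or [2, 3].
CITATION_PATTERN = re.compile(r"\[\s*\d+(?:\s*,\s*\d+)*\s*\]")


def citation_ids(marker: str) -> List[int]:
    # '[2, 3]' -> [2, 3]
    return [int(n) for n in re.findall(r"\d+", marker)]


def link_citations(text: str, document_map: Dict[int, str]) -> str:
    seen: Dict[str, List[int]] = {}

    def to_sup(match: "re.Match[str]") -> str:
        marker = match.group(0)
        ids = citation_ids(marker)
        key = "citation-" + "-".join(str(i) for i in ids)
        seen.setdefault(key, ids)
        return f'<sup class="citation" data-citation-id="{key}">{marker}</sup>'

    # Each marker is visited once, so already-wrapped text is never rewritten.
    linked = CITATION_PATTERN.sub(to_sup, text)

    templates = []
    for key, ids in seen.items():
        names = ", ".join(
            html.escape(document_map.get(i, "unknown document")) for i in ids
        )
        templates.append(f'<template id="{key}">{names}</template>')
    return linked + "".join(templates)
```

A callback avoids the case where repeated `str.replace` calls re-wrap a marker that already sits inside a `<sup>` tag, and unknown indices fall back to a placeholder name instead of raising.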

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
  • Create event: 3
Last Year
  • Watch event: 1
  • Push event: 1
  • Create event: 3

Dependencies

requirements.txt pypi
  • accelerate *
  • huggingface_hub *
  • openai *
  • outlines *
  • pydantic *
  • sentencepiece *
  • transformers *