llm_citations
Exploration around llm citations with structured queries + context free grammar
Repository
Basic Info
- Host: GitHub
- Owner: PrestonBlackburn
- Language: Python
- Default Branch: main
- Size: 11.7 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
LLM Citations Exploration
A repo for digging into a few different options for citations. Citations can give users more confidence in the model's response by making the response easier to validate. We'll compare a few options for citation generation, all of which assume that a number of documents have first been retrieved by a RAG workflow.
The options for citation generation we'll test are:
1. Vanilla plain text responses (and cross our fingers)
2. Structured output for citations
3. Custom context-free grammar (CFG) rules
See the associated blog post for more info: (TBD)
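For option 2, the idea is to constrain the `citations` field so the model can only name documents that were actually retrieved. The repo builds its response structure in `get_structures_for_rag_docs`, whose implementation is not shown here, so the following is only a minimal standard-library sketch of that idea. The helper names `build_citation_schema` and `check_citations` and the file `Syllabus.md` are hypothetical.

```python
import json


def build_citation_schema(doc_names):
    """Build a JSON Schema whose `citations` items must be one of doc_names."""
    return {
        "type": "object",
        "properties": {
            "response": {"type": "string"},
            "citations": {
                "type": "array",
                "items": {"type": "string", "enum": list(doc_names)},
            },
        },
        "required": ["response", "citations"],
        "additionalProperties": False,
    }


def check_citations(raw_json, doc_names):
    """Parse a model reply and keep only citations that name retrieved documents."""
    reply = json.loads(raw_json)
    allowed = set(doc_names)
    return [name for name in reply["citations"] if name in allowed]


# Hypothetical retrieved documents and a hypothetical model reply.
docs = ["DesignToolsGrading.md", "Syllabus.md"]
schema = build_citation_schema(docs)
reply = (
    '{"response": "The project is worth 2 points [1].", '
    '"citations": ["DesignToolsGrading.md"]}'
)
valid = check_citations(reply, docs)
```

Because the allowed values are regenerated per query from the retrieved documents, a citation outside that set can be rejected either by the schema or by the post-check.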
Testing the approaches
Get the local model:
    python get_models.py
Run locally:
    python chat.py
As part of the process I found out that the outlines and guidance libraries can't enforce CFGs with the OpenAI API, since token-level grammar enforcement needs lower-level access to the model's inference results (the per-token logits). Guidance can work around this with soft constraint enforcement, but that isn't true CFG enforcement. I still wanted to test CFGs, so I ended up using a local model with the outlines library.
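To make the "lower-level access" point concrete, here is a toy sketch of token-level enforcement: at every step, tokens the grammar forbids are masked out before the next token is chosen, which requires the full set of scores for that step. This is not how outlines is implemented. The vocabulary, the scores, and the tiny finite-state grammar (standing in for a real CFG) are all made up for illustration.

```python
# Allowed next tokens for each state of the toy grammar: "[" NUM ("," NUM)* "]"
ALLOWED = {
    "start": {"["},
    "need_num": {"1", "2", "3"},
    "after_num": {",", "]"},
}


def next_state(state, token):
    """Advance the grammar state after emitting `token`."""
    if state == "start":
        return "need_num"
    if state == "need_num":
        return "after_num"
    # state == "after_num": a comma asks for another number, "]" ends the citation
    return "need_num" if token == "," else "done"


def constrained_decode(logits_per_step):
    """Pick the best-scoring token at each step among those the grammar allows."""
    state, output = "start", []
    for logits in logits_per_step:
        if state == "done":
            break
        # Mask: only grammar-legal tokens compete; sorted() keeps ties deterministic.
        candidates = sorted(ALLOWED[state])
        token = max(candidates, key=lambda t: logits.get(t, float("-inf")))
        output.append(token)
        state = next_state(state, token)
    return "".join(output)


def greedy_decode(logits_per_step):
    """Unconstrained baseline: always take the highest-scoring token."""
    return "".join(max(logits, key=logits.get) for logits in logits_per_step)


# Made-up per-step scores over a tiny vocabulary.
STEPS = [
    {"see": 5.0, "[": 1.0},
    {"doc": 4.0, "2": 2.0, "1": 0.5, "3": 0.1},
    {"see": 3.0, ",": 1.5, "]": 1.0},
    {"3": 2.0, "1": 1.0, "2": 0.0},
    {"doc": 9.0, "]": 0.2, ",": 0.1},
]

constrained = constrained_decode(STEPS)
unconstrained = greedy_decode(STEPS)
```

With these scores the constrained decoder produces a well-formed citation even though the grammar-legal tokens are rarely the top-scoring ones. A hosted API that returns only the sampled text gives no place to apply this mask.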
Approaches breakdown
Simple Results
I didn't want to spend enough to run a more representative comparison, so the sample is small: n = 10.
| Strategy | Model | Avg TTFT (s) | Avg Response Time (s) | Avg Completion Tokens |
| ------------------| ------------ | ------------- | --------------------- | --------- |
| Standard | gpt-4.1-mini | 0.039 | 2.89 | 124.0 |
| Structured Output | gpt-4.1-mini | 0.012 | 2.66 | 114.5 |
Outlines doesn't support streaming for the OpenAI API yet
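For reference, time to first token (TTFT) and total response time can be measured by timestamping an async token stream. The repo does not show how the numbers in the table were collected, so this is only a sketch of one way to do it. `fake_token_stream` is a hypothetical stand-in for a real streaming response and makes no network calls.

```python
import asyncio
import time


async def fake_token_stream(tokens, delay=0.01):
    """Hypothetical stand-in for a streaming LLM response."""
    for token in tokens:
        await asyncio.sleep(delay)
        yield token


async def time_stream(stream):
    """Return (ttft_seconds, total_seconds, token_count) for an async token stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    async for _ in stream:
        if ttft is None:
            # First token arrived: record time to first token.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return ttft, total, count


ttft, total, count = asyncio.run(
    time_stream(fake_token_stream(["The", " project", " [1]"]))
)
```

Averaging these three values over repeated runs gives columns like the ones in the table above.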
Owner
- Name: Preston Blackburn
- Login: PrestonBlackburn
- Kind: user
- Website: prestonblackburn.com
- Repositories: 5
- Profile: https://github.com/PrestonBlackburn
Citation (citation_streaming.py)
# Playing around with streaming responses, structure, and post processing for linking
import os
import asyncio
import time
from typing import List, Union, Callable, AsyncGenerator, Dict
import re
import logging

from openai import AsyncOpenAI
from openai.types.chat import ParsedChatCompletion

# few shot prompt construction
from few_shot_examples import (
    mock_query_vector_db,
    get_context,
    get_structures_for_rag_docs,
    get_few_shot_examples,
)

_logger = logging.getLogger(__name__)

client = AsyncOpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

async def parse_plaintext_citations(text: str) -> List[str]:
    # Example parsing the responses for the cited docs
    pattern = r"\[\s*(?:\d+\s*(?:,\s*\d+\s*)*)?\]"
    matches = re.findall(pattern, text)  # synchronous call, will block
    # ex: ['[1]', '[2, 3]', '[]', '[10,20,30]']
    return matches

async def get_citation_idx(citation: str) -> List[int]:
    match_vals = []
    for match_val in citation.split(","):
        try:
            match_vals.append(int(match_val.replace("[", "").replace("]", "")))
        except ValueError:
            # skip pieces that are not integers, e.g. the empty citation '[]'
            pass
    # ex: '[2, 3]' -> [2, 3]
    return match_vals

async def parse_final_response(
    structured_response: "ResponseWithCitations",
    document_map: dict
) -> str:
    # also need document names + chunks for reference and to be used in the template
    text = structured_response.response
    citations = await parse_plaintext_citations(text)
    templates = ""
    # dedupe so a repeated marker like '[1]' is only wrapped once
    for citation in dict.fromkeys(citations):
        citation_idx = await get_citation_idx(citation)
        if not citation_idx:
            continue  # nothing to link for an empty '[]'
        citation_id = "-".join(str(idx) for idx in citation_idx)
        citation_text = f"""<sup class="text-blue-600 cursor-pointer citation" data-citation-id="citation-{citation_id}">{citation}</sup>"""
        text = text.replace(citation, citation_text)
        # placeholder reference for now; document_map is not used yet
        file_name = "**DesignToolsGrading.md**"
        citation_content = """### Grading Criteria
The project is worth a maximum of 2 points. You can receive partial credit..."""
        reference = f"{file_name}\n{citation_content}"
        templates += f"""<template id="citation-{citation_id}"> {reference} </template>"""
    text += templates
    return f"----FINAL PARSED RESPONSE----\n {text}"
# goal is to get something like:
# {"[1]": "doc1.txt", "[1, 2]": "doc1.txt, doc2.txt", "[3]": "doc3.txt"}
# then do a simple find and replace
# the only issue is if bracketed citations are already present, but probably ok...
# we'll return an HTML response to make it easy to construct the links
# <sup class="text-blue-600 cursor-pointer citation" data-citation-id="citation-1">[1]</sup>
# and the citation info is enclosed in a template tag:
# <template id="citation-1">
# **DesignToolsGrading.md**
# ### Grading Criteria
# The project is worth a maximum of 2 points. You can receive partial credit...
# </template>

async def openai_structured_streamer(
    user_query: str,
    prompt_context: str,
    few_shot_examples: List[Dict[str, str]],
    ResponseFormat: "ResponseWithCitations"
) -> AsyncGenerator[Union[str, ParsedChatCompletion], None]:
    # Incrementally processes the streaming response to strip out json formatting
    expected_json_start = '{"response":'
    # we don't need to show the unformatted output (not used yet)
    expected_json_end = 'citations=[<DocumentEnum'
    buffer = expected_json_start
    stripped_response = False

    messages = []
    messages.append({"role": "system", "content": prompt_context})
    messages.extend(few_shot_examples)
    messages.append({"role": "user", "content": user_query})

    async with client.beta.chat.completions.stream(
        model="gpt-4.1-mini",
        messages=messages,
        response_format=ResponseFormat,
        temperature=0.0
    ) as stream:
        async for event in stream:
            if event.type == "content.delta":
                if event.parsed is not None:
                    # we can stream the json response, but it is more difficult to parse incrementally
                    if not stripped_response:
                        # drop deltas while they still match the expected json prefix
                        if buffer.startswith(event.delta):
                            buffer = buffer[len(event.delta):]
                            continue
                        else:
                            stripped_response = True
                    if stripped_response:
                        yield event.delta
            elif event.type == "content.done":
                pass
                # print("content.done")
            elif event.type == "error":
                _logger.error("Error in stream: %s", event.error)
        # we need final completion to actually parse results
        final_completion = await stream.get_final_completion()
        # print("Final Parsed Citations: ", final_completion.choices[0].message.parsed.citations)
        # print("Final Parsed Citation Type: ", type(final_completion.choices[0].message.parsed.citations[0]))
        _logger.debug(f"FINAL COMPLETION: {final_completion}")
        yield final_completion

async def stream_aggregator(
    streamer: AsyncGenerator[Union[str, ParsedChatCompletion], None]
) -> AsyncGenerator[str, None]:
    text_stream = ""
    async for token in streamer:
        if isinstance(token, str):
            text_stream += token
            yield text_stream
        elif isinstance(token, ParsedChatCompletion):
            # last iteration is actually the structured output
            yield await parse_final_response(token.choices[0].message.parsed, {})

async def structured_query(
    user_query: str,
    structured_streamer: Callable[
        [
            str,  # user query
            str,  # sys prompt
            List[Dict[str, str]],  # few shot examples
            "ResponseWithCitations"  # response format structure
        ],
        AsyncGenerator[Union[str, ParsedChatCompletion], None]]
) -> None:
    user_uuid = "a1"
    documents = await mock_query_vector_db(user_query, user_uuid)
    ResponseWithCitations = await get_structures_for_rag_docs(documents)
    context = await get_context(documents)
    few_shot_examples = await get_few_shot_examples()
    streamer = structured_streamer(user_query, context, few_shot_examples, ResponseWithCitations)
    aggregator = stream_aggregator(streamer)
    async for event in aggregator:
        print(event)
        # yield event  (would turn this into an async generator, which asyncio.run cannot run directly)
# Also, if no citations show up anywhere, append citations to end of text based on RAG docs
# Or maybe just note all of the referenced files somewhere else on the page?

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    user_query = "What is the grading criteria for the design tools project?"
    asyncio.run(structured_query(user_query, openai_structured_streamer))
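
The find-and-replace linking step described in the comments of citation_streaming.py can also be done in a single regex pass, which avoids wrapping the same marker twice when it appears more than once. This is a sketch of an alternative, not the repo's implementation. The `link_citations` helper is hypothetical, and unlike the pattern in the file, this one deliberately skips the empty marker `[]`.

```python
import re

# One or more comma-separated integers in square brackets, e.g. [1] or [2, 3]
CITATION_RE = re.compile(r"\[\s*\d+\s*(?:,\s*\d+\s*)*\]")


def link_citations(text):
    """Wrap each bracketed citation marker in a <sup> tag in one pass."""

    def wrap(match):
        marker = match.group(0)
        citation_id = "-".join(re.findall(r"\d+", marker))
        return (
            '<sup class="text-blue-600 cursor-pointer citation" '
            f'data-citation-id="citation-{citation_id}">{marker}</sup>'
        )

    return CITATION_RE.sub(wrap, text)


linked = link_citations("Worth 2 points [1]. Partial credit [1] and [2, 3].")
```

Each match is replaced exactly once because `re.sub` scans the original text left to right and never revisits the inserted markup.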
GitHub Events
Total
- Watch event: 1
- Push event: 1
- Create event: 3
Last Year
- Watch event: 1
- Push event: 1
- Create event: 3
Dependencies
- accelerate *
- huggingface_hub *
- openai *
- outlines *
- pydantic *
- sentencepiece *
- transformers *