Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (2.6%) to scientific vocabulary
Last synced: 6 months ago · JSON representation ·

Repository

Basic Info
  • Host: GitHub
  • Owner: denisbog
  • Language: Python
  • Default Branch: main
  • Size: 28.3 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme Citation

readme.md

Notes

implementation

soure code copied from the application generated by:

sh npx create-llama@latest

run

```sh pip install -r requirements.txt python generate.py python app.py

```

run with Chorma vector db

sh CHROMA_IDX=true python generate.py CHROMA_IDX=true python app.py

Owner

  • Login: denisbog
  • Kind: user

Citation (citation.py)

from typing import Any, List, Optional

from llama_index.core import QueryBundle
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_engine.retriever_query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import Accumulate
from llama_index.core.schema import NodeWithScore
from llama_index.core.tools.query_engine import QueryEngineTool


# Used as a prompt for synthesizer
# Override this prompt by setting the `CITATION_PROMPT` environment variable
CITATION_PROMPT = """
Context information is below.
------------------
{context_str}
------------------
The context are multiple text chunks, each text chunk has its own citation_id, citation_file_path at the beginning.
Use the citation_id, citation_file_path for citation construction.

Answer the following query with citations:
------------------
{query_str}
------------------

## Citation format

[citation:id]
[citation:file_path]

Where:
- [citation id=, path=] is a matching pattern which is required for all citations.
- `id` is the `citation_id` provided in the context or previous response.
- `path` is the `citation_file_path` provided in the context or previous response.

Example:
```
    Here is a response that uses context information [citation id=90ca859f-4f32-40ca-8cd0-edfad4fb298b, path: /tmp/document/path/file.html]
    and other ideas that don't use context information [citation id=17b2cc9a-27ae-4b6d-bede-5ca60fc00ff4, path: /tmp/document2/path/file.html] .\n
    The citation block will be displayed automatically with useful information for the user in the UI [citation id=1c606612-e75f-490e-8374-44e79f818d19, path: /tmp/document/another path/file.html] .
```

## Requirements:
1. Always include citations for every fact from the context information in your response.
2. Make sure that the citation_id citation_file_path is correct with the context, don't mix up the citation_id, citation_file_path with other information.

Now, you answer the query with citations:
"""


class NodeCitationProcessor(BaseNodePostprocessor):
    """
    Add a new field `citation_id` to the metadata of the node by copying the id from the node.
    Useful for citation construction.
    """

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        for node_score in nodes:
            node_score.node.metadata["citation_id"] = node_score.node.node_id
        return nodes


class CitationSynthesizer(Accumulate):
    """
    Overload the Accumulate synthesizer to:
    1. Update prepare node metadata for citation id
    2. Update text_qa_template to include citations
    """

    def __init__(self, **kwargs: Any) -> None:
        text_qa_template = kwargs.pop("text_qa_template", None)
        if text_qa_template is None:
            text_qa_template = PromptTemplate(template=CITATION_PROMPT)
        super().__init__(text_qa_template=text_qa_template, **kwargs)


# Add this prompt to your agent system prompt
CITATION_SYSTEM_PROMPT = (
    "\nAnswer the user question using the response from the query tool. "
    "It's important to respect the citation information in the response. "
    "Don't mix up the citation_id, keep them at the correct fact."
)


def enable_citation(query_engine_tool: QueryEngineTool) -> QueryEngineTool:
    """
    Enable citation for a query engine tool by using CitationSynthesizer and NodePostprocessor.
    Note: This function will override the response synthesizer of your query engine.
    """
    query_engine = query_engine_tool.query_engine
    if not isinstance(query_engine, RetrieverQueryEngine):
        raise ValueError(
            "Citation feature requires a RetrieverQueryEngine. Your tool's query engine is a "
            f"{type(query_engine)}."
        )
    # Update the response synthesizer and node postprocessors
    query_engine._response_synthesizer = CitationSynthesizer()
    query_engine._node_postprocessors += [NodeCitationProcessor()]
    query_engine_tool._query_engine = query_engine

    # Update tool metadata
    query_engine_tool.metadata.description += "\nThe output will include citations with the format [citation:id] for each chunk of information in the knowledge base."
    return query_engine_tool

GitHub Events

Total
  • Push event: 5
  • Create event: 1
Last Year
  • Push event: 5
  • Create event: 1

Dependencies

requirements.txt pypi
  • dotenv *
  • llama_index *
  • llama_index.embeddings.huggingface *
  • llama_index.embeddings.ollama *
  • llama_index.llms.ollama *
  • llama_index.vector_stores.chroma *