citation-util

This is a small Python example which shows how we can streamline the citations in a summary response.

https://github.com/david-vectara/citation-util

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.8%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: david-vectara
  • Language: Python
  • Default Branch: main
  • Size: 7.81 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 2 years ago

README.md

Citation Util for Vectara

This demonstrates how to streamline a summary response by renumbering its citations in the order in which they first appear. The result is a summary that is easier to read, together with an array of the results to show, in the order they should be cited (citation numbers start at 1 rather than 0).
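
The renumbering described above can be illustrated with a compact sketch. This is not the repository's implementation (that is `streamline` in citation_util.py); the helper name `renumber_citations` and the sample summary are made up for illustration.

```python
import re


def renumber_citations(summary_text):
    """Renumber [n] markers by order of first appearance (hypothetical helper)."""
    order = []  # original citation numbers, in order of first appearance

    def replace(match):
        original = int(match.group(1))
        if original not in order:
            order.append(original)
        # New citation numbers start at 1, not 0.
        return f"[{order.index(original) + 1}]"

    new_summary = re.sub(r"\[(\d+)\]", replace, summary_text)
    return new_summary, order


# Made-up summary text with out-of-order citations.
summary = "Cats purr [3]. Dogs bark [1]. Cats also hiss [3][5]."
new_summary, order = renumber_citations(summary)
print(new_summary)  # Cats purr [1]. Dogs bark [2]. Cats also hiss [1][3].
print(order)        # [3, 1, 5]
```

The second return value lists the original citation numbers in the order they should now be shown, so position 1 in the new summary refers to original result 3.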

To Run

If you want to run using the example response included in the repository, use the following commands:

    pip install -r requirements.txt
    python retrieve_response.py

If you want to generate a summary response from your own corpus, first configure vectara-skunk-client by following the instructions at https://github.com/davidglevy/vectara-skunk-client

Once done, run the following (using either corpus_id or corpus_name):

    pip install -r requirements.txt
    python retrieve_response.py -c [corpus_id] -C [corpus_name] -q [query]

You may also specify a filename to use as a cache, or as the destination to write the response to.
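
A cache file of this kind usually follows a "load if present, otherwise fetch and store" pattern. The sketch below shows that pattern only. It is an assumption about how such a cache could work, not the code in retrieve_response.py, and both `load_or_fetch` and the `fetch` callable are hypothetical names.

```python
import json
from pathlib import Path


def load_or_fetch(cache_path, fetch):
    """Return the cached JSON response if the file exists, else fetch and store it.

    `fetch` is a hypothetical zero-argument callable that returns a
    JSON-serializable response (for example, a wrapper around a query call).
    """
    path = Path(cache_path)
    if path.exists():
        # Cache hit: reuse the stored response and skip the query.
        with path.open("r", encoding="utf-8") as handle:
            return json.load(handle)

    # Cache miss: run the query and write the response for next time.
    response = fetch()
    with path.open("w", encoding="utf-8") as handle:
        json.dump(response, handle, indent=2)
    return response
```

With this shape, the same filename serves both purposes named above: it is the destination on the first run and the cache on later runs.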

Owner

  • Login: david-vectara
  • Kind: user

Citation (citation_util.py)

import logging
import json
import re

logger = logging.getLogger(__name__)

def streamline(search_response):
    """
    Streamlines a Vectara response (starting with a specific response under the outer array).

    :param search_response: the first response inside the array response
    :return: a dict with a "summary" whose citations are renumbered in order of first appearance,
        and "result_indexes", the original result numbers in the order they are cited
    """

    # Iterate through the summary and record which search result is first
    # cited at which position, rebuilding the text with the new numbers.

    summary_text = search_response['summary'][0]['text']
    logger.info(f"Found summary:\n{summary_text}")

    # Match citation markers such as [3] or [12]; \d+ allows multi-digit numbers.
    find_result = re.finditer(r"\[(\d+)\]", summary_text)

    index = 1
    dict_map = {}

    # The end index of the last match.
    last_end = None

    results = []

    result_index = []

    for match_obj in find_result:
        match = match_obj.group(1)
        logger.info(f"At {index} we found {match}")
        start = match_obj.start()
        end = match_obj.end()

        if match in dict_map:
            logger.info(f"We already have result [{match}] at position [{dict_map[match]}]")
        else:
            logger.info(f"Recording result [{match}] at position [{index}]")
            dict_map[match] = index
            result_index.append(int(match))
            index += 1

        if last_end is not None:
            logger.info("Using end of the last match to get last text chunk")
            results.append(summary_text[last_end:start])
        else:
            logger.info("First iteration, use everything from beginning to start")
            if start > 0:
                results.append(summary_text[0:start])

        results.append("[" + str(dict_map[match]) + "]")

        last_end = end

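    if last_end is None:
        # No citations were found, so keep the summary unchanged.
        results.append(summary_text)
    elif last_end < len(summary_text):
        # Append any text that follows the last citation.
        results.append(summary_text[last_end:])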

    result_summary = "".join(results)

    logger.info(f"After transformation, we have the following summary:\n{result_summary}")

    logger.info(f"The mapping to the result summary is as follows:\n{result_index}")

    streamline_response = {
        "summary": result_summary,
        "result_indexes": result_index
    }
    return streamline_response
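
The dict returned by `streamline` can then be used to pick out the cited results in display order. The following is a sketch under two assumptions not confirmed by the source: the citation numbers in the summary are 1-based positions in the list of search results, and the results are available as a plain list. The helper `select_cited_results` and the sample data are hypothetical.

```python
def select_cited_results(search_results, result_indexes):
    """Return the cited results in citation order (hypothetical helper).

    Assumes each entry of result_indexes is a 1-based position in
    search_results, as the citation markers in the summary suggest.
    """
    return [search_results[i - 1] for i in result_indexes]


# Made-up search results and a made-up streamline() output.
search_results = [{"text": "Result A"}, {"text": "Result B"}, {"text": "Result C"}]
streamlined = {
    "summary": "Claim one [1]. Claim two [2].",
    "result_indexes": [3, 1],
}

cited = select_cited_results(search_results, streamlined["result_indexes"])
for position, result in enumerate(cited, start=1):
    print(f"[{position}] {result['text']}")
# [1] Result C
# [2] Result A
```

Results that the summary never cites (here, Result B) are left out of the displayed list.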

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 5
  • Total Committers: 1
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 5
  • Committers: 1
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
David Levy d****d@v****m 5

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0