citation-util
This is a small Python example which shows how we can streamline the citations in a summary response.
Science Score: 18.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: david-vectara
- Language: Python
- Default Branch: main
- Size: 7.81 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Citation Util for Vectara
This is a demonstration of how to streamline a summary response by renumbering its citations in the order in which they first appear. After streamlining, we are left with a summary that is easier to read, as well as an array containing the results to show, in the order they should be cited (starting with an index of 1 rather than 0).
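To make the renumbering concrete, here is a minimal sketch of the idea on a made-up summary string. The helper name `renumber_citations` and the sample text are hypothetical and are not part of this repository; the actual implementation is `streamline` in citation_util.py, which takes a full response rather than a plain string.

```python
import re


def renumber_citations(summary_text):
    """Hypothetical helper: renumber [n] markers by order of first appearance."""
    order = []  # original citation numbers, in order of first use

    def replace(match):
        original = int(match.group(1))
        if original not in order:
            order.append(original)
        # New citation number is the 1-based position of first use.
        return f"[{order.index(original) + 1}]"

    return re.sub(r"\[(\d+)\]", replace, summary_text), order


summary, result_indexes = renumber_citations(
    "Cats purr [3]. They also sleep a lot [1][3]. Dogs bark [5]."
)
print(summary)         # Cats purr [1]. They also sleep a lot [2][1]. Dogs bark [3].
print(result_indexes)  # [3, 1, 5]
```

The returned list records which original result each new citation number refers to, so `[1]` in the streamlined text points at original result 3, `[2]` at result 1, and `[3]` at result 5.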
To Run
If you want to run using the example response included in the repository, use the following commands:
pip install -r requirements.txt
python retrieve_response.py
If you want to generate a summary response from your own corpus, first configure vectara-skunk-client using the instructions at https://github.com/davidglevy/vectara-skunk-client
Once done, run the following (using either corpus_id or corpus_name):
pip install -r requirements.txt
python retrieve_response.py -c [corpus_id] -C [corpus_name] -q [query]
You may also specify a filename to use as a cache or as the destination to write the response to.
Owner
- Login: david-vectara
- Kind: user
- Repositories: 1
- Profile: https://github.com/david-vectara
Citation (citation_util.py)
import logging
import re

logger = logging.getLogger(__name__)


def streamline(search_response):
    """
    Streamlines a Vectara response (starting with a specific response under the outer array)

    :param search_response: the first response inside the array response
    :return: a dict containing a "summary" with citations streamlined and a list of results used with ordering preserved
    """
    # TODO 1. Create a list of search results
    # TODO 2. Iterate through the summary, and record which search result is used at which index
    summary_text = search_response['summary'][0]['text']
    logger.info(f"Found summary:\n{summary_text}")

    # Match one or more digits so that citations such as [12] are handled too.
    find_result = re.finditer(r"\[(\d+)\]", summary_text)

    index = 1
    # Maps the original citation number (as a string) to its new position.
    dict_map = {}
    # The end index of the last match (0 until a citation has been seen).
    last_end = 0
    results = []
    result_index = []

    for match_obj in find_result:
        match = match_obj.group(1)
        logger.info(f"At {index} we found {match}")
        start = match_obj.start()
        end = match_obj.end()

        if match in dict_map:
            logger.info(f"We already have result [{match}] at position [{dict_map[match]}]")
        else:
            logger.info(f"Recording result [{match}] at position [{index}]")
            dict_map[match] = index
            result_index.append(int(match))
            index += 1

        # Keep the text between the previous citation (or the start) and this one.
        if start > last_end:
            results.append(summary_text[last_end:start])

        results.append("[" + str(dict_map[match]) + "]")
        last_end = end

    # Append the text after the final citation (the whole summary if there were none).
    if last_end < len(summary_text):
        results.append(summary_text[last_end:])

    result_summary = "".join(results)
    logger.info(f"After transformation, we have the following summary:\n{result_summary}")
    logger.info(f"The mapping to the result summary is as follows:\n{result_index}")

    streamline_response = {
        "summary": result_summary,
        "result_indexes": result_index
    }
    return streamline_response
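The returned "result_indexes" list can be used to pick out the cited search results in display order. The following is a minimal sketch under two assumptions: the citation numbers in the summary are 1-based positions in the result list (as the README describes), and the results are available as a plain Python list. The helper `order_results` and the sample data are hypothetical; where the results live inside a real response is not shown in this listing.

```python
def order_results(result_indexes, results):
    """Hypothetical helper: return the cited results in citation order."""
    # result_indexes holds 1-based citation numbers, so subtract 1 to index the list.
    return [results[i - 1] for i in result_indexes]


# Made-up stand-ins for a response's search results and a streamline() return value.
results = [{"text": "first"}, {"text": "second"}, {"text": "third"}]
streamlined = {"summary": "B [1], then A [2].", "result_indexes": [3, 1]}

ordered = order_results(streamlined["result_indexes"], results)
for position, result in enumerate(ordered, start=1):
    print(f"[{position}] {result['text']}")
# [1] third
# [2] first
```

Results that the summary never cites are left out, so the rendered reference list matches the renumbered citations exactly.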
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| David Levy | d****d@v****m | 5 |
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0