text-citation-agent
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file (found CITATION.cff file)
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: atimcenko
- Language: Python
- Default Branch: main
- Size: 16.6 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Citation Finder Agent
A lightweight command-line tool that automatically detects factual claims in your text, retrieves relevant scholarly references via OpenAlex, and annotates your document with inline citations and a generated bibliography.
Features
- Claim Detection: uses a tuned LLM prompt to identify factual claims within each sentence and extract minimal “claim spans.”
- Query Generation: converts each claim span into concise search queries tailored for scholarly discovery.
- Reference Retrieval: leverages the OpenAlex API to fetch candidate papers for each query.
- Candidate Reranking: summarizes paper abstracts via LLM and ranks them by relevance to each claim.
- Multi-Citation Support: attaches multiple high-scoring references to each claim, rather than a single “best” match.
- Automatic Annotation: inserts inline citations, e.g. (Smith et al., 2020; Doe et al., 2018), and compiles a “References” section at the end of your document.
- Configurable Parameters: control maximum candidates, top-K citations, retry logic, LLM model choice, verbosity, and more via .env.
- Easy to Extend: modular architecture; swap out LLM providers, retrieval backends, or tuning prompts with minimal code changes.
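The retrieval step described above can be sketched against OpenAlex's public works endpoint (`https://api.openalex.org/works`; `search` and `per-page` are real API parameters). The function below is an illustrative stand-in, not the repository's actual `retrieval/openalex.py`, and the returned field names are assumptions chosen to line up with the annotation code shown later in this report:

```python
import json
import urllib.parse
import urllib.request

OPENALEX_WORKS = "https://api.openalex.org/works"

def build_works_url(query: str, per_page: int = 5) -> str:
    """Build an OpenAlex /works search URL for one query."""
    params = urllib.parse.urlencode({"search": query, "per-page": per_page})
    return f"{OPENALEX_WORKS}?{params}"

def get_top_references(query: str, per_page: int = 5) -> list[dict]:
    """Fetch candidate papers for one query (sketch: no retries, minimal fields)."""
    with urllib.request.urlopen(build_works_url(query, per_page), timeout=30) as resp:
        data = json.load(resp)
    # keep only the fields the downstream annotation step needs
    return [
        {
            "doi": work.get("doi") or "",
            "year": work.get("publication_year"),
            "authors": [a["author"]["display_name"] for a in work.get("authorships", [])],
        }
        for work in data.get("results", [])
    ]
```

OpenAlex orders `search` results by relevance score by default, so the first `per_page` items are reasonable candidates to pass into the reranking step.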
Installation
Ensure that you have uv installed on your system!
- Clone the repository
  ```bash
  git clone https://github.com/yourusername/tu-llm-agent.git
  cd tu-llm-agent
  ```
- Set up the virtual environment using uv
  ```bash
  uv sync
  ```
- Copy and populate environment variables
  ```bash
  cp .env.example .env
  ```
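The README says maximum candidates, top-K citations, retry logic, model choice, and verbosity are controlled via `.env`, but `.env.example` itself is not shown in this report. The variable names below are therefore hypothetical placeholders illustrating what such a file might contain:

```shell
# Hypothetical .env values -- the real variable names live in .env.example
LLM_MODEL=gpt-4o-mini   # LLM model choice (placeholder name)
MAX_CANDIDATES=20       # max candidate papers kept per query
TOP_K=5                 # citations attached per claim span
MAX_RETRIES=3           # retry logic for LLM/API calls
VERBOSE=0               # debug logging toggle
```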
Owner
- Name: Aleksejs Timcenko
- Login: atimcenko
- Kind: user
- Location: Riga, Latvia
- Twitter: AlexeyTimchenk3
- Repositories: 1
- Profile: https://github.com/atimcenko
Senior Python developer. Skills:
- List sorting
- List inversion
- Text printing
Citation (citation_agent/agent.py)
#!/usr/bin/env python3
"""
Citation Finder Agent
Usage:
uv run citation_agent/agent.py data/your_input.txt [--verbose]
python citation_agent/agent.py data/your_input.txt [-v]
This will read <your_input>.txt, annotate each claim-span with inline citations,
and write the result to data/your_input_with_references.txt.
Options:
-v, --verbose Print debug logging for LLM calls and retrieval steps.
Configuration is via environment variables documented in .env.example.
"""
import argparse
from pathlib import Path
from utils.text import split_sentences
from retrieval.openalex import get_top_references
from models.llm import (
    set_verbose,
    detect_claims,
    gen_queries,
    rerank,
)
def process_paragraph(text: str, verbose: bool = False):
    sentences = split_sentences(text)
    output = []
    # loop through the sentences
    for s in sentences:
        if verbose:
            print(f"\n▶ [Sentence] {s}")
        # tag whether the sentence needs supporting evidence at all
        tag = detect_claims(s)
        if verbose:
            print("  [Tag] ", tag)
        # if no citation is needed, move on
        if not tag["needs_cite"]:
            if verbose:
                print("  → No citation needed")
            output.append({"sentence": s, "claims": []})
            continue
        # fetch unique claims produced by the claim extractor agent
        spans = list(dict.fromkeys(tag["claim_spans"]))
        span_results = []
        # process each claim span
        for span in spans:
            if verbose:
                print(f"  [Claim Span] {span}")
            # 1) generate queries
            queries = gen_queries(span, s)
            if verbose:
                print("    [Queries] ", queries)
            # 2) retrieve candidate references
            all_cands = []
            for q in queries:
                if verbose:
                    print(f"    [Search] '{q}'")
                try:
                    # get top references from OpenAlex
                    refs = get_top_references(q)
                    if not isinstance(refs, list):
                        raise RuntimeError(f"Expected list, got {type(refs)}")
                    if verbose:
                        print(f"    [Results] {len(refs)}")
                except Exception as e:
                    print(f"    ⚠️ OpenAlex search failed for '{q}': {e}")
                    refs = []
                all_cands.extend(refs)
            # 3) dedupe by DOI
            seen = set()
            unique = []
            for c in all_cands:
                doi = c.get("doi", "")
                if doi and doi not in seen:
                    seen.add(doi)
                    unique.append(c)
            if verbose:
                print(f"    [Dedupe] {len(unique)} unique candidates")
            # 4) rerank this span's candidates
            top_cits = rerank(span, s, unique, top_k=5)
            if verbose:
                print("    [Top citations]:")
                for r in top_cits:
                    print(f"      • {r.get('doi','')} (score={r.get('score')})")
            span_results.append({
                "span": span,
                "citations": top_cits
            })
        output.append({
            "sentence": s,
            "claims": span_results
        })
    # list of dicts with the sentence and top citations for each claim span
    return output
def write_with_references(input_path: str, mapping: list[dict], output_path: str):
    """
    Writes a new text file where:
    - Each claim-span in each sentence gets its inline citations inserted:
      (Author1 et al., Year; Author2 et al., Year; ...)
    - At the end, a References section lists all unique DOI entries.
    """
    seen = {}  # citation key -> DOI, for the bibliography
    annotated = []

    def annotate_sentence(s: str, claims: list[dict]) -> str:
        inserts = []
        for claim in claims:
            span = claim["span"]
            start = s.find(span)
            if start == -1:
                continue
            end = start + len(span)
            keys = []
            for c in claim["citations"]:
                authors = c.get("authors", [])
                author_last = authors[0].split()[-1] if authors else "Unknown"
                year = c.get("year") or "n.d."
                key = f"{author_last} et al., {year}"
                seen[key] = c.get("doi", "")
                keys.append(key)
            if not keys:
                continue
            # inline citations
            insert_text = " (" + "; ".join(keys) + ")"
            inserts.append((start, end, insert_text))
        # apply insertions in reverse start order so earlier offsets stay valid
        new_s = s
        for start, end, text in sorted(inserts, key=lambda x: x[0], reverse=True):
            new_s = new_s[:end] + text + new_s[end:]
        return new_s

    for item in mapping:
        s = item["sentence"].strip()
        if not item["claims"]:
            annotated.append(s)
        else:
            annotated.append(annotate_sentence(s, item["claims"]))
    body = " ".join(annotated)
    # build the References section
    ref_lines = ["\n\nReferences:"]
    for key, doi in seen.items():
        ref_lines.append(f"- {key}: DOI {doi}")
    refs = "\n".join(ref_lines)
    Path(output_path).write_text(body + refs, encoding="utf-8")
    print(f"Wrote annotated file to {output_path} with {len(seen)} references.")
def main():
    ap = argparse.ArgumentParser(description="Citation Finder Agent")
    ap.add_argument("input_file", help="Text file with paragraph(s)")
    ap.add_argument("--verbose", "-v", action="store_true", help="Print debug info")
    args = ap.parse_args()
    set_verbose(args.verbose)
    text = Path(args.input_file).read_text(encoding="utf-8")
    mapping = process_paragraph(text, verbose=args.verbose)
    out_path = Path(args.input_file).with_name(
        Path(args.input_file).stem + "_with_references.txt"
    )
    write_with_references(args.input_file, mapping, str(out_path))

if __name__ == "__main__":
    main()
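The subtlest part of `annotate_sentence` above is that inserting citation text shifts every later character offset; applying the inserts in descending start order keeps the earlier offsets valid. A minimal standalone illustration of that technique, with hypothetical spans and citation keys:

```python
def insert_after_spans(sentence: str, inserts: list[tuple[int, int, str]]) -> str:
    """Insert each text right after its (start, end) span.

    Processing in descending start order means an edit never invalidates
    the offsets of spans that occur earlier in the sentence.
    """
    out = sentence
    for start, end, text in sorted(inserts, key=lambda x: x[0], reverse=True):
        out = out[:end] + text + out[end:]
    return out

# demo with hypothetical claim spans and citation keys
s = "Coffee improves alertness and caffeine is a stimulant."
inserts = [
    (0, 25, " (Smith et al., 2020)"),   # span: "Coffee improves alertness"
    (30, 53, " (Doe et al., 2018)"),    # span: "caffeine is a stimulant"
]
print(insert_after_spans(s, inserts))
# Coffee improves alertness (Smith et al., 2020) and caffeine is a stimulant (Doe et al., 2018).
```

Processing in ascending order instead would require tracking a running offset correction, which is easier to get wrong.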
GitHub Events
Total
- Push event: 3
- Pull request event: 1
- Create event: 2
Last Year
- Push event: 3
- Pull request event: 1
- Create event: 2
Dependencies
- tu-llm-agent 0.1.0