Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: atimcenko
  • Language: Python
  • Default Branch: main
  • Size: 16.6 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme Citation

README.md

Citation Finder Agent

A lightweight command-line tool that automatically detects factual claims in your text, retrieves relevant scholarly references via OpenAlex, and annotates your document with inline citations and a generated bibliography.


Features

  • Claim Detection
    Uses a tuned LLM prompt to identify factual claims within each sentence and extract minimal “claim spans.”

  • Query Generation
    Converts each claim span into concise search queries tailored for scholarly discovery.

  • Reference Retrieval
    Leverages the OpenAlex API to fetch candidate papers for each query.

  • Candidate Reranking
    Summarizes paper abstracts via LLM and ranks them by relevance to each claim.

  • Multi-Citation Support
    Attaches multiple high-scoring references to each claim, rather than a single “best” match.

  • Automatic Annotation
    Inserts inline citations (e.g. (Smith et al., 2020; Doe et al., 2018)) and compiles a “References” section at the end of your document.

  • Configurable Parameters
    Control maximum candidates, top-K citations, retry logic, LLM model choice, verbosity, and more via .env.

  • Easy to Extend
    Modular architecture—swap out LLM providers, retrieval backends, or tuning prompts with minimal code changes.
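
The deduplication and inline-citation behavior described above can be sketched in isolation. This is a minimal illustration, not the project's public API; the candidate field names (`authors`, `year`, `doi`) are assumptions mirroring the agent code shown further down.

```python
def citation_key(candidate: dict) -> str:
    """Format a candidate paper as an inline key, e.g. 'Smith et al., 2020'."""
    authors = candidate.get("authors", [])
    last_name = authors[0].split()[-1] if authors else "Unknown"
    year = candidate.get("year") or "n.d."
    return f"{last_name} et al., {year}"

def dedupe_by_doi(candidates: list[dict]) -> list[dict]:
    """Keep the first occurrence of each DOI; drop entries without one."""
    seen, unique = set(), []
    for c in candidates:
        doi = c.get("doi", "")
        if doi and doi not in seen:
            seen.add(doi)
            unique.append(c)
    return unique

papers = [
    {"authors": ["Jane Smith", "Ann Lee"], "year": 2020, "doi": "10.1/a"},
    {"authors": ["John Doe"], "year": 2018, "doi": "10.1/b"},
    {"authors": ["Jane Smith", "Ann Lee"], "year": 2020, "doi": "10.1/a"},  # duplicate
]
unique = dedupe_by_doi(papers)
inline = " (" + "; ".join(citation_key(p) for p in unique) + ")"
print(inline)  # → " (Smith et al., 2020; Doe et al., 2018)"
```

The same two helpers appear inline in `agent.py`; pulling them out like this is what the "Easy to Extend" bullet implies.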


Installation

Ensure that you have uv installed on your system.

  1. Clone the repository
     ```bash
     git clone https://github.com/yourusername/tu-llm-agent.git
     cd tu-llm-agent
     ```
  2. Set up the virtual environment using uv
     ```bash
     uv sync
     ```
  3. Copy and populate environment variables
     ```bash
     cp .env.example .env
     ```

Owner

  • Name: Aleksejs Timcenko
  • Login: atimcenko
  • Kind: user
  • Location: Riga, Latvia

Senior Python developer. Skills: - List sorting - List inversion - Text printing

Citation (citation_agent/agent.py)

#!/usr/bin/env python3
"""
Citation Finder Agent

Usage:
  uv run citation_agent/agent.py data/your_input.txt [--verbose]
  python citation_agent/agent.py data/your_input.txt [-v]

This will read <your_input>.txt, annotate each claim-span with inline citations,
and write the result to data/your_input_with_references.txt.

Options:
  -v, --verbose    Print debug logging for LLM calls and retrieval steps.

Configuration is via environment variables documented in .env.example.
"""

import argparse
from pathlib import Path
from utils.text import split_sentences
from retrieval.openalex import get_top_references
from models.llm import (
    set_verbose,
    detect_claims,
    gen_queries,
    rerank
)

def process_paragraph(text: str, verbose: bool = False):
    sentences = split_sentences(text)
    output = []

    # loop through the sentences
    for s in sentences:
        if verbose:
            print(f"\n▶ [Sentence] {s}")
        
        # tag if the sentence needs supporting evidence at all
        tag = detect_claims(s)
        if verbose:
            print("   [Tag] ", tag)

        # if no citation is needed, keep the sentence as-is
        if not tag["needs_cite"]:
            if verbose:
                print("   → No citation needed")
            output.append({"sentence": s, "claims": []})
            continue
        
        # deduplicate the detected claim spans, preserving order
        spans = list(dict.fromkeys(tag["claim_spans"]))

        span_results = []
        # for each claim span run the loop
        for span in spans:
            if verbose:
                print(f"  [Claim Span] {span}")

            # 1) generate queries
            queries = gen_queries(span, s)
            if verbose:
                print("    [Queries] ", queries)

            # 2) retrieve candidate references
            all_cands = []
            for q in queries:
                if verbose:
                    print(f"    [Search] '{q}'")
                try:
                    # get top references from openalex
                    refs = get_top_references(q)
                    if not isinstance(refs, list):
                        raise RuntimeError(f"Expected list, got {type(refs)}")
                    if verbose:
                        print(f"      [Results] {len(refs)}")
                except Exception as e:
                    print(f"      ⚠️  OpenAlex search failed for '{q}': {e}")
                    refs = []
                all_cands.extend(refs)
            # 3) dedupe by DOI
            seen = set()
            unique = []
            for c in all_cands:
                doi = c.get("doi", "")
                if doi and doi not in seen:
                    seen.add(doi)
                    unique.append(c)
            if verbose:
                print(f"    [Dedupe] {len(unique)} unique candidates")

            # 4) rerank this span’s candidates
            top_cits = rerank(span, s, unique, top_k=5)
            if verbose:
                print("    [Top citations]:")
                for r in top_cits:
                    print(f"       • {r.get('doi','')} (score={r.get('score')})")

            span_results.append({
                "span": span,
                "citations": top_cits
            })

        output.append({
            "sentence": s,
            "claims": span_results
        })

    # list of dicts with the sentence and top citations for each claim span
    return output


def write_with_references(input_path: str, mapping: list[dict], output_path: str):
    """
    Writes a new text file where:
    - Each claim‐span in each sentence gets its inline citations inserted:
        (Author1 et al., Year; Author2 et al., Year; …)
    - At the end, a References section listing all unique DOI entries.
    """
    original = Path(input_path).read_text(encoding="utf-8")
    sentences = split_sentences(original)

    seen = {}      # citation_key -> doi for bibliography
    annotated = []

    def annotate_sentence(s: str, claims: list[dict]) -> str:
        inserts = []
        for claim in claims:
            span = claim["span"]
            start = s.find(span)
            if start == -1:
                continue
            end = start + len(span)

            keys = []
            for c in claim["citations"]:
                authors = c.get("authors", [])
                author_last = authors[0].split()[-1] if authors else "Unknown"
                year = c.get("year") or "n.d."
                key = f"{author_last} et al., {year}"
                seen[key] = c.get("doi", "")
                keys.append(key)

            if not keys:
                continue
            
            # inline citations
            insert_text = " (" + "; ".join(keys) + ")"
            inserts.append((start, end, insert_text))

        # apply inserts right-to-left so earlier offsets stay valid
        new_s = s
        for start, end, text in sorted(inserts, key=lambda x: x[0], reverse=True):
            new_s = new_s[:end] + text + new_s[end:]
        return new_s

    for item in mapping:
        s = item["sentence"].strip()
        if not item["claims"]:
            annotated.append(s)
        else:
            annotated.append(annotate_sentence(s, item["claims"]))

    body = " ".join(annotated)

    # Build References section
    ref_lines = ["\n\nReferences:"]
    for key, doi in seen.items():
        ref_lines.append(f"- {key}: DOI {doi}")
    refs = "\n".join(ref_lines)

    Path(output_path).write_text(body + refs, encoding="utf-8")
    print(f"Wrote annotated file to {output_path} with {len(seen)} references.")


def main():
    ap = argparse.ArgumentParser(description="Citation Finder Agent")
    ap.add_argument("input_file", help="Text file with paragraph(s)")
    ap.add_argument("--verbose", "-v", action="store_true", help="Print debug info")
    args = ap.parse_args()

    set_verbose(args.verbose)

    text = Path(args.input_file).read_text(encoding="utf-8")
    mapping = process_paragraph(text, verbose=args.verbose)

    out_path = Path(args.input_file).with_name(
        Path(args.input_file).stem + "_with_references.txt"
    )
    write_with_references(args.input_file, mapping, str(out_path))


if __name__ == "__main__":
    main()
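
The reverse-order insertion in `annotate_sentence` is the subtle part of this file: applying the inserts right-to-left means later insertions never shift the character offsets of earlier spans. A self-contained sketch of just that step (the example sentence and span offsets are illustrative, not from the repository):

```python
def insert_after_spans(sentence: str, inserts: list[tuple[int, int, str]]) -> str:
    """inserts: (start, end, text) triples; text is placed at position `end`.

    Sorting by start descending and inserting right-to-left keeps the
    remaining (earlier) offsets valid as the string grows.
    """
    out = sentence
    for start, end, text in sorted(inserts, key=lambda x: x[0], reverse=True):
        out = out[:end] + text + out[end:]
    return out

s = "Coffee improves memory and sleep affects mood."
spans = [
    (0, 22, " (Smith et al., 2020)"),   # after "Coffee improves memory"
    (27, 45, " (Doe et al., 2018)"),    # after "sleep affects mood"
]
result = insert_after_spans(s, spans)
print(result)
# → Coffee improves memory (Smith et al., 2020) and sleep affects mood (Doe et al., 2018).
```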

GitHub Events

Total
  • Push event: 3
  • Pull request event: 1
  • Create event: 2
Last Year
  • Push event: 3
  • Pull request event: 1
  • Create event: 2

Dependencies

  • pyproject.toml (pypi)
  • uv.lock (pypi)
    • tu-llm-agent 0.1.0