Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: atimcenko
  • Language: Python
  • Default Branch: main
  • Size: 16.6 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme Citation

README.md

Citation Finder Agent

A lightweight command-line tool that automatically detects factual claims in your text, retrieves relevant scholarly references via OpenAlex, and annotates your document with inline citations and a generated bibliography.


Features

  • Claim Detection
    Uses a tuned LLM prompt to identify factual claims within each sentence and extract minimal “claim spans.”

  • Query Generation
    Converts each claim span into concise search queries tailored for scholarly discovery.

  • Reference Retrieval
    Leverages the OpenAlex API to fetch candidate papers for each query.

  • Candidate Reranking
    Summarizes paper abstracts via LLM and ranks them by relevance to each claim.

  • Multi-Citation Support
    Attaches multiple high-scoring references to each claim, rather than a single “best” match.

  • Automatic Annotation
    Inserts inline citations (e.g. (Smith et al., 2020; Doe et al., 2018)) and compiles a “References” section at the end of your document.

  • Configurable Parameters
    Control maximum candidates, top-K citations, retry logic, LLM model choice, verbosity, and more via .env.

  • Easy to Extend
    Modular architecture—swap out LLM providers, retrieval backends, or tuning prompts with minimal code changes.
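
The deduplication and inline-citation behavior described above can be sketched in isolation. This is a minimal illustration, not the project's public API; the candidate field names (`authors`, `year`, `doi`) are assumptions mirroring the agent code shown further down.

```python
def citation_key(candidate: dict) -> str:
    """Format a candidate paper as an inline key, e.g. 'Smith et al., 2020'."""
    authors = candidate.get("authors", [])
    last_name = authors[0].split()[-1] if authors else "Unknown"
    year = candidate.get("year") or "n.d."
    return f"{last_name} et al., {year}"

def dedupe_by_doi(candidates: list[dict]) -> list[dict]:
    """Keep the first occurrence of each DOI; drop entries without one."""
    seen, unique = set(), []
    for c in candidates:
        doi = c.get("doi", "")
        if doi and doi not in seen:
            seen.add(doi)
            unique.append(c)
    return unique

papers = [
    {"authors": ["Jane Smith", "Ann Lee"], "year": 2020, "doi": "10.1/a"},
    {"authors": ["John Doe"], "year": 2018, "doi": "10.1/b"},
    {"authors": ["Jane Smith", "Ann Lee"], "year": 2020, "doi": "10.1/a"},  # duplicate
]
unique = dedupe_by_doi(papers)
inline = " (" + "; ".join(citation_key(p) for p in unique) + ")"
print(inline)  # → " (Smith et al., 2020; Doe et al., 2018)"
```

The same two helpers appear inline in `agent.py`; pulling them out like this is what the "Easy to Extend" bullet implies.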


Installation

Ensure that you have uv installed on your system.

  1. Clone the repository
     ```bash
     git clone https://github.com/yourusername/tu-llm-agent.git
     cd tu-llm-agent
     ```
  2. Set up the virtual environment using uv
     ```bash
     uv sync
     ```
  3. Copy and populate environment variables
     ```bash
     cp .env.example .env
     ```

Owner

  • Name: Aleksejs Timcenko
  • Login: atimcenko
  • Kind: user
  • Location: Riga, Latvia

Senior Python developer. Skills: - List sorting - List inversion - Text printing

Citation (citation_agent/agent.py)

#!/usr/bin/env python3
"""
Citation Finder Agent

Usage:
  uv run citation_agent/agent.py data/your_input.txt [--verbose]
  python citation_agent/agent.py data/your_input.txt [-v]

This will read <your_input>.txt, annotate each claim-span with inline citations,
and write the result to data/your_input_with_references.txt.

Options:
  -v, --verbose    Print debug logging for LLM calls and retrieval steps.

Configuration is via environment variables documented in .env.example.
"""

import argparse
from pathlib import Path
from utils.text import split_sentences
from retrieval.openalex import get_top_references
from models.llm import (
    set_verbose,
    detect_claims,
    gen_queries,
    rerank
)

def process_paragraph(text: str, verbose: bool = False):
    sentences = split_sentences(text)
    output = []

    # loop through the sentences
    for s in sentences:
        if verbose:
            print(f"\n▶ [Sentence] {s}")
        
        # tag if the sentence needs supporting evidence at all
        tag = detect_claims(s)
        if verbose:
            print("   [Tag] ", tag)

        # if no citation is needed, keep the sentence as-is
        if not tag["needs_cite"]:
            if verbose:
                print("   → No citation needed")
            output.append({"sentence": s, "claims": []})
            continue
        
        # deduplicate the detected claim spans, preserving order
        spans = list(dict.fromkeys(tag["claim_spans"]))

        span_results = []
        # for each claim span run the loop
        for span in spans:
            if verbose:
                print(f"  [Claim Span] {span}")

            # 1) generate queries
            queries = gen_queries(span, s)
            if verbose:
                print("    [Queries] ", queries)

            # 2) retrieve candidate references
            all_cands = []
            for q in queries:
                if verbose:
                    print(f"    [Search] '{q}'")
                try:
                    # get top references from openalex
                    refs = get_top_references(q)
                    if not isinstance(refs, list):
                        raise RuntimeError(f"Expected list, got {type(refs)}")
                    if verbose:
                        print(f"      [Results] {len(refs)}")
                except Exception as e:
                    print(f"      ⚠️  OpenAlex search failed for '{q}': {e}")
                    refs = []
                all_cands.extend(refs)
            # 3) dedupe by DOI
            seen = set()
            unique = []
            for c in all_cands:
                doi = c.get("doi", "")
                if doi and doi not in seen:
                    seen.add(doi)
                    unique.append(c)
            if verbose:
                print(f"    [Dedupe] {len(unique)} unique candidates")

            # 4) rerank this span’s candidates
            top_cits = rerank(span, s, unique, top_k=5)
            if verbose:
                print("    [Top citations]:")
                for r in top_cits:
                    print(f"       • {r.get('doi','')} (score={r.get('score')})")

            span_results.append({
                "span": span,
                "citations": top_cits
            })

        output.append({
            "sentence": s,
            "claims": span_results
        })

    # list of dicts with the sentence and top citations for each claim span
    return output


def write_with_references(input_path: str, mapping: list[dict], output_path: str):
    """
    Writes a new text file where:
    - Each claim‐span in each sentence gets its inline citations inserted:
        (Author1 et al., Year; Author2 et al., Year; …)
    - At the end, a References section listing all unique DOI entries.
    """
    original = Path(input_path).read_text(encoding="utf-8")
    sentences = split_sentences(original)

    seen = {}      # citation_key -> doi for bibliography
    annotated = []

    def annotate_sentence(s: str, claims: list[dict]) -> str:
        inserts = []
        for claim in claims:
            span = claim["span"]
            start = s.find(span)
            if start == -1:
                continue
            end = start + len(span)

            keys = []
            for c in claim["citations"]:
                authors = c.get("authors", [])
                author_last = authors[0].split()[-1] if authors else "Unknown"
                year = c.get("year") or "n.d."
                key = f"{author_last} et al., {year}"
                seen[key] = c.get("doi", "")
                keys.append(key)

            if not keys:
                continue
            
            # inline citations
            insert_text = " (" + "; ".join(keys) + ")"
            inserts.append((start, end, insert_text))

        # apply inserts right-to-left so earlier offsets stay valid
        new_s = s
        for start, end, text in sorted(inserts, key=lambda x: x[0], reverse=True):
            new_s = new_s[:end] + text + new_s[end:]
        return new_s

    for item in mapping:
        s = item["sentence"].strip()
        if not item["claims"]:
            annotated.append(s)
        else:
            annotated.append(annotate_sentence(s, item["claims"]))

    body = " ".join(annotated)

    # Build References section
    ref_lines = ["\n\nReferences:"]
    for key, doi in seen.items():
        ref_lines.append(f"- {key}: DOI {doi}")
    refs = "\n".join(ref_lines)

    Path(output_path).write_text(body + refs, encoding="utf-8")
    print(f"Wrote annotated file to {output_path} with {len(seen)} references.")


def main():
    ap = argparse.ArgumentParser(description="Citation Finder Agent")
    ap.add_argument("input_file", help="Text file with paragraph(s)")
    ap.add_argument("--verbose", "-v", action="store_true", help="Print debug info")
    args = ap.parse_args()

    set_verbose(args.verbose)

    text = Path(args.input_file).read_text(encoding="utf-8")
    mapping = process_paragraph(text, verbose=args.verbose)

    out_path = Path(args.input_file).with_name(
        Path(args.input_file).stem + "_with_references.txt"
    )
    write_with_references(args.input_file, mapping, str(out_path))


if __name__ == "__main__":
    main()
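
The reverse-order insertion in `annotate_sentence` is the subtle part of this file: applying the inserts right-to-left means later insertions never shift the character offsets of earlier spans. A self-contained sketch of just that step (the example sentence and span offsets are illustrative, not from the repository):

```python
def insert_after_spans(sentence: str, inserts: list[tuple[int, int, str]]) -> str:
    """inserts: (start, end, text) triples; text is placed at position `end`.

    Sorting by start descending and inserting right-to-left keeps the
    remaining (earlier) offsets valid as the string grows.
    """
    out = sentence
    for start, end, text in sorted(inserts, key=lambda x: x[0], reverse=True):
        out = out[:end] + text + out[end:]
    return out

s = "Coffee improves memory and sleep affects mood."
spans = [
    (0, 22, " (Smith et al., 2020)"),   # after "Coffee improves memory"
    (27, 45, " (Doe et al., 2018)"),    # after "sleep affects mood"
]
result = insert_after_spans(s, spans)
print(result)
# → Coffee improves memory (Smith et al., 2020) and sleep affects mood (Doe et al., 2018).
```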

GitHub Events

Total
  • Push event: 3
  • Pull request event: 1
  • Create event: 2
Last Year
  • Push event: 3
  • Pull request event: 1
  • Create event: 2

Dependencies

  • pyproject.toml (pypi)
  • uv.lock (pypi)
    • tu-llm-agent 0.1.0