Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: beiyonder
  • Language: Python
  • Default Branch: main
  • Size: 5.86 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme Citation

readme.md

RepoMap-ish Symbol Ranker

Scans Python files, extracts symbol definitions and references using Tree-sitter (make sure to pin tree-sitter==0.21.3), builds a directed graph of symbol usage, and runs personalized PageRank to estimate which files matter most. It then outputs a rough, LLM-friendly context blob.
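As a rough illustration of the pipeline described above, the sketch below extracts definitions and references and turns them into weighted file-to-file edges. It is not this repository's code: it uses the standard-library `ast` module as a stand-in for Tree-sitter so that it runs without extra dependencies, and `extract_tags`, `build_edges`, and the two sample files are hypothetical names chosen for the example.

```python
import ast
from collections import defaultdict


def extract_tags(source):
    """Return (definitions, references) found in one Python source string."""
    tree = ast.parse(source)
    defs, refs = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs.add(node.name)          # a symbol defined in this file
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            refs.add(node.id)            # a bare name being read, e.g. helper()
        elif isinstance(node, ast.Attribute):
            refs.add(node.attr)          # an attribute access, e.g. utils.helper
    return defs, refs


def build_edges(files):
    """Map (referencing_file, defining_file) to the number of shared symbols."""
    tags = {path: extract_tags(src) for path, src in files.items()}

    # Index: symbol name -> files that define it
    definers = defaultdict(set)
    for path, (defs, _) in tags.items():
        for name in defs:
            definers[name].add(path)

    # An edge points from the file that uses a symbol to the file that defines it
    edges = defaultdict(int)
    for path, (_, refs) in tags.items():
        for name in refs:
            for target in definers.get(name, ()):
                if target != path:
                    edges[(path, target)] += 1
    return dict(edges)


# Hypothetical two-file project held in memory
files = {
    "utils.py": "def helper():\n    return 1\n",
    "main.py": "from utils import helper\n\ndef run():\n    return helper()\n",
}
edges = build_edges(files)
print(edges)
```

Pointing edges from the referencing file to the defining file means that rank later flows toward files whose symbols are widely used.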

But why?

LLM-friendly context: Raw code can overwhelm an LLM with tokens (e.g., thousands of lines). The output is instead a distilled summary of key symbols and their locations.

Token Efficiency: The output is compact (154 tokens), preserving LLM context window space for user queries or additional data.

Indexing: This is a precursor to a full indexing system for RAG.

Graph-Based Ranking: The PageRank algorithm prioritizes files based on how their symbols are referenced, so the LLM sees the most relevant parts first.

How it helps our LLM: architectural understanding, reduced cognitive load, contextual relevance.
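To make the graph-based ranking concrete, here is a minimal sketch of personalized PageRank over weighted file edges. The project lists networkx in requirements.txt, but this example uses a hand-rolled power iteration so that it is self-contained. The function name, the sample edges, and the choice to put all personalization weight on `cli.py` are assumptions made for illustration, not taken from the repository.

```python
def personalized_pagerank(edges, personalization, damping=0.85, iterations=100):
    """Rank nodes of a weighted directed graph with a biased teleport vector.

    edges: dict mapping (source, target) to a positive weight
    personalization: dict mapping node to a non-negative bias (must not sum to 0)
    """
    nodes = sorted({n for edge in edges for n in edge} | set(personalization))
    total = sum(personalization.values())
    teleport = {n: personalization.get(n, 0.0) / total for n in nodes}

    # Total outgoing weight per node, used to normalise edge weights
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), weight in edges.items():
        out_weight[src] += weight

    rank = dict(teleport)
    for _ in range(iterations):
        # Nodes without outgoing edges hand their rank back via the teleport vector
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        new_rank = {
            n: (1 - damping) * teleport[n] + damping * dangling * teleport[n]
            for n in nodes
        }
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical edges: (referencing file, defining file) -> reference count
edges = {
    ("main.py", "utils.py"): 2,
    ("cli.py", "utils.py"): 1,
    ("cli.py", "main.py"): 1,
}
rank = personalized_pagerank(edges, personalization={"cli.py": 1.0})
for path, score in sorted(rank.items(), key=lambda item: item[1], reverse=True):
    print(f"{path}: {score:.3f}")
```

The personalization vector is what lets the ranking lean toward files relevant to the current task, rather than only toward globally popular ones.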

What next? A full-blown context management system

Current: Only processes Python files. Goal: Handle codebases, docs, emails, and database schemas.

Current: In-memory tags with simple dictionary cache. Goal: Store indexed data in a structured, queryable format.

Current: Extracts definitions and references with basic graph connections. Goal: Deeper analysis of code relationships and semantics.

RAG Retrieval: Current: Static output with no query mechanism. Goal: Query-based retrieval of relevant context, e.g. using sentence-transformers embeddings to return the top-k relevant items.

Current: Outputs context but doesn’t interact with an LLM. Goal: Prepare prompts, send to LLM, and process responses.
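The query-based retrieval goal above could look something like the following sketch. It is only an illustration of top-k retrieval: it scores items with bag-of-words cosine similarity from the standard library instead of sentence-transformers embeddings, and the `tokenize`, `cosine`, and `top_k` helpers are hypothetical. The two sample docstrings are borrowed from the example output in this README.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, items, k=1):
    """Return the keys of the k items whose text best matches the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, Counter(tokenize(text))), key)
        for key, text in items.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for score, key in scored[:k] if score > 0]


# Symbol name -> docstring summary, as an index entry might store it
items = {
    "IgnorantTemporaryDirectory": "A context manager for temporary directories",
    "GitTemporaryDirectory": "Creates a temporary git repository",
}
best = top_k("create a temporary git repository", items, k=1)
print(best)
```

Swapping `cosine` over token counts for cosine over dense embeddings would keep the same `top_k` shape while matching on meaning rather than exact words.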

Example output:

```
Scanning files...
Parsing files: 100%|██████████| 4/4 [00:00<00:00, 25.00it/s]

--- Repo Map Context (approx. 200 tokens) ---
aider\utils.py:
line 16: def IgnorantTemporaryDirectory(suffix=None, prefix=None, dir=None):
line 17: docstring: A context manager for temporary directories...
line 62: def GitTemporaryDirectory():
line 63: docstring: Creates a temporary git repository...

--- LLM Response ---
To create a temporary git repository, use the GitTemporaryDirectory function from aider\utils.py. Here's an example:

from aider.utils import GitTemporaryDirectory

with GitTemporaryDirectory() as repo_dir:
    # Use repo_dir as a temporary git repository
    print(f"Repository created at {repo_dir}")
```

Owner

  • Login: beiyonder
  • Kind: user

Citation (citation/citations.md)

<!-- Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG -->
![Long-Context LLMs Meet RAG](lclm_techniques.png)
*Bowen Jin, Jinsung Yoon, Jiawei Han, Sercan Ö. Arık. Google Cloud AI Research, University of Illinois at Urbana-Champaign*

<!-- Hierarchical Summarization: Scaling Up Multi-Document Summarization -->
![Hierarchical Summarization](heirar.png)
*Janara Christensen, Stephen Soderland, Gagan Bansal, Mausam. University of Washington, Indian Institute of Technology Delhi*

<!-- GRAG: Graph Retrieval-Augmented Generation -->
![GRAG: Graph Retrieval-Augmented Generation](grag.png)
*Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao. Emory University*

<!-- A Comprehensive Survey on Long Context Language Modeling -->
![A Comprehensive Survey on Long Context Language Modeling](image.png)
*Jiaheng Liu, Dawei Zhu, Zhiqi Bai, Yancheng He, Huanxuan Liao, Haoran Que, Zekun Wang, Chenchen Zhang, Ge Zhang, Jiebin Zhang, Yuanxing Zhang, Zhuo Chen, Hangyu Guo, Shilong Li, Ziqiang Liu, Yong Shan, Yifan Song, Jiayi Tian, Wenhao Wu, Zhejian Zhou, Ruijie Zhu, Junlan Feng, Yang Gao, Shizhu He, Zhoujun Li, Tianyu Liu, Fanyu Meng, Wenbo Su, Yingshui Tan, Zili Wang, Jian Yang, Wei Ye, Bo Zheng, Wangchunshu Zhou, Wenhao Huang, Sujian Li, Zhaoxiang Zhang. NJU, PKU, CASIA, Alibaba, ByteDance, Tencent, Kuaishou, M-A-P*

GitHub Events

Total
  • Push event: 3
  • Create event: 2
Last Year
  • Push event: 3
  • Create event: 2

Dependencies

requirements.txt pypi
  • networkx *
  • pygments *
  • tqdm *
  • tree-sitter ==0.21.3
  • tree-sitter-languages *