chronos
Debugging-first language model achieving 65.3% autonomous bug fixing (6-7x better than GPT-4). Research, benchmarks & evaluation framework. Model available Q1 2026 via Kodezi OS.
Science Score: 36.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file: not found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Basic Info
- Host: GitHub
- Owner: Kodezi
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://chronos.so/
- Size: 219 KB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
- 67.3% autonomous debugging success
- 89% human preference
- 7.8 average fix iterations
- 40% time reduction
MRR Benchmark Results
Performance by Bug Category
Repository Scale Performance
Key Innovations in Chronos
1. Debugging-First Architecture
- Trained on 42.5M real debugging examples (not code completion)
- Specialized for root cause analysis and multi-file patches
- 78.4% root cause accuracy vs 15.8% best baseline
2. Persistent Debug Memory (PDM)
- Repository-specific learning from debugging sessions
- Improves from a 35% to a 65% success rate over time
- Cross-session pattern recognition
3. Adaptive Graph-Guided Retrieval (AGR)
- O(k log d) complexity with dynamic k-hop expansion
- 92% precision, 85% recall on multi-file context
- Handles unlimited repository scale intelligently
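The dynamic k-hop expansion described above can be illustrated as a breadth-first traversal over a code dependency graph that stops expanding once the retrieved context looks sufficient. This is a minimal sketch, not the AGR implementation: the `relevance` scoring callable, the stopping threshold, and the hop cap are all assumptions made for the illustration.

```python
def adaptive_k_hop_retrieve(graph, seeds, relevance, threshold=0.9, max_hops=4):
    """Sketch of adaptive k-hop retrieval: expand the frontier one hop at a
    time and stop early when the average relevance of the retrieved set
    crosses the confidence threshold.

    graph: dict mapping a node (e.g. a file or symbol) to its neighbors.
    relevance: hypothetical scoring function, node -> float in [0, 1].
    """
    retrieved = set(seeds)
    frontier = list(seeds)
    for _hop in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in retrieved:
                    retrieved.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break  # graph exhausted
        # Adaptive stop: enough relevant context gathered, no need for more hops.
        if sum(relevance(n) for n in retrieved) / len(retrieved) >= threshold:
            break
        frontier = next_frontier
    return retrieved
```

With a high-relevance neighborhood the traversal stops after one hop; with low scores it keeps expanding until the graph or the hop budget is exhausted, which is the behavior the k=adaptive results above suggest.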
4. Output-Optimized Design
- Optimized for ~3K output tokens (fixes, tests, docs)
- 47.2% output entropy density vs 12.8% for completion models
- Designed for complex patch generation
5. Autonomous Debugging Loop
- Average 7.8 iterations to successful fix
- Propose → test → analyze → refine cycles
- 67.3% fully autonomous success rate
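The propose → test → analyze → refine cycle can be sketched as a simple control loop. The `propose_fix`, `run_tests`, and `analyze_failure` callables below are hypothetical stand-ins for the model, the execution sandbox, and the failure analyzer; this illustrates the loop structure only, not Chronos itself.

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    """Minimal stand-in for a sandbox test run."""
    passed: bool
    log: str = ""

def autonomous_debug_loop(propose_fix, run_tests, analyze_failure, max_iterations=15):
    """Iterate propose -> test -> analyze -> refine until the tests pass
    or the iteration budget is spent. Returns (patch, iterations_used),
    with patch=None on failure."""
    feedback = None
    for iteration in range(1, max_iterations + 1):
        patch = propose_fix(feedback)       # propose (refined by prior feedback)
        result = run_tests(patch)           # validate in an isolated sandbox
        if result.passed:
            return patch, iteration
        feedback = analyze_failure(result)  # extract signal for the next attempt
    return None, max_iterations
```

The reported 7.8 average iterations would correspond to the loop typically terminating well inside such a budget.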
Architecture Overview
Seven-Layer System Design
- Multi-Source Input Layer: Processes code, logs, traces, tests, docs simultaneously
- Adaptive Retrieval Engine (AGR): Dynamic k-hop graph traversal (92% precision)
- Debug-Tuned LLM Core: 42.5M debugging examples, not code completion
- Orchestration Controller: Autonomous debugging loop management
- Persistent Debug Memory: Repository-specific learning (35% → 65% improvement)
- Execution Sandbox: Isolated test validation environment
- Explainability Layer: Human-readable root cause analysis
See architecture documentation for detailed specifications.
Multi-Random Retrieval (MRR) Benchmark
What is MRR?
MRR simulates real-world debugging complexity by:
- Spatial Distribution: Bug context scattered across 10-50 files
- Temporal Dispersion: Relevant information from 3-12 months of history
- Obfuscation Levels: Low/medium/high code complexity
- 5,000 Scenarios: Comprehensive evaluation across languages and bug types
MRR Results
| Metric | Chronos | GPT-4+RAG | Claude-3+VectorDB | Gemini-1.5+Graph |
|:-------|:-------:|:---------:|:-----------------:|:----------------:|
| Precision@10 | 89.2% | 42.3% | 48.1% | 51.7% |
| Recall@10 | 84.7% | 31.7% | 36.2% | 41.8% |
| Fix Accuracy | 67.3% | 8.9% | 11.2% | 14.6% |
| Context Efficiency | 0.71 | 0.23 | 0.28 | 0.31 |
Full benchmark available in benchmarks/multi-random-retrieval/
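Precision@10 and Recall@10 in the table follow the standard ranked-retrieval definitions, which can be computed as below. This sketch assumes `retrieved` is a ranked list of file paths and `relevant` is the ground-truth set of files for the scenario; it is not the benchmark's own implementation (see `evaluation_metrics/` for that).

```python
def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top-k retrieved items that are relevant."""
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k) if top_k else 0.0

def recall_at_k(retrieved, relevant, k=10):
    """Fraction of all relevant items that appear in the top-k."""
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant) if relevant else 0.0
```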
Getting Started
Running the MRR Benchmark
```bash
# Clone the repository
git clone https://github.com/kodezi/chronos-research.git
cd chronos-research

# Install dependencies
pip install -r requirements.txt

# Run the MRR benchmark on your model (start with a subset of scenarios)
python benchmarks/run_mrr_benchmark_2025.py \
    --model your_model \
    --scenarios 100

# Analyze results
python benchmarks/analyze_results.py
```
Model Access
**The Chronos model is not included in this repository.**
Chronos will be available via Kodezi OS:
- Q4 2025: Enterprise beta
- Q1 2026: General availability
- Join the waitlist at chronos.so
Repository Contents
```
chronos-research/
├── benchmarks/                    # MRR Benchmark Suite
│   ├── multi-random-retrieval/    # 5,000 scenario benchmark
│   ├── evaluation_metrics/        # Metrics implementation
│   └── run_mrr_benchmark_2025.py  # Main benchmark runner
├── reference_implementations/     # Algorithm references (NOT the model)
│   ├── algorithms/                # AGR, PDM implementations
│   └── NOTICE.md                  # Proprietary model notice
├── paper/                         # Research paper
│   └── chronos-research-2025.md   # Full paper (arXiv:2507.12482)
├── results/                       # Performance data
│   ├── raw_data/                  # 5,000 scenario results
│   ├── case_studies/              # Debugging examples
│   └── figures/                   # Paper visualizations
│       └── paper_figures/         # 11 paper figures
└── docs/                          # Documentation
    ├── MODEL_ACCESS.md            # How to access Chronos
    └── LEADERBOARD.md             # Performance rankings
```
Research Highlights
Training Dataset
- 42.5M debugging examples (not code completion)
- 15M GitHub issues with fixes
- 8M stack traces with resolutions
- 3M CI/CD debugging logs
- 2.5M production sessions
- 14M curated from Defects4J, SWE-bench, BugsInPy
AGR Performance
- k=1 hop: 58.2% success
- k=2 hops: 72.4% success
- k=adaptive: 87.1% success
- Flat retrieval: 23.4% success
PDM Learning Curve
- Initial: 35% success rate
- After 100 sessions: 52% success
- After 500 sessions: 65% success
- 7.3x token efficiency gain
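The repository-specific learning behind the PDM curve above can be illustrated as a store keyed by an error signature, so later sessions can recall root causes and fixes seen before. The class and method names here are hypothetical (the real PDM is proprietary and not in this repository); this only sketches the cross-session recall idea.

```python
class DebugMemory:
    """Illustrative per-repository memory of resolved debugging sessions,
    keyed by a normalized error signature (e.g. exception type + location)."""

    def __init__(self):
        self._sessions = {}

    def record(self, signature, root_cause, fix):
        """Store the outcome of a successful debugging session."""
        self._sessions.setdefault(signature, []).append((root_cause, fix))

    def recall(self, signature):
        """Return prior (root_cause, fix) pairs for a similar bug, newest last."""
        return self._sessions.get(signature, [])
```

A rising success rate over sessions then falls out naturally: the more signatures the store has seen, the more often `recall` returns a usable starting point instead of an empty list.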
Detailed Performance Analysis
Language-Specific Performance
Debugging Cycle Efficiency
Context Window Efficiency
Ablation Studies
Documentation
Contributing
We welcome contributions to the evaluation framework and benchmarks!
```bash
# Fork and clone
git clone https://github.com/[your-username]/chronos-research
cd chronos-research

# Create a feature branch
git checkout -b feature/your-contribution

# Make changes and run the tests
python -m pytest tests/

# Submit a PR
git push origin feature/your-contribution
```
See CONTRIBUTING.md for detailed guidelines.
Citation
If you use this research in your work, please cite:
```bibtex
@article{khan2025chronos,
  title={Kodezi Chronos: A Debugging-First Language Model for
         Repository-Scale, Memory-Driven Code Understanding},
  author={Khan, Ishraq and Chowdary, Assad and
          Haseeb, Sharoz and Patel, Urvish},
  journal={arXiv preprint arXiv:2507.12482},
  year={2025},
  url={https://arxiv.org/abs/2507.12482}
}
```
About Kodezi
Kodezi is building the future of autonomous software maintenance. Our mission is to empower developers with AI that truly understands code at scale.
Contact & Community
License
This research repository is licensed under the MIT License - see LICENSE for details.
**Important**: The Kodezi Chronos model itself is proprietary technology and is not included in this repository. Model waitlist access is available at chronos.so.
**[Join Waitlist](https://chronos.so)** | **[Read Paper](https://arxiv.org/abs/2507.12482)** | **[Learn More](https://chronos.so)**
Built by the Kodezi Team

Owner
- Name: Kodezi
- Login: Kodezi
- Kind: organization
- Email: info@kodezi.com
- Location: United States of America
- Website: https://kodezi.com/
- Twitter: KodeziHQ
- Repositories: 1
- Profile: https://github.com/Kodezi
Kodezi is an AI platform providing tools to maximize efficiency when programming.
GitHub Events
Total
- Watch event: 2,807
- Public event: 1
- Push event: 3
- Fork event: 23
- Create event: 3
Last Year
- Watch event: 2,807
- Public event: 1
- Push event: 3
- Fork event: 23
- Create event: 3
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite
- black >=21.6b0
- bokeh >=2.4.0
- flake8 >=3.9.0
- ipykernel >=6.0.0
- jupyter >=1.0.0
- matplotlib >=3.4.0
- mypy >=0.910
- myst-parser >=0.18.0
- notebook >=6.4.0
- numpy >=1.21.0
- pandas >=1.3.0
- plotly >=5.0.0
- pytest >=6.2.0
- pytest-cov >=2.12.0
- python-dotenv >=0.19.0
- pyyaml >=5.4.0
- scikit-learn >=0.24.0
- scipy >=1.7.0
- seaborn >=0.11.0
- sphinx >=4.0.0
- sphinx-rtd-theme >=1.0.0
- tqdm >=4.62.0