chronos
Debugging-first language model achieving 65.3% autonomous bug fixing (6-7x better than GPT-4). Research, benchmarks & evaluation framework. Model available Q1 2026 via Kodezi OS.
Science Score: 36.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file: not found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Basic Info
- Host: GitHub
- Owner: Kodezi
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://chronos.so/
- Size: 219 KB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
- 67.3% autonomous debugging success
- 89% human preference
- 7.8 average fix iterations
- 40% time reduction
MRR Benchmark Results
Performance by Bug Category
Repository Scale Performance
Key Innovations in Chronos
1. Debugging-First Architecture
- Trained on 42.5M real debugging examples (not code completion)
- Specialized for root cause analysis and multi-file patches
- 78.4% root cause accuracy vs 15.8% best baseline
2. Persistent Debug Memory (PDM)
- Repository-specific learning from debugging sessions
- Improves from a 35% to a 65% success rate over time
- Cross-session pattern recognition
3. Adaptive Graph-Guided Retrieval (AGR)
- O(k log d) complexity with dynamic k-hop expansion
- 92% precision, 85% recall on multi-file context
- Handles unlimited repository scale intelligently
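The dynamic k-hop expansion described above can be illustrated as a breadth-first traversal over a code dependency graph that stops expanding once the retrieved context looks sufficient. This is a minimal sketch, not the AGR implementation: the `relevance` scoring callable, the stopping threshold, and the hop cap are all assumptions made for the illustration.

```python
def adaptive_k_hop_retrieve(graph, seeds, relevance, threshold=0.9, max_hops=4):
    """Sketch of adaptive k-hop retrieval: expand the frontier one hop at a
    time and stop early when the average relevance of the retrieved set
    crosses the confidence threshold.

    graph: dict mapping a node (e.g. a file or symbol) to its neighbors.
    relevance: hypothetical scoring function, node -> float in [0, 1].
    """
    retrieved = set(seeds)
    frontier = list(seeds)
    for _hop in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in retrieved:
                    retrieved.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break  # graph exhausted
        # Adaptive stop: enough relevant context gathered, no need for more hops.
        if sum(relevance(n) for n in retrieved) / len(retrieved) >= threshold:
            break
        frontier = next_frontier
    return retrieved
```

With a high-relevance neighborhood the traversal stops after one hop; with low scores it keeps expanding until the graph or the hop budget is exhausted, which is the behavior the k=adaptive results above suggest.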
4. Output-Optimized Design
- Optimized for ~3K output tokens (fixes, tests, docs)
- 47.2% output entropy density vs 12.8% for completion models
- Designed for complex patch generation
5. Autonomous Debugging Loop
- Average 7.8 iterations to successful fix
- Propose → test → analyze → refine cycles
- 67.3% fully autonomous success rate
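The propose → test → analyze → refine cycle can be sketched as a simple control loop. The `propose_fix`, `run_tests`, and `analyze_failure` callables below are hypothetical stand-ins for the model, the execution sandbox, and the failure analyzer; this illustrates the loop structure only, not Chronos itself.

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    """Minimal stand-in for a sandbox test run."""
    passed: bool
    log: str = ""

def autonomous_debug_loop(propose_fix, run_tests, analyze_failure, max_iterations=15):
    """Iterate propose -> test -> analyze -> refine until the tests pass
    or the iteration budget is spent. Returns (patch, iterations_used),
    with patch=None on failure."""
    feedback = None
    for iteration in range(1, max_iterations + 1):
        patch = propose_fix(feedback)       # propose (refined by prior feedback)
        result = run_tests(patch)           # validate in an isolated sandbox
        if result.passed:
            return patch, iteration
        feedback = analyze_failure(result)  # extract signal for the next attempt
    return None, max_iterations
```

The reported 7.8 average iterations would correspond to the loop typically terminating well inside such a budget.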
Architecture Overview
Seven-Layer System Design
- Multi-Source Input Layer: Processes code, logs, traces, tests, docs simultaneously
- Adaptive Retrieval Engine (AGR): Dynamic k-hop graph traversal (92% precision)
- Debug-Tuned LLM Core: 42.5M debugging examples, not code completion
- Orchestration Controller: Autonomous debugging loop management
- Persistent Debug Memory: Repository-specific learning (35% → 65% improvement)
- Execution Sandbox: Isolated test validation environment
- Explainability Layer: Human-readable root cause analysis
See architecture documentation for detailed specifications.
Multi-Random Retrieval (MRR) Benchmark
What is MRR?
MRR simulates real-world debugging complexity by:
- Spatial Distribution: Bug context scattered across 10-50 files
- Temporal Dispersion: Relevant information from 3-12 months of history
- Obfuscation Levels: Low/medium/high code complexity
- 5,000 Scenarios: Comprehensive evaluation across languages and bug types
MRR Results
| Metric | Chronos | GPT-4+RAG | Claude-3+VectorDB | Gemini-1.5+Graph |
|:-------|:-------:|:---------:|:-----------------:|:----------------:|
| Precision@10 | 89.2% | 42.3% | 48.1% | 51.7% |
| Recall@10 | 84.7% | 31.7% | 36.2% | 41.8% |
| Fix Accuracy | 67.3% | 8.9% | 11.2% | 14.6% |
| Context Efficiency | 0.71 | 0.23 | 0.28 | 0.31 |
Full benchmark available in benchmarks/multi-random-retrieval/
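Precision@10 and Recall@10 in the table follow the standard ranked-retrieval definitions, which can be computed as below. This sketch assumes `retrieved` is a ranked list of file paths and `relevant` is the ground-truth set of files for the scenario; it is not the benchmark's own implementation (see `evaluation_metrics/` for that).

```python
def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top-k retrieved items that are relevant."""
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k) if top_k else 0.0

def recall_at_k(retrieved, relevant, k=10):
    """Fraction of all relevant items that appear in the top-k."""
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant) if relevant else 0.0
```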
Getting Started
Running the MRR Benchmark
```bash
# Clone the repository
git clone https://github.com/kodezi/chronos-research.git
cd chronos-research

# Install dependencies
pip install -r requirements.txt

# Run the MRR benchmark on your model (start with a subset of scenarios)
python benchmarks/run_mrr_benchmark_2025.py \
    --model your_model \
    --scenarios 100

# Analyze results
python benchmarks/analyze_results.py
```
Model Access
**The Chronos model is not included in this repository.**
Chronos will be available via Kodezi OS:
- Q4 2025: Enterprise beta
- Q1 2026: General availability
- Join the waitlist at chronos.so
Repository Contents
```
chronos-research/
├── benchmarks/                    # MRR Benchmark Suite
│   ├── multi-random-retrieval/    # 5,000 scenario benchmark
│   ├── evaluation_metrics/        # Metrics implementation
│   └── run_mrr_benchmark_2025.py  # Main benchmark runner
├── reference_implementations/     # Algorithm references (NOT the model)
│   ├── algorithms/                # AGR, PDM implementations
│   └── NOTICE.md                  # Proprietary model notice
├── paper/                         # Research paper
│   └── chronos-research-2025.md   # Full paper (arXiv:2507.12482)
├── results/                       # Performance data
│   ├── raw_data/                  # 5,000 scenario results
│   ├── case_studies/              # Debugging examples
│   └── figures/                   # Paper visualizations
│       └── paper_figures/         # 11 paper figures
└── docs/                          # Documentation
    ├── MODEL_ACCESS.md            # How to access Chronos
    └── LEADERBOARD.md             # Performance rankings
```
Research Highlights
Training Dataset
- 42.5M debugging examples (not code completion)
- 15M GitHub issues with fixes
- 8M stack traces with resolutions
- 3M CI/CD debugging logs
- 2.5M production sessions
- 14M curated from Defects4J, SWE-bench, BugsInPy
AGR Performance
- k=1 hop: 58.2% success
- k=2 hops: 72.4% success
- k=adaptive: 87.1% success
- Flat retrieval: 23.4% success
PDM Learning Curve
- Initial: 35% success rate
- After 100 sessions: 52% success
- After 500 sessions: 65% success
- 7.3x token efficiency gain
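The repository-specific learning behind the PDM curve above can be illustrated as a store keyed by an error signature, so later sessions can recall root causes and fixes seen before. The class and method names here are hypothetical (the real PDM is proprietary and not in this repository); this only sketches the cross-session recall idea.

```python
class DebugMemory:
    """Illustrative per-repository memory of resolved debugging sessions,
    keyed by a normalized error signature (e.g. exception type + location)."""

    def __init__(self):
        self._sessions = {}

    def record(self, signature, root_cause, fix):
        """Store the outcome of a successful debugging session."""
        self._sessions.setdefault(signature, []).append((root_cause, fix))

    def recall(self, signature):
        """Return prior (root_cause, fix) pairs for a similar bug, newest last."""
        return self._sessions.get(signature, [])
```

A rising success rate over sessions then falls out naturally: the more signatures the store has seen, the more often `recall` returns a usable starting point instead of an empty list.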
Detailed Performance Analysis
Language-Specific Performance
Debugging Cycle Efficiency
Context Window Efficiency
Ablation Studies
Documentation
Contributing
We welcome contributions to the evaluation framework and benchmarks!
```bash
# Fork and clone
git clone https://github.com/[your-username]/chronos-research
cd chronos-research

# Create a feature branch
git checkout -b feature/your-contribution

# Make changes and run the tests
python -m pytest tests/

# Submit a PR
git push origin feature/your-contribution
```
See CONTRIBUTING.md for detailed guidelines.
Citation
If you use this research in your work, please cite:
```bibtex
@article{khan2025chronos,
  title={Kodezi Chronos: A Debugging-First Language Model for
         Repository-Scale, Memory-Driven Code Understanding},
  author={Khan, Ishraq and Chowdary, Assad and
          Haseeb, Sharoz and Patel, Urvish},
  journal={arXiv preprint arXiv:2507.12482},
  year={2025},
  url={https://arxiv.org/abs/2507.12482}
}
```
About Kodezi
Kodezi is building the future of autonomous software maintenance. Our mission is to empower developers with AI that truly understands code at scale.
Contact & Community
License
This research repository is licensed under the MIT License - see LICENSE for details.
**Important**: The Kodezi Chronos model itself is proprietary technology and is not included in this repository. Model waitlist access is available at chronos.so.
**[Join Waitlist](https://chronos.so)** | **[Read Paper](https://arxiv.org/abs/2507.12482)** | **[Learn More](https://chronos.so)**
Built by the Kodezi Team

Owner
- Name: Kodezi
- Login: Kodezi
- Kind: organization
- Email: info@kodezi.com
- Location: United States of America
- Website: https://kodezi.com/
- Twitter: KodeziHQ
- Repositories: 1
- Profile: https://github.com/Kodezi
Kodezi is an AI platform providing tools to maximize efficiency when programming.
GitHub Events
Total
- Watch event: 2,807
- Public event: 1
- Push event: 3
- Fork event: 23
- Create event: 3
Last Year
- Watch event: 2,807
- Public event: 1
- Push event: 3
- Fork event: 23
- Create event: 3
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite
- black >=21.6b0
- bokeh >=2.4.0
- flake8 >=3.9.0
- ipykernel >=6.0.0
- jupyter >=1.0.0
- matplotlib >=3.4.0
- mypy >=0.910
- myst-parser >=0.18.0
- notebook >=6.4.0
- numpy >=1.21.0
- pandas >=1.3.0
- plotly >=5.0.0
- pytest >=6.2.0
- pytest-cov >=2.12.0
- python-dotenv >=0.19.0
- pyyaml >=5.4.0
- scikit-learn >=0.24.0
- scipy >=1.7.0
- seaborn >=0.11.0
- sphinx >=4.0.0
- sphinx-rtd-theme >=1.0.0
- tqdm >=4.62.0