loghub
A large collection of system log datasets for AI-driven log analytics [ISSRE'23]
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.9%) to scientific vocabulary
Repository
Statistics
- Stars: 2,287
- Watchers: 57
- Forks: 696
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Loghub
Loghub maintains a collection of system logs, which are freely accessible for AI-driven log analytics research. Some of the logs are production data released from previous studies, while some others are collected from real systems in our lab environment. Wherever possible, the logs are NOT sanitized, anonymized or modified in any way. These log datasets are freely available for research or academic work.
🤗 We proudly announce that the loghub datasets have reached more than 450 organizations from both industry and academia.
Logs currently available
🔗 Get raw logs via hyperlinks in the Download column.
| Dataset | Description | Labeled | Time Span | #Lines | Raw Size | Download |
| :--- | :--- | :---: | ---: | ---: | ---: | :---: |
| HDFS_v2 | Hadoop distributed file system log | | N.A. | 71,118,073 | 16.06GiB | :link: |
| HDFS_v3 | Instrumented HDFS trace log (TraceBench) | ✓ | N.A. | 14,778,079 | 2.96GiB | :link: |
| Hadoop | Hadoop MapReduce job log | ✓ (Check #56) | N.A. | 394,308 | 48.61MiB | :link: |
| Spark | Spark job log | | N.A. | 33,236,604 | 2.75GiB | :link: |
| Zookeeper | ZooKeeper service log | | 26.7 days | 74,380 | 9.95MiB | :link: |
| OpenStack | OpenStack infrastructure log | ✓ | N.A. | 207,820 | 58.61MiB | :link: |
| Thunderbird | Thunderbird supercomputer log | ✓ | 244 days | 211,212,192 | 29.60GiB | :link: |
| Windows | Windows event log | | 226.7 days | 114,608,388 | 26.09GiB | :link: |
| Linux | Linux system log | | 263.9 days | 25,567 | 2.25MiB | :link: |
| Mac | Mac OS log | | 7.0 days | 117,283 | 16.09MiB | :link: |
| Android_v1 | Android framework log | | N.A. | 1,555,005 | 183.37MiB | :link: |
| Android_v2 | Android framework log | | N.A. | 30,348,042 | 3.38GiB | :link: |
| HealthApp | Health app log | | 10.5 days | 253,395 | 22.44MiB | :link: |
| Apache | Apache web server error log | | 263.9 days | 56,481 | 4.90MiB | :link: |
| OpenSSH | OpenSSH server log | | 28.4 days | 655,146 | 70.02MiB | :link: |
| Proxifier | Proxifier software log | | N.A. | 21,329 | 2.42MiB | :link: |
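Several of the datasets above (for example Thunderbird at 29.60GiB or Windows at 26.09GiB) are too large to load into memory comfortably, so a downloaded file is best processed as a stream. The following Python sketch counts lines by reading fixed-size binary chunks and formats a byte count in the binary units used by the Raw Size column (MiB = 2^20 bytes, GiB = 2^30 bytes). It can help sanity-check a download against the #Lines and Raw Size columns. It assumes the download has already been extracted to a plain-text log file; `count_lines` and `format_size` are hypothetical helpers for illustration, not part of loghub.

```python
import os


def count_lines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count lines in a file by streaming it in fixed-size binary chunks.

    Memory use stays bounded by chunk_size, even for multi-GiB logs.
    """
    newlines = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte and last_byte != b"\n":
        newlines += 1
    return newlines


def format_size(num_bytes: int) -> str:
    """Format a byte count in the style of the Raw Size column, e.g. '2.42MiB'."""
    if num_bytes >= 2**30:
        return f"{num_bytes / 2**30:.2f}GiB"
    return f"{num_bytes / 2**20:.2f}MiB"
```

For example, `count_lines("Linux.log")` and `format_size(os.path.getsize("Linux.log"))`, where `Linux.log` is a hypothetical path to the extracted Linux log, can be compared with the 25,567 lines and 2.25MiB listed in the table. Small differences are possible depending on how blank or trailing lines were counted when the table was produced.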
🔥 Citation
Please cite the following two papers if you use the loghub datasets in your research.
- Loghub: Jieming Zhu, Shilin He, Pinjia He, Jinyang Liu, Michael R. Lyu. Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics. IEEE International Symposium on Software Reliability Engineering (ISSRE), 2023.
- Loghub-2.0: Zhihan Jiang, Jinyang Liu, Junjie Huang, Yichen Li, Yintong Huo, Jiazhen Gu, Zhuangbin Chen, Jieming Zhu, Michael R. Lyu. A Large-scale Evaluation for Log Parsing Techniques: How Far are We? ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2024.
🌈 License
The datasets are freely available for research or academic work. For any usage or distribution of the datasets, please refer to the loghub repository URL https://github.com/logpai/loghub and cite the loghub paper where applicable.
🙋 Discussion
Feel free to open a discussion here for any questions.
Owner
- Name: LOGPAI
- Login: logpai
- Kind: organization
- Website: https://logpai.com
- Repositories: 17
- Profile: https://github.com/logpai
Log Analytics Powered by AI
Citation (CITATION)
@inproceedings{Loghub,
author = {Jieming Zhu and
Shilin He and
Pinjia He and
Jinyang Liu and
Michael R. Lyu},
title = {Loghub: {A} Large Collection of System Log Datasets for AI-driven
Log Analytics},
booktitle = {IEEE International Symposium on Software Reliability Engineering (ISSRE)},
year = {2023}
}
@inproceedings{Loghub2,
author = {Zhihan Jiang and
Jinyang Liu and
Junjie Huang and
Yichen Li and
Yintong Huo and
Jiazhen Gu and
Zhuangbin Chen and
Jieming Zhu and
Michael R. Lyu},
title = {A Large-scale Evaluation for Log Parsing Techniques: How Far are We?},
booktitle = {ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)},
year = {2024}
}
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 41
- Total pull requests: 7
- Average time to close issues: 3 months
- Average time to close pull requests: 3 months
- Total issue authors: 36
- Total pull request authors: 6
- Average comments per issue: 1.32
- Average comments per pull request: 0.43
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 8
- Pull requests: 0
- Average time to close issues: 3 months
- Average time to close pull requests: N/A
- Issue authors: 7
- Pull request authors: 0
- Average comments per issue: 0.13
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- AgrawalAmey (2)
- Li1Neo (2)
- Peter-of-Astora (2)
- Wapiti08 (2)
- mmantyla (1)
- nedaAF (1)
- wildfire8966 (1)
- sarvesh-cloudaeye (1)
- arijit32 (1)
- Mesv (1)
- th31nitiate (1)
- wailoktam (1)
- WMwzy (1)
- EricWebsmith (1)
- LITONG99 (1)
Pull Request Authors
- WahomeKezia (2)
- aleff-github (1)
- tulsidas (1)
- sannour (1)
- usmansk1210 (1)
- UmayrAhmad (1)
