loghub

A large collection of system log datasets for AI-driven log analytics [ISSRE'23]

https://github.com/logpai/loghub

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.9%) to scientific vocabulary

Keywords

anomaly-detection datasets log-analysis log-intelligence log-parsing logs unstructured-logs

Keywords from Contributors

log-mining log-parser
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: logpai
  • License: other
  • Default Branch: master
  • Homepage:
  • Size: 7 MB
Statistics
  • Stars: 2,287
  • Watchers: 57
  • Forks: 696
  • Open Issues: 1
  • Releases: 0
Topics
anomaly-detection datasets log-analysis log-intelligence log-parsing logs unstructured-logs
Created over 9 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

Loghub

Loghub maintains a collection of system logs, which are freely accessible for AI-driven log analytics research. Some of the logs are production data released from previous studies, while some others are collected from real systems in our lab environment. Wherever possible, the logs are NOT sanitized, anonymized or modified in any way. These log datasets are freely available for research or academic work.

🤗 We proudly announce that the loghub datasets have been downloaded by more than 450 organizations from both industry and academia.

Logs currently available

🔗 Get raw logs via hyperlinks in the Download column.

| Dataset | Description | Labeled | Time Span | #Lines | Raw Size | Download |
| :--- | :--- | :---: | ---: | ---: | ---: | :---: |
| :open_file_folder: Distributed systems | | | | | | |
| HDFS_v1 | Hadoop distributed file system log | :heavy_check_mark: | 38.7 hours | 11,175,629 | 1.47GiB | :link: |
| HDFS_v2 | Hadoop distributed file system log | | N.A. | 71,118,073 | 16.06GiB | :link: |
| HDFS_v3 | Instrumented HDFS trace log (TraceBench) | :heavy_check_mark: | N.A. | 14,778,079 | 2.96GiB | :link: |
| Hadoop | Hadoop mapreduce job log | :heavy_check_mark: (Check #56) | N.A. | 394,308 | 48.61MiB | :link: |
| Spark | Spark job log | | N.A. | 33,236,604 | 2.75GiB | :link: |
| Zookeeper | ZooKeeper service log | | 26.7 days | 74,380 | 9.95MiB | :link: |
| OpenStack | OpenStack infrastructure log | :heavy_check_mark: | N.A. | 207,820 | 58.61MiB | :link: |
| :open_file_folder: Super computers | | | | | | |
| BGL | Blue Gene/L supercomputer log | :heavy_check_mark: | 214.7 days | 4,747,963 | 708.76MiB | :link: |
| HPC | High performance cluster log | | N.A. | 433,489 | 32.00MiB | :link: |
| Thunderbird | Thunderbird supercomputer log | :heavy_check_mark: | 244 days | 211,212,192 | 29.60GiB | :link: |
| :open_file_folder: Operating systems | | | | | | |
| Windows | Windows event log | | 226.7 days | 114,608,388 | 26.09GiB | :link: |
| Linux | Linux system log | | 263.9 days | 25,567 | 2.25MiB | :link: |
| Mac | Mac OS log | | 7.0 days | 117,283 | 16.09MiB | :link: |
| :open_file_folder: Mobile systems | | | | | | |
| Android_v1 | Android framework log | | N.A. | 1,555,005 | 183.37MiB | :link: |
| Android_v2 | Android framework log | | N.A. | 30,348,042 | 3.38GiB | :link: |
| HealthApp | Health app log | | 10.5 days | 253,395 | 22.44MiB | :link: |
| :open_file_folder: Server applications | | | | | | |
| Apache | Apache web server error log | | 263.9 days | 56,481 | 4.90MiB | :link: |
| OpenSSH | OpenSSH server log | | 28.4 days | 655,146 | 70.02MiB | :link: |
| :open_file_folder: Standalone software | | | | | | |
| Proxifier | Proxifier software log | | N.A. | 21,329 | 2.42MiB | :link: |
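
Several of the raw logs listed above are tens of GiB in size, so it can help to sanity-check a download against the #Lines and Raw Size columns without loading the whole file into memory. The following is a minimal sketch, not part of loghub itself: the helper names `summarize_log` and `format_size` are hypothetical, and the README does not state how the published line counts were produced, so small differences (for example in how a final unterminated line is counted) are possible.

```python
def summarize_log(path, chunk_size=1 << 20):
    """Return (line_count, size_in_bytes) for a log file, reading it in chunks.

    Hypothetical helper: streams the file so multi-GiB logs never have to
    fit in memory.
    """
    lines = 0
    size = 0
    last_byte = b""
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            lines += chunk.count(b"\n")
            size += len(chunk)
            last_byte = chunk[-1:]
    # Count a final line that has no trailing newline.
    if size and last_byte != b"\n":
        lines += 1
    return lines, size


def format_size(num_bytes):
    """Format a byte count in the binary units used in the table (MiB, GiB)."""
    mib = 1024 ** 2
    gib = 1024 ** 3
    if num_bytes >= gib:
        return f"{num_bytes / gib:.2f}GiB"
    return f"{num_bytes / mib:.2f}MiB"
```

Calling `summarize_log` on an extracted log file (the local path is whatever you chose when downloading) and passing the byte count to `format_size` gives values that can be compared with the corresponding table row.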

🔥 Citation

Please cite the following two papers if you use the loghub datasets in your research.

🌈 License

The datasets are freely available for research or academic work. For any usage or distribution of the datasets, please refer to the loghub repository URL https://github.com/logpai/loghub and cite the loghub paper where applicable.

🙋 Discussion

Feel free to open a discussion here if you have any questions.

Owner

  • Name: LOGPAI
  • Login: logpai
  • Kind: organization

Log Analytics Powered by AI

Citation (CITATION)

@inproceedings{Loghub,
  author       = {Jieming Zhu and
                  Shilin He and
                  Pinjia He and
                  Jinyang Liu and
                  Michael R. Lyu},
  title        = {Loghub: {A} Large Collection of System Log Datasets for AI-driven 
                  Log Analytics},
  booktitle    = {IEEE International Symposium on Software Reliability Engineering (ISSRE)},
  year         = {2023}
}

@inproceedings{Loghub2,
  author       = {Zhihan Jiang and 
                  Jinyang Liu and 
                  Junjie Huang and 
                  Yichen Li and 
                  Yintong Huo and 
                  Jiazhen Gu and 
                  Zhuangbin Chen and 
                  Jieming Zhu and
                  Michael R. Lyu},
  title        = {A Large-scale Evaluation for Log Parsing Techniques: How Far are We?},
  booktitle    = {ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)},
  year         = {2024}
}

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 96
  • Total Committers: 4
  • Avg Commits per committer: 24.0
  • Development Distribution Score (DDS): 0.49
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Jamie Zhu j****u 49
Jamie Zhu z****e@g****m 33
Shilin HE s****l@g****m 12
Pinjia He p****e@g****m 2
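
The all-time figures above can be reproduced from the Top Committers counts. The sketch below assumes the Development Distribution Score is defined as 1 minus the top committer's share of all commits; the page does not state that definition, but it matches the reported 0.49. The two "Jamie Zhu" rows are kept as separate committers because the page lists them separately. The function name `committer_stats` is hypothetical.

```python
# Commit counts copied from the Top Committers list on this page.
commits_by_committer = {
    "Jamie Zhu (j****u)": 49,
    "Jamie Zhu (z****e@g****m)": 33,
    "Shilin HE": 12,
    "Pinjia He": 2,
}


def committer_stats(commits):
    """Return (total_commits, avg_per_committer, dds).

    Assumption: DDS = 1 - (commits by top committer / total commits).
    """
    total = sum(commits.values())
    average = total / len(commits)
    dds = 1 - max(commits.values()) / total
    return total, average, dds


total, average, dds = committer_stats(commits_by_committer)
print(total)          # 96
print(average)        # 24.0
print(f"{dds:.2f}")   # 0.49
```

Under this definition a score near 0 means one committer made almost every commit, and higher values mean the work is spread more evenly.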

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 41
  • Total pull requests: 7
  • Average time to close issues: 3 months
  • Average time to close pull requests: 3 months
  • Total issue authors: 36
  • Total pull request authors: 6
  • Average comments per issue: 1.32
  • Average comments per pull request: 0.43
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 0
  • Average time to close issues: 3 months
  • Average time to close pull requests: N/A
  • Issue authors: 7
  • Pull request authors: 0
  • Average comments per issue: 0.13
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • AgrawalAmey (2)
  • Li1Neo (2)
  • Peter-of-Astora (2)
  • Wapiti08 (2)
  • mmantyla (1)
  • nedaAF (1)
  • wildfire8966 (1)
  • sarvesh-cloudaeye (1)
  • arijit32 (1)
  • Mesv (1)
  • th31nitiate (1)
  • wailoktam (1)
  • WMwzy (1)
  • EricWebsmith (1)
  • LITONG99 (1)
Pull Request Authors
  • WahomeKezia (2)
  • aleff-github (1)
  • tulsidas (1)
  • sannour (1)
  • usmansk1210 (1)
  • UmayrAhmad (1)