superlog
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (9.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: J-York
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 6.51 MB
Statistics
- Stars: 6
- Watchers: 2
- Forks: 3
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
SuperLog & NLPLog
This repository includes the training codes for SuperLog, as well as a specifically designed interpretable dataset for the log analysis domain, named NLPLog.
NLPLog Dataset
NLPLog is a question-answer dataset generated through interactions with ChatGPT. We utilized the publicly available log datasets provided by LogHub, posing questions to ChatGPT across five defined dimensions. This approach yielded conversational data rich in log-related domain information and high interpretability. NLPLog will serve as the dataset for the subsequent continuous pre-training of SuperLog.
NLPLog data can be get here.
Continual Pre-training For SuperLog
At this stage, we leverage the open-source LLM training framework, LLaMa-Factory, to continuously pre-train an open-source large model using the NLPLog dataset. In this step, we infuse interpretable knowledge, equipping the general-purpose LLM with domain-specific information related to log analysis.
- Environment Configuration
bash pip install -e ".[torch,metrics,deepspeed]" - Start Training
bash llamafactory-cli train examples/train_full/train_superlog.yaml## Instruction Following SFT For SuperLog
In this step, we fine-tune SuperLog using a selection of 1000 high-quality instruction data from Alpacar. The purpose of this step is to enable SuperLog to have a good ability to respond to human instructions, understand log analysis commands, and complete the corresponding tasks.
bash
llamafactory-cli train examples/train_full/iftrain_superlog.yaml
Owner
- Name: YuheJi
- Login: J-York
- Kind: user
- Repositories: 1
- Profile: https://github.com/J-York
Citation (CITATION.cff)
cff-version: 1.2.0
date-released: 2024-03
message: "If you use this software, please cite it as below."
authors:
- family-names: "Zheng"
given-names: "Yaowei"
- family-names: "Zhang"
given-names: "Richong"
- family-names: "Zhang"
given-names: "Junhao"
- family-names: "Ye"
given-names: "Yanhan"
- family-names: "Luo"
given-names: "Zheyan"
- family-names: "Feng"
given-names: "Zhangchi"
- family-names: "Ma"
given-names: "Yongqiang"
title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
url: "https://arxiv.org/abs/2403.13372"
preferred-citation:
type: conference-paper
conference:
name: "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
authors:
- family-names: "Zheng"
given-names: "Yaowei"
- family-names: "Zhang"
given-names: "Richong"
- family-names: "Zhang"
given-names: "Junhao"
- family-names: "Ye"
given-names: "Yanhan"
- family-names: "Luo"
given-names: "Zheyan"
- family-names: "Feng"
given-names: "Zhangchi"
- family-names: "Ma"
given-names: "Yongqiang"
title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
url: "https://arxiv.org/abs/2403.13372"
year: 2024
publisher: "Association for Computational Linguistics"
address: "Bangkok, Thailand"
GitHub Events
Total
- Watch event: 7
- Push event: 24
- Fork event: 5
Last Year
- Watch event: 7
- Push event: 24
- Fork event: 5