superlog

https://github.com/j-york/superlog

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (9.2%) to scientific vocabulary

Last synced: 10 months ago · JSON representation ·

Repository

Basic Info

Host: GitHub
Owner: J-York
License: apache-2.0
Language: Python
Default Branch: main
Size: 6.51 MB

Statistics

Stars: 6
Watchers: 2
Forks: 3
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

SuperLog & NLPLog

This repository includes the training codes for SuperLog, as well as a specifically designed interpretable dataset for the log analysis domain, named NLPLog.

NLPLog Dataset

NLPLog is a question-answer dataset generated through interactions with ChatGPT. We utilized the publicly available log datasets provided by LogHub, posing questions to ChatGPT across five defined dimensions. This approach yielded conversational data rich in log-related domain information and high interpretability. NLPLog will serve as the dataset for the subsequent continuous pre-training of SuperLog.

NLPLog data can be get here.

Continual Pre-training For SuperLog

At this stage, we leverage the open-source LLM training framework, LLaMa-Factory, to continuously pre-train an open-source large model using the NLPLog dataset. In this step, we infuse interpretable knowledge, equipping the general-purpose LLM with domain-specific information related to log analysis.

Environment Configuration bash pip install -e ".[torch,metrics,deepspeed]"
Start Training bash llamafactory-cli train examples/train_full/train_superlog.yaml ## Instruction Following SFT For SuperLog

In this step, we fine-tune SuperLog using a selection of 1000 high-quality instruction data from Alpacar. The purpose of this step is to enable SuperLog to have a good ability to respond to human instructions, understand log analysis commands, and complete the corresponding tasks.

bash llamafactory-cli train examples/train_full/iftrain_superlog.yaml

Owner

Name: YuheJi
Login: J-York
Kind: user

Repositories: 1
Profile: https://github.com/J-York

Citation (CITATION.cff)

cff-version: 1.2.0
date-released: 2024-03
message: "If you use this software, please cite it as below."
authors:
- family-names: "Zheng"
  given-names: "Yaowei"
- family-names: "Zhang"
  given-names: "Richong"
- family-names: "Zhang"
  given-names: "Junhao"
- family-names: "Ye"
  given-names: "Yanhan"
- family-names: "Luo"
  given-names: "Zheyan"
- family-names: "Feng"
  given-names: "Zhangchi"
- family-names: "Ma"
  given-names: "Yongqiang"
title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
url: "https://arxiv.org/abs/2403.13372"
preferred-citation:
  type: conference-paper
  conference:
    name: "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
  authors:
    - family-names: "Zheng"
      given-names: "Yaowei"
    - family-names: "Zhang"
      given-names: "Richong"
    - family-names: "Zhang"
      given-names: "Junhao"
    - family-names: "Ye"
      given-names: "Yanhan"
    - family-names: "Luo"
      given-names: "Zheyan"
    - family-names: "Feng"
      given-names: "Zhangchi"
    - family-names: "Ma"
      given-names: "Yongqiang"
  title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
  url: "https://arxiv.org/abs/2403.13372"
  year: 2024
  publisher: "Association for Computational Linguistics"
  address: "Bangkok, Thailand"

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science