https://github.com/cstcloudops/selflog

https://github.com/cstcloudops/selflog

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.8%) to scientific vocabulary
Last synced: 9 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: CSTCloudOps
  • Language: Python
  • Default Branch: main
  • Size: 1.52 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme

README.md

Self-Evolutionary Group-wise Log Parsing Based on Large Language Model

In this paper we propose self-evolving method called SelfLog,which, on one hand, uses similar pairs extracted by LLM itself in the historical data to act as the prompt of a new log, allowing the model to learn in a self-evolution and labeling-free way. On the other hand, we propose an N-Gram-based grouper and log hitter.

Repository Organization

├── evaluate/ # │ ├── evaluator/ # the evaluation code of GA, PA, PTA, RTA │ └── evaluator_PA/ # calculate PA, PTA, RTA result ├── functions/ # mian part of SelfLog │ ├── benchmark_settings/ # log data process │ ├── gram/ # N-gram based grouper │ ├── llm_func/ # requst llm │ └── tree_based_merge/ # the postprocess of SelfLog ├── logs/ │ └── ...... # parsing log files ├── online_selfLog/ # online version of SelfLog │ ├── is_new_log # log hitter │ ├── log_pruduce # streaming log production │ └── online_run # test the efficient of SelfLog ├── PSQL/ # Prompt database recall method based on PostgreSQL │ ├── model # the embedding model of SelfLog │ ├── conConfig # connect psql setting │ ├── exampleToPSQL # algorithm startup candidate set written to psql │ └── findTopKexam # recall examples ├── CONSTANT # hyperparameter configuration items ├── llmAPIsetting # llm address url and key ├── prompt # llm prompt format ├── run.py # test the effect of SelfLog on the dataset └── README.md

Quick start

Preparation

Environment Installation

  • Prompt Database We use psql with the vector plugin to implement a method for retrieving and recalling related logs based on semantic similarity. You can also use other databases for your purposes. > 1. Install PostgresSQL > 2. Creat table such as CREATE TABLE IF NOT EXISTS public.log_template ( "ID" integer NOT NULL DEFAULT nextval('id_seq'::regclass), log text COLLATE pg_catalog."default", template text COLLATE pg_catalog."default", "logVector" vector, CONSTRAINT seflog_pkey PRIMARY KEY ("ID") );
  • Python > 1. Install python >= 3.8 > 2. pip install -r requirements.txt

Set settings

  • LLM API > 1. API-key > 2. model url
  • Candidates to prompt database > 1. cd PSQL > 2. python exampleToPSQL.py
  • Effect evaluation > 1. python run.py

The analysis results will be stored in the log directory. * Efficiency evaluation

  1. cd online_selfLog
  2. download full dataset
  3. python log_pruduce.py

Owner

  • Name: CSTCloud Lab
  • Login: CSTCloudOps
  • Kind: organization
  • Location: China

GitHub Events

Total
  • Issues event: 2
  • Watch event: 2
  • Fork event: 1
Last Year
  • Issues event: 2
  • Watch event: 2
  • Fork event: 1

Dependencies

requirements.txt pypi
  • Levenshtein *
  • anytree *
  • fuzzywuzzy *
  • nltk *
  • numpy *
  • openai *
  • pandas *
  • scikit-learn *
  • scipy *
  • sshtunnel *
  • tqdm *
  • wordfreq *