https://github.com/cstcloudops/selflog
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CSTCloudOps
- Language: Python
- Default Branch: main
- Size: 1.52 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Self-Evolutionary Group-wise Log Parsing Based on Large Language Model
In this paper we propose a self-evolving method called SelfLog, which, on one hand, uses similar
Repository Organization
├── evaluate/ #
│ ├── evaluator/ # the evaluation code of GA, PA, PTA, RTA
│ └── evaluator_PA/ # calculate PA, PTA, RTA result
├── functions/ # main part of SelfLog
│ ├── benchmark_settings/ # log data process
│ ├── gram/ # N-gram based grouper
│ ├── llm_func/ # request the LLM
│ └── tree_based_merge/ # the postprocess of SelfLog
├── logs/
│ └── ...... # parsing log files
├── online_selfLog/ # online version of SelfLog
│ ├── is_new_log # log hitter
│ ├── log_pruduce # streaming log production
│ └── online_run # test the efficiency of SelfLog
├── PSQL/ # Prompt database recall method based on PostgreSQL
│ ├── model # the embedding model of SelfLog
│ ├── conConfig # PostgreSQL connection settings
│ ├── exampleToPSQL # writes the startup candidate set to PostgreSQL
│ └── findTopKexam # recall examples
├── CONSTANT # hyperparameter configuration items
├── llmAPIsetting # LLM API URL and key
├── prompt # LLM prompt format
├── run.py # evaluate the effectiveness of SelfLog on the dataset
└── README.md
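The evaluate/ directory above refers to GA and PA. As a rough illustration, and assuming these stand for grouping accuracy and parsing accuracy as commonly defined in log parsing benchmarks, the sketch below computes both from per-log template labels. The function names and sample data are hypothetical and are not taken from the repository's evaluator code.

```python
from collections import defaultdict


def grouping_accuracy(truth, pred):
    """Fraction of logs whose predicted group has exactly the same
    members as their ground-truth group (template text is ignored)."""
    truth_groups = defaultdict(set)
    pred_groups = defaultdict(set)
    for i, (t, p) in enumerate(zip(truth, pred)):
        truth_groups[t].add(i)
        pred_groups[p].add(i)
    correct = 0
    for members in pred_groups.values():
        # Look up the ground-truth group of any one member; the predicted
        # group counts only if both groups contain exactly the same logs.
        t = truth[next(iter(members))]
        if truth_groups[t] == members:
            correct += len(members)
    return correct / len(truth)


def parsing_accuracy(truth, pred):
    """Fraction of logs whose predicted template string equals the truth."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


# Hypothetical labels: one template per log line, in the same order.
truth = ["open <*>", "open <*>", "close <*>", "error <*> at <*>"]
pred = ["open <*>", "open <*>", "close file", "error <*> at <*>"]
ga = grouping_accuracy(truth, pred)
pa = parsing_accuracy(truth, pred)
```

In this toy data the third log is grouped correctly but its template text is wrong, so it counts toward GA and not toward PA. Both functions assume non-empty input of equal length.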
Quick start
Preparation
Environment Installation
- Prompt Database
We use PostgreSQL with the vector plugin to retrieve and recall related logs based on semantic similarity. You can also use another database if it suits your purposes.
> 1. Install PostgreSQL
> 2. Create a table, such as:
CREATE TABLE IF NOT EXISTS public.log_template (
    "ID" integer NOT NULL DEFAULT nextval('id_seq'::regclass),
    log text COLLATE pg_catalog."default",
    template text COLLATE pg_catalog."default",
    "logVector" vector,
    CONSTRAINT seflog_pkey PRIMARY KEY ("ID")
);
- Python
> 1. Install Python >= 3.8
> 2. pip install -r requirements.txt
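To make the semantic-similarity recall of the prompt database more concrete, here is a minimal in-memory sketch of top-K retrieval by cosine similarity. It assumes each stored row carries a log, its template, and an embedding vector, mirroring the log, template, and "logVector" columns of the table above. The function names, the toy three-dimensional vectors, and the choice of cosine similarity are illustrative assumptions, not the repository's findTopKexam implementation.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_top_k(query_vec, rows, k=2):
    """Return the k rows most similar to query_vec.

    rows: iterable of (log, template, vector) tuples.
    """
    return heapq.nlargest(
        k, rows, key=lambda row: cosine_similarity(query_vec, row[2])
    )


# Toy rows with made-up embeddings; a real setup would use an embedding model.
rows = [
    ("Connection from 10.0.0.1 closed", "Connection from <*> closed", [0.9, 0.1, 0.0]),
    ("Disk usage at 91%", "Disk usage at <*>", [0.0, 0.2, 0.9]),
    ("Connection from 10.0.0.7 opened", "Connection from <*> opened", [0.8, 0.3, 0.1]),
]
top = find_top_k([1.0, 0.0, 0.0], rows, k=2)
```

The recalled (log, template) pairs could then serve as examples in the LLM prompt. If the vector plugin in use is pgvector, the same ordering can usually be pushed into SQL with its distance operators instead of being computed in Python.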
Configure settings
- LLM API
> 1. API-key
> 2. model url
- Candidates to prompt database
> 1. cd PSQL
> 2. python exampleToPSQL.py
- Effect evaluation
> 1. python run.py
The analysis results will be stored in the logs directory.
- Efficiency evaluation
> 1. cd online_selfLog
> 2. download full dataset
> 3. python log_pruduce.py
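For the online version, one plausible way to decide whether a streamed log is new is to test it against the templates already known and call the LLM only on a miss. The sketch below assumes templates mark variable parts with `<*>`, as is common in log parsing benchmarks. The helper names and sample templates are hypothetical and do not describe the repository's is_new_log module.

```python
import re


def template_to_regex(template, wildcard="<*>"):
    """Turn a template into an anchored regex; each wildcard matches any text."""
    parts = [re.escape(part) for part in template.split(wildcard)]
    return re.compile("^" + "(.*?)".join(parts) + "$")


def find_hit(log, templates):
    """Return the first template that matches the log, or None if it is new."""
    for template in templates:
        if template_to_regex(template).match(log):
            return template
    return None


# Hypothetical set of templates learned so far.
known = ["Connection from <*> closed", "Disk usage at <*>"]
hit = find_hit("Connection from 10.0.0.1 closed", known)
miss = find_hit("Kernel panic on cpu 3", known)
```

Here `hit` is the matching template and `miss` is None, which would mark the second log as new. For a long-running stream, compiling each pattern once and caching it would avoid repeated regex compilation.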
Owner
- Name: CSTCloud Lab
- Login: CSTCloudOps
- Kind: organization
- Location: China
- Website: https://www.cstcloud.cn
- Repositories: 20
- Profile: https://github.com/CSTCloudOps
GitHub Events
Total
- Issues event: 2
- Watch event: 2
- Fork event: 1
Last Year
- Issues event: 2
- Watch event: 2
- Fork event: 1
Dependencies
- Levenshtein *
- anytree *
- fuzzywuzzy *
- nltk *
- numpy *
- openai *
- pandas *
- scikit-learn *
- scipy *
- sshtunnel *
- tqdm *
- wordfreq *