https://github.com/albertzhaoca/lite-detective

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

- CITATION.cff file: not found
- codemeta.json file: found
- .zenodo.json file: found
- DOI references: not found
- Academic publication links: not found
- Academic email domains: not found
- Institutional organization owner: not found
- JOSS paper metadata: not found
- Scientific vocabulary similarity: low similarity (13.6%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: AlbertZhaoCA
Language: Python
Default Branch: main
Size: 4.59 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

https://github.com/AlbertZhaoCA/lite-detective/blob/main/

# LiteDetective

A lightweight Chinese malicious comment detection pipeline.



---

## Features
- Model training and evaluation for toxic comment detection
- Data processing and policy generation
- Console entry points for quick data and policy generation


---

## Quick Start

### 1. Install dependencies
```bash
pip install -r requirements.txt
```

### 2. Data Preparation
- Place your training and test data in the `data/` directory. The files already in `data/` show the expected format.

### 3. Train the model
```bash
python train.py
```

### 4. Test the model
```bash
python test.py
```

---

## Console Entrypoints

After installation (or from the project root), the following commands are available:

- Generate the policy file:
```bash
build-policy --path data/raw --policy_file data/policy.jsonl
```
- Generate the training data:
```bash
build-train-data --policy_file data/policy.jsonl --output_file data/training_data.jsonl
```
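
To inspect what these commands produce, a small reader like the one below can help. It is only a sketch: it assumes the generated files are JSON Lines (one JSON object per line), which the `.jsonl` extension suggests but this README does not confirm, and `read_jsonl` is a hypothetical helper, not part of LiteDetective.

```python
import json
from pathlib import Path


def read_jsonl(path, limit=None):
    """Return up to `limit` parsed records from a JSON Lines file.

    Assumption: one JSON object per line; blank lines are skipped.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            records.append(json.loads(line))
            if limit is not None and len(records) >= limit:
                break
    return records


if __name__ == "__main__":
    policy_path = Path("data/policy.jsonl")  # path taken from the command above
    if policy_path.exists():
        for record in read_jsonl(policy_path, limit=3):
            print(record)
```

Printing the first few records is usually enough to confirm that the file was written and to see which fields it contains.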

---

## Installation (with setup.py)

You can install LiteDetective as a Python package, which also enables the command-line tools:

```bash
pip install -e .
```

After installation, you can use the following commands anywhere:


- `build-policy`: generate the policy file
- `build-train-data`: generate the training data
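
For readers unfamiliar with console entry points, the sketch below shows how a command such as `build-policy` could parse the two flags used in this README. It is an illustration under assumptions, not the project's actual code: the function names and help strings are hypothetical, and only the flag names `--path` and `--policy_file` come from the README.

```python
import argparse


def build_policy_parser():
    """Build a parser for the flags shown for `build-policy` in this README."""
    parser = argparse.ArgumentParser(prog="build-policy")
    # Help texts are guesses about what each flag means.
    parser.add_argument("--path", required=True, help="location of the raw input data")
    parser.add_argument("--policy_file", required=True, help="policy file to write")
    return parser


def main(argv=None):
    """Hypothetical entry point: parses arguments and reports what it would do."""
    args = build_policy_parser().parse_args(argv)
    # A real command would generate the policy file here; this sketch only echoes.
    print(f"would read {args.path} and write {args.policy_file}")
    return args


# With setuptools, a function like `main` is typically registered in setup.py as
#   entry_points={"console_scripts": ["build-policy = some_module:main"]}
# where `some_module` is a placeholder for the real module path.
```

Calling `main(["--path", "data/raw", "--policy_file", "data/policy.jsonl"])` mirrors the command line shown earlier.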

---

## Project Structure
- `train.py`, `test.py`: Model training/testing
- `libs/`: Data processing, policy, LLM SDK, etc.
- `models/`: Model definitions
- `data/`: Datasets and generated files


---

## License
MIT License

---

## How to use the predict function

### 1. Download the model weight file
Download `lited_best.pth` from [HuggingFace: Albert-CAC/lite_DETECTIVE](https://huggingface.co/Albert-CAC/lite_DETECTIVE/tree/main) and place it in `./hf_ckpt/`.

### 2. Load the model
```python
import torch
from models.classifier import ToxicTextClassifier

model = ToxicTextClassifier()
state_dict = torch.load('hf_ckpt/lited_best.pth', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()
```
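
If the weight file has not been downloaded yet, the loading step fails with a low-level error. A small guard like the following, a sketch using only the standard library, can give a clearer message first. `resolve_checkpoint` is a hypothetical helper, not part of LiteDetective; the default path is the one used in the snippet above.

```python
from pathlib import Path


def resolve_checkpoint(path="hf_ckpt/lited_best.pth"):
    """Return the checkpoint path, or raise a clear error if the file is missing."""
    ckpt = Path(path)
    if not ckpt.is_file():
        raise FileNotFoundError(
            f"Checkpoint not found at {ckpt}. "
            f"Download lited_best.pth and place it in {ckpt.parent}/ first."
        )
    return ckpt
```

The returned path can then be passed to `torch.load` in place of the hard-coded string, for example `torch.load(resolve_checkpoint(), map_location='cpu')`.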

### 3. Run inference with `predict`
#### Single text (without context)
```python
# Replace the empty string with the comment text to classify.
result = model.predict('', device='cpu')
print(result)
```
#### Batch (without context)
```python
# Replace the empty strings with the comment texts to classify.
texts = ['', '']
results = model.predict(texts, device='cpu')
print(results)
```
#### Batch (with context)
```python
texts_with_context = [['', ''], ['', '']]
results = model.predict(texts_with_context, device='cpu')
print(results)
```

### 4. Output format

- `text`: the input text
- `prediction`: the predicted class label (0 or 1)
- `probabilities`: the predicted probability of each class
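
The fields above can be post-processed with plain Python. The sketch below assumes each result is a dict with the keys `text`, `prediction`, and `probabilities`, and that `probabilities` is a sequence indexed by class label; neither assumption is confirmed by this README. The helper names are hypothetical and the sample data is made up.

```python
def summarize(result):
    """Reduce one result dict to (text, predicted label, confidence).

    Assumption: result["probabilities"] is indexable by the class label.
    """
    probs = list(result["probabilities"])
    label = int(result["prediction"])
    return result["text"], label, float(probs[label])


def group_by_label(results):
    """Group (text, confidence) pairs by predicted label."""
    groups = {}
    for result in results:
        text, label, confidence = summarize(result)
        groups.setdefault(label, []).append((text, confidence))
    return groups


# Made-up placeholder results, not real model output.
example_results = [
    {"text": "sample A", "prediction": 0, "probabilities": [0.9, 0.1]},
    {"text": "sample B", "prediction": 1, "probabilities": [0.2, 0.8]},
]
print(group_by_label(example_results))
```

Grouping by the raw label avoids hard-coding which label means what; check the paper or the model code before attaching names to the two classes.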

For more detailed information and data visualizations, see our paper (`assets/paper.pdf`).

---

## Online Demo

You can try the official online demo here:
[https://huggingface.co/spaces/Albert-CAC/lite_DETECTIVE](https://huggingface.co/spaces/Albert-CAC/lite_DETECTIVE)


  
Example predictions:

1. **"I like you" (translated) - 96.63% (Positive)**

   - Correctly identified as expressing positive sentiment.

2. **"Your mother is dead" (translated) - 97.82% (Negative)**

   - Correctly identified as expressing negative sentiment.

3. **"Your [...] is gone" (translated) - 91.56% (Negative)**

   - Correctly identified as expressing negative sentiment. The phrase may be unfamiliar to non-Chinese speakers; it is an indirect way of expressing a harsh sentiment.

4. **"You idiot" (translated) - 98.68% (Negative)**

   - Correctly identified as expressing negative sentiment.

5. **"The weather is great today" (translated) - 97.89% (Positive)**

   - Correctly identified as expressing positive sentiment.

6. **"You're great" (translated) - 55.82% (Positive)**

   - Correctly identified as expressing positive sentiment, although the confidence level is relatively low.
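
The last example shows that a correct label can come with low confidence. One common way to handle such cases is to route low-confidence predictions to manual review. The sketch below applies that idea to the six confidence values listed above; the 60% cutoff and the helper name are assumptions for illustration, not values used by LiteDetective.

```python
REVIEW_THRESHOLD = 60.0  # assumed cutoff in percent, not taken from the project

# (example number, confidence in percent, reported label) as listed above.
demo_examples = [
    (1, 96.63, "Positive"),
    (2, 97.82, "Negative"),
    (3, 91.56, "Negative"),
    (4, 98.68, "Negative"),
    (5, 97.89, "Positive"),
    (6, 55.82, "Positive"),
]


def low_confidence(examples, threshold=REVIEW_THRESHOLD):
    """Return the examples whose confidence falls below the threshold."""
    return [example for example in examples if example[1] < threshold]


print(low_confidence(demo_examples))
```

With this cutoff only example 6 is flagged; a suitable threshold for real use would need to be chosen on validation data.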

Owner

Name: Albert Zhao
Login: AlbertZhaoCA
Kind: user

Repositories: 1
Profile: https://github.com/AlbertZhaoCA

A CS sophomore at Kean University. Interested in HCI, NLP, and software architecture.

GitHub Events

Total

Watch event: 1
Member event: 1
Push event: 11
Fork event: 1
Create event: 2

Last Year

Watch event: 1
Member event: 1
Push event: 11
Fork event: 1
Create event: 2
