https://github.com/albertzhaoca/lite-detective

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: AlbertZhaoCA
  • Language: Python
  • Default Branch: main
  • Size: 4.59 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago

# LiteDetective

A lightweight Chinese malicious comment detection pipeline.



---

## Features
- Model training and evaluation for toxic comment detection
- Data processing and policy generation
- Console entry points for quick data and policy generation


---

## Quick Start

### 1. Install dependencies
```bash
pip install -r requirements.txt
```

### 2. Data Preparation
- Place your training and test data in the `data/` directory. See `data/` for format examples.

### 3. Train the model
```bash
python train.py
```

### 4. Test the model
```bash
python test.py
```

---

## Console Entry Points

After installation (or from the project root):


- Generate a policy file:
```bash
build-policy --path data/raw --policy_file data/policy.jsonl
```
- Generate training data:
```bash
build-train-data --policy_file data/policy.jsonl --output_file data/training_data.jsonl
```
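Both commands write JSON Lines files (one JSON object per line, judging by the `.jsonl` extension). The record fields are not documented in this README, so the sketch below makes no assumptions about them: it counts records and tallies which top-level keys appear, a quick sanity check on the generated `data/policy.jsonl` or `data/training_data.jsonl`. The helper name `summarize_jsonl` is hypothetical and not part of LiteDetective.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Tuple


def summarize_jsonl(path: str) -> Tuple[int, Counter]:
    """Return (number of records, Counter of top-level keys) for a JSON Lines file.

    Hypothetical helper: it does not assume any particular LiteDetective schema.
    """
    count = 0
    keys: Counter = Counter()
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of failing on them
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a broken generator run is easy to locate
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc})") from exc
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys
```

For example, `n, fields = summarize_jsonl("data/training_data.jsonl")` gives the record count and how often each field occurs; a field that appears in fewer than `n` records points to incomplete generated entries.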

---

## Installation (with setup.py)

You can install LiteDetective as a Python package, which also enables the command-line tools:


```bash
pip install -e .
```

After installation, you can use the following commands anywhere:


- `build-policy`: generate a policy file
- `build-train-data`: generate training data
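To confirm that `pip install -e .` registered the two commands, you can query the installed console-script entry points with the standard library's `importlib.metadata`. This is a minimal sketch: `installed_commands` is a hypothetical helper, and it only reports the `module:function` target each command points to (or `None` if it is not registered), without assuming where in `libs/` those functions live.

```python
from importlib.metadata import entry_points
from typing import Dict, Iterable, Optional

# Command names taken from this README; their target modules are not assumed.
EXPECTED_COMMANDS = ("build-policy", "build-train-data")


def installed_commands(names: Iterable[str] = EXPECTED_COMMANDS) -> Dict[str, Optional[str]]:
    """Map each command name to its registered 'module:function' target, or None if missing."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ selection API
        scripts = eps.select(group="console_scripts")
    else:  # Python 3.8/3.9: a dict keyed by group name
        scripts = eps.get("console_scripts", [])
    registered = {ep.name: ep.value for ep in scripts}
    return {name: registered.get(name) for name in names}
```

A `None` value after `pip install -e .` usually means the install ran against a different Python environment than the one you are checking.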

---

## Project Structure
- `train.py`, `test.py`: Model training/testing
- `libs/`: Data processing, policy, LLM SDK, etc.
- `models/`: Model definitions
- `data/`: Datasets and generated files


---

## License
MIT License

---

## How to use the predict function

### 1. Download the model weight file
Download `lited_best.pth` from [HuggingFace: Albert-CAC/lite_DETECTIVE](https://huggingface.co/Albert-CAC/lite_DETECTIVE/tree/main) and place it in `./hf_ckpt/`.
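If you prefer to fetch the checkpoint from a script, the `huggingface_hub` library's `hf_hub_download` function can place the file in `./hf_ckpt/`. This is a sketch under two assumptions: that `huggingface_hub` is installed (it may not be listed in `requirements.txt`) and that `lited_best.pth` sits at the top level of the `Albert-CAC/lite_DETECTIVE` model repository, as the link above suggests. `download_weights` is a hypothetical helper name.

```python
from pathlib import Path


def download_weights(dest_dir: str = "hf_ckpt") -> Path:
    """Download lited_best.pth from the Hugging Face Hub into dest_dir and return its path."""
    # Imported lazily: huggingface_hub is an extra dependency (assumption, not from the README)
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id="Albert-CAC/lite_DETECTIVE",
        filename="lited_best.pth",
        local_dir=dest_dir,  # place the file under ./hf_ckpt/, where the loading code expects it
    )
    return Path(local_path)
```

Calling `download_weights()` from the project root should leave the checkpoint at `hf_ckpt/lited_best.pth`, the path used in the next step.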

### 2. Load the model
```python
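import torch
from models.classifier import ToxicTextClassifier

# Build the model and load the downloaded weights onto the CPU
model = ToxicTextClassifier()
state_dict = torch.load('hf_ckpt/lited_best.pth', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode (e.g. disables dropout)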
```
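The snippet above always loads onto the CPU. If a GPU is available, a small wrapper like the one below can pick the device once and reuse it for the `device=` argument of `predict`. It is a sketch, not part of the repository: `load_detector` is a hypothetical name, and it assumes `ToxicTextClassifier` is a regular `torch.nn.Module` (which its use of `load_state_dict` and `eval` suggests).

```python
from typing import Optional, Tuple


def load_detector(ckpt_path: str = "hf_ckpt/lited_best.pth",
                  device: Optional[str] = None) -> Tuple[object, str]:
    """Load ToxicTextClassifier weights onto a GPU if available, else the CPU."""
    # Imported inside the function so this definition does not require torch at import time
    import torch
    from models.classifier import ToxicTextClassifier  # run from the project root

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    model = ToxicTextClassifier()
    state_dict = torch.load(ckpt_path, map_location=device)
    model.load_state_dict(state_dict)
    model.to(device)  # assumes ToxicTextClassifier is a torch.nn.Module
    model.eval()
    return model, device
```

Usage: `model, device = load_detector()` followed by `model.predict('<comment text>', device=device)`, so the model and its inputs end up on the same device.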

### 3. Run inference with `predict`
#### Single text (without context)
```python
# Replace the placeholder with the comment to classify
result = model.predict('<comment text>', device='cpu')
print(result)
```
#### Batch (without context)
```python
texts = ['<comment 1>', '<comment 2>']  # placeholders for the comments to classify
results = model.predict(texts, device='cpu')
print(results)
```
#### Batch (with context)
```python
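# Each inner list groups a comment with its context
# (check ToxicTextClassifier.predict for the expected element order)
texts_with_context = [['...', '...'], ['...', '...']]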
results = model.predict(texts_with_context, device='cpu')
print(results)
```

### 4. Output format

- `text`: the input text
- `prediction`: the predicted class label (`0` or `1`)
- `probabilities`: the predicted probability of each class
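The exact return type of `predict` is not spelled out here. Assuming each result is a dict with the three keys above and that `probabilities` is indexable by class label, a post-processing step that keeps only the comments whose score for the toxic class clears a threshold might look like this. Which label denotes a toxic comment is not stated in this section, so `toxic_label` is a parameter (defaulting to `1` as an assumption), and `flag_results` is a hypothetical helper.

```python
from typing import Any, Dict, List, Union

Result = Dict[str, Any]


def flag_results(results: Union[Result, List[Result]],
                 toxic_label: int = 1,
                 threshold: float = 0.5) -> List[Result]:
    """Keep results whose probability for `toxic_label` is at least `threshold`.

    Assumes each result has 'text', 'prediction' and 'probabilities' keys, with
    'probabilities' indexable by class label (assumption, not confirmed by the README).
    """
    if isinstance(results, dict):  # a single prediction rather than a batch
        results = [results]
    flagged = []
    for r in results:
        score = float(r["probabilities"][toxic_label])
        if score >= threshold:
            flagged.append({"text": r["text"], "prediction": r["prediction"], "score": score})
    return flagged
```

For example, `flag_results(model.predict(texts, device='cpu'), threshold=0.9)` would keep only high-confidence hits; a higher threshold trades recall for fewer false alarms.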

For more detailed information and data visualization, please refer to our paper (see assets/paper.pdf).

---

## Online Demo

You can try the official online demo here:
[https://huggingface.co/spaces/Albert-CAC/lite_DETECTIVE](https://huggingface.co/spaces/Albert-CAC/lite_DETECTIVE)

Example results from the demo:

1. **96.63% (Positive)**
   - Translation: "I like you"
   - Correctly identified as expressing positive sentiment.

2. **97.82% (Negative)**
   - Translation: "Your mother is dead"
   - Correctly identified as expressing negative sentiment.

3. **91.56% (Negative)**
   - Translation: "Your is gone"
   - Correctly identified as expressing negative sentiment. This might be unfamiliar to non-Chinese speakers, but it is an indirect way of expressing a harsh sentiment.

4. **98.68% (Negative)**
   - Translation: "You idiot"
   - Correctly identified as expressing negative sentiment.

5. **97.89% (Positive)**
   - Translation: "The weather is great today"
   - Correctly identified as expressing positive sentiment.

6. **55.82% (Positive)**
   - Translation: "You're great"
   - Correctly identified as expressing positive sentiment, although with relatively low confidence.


Owner

  • Name: Albert Zhao
  • Login: AlbertZhaoCA
  • Kind: user

A CS sophomore at Kean University. Interested in HCI, NLP, and software architecture.

GitHub Events

Total
  • Watch event: 1
  • Member event: 1
  • Push event: 11
  • Fork event: 1
  • Create event: 2
Last Year
  • Watch event: 1
  • Member event: 1
  • Push event: 11
  • Fork event: 1
  • Create event: 2