https://github.com/albertzhaoca/lite-detective
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: AlbertZhaoCA
- Language: Python
- Default Branch: main
- Size: 4.59 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
https://github.com/AlbertZhaoCA/lite-detective/blob/main/
# LiteDetective
A lightweight Chinese malicious comment detection pipeline.
---
## Features
- Model training and evaluation for toxic comment detection
- Data processing and policy generation
- Console entry points for quick data and policy generation
---
## Quick Start
### 1. Install dependencies
```bash
pip install -r requirements.txt
```
### 2. Data Preparation
- Place your training and test data in the `data/` directory. See `data/` for format examples.
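To check the expected format before adding your own data, it can help to look at the first few records of an example file. The sketch below assumes the files in `data/` are JSON Lines (one JSON object per line), which is suggested by the `.jsonl` paths used by the console commands but not stated for every file. The helper name `peek_jsonl` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def peek_jsonl(path, limit=3):
    """Return up to `limit` records from a JSON Lines file as Python objects."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if len(records) >= limit:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            records.append(json.loads(line))
    return records
```

For example, `peek_jsonl("data/training_data.jsonl")` would return the first three records of the generated training file, if that file exists and is valid JSON Lines.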
### 3. Train the model
```bash
python train.py
```
### 4. Test the model
```bash
python test.py
```
---
## Console Entrypoints
After installation (or in project root):
- Generate policy file
```bash
build-policy --path data/raw --policy_file data/policy.jsonl
```
- Generate training data
```bash
build-train-data --policy_file data/policy.jsonl --output_file data/training_data.jsonl
```
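The two commands are meant to run in order, since the second reads the policy file written by the first. A minimal sketch of chaining them from Python is shown here. It uses only the flags and default paths that appear in the commands above. The function names `pipeline_commands` and `run_pipeline` are hypothetical and not part of this repository.

```python
import subprocess


def pipeline_commands(raw_dir="data/raw",
                      policy_file="data/policy.jsonl",
                      output_file="data/training_data.jsonl"):
    """Build the argument lists for both console commands, in run order."""
    return [
        ["build-policy", "--path", raw_dir, "--policy_file", policy_file],
        ["build-train-data", "--policy_file", policy_file,
         "--output_file", output_file],
    ]


def run_pipeline(**kwargs):
    """Run both commands; stop at the first one that exits with an error."""
    for argv in pipeline_commands(**kwargs):
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(argv, check=True)
```

Calling `run_pipeline()` assumes the console commands are on your `PATH`, for example after installing the package.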
---
## Installation (with setup.py)
You can install LiteDetective as a Python package, which will also enable the command line tools:
```bash
pip install -e .
```
After installation, you can use the following commands anywhere:
- `build-policy`: Generate policy file
- `build-train-data`: Generate training data
---
## Project Structure
- `train.py`, `test.py`: Model training/testing
- `libs/`: Data processing, policy, LLM SDK, etc.
- `models/`: Model definitions
- `data/`: Datasets and generated files
---
## License
MIT License
---
## How to use the predict function
### 1. Download the model weight file
Download `lited_best.pth` from [HuggingFace: Albert-CAC/lite_DETECTIVE](https://huggingface.co/Albert-CAC/lite_DETECTIVE/tree/main) and place it in `./hf_ckpt/`.
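The download step above can also be scripted. The sketch below is one possible approach, not the project's own tooling. It assumes the `huggingface_hub` package is installed (it is not confirmed to be in `requirements.txt`) and that `lited_best.pth` sits at the root of the model repository. The helper names `checkpoint_path` and `download_checkpoint` are hypothetical.

```python
from pathlib import Path

REPO_ID = "Albert-CAC/lite_DETECTIVE"  # model repository linked above
FILENAME = "lited_best.pth"            # weight file named above


def checkpoint_path(local_dir="hf_ckpt"):
    """Return where the weight file is expected on disk."""
    return Path(local_dir) / FILENAME


def download_checkpoint(local_dir="hf_ckpt"):
    """Fetch the weights if they are missing and return the local path."""
    target = checkpoint_path(local_dir)
    if not target.exists():
        # Imported lazily so checkpoint_path() works without the dependency.
        from huggingface_hub import hf_hub_download
        # Assumption: the file is stored at the repository root.
        hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
    return target
```

The returned path can then be passed to `torch.load` in the next step.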
### 2. Load the model
```python
import torch
from models.classifier import ToxicTextClassifier

model = ToxicTextClassifier()
# Load the downloaded weights onto the CPU
state_dict = torch.load('hf_ckpt/lited_best.pth', map_location='cpu')
model.load_state_dict(state_dict)
# Switch to evaluation mode before inference
model.eval()
```
### 3. Run inference with predict
#### Single text without context
```python
result = model.predict('', device='cpu')
print(result)
```
#### Batch without context
```python
texts = ['', '']
results = model.predict(texts, device='cpu')
print(results)
```
#### Batch with context
```python
texts_with_context = [['', ''], ['', '']]
results = model.predict(texts_with_context, device='cpu')
print(results)
```
### 4. Output format
- `text`: the input text
- `prediction`: the predicted class label (0 or 1)
- `probabilities`: the predicted probability for each class
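A short sketch of post-processing these fields follows. It rests on several assumptions that this README does not confirm: that each result is a dict with the three keys listed above, that a batch call returns a list of such dicts, and that `probabilities` can be indexed by class label. Because the meaning of labels 0 and 1 is not stated here, the label to select is a parameter. The helper `select_by_label` and the sample data are hypothetical.

```python
def select_by_label(results, label, min_confidence=0.0):
    """Keep results predicted as `label` with probability >= `min_confidence`."""
    if isinstance(results, dict):
        results = [results]  # accept a single result as well as a batch
    selected = []
    for item in results:
        if item["prediction"] != label:
            continue
        if item["probabilities"][label] >= min_confidence:
            selected.append(item)
    return selected


# Hypothetical results, shaped like the fields listed above.
sample = [
    {"text": "comment A", "prediction": 1, "probabilities": [0.02, 0.98]},
    {"text": "comment B", "prediction": 0, "probabilities": [0.91, 0.09]},
    {"text": "comment C", "prediction": 1, "probabilities": [0.45, 0.55]},
]
confident = select_by_label(sample, label=1, min_confidence=0.9)
print([item["text"] for item in confident])  # prints ['comment A']
```

Filtering on a confidence threshold like this leaves low-confidence predictions out of the selection, so they can be handled separately.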
For more detailed information and data visualization, please refer to our paper (see assets/paper.pdf).
---
## Online Demo
You can try the official online demo here:
[https://huggingface.co/spaces/Albert-CAC/lite_DETECTIVE](https://huggingface.co/spaces/Albert-CAC/lite_DETECTIVE)
Example predictions from the demo (inputs shown in English translation):
1. **"I like you" - 96.63% (Positive)**
   - Correctly identified as expressing positive sentiment.
2. **"Your mother is dead" - 97.82% (Negative)**
   - Correctly identified as expressing negative sentiment.
3. **"Your is gone" - 91.56% (Negative)**
   - Correctly identified as expressing negative sentiment. The phrase may be unfamiliar to non-Chinese speakers, but it is an indirect way of expressing a harsh sentiment.
4. **"You idiot" - 98.68% (Negative)**
   - Correctly identified as expressing negative sentiment.
5. **"The weather is great today" - 97.89% (Positive)**
   - Correctly identified as expressing positive sentiment.
6. **"You're great" - 55.82% (Positive)**
   - Correctly identified as expressing positive sentiment, although the confidence is relatively low.
Owner
- Name: Albert Zhao
- Login: AlbertZhaoCA
- Kind: user
- Repositories: 1
- Profile: https://github.com/AlbertZhaoCA
A CS sophomore at Kean University, interested in HCI, NLP, and software architecture.
GitHub Events
Total
- Watch event: 1
- Member event: 1
- Push event: 11
- Fork event: 1
- Create event: 2
Last Year
- Watch event: 1
- Member event: 1
- Push event: 11
- Fork event: 1
- Create event: 2