bert-sms-classification
My second project in Natural Language Processing (NLP), where I fine-tuned a bert-base-uncased model to classify spam SMS. This is a huge improvement over https://github.com/fzn0x/bert-indonesian-english-hate-comments.
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.9%) to scientific vocabulary
Scientific Fields
Repository
Basic Info
- Host: GitHub
- Owner: fzn0x
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://open.spotify.com/track/6s8WSX1MxNThrot8ThI6fG?si=ee460386b3e54552
- Size: 208 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
A fine-tuned bert-base-uncased model for classifying spam SMS.
My second project in Natural Language Processing (NLP), where I fine-tuned a bert-base-uncased model to classify spam SMS. This is a huge improvement over https://github.com/fzn0x/bert-indonesian-english-hate-comments.
How to use this model?
```py
from transformers import BertTokenizer, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained('fzn0x/bert-spam-classification-model')
model = BertForSequenceClassification.from_pretrained('fzn0x/bert-spam-classification-model')
```
See scripts/predict.py for a full example (you only need to modify the argument of from_pretrained).
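Once the tokenizer and model are loaded, a prediction can be as short as the sketch below. It assumes label index 1 means spam; confirm the mapping against scripts/predict.py:

```py
# Minimal inference sketch; assumes 0 = ham, 1 = spam,
# which may differ from the repo's actual label mapping.
import torch

text = "Congratulations! You won a free prize. Reply WIN to claim."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print("spam" if pred == 1 else "ham")
```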
✅ Install requirements
Install the required dependencies:
```sh
pip install --upgrade pip
pip install -r requirements.txt
```
✅ Add BERT virtual env
Create and activate a virtual environment:
```sh
python -m venv bert-env
source bert-env/bin/activate  # On Windows use: bert-env\Scripts\activate
```
✅ Install CUDA
Check if your GPU supports CUDA:
```sh
nvidia-smi
```
Then install the CUDA-enabled PyTorch build:
```sh
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:False  # PyTorch CUDA allocator setting
```
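To verify that the CUDA build is picked up, a quick check (independent of the repo's check_device.py) can be run:

```py
# Quick sanity check that PyTorch sees the GPU.
import torch

print(torch.__version__)           # expect a +cu121 build, e.g. 2.6.0+cu121
print(torch.cuda.is_available())   # True if the CUDA setup works
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```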
🔧 How to use
- Check your device and CUDA availability:
```sh
python check_device.py
```
:warning: Using the CPU is not advisable; check your CUDA availability first.
- Train the model (a rough sketch of the training flow follows this list):
```sh
python scripts/train.py
```
:warning: Remove unneeded checkpoints in models/pretrained to save storage after training.
- Run prediction:
```sh
python scripts/predict.py
```
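For orientation, here is a minimal sketch of what fine-tuning bert-base-uncased on the spam data can look like with the Hugging Face Trainer API. It is not the repo's actual scripts/train.py, and it assumes the common Kaggle column layout (v1 = label, v2 = text):

```py
# Hypothetical fine-tuning sketch; scripts/train.py in the repo is authoritative.
# Assumes data/spam.csv uses the Kaggle layout: v1 = label (ham/spam), v2 = text.
import pandas as pd
import torch
from torch.utils.data import Dataset
from transformers import (
    BertForSequenceClassification,
    BertTokenizer,
    Trainer,
    TrainingArguments,
)

class SmsDataset(Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

df = pd.read_csv("data/spam.csv", encoding="latin-1")
texts = df["v2"].tolist()
labels = (df["v1"] == "spam").astype(int).tolist()  # 0 = ham, 1 = spam

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="models/pretrained",
        num_train_epochs=2,
        per_device_train_batch_size=16,
    ),
    train_dataset=SmsDataset(texts, labels, tokenizer),
)
trainer.train()
model.save_pretrained("models/fine-tuned")
tokenizer.save_pretrained("models/fine-tuned")
```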
✅ Dataset location: data/spam.csv. Modify the dataset to tailor the model to your needs.
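For example, appending your own labeled messages might look like this (again assuming the Kaggle v1/v2 column layout; the repo's copy may differ):

```py
# Append a custom labeled example to the training data.
# Assumes the Kaggle layout: v1 = label (ham/spam), v2 = message text.
import pandas as pd

df = pd.read_csv("data/spam.csv", encoding="latin-1")
new_row = pd.DataFrame([{"v1": "spam", "v2": "WINNER!! Claim your free prize now"}])
df = pd.concat([df, new_row], ignore_index=True)
df.to_csv("data/spam.csv", index=False)
```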
📚 Citations
If you use this repository or its ideas, please cite the following:
See citations.bib for full BibTeX entries.
- Wolf et al., "Transformers: State-of-the-Art Natural Language Processing", EMNLP 2020 (ACL Anthology).
- Pedregosa et al., "Scikit-learn: Machine Learning in Python", JMLR 2011.
- Almeida & Gómez Hidalgo, "SMS Spam Collection v.1", UCI Machine Learning Repository, 2011 (Kaggle link).
🧠 Credits and Libraries Used
- Hugging Face Transformers – model, tokenizer, and training utilities
- scikit-learn – metrics and preprocessing
- Logging silencing inspired by Hugging Face GitHub discussions
- Dataset from UCI SMS Spam Collection
- Inspiration from Kaggle Notebook by Suyash Khare
License and Usage
Licensed under the MIT License.
Leave a ⭐ if you find this project helpful; contributions are welcome.
Owner
- Name: Fauzan
- Login: fzn0x
- Kind: user
- Repositories: 29
- Profile: https://github.com/fzn0x
Citation (citations.bib)
@inproceedings{wolf-etal-2020-transformers,
title = {Transformers: State-of-the-Art Natural Language Processing},
author = {Wolf, Thomas and Debut, Lysandre and Sanh, Victor and Chaumond, Julien and Delangue, Clement and Moi, Anthony and Cistac, Pierric and Rault, Tim and Louf, Remi and Funtowicz, Morgan and Brew, Jamie},
booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
month = oct,
year = {2020},
publisher = {Association for Computational Linguistics},
pages = {38--45},
url = {https://www.aclweb.org/anthology/2020.emnlp-demos.6}
}
@article{scikit-learn,
title = {Scikit-learn: Machine Learning in Python},
author = {Pedregosa, Fabian and Varoquaux, Ga{\"e}l and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, {\'E}douard},
journal = {Journal of Machine Learning Research},
volume = {12},
pages = {2825--2830},
year = {2011}
}
@misc{smsspamcollection,
author = {Tiago A. Almeida and José María Gómez Hidalgo},
title = {SMS Spam Collection v.1},
year = {2011},
howpublished = {\url{https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset}},
note = {UCI Machine Learning Repository}
}
GitHub Events
Total
- Watch event: 1
- Push event: 5
- Fork event: 1
- Create event: 2
Last Year
- Watch event: 1
- Push event: 5
- Fork event: 1
- Create event: 2
Dependencies
- nltk ==3.8.1
- pandas ==2.0.1
- scikit-learn ==1.2.2
- torch ==2.6.0
- transformers ==4.43.0