https://github.com/agrover112/fasttext-real-or-not-nlp-with-disaster-tweets

Kaggle real or not disaster tweets classification using FastText.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.3%) to scientific vocabulary

Keywords

disaster-tweets fasttext kaggle kaggle-competition machine-learning python3 tweets
Last synced: 6 months ago

Repository

Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 6 years ago · Last pushed almost 6 years ago
Metadata Files
Readme

README.md

fastText-Real-or-Not-NLP-with-Disaster-Tweets

Experimenting with fastText on tweets: predict which tweets are about real disasters and which are not. Training runs on CPU, and the setup is compatible with the Linux command line for autotune validation.
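
As a rough illustration of the command-line autotune run mentioned above, the sketch below assembles a fastText CLI invocation. It assumes the fastText binary has been built and is reachable as `./fasttext`. The file names `train.txt` and `valid.txt`, the output prefix `model_tweets`, and the helper `build_autotune_command` are hypothetical and not taken from the repository. The `-autotune-validation` and `-autotune-duration` flags are fastText CLI options.

```python
import shlex


def build_autotune_command(train_path, valid_path, model_prefix,
                           duration_s=300, binary="./fasttext"):
    """Return the argument list for a supervised fastText run with autotune.

    All paths and the default duration are illustrative assumptions.
    """
    return [
        binary, "supervised",
        "-input", train_path,                # training file in fastText format
        "-output", model_prefix,             # prefix for the saved model files
        "-autotune-validation", valid_path,  # held-out file used to score trials
        "-autotune-duration", str(duration_s),  # search budget in seconds
    ]


cmd = build_autotune_command("train.txt", "valid.txt", "model_tweets")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a Linux terminal
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` once the binary and data files exist.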

Hyperparameters of supervised model:

  • label_prefix: The library needs a prefix added to classification labels.
  • lr: The learning rate; the default (0.1) works well.
  • neg: Number of negative samples; 2 < neg < 6.
  • epoch: 5, 10, 15.
  • dim: 128 and 256 perform very well.
  • loss: softmax takes a bit longer; hs (hierarchical softmax) is good too; ns does not score well.
  • word_ngrams: 2 = bigrams, 3 = trigrams; in this case limit it to 2, as per the original paper and the observed score.
  • ws: Size of the context window; the average sentence length here is not large, so 3 was chosen based on experiments.
  • bucket: Hash length (number of buckets used to hash n-grams).
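
To make the label prefix concrete: fastText's supervised mode reads one example per line, with each label marked by a prefix (`__label__` by default). The sketch below converts CSV rows into that format. It assumes the Kaggle data has `text` and `target` columns. The helper names and the two sample rows are made up for illustration and are not from the repository.

```python
import csv
import io

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(text, target, prefix=LABEL_PREFIX):
    """Format one example as '<prefix><label> <text>' on a single line."""
    # fastText treats each line as one example, so collapse newlines
    # and repeated whitespace inside the tweet.
    clean = " ".join(text.split())
    return f"{prefix}{target} {clean}"


def convert(csv_file, out_file):
    """Write each CSV row as one fastText training line; return the row count."""
    count = 0
    for row in csv.DictReader(csv_file):
        out_file.write(to_fasttext_line(row["text"], row["target"]) + "\n")
        count += 1
    return count


# Tiny in-memory sample standing in for train.csv (assumed column names).
sample = io.StringIO(
    "id,text,target\n"
    "1,Forest fire near La Ronge,1\n"
    "2,I love fruits,0\n"
)
out = io.StringIO()
n_rows = convert(sample, out)
print(out.getvalue())
```

With the official `fasttext` Python package, a file in this format would typically be passed as `input` to `fasttext.train_supervised`, where the equivalent parameter names are `label` and `wordNgrams`. The names in the list above appear to come from an older binding.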

Note: The task above can also be tackled with TF-IDF, SVM, BERT, GloVe, LSTM, etc., but training time is much higher and may require a strong GPU.

Essential Reads:

  • PythonDocs
  • Enriching Word Vectors with Subword Information
  • Bag of Tricks for Efficient Text Classification
  • Open-sourcing hyperparameter autotuning for fastText
  • English word vectors
  • Original Paper Presentation

Owner

  • Login: Agrover112
  • Kind: user

Humans trying to understand machines and people.

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 14
  • Total Committers: 1
  • Avg Commits per committer: 14.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Agrover112 4****2 14

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0