https://github.com/agrover112/fasttext-real-or-not-nlp-with-disaster-tweets
Kaggle real or not disaster tweets classification using FastText.
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.3%) to scientific vocabulary
Repository
Kaggle real or not disaster tweets classification using FastText.
Basic Info
- Host: GitHub
- Owner: Agrover112
- Language: Jupyter Notebook
- Default Branch: master
- Homepage: https://www.kaggle.com/c/nlp-getting-started
- Size: 497 KB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
fastText-Real-or-Not-NLP-with-Disaster-Tweets
Experimenting with fastText on tweets: predict which tweets are about real disasters and which are not. Trains on CPU and is compatible with the Linux command line for autotune validation.
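Both training and autotune validation read fastText's plain-text format: one example per line, with the label marked by a prefix (`__label__` is the library default). The sketch below shows one way to produce such lines. It assumes each tweet comes as a `(text, target)` pair with a 0/1 target; the helper names and the example rows are hypothetical and are not taken from the notebook or the competition data.

```python
import re

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(text, target, prefix=LABEL_PREFIX):
    """Return one example as '<prefix><target> <single-line text>'."""
    # fastText reads one example per line, so collapse newlines and
    # repeated whitespace inside the tweet into single spaces.
    cleaned = re.sub(r"\s+", " ", text).strip()
    return f"{prefix}{target} {cleaned}"


def write_fasttext_file(rows, path):
    """Write (text, target) pairs to a file, one formatted example per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for text, target in rows:
            handle.write(to_fasttext_line(text, target) + "\n")


# Made-up rows for illustration only.
rows = [
    ("Forest fire near the highway,\nresidents evacuated", 1),
    ("This new album is fire", 0),
]
lines = [to_fasttext_line(text, target) for text, target in rows]
```

Calling `write_fasttext_file` once for a training split and once for a held-out split would give the two files that training and autotune validation need.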
Hyperparameters of supervised model:
- label_prefix: The library needs a prefix to be added to classification labels.
- lr: The learning rate. The default (0.1) works well.
- neg: Number of negative samples; 2 < neg < 6.
- epoch: 5, 10, 15.
- dim: 128 and 256 perform very well.
- loss: softmax takes a bit longer; hs (hierarchical softmax) is good too; ns (negative sampling) does not score well.
- word_ngrams: 2 = bigrams, 3 = trigrams. Here it is limited to 2, following the original paper and the observed scores.
- ws: Size of the context window. The average sentence length here is not large, so 3 was chosen based on experiments.
- bucket: Number of hash buckets.
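As a rough illustration of how the hyperparameters above fit together, here is a minimal sketch that collects them under the argument names of `fasttext.train_supervised` from the official Python module. The README's names (`label_prefix`, `word_ngrams`) appear to come from an older Python wrapper, so the mapping to `label` and `wordNgrams` is an assumption, as are the specific values picked from the listed ranges, the `bucket` value (the library default), and the helper names.

```python
def build_params(train_file):
    """Collect the listed hyperparameters as train_supervised keyword arguments."""
    return {
        "input": train_file,
        "label": "__label__",   # README: label_prefix
        "lr": 0.1,              # default learning rate
        "neg": 5,               # one value inside 2 < neg < 6
        "epoch": 10,            # one of 5, 10, 15
        "dim": 128,             # 128 or 256
        "loss": "hs",           # hierarchical softmax
        "wordNgrams": 2,        # README: word_ngrams (bigrams)
        "ws": 3,                # context window size
        "bucket": 2_000_000,    # assumed: library default
    }


def build_autotune_params(train_file, valid_file, seconds=300):
    """Alternative: let fastText search hyperparameters against a validation file."""
    return {
        "input": train_file,
        "autotuneValidationFile": valid_file,
        "autotuneDuration": seconds,  # search budget in seconds
    }


def train(params):
    """Train a supervised model; requires the `fasttext` package."""
    import fasttext  # imported lazily so the helpers above work without it

    return fasttext.train_supervised(**params)
```

With the package installed, `model = train(build_params("tweets.train"))` followed by `model.test("tweets.valid")` would return the number of samples, precision, and recall on the held-out file. Both file names are placeholders.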
Note: The above task can also be completed using TF-IDF, SVM, BERT, GloVe, LSTM, etc., but training time is really high and may require a strong GPU.
Essential Reads:
- PythonDocs
- Enriching Word Vectors with Subword Information
- Bag of Tricks for Efficient Text Classification
- Open-sourcing hyperparameter autotuning for fastText
- English word vectors
- Original Paper Presentation
Owner
- Login: Agrover112
- Kind: user
- Repositories: 113
- Profile: https://github.com/Agrover112
Humans trying to understand machines and people.
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0