https://github.com/chapzq77/textclassification

Text classification using different neural networks (CNN, LSTM, Bi-LSTM, C-LSTM).

https://github.com/chapzq77/textclassification

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.8%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Text classification using different neural networks (CNN, LSTM, Bi-LSTM, C-LSTM).

Basic Info
  • Host: GitHub
  • Owner: chapzq77
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 223 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of zackhy/TextClassification
Created over 7 years ago · Last pushed almost 8 years ago

https://github.com/chapzq77/TextClassification/blob/master/

# Multi-class Text Classification
Implement four neural networks in Tensorflow for multi-class text classification problem.
## Models
* A LSTM classifier. See rnn_classifier.py
* A Bidirectional LSTM classifier. See rnn_classifier.py
* A CNN classifier. See cnn_classifier.py. Reference: [Implementing a CNN for Text Classification in Tensorflow](http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/).
* A C-LSTM classifier. See clstm_classifier.py. Reference: [A C-LSTM Neural Network for Text Classification](https://arxiv.org/abs/1511.08630).
## Requirements  
* Python 3.x  
* Tensorflow > 1.5
* Sklearn > 0.19.0  
## Data Format
Training data should be stored in csv file. The first line of the file should be ["label", "content"] or ["content", "label"].
## Train
Run train.py to train the models.
Parameters:
```
optional arguments:
  --clf CLF             Type of classifiers. Default: cnn. You have four
                        choices: [cnn, lstm, blstm, clstm]
  --data_file DATA_FILE
                        Data file path
  --stop_word_file STOP_WORD_FILE
                        Stop word file path
  --language LANGUAGE   Language of the data file. You have two choices: [ch,
                        en]
  --min_frequency MIN_FREQUENCY
                        Minimal word frequency
  --num_classes NUM_CLASSES
                        Number of classes
  --max_length MAX_LENGTH
                        Max document length
  --vocab_size VOCAB_SIZE
                        Vocabulary size
  --test_size TEST_SIZE
                        Cross validation test size
  --embedding_size EMBEDDING_SIZE
                        Word embedding size. For CNN, C-LSTM.
  --filter_sizes FILTER_SIZES
                        CNN filter sizes. For CNN, C-LSTM.
  --num_filters NUM_FILTERS
                        Number of filters per filter size. For CNN, C-LSTM.
  --hidden_size HIDDEN_SIZE
                        Number of hidden units in the LSTM cell. For LSTM, Bi-
                        LSTM
  --num_layers NUM_LAYERS
                        Number of the LSTM cells. For LSTM, Bi-LSTM, C-LSTM
  --keep_prob KEEP_PROB
                        Dropout keep probability
  --learning_rate LEARNING_RATE
                        Learning rate
  --l2_reg_lambda L2_REG_LAMBDA
                        L2 regularization lambda
  --batch_size BATCH_SIZE
                        Batch size
  --num_epochs NUM_EPOCHS
                        Number of epochs
  --decay_rate DECAY_RATE
                        Learning rate decay rate. Range: (0, 1]
  --decay_steps DECAY_STEPS
                        Learning rate decay steps.
  --evaluate_every_steps EVALUATE_EVERY_STEPS
                        Evaluate the model on validation set after this many
                        steps
  --save_every_steps SAVE_EVERY_STEPS
                        Save the model after this many steps
  --num_checkpoint NUM_CHECKPOINT
                        Number of models to store
```
You could run train.py to start training. For example:
```
python train.py --data_file=./data/data.csv --clf=lstm
```

After the training is done, you can use tensorboard to see the visualizations of the graph, losses and evaluation metrics:  

```
tensorboard --logdir=./runs/1111111111/summaries
```

## Test 
Run test.py to evaluate the trained model  
Parameters: 
```
optional arguments:
  --test_data_file TEST_DATA_FILE
                        Test data file path
  --run_dir RUN_DIR     Restore the model from this run
  --checkpoint CHECKPOINT
                        Restore the graph from this checkpoint
  --batch_size BATCH_SIZE
                        Test batch size
```
You could run test.py to start evaluation. For example:
```
python test.py --test_data_file=./data/data.csv --run_dir=./runs/1111111111 --checkpoint=clf-10000
```

Owner

  • Name: 周奇
  • Login: chapzq77
  • Kind: user

GitHub Events

Total
Last Year