https://github.com/chapzq77/textclassification

Text classification using different neural networks (CNN, LSTM, Bi-LSTM, C-LSTM).

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 6.8%, to scientific vocabulary)

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: chapzq77
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 223 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of zackhy/TextClassification

Created over 7 years ago · Last pushed almost 8 years ago

https://github.com/chapzq77/TextClassification/blob/master/

# Multi-class Text Classification
This repository implements four neural networks in TensorFlow for multi-class text classification.
## Models
* An LSTM classifier. See rnn_classifier.py.
* A bidirectional LSTM (Bi-LSTM) classifier. See rnn_classifier.py.
* A CNN classifier. See cnn_classifier.py. Reference: [Implementing a CNN for Text Classification in Tensorflow](http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/).
* A C-LSTM classifier. See clstm_classifier.py. Reference: [A C-LSTM Neural Network for Text Classification](https://arxiv.org/abs/1511.08630).
## Requirements
* Python 3.x
* TensorFlow > 1.5
* scikit-learn > 0.19.0
## Data Format
Training data should be stored in a CSV file. The first line of the file should be a header row naming the two columns, either ["label", "content"] or ["content", "label"].
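
As a minimal sketch of this layout, the following writes and reads back a small CSV file with a `label,content` header using Python's standard `csv` module. The file name `sample.csv`, the example rows, and the helper names `write_samples` and `read_samples` are assumptions made for illustration; they are not part of the repository.

```python
import csv

def write_samples(path, rows):
    """Write (label, content) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "content"])  # header expected on the first line
        writer.writerows(rows)  # commas inside content are quoted automatically

def read_samples(path):
    """Read the file back; DictReader keys on the header, so column order does not matter."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["label"], row["content"]) for row in csv.DictReader(f)]

if __name__ == "__main__":
    rows = [("sports", "The match ended in a draw."),
            ("tech", "The new phone ships next month, reviewers say.")]
    write_samples("sample.csv", rows)
    print(read_samples("sample.csv"))
```

Using the `csv` module rather than manual string joins keeps documents that contain commas or quotes intact.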
## Train
Run train.py to train the models.
Parameters:
```
optional arguments:
  --clf CLF             Type of classifiers. Default: cnn. You have four
                        choices: [cnn, lstm, blstm, clstm]
  --data_file DATA_FILE
                        Data file path
  --stop_word_file STOP_WORD_FILE
                        Stop word file path
  --language LANGUAGE   Language of the data file. You have two choices: [ch,
                        en]
  --min_frequency MIN_FREQUENCY
                        Minimal word frequency
  --num_classes NUM_CLASSES
                        Number of classes
  --max_length MAX_LENGTH
                        Max document length
  --vocab_size VOCAB_SIZE
                        Vocabulary size
  --test_size TEST_SIZE
                        Cross validation test size
  --embedding_size EMBEDDING_SIZE
                        Word embedding size. For CNN, C-LSTM.
  --filter_sizes FILTER_SIZES
                        CNN filter sizes. For CNN, C-LSTM.
  --num_filters NUM_FILTERS
                        Number of filters per filter size. For CNN, C-LSTM.
  --hidden_size HIDDEN_SIZE
                        Number of hidden units in the LSTM cell. For LSTM, Bi-
                        LSTM
  --num_layers NUM_LAYERS
                        Number of the LSTM cells. For LSTM, Bi-LSTM, C-LSTM
  --keep_prob KEEP_PROB
                        Dropout keep probability
  --learning_rate LEARNING_RATE
                        Learning rate
  --l2_reg_lambda L2_REG_LAMBDA
                        L2 regularization lambda
  --batch_size BATCH_SIZE
                        Batch size
  --num_epochs NUM_EPOCHS
                        Number of epochs
  --decay_rate DECAY_RATE
                        Learning rate decay rate. Range: (0, 1]
  --decay_steps DECAY_STEPS
                        Learning rate decay steps.
  --evaluate_every_steps EVALUATE_EVERY_STEPS
                        Evaluate the model on validation set after this many
                        steps
  --save_every_steps SAVE_EVERY_STEPS
                        Save the model after this many steps
  --num_checkpoint NUM_CHECKPOINT
                        Number of models to store
```
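
The `--min_frequency`, `--vocab_size`, and `--max_length` options usually interact in a standard way: rare words are dropped, the vocabulary is capped, and every document is truncated or padded to a fixed length. The sketch below illustrates that general idea in plain Python. It is not the repository's preprocessing code; the function names, the whitespace tokenizer, and the reserved ids 0 (padding) and 1 (unknown word) are all assumptions.

```python
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # assumed reserved ids, not taken from the repository

def build_vocab(docs, min_frequency=0, vocab_size=None):
    """Map words to ids, keeping words seen at least min_frequency times."""
    counts = Counter(word for doc in docs for word in doc.split())
    kept = [(w, c) for w, c in counts.items() if c >= min_frequency]
    # Most frequent first; ties are broken alphabetically for determinism.
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    if vocab_size is not None:
        kept = kept[:vocab_size]
    return {w: i + 2 for i, (w, _) in enumerate(kept)}

def encode(doc, vocab, max_length):
    """Convert a document to a list of exactly max_length ids."""
    ids = [vocab.get(w, UNK_ID) for w in doc.split()][:max_length]
    return ids + [PAD_ID] * (max_length - len(ids))

if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat down"]
    vocab = build_vocab(docs, min_frequency=2)
    print(vocab)
    print(encode("the cat sat down", vocab, max_length=6))
```

Fixed-length id sequences are what allow documents to be stacked into batches of the size given by `--batch_size`.
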
For example, to train the LSTM classifier on ./data/data.csv:
```
python train.py --data_file=./data/data.csv --clf=lstm
```
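
The `--learning_rate`, `--decay_rate`, and `--decay_steps` options suggest an exponential decay schedule. Assuming the common form `learning_rate * decay_rate ** (step / decay_steps)`, as used by TensorFlow's `tf.train.exponential_decay`, the effective learning rate at a given step can be sketched as below. This README does not say whether train.py uses the continuous or the staircase variant, so the `staircase` flag and the numeric values are illustrative assumptions.

```python
def decayed_learning_rate(learning_rate, step, decay_steps, decay_rate, staircase=False):
    """Exponential decay: learning_rate * decay_rate ** (step / decay_steps)."""
    if not 0 < decay_rate <= 1:
        raise ValueError("decay_rate must be in (0, 1]")
    # Staircase mode uses integer division, so the rate drops in discrete jumps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return learning_rate * decay_rate ** exponent

if __name__ == "__main__":
    for step in (0, 500, 1000, 2000):
        print(step, decayed_learning_rate(1e-3, step, decay_steps=1000, decay_rate=0.5))
```

With `decay_rate` set to 1, the upper end of the documented range, the learning rate stays constant throughout training.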

After training finishes, you can use TensorBoard to visualize the graph, losses, and evaluation metrics (replace 1111111111 with the name of your run directory):

```
tensorboard --logdir=./runs/1111111111/summaries
```
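
The run directory `1111111111` in the command above is a placeholder. Assuming each training run is written to a numerically named subdirectory of `./runs` (for example a Unix timestamp), which this README implies but does not state, a small helper such as the hypothetical `latest_run` below can locate the newest run to pass to TensorBoard.

```python
from pathlib import Path

def latest_run(runs_dir="./runs"):
    """Return the numerically largest run directory, or None if there is none."""
    root = Path(runs_dir)
    if not root.is_dir():
        return None
    # Assumption: run directories have purely numeric names.
    runs = [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    return max(runs, key=lambda p: int(p.name), default=None)

if __name__ == "__main__":
    run = latest_run()
    if run is None:
        print("No runs found")
    else:
        print(f"tensorboard --logdir={run / 'summaries'}")
```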

## Test
Run test.py to evaluate a trained model.
Parameters:
```
optional arguments:
  --test_data_file TEST_DATA_FILE
                        Test data file path
  --run_dir RUN_DIR     Restore the model from this run
  --checkpoint CHECKPOINT
                        Restore the graph from this checkpoint
  --batch_size BATCH_SIZE
                        Test batch size
```
For example, to evaluate the checkpoint clf-10000 from a previous run:
```
python test.py --test_data_file=./data/data.csv --run_dir=./runs/1111111111 --checkpoint=clf-10000
```
