band

BAND: BERT Application aNd Deployment, a simple and efficient BERT model training and deployment framework.

https://github.com/sunyancn/band

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.4%) to scientific vocabulary

Keywords

bert named-entity-recognition question-answering reading-comprehension sequence-labeling text-classification transformer

Keywords from Contributors

interactive serializer packaging network-simulation hacking autograding observability embedded optim standardization
Last synced: 6 months ago

Repository

BAND: BERT Application aNd Deployment, a simple and efficient BERT model training and deployment framework.

Basic Info
  • Host: GitHub
  • Owner: SunYanCN
  • License: apache-2.0
  • Language: JavaScript
  • Default Branch: master
  • Homepage:
  • Size: 2.42 MB
Statistics
  • Stars: 6
  • Watchers: 2
  • Forks: 1
  • Open Issues: 3
  • Releases: 0
Topics
bert named-entity-recognition question-answering reading-comprehension sequence-labeling text-classification transformer
Created about 6 years ago · Last pushed over 2 years ago
Metadata Files
Readme Contributing Funding License

README.md

BAND: BERT Application aNd Deployment

A simple and efficient BERT model training and deployment framework.


What is it

**Encoding/Embedding** is an upstream task that encodes any input, whether text, image, audio, video, or transactional data, into a fixed-length vector. Embeddings are very popular in NLP, and researchers have proposed a variety of embedding models in recent years; some well-known ones are BERT, XLNet, and word2vec. The goal of this repo is to build a one-stop solution for all available embedding techniques: it starts with popular text embeddings, with the aim of later adding techniques for image, audio, and video inputs as well. **Finally**, **`embedding-as-service`** helps you encode any given text into a fixed-length vector using the supported embeddings and models.
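To make the idea concrete, here is a minimal sketch of turning a sentence into a fixed-length vector. It uses the plain `transformers` TF API with the public `bert-base-uncased` checkpoint rather than a band-specific call, and mean-pools the final hidden states; treat it as an illustration, not band's own encoding path:

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

# Load a public BERT checkpoint (downloads on first use).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

# Encode a sentence; the model returns one hidden state per token.
inputs = tokenizer.encode_plus("a simple sentence", return_tensors="tf")
sequence_output = model(inputs)[0]                   # (1, seq_len, 768)

# Mean-pool over tokens: any-length text in, one fixed-length vector out.
embedding = tf.reduce_mean(sequence_output, axis=1)  # (1, 768)
print(embedding.shape)
```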

💾 Installation

Install band via pip:

```bash
$ pip install band -U
```

Note that the code MUST be run on Python >= 3.6; the module does not support Python 2.
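If in doubt, a one-line guard (not part of band itself) makes the requirement explicit:

```python
import sys

# band targets Python 3.6+; fail fast on older interpreters.
assert sys.version_info >= (3, 6), "band requires Python >= 3.6"
```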

⚡ Getting Started

Text Classification Example

```python
import time

import tensorflow as tf
from transformers import BertConfig, BertTokenizer

from band.model import TFBertForSequenceClassification
from band.dataset import ChnSentiCorp
from band.progress import classification_convert_examples_to_features

USE_XLA = False
USE_AMP = False

EPOCHS = 1
BATCH_SIZE = 16
EVAL_BATCH_SIZE = 16
TEST_BATCH_SIZE = 1
MAX_SEQ_LEN = 128
LEARNING_RATE = 3e-5
SAVE_MODEL = False
pretrained_dir = "/home/band/models"

tf.config.optimizer.set_jit(USE_XLA)
tf.config.optimizer.set_experimental_options({"auto_mixed_precision": USE_AMP})

# Download the ChnSentiCorp sentiment dataset and print its statistics.
dataset = ChnSentiCorp(save_path="/tmp/band")
data, label = dataset.data, dataset.label
dataset.dataset_information()

train_number, eval_number, test_number = (dataset.train_examples_num,
                                          dataset.eval_examples_num,
                                          dataset.test_examples_num)

# Convert raw examples into BERT input features.
tokenizer = BertTokenizer.from_pretrained(pretrained_dir)
train_dataset = classification_convert_examples_to_features(
    data['train'], tokenizer, max_length=MAX_SEQ_LEN, label_list=label, output_mode="classification")
valid_dataset = classification_convert_examples_to_features(
    data['validation'], tokenizer, max_length=MAX_SEQ_LEN, label_list=label, output_mode="classification")
test_dataset = classification_convert_examples_to_features(
    data['test'], tokenizer, max_length=MAX_SEQ_LEN, label_list=label, output_mode="classification")

# Build the tf.data input pipelines.
train_dataset = train_dataset.shuffle(100).batch(BATCH_SIZE, drop_remainder=True).repeat(EPOCHS)
train_dataset = train_dataset.prefetch(tf.data.experimental.AUTOTUNE)
valid_dataset = valid_dataset.batch(EVAL_BATCH_SIZE)
valid_dataset = valid_dataset.prefetch(tf.data.experimental.AUTOTUNE)
test_dataset = test_dataset.batch(TEST_BATCH_SIZE)
test_dataset = test_dataset.prefetch(tf.data.experimental.AUTOTUNE)

# Load the pretrained model (converted from PyTorch weights) and compile it.
config = BertConfig.from_pretrained(pretrained_dir, num_labels=dataset.num_labels)
model = TFBertForSequenceClassification.from_pretrained(pretrained_dir, config=config, from_pt=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE, epsilon=1e-08)
if USE_AMP:
    optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(optimizer, 'dynamic')
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

history = model.fit(train_dataset, epochs=EPOCHS, steps_per_epoch=train_number // BATCH_SIZE,
                    validation_data=valid_dataset, validation_steps=eval_number // EVAL_BATCH_SIZE)

loss, accuracy = model.evaluate(test_dataset, steps=test_number // TEST_BATCH_SIZE)
print(loss, accuracy)

if SAVE_MODEL:
    saved_model_path = "./saved_models/{}".format(int(time.time()))
    model.save(saved_model_path, save_format="tf")
```
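Once trained, the model can classify new text directly. A hedged sketch reusing `tokenizer`, `model`, and `MAX_SEQ_LEN` from the example above; it assumes the model returns logits the way the underlying `transformers` TF models do, and uses the `encode_plus`/`pad_to_max_length` API of the transformers 2.x release pinned in requirements.txt:

```python
import numpy as np

# Encode a single review ("The service at this hotel is great").
inputs = tokenizer.encode_plus("这家酒店的服务很好", max_length=MAX_SEQ_LEN,
                               pad_to_max_length=True, return_tensors="tf")
outputs = model(inputs)
logits = outputs[0] if isinstance(outputs, tuple) else outputs  # (1, num_labels)
print("predicted label id:", int(np.argmax(logits, axis=-1)[0]))
```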

Named Entity Recognition

```python
import time

import tensorflow as tf
from transformers import BertTokenizer, BertConfig

from band.dataset import MSRA_NER
from band.seqeval.callbacks import F1Metrics
from band.model import TFBertForTokenClassification
from band.utils import TrainConfig
from band.progress import NERDataset

pretrained_dir = '/home/band/models'

train_config = TrainConfig(epochs=3, train_batch_size=32, eval_batch_size=32, test_batch_size=1,
                           max_length=128, learning_rate=3e-5, save_model=False)

# Download the MSRA NER sequence-labeling dataset.
dataset = MSRA_NER(save_path="/tmp/band")

config = BertConfig.from_pretrained(pretrained_dir, num_labels=dataset.num_labels, return_unused_kwargs=True)
tokenizer = BertTokenizer.from_pretrained(pretrained_dir)
model = TFBertForTokenClassification.from_pretrained(pretrained_dir, config=config, from_pt=True)

# NERDataset wraps tokenization, batching, and a default optimizer/loss/metric.
ner = NERDataset(dataset=dataset, tokenizer=tokenizer, train_config=train_config)
model.compile(optimizer=ner.optimizer, loss=ner.loss, metrics=[ner.metric])

# Report entity-level F1 on the validation set after each epoch.
f1 = F1Metrics(dataset.get_labels(), validation_data=ner.valid_dataset, steps=ner.valid_steps)

# steps_per_epoch assumes NERDataset exposes train_steps; the upstream snippet passed test_steps here.
history = model.fit(ner.train_dataset, epochs=train_config.epochs,
                    steps_per_epoch=ner.train_steps, callbacks=[f1])

loss, accuracy = model.evaluate(ner.test_dataset, steps=ner.test_steps)

if train_config.save_model:
    saved_model_path = "./saved_models/{}".format(int(time.time()))
    model.save(saved_model_path, save_format="tf")
```
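For tagging a new sentence, a hedged sketch along the same lines (assuming per-token logits from the model and that `dataset.get_labels()` lists labels in id order, as the `F1Metrics` call above suggests):

```python
import numpy as np

labels = dataset.get_labels()
text = "我在武汉上学"  # "I study in Wuhan"; Chinese BERT tokenizes roughly per character
tokens = tokenizer.tokenize(text)
inputs = tokenizer.encode_plus(text, max_length=128,
                               pad_to_max_length=True, return_tensors="tf")
outputs = model(inputs)
logits = outputs[0] if isinstance(outputs, tuple) else outputs  # (1, seq_len, num_labels)
pred_ids = np.argmax(logits, axis=-1)[0]

# Position 0 is [CLS]; align the remaining predictions with the tokens.
for token, label_id in zip(tokens, pred_ids[1:len(tokens) + 1]):
    print(token, labels[label_id])
```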

Question Answering

```python
import time

import tensorflow as tf
from transformers import BertConfig, BertTokenizer

from band.model import TFBertForQuestionAnswering
from band.dataset import Squad
from band.progress import squad_convert_examples_to_features, parallel_squad_convert_examples_to_features

USE_XLA = False
USE_AMP = False

EPOCHS = 1
BATCH_SIZE = 4
EVAL_BATCH_SIZE = 4
TEST_BATCH_SIZE = 1
MAX_SEQ_LEN = 128
LEARNING_RATE = 3e-5
SAVE_MODEL = False
pretrained_dir = "/home/band/models"

tf.config.optimizer.set_jit(USE_XLA)
tf.config.optimizer.set_experimental_options({"auto_mixed_precision": USE_AMP})

# Download SQuAD.
dataset = Squad(save_path="/tmp/band")
data, label = dataset.data, dataset.label

train_number, eval_number = dataset.train_examples_num, dataset.eval_examples_num

# Convert (question, context) pairs into span-prediction features in parallel.
tokenizer = BertTokenizer.from_pretrained(pretrained_dir)
train_dataset = parallel_squad_convert_examples_to_features(
    data['train'], tokenizer, max_length=MAX_SEQ_LEN, doc_stride=128, is_training=True, max_query_length=64)
valid_dataset = parallel_squad_convert_examples_to_features(
    data['validation'], tokenizer, max_length=MAX_SEQ_LEN, doc_stride=128, is_training=False, max_query_length=64)

train_dataset = train_dataset.shuffle(100).batch(BATCH_SIZE, drop_remainder=True).repeat(EPOCHS)
train_dataset = train_dataset.prefetch(tf.data.experimental.AUTOTUNE)
valid_dataset = valid_dataset.batch(EVAL_BATCH_SIZE)
valid_dataset = valid_dataset.prefetch(tf.data.experimental.AUTOTUNE)

config = BertConfig.from_pretrained(pretrained_dir)
model = TFBertForQuestionAnswering.from_pretrained(pretrained_dir, config=config, from_pt=True,
                                                   max_length=MAX_SEQ_LEN)

print(model.summary())

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE, epsilon=1e-08)
if USE_AMP:
    optimizer = tf.keras.mixed_precision.experimental.LossScaleOptimizer(optimizer, 'dynamic')

# One classification head per answer-span boundary: start position and end position.
loss = {'start_position': tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        'end_position': tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)}
metrics = {'start_position': tf.keras.metrics.SparseCategoricalAccuracy('accuracy'),
           'end_position': tf.keras.metrics.SparseCategoricalAccuracy('accuracy')}

model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

history = model.fit(train_dataset, epochs=EPOCHS, steps_per_epoch=train_number // BATCH_SIZE,
                    validation_data=valid_dataset, validation_steps=eval_number // EVAL_BATCH_SIZE)

if SAVE_MODEL:
    saved_model_path = "./saved_models/{}".format(int(time.time()))
    model.save(saved_model_path, save_format="tf")
```
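At inference time, the answer span is read off the start/end logits. A hedged sketch (assuming the model returns the two logit tensors in start, end order, matching the `start_position`/`end_position` heads compiled above):

```python
import numpy as np

question = "Where is the Eiffel Tower?"
context = "The Eiffel Tower is located in Paris, France."
inputs = tokenizer.encode_plus(question, context, max_length=MAX_SEQ_LEN,
                               pad_to_max_length=True, return_tensors="tf")

start_logits, end_logits = model(inputs)
start = int(np.argmax(start_logits, axis=-1)[0])
end = int(np.argmax(end_logits, axis=-1)[0])

# Decode the predicted token span back to text.
token_ids = inputs["input_ids"].numpy()[0]
print(tokenizer.decode(token_ids[start:end + 1]))
```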

Dataset

For more information about the supported datasets, see the table below.

| Dataset Name | Language | Task                          | Description                |
| :----------: | :------: | :---------------------------: | :------------------------: |
| ChnSentiCorp | CN       | Text Classification           | Binary Classification      |
| LCQMC        | CN       | Question Answer Match         | Binary Classification      |
| MSRA_NER     | CN       | Named Entity Recognition      | Sequence Labeling          |
| Toxic        | EN       | Text Classification           | Multi-label Classification |
| Thucnews     | CN       | Text Classification           | Multi-class Classification |
| SQUAD        | EN       | Machine Reading Comprehension | Span                       |
| DRCD         | CN       | Machine Reading Comprehension | Span                       |
| CMRC         | CN       | Machine Reading Comprehension | Span                       |
| GLUE         | EN       |                               |                            |

✅ Supported Embeddings and Models

For more information about pretrained models, see the project documentation.

Stargazers over time

Owner

  • Name: SunYan
  • Login: SunYanCN
  • Kind: user
  • Location: WuHan
  • Company: HSUT
  • Bio: Smile Like Sunshine


Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 25
  • Total Committers: 3
  • Avg Commits per committer: 8.333
  • Development Distribution Score (DDS): 0.08
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
SunYan 4****N 23
SunYan 4****n 1
dependabot[bot] 4****] 1
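For reference, the DDS figures above are consistent with defining DDS as one minus the top committer's share of commits (an assumption about how this site computes it):

```python
# All-time: 23 of 25 commits come from the top committer.
dds = 1 - 23 / 25
print(round(dds, 2))  # 0.08
```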

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 12
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 months
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.67
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 12
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (12)
Top Labels
Issue Labels
Pull Request Labels
dependencies (12)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 225 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 3
  • Total versions: 14
  • Total maintainers: 1
pypi.org: band

BERT Application

  • Versions: 14
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 225 Last month
Rankings
Dependent repos count: 9.0%
Dependent packages count: 10.0%
Average: 15.6%
Downloads: 16.0%
Stargazers count: 20.3%
Forks count: 22.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • h5py *
  • jieba ==0.39
  • nltk ==3.4.5
  • numpy ==1.16.4
  • pandas ==0.23.4
  • prettytable *
  • scikit-learn >=0.21.1
  • six *
  • tabulator ==1.30.0
  • tensorflow ==2.0.1
  • tqdm *
  • transformers ==2.2.0
setup.py pypi