https://github.com/asyml/texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Science Score: 20.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    3 of 27 committers (11.1%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.6%) to scientific vocabulary
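The page does not publish how these indicators are weighted. The sketch below is a purely hypothetical illustration of how boolean indicators and the vocabulary-similarity signal might be combined into a single percentage; all indicator names and weights here are invented for illustration.

```python
# Hypothetical only: the real scoring weights are not published on this page.
def science_score(indicators: dict, vocab_similarity: float) -> float:
    """Combine boolean indicators with a vocabulary-similarity signal in [0, 1]."""
    weights = {  # invented weights, for illustration
        "citation_cff": 0.15, "codemeta_json": 0.10, "zenodo_json": 0.10,
        "doi_references": 0.15, "academic_links": 0.10,
        "academic_committers": 0.10, "institutional_owner": 0.05,
        "joss_metadata": 0.10,
    }
    score = sum(w for name, w in weights.items() if indicators.get(name, False))
    score += 0.15 * vocab_similarity  # remaining weight on vocabulary similarity
    return 100 * score

# E.g., arXiv links plus some academic committers, 16.6% vocabulary similarity:
print(science_score({"academic_links": True, "academic_committers": True}, 0.166))
```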

Keywords

bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python pytorch roberta texar texar-pytorch text-data text-generation xlnet

Keywords from Contributors

information-retrieval natural-language
Last synced: 6 months ago

Repository

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Basic Info
  • Host: GitHub
  • Owner: asyml
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage: https://asyml.io
  • Size: 3.08 MB
Statistics
  • Stars: 747
  • Watchers: 24
  • Forks: 113
  • Open Issues: 35
  • Releases: 0
Topics
bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python pytorch roberta texar texar-pytorch text-data text-generation xlnet
Created almost 7 years ago · Last pushed almost 4 years ago
Metadata Files
Readme · Changelog · License

README.md

[Badges: PyPI · Python Build · codecov · Documentation Status · License]

Texar-PyTorch is a toolkit aiming to support a broad set of machine learning tasks, especially natural language processing and text generation. Texar provides a library of easy-to-use ML modules and functionalities for composing arbitrary models and algorithms. The tool is designed for both researchers and practitioners, enabling fast prototyping and experimentation. Texar-PyTorch was originally developed, and is actively maintained, by Petuum and CMU in collaboration with other institutes. A mirror of this repository is maintained by Petuum Open Source.

Texar-PyTorch integrates many of the best features of TensorFlow into PyTorch, delivering highly usable and customizable modules superior to PyTorch's native ones.

Key Features

  • Two Versions, (Mostly) Same Interfaces. Texar-PyTorch (this repo) and Texar-TF have mostly the same interfaces. Both combine the best design elements of TF and PyTorch:
    • Interfaces and variable sharing in the PyTorch convention
    • Excellent factorization and rich functionality in the TF convention
  • Versatile, supporting a broad range of needs:
    • data processing, model architectures, loss functions, training and inference algorithms, evaluation, ...
    • encoder(s) to decoder(s), sequential- and self-attention, memory, hierarchical models, classifiers, ...
    • maximum likelihood learning, reinforcement learning, adversarial learning, probabilistic modeling, ...
  • Fully Customizable at multiple abstraction levels -- both novice-friendly and expert-friendly.
    • Plug in arbitrary external modules, since Texar is fully compatible with the native PyTorch APIs.
  • Modularized for maximal re-use and clean APIs, based on a principled decomposition of Learning-Inference-Model Architecture.
  • Rich Pre-trained Models, Rich Usage with Uniform Interfaces. BERT, GPT-2, XLNet, etc., for encoding, classification, generation, and composing complex models with other Texar components (see the sketch after this list).
  • Clean, detailed documentation and rich examples.
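For instance, the pre-trained modules share a uniform construction pattern: each is built by model name. A minimal sketch (assuming network access for the one-time weight download; the dummy tensors below are for illustration only):

```python
import torch
import texar.torch as tx

# All pre-trained modules are constructed the same way, by model name;
# weights are downloaded and cached on first use.
encoder = tx.modules.BERTEncoder(pretrained_model_name="bert-base-uncased")

token_ids = torch.randint(0, 30000, (2, 16))  # dummy batch of token ids
lengths = torch.tensor([16, 12])              # per-example sequence lengths
outputs, pooled = encoder(inputs=token_ids, sequence_length=lengths)
print(outputs.shape)  # (batch_size, seq_len, hidden_dim)
```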

Library API Example

The following example builds and trains a conditional GPT-2 model (e.g., for machine translation or text summarization):

```python
import torch
from torch import nn

import texar.torch as tx
from texar.torch.run import *

# Note: `emb_hparams`, `enc_hparams`, `data_hparams`, and `BOS` are assumed
# to be defined elsewhere (e.g., loaded from configuration files).

# (1) Modeling
class ConditionalGPT2Model(nn.Module):
    """An encoder-decoder model with GPT-2 as the decoder."""
    def __init__(self, vocab_size):
        super().__init__()
        # Use hyperparameter dicts for model configuration
        self.embedder = tx.modules.WordEmbedder(vocab_size, hparams=emb_hparams)
        self.encoder = tx.modules.TransformerEncoder(hparams=enc_hparams)
        self.decoder = tx.modules.GPT2Decoder("gpt2-small")  # With pre-trained weights

    def _get_decoder_output(self, batch, train=True):
        """Perform model inference, i.e., decoding."""
        enc_states = self.encoder(inputs=self.embedder(batch['source_text_ids']),
                                  sequence_length=batch['source_length'])
        if train:  # Teacher-forcing decoding at training time
            return self.decoder(
                inputs=batch['target_text_ids'],
                sequence_length=batch['target_length'] - 1,
                memory=enc_states,
                memory_sequence_length=batch['source_length'])
        else:  # Beam search decoding at prediction time
            start_tokens = torch.full_like(batch['source_text_ids'][:, 0], BOS)
            return self.decoder(
                beam_width=5, start_tokens=start_tokens,
                memory=enc_states,
                memory_sequence_length=batch['source_length'])

    def forward(self, batch):
        """Compute training loss."""
        outputs = self._get_decoder_output(batch)
        loss = tx.losses.sequence_sparse_softmax_cross_entropy(  # Sequence loss
            labels=batch['target_text_ids'][:, 1:],
            logits=outputs.logits,
            sequence_length=batch['target_length'] - 1)  # Automatic masking
        return {"loss": loss}

    def predict(self, batch):
        """Compute model predictions."""
        sequence, _ = self._get_decoder_output(batch, train=False)
        return {"gen_text_ids": sequence}

# (2) Data
# Create dataset splits using built-in data loaders
datasets = {split: tx.data.PairedTextData(hparams=data_hparams[split])
            for split in ["train", "valid", "test"]}
model = ConditionalGPT2Model(datasets["train"].target_vocab.size)

# (3) Training
# Manage the train-eval loop with the Executor API
executor = Executor(
    model=model, datasets=datasets,
    optimizer={"type": torch.optim.Adam, "kwargs": {"lr": 5e-4}},
    stop_training_on=cond.epoch(20),
    log_every=cond.iteration(100),
    validate_every=cond.epoch(1),
    train_metric=("loss", metric.RunningAverage(10, pred_name="loss")),
    valid_metric=metric.BLEU(pred_name="gen_text_ids",
                             label_name="target_text_ids"),
    save_every=cond.validation(better=True),
    checkpoint_dir="outputs/saved_models/")
executor.train()
executor.test(datasets["test"])
```

Many more examples are available in the `examples` directory of the repository.

Installation

Texar-PyTorch requires:

  • python == 3.6 or 3.7
  • torch >= 1.0.0. Please follow the official instructions to install the appropriate version.

After torch is installed, install Texar from PyPI:

```bash
pip install texar-pytorch
```

To use cutting-edge features or develop locally, install from source:

```bash
git clone https://github.com/asyml/texar-pytorch.git
cd texar-pytorch
pip install .
```

To enable TensorBoard support in Executor, install tensorboardX:

```bash
pip install tensorboardX
```
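After installation, a quick sanity check can confirm the package imports and builds a module. A minimal sketch (`WordEmbedder` needs no downloaded weights, unlike the pre-trained modules):

```python
# Minimal post-install sanity check: import the package and build a small module.
import texar.torch as tx

embedder = tx.modules.WordEmbedder(vocab_size=100)
print(embedder.dim)  # embedding dimension from the default hparams
```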

Getting Started

Reference

If you use Texar, please cite the tech report with the following BibTeX entry:

```
Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao,
Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang,
Wanrong Zhu, Devendra Sachan and Eric Xing
ACL 2019

@inproceedings{hu2019texar,
  title={Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation},
  author={Hu, Zhiting and Shi, Haoran and Tan, Bowen and Wang, Wentao and
          Yang, Zichao and Zhao, Tiancheng and He, Junxian and Qin, Lianhui and
          Wang, Di and others},
  booktitle={ACL 2019, System Demonstrations},
  year={2019}
}
```

License

Apache License 2.0

Companies and Universities Supporting Texar


Owner

  • Name: ASYML
  • Login: asyml
  • Kind: organization

Machine Learning as Machine Assembly, part of the CASL project https://www.casl-project.ai/

GitHub Events

Total
  • Watch event: 3
Last Year
  • Watch event: 3

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 455
  • Total Committers: 27
  • Avg Commits per committer: 16.852
  • Development Distribution Score (DDS): 0.675
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Zecong Hu h****g@g****m 148
Pengzhi Gao p****o@p****m 124
hunterhector h****r@g****m 37
Shibiao Nong s****g@p****m 30
Zhiting Hu z****u@g****m 25
Allen Shi h****7@g****m 24
Avinash a****1@g****m 17
“Avinash” a****u@p****m 9
wanglechuan-gif w****n@e****m 8
Thomas 1****g 4
mylibrar 5****r 4
weiwei718 w****2@a****u 4
Atif Ahmed a****3@g****m 3
ZhitingHu z****u@p****m 3
swapnull7 c****a@y****n 2
Zeya Wang z****e@g****m 2
Bowen Tan t****n@s****n 1
Jun Gao i****n@g****m 1
Omkar Pangarkar o****r@g****m 1
Silver c****e@o****m 1
Zhanyuan Zhang 3****b 1
haoyuLucas 4****3@q****m 1
jennyzhang-petuum 7****m 1
王文涛 w****0@p****n 1
tom t****m@l****m 1
jieralice13 j****n@g****m 1
qinzzz 3****z 1
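As a sanity check, the statistics above are self-consistent, assuming the common definition of DDS as one minus the top committer's share of commits:

```python
# Verify the committer statistics above, assuming the common definition
# DDS = 1 - (commits by top committer / total commits).
total_commits = 455
total_committers = 27
top_committer_commits = 148  # Zecong Hu, per the table above

print(round(total_commits / total_committers, 3))           # 16.852 avg commits/committer
print(round(1 - top_committer_commits / total_commits, 3))  # 0.675 DDS
```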
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 37
  • Total pull requests: 64
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 21
  • Total pull request authors: 19
  • Average comments per issue: 1.19
  • Average comments per pull request: 1.91
  • Merged pull requests: 53
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gpengzhi (9)
  • hunterhector (5)
  • ankitvad (2)
  • mylibrar (2)
  • rafiyajaved (2)
  • tanyuqian (2)
  • hongweizeng (1)
  • chiragraman (1)
  • roemmele (1)
  • mufeiteng (1)
  • li3cmz (1)
  • hadaev8 (1)
  • Codle (1)
  • jennyzhang-petuum (1)
  • pajola (1)
Pull Request Authors
  • gpengzhi (35)
  • mylibrar (5)
  • hunterhector (3)
  • huzecong (3)
  • wanglec (2)
  • swapnull7 (2)
  • ZeyaWang (2)
  • jieralice13 (1)
  • TomNong (1)
  • jennyzhang-petuum (1)
  • qinzzz (1)
  • limberc (1)
  • haoyuLucas (1)
  • imgaojun (1)
  • odp (1)
Top Labels
Issue Labels
enhancement (12), topic: modules (10), question (9), bug (8), topic: examples (7), topic: data (4), topic: executor (3), discussion (2), help wanted (2), good first issue (2), topic: docs (2), priority: high (1), priority: low (1)
Pull Request Labels
bug (1), topic: executor (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 79 last-month
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 14
    (may contain duplicates)
  • Total versions: 11
  • Total maintainers: 1
proxy.golang.org: github.com/asyml/texar-pytorch
  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.8%
Last synced: 6 months ago
pypi.org: texar-pytorch

Toolkit for Machine Learning and Text Generation

  • Versions: 5
  • Dependent Packages: 1
  • Dependent Repositories: 14
  • Downloads: 79 Last month
Rankings
Stargazers count: 2.3%
Dependent repos count: 3.9%
Forks count: 4.3%
Dependent packages count: 4.8%
Average: 8.3%
Downloads: 26.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • Pygments >=2.1.1,<2.5.1
  • asyml-utilities >=0.0.1.dev1
  • funcsigs *
  • mypy_extensions *
  • recommonmark *
  • regex >=2018.01.10
  • sentencepiece >=0.1.8
  • sphinx *
  • sphinx-rtd-theme >=0.2.4
  • sphinxcontrib-spelling *
examples/bert/requirements.txt pypi
  • hyperopt *
  • nni *
  • tensorboardX >=1.8
  • tensorflow *
examples/gpt-2/requirements.txt pypi
  • regex ==2017.4.5
  • tensorflow >=1.12
examples/transformer/requirements.txt pypi
  • sentencepiece *
  • tqdm *
examples/xlnet/requirements.txt pypi
  • numpy >=1.16
  • sentencepiece *
  • tensorflow >=1.8
  • torch >=1.1.0
  • tqdm *
requirements.txt pypi
  • dill >=0.3.3
  • funcsigs >=1.0.2
  • mypy_extensions >=0.4.1
  • nni >=2.0.0
  • numpy >=1.15.4
  • regex >=2018.01.10
  • sentencepiece >=0.1.8
  • six >=1.15
  • torch >=1.0.0
setup.py pypi
  • asyml-utilities >=0.0.1.dev1
  • funcsigs *
  • mypy_extensions *
  • numpy >=1.16.6
  • packaging >=19.0
  • regex >=2018.01.10
  • requests *
  • sentencepiece >=0.1.96
  • six *
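The version constraints above are standard PEP 440 specifiers. As a quick illustration of how they are evaluated, using the `packaging` library (itself a dependency in setup.py); the versions checked below are just examples:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# The Pygments pin from docs/requirements.txt, as a SpecifierSet
spec = SpecifierSet(">=2.1.1,<2.5.1")

print(Version("2.4.2") in spec)  # True: within the allowed range
print(Version("2.5.1") in spec)  # False: the upper bound is exclusive
```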