Recent Releases of language-pretraining

language-pretraining - v2.2.1

  • The sentencepiece algorithm can now be selected
  • create_datasets.py can now use multiprocessing
  • Move ELECTRA model file into models directory
  • Add DeBERTaV3 (alpha) implementation
    • This implementation back-propagates through the generator and the discriminator at the same time
    • In my experiments, models trained with this implementation perform worse than models trained with my DeBERTaV2 implementation
    • The implementation therefore needs to be improved, but I do not have time to work on it
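
As an illustration of the multiprocessing item above, here is a minimal sketch of how a dataset-creation step can spread tokenization over worker processes. It is not the code in create_datasets.py: the names `tokenize_line`, `tokenize_corpus`, and `num_proc` are hypothetical, and the whitespace split stands in for whatever tokenizer is actually configured.

```python
from multiprocessing import Pool


def tokenize_line(line: str) -> list[str]:
    # Placeholder tokenizer (assumption): a plain whitespace split.
    # A real script would call its configured word/subword tokenizer here.
    return line.split()


def tokenize_corpus(lines: list[str], num_proc: int = 1) -> list[list[str]]:
    """Tokenize lines, optionally spreading the work over worker processes."""
    if num_proc <= 1:
        # Serial path: avoids process start-up cost for small inputs.
        return [tokenize_line(line) for line in lines]
    with Pool(processes=num_proc) as pool:
        # Pool.map preserves input order; chunksize batches lines per task
        # so that inter-process communication does not dominate.
        return pool.map(tokenize_line, lines, chunksize=1000)
```

When calling `tokenize_corpus(lines, num_proc=4)` from a script, keep the call under an `if __name__ == "__main__":` guard so that it also works on platforms that start worker processes with spawn.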

- Python
Published by retarfi about 3 years ago

language-pretraining - v2.2.0

Main changes are as follows:

  • jptranstokenizer is now used as the tokenizer
    • This enables other word tokenizers such as Juman++, Sudachi, and spaCy LUW.
  • Dependencies moved from requirements.txt to pyproject.toml
    • This is unstable, especially the PyTorch part, and should be adjusted to your own environment.
    • If you get an error in run_pretraining.py, it may be caused by pydantic. Updating pydantic to the latest version may solve the problem, even though that version may fall outside the declared dependency constraints.
  • Add Pre-mask option
    • To use this option, specify ``--mask_style`` and ``--is_dataset_masked`` in run_pretraining.py.
  • Add DeBERTa and DeBERTaV2
  • Change license from Apache 2.0 to MIT

There are further, more detailed changes.
Please read Readme.md.
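
To illustrate the general idea behind the Pre-mask option, the sketch below masks tokens once, when the dataset is built, instead of re-sampling masks during training. It is an assumption-laden example rather than the repository's implementation: `premask`, `mask_prob`, and `seed` are hypothetical names, and every selected token is simply replaced by `[MASK]` (no random or unchanged replacements).

```python
import random

MASK_TOKEN = "[MASK]"


def premask(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) with mask positions fixed up front."""
    if not tokens:
        return [], []
    # A seeded generator makes the stored masks reproducible.
    rng = random.Random(seed)
    # Mask at least one token, and never more than the sequence holds.
    num_to_mask = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    positions = set(rng.sample(range(len(tokens)), num_to_mask))
    masked = [MASK_TOKEN if i in positions else tok
              for i, tok in enumerate(tokens)]
    # Labels keep the original token only at masked positions, None elsewhere.
    labels = [tok if i in positions else None
              for i, tok in enumerate(tokens)]
    return masked, labels
```

Because the masks are stored with the dataset, every epoch sees the same masked positions; that trades some variety in the training signal for less work per training step.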

- Python
Published by retarfi about 3 years ago

language-pretraining - v2.1.0

- Python
Published by retarfi almost 4 years ago

language-pretraining - v2.0.0

Apply Hugging Face's datasets library: https://github.com/retarfi/language-pretraining/tree/336c3699679dd59be788acc21f83188efa76b95b

New features:

  • Apply datasets library
    • You need to run create_datasets.py before running run_pretraining.py
    • See README.md#Create Dataset for how to run create_datasets.py
  • Log the discriminator and generator losses of ELECTRA
  • Additional pre-training from a checkpoint is available
    • See README.md#Additional Pre-training for the detailed settings

- Python
Published by retarfi almost 4 years ago

language-pretraining - v1.0

First release

- Python
Published by retarfi over 4 years ago