bart-small

bart-small model release page

https://github.com/lucadiliello/bart-small

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

bart-small model release page

Basic Info
  • Host: GitHub
  • Owner: lucadiliello
  • License: gpl-2.0
  • Default Branch: master
  • Homepage:
  • Size: 10.7 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

bart-small

bart-small is a reduced-size variant of bart-base. Most of the hyperparameters are the same as bart-base, apart from those defining the size of the model. In particular, we applied the following changes to both the encoder and the decoder:

  • Max positional embeddings: 512
  • Hidden size: 512
  • Attention heads: 8
  • FF size: 2048
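
These sizes map onto transformers configuration fields roughly as follows. This is a sketch using field names from the transformers `BartConfig` API; note that `BartConfig`'s built-in defaults follow bart-large, so a faithful reproduction would load the published config rather than rely on the unlisted defaults.

```python
from transformers import BartConfig

# Sketch: a BartConfig reflecting the bart-small sizes listed above.
# Fields not set here keep the library defaults (which follow bart-large).
config = BartConfig(
    max_position_embeddings=512,
    d_model=512,                 # hidden size
    encoder_attention_heads=8,
    decoder_attention_heads=8,
    encoder_ffn_dim=2048,        # feed-forward size
    decoder_ffn_dim=2048,
)
print(config.d_model, config.encoder_attention_heads)  # 512 8
```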

Get the model

Thanks to the transformers library, using bart-small is as simple as:

```python
from transformers import BartModel, BartConfig, BartTokenizerFast

config = BartConfig.from_pretrained('lucadiliello/bart-small')
model = BartModel.from_pretrained('lucadiliello/bart-small')
tokenizer = BartTokenizerFast.from_pretrained('lucadiliello/bart-small')
```
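
Once loaded, the model behaves like any other transformers encoder-decoder. The sketch below builds a randomly initialised model with the bart-small shape so it runs without downloading weights; swap in the `from_pretrained` calls above to use the real checkpoint.

```python
import torch
from transformers import BartConfig, BartModel

# Randomly initialised stand-in with the bart-small shape (no download needed).
config = BartConfig(
    max_position_embeddings=512, d_model=512,
    encoder_attention_heads=8, decoder_attention_heads=8,
    encoder_ffn_dim=2048, decoder_ffn_dim=2048,
)
model = BartModel(config).eval()

input_ids = torch.tensor([[0, 31414, 232, 2]])  # dummy token ids
with torch.no_grad():
    outputs = model(input_ids=input_ids)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 4, 512])
```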

Pre-Training

Training hyperparameters:

  • GPUs: 8x A100 with deepspeed in FP32
  • total batch size: 1024
  • number of training steps: 200k
  • max sequence length: 512
  • denoising:
    • probability: 0.3
    • max number of spans per sample: 200
    • whole word denoising (similar to BERT's whole word masking)
    • span length distribution: poisson (λ=2.5 words)
    • sentence shuffling
  • optimization:
    • AdamW
    • lr: triangular with peak 1e-04
    • warmup steps: 10K
    • weight decay: 0.01
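
The span-denoising step above can be sketched as follows. This is an illustrative simplification (my own helper, not the author's code): spans with Poisson-distributed lengths are masked until roughly 30% of the tokens are covered, capped at 200 spans. Real BART replaces each span with a single mask token; here spans are masked in place to keep the length bookkeeping trivial.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_spans(tokens, mask_token="<mask>", prob=0.3, lam=2.5, max_spans=200):
    """Illustrative span denoising: replace Poisson(lam)-length spans with a
    mask token until ~`prob` of the tokens are noised (spans may overlap)."""
    tokens = list(tokens)
    budget = int(round(prob * len(tokens)))
    spans = 0
    while budget > 0 and spans < max_spans:
        length = min(max(int(rng.poisson(lam)), 1), budget, len(tokens))
        start = int(rng.integers(0, len(tokens) - length + 1))
        tokens[start:start + length] = [mask_token] * length
        budget -= length
        spans += 1
    return tokens

noised = mask_spans([f"w{i}" for i in range(100)])
print(noised.count("<mask>"))
```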

Datasets:
  • BookCorpus
  • CC-News
  • OpenWebText
  • English Wikipedia

Benchmarks

Summarization

| Dataset       | R1   | R2   | RL   |
|---------------|------|------|------|
| CNN/DailyMail | 40.2 | 18.2 | 37.6 |
| XSum          | 34.8 | 13.0 | 27.8 |
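
R1/R2/RL are ROUGE-1/2/L scores (commonly reported as F-measures on these benchmarks). A minimal ROUGE-N sketch, not the official scorer, to show what the numbers measure:

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Minimal ROUGE-N F1: n-gram overlap between candidate and reference.
    The official scorer adds stemming and bootstrap aggregation."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge_n("the cat sat on the mat", "the cat lay on the mat"))  # ≈ 0.833
```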

Owner

  • Name: Luca Di Liello
  • Login: lucadiliello
  • Kind: user
  • Location: San Francisco
  • Company: Amazon Alexa

Applied Scientist at Amazon Alexa AI.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: bart-small
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Luca
    family-names: Di Liello
    email: luca.diliello@unitn.it
    affiliation: University of Trento
    orcid: 'https://orcid.org/0000-0002-9970-5048'
repository-code: 'https://github.com/lucadiliello/bart-small'
url: 'https://huggingface.co/lucadiliello/bart-small'
abstract: >-
  BART-Small is a lighter version of BART-Base with less
  attention heads, smaller FFT and a smaller hidden-size.
keywords:
  - 'LM, Denoising, BART'
license: GPL-2.0

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1