bart-small

bart-small model release page

https://github.com/lucadiliello/bart-small

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

bart-small model release page

Basic Info
  • Host: GitHub
  • Owner: lucadiliello
  • License: gpl-2.0
  • Default Branch: master
  • Homepage:
  • Size: 10.7 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

bart-small

bart-small is a reduced-size variant of bart-base. Most of the hyperparameters are the same as bart-base, apart from those defining the size of the model. In particular, we applied the following changes to both the encoder and the decoder:

  • Max positional embeddings: 512
  • Hidden size: 512
  • Attention heads: 8
  • FF size: 2048
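
These sizes map onto transformers configuration fields roughly as follows. This is a sketch using field names from the transformers `BartConfig` API; note that `BartConfig`'s built-in defaults follow bart-large, so a faithful reproduction would load the published config rather than rely on the unlisted defaults.

```python
from transformers import BartConfig

# Sketch: a BartConfig reflecting the bart-small sizes listed above.
# Fields not set here keep the library defaults (which follow bart-large).
config = BartConfig(
    max_position_embeddings=512,
    d_model=512,                 # hidden size
    encoder_attention_heads=8,
    decoder_attention_heads=8,
    encoder_ffn_dim=2048,        # feed-forward size
    decoder_ffn_dim=2048,
)
print(config.d_model, config.encoder_attention_heads)  # 512 8
```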

Get the model

Thanks to the transformers library, using bart-small is as simple as:

```python
from transformers import BartModel, BartConfig, BartTokenizerFast

config = BartConfig.from_pretrained('lucadiliello/bart-small')
model = BartModel.from_pretrained('lucadiliello/bart-small')
tokenizer = BartTokenizerFast.from_pretrained('lucadiliello/bart-small')
```
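
Once loaded, the model behaves like any other transformers encoder-decoder. The sketch below builds a randomly initialised model with the bart-small shape so it runs without downloading weights; swap in the `from_pretrained` calls above to use the real checkpoint.

```python
import torch
from transformers import BartConfig, BartModel

# Randomly initialised stand-in with the bart-small shape (no download needed).
config = BartConfig(
    max_position_embeddings=512, d_model=512,
    encoder_attention_heads=8, decoder_attention_heads=8,
    encoder_ffn_dim=2048, decoder_ffn_dim=2048,
)
model = BartModel(config).eval()

input_ids = torch.tensor([[0, 31414, 232, 2]])  # dummy token ids
with torch.no_grad():
    outputs = model(input_ids=input_ids)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 4, 512])
```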

Pre-Training

Training hyperparameters:

  • GPUs: 8x A100 with deepspeed in FP32
  • total batch size: 1024
  • number of training steps: 200k
  • max sequence length: 512
  • denoising:
    • probability: 0.3
    • max number of spans per sample: 200
    • whole word denoising (similar to BERT's whole word masking)
    • span length distribution: poisson (λ=2.5 words)
    • sentence shuffling
  • optimization:
    • AdamW
    • lr: triangular with peak 1e-04
    • warmup steps: 10K
    • weight decay: 0.01
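
The span-denoising step above can be sketched as follows. This is an illustrative simplification (my own helper, not the author's code): spans with Poisson-distributed lengths are masked until roughly 30% of the tokens are covered, capped at 200 spans. Real BART replaces each span with a single mask token; here spans are masked in place to keep the length bookkeeping trivial.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_spans(tokens, mask_token="<mask>", prob=0.3, lam=2.5, max_spans=200):
    """Illustrative span denoising: replace Poisson(lam)-length spans with a
    mask token until ~`prob` of the tokens are noised (spans may overlap)."""
    tokens = list(tokens)
    budget = int(round(prob * len(tokens)))
    spans = 0
    while budget > 0 and spans < max_spans:
        length = min(max(int(rng.poisson(lam)), 1), budget, len(tokens))
        start = int(rng.integers(0, len(tokens) - length + 1))
        tokens[start:start + length] = [mask_token] * length
        budget -= length
        spans += 1
    return tokens

noised = mask_spans([f"w{i}" for i in range(100)])
print(noised.count("<mask>"))
```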

Datasets:
  • BookCorpus
  • CC-News
  • OpenWebText
  • English Wikipedia

Benchmarks

Summarization

| Dataset       | R1   | R2   | RL   |
|---------------|------|------|------|
| CNN/DailyMail | 40.2 | 18.2 | 37.6 |
| XSum          | 34.8 | 13.0 | 27.8 |
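
R1/R2/RL are ROUGE-1/2/L scores (commonly reported as F-measures on these benchmarks). A minimal ROUGE-N sketch, not the official scorer, to show what the numbers measure:

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Minimal ROUGE-N F1: n-gram overlap between candidate and reference.
    The official scorer adds stemming and bootstrap aggregation."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge_n("the cat sat on the mat", "the cat lay on the mat"))  # ≈ 0.833
```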

Owner

  • Name: Luca Di Liello
  • Login: lucadiliello
  • Kind: user
  • Location: San Francisco
  • Company: Amazon Alexa

Applied Scientist at Amazon Alexa AI.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: bart-small
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Luca
    family-names: Di Liello
    email: luca.diliello@unitn.it
    affiliation: University of Trento
    orcid: 'https://orcid.org/0000-0002-9970-5048'
repository-code: 'https://github.com/lucadiliello/bart-small'
url: 'https://huggingface.co/lucadiliello/bart-small'
abstract: >-
  BART-Small is a lighter version of BART-Base with less
  attention heads, smaller FFT and a smaller hidden-size.
keywords:
  - 'LM, Denoising, BART'
license: GPL-2.0

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1