https://github.com/makcedward/nlpaug

Data augmentation for NLP

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
✓ Committers with academic emails: 1 of 33 committers (3.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.2%) to scientific vocabulary

Keywords

adversarial-attacks adversarial-example ai artificial-intelligence augmentation data-science machine-learning ml natural-language-processing nlp

Keywords from Contributors

transformer agents vlm speech-recognition qwen pytorch-transformers pretrained-models model-hub glm gemma

Last synced: 10 months ago · JSON representation

Repository

Data augmentation for NLP

Basic Info

Host: GitHub
Owner: makcedward
License: mit
Language: Jupyter Notebook
Default Branch: master
Homepage: https://makcedward.github.io/
Size: 3.21 MB

Statistics

Stars: 4,577
Watchers: 41
Forks: 468
Open Issues: 80
Releases: 25

Topics

adversarial-attacks adversarial-example ai artificial-intelligence augmentation data-science machine-learning ml natural-language-processing nlp

Created over 7 years ago · Last pushed about 2 years ago

Metadata Files

Readme Changelog Funding License

nlpaug

This Python library helps you augment NLP data for your machine learning projects. Visit this introduction to learn about data augmentation in NLP. An Augmenter is the basic element of augmentation, while a Flow is a pipeline that orchestrates multiple augmenters together.
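The relationship between an augmenter and a flow can be illustrated with a small standard-library sketch. It does not use nlpaug and is not its implementation: `swap_adjacent_chars`, `delete_random_word` and `sequential_flow` are hypothetical names chosen only to show one text transformation feeding into the next.

```python
import random
from typing import Callable, List

# Hypothetical names for illustration only; this is not nlpaug's implementation.
Augmenter = Callable[[str], str]


def swap_adjacent_chars(rng: random.Random) -> Augmenter:
    """Build an augmenter that swaps one random pair of adjacent characters."""
    def augment(text: str) -> str:
        if len(text) < 2:
            return text
        i = rng.randrange(len(text) - 1)
        chars = list(text)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        return "".join(chars)
    return augment


def delete_random_word(rng: random.Random) -> Augmenter:
    """Build an augmenter that drops one randomly chosen word."""
    def augment(text: str) -> str:
        words = text.split()
        if len(words) < 2:
            return text
        del words[rng.randrange(len(words))]
        return " ".join(words)
    return augment


def sequential_flow(augmenters: List[Augmenter]) -> Augmenter:
    """Chain augmenters so each one receives the previous one's output."""
    def augment(text: str) -> str:
        for aug in augmenters:
            text = aug(text)
        return text
    return augment


original = "The quick brown fox jumps over the lazy dog"
rng = random.Random(0)  # fixed seed so repeated runs behave the same way
flow = sequential_flow([delete_random_word(rng), swap_adjacent_chars(rng)])
augmented = flow(original)
print(augmented)
```

Because the flow has the same call signature as a single augmenter, flows built this way can themselves be nested inside other flows.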

Features

Generate synthetic data for improving model performance without manual effort
Simple, easy-to-use and lightweight library. Augment data in 3 lines of code
Plug and play with any machine learning/neural network framework (e.g. scikit-learn, PyTorch, TensorFlow)
Support textual and audio input

Textual Data Augmentation Example

Acoustic Data Augmentation Example

Quick Demo

Quick Example
Example of Augmentation for Textual Inputs
Example of Augmentation for Multilingual Textual Inputs
Example of Augmentation for Spectrogram Inputs
Example of Augmentation for Audio Inputs
Example of Orchestrating Multiple Augmenters
Example of Showing Augmentation History
How to train TF-IDF model
How to train LAMBADA model
How to create custom augmentation
API Documentation

Augmenter

Flow

Installation

The library supports Python 3.5+ on Linux and Windows.

To install the library:
`pip install numpy requests nlpaug`
or install the latest version (including BETA features) directly from GitHub:
`pip install numpy git+https://github.com/makcedward/nlpaug.git`
or install with conda:
`conda install -c makcedward nlpaug`

If you use BackTranslationAug, ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug or AbstSummAug, install the following dependencies as well (the version specifiers are quoted so that the shell does not treat `>` as a redirect):
`pip install "torch>=1.6.0" "transformers>=4.11.3" sentencepiece`

If you use LambadaAug, install the following dependency as well:
`pip install "simpletransformers>=0.61.10"`

If you use AntonymAug or SynonymAug, install the following dependency as well:
`pip install "nltk>=3.4.5"`

If you use WordEmbsAug (word2vec, GloVe or fastText), install the following dependency with `pip install "gensim>=4.1.2"` and download a pre-trained model first:

```python
from nlpaug.util.file.download import DownloadUtil

DownloadUtil.download_word2vec(dest_dir='.')  # Download word2vec model
DownloadUtil.download_glove(model_name='glove.6B', dest_dir='.')  # Download GloVe model
DownloadUtil.download_fasttext(model_name='wiki-news-300d-1M', dest_dir='.')  # Download fastText model
```

If you use SynonymAug (PPDB), download the file from the following URI. You may not be able to run the augmenter if you get the PPDB file from another website: http://paraphrase.org/#/download

If you use PitchAug, SpeedAug or VtlpAug, install the following dependencies as well:
`pip install "librosa>=0.9.1" matplotlib`
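Since each group of augmenters needs different extra packages, it can help to check what is missing before using one. The following is a minimal sketch using only the standard library; the mapping is copied from the installation notes above, while `OPTIONAL_PACKAGES` and `missing_packages` are hypothetical helper names that are not part of nlpaug. It only checks that a package can be imported, not that its version is recent enough.

```python
import importlib.util
from typing import Callable, Dict, List

# Extra packages per augmenter, copied from the installation notes above.
# The dict and function names are hypothetical helpers, not part of nlpaug.
_TRANSFORMER_DEPS = ["torch", "transformers", "sentencepiece"]
_AUDIO_DEPS = ["librosa", "matplotlib"]
OPTIONAL_PACKAGES: Dict[str, List[str]] = {
    "BackTranslationAug": _TRANSFORMER_DEPS,
    "ContextualWordEmbsAug": _TRANSFORMER_DEPS,
    "ContextualWordEmbsForSentenceAug": _TRANSFORMER_DEPS,
    "AbstSummAug": _TRANSFORMER_DEPS,
    "LambadaAug": ["simpletransformers"],
    "AntonymAug": ["nltk"],
    "SynonymAug": ["nltk"],
    "WordEmbsAug": ["gensim"],
    "PitchAug": _AUDIO_DEPS,
    "SpeedAug": _AUDIO_DEPS,
    "VtlpAug": _AUDIO_DEPS,
}


def _is_importable(package: str) -> bool:
    """Return True if the top-level package can be found, without importing it."""
    return importlib.util.find_spec(package) is not None


def missing_packages(
    augmenter: str,
    is_installed: Callable[[str], bool] = _is_importable,
) -> List[str]:
    """List the extra packages for `augmenter` that are not importable."""
    required = OPTIONAL_PACKAGES.get(augmenter, [])
    return [pkg for pkg in required if not is_installed(pkg)]


for name in ("SynonymAug", "PitchAug"):
    print(name, "is missing:", missing_packages(name))
```

The `is_installed` parameter exists only so the lookup can be replaced in tests; in normal use the default check is enough.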

Recent Changes

1.1.11 Jul 6, 2022

See changelog for more details.

Extension Reading

Reference

This library uses data (e.g. captured from the internet), research (e.g. following ideas for augmenters) and models (e.g. pre-trained models). See data source for more details.

Citation

@misc{ma2019nlpaug,
  title={NLP Augmentation},
  author={Edward Ma},
  howpublished={https://github.com/makcedward/nlpaug},
  year={2019}
}

This package is cited by many books, workshops and academic research papers (70+). Here are some examples; you may visit here to get the full list.

Contributions

sakares saengkaew
Binoy Dalal
Emrecan Çelik

Owner

Name: Edward Ma
Login: makcedward
Kind: user
Location: San Francisco Bay Area
Company: SambaNova Systems

Website: https://makcedward.github.io/
Repositories: 11
Profile: https://github.com/makcedward

Focus on Natural Language Processing, Transferring Learning, Data Science Architecture

GitHub Events

Total

Issues event: 1
Watch event: 170
Issue comment event: 2
Pull request event: 3
Fork event: 9

Last Year

Issues event: 1
Watch event: 170
Issue comment event: 2
Pull request event: 3
Fork event: 9

Committers

Last synced: about 1 year ago

All Time

Total Commits: 603
Total Committers: 33
Avg Commits per committer: 18.273
Development Distribution Score (DDS): 0.081
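The two derived figures above can be reproduced from the raw counts. This sketch assumes that the Development Distribution Score is one minus the top committer's share of all commits; the page does not state the formula, so treat that definition as an assumption. The top committer's count (554, Edward Ma) is taken from the Top Committers table.

```python
# Raw counts as listed on this page.
total_commits = 603
total_committers = 33
top_committer_commits = 554  # Edward Ma, from the Top Committers table

# Average commits per committer.
avg_commits = total_commits / total_committers

# Assumed definition: DDS = 1 - (top committer's commits / total commits).
dds = 1 - top_committer_commits / total_commits

print(f"Avg commits per committer: {avg_commits:.3f}")  # 18.273
print(f"DDS: {dds:.3f}")  # 0.081
```

Under that assumed definition, a score near 0 means that almost all commits come from a single person, which matches the commit counts listed here.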

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Edward Ma	m**d@g**m	554
Chirag Jain	j**5@g**m	8
binoydalal	b**l@u**u	4
Ricardo Pieper	r**r@l**m	3
Anatoly Vostryakov	a**v@g**m	2
DrMatters	q**q@g**m	2
John Giorgi	j**i@g**m	2
logigo	4****o	2
Mariia Trofimova	m**y@b**m	2
Jessica Sousa	j**s@g**m	1
Joanna Bitton	j**n@g**m	1
João António	j**e@g**m	1
MarkusSagen	m**n@g**m	1
Narayan Acharya	n**6@g**m	1
Rogier Stegeman	4****n	1
Sakares Saengkaew	s**s@g**m	1
Sebastian Sosa	s**e@g**m	1
Tan Li	t**n@t**v	1
USVSN SAI PRASHANTH	5****h	1
Vishal Singh	v**x@g**m	1
b.giahuy	h**i@e**t	1
emrecncelik	e**k@g**m	1
hsm207	h****7	1
karthikmurugadoss	k**k@n**t	1
phunc20	w**0@g**m	1
robolamp	r**p@y**u	1
Ivan Pereira	n**1@g**m	1
Ilya Fedorov	b**n@m**u	1
Harrison Chase	h**7@g**m	1
Chandan Akiti	c**i@g**m	1
and 3 more...

Committer Domains (Top 20 + Academic)

mail.ru: 1
ya.ru: 1
nference.net: 1
emandai.net: 1
tanli.dev: 1
blackswan-technologies.com: 1
uh.edu: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 94
Total pull requests: 24
Average time to close issues: 3 months
Average time to close pull requests: 16 days
Total issue authors: 89
Total pull request authors: 21
Average comments per issue: 1.29
Average comments per pull request: 0.13
Merged pull requests: 14
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 4
Average comments per issue: 0.0
Average comments per pull request: 0.25
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

beyondguo (3)
lindsaydbrin (2)
pratikchhapolika (2)
kgarg8 (2)
wiseyoungbuck (1)
moe-men (1)
980202006 (1)
EtherealRise (1)
Juliano-rb (1)
vc34 (1)
fratambot (1)
lei-liu1 (1)
bhomass (1)
le8888e (1)
anvitha-jain (1)

Pull Request Authors

JohnGiorgi (2)
SR-Rubel (2)
Keramatfar (2)
makcedward (2)
igopalakrishna (2)
sbrugman (2)
Logigo (2)
tshu-w (2)
emrecncelik (1)
robolamp (1)
EvanUp (1)
litanlitudan (1)
IgorMunizS (1)
baskrahmer (1)
MarkusSagen (1)

Top Labels

Issue Labels

enhancement (8)
bug (4)
help wanted (1)
wontfix (1)

Pull Request Labels

Packages

Total packages: 3
Total downloads:
- pypi 159,014 last-month
Total docker downloads: 5,590

Total dependent packages: 28
(may contain duplicates)
Total dependent repositories: 141
(may contain duplicates)
Total versions: 45
Total maintainers: 1

pypi.org: nlpaug

Natural language processing augmentation library for deep neural networks

Homepage: https://github.com/makcedward/nlpaug
Documentation: https://nlpaug.readthedocs.io/
License: MIT
Latest release: 1.1.11
published about 4 years ago

Versions: 37
Dependent Packages: 28
Dependent Repositories: 141
Downloads: 159,014 Last month
Docker Downloads: 5,590

Rankings

Dependent packages count: 0.6%

Downloads: 0.8%

Stargazers count: 1.1%

Dependent repos count: 1.3%

Average: 1.4%

Docker downloads count: 2.2%

Forks count: 2.5%

Maintainers (1)

makcedward

Last synced: 10 months ago

proxy.golang.org: github.com/makcedward/nlpaug

Documentation: https://pkg.go.dev/github.com/makcedward/nlpaug#section-documentation
License: mit
Latest release: v0.0.5
published about 7 years ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Stargazers count: 1.1%

Forks count: 1.4%

Average: 3.4%

Dependent packages count: 5.4%

Dependent repos count: 5.7%

Last synced: 11 months ago

conda-forge.org: nlpaug

This Python library helps you augment NLP data for your machine learning projects. `Augmenter` is the basic element of augmentation, while `Flow` is a pipeline that orchestrates multiple augmenters together. Nlpaug generates synthetic data for improving model performance without manual effort. It is a simple, easy-to-use and lightweight library that lets you augment data in 3 lines of code, and it plugs into any machine learning or neural network framework (e.g. scikit-learn, PyTorch, TensorFlow). Nlpaug supports both textual and audio input.

Homepage: https://github.com/makcedward/nlpaug
License: MIT
Latest release: 1.1.11
published about 4 years ago

Versions: 7
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Stargazers count: 5.4%

Forks count: 7.4%

Average: 24.5%

Dependent repos count: 34.0%

Dependent packages count: 51.2%

Last synced: 11 months ago

Dependencies

requirements.txt pypi

gdown >=4.0.0
numpy >=1.16.2
pandas >=1.2.0
requests >=2.22.0

requirements_dev.txt pypi

gensim >=4.1.2 development
librosa >=0.9 development
nltk >=3.4.5 development
pyinstrument * development
python-dotenv >=0.10.1 development
setuptools >=39.1.0 development
simpletransformers * development
torch * development
transformers * development

setup.py pypi
