zensols-deepnlp

Deep learning utility library for natural language processing (NLP-OSS paper)

https://github.com/plandes/deepnlp

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary

Keywords

deep-learning deep-neural-networks framework natural-language-processing nlp
Last synced: 4 months ago

Repository

Statistics
  • Stars: 10
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Topics
deep-learning deep-neural-networks framework natural-language-processing nlp
Created over 5 years ago · Last pushed 5 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

DeepZensols Natural Language Processing

PyPI Python 3.12 Python 3.11 Build Status

Deep learning utility library for natural language processing that aids in feature engineering and embedding layers, as described in the paper A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility.

Features:

  • Configurable layers with little to no need to write code.
  • Natural language specific layers:
    • Easily configurable word embedding layers for Glove, Word2Vec, fastText.
    • Huggingface transformer (BERT) context based word vector layer.
    • Full Embedding+BiLSTM-CRF implementation using easy to configure constituent layers.
  • NLP specific vectorizers that generate zensols deeplearn encoded and decoded batched tensors for spaCy parsed features, dependency tree features, overlapping text features and others.
  • Embedded layers (as batched tensors) and other linguistic vectorized features that are easily swappable at runtime.
  • Support for token, document and embedding level vectorized features.
  • Transformer word piece to linguistic token mapping.
  • Two fully documented reference models provided as both command line applications and Jupyter notebooks.
  • Command line support for training, testing, debugging, and creating predictions.
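To illustrate what a word piece to linguistic token mapping involves, here is a minimal sketch in plain Python. It is not the library's implementation: the `map_word_pieces` function is hypothetical, and it assumes BERT-style word pieces in which a `##` prefix marks the continuation of the previous piece.

```python
from typing import List, Tuple


def map_word_pieces(pieces: List[str],
                    prefix: str = "##") -> List[Tuple[str, List[int]]]:
    """Group word pieces into tokens.

    Returns one ``(token text, word piece indexes)`` pair per token.
    Hypothetical helper for illustration only.
    """
    tokens: List[Tuple[str, List[int]]] = []
    for i, piece in enumerate(pieces):
        if piece.startswith(prefix) and tokens:
            # Continuation piece: append it to the token being built.
            text, indexes = tokens[-1]
            tokens[-1] = (text + piece[len(prefix):], indexes + [i])
        else:
            # A piece without the prefix starts a new token.
            tokens.append((piece, [i]))
    return tokens


pieces = ["the", "em", "##bed", "##ding", "layer"]
mapping = map_word_pieces(pieces)
for text, indexes in mapping:
    print(text, indexes)
```

Keeping the word piece indexes alongside each token is what allows per-piece transformer output to be aligned with per-token linguistic features.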

Documentation

Obtaining

The easiest way to install the command line program is via the pip installer:

```bash
pip3 install zensols.deepnlp
```

Binaries are also available on PyPI.
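One way to confirm that the package is visible to the interpreter after installation is to query its distribution metadata. This is a minimal sketch using only the standard library. The `installed_version` helper is hypothetical, and the sketch assumes the distribution is registered under the same name that is passed to pip.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of ``dist_name``, or ``None`` if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name matches the name given to pip.
found = installed_version("zensols.deepnlp")
print(found if found is not None else "zensols.deepnlp is not installed")
```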

Usage

The API can be used as is by manually configuring each component. However, this (like any Zensols API) was designed to be instantiated with inversion of control using resource libraries.
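As a rough illustration of the inversion of control idea, the sketch below builds an object from a configuration section instead of constructing it directly in code. It uses only the Python standard library and is not the Zensols API: `EmbeddingSettings`, `REGISTRY`, `create`, the `class_name` option, and the section name are all hypothetical.

```python
import ast
import configparser
from dataclasses import dataclass


@dataclass
class EmbeddingSettings:
    """Hypothetical component; not a class from this library."""
    name: str
    dimension: int
    trainable: bool = False


# Hypothetical registry mapping names used in the configuration to classes.
REGISTRY = {"EmbeddingSettings": EmbeddingSettings}

CONFIG = """
[glove_embedding]
class_name = EmbeddingSettings
name = 'glove-50'
dimension = 50
trainable = False
"""


def create(config: configparser.ConfigParser, section: str):
    """Instantiate the class named in ``section`` using its other options."""
    options = dict(config[section])
    cls = REGISTRY[options.pop("class_name")]
    # Parse each remaining option as a Python literal ('glove-50', 50, False).
    kwargs = {key: ast.literal_eval(value) for key, value in options.items()}
    return cls(**kwargs)


config = configparser.ConfigParser()
config.read_string(CONFIG)
embedding = create(config, "glove_embedding")
print(embedding)
```

The calling code never names the concrete class, so swapping a component means editing the configuration rather than the program.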

Component

Components and out-of-the-box models are available with little to no coding. However, this simple example that uses the library's components is recommended for starters. The example is a command line application that in-lines a simple configuration needed to create deep learning NLP components.

Similarly, this example is also a command line application, but it uses a masked language model to fill in words.

Reference Models

If you're in a rush, you can dive right into the Clickbate Text Classification reference model, which is a working project that uses this library. However, you'll end up reading up on the zensols deeplearn library either before or during the tutorial.

The usage of this library is explained in terms of the reference models:

The unit test cases are also a good resource for the more detailed programming integration with various parts of the library.

Attribution

This project, or reference model code, uses:

  • Gensim for Glove, Word2Vec and fastText word embeddings.
  • Huggingface Transformers for BERT contextual word embeddings.
  • h5py for fast read access to word embedding vectors.
  • zensols nlparse for feature generation from spaCy parsing.
  • zensols deeplearn for deep learning network libraries.

Corpora used include:

  • Stanford movie review
  • Cornell sentiment polarity
  • CoNLL 2003 data set

Citation

If you use this project in your research please use the following BibTeX entry:

```bibtex
@inproceedings{landes-etal-2023-deepzensols,
    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul and Di Eugenio, Barbara and Caragea, Cornelia",
    editor = "Tan, Liling and Milajevs, Dmitrijs and Chauhan, Geeticka and Gwinnup, Jeremy and Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.nlposs-1.16",
    pages = "141--146"
}
```

Changelog

An extensive changelog is available here.

Community

Please star this repository and let me know how and where you use this API. Contributions as pull requests, feedback and any other input are welcome.

License

MIT License

Copyright (c) 2020 - 2025 Paul Landes

Owner

  • Name: Paul Landes
  • Login: plandes
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
title: >-
  DeepZensols: Deep Learning Framework
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
date-released: 2023-12-05
repository-code: https://github.com/plandes/deepnlp
authors:
  - given-names: Paul
    family-names: Landes
    email: landes@mailc.net
    affiliation: University of Illinois at Chicago
    orcid: 'https://orcid.org/0000-0003-0985-0864'
preferred-citation:
  type: conference-paper
  authors:
    - given-names: Paul
      family-names: Landes
      email: landes@mailc.net
      affiliation: University of Illinois at Chicago
      orcid: 'https://orcid.org/0000-0003-0985-0864'
    - given-names: Barbara
      family-names: Di Eugenio
      affiliation: University of Illinois at Chicago
    - given-names: Cornelia
      family-names: Caragea
      affiliation: University of Illinois at Chicago
  title: >-
    DeepZensols: A Deep Learning Natural Language Processing Framework for
    Experimentation and Reproducibility
  url: https://aclanthology.org/2023.nlposs-1.16/
  year: 2023
  conference:
    name: >-
      Proceedings of the 3rd Workshop for Natural Language Processing Open
      Source Software, Empirical Methods in Natural Language Processing
    city: Singapore
    country: SG
    date-start: 2023-12-05
    date-end: 2023-12-05

GitHub Events

Total
  • Push event: 18
  • Create event: 5
Last Year
  • Push event: 18
  • Create event: 5

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 1,158
  • Total Committers: 1
  • Avg Commits per committer: 1,158.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 63
  • Committers: 1
  • Avg Commits per committer: 63.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Paul Landes l****s@m****t 1,158

Issues and Pull Requests

Last synced: 5 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 180 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 36
  • Total maintainers: 1
pypi.org: zensols-deepnlp

Deep learning utility library for natural language processing that aids in feature engineering and embedding layers.

  • Versions: 36
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 180 Last month
Rankings
Dependent packages count: 9.0%
Downloads: 12.2%
Average: 23.9%
Dependent repos count: 50.4%
Maintainers (1)
Last synced: 5 months ago

Dependencies

src/python/requirements.txt pypi
  • gensim *
  • h5py >=3.3.0
  • huggingface-hub *
  • protobuf *
  • sentencepiece *
  • transformers *
  • zensols.deeplearn *
  • zensols.nlp *
.github/workflows/test.yml actions
  • actions/checkout v2.4.0 composite
  • actions/setup-python v2 composite
src/python/requirements-model.txt pypi
src/python/setup.py pypi