zensols-nlparse

Natural language processing parsing and tool library

https://github.com/plandes/nlparse

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Keywords

natural-language-processing nlp-machine-learning pypi-badge pypi-link spacy spacy-nlp
Last synced: 6 months ago

Repository

Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Zensols Natural Language Parsing

PyPI Python 3.11 Python 3.12 Build Status

From the paper DeepZensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility. This framework wraps the spaCy framework and creates lightweight features in a class hierarchy that reflects the structure of natural language. The motivation is to generate features from the parsed text in an object-oriented fashion that is fast and easy to pickle.

Other features include:

* Parse and normalize a stream of tokens with stop word and punctuation filters, up/down casing, Porter stemming and others.
* Detached features that are safe and easy to pickle to disk.
* Configuration-driven parsing and token normalization using configuration factories.
* Pretty print functionality for easy natural language feature selection.
* A comprehensive scoring module that includes the following scoring methods:
  * ROUGE
  * BLEU
  * SemEval-2013 Task 9.1
  * Levenshtein distance
  * Exact match
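To make the token normalization idea concrete, here is a minimal sketch in plain Python of chaining filters over a token stream. It does not use this library's API: the function names, the `TokenFilter` alias and the tiny stop word list are all hypothetical and exist only for illustration.

```python
import string
from typing import Callable, Iterable, Iterator, List

# Hypothetical, tiny stop word list used only for this illustration.
STOP_WORDS = frozenset({'the', 'of', 'was', 'a', 'an'})

# A filter takes a stream of tokens and returns a new stream.
TokenFilter = Callable[[Iterable[str]], Iterator[str]]


def lower_case(tokens: Iterable[str]) -> Iterator[str]:
    """Down-case every token."""
    return (t.lower() for t in tokens)


def drop_punctuation(tokens: Iterable[str]) -> Iterator[str]:
    """Remove tokens made up entirely of punctuation characters."""
    return (t for t in tokens
            if not all(c in string.punctuation for c in t))


def drop_stop_words(tokens: Iterable[str]) -> Iterator[str]:
    """Remove tokens in the stop word list (expects lower-cased input)."""
    return (t for t in tokens if t not in STOP_WORDS)


def normalize(tokens: Iterable[str], *filters: TokenFilter) -> List[str]:
    """Apply each filter in order to the token stream."""
    for token_filter in filters:
        tokens = token_filter(tokens)
    return list(tokens)


tokens = ['Obama', 'was', 'the', '44th', 'president', 'of', 'the',
          'United', 'States', '.']
normalized = normalize(tokens, lower_case, drop_punctuation, drop_stop_words)
print(normalized)
# ['obama', '44th', 'president', 'united', 'states']
```

The order of the filters matters in this sketch: the stop word filter assumes the tokens have already been lower-cased.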

Documentation

Obtaining / Installing

The library can be installed with pip from the PyPI repository:

```bash
pip3 install zensols.nlp
```

The smallest base spaCy model is downloaded automatically on first use. Other models, such as the medium base model, can be downloaded with the following command:

```bash
python -m spacy download en_core_web_md
```

Usage

A parser using the default configuration can be obtained by:

```python
from zensols.nlp import FeatureDocumentParser

# create a parser with the default configuration
parser: FeatureDocumentParser = FeatureDocumentParser.default_instance()
doc = parser('Obama was the 44th president of the United States.')
# print the normalized text, part-of-speech and fine-grained tag per token
for tok in doc.tokens:
    print(tok.norm, tok.pos_, tok.tag_)
print(doc.entities)

# output:
# Obama PROPN NNP
# was AUX VBD
# the DET DT
# 44th ADJ JJ
# president NOUN NN
# of ADP IN
# the United States DET DT
# . PUNCT .
# (<Obama>, <44th>, <the United States>)
```

However, minimal effort is needed to configure the parser using a resource library:

```python
from io import StringIO
from zensols.config import ImportIniConfig, ImportConfigFactory
from zensols.nlp import FeatureDocument, FeatureDocumentParser

CONFIG = """
# import the zensols.nlp library
[import]
config_file = resource(zensols.nlp): resources/obj.conf

# override the parse to keep only the norm, ent
[doc_parser]
token_feature_ids = set: ent_, tag_
"""

if (__name__ == '__main__'):
    # create the factory from the in-memory configuration
    fac = ImportConfigFactory(ImportIniConfig(StringIO(CONFIG)))
    doc_parser: FeatureDocumentParser = fac('doc_parser')
    sent = 'He was George Washington and first president of the United States.'
    doc: FeatureDocument = doc_parser(sent)
    # write each token's features to standard out
    for tok in doc.tokens:
        tok.write()
```

This uses a resource library to source in the configuration from this package so minimal configuration is necessary. More advanced configuration examples are also available.
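The idea of sourcing defaults from a package and overriding a single option can be illustrated with the standard library's `configparser`. This is only a sketch of the override semantics, not how `zensols.config` is implemented: the `DEFAULTS` text, the `lang` option and the `parse_set` helper are assumptions made up for this example.

```python
from configparser import ConfigParser
from typing import Set

# Stand-in for defaults shipped by a package (hypothetical values).
DEFAULTS = """
[doc_parser]
lang = en
token_feature_ids = set: norm, ent_, tag_, pos_
"""

# Application configuration that overrides one option.
OVERRIDES = """
[doc_parser]
token_feature_ids = set: ent_, tag_
"""


def parse_set(value: str) -> Set[str]:
    """Parse a 'set: a, b' style value into a set of strings."""
    prefix, _, items = value.partition(':')
    if prefix.strip() != 'set':
        raise ValueError(f'not a set value: {value!r}')
    return {item.strip() for item in items.split(',') if item.strip()}


config = ConfigParser()
config.read_string(DEFAULTS)
# options read later replace options of the same name read earlier
config.read_string(OVERRIDES)

feature_ids = parse_set(config['doc_parser']['token_feature_ids'])
print(sorted(feature_ids))
# ['ent_', 'tag_']
print(config['doc_parser']['lang'])
# en
```

Only the overridden option changes; `lang` keeps its default, which is why the application configuration can stay short.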

See the feature documents for more information.

Scoring

Certain scores in the scoring module need additional Python packages. These are installed with:

```bash
pip install -r src/python/requirements-score.txt
```
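Two of the listed scoring methods, Levenshtein distance and exact match, are simple enough to sketch in plain Python. The functions below are illustrative stand-ins, not the scoring module's API; the names `levenshtein` and `exact_match` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single character insertions,
    deletions and substitutions needed to turn ``a`` into ``b``."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def exact_match(gold: str, prediction: str) -> float:
    """Return 1.0 when both strings are identical, otherwise 0.0."""
    return 1.0 if gold == prediction else 0.0


print(levenshtein('kitten', 'sitting'))
# 3
print(exact_match('Obama', 'Obama'))
# 1.0
```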

Attribution

This project, or example code, uses:

* spaCy for natural language parsing
* msgpack and smart-open for Python disk serialization
* nltk for the Porter stemmer functionality

Citation

If you use this project in your research please use the following BibTeX entry:

```bibtex
@inproceedings{landes-etal-2023-deepzensols,
    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul and Di Eugenio, Barbara and Caragea, Cornelia",
    editor = "Tan, Liling and Milajevs, Dmitrijs and Chauhan, Geeticka and Gwinnup, Jeremy and Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.nlposs-1.16",
    pages = "141--146"
}
```

Changelog

An extensive changelog is available here.

Community

Please star this repository and let me know how and where you use this API. Contributions as pull requests, feedback and any other input are welcome.

License

MIT License

Copyright (c) 2020 - 2025 Paul Landes

Owner

  • Name: Paul Landes
  • Login: plandes
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
title: >-
  DeepZensols: Deep Learning Framework
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
date-released: 2023-12-05
repository-code: https://github.com/plandes/deepnlp
authors:
  - given-names: Paul
    family-names: Landes
    email: landes@mailc.net
    affiliation: University of Illinois at Chicago
    orcid: 'https://orcid.org/0000-0003-0985-0864'
preferred-citation:
  type: conference-paper
  authors:
    - given-names: Paul
      family-names: Landes
      email: landes@mailc.net
      affiliation: University of Illinois at Chicago
      orcid: 'https://orcid.org/0000-0003-0985-0864'
    - given-names: Barbara
      family-names: Di Eugenio
      affiliation: University of Illinois at Chicago
    - given-names: Cornelia
      family-names: Caragea
      affiliation: University of Illinois at Chicago
  title: >-
    DeepZensols: A Deep Learning Natural Language Processing Framework for
    Experimentation and Reproducibility
  url: https://aclanthology.org/2023.nlposs-1.16/
  year: 2023
  conference:
    name: >-
      Proceedings of the 3rd Workshop for Natural Language Processing Open
      Source Software, Empirical Methods in Natural Language Processing
    city: Singapore
    country: SG
    date-start: 2023-12-05
    date-end: 2023-12-05

GitHub Events

Total
  • Push event: 24
  • Create event: 7
Last Year
  • Push event: 24
  • Create event: 7

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 653
  • Total Committers: 1
  • Avg Commits per committer: 653.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 69
  • Committers: 1
  • Avg Commits per committer: 69.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Paul Landes l****s@m****t 653
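The committer statistics above can be reproduced with a few lines of Python. This sketch assumes the commonly used definition of the Development Distribution Score, one minus the top committer's share of all commits; the page itself does not state the formula, and the function names are hypothetical.

```python
from typing import Dict


def development_distribution_score(commits_by_author: Dict[str, int]) -> float:
    """Return 1 - (commits of the top committer / total commits).

    This definition is an assumption; 0.0 means one person made every commit.
    """
    total = sum(commits_by_author.values())
    if total == 0:
        return 0.0
    return 1 - max(commits_by_author.values()) / total


def average_commits(commits_by_author: Dict[str, int]) -> float:
    """Return the mean number of commits per committer."""
    if not commits_by_author:
        return 0.0
    return sum(commits_by_author.values()) / len(commits_by_author)


all_time = {'Paul Landes': 653}
print(average_commits(all_time))
# 653.0
print(development_distribution_score(all_time))
# 0.0
```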

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
pypi.org: zensols-nlparse

This framework wraps the spaCy framework and creates lightweight features in a class hierarchy that reflects the structure of natural language.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 9.0%
Average: 29.7%
Dependent repos count: 50.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

src/python/requirements.txt pypi
  • msgpack >=1.0.0
  • msgpack-numpy >=0.4.7.1
  • nltk >=3.5
  • python-Levenshtein *
  • smart-open >=4.0.1
  • spacy *
  • zensols.util *
.github/workflows/test.yml actions
  • actions/checkout v2.4.0 composite
  • actions/setup-python v2 composite
src/python/requirements-score.txt pypi
  • rouge-score *
src/python/requirements-model.txt pypi
src/python/setup.py pypi