zensols-nlparse
Natural language processing parsing and tool library
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary
Repository
Natural language processing parsing and tool library
Basic Info
- Host: GitHub
- Owner: plandes
- License: other
- Language: Python
- Default Branch: master
- Homepage: https://plandes.github.io/nlparse/
- Size: 1.06 MB
Statistics
- Stars: 5
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Zensols Natural Language Parsing
From the paper DeepZensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility. This framework wraps the spaCy framework and creates lightweight features in a class hierarchy that reflects the structure of natural language. The goal is to generate features from the parsed text in an object-oriented fashion that is fast and easy to pickle.
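The following minimal sketch in plain Python shows why detached features are easy to pickle. It assumes a detached token is an immutable record of plain values with no reference back to the spaCy parse. `DetachedToken`, `DetachedDocument` and `roundtrip` are hypothetical names used only for illustration. They are not the library's `FeatureToken` or `FeatureDocument` classes.

```python
import pickle
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-in for a detached token: it holds only plain Python
# values (no reference back to a spaCy Doc), so it pickles cleanly.
@dataclass(frozen=True)
class DetachedToken:
    norm: str
    pos_: str
    tag_: str


# Hypothetical detached document: the original text plus its tokens.
@dataclass(frozen=True)
class DetachedDocument:
    text: str
    tokens: Tuple[DetachedToken, ...]


def roundtrip(doc: DetachedDocument) -> DetachedDocument:
    """Serialize the document to bytes and load it back."""
    data: bytes = pickle.dumps(doc)
    return pickle.loads(data)


doc = DetachedDocument(
    text='Obama was the 44th president.',
    tokens=(DetachedToken('Obama', 'PROPN', 'NNP'),
            DetachedToken('was', 'AUX', 'VBD')))
restored = roundtrip(doc)
print(restored == doc)          # True
print(restored.tokens[0].norm)  # Obama
```

The records hold only strings and tuples, so `pickle` needs no custom hooks. The library's own classes carry more features than this sketch does.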
Other features include:
* Parse and normalize a stream of tokens with stop word and punctuation filters, up/down casing, Porter stemming, and others.
* Detached features that are safe and easy to pickle to disk.
* Configuration-driven parsing and token normalization using configuration factories.
* Pretty print functionality for easy natural language feature selection.
* A comprehensive scoring module that includes the following scoring methods:
  * ROUGE
  * BLEU
  * SemEval-2013 Task 9.1
  * Levenshtein distance
  * Exact match
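The token normalization in the first bullet can be sketched as a simple pass that filters tokens and then sets their case. This is a minimal illustration only. `normalize` and the small `STOP_WORDS` set are assumptions made for this example and are not part of the `zensols.nlp` API. Porter stemming is left out so that the sketch has no dependencies.

```python
import string
from typing import Iterable, List

# Hypothetical, tiny stop word list for illustration only; the library
# relies on its own stop word information rather than a list like this.
STOP_WORDS = frozenset({'the', 'of', 'was', 'a', 'an', 'and'})


def normalize(tokens: Iterable[str], lower: bool = True) -> List[str]:
    """Drop stop words and punctuation, then optionally lowercase."""
    out: List[str] = []
    for tok in tokens:
        # skip tokens made entirely of punctuation characters
        if all(ch in string.punctuation for ch in tok):
            continue
        # skip stop words, compared case-insensitively
        if tok.lower() in STOP_WORDS:
            continue
        out.append(tok.lower() if lower else tok)
    return out


toks = 'Obama was the 44th president of the United States .'.split()
print(normalize(toks))  # ['obama', '44th', 'president', 'united', 'states']
```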
Documentation
See the full documentation at https://plandes.github.io/nlparse.
Obtaining / Installing
The library can be installed with pip from the PyPI repository:
```bash
pip3 install zensols.nlp
```
The smallest base spaCy model is downloaded automatically on first use. You can download other models, such as the medium base model, with the following command:
```bash
python -m spacy download en_core_web_md
```
Usage
A parser using the default configuration can be obtained by:
```python
from zensols.nlp import FeatureDocument, FeatureDocumentParser

# create a parser with the default configuration
parser: FeatureDocumentParser = FeatureDocumentParser.default_instance()
doc: FeatureDocument = parser('Obama was the 44th president of the United States.')
# print each token's normalized text, part of speech and tag
for tok in doc.tokens:
    print(tok.norm, tok.pos_, tok.tag_)
# print the named entities
print(doc.entities)

# Output:
# Obama PROPN NNP
# was AUX VBD
# the DET DT
# 44th ADJ JJ
# president NOUN NN
# of ADP IN
# the United States DET DT
# . PUNCT .
# (<Obama>, <44th>, <the United States>)
```
Configuring the parser takes minimal effort when you use a resource library:
```python
from io import StringIO
from zensols.config import ImportIniConfig, ImportConfigFactory
from zensols.nlp import FeatureDocument, FeatureDocumentParser

CONFIG = """
# import the zensols.nlp library
[import]
config_file = resource(zensols.nlp): resources/obj.conf

# override the parser's token feature IDs to keep only ent_ and tag_
[doc_parser]
token_feature_ids = set: ent_, tag_
"""

if __name__ == '__main__':
    # create a factory from the in-memory configuration
    fac = ImportConfigFactory(ImportIniConfig(StringIO(CONFIG)))
    doc_parser: FeatureDocumentParser = fac('doc_parser')
    sent = 'He was George Washington and first president of the United States.'
    doc: FeatureDocument = doc_parser(sent)
    # write each token's features to standard out
    for tok in doc.tokens:
        tok.write()
```
This example uses a resource library to import the configuration bundled with this package, so very little configuration is needed. More advanced configuration examples are also available.
See the feature documents for more information.
Scoring
Some scores in the scoring module need additional Python packages, which can be installed with:
```bash
pip install -r src/python/requirements-score.txt
```
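This sketch gives a reference point for two of the simpler scoring methods listed earlier. It computes Levenshtein (edit) distance with the standard dynamic-programming recurrence and scores exact match as 1.0 or 0.0. The function names are hypothetical. The library lists the `python-Levenshtein` package as a dependency and may normalize or aggregate these scores differently.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn ``a`` into ``b``."""
    # prev[j] holds the distance between a[:i-1] and b[:j]; the first
    # row is the distance from the empty string to each prefix of b
    prev: List[int] = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr: List[int] = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def exact_match(pred: str, gold: str) -> float:
    """Return 1.0 if the strings are identical, else 0.0."""
    return float(pred == gold)


print(levenshtein('kitten', 'sitting'))  # 3
print(exact_match('Paris', 'Paris'))     # 1.0
```

The two-row formulation keeps memory proportional to the length of `b` instead of the full table.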
Attribution
This project, or example code, uses:
* spaCy for natural language parsing
* msgpack and smart-open for Python disk serialization
* nltk for the Porter stemmer functionality
Citation
If you use this project in your research please use the following BibTeX entry:
```bibtex
@inproceedings{landes-etal-2023-deepzensols,
    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul and
      Di Eugenio, Barbara and
      Caragea, Cornelia",
    editor = "Tan, Liling and
      Milajevs, Dmitrijs and
      Chauhan, Geeticka and
      Gwinnup, Jeremy and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.nlposs-1.16",
    pages = "141--146"
}
```
Changelog
An extensive changelog is available here.
Community
Please star this repository and let me know how and where you use this API. Contributions as pull requests, feedback, and any other input are welcome.
License
Copyright (c) 2020 - 2025 Paul Landes
Owner
- Name: Paul Landes
- Login: plandes
- Kind: user
- Repositories: 90
- Profile: https://github.com/plandes
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
title: >-
  DeepZensols: Deep Learning Framework
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
date-released: 2023-12-05
repository-code: https://github.com/plandes/deepnlp
authors:
  - given-names: Paul
    family-names: Landes
    email: landes@mailc.net
    affiliation: University of Illinois at Chicago
    orcid: 'https://orcid.org/0000-0003-0985-0864'
preferred-citation:
  type: conference-paper
  authors:
    - given-names: Paul
      family-names: Landes
      email: landes@mailc.net
      affiliation: University of Illinois at Chicago
      orcid: 'https://orcid.org/0000-0003-0985-0864'
    - given-names: Barbara
      family-names: Di Eugenio
      affiliation: University of Illinois at Chicago
    - given-names: Cornelia
      family-names: Caragea
      affiliation: University of Illinois at Chicago
  title: >-
    DeepZensols: A Deep Learning Natural Language Processing Framework for
    Experimentation and Reproducibility
  url: https://aclanthology.org/2023.nlposs-1.16/
  year: 2023
  conference:
    name: >-
      Proceedings of the 3rd Workshop for Natural Language Processing Open
      Source Software, Empirical Methods in Natural Language Processing
    city: Singapore
    country: SG
    date-start: 2023-12-05
    date-end: 2023-12-05
```
GitHub Events
Total
- Push event: 24
- Create event: 7
Last Year
- Push event: 24
- Create event: 7
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Paul Landes | l****s@m****t | 653 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
pypi.org: zensols-nlparse
This framework wraps the spaCy framework and creates lightweight features in a class hierarchy that reflects the structure of natural language
- Homepage: https://github.com/plandes/nlparse
- Documentation: https://plandes.github.io/nlparse
- License: MIT
- Latest release: 1.12.2 (published 8 months ago)
Dependencies
- msgpack >=1.0.0
- msgpack-numpy >=0.4.7.1
- nltk >=3.5
- python-Levenshtein *
- smart-open >=4.0.1
- spacy *
- zensols.util *
- actions/checkout v2.4.0 composite
- actions/setup-python v2 composite
- rouge-score *