nlppln

NLP pipeline software using common workflow language

https://github.com/nlppln/nlppln

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.4%) to scientific vocabulary

Keywords

cwl nlp pipeline text-mining workflow
Last synced: 7 months ago

Repository

Statistics
  • Stars: 34
  • Watchers: 1
  • Forks: 3
  • Open Issues: 11
  • Releases: 5
Created over 9 years ago · Last pushed almost 7 years ago
Metadata Files
Readme Changelog License Citation Codemeta Zenodo

README.rst

NLP Pipeline
============

|codacy_grade| |travis| |documentation| |pypi_version| |pypi_supported| |zenodo|

nlppln is a Python package for creating NLP pipelines using the Common Workflow Language (CWL).
It provides steps for (generic) NLP functionality, such as tokenization,
lemmatization, and part of speech tagging, and helps users to construct workflows
from these steps.

A text processing step consists of a (Python) command line tool and a CWL
specification to use this tool.
Most tools provided by nlppln wrap existing NLP functionality.
The command line tools are made with Click, a Python
package for creating command line interfaces.
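
As an illustration of the first half of such a step, here is a minimal sketch
of a Click command line tool. The command name ``lowercase``, the argument
``in_file`` and the option ``--out_dir`` are hypothetical and are not taken
from nlppln; the sketch only assumes that Click is installed.

.. code-block:: python

  import os

  import click


  @click.command()
  @click.argument('in_file', type=click.Path(exists=True))
  @click.option('--out_dir', '-o', default=os.getcwd(), type=click.Path())
  def lowercase(in_file, out_dir):
      """Write a lowercased copy of IN_FILE to OUT_DIR (hypothetical step)."""
      os.makedirs(out_dir, exist_ok=True)

      with open(in_file, encoding='utf-8') as f:
          text = f.read()

      # Keep the input file name so the output is easy to match to its source.
      out_file = os.path.join(out_dir, os.path.basename(in_file))
      with open(out_file, 'w', encoding='utf-8') as f:
          f.write(text.lower())

To use the function as a script, call ``lowercase()`` under an
``if __name__ == '__main__':`` guard.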

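To create a workflow, write a Python script:

.. code-block:: python

  from nlppln import WorkflowGenerator

  with WorkflowGenerator() as wf:
    # The workflow takes a directory of text files as input.
    txt_dir = wf.add_input(txt_dir='Directory')

    # Process the texts with Frog and convert the output to SAF.
    frogout = wf.frog_dir(in_dir=txt_dir)
    saf = wf.frog_to_saf(in_files=frogout)
    # Save the named entity data, then replace the entities in the SAF files.
    ner_stats = wf.save_ner_data(in_files=saf)
    new_saf = wf.replace_ner(metadata=ner_stats, in_files=saf)
    # Convert the modified SAF files back to text.
    txt = wf.saf_to_txt(in_files=new_saf)

    wf.add_outputs(ner_stats=ner_stats, txt=txt)

    # Write the workflow to a CWL file.
    wf.save('anonymize.cwl')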

The resulting workflow can be run using a CWL runner, such as cwltool:

.. code-block:: sh

  cwltool anonymize.cwl --txt_dir /path/to/directory/with/txt/files/
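
The same call can also be made from Python, for example when a workflow has to
be run for several input directories. The following is a minimal sketch that
assumes ``cwltool`` is available on the ``PATH``; the helper names
``cwltool_command`` and ``run_workflow`` are hypothetical and not part of
nlppln or cwltool.

.. code-block:: python

  import subprocess


  def cwltool_command(workflow, **inputs):
      """Build the argument list for running a workflow with cwltool."""
      cmd = ['cwltool', workflow]
      for name, value in inputs.items():
          # Each workflow input is passed as --<input name> <value>.
          cmd.extend(['--' + name, str(value)])
      return cmd


  def run_workflow(workflow, **inputs):
      """Run the workflow; raises CalledProcessError if cwltool fails."""
      return subprocess.run(cwltool_command(workflow, **inputs), check=True)

With these helpers,
``run_workflow('anonymize.cwl', txt_dir='/path/to/directory/with/txt/files/')``
corresponds to the shell command above.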

For creating new (e.g., project-specific) NLP functionality, you can use nlppln-gen
to generate boilerplate (i.e., empty) command line tools and CWL specifications.
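
To give an idea of what a CWL specification for a command line tool contains,
the sketch below builds a minimal CWL v1.0 ``CommandLineTool`` description as a
Python dictionary and serializes it as JSON (CWL documents are usually written
in YAML, and JSON is also accepted). This is not the output of nlppln-gen: the
function ``make_tool_spec``, the module name ``my_project.lowercase`` and the
input and output names are assumptions made for illustration.

.. code-block:: python

  import json


  def make_tool_spec(module, doc):
      """Return a minimal CWL v1.0 CommandLineTool description as a dict."""
      return {
          'cwlVersion': 'v1.0',
          'class': 'CommandLineTool',
          'doc': doc,
          # Run the (hypothetical) Python module as a command line tool.
          'baseCommand': ['python', '-m', module],
          'inputs': {
              'in_file': {
                  'type': 'File',
                  # The input file is passed as the first positional argument.
                  'inputBinding': {'position': 1},
              },
          },
          'outputs': {
              'out_file': {
                  'type': 'File',
                  # Assumes the tool writes a file with the same name as the
                  # input to its working directory.
                  'outputBinding': {'glob': '$(inputs.in_file.basename)'},
              },
          },
      }


  spec = make_tool_spec('my_project.lowercase', 'Lowercase a text file.')
  spec_json = json.dumps(spec, indent=2)

Writing ``spec_json`` to a file such as ``lowercase.cwl`` gives a document that
a CWL runner can load as a single step.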

The full documentation can be found on `Read the Docs <http://nlppln.readthedocs.io/en/latest/>`_.

Installation
############

Install nlppln using pip:

.. code-block:: sh

  pip install nlppln

Please check the installation guidelines in the documentation for additional required software.

License
#######

Copyright (c) 2016-2018, Netherlands eScience Center, University of Twente

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

.. |codacy_grade| image:: https://api.codacy.com/project/badge/Grade/24cd15fe1d9e4a51ab4be8c247e95c47
                     :target: https://www.codacy.com/app/jvdzwaan/nlppln?utm_source=github.com&utm_medium=referral&utm_content=nlppln/nlppln&utm_campaign=Badge_Grade
                     :alt: Codacy Badge

.. |travis| image:: https://travis-ci.org/nlppln/nlppln.svg?branch=master
              :target: https://travis-ci.org/nlppln/nlppln
              :alt: Build Status

.. |documentation| image:: https://readthedocs.org/projects/nlppln/badge/?version=latest
                     :target: http://nlppln.readthedocs.io/en/latest/?badge=latest
                     :alt: Documentation Status

.. |pypi_version| image:: https://badge.fury.io/py/nlppln.svg
                    :target: https://badge.fury.io/py/nlppln
                    :alt: PyPI version

.. |pypi_supported| image:: https://img.shields.io/pypi/pyversions/nlppln.svg
                      :target: https://pypi.python.org/pypi/nlppln
                      :alt: PyPI

.. |zenodo| image:: https://zenodo.org/badge/65198876.svg
              :target: https://zenodo.org/badge/latestdoi/65198876
              :alt: DOI

Citation (CITATION.cff)

# YAML 1.2
# Metadata for citation of this software according to the CFF format (https://citation-file-format.github.io/)
cff-version: 1.0.3
message: If you use this software, please cite it as below.
title: 'NLP Pipeline (nlppln)'
doi: 10.5281/zenodo.1116323
authors:
- given-names: "Janneke M."
  family-names: Zwaan
  name-particle: "van der"
  orcid: 0000-0002-8329-7000
  affiliation: Netherlands eScience Center
- given-names: Dafne
  family-names: Kuppevelt
  name-particle: van
  affiliation: Netherlands eScience Center
version: 0.3.3
date-released: 2019-01-08
repository-code: https://github.com/nlppln/nlppln
license: Apache-2.0

CodeMeta (codemeta.json)

{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "http://schema.org"
  ],
  "@type": "SoftwareSourceCode",
  "author": [
    {
      "@id": "0000-0002-8329-7000",
      "@type": "Person",
      "affiliation": {
        "@type": "Organization",
        "legalName": "Netherlands eScience Center"
      },
      "familyName": "van der Zwaan",
      "givenName": "Janneke M."
    },
    {
      "@type": "Person",
      "affiliation": {
        "@type": "Organization",
        "legalName": "Netherlands eScience Center"
      },
      "familyName": "van Kuppevelt",
      "givenName": "Dafne"
    }
  ],
  "codeRepository": "https://github.com/nlppln/nlppln",
  "license": "http://www.apache.org/licenses/LICENSE-2.0",
  "name": "NLP Pipeline (nlppln)"
}

Committers

Last synced: about 3 years ago

All Time
  • Total Commits: 346
  • Total Committers: 2
  • Avg Commits per committer: 173.0
  • Development Distribution Score (DDS): 0.023
Top Committers
Name                   Email          Commits
Janneke van der Zwaan  j****n@e****l  338
dafnevk                d****t@e****l  8
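
The committer statistics above can be reproduced from the per-committer commit
counts. The sketch below assumes that the Development Distribution Score is
defined as one minus the top committer's share of all commits; the function
name is hypothetical.

.. code-block:: python

  def development_distribution_score(commits_per_committer):
      """Return 1 - (commits by the top committer / total commits)."""
      total = sum(commits_per_committer)
      if total == 0:
          raise ValueError('at least one commit is required')
      return 1 - max(commits_per_committer) / total


  commits = [338, 8]  # commit counts from the Top Committers table
  average = sum(commits) / len(commits)
  dds = development_distribution_score(commits)
  print(average)        # 173.0
  print(round(dds, 3))  # 0.023

A score close to 0 means that nearly all commits come from a single committer.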

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 29
  • Total pull requests: 2
  • Average time to close issues: 10 months
  • Average time to close pull requests: 16 minutes
  • Total issue authors: 6
  • Total pull request authors: 1
  • Average comments per issue: 0.48
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jvdzwaan (21)
  • dafnevk (3)
  • arnikz (2)
  • egpbos (1)
  • MartineDeVos (1)
  • arater (1)
Pull Request Authors
  • dafnevk (2)

Dependencies

requirements.txt pypi
  • patool *
  • tika *
  • yamlreader *
setup.py pypi
  • beautifulsoup4 *
  • lxml *
  • pandas *
  • scipy *
  • scriptcwl >=0.8.0
  • sklearn *