Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.8%) to scientific vocabulary
Repository
A collection of NLP pipelines powered by Nextflow
Basic Info
- Host: GitHub
- Owner: proycon
- License: other
- Language: Groovy
- Default Branch: master
- Size: 88.9 KB
Statistics
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
aNtiLoPe: Natural Language Processing pipelines that run!
aNtiLoPe offers various NLP workflows that build on a variety of tools. This repository hosts the relevant workflows, powered by Nextflow. The tools the workflows depend on are not included as such, but aNtiLoPe itself and all its dependencies are shipped as part of our LaMachine software distribution.
Some related but more specialised workflows are available as standalone projects:
- PICCL - A set of workflows for corpus building through OCR, post-correction and normalisation.
- Nederlab Pipeline - Linguistic enrichment pipeline for historical Dutch, as used in the Nederlab project.
- Quoll - NLP text classification pipeline.
Running these workflows, rather than manually invoking the underlying NLP tools that do the actual work, takes less effort on the user's part and improves portability and scalability: the pipelines can be executed across multiple computing nodes on a high-performance cluster or cloud platform, using schedulers and platforms such as SGE, LSF, SLURM, PBS, HTCondor, Kubernetes and Amazon AWS. Parallelisation is handled automatically. Consult the Nextflow documentation for details.
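As an illustration of the cluster execution described above: in Nextflow the executor is normally selected in a configuration file rather than in the workflows themselves. The following is a minimal sketch of a nextflow.config placed in the directory the workflow is launched from, assuming a SLURM cluster. The queue name and the resource values are hypothetical placeholders, not settings taken from aNtiLoPe.

```groovy
// nextflow.config (sketch; adjust to your own cluster)
process {
    executor = 'slurm'   // submit each process as a SLURM job
    queue    = 'short'   // hypothetical queue/partition name
    cpus     = 2         // illustrative per-process resource request
    memory   = '4 GB'    // illustrative per-process memory request
}
```

Nextflow reads this file automatically when run from the same directory. Without any executor setting, processes run on the local machine.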
aNtiLoPe makes extensive use of the FoLiA format, a rich XML-based format for linguistic annotation.
Important Note: This is beta software still in development; for the old and deprecated version consult this repository.
Installation
aNtiLoPe ships as part of LaMachine. If you already have a LaMachine instance running, you may need to add it explicitly using lamachine-add antilope.
The workflows are invoked from the command line; their file names end with the extension .nf.
It is also possible to use Nextflow directly and have it install and use the Docker flavour of LaMachine.
In that case, always run it with the -with-docker proycon/lamachine parameter:
$ nextflow run proycon/aNtiLoPe -with-docker proycon/lamachine
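If passing the parameter on every run is inconvenient, the same intent can usually be expressed in Nextflow's configuration file. This is a minimal sketch, assuming a nextflow.config in the directory the command is launched from. It relies only on Nextflow's standard Docker settings and is not something the aNtiLoPe repository is known to provide.

```groovy
// nextflow.config (sketch): same intent as -with-docker proycon/lamachine
docker.enabled    = true                  // run processes inside Docker containers
process.container = 'proycon/lamachine'   // image named in the command above
```

With such a file in place, a plain nextflow run proycon/aNtiLoPe would be expected to use the LaMachine image. Treat this as an assumption to check against your own setup.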
Workflows
- tokenize.nf - A tokenisation workflow using the ucto tokeniser; takes either plaintext or untokenised FoLiA documents (e.g. output from ticcl), and produces tokenised FoLiA documents.
- frog.nf - An NLP workflow for Dutch using the frog NLP suite; takes either plaintext or FoLiA documents and produces linguistically enriched FoLiA documents. It takes care of tokenisation as well.
- foliavalidator.nf - A simple validation workflow to validate FoLiA documents. Uses the FoLiA tools.
- foliaupgrader.nf - An upgrade tool to upgrade FoLiA documents to FoLiA v2. Uses the FoLiA tools.
Running any of these workflows with the --help parameter, or without any parameters, outputs usage information.
Technical Details & Contributing
Please see CONTRIBUTE.md for technical details and information on how to contribute.
Owner
- Name: Maarten van Gompel
- Login: proycon
- Kind: user
- Location: Eindhoven, the Netherlands
- Company: KNAW Humanities Cluster & CLST, Radboud University
- Website: https://proycon.anaproy.nl
- Repositories: 213
- Profile: https://github.com/proycon
Research software engineer - NLP - AI - 🐧 Linux & open-source enthusiast - 🐍 Python/ 🌊C/C++ / 🦀 Rust / 🐚 Shell - 🔐 InfoSec - https://git.sr.ht/~proycon
CodeMeta (codemeta.json)
{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "http://schema.org",
    {
      "entryPoints": {
        "@reverse": "schema:actionApplication"
      },
      "interfaceType": {
        "@id": "codemeta:interfaceType"
      }
    }
  ],
  "@type": "SoftwareSourceCode",
  "identifier": "antilope",
  "name": "aNtiLoPe",
  "version": "0.8.0",
  "description": "A set of workflows for Natural Language Processing",
  "license": "https://spdx.org/licenses/GPL-3.0",
  "url": "https://github.com/proycon/aNtiLoPe",
  "producer": {
    "@id": "https://www.ru.nl/clst",
    "@type": "Organization",
    "name": "Centre for Language and Speech Technology",
    "url": "https://www.ru.nl/clst",
    "parentOrganization": {
      "@id": "https://www.ru.nl/cls",
      "@type": "Organization",
      "name": "Centre for Language Studies",
      "url": "https://www.ru.nl/cls",
      "parentOrganization": {
        "@id": "https://www.ru.nl",
        "name": "Radboud University",
        "@type": "Organization",
        "url": "https://www.ru.nl",
        "location": {
          "@type": "Place",
          "name": "Nijmegen"
        }
      }
    }
  },
  "author": [
    {
      "@id": "https://orcid.org/0000-0002-1046-0006",
      "@type": "Person",
      "givenName": "Maarten",
      "familyName": "van Gompel",
      "email": "proycon@anaproy.nl",
      "affiliation": {
        "@id": "https://www.ru.nl/clst"
      }
    }
  ],
  "sourceOrganization": {
    "@id": "https://www.ru.nl/clst"
  },
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "identifier": "nextflow",
    "name": "Nextflow"
  },
  "operatingSystem": "POSIX",
  "codeRepository": "https://github.com/proycon/aNtiLoPe",
  "softwareRequirements": [
    {
      "@type": "SoftwareApplication",
      "identifier": "nextflow",
      "name": "Nextflow"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "frog",
      "name": "Frog"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "ucto",
      "name": "Ucto"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "foliautils",
      "name": "FoLiA utilities"
    }
  ],
  "funder": [
    {
      "@type": "Organization",
      "name": "Nederlab",
      "url": "http://www.nederlab.nl"
    }
  ],
  "readme": "https://github.com/proycon/aNtiLoPe/blob/master/README.md",
  "issueTracker": "https://github.com/proycon/aNtiLoPe/issues",
  "contIntegration": "https://travis-ci.org/proycon/aNtiLoPe",
  "releaseNotes": "https://github.com/proycon/aNtiLoPe/releases",
  "developmentStatus": "active",
  "keywords": [
    "nlp",
    "natural language processing"
  ],
  "dateCreated": "2017",
  "entryPoints": [
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///frog.nf",
      "name": "Frog.nf",
      "description": "Dutch NLP suite for various linguistic enrichments",
      "interfaceType": "CLI"
    },
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///ucto.nf",
      "name": "Ucto",
      "description": "Tokeniser",
      "interfaceType": "CLI"
    },
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///foliavalidator.nf",
      "name": "Folia Validator",
      "description": "FoLiA validator",
      "interfaceType": "CLI"
    },
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///foliaupgrader.nf",
      "name": "Folia Upgrader",
      "description": "FoLiA upgrader to upgrade to FoLiA v2",
      "interfaceType": "CLI"
    }
  ]
}
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Maarten van Gompel | p****n@a****l | 146 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0