foliautils
Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++), written by Ko van der Sloot (CLST, Radboud University)
Science Score: 26.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (1.4%) to scientific vocabulary
Keywords
computational-linguistics
folia
nlp
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: LanguageMachines
- License: gpl-3.0
- Language: C++
- Default Branch: master
- Homepage: https://proycon.github.io/folia
- Size: 45.5 MB
Statistics
- Stars: 4
- Watchers: 8
- Forks: 3
- Open Issues: 8
- Releases: 24
Topics
computational-linguistics
folia
nlp
Created about 11 years ago
· Last pushed 12 months ago
Metadata Files
Readme
Changelog
License
Authors
Codemeta
README
Please consult README.md for more information
Owner
- Name: Language Machines
- Login: LanguageMachines
- Kind: organization
- Email: proycon@anaproy.nl
- Location: Nijmegen, The Netherlands
- Website: http://cls.ru.nl/languagemachines
- Repositories: 53
- Profile: https://github.com/LanguageMachines
NLP Research group at Centre for Language Studies, Radboud University Nijmegen
CodeMeta (codemeta.json)
{
"@context": [
"https://doi.org/10.5063/schema/codemeta-2.0",
"http://schema.org",
"https://w3id.org/software-types"
],
"@type": "SoftwareSourceCode",
"identifier": "foliautils",
"name": "foliautils",
"version": "0.23",
"description": "Command-line utilities for working with the Format for Linguistic Annotation (FoLiA).",
"license": "https://spdx.org/licenses/GPL-3.0-only",
"url": "https://github.com/LanguageMachines/foliautils",
"author": [
{
"@type": "Person",
"givenName": "Ko",
"familyName": "van der Sloot",
"email": "ko.vandersloot@let.ru.nl",
"affiliation": {
"@id": "https://www.ru.nl/clst",
"@type": "Organization",
"name": "Centre for Language and Speech Technology",
"url": "https://www.ru.nl/clst",
"parentOrganization": {
"@id": "https://www.ru.nl/cls",
"@type": "Organization",
"name": "Centre for Language Studies",
"url": "https://www.ru.nl/cls",
"parentOrganization": {
"@id": "https://www.ru.nl",
"name": "Radboud University",
"@type": "Organization",
"url": "https://www.ru.nl",
"location": {
"@type": "Place",
"name": "Nijmegen"
}
}
}
}
},
{
"@id": "https://orcid.org/0000-0002-1046-0006",
"@type": "Person",
"givenName": "Maarten",
"familyName": "van Gompel",
"email": "proycon@anaproy.nl",
"affiliation": {
"@id": "https://www.ru.nl/clst"
}
}
],
"sourceOrganization": {
"@id": "https://www.ru.nl/clst"
},
"programmingLanguage": {
"@type": "ComputerLanguage",
"identifier": "c++",
"name": "C++"
},
"operatingSystem": "POSIX",
"codeRepository": "https://github.com/LanguageMachines/foliautils",
"softwareRequirements": [
{
"@type": "SoftwareApplication",
"identifier": "icu",
"name": "icu"
},
{
"@type": "SoftwareApplication",
"identifier": "libxml2",
"name": "libxml2"
},
{
"@type": "SoftwareApplication",
"identifier": "ticcutils",
"name": "ticcutils"
},
{
"@type": "SoftwareApplication",
"identifier": "libfolia",
"name": "libfolia"
}
],
"readme": "https://github.com/LanguageMachines/foliautils/blob/master/README.md",
"issueTracker": "https://github.com/LanguageMachines/foliautils/issues",
"contIntegration": "https://travis-ci.org/LanguageMachines/foliautils",
"releaseNotes": "https://github.com/LanguageMachines/foliautils/releases",
"developmentStatus": "https://www.repostatus.org/#active",
"keywords": [
"nlp",
"natural language processing",
"folia",
"xml",
"linguistic annotation"
],
"referencePublication": [
{
"@type": "ScholarlyArticle",
"name": "FoLiA: A practical XML format for linguistic annotation - a descriptive and comparative study",
"author": [
"Maarten van Gompel",
"Martin Reynaert"
],
"pageStart": 63,
"pageEnd": 81,
"isPartOf": {
"@type": "PublicationIssue",
"datePublished": "2014",
"name": "Computational Linguistics in the Netherlands Journal",
"issue": "3"
},
"url": "http://www.clinjournal.org/sites/clinjournal.org/files/05-vanGompel-Reynaert-CLIN2013.pdf"
},
{
"@type": "TechArticle",
"name": "FoLiA: Format for Linguistic Annotation, Documentation",
"author": [
"Maarten van Gompel"
],
"isPartOf": {
"@type": "PublicationIssue",
"datePublished": "2014",
"name": "Language and Speech Technology Technical Report Series",
"issue": "14-01",
"location": "Nijmegen, the Netherlands"
},
"url": "https://github.com/proycon/folia/raw/master/docs/folia.pdf"
}
],
"targetProduct": [
{
"@type": "CommandLineApplication",
"name": "FoLiA-2text",
"executableName": "FoLiA-2text",
"description": "Convert FoLiA documents into plain text",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
],
"producesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-txt",
"executableName": "FoLiA-txt",
"description": "Convert plain text to FoLiA; the output will contain only <p> and <str> nodes. See ucto or rst2folia (FoLiA-tools) for alternatives.",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain"
}
],
"producesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-page",
"executableName": "FoLiA-page",
"description": "Convert PAGE XML to FoLiA",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/page+xml"
}
],
"producesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-hocr",
"executableName": "FoLiA-hocr",
"description": "Convert hOCR (as outputted by Tesseract) to FoLiA",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/html"
}
],
"producesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-alto",
"executableName": "FoLiA-alto",
"description": "Convert ALTO DIDL files into a series of FoLiA documents"
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-langcat",
"executableName": "FoLiA-langcat",
"description": "Language Identification using textcat.",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
],
"producesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-idf",
"executableName": "FoLiA-idf",
"description": "Count words in a series of FoLiA documents and compute IDF statistics, which are outputted to a tsv file",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
],
"producesData": [
{
"@type": "Dataset",
"encodingFormat": "text/tab-separated-values"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-stats",
"executableName": "FoLiA-stats",
"description": "Gather n-gram statistics over a series of FoLiA documents",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
],
"producesData": [
{
"@type": "Dataset",
"encodingFormat": "text/tab-separated-values"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-collect",
"executableName": "FoLiA-collect",
"description": "Collect n-gram statistics from tsv files produced by FoLiA-stats, aggregating results.",
"consumesData": [
{
"@type": "Dataset",
"encodingFormat": "text/tab-separated-values"
}
],
"producesData": [
{
"@type": "Dataset",
"encodingFormat": "text/tab-separated-values"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-correct",
"executableName": "FoLiA-correct",
"description": "Correct FoLiA documents using correction candidates generated by TICCL-rank (from ticcltools)",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
],
"producesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-wordtranslate",
"executableName": "FoLiA-wordtranslate",
"description": "Simple word-by-word translator on the basis of a dictionary and/or rewrite rules",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
],
"producesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-clean",
"executableName": "FoLiA-clean",
"description": "FoLiA-clean will produce a cleaned up version of a FoLiA file, or a whole directory of FoLiA files, removing specified annotation types and specified text classes",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
],
"producesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
]
},
{
"@type": "CommandLineApplication",
"name": "FoLiA-pm",
"executableName": "FoLiA-pm",
"description": "Convert Political Mashup XML to FoLiA",
"producesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml"
}
]
}
]
}
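The "targetProduct" list above records, for most tools, which data format each one consumes and produces. As a rough sketch of how that metadata could be read programmatically, the Python snippet below parses a trimmed excerpt of the JSON and prints a format summary per tool. The excerpt constant and the helper names `formats` and `summarize_tools` are illustrative assumptions, not part of foliautils or of any CodeMeta tooling.

```python
import json

# Illustrative excerpt: two entries trimmed from the "targetProduct" list above.
CODEMETA_EXCERPT = """
{
  "name": "foliautils",
  "targetProduct": [
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-2text",
      "consumesData": [
        {"@type": "TextDigitalDocument", "encodingFormat": "application/folia+xml"}
      ],
      "producesData": [
        {"@type": "TextDigitalDocument", "encodingFormat": "text/plain"}
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-alto",
      "description": "Convert ALTO DIDL files into a series of FoLiA documents"
    }
  ]
}
"""


def formats(entry, key):
    """Return the encodingFormat values listed under `key`, or [] if the key is absent."""
    return [item.get("encodingFormat", "?") for item in entry.get(key, [])]


def summarize_tools(codemeta):
    """Map each tool name to a pair (input formats, output formats)."""
    return {
        tool["name"]: (formats(tool, "consumesData"), formats(tool, "producesData"))
        for tool in codemeta.get("targetProduct", [])
    }


if __name__ == "__main__":
    summary = summarize_tools(json.loads(CODEMETA_EXCERPT))
    for name, (inputs, outputs) in summary.items():
        # Entries such as FoLiA-alto declare no formats, so fall back to a placeholder.
        print(f"{name}: {inputs or ['unspecified']} -> {outputs or ['unspecified']}")
```

Some entries (FoLiA-alto, and the input side of FoLiA-pm) declare no formats, which is why the helper returns an empty list instead of raising an error.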
GitHub Events
Total
- Create event: 2
- Release event: 1
- Issues event: 5
- Issue comment event: 8
- Push event: 14
Last Year
- Create event: 2
- Release event: 1
- Issues event: 5
- Issue comment event: 8
- Push event: 14
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 73
- Total pull requests: 0
- Average time to close issues: 5 months
- Average time to close pull requests: N/A
- Total issue authors: 9
- Total pull request authors: 0
- Average comments per issue: 4.95
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: 15 minutes
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 5.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- kosloot (28)
- martinreynaert (15)
- proycon (13)
- pirolen (11)
- egpbos (1)
- VincentCCL (1)
- peterdekker (1)
- antalvdb (1)
- alankessler (1)
Pull Request Authors
Top Labels
Issue Labels
- enhancement (27)
- bug (11)
- question (6)
- ready (3)
- Testing (2)
- low priority (2)
Pull Request Labels
Dependencies
.github/workflows/foliautils.yml
actions
- Gottox/irc-message-action v2 composite
- actions/checkout v2 composite
- styfle/cancel-workflow-action 0.11.0 composite
Dockerfile
docker
- alpine latest build