foliautils

Command-line utilities for working with the Format for Linguistic Annotation (FoLiA), powered by libfolia (C++). Written by Ko van der Sloot (CLST, Radboud University).

https://github.com/languagemachines/foliautils

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (1.4%) to scientific vocabulary

Keywords

computational-linguistics folia nlp
Last synced: 10 months ago

Repository

Basic Info
Statistics
  • Stars: 4
  • Watchers: 8
  • Forks: 3
  • Open Issues: 8
  • Releases: 24
Topics
computational-linguistics folia nlp
Created about 11 years ago · Last pushed 12 months ago
Metadata Files
Readme Changelog License Authors Codemeta

README

Please consult README.md for more information

Owner

  • Name: Language Machines
  • Login: LanguageMachines
  • Kind: organization
  • Email: proycon@anaproy.nl
  • Location: Nijmegen, The Netherlands

NLP Research group at Centre for Language Studies, Radboud University Nijmegen

CodeMeta (codemeta.json)

{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "http://schema.org",
    "https://w3id.org/software-types"
  ],
  "@type": "SoftwareSourceCode",
  "identifier": "foliautils",
  "name": "foliautils",
  "version": "0.23",
  "description": "Command-line utilities for working with the Format for Linguistic Annotation (FoLiA).",
  "license": "https://spdx.org/licenses/GPL-3.0-only",
  "url": "https://github.com/LanguageMachines/foliautils",
  "author": [
    {
      "@type": "Person",
      "givenName": "Ko",
      "familyName": "van der Sloot",
      "email": "ko.vandersloot@let.ru.nl",
      "affiliation": {
        "@id": "https://www.ru.nl/clst",
        "@type": "Organization",
        "name": "Centre for Language and Speech Technology",
        "url": "https://www.ru.nl/clst",
        "parentOrganization": {
          "@id": "https://www.ru.nl/cls",
          "@type": "Organization",
          "name": "Centre for Language Studies",
          "url": "https://www.ru.nl/cls",
          "parentOrganization": {
            "@id": "https://www.ru.nl",
            "name": "Radboud University",
            "@type": "Organization",
            "url": "https://www.ru.nl",
            "location": {
              "@type": "Place",
              "name": "Nijmegen"
            }
          }
        }
      }
    },
    {
      "@id": "https://orcid.org/0000-0002-1046-0006",
      "@type": "Person",
      "givenName": "Maarten",
      "familyName": "van Gompel",
      "email": "proycon@anaproy.nl",
      "affiliation": {
        "@id": "https://www.ru.nl/clst"
      }
    }
  ],
  "sourceOrganization": {
    "@id": "https://www.ru.nl/clst"
  },
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "identifier": "c++",
    "name": "C++"
  },
  "operatingSystem": "POSIX",
  "codeRepository": "https://github.com/LanguageMachines/foliautils",
  "softwareRequirements": [
    {
      "@type": "SoftwareApplication",
      "identifier": "icu",
      "name": "icu"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "libxml2",
      "name": "libxml2"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "ticcutils",
      "name": "ticcutils"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "libfolia",
      "name": "libfolia"
    }
  ],
  "readme": "https://github.com/LanguageMachines/foliautils/blob/master/README.md",
  "issueTracker": "https://github.com/LanguageMachines/foliautils/issues",
  "contIntegration": "https://travis-ci.org/LanguageMachines/foliautils",
  "releaseNotes": "https://github.com/LanguageMachines/foliautils/releases",
  "developmentStatus": "https://www.repostatus.org/#active",
  "keywords": [
    "nlp",
    "natural language processing",
    "folia",
    "xml",
    "linguistic annotation"
  ],
  "referencePublication": [
    {
      "@type": "ScholarlyArticle",
      "name": "FoLiA: A practical XML format for linguistic annotation - a descriptive and comparative study",
      "author": [
        "Maarten van Gompel",
        "Martin Reynaert"
      ],
      "pageStart": 63,
      "pageEnd": 81,
      "isPartOf": {
        "@type": "PublicationIssue",
        "datePublished": "2014",
        "name": "Computational Linguistics in the Netherlands Journal",
        "issue": "3"
      },
      "url": "http://www.clinjournal.org/sites/clinjournal.org/files/05-vanGompel-Reynaert-CLIN2013.pdf"
    },
    {
      "@type": "TechArticle",
      "name": "FoLiA: Format for Linguistic Annotation, Documentation",
      "author": [
        "Maarten van Gompel"
      ],
      "isPartOf": {
        "@type": "PublicationIssue",
        "datePublished": "2014",
        "name": "Language and Speech Technology Technical Report Series",
        "issue": "14-01",
        "location": "Nijmegen, the Netherlands"
      },
      "url": "https://github.com/proycon/folia/raw/master/docs/folia.pdf"
    }
  ],
  "targetProduct": [
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-2text",
      "executableName": "FoLiA-2text",
      "description": "Convert FoLiA documents into plain text",
      "consumesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ],
      "producesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "text/plain"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-txt",
      "executableName": "FoLiA-txt",
      "description": "Convert plain text to FoLiA; the output contains only <p> and <str> nodes. See ucto or rst2folia (FoLiA-tools) for alternatives.",
      "consumesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "text/plain"
        }
      ],
      "producesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-page",
      "executableName": "FoLiA-page",
      "description": "Convert PAGE XML to FoLiA",
      "consumesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/page+xml"
        }
      ],
      "producesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-hocr",
      "executableName": "FoLiA-hocr",
      "description": "Convert hOCR (as produced by Tesseract) to FoLiA",
      "consumesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "text/html"
        }
      ],
      "producesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-alto",
      "executableName": "FoLiA-alto",
      "description": "Convert ALTO DIDL files into a series of FoLiA documents"
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-langcat",
      "executableName": "FoLiA-langcat",
      "description": "Language Identification using textcat.",
      "consumesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ],
      "producesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-idf",
      "executableName": "FoLiA-idf",
      "description": "Count words in a series of FoLiA documents and compute IDF statistics, which are written to a TSV file",
      "consumesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ],
      "producesData": [
        {
          "@type": "Dataset",
          "encodingFormat": "text/tab-separated-values"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-stats",
      "executableName": "FoLiA-stats",
      "description": "Gather n-gram statistics over a series of FoLiA documents",
      "consumesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ],
      "producesData": [
        {
          "@type": "Dataset",
          "encodingFormat": "text/tab-separated-values"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-collect",
      "executableName": "FoLiA-collect",
      "description": "Collect n-gram statistics from tsv files produced by FoLiA-stats, aggregating results.",
      "consumesData": [
        {
          "@type": "Dataset",
          "encodingFormat": "text/tab-separated-values"
        }
      ],
      "producesData": [
        {
          "@type": "Dataset",
          "encodingFormat": "text/tab-separated-values"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-correct",
      "executableName": "FoLiA-correct",
      "description": "Correct FoLiA documents using correction candidates generated by TICCL-rank (from ticcltools)",
      "consumesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ],
      "producesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-wordtranslate",
      "executableName": "FoLiA-wordtranslate",
      "description": "Simple word-by-word translator based on a dictionary and/or rewrite rules",
      "consumesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ],
      "producesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-clean",
      "executableName": "FoLiA-clean",
      "description": "Produce a cleaned-up version of a FoLiA file, or of a whole directory of FoLiA files, by removing specified annotation types and specified text classes",
      "consumesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ],
      "producesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-pm",
      "executableName": "FoLiA-pm",
      "description": "Convert Political Mashup XML to FoLiA",
      "producesData": [
        {
          "@type": "TextDigitalDocument",
          "encodingFormat": "application/folia+xml"
        }
      ]
    }
  ]
}
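
The targetProduct entries above pair each executable with the formats it consumes and produces. As a minimal sketch of how that metadata could be tabulated, the Python below parses a short excerpt of the record (two entries copied from it) and lists each tool with its input and output formats. The helper names formats and list_tools are hypothetical and are not part of foliautils or of any CodeMeta tooling; entries that omit consumesData or producesData, such as FoLiA-alto, simply yield empty lists.

```python
import json

# Excerpt of the codemeta.json record above: two targetProduct entries,
# copied as-is (FoLiA-alto declares no consumesData/producesData).
CODEMETA_EXCERPT = """
{
  "name": "foliautils",
  "targetProduct": [
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-2text",
      "executableName": "FoLiA-2text",
      "consumesData": [
        {"@type": "TextDigitalDocument", "encodingFormat": "application/folia+xml"}
      ],
      "producesData": [
        {"@type": "TextDigitalDocument", "encodingFormat": "text/plain"}
      ]
    },
    {
      "@type": "CommandLineApplication",
      "name": "FoLiA-alto",
      "executableName": "FoLiA-alto"
    }
  ]
}
"""


def formats(entry, key):
    """Return the encodingFormat values listed under `key`, or [] if absent."""
    return [item["encodingFormat"]
            for item in entry.get(key, [])
            if "encodingFormat" in item]


def list_tools(codemeta):
    """Return (executable, input formats, output formats) per CLI tool."""
    rows = []
    for entry in codemeta.get("targetProduct", []):
        if entry.get("@type") != "CommandLineApplication":
            continue  # skip anything that is not a command-line tool
        exe = entry.get("executableName", entry.get("name"))
        rows.append((exe,
                     formats(entry, "consumesData"),
                     formats(entry, "producesData")))
    return rows


if __name__ == "__main__":
    for exe, consumed, produced in list_tools(json.loads(CODEMETA_EXCERPT)):
        print(f"{exe}\t{', '.join(consumed) or '-'}\t{', '.join(produced) or '-'}")
```

To run this against the full record, load the repository's codemeta.json with json.load instead of the embedded excerpt.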

GitHub Events

Total
  • Create event: 2
  • Release event: 1
  • Issues event: 5
  • Issue comment event: 8
  • Push event: 14
Last Year
  • Create event: 2
  • Release event: 1
  • Issues event: 5
  • Issue comment event: 8
  • Push event: 14

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 73
  • Total pull requests: 0
  • Average time to close issues: 5 months
  • Average time to close pull requests: N/A
  • Total issue authors: 9
  • Total pull request authors: 0
  • Average comments per issue: 4.95
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: 15 minutes
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 5.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kosloot (28)
  • martinreynaert (15)
  • proycon (13)
  • pirolen (11)
  • egpbos (1)
  • VincentCCL (1)
  • peterdekker (1)
  • antalvdb (1)
  • alankessler (1)
Pull Request Authors
Top Labels
Issue Labels
enhancement (27) bug (11) question (6) ready (3) Testing (2) low priority (2)
Pull Request Labels

Dependencies

.github/workflows/foliautils.yml actions
  • Gottox/irc-message-action v2 composite
  • actions/checkout v2 composite
  • styfle/cancel-workflow-action 0.11.0 composite
Dockerfile docker
  • alpine latest build