Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (1.4%) to scientific vocabulary

Keywords from Contributors

computational-linguistics language-modelling lm
Last synced: 6 months ago

Repository

Tools for TICCL

Basic Info
  • Host: GitHub
  • Owner: LanguageMachines
  • License: gpl-3.0
  • Language: C++
  • Default Branch: master
  • Size: 248 MB
Statistics
  • Stars: 14
  • Watchers: 7
  • Forks: 4
  • Open Issues: 17
  • Releases: 9
Created over 10 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog License Authors Codemeta

README

Please see README.md for more information on this package.

Owner

  • Name: Language Machines
  • Login: LanguageMachines
  • Kind: organization
  • Email: proycon@anaproy.nl
  • Location: Nijmegen, The Netherlands

NLP Research group at Centre for Language Studies, Radboud University Nijmegen

CodeMeta (codemeta.json)

{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "http://schema.org",
    {
      "entryPoints": {
        "@reverse": "schema:actionApplication"
      },
      "interfaceType": {
        "@id": "codemeta:interfaceType"
      }
    }
  ],
  "@type": "SoftwareSourceCode",
  "identifier": "ticcltools",
  "name": "TICCLTools",
  "version": "0.11",
  "description": "TicclTools is a collection of programs to process data files; together they constitute the bulk of TICCL: Text Induced Corpus-Cleanup. This software consists of individual modules that are invoked by the pipeline system PICCL.",
  "license": "https://spdx.org/licenses/GPL-3.0",
  "url": "https://github.com/LanguageMachines/ticcltools",
  "author": [
    {
      "@type": "Person",
      "givenName": "Martin",
      "familyName": "Reynaert",
      "email": "reynaert@uvt.nl",
      "affiliation": {
        "@id": "https://www.ru.nl/clst",
        "@type": "Organization",
        "name": "Centre for Language and Speech Technology",
        "url": "https://www.ru.nl/clst",
        "parentOrganization": {
          "@id": "https://www.ru.nl/cls",
          "@type": "Organization",
          "name": "Centre for Language Studies",
          "url": "https://www.ru.nl/cls",
          "parentOrganization": {
            "@id": "https://www.ru.nl",
            "name": "Radboud University",
            "@type": "Organization",
            "url": "https://www.ru.nl",
            "location": {
              "@type": "Place",
              "name": "Nijmegen"
            }
          }
        }
      }
    },
    {
      "@type": "Person",
      "givenName": "Ko",
      "familyName": "van der Sloot",
      "email": "k.vandersloot@let.ru.nl",
      "affiliation": {
        "@id": "https://www.ru.nl/clst"
      }
    }
  ],
  "sourceOrganization": {
    "@id": "https://www.ru.nl/clst"
  },
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "identifier": "c++",
    "name": "C++"
  },
  "operatingSystem": "POSIX",
  "codeRepository": "https://github.com/LanguageMachines/ticcltools",
  "softwareRequirements": [
    {
      "@type": "SoftwareApplication",
      "identifier": "ticcutils",
      "name": "Ticcutils"
    }
  ],
  "funder": [
    {
      "@type": "Organization",
      "name": "CLARIN-NL"
    },
    {
      "@type": "Organization",
      "name": "CLARIAH",
      "url": "https://www.clariah.nl"
    }
  ],
  "readme": "https://github.com/LanguageMachines/ticcltools/blob/master/README.md",
  "issueTracker": "https://github.com/LanguageMachines/ticcltools/issues",
  "contIntegration": "https://travis-ci.org/LanguageMachines/ticcltools",
  "releaseNotes": "https://github.com/LanguageMachines/ticcltools/releases",
  "developmentStatus": "unsupported",
  "keywords": [
    "nlp",
    "natural language processing",
    "ocr",
    "normalization"
  ],
  "referencePublication": [
    {
      "@type": "ScholarlyArticle",
      "name": "PICCL: Philosophical Integrator of Computational and Corpus Libraries",
      "author": [
        "Martin Reynaert",
        "Maarten van Gompel",
        "Ko van der Sloot",
        "Antal van den Bosch"
      ],
      "pageStart": "75",
      "pageEnd": "79",
      "isPartOf": {
        "@type": "PublicationIssue",
        "datePublished": "2015",
        "name": "Proceedings of CLARIN Annual Conference 2015",
        "location": "Wrocław, Poland"
      },
      "url": "http://www.nederlab.nl/cms/wp-content/uploads/2015/10/Reynaert_PICCL-Philosophical-Integrator-of-Computational-and-Corpus-Libraries.pdf"
    }
  ],
  "dateCreated": "2015",
  "entryPoints": [
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///TICCL-indexer",
      "name": "TICCL-indexer",
      "description": "A tool to create an exhaustive index to all lexical variants given a particular Levenshtein or edit distance in a corpus.",
      "interfaceType": "CLI"
    },
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///TICCL-indexerNT",
      "name": "TICCL-indexerNT",
      "description": "A tool to create an exhaustive index to all lexical variants given a particular Levenshtein or edit distance in a corpus.",
      "interfaceType": "CLI"
    },
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///TICCL-anahash",
      "name": "TICCL-anahash",
      "description": "A tool to create anagram hashes from a word frequency file. Also creates an 'alphabet' file of the Unicode characters that are present in the corpus.",
      "interfaceType": "CLI"
    },
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///TICCL-LDcalc",
      "name": "TICCL-LDcalc",
      "description": "A pre-processing tool for TICCL-rank. Gathers the information from TICCL-anahash, TICCL-indexer, TICCL-lexstat and TICCL-unk.",
      "interfaceType": "CLI"
    },
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///TICCL-rank",
      "name": "TICCL-rank",
      "description": "Ranks a word variant list according to multiple criteria.",
      "interfaceType": "CLI"
    },
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///TICCL-unk",
      "name": "TICCL-unk",
      "description": "A cleanup tool for word frequency lists. Creates a 'clean' file with desirable words, an 'unk' file with uncorrectable words and a 'punct' file with words that would be clean after removing punctuation.",
      "interfaceType": "CLI"
    },
    {
      "@type": "EntryPoint",
      "urlTemplate": "file:///TICCL-lexstat",
      "name": "TICCL-lexstat",
      "description": "Converts an 'alphabet' file (from TICCL-anahash) into a frequency list of hashes and, optionally, a list of confusions.",
      "interfaceType": "CLI"
    }
  ]
}

GitHub Events

Total
  • Release event: 1
  • Push event: 2
  • Fork event: 1
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 2
  • Fork event: 1
  • Create event: 1

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 790
  • Total Committers: 5
  • Avg Commits per committer: 158.0
  • Development Distribution Score (DDS): 0.114
Past Year
  • Commits: 38
  • Committers: 2
  • Avg Commits per committer: 19.0
  • Development Distribution Score (DDS): 0.132
Top Committers
Name                 Email            Commits
Ko van der Sloot     K****t@l****l    700
Ko van der Sloot     K****t@z****l    48
martinreynaert       m****t@u****m    24
Maarten van Gompel   p****n@a****l    10
sloot                s****t@1****3    8

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 47
  • Total pull requests: 0
  • Average time to close issues: 7 months
  • Average time to close pull requests: N/A
  • Total issue authors: 7
  • Total pull request authors: 0
  • Average comments per issue: 4.47
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: about 4 hours
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • martinreynaert (21)
  • kosloot (18)
  • proycon (4)
  • egpbos (1)
  • tokee (1)
  • VincentCCL (1)
  • peterdekker (1)
Top Labels
Issue Labels
  • enhancement (20)
  • testing (4)
  • bug (3)
  • invalid (1)
  • help wanted (1)
  • question (1)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
conda-forge.org: ticcltools

TicclTools is a collection of programs to process data files; together they constitute the bulk of TICCL: Text Induced Corpus-Cleanup. The main programs in this collection are:

  • TICCL-indexer and TICCL-indexerNT: tools to create an exhaustive index to all lexical variants within a given Levenshtein (edit) distance in a corpus.
  • TICCL-anahash: a tool to create anagram hashes from a word frequency file. It also creates an 'alphabet' file of the Unicode characters that are present in the corpus.
  • TICCL-LDcalc: a preprocessing tool for TICCL-rank. It gathers the information from TICCL-anahash, TICCL-indexer, TICCL-lexstat and TICCL-unk.
  • TICCL-rank: ranks a word variant list according to multiple criteria.
  • TICCL-unk: a cleanup tool for word frequency lists. It creates a 'clean' file with desirable words, an 'unk' file with uncorrectable words and a 'punct' file with words that would be clean after removing punctuation.
  • TICCL-lexstat: converts an 'alphabet' file (from TICCL-anahash) into a frequency list of hashes and, optionally, a list of confusions.

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Average: 46.4%
Stargazers count: 48.9%
Dependent packages count: 51.2%
Forks count: 51.6%
Last synced: 7 months ago

Dependencies

Dockerfile docker
  • alpine latest build
.github/workflows/cleanup.yml actions
  • Mattraks/delete-workflow-runs v2 composite
.github/workflows/ticcltools.yml actions
  • Gottox/irc-message-action v2 composite
  • actions/checkout v3 composite
  • styfle/cancel-workflow-action 0.11.0 composite