Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (1.4%) to scientific vocabulary
Keywords from Contributors
Repository
Tools for TICCL
Basic Info
- Host: GitHub
- Owner: LanguageMachines
- License: gpl-3.0
- Language: C++
- Default Branch: master
- Size: 248 MB
Statistics
- Stars: 14
- Watchers: 7
- Forks: 4
- Open Issues: 17
- Releases: 9
Metadata Files
README
Please see README.md for more information on this package.
Owner
- Name: Language Machines
- Login: LanguageMachines
- Kind: organization
- Email: proycon@anaproy.nl
- Location: Nijmegen, The Netherlands
- Website: http://cls.ru.nl/languagemachines
- Repositories: 53
- Profile: https://github.com/LanguageMachines
NLP Research group at Centre for Language Studies, Radboud University Nijmegen
CodeMeta (codemeta.json)
{
"@context": [
"https://doi.org/10.5063/schema/codemeta-2.0",
"http://schema.org",
{
"entryPoints": {
"@reverse": "schema:actionApplication"
},
"interfaceType": {
"@id": "codemeta:interfaceType"
}
}
],
"@type": "SoftwareSourceCode",
"identifier": "ticcltools",
"name": "TICCLTools",
"version": "0.11",
"description": "TicclTools is a collection of programs to process datafiles; together they constitute the bulk of TICCL: Text Induced Corpus-Cleanup. This software consists of individual modules that are invoked by the pipeline system PICCL.",
"license": "https://spdx.org/licenses/GPL-3.0",
"url": "https://github.com/LanguageMachines/ticcltools",
"author": [
{
"@type": "Person",
"givenName": "Martin",
"familyName": "Reynaert",
"email": "reynaert@uvt.nl",
"affiliation": {
"@id": "https://www.ru.nl/clst",
"@type": "Organization",
"name": "Centre for Language and Speech Technology",
"url": "https://www.ru.nl/clst",
"parentOrganization": {
"@id": "https://www.ru.nl/cls",
"@type": "Organization",
"name": "Centre for Language Studies",
"url": "https://www.ru.nl/cls",
"parentOrganization": {
"@id": "https://www.ru.nl",
"name": "Radboud University",
"@type": "Organization",
"url": "https://www.ru.nl",
"location": {
"@type": "Place",
"name": "Nijmegen"
}
}
}
}
},
{
"@type": "Person",
"givenName": "Ko",
"familyName": "van der Sloot",
"email": "k.vandersloot@let.ru.nl",
"affiliation": {
"@id": "https://www.ru.nl/clst"
}
}
],
"sourceOrganization": {
"@id": "https://www.ru.nl/clst"
},
"programmingLanguage": {
"@type": "ComputerLanguage",
"identifier": "c++",
"name": "C++"
},
"operatingSystem": "POSIX",
"codeRepository": "https://github.com/LanguageMachines/ticcltools",
"softwareRequirements": [
{
"@type": "SoftwareApplication",
"identifier": "ticcutils",
"name": "Ticcutils"
}
],
"funder": [
{
"@type": "Organization",
"name": "CLARIN-NL"
},
{
"@type": "Organization",
"name": "CLARIAH",
"url": "https://www.clariah.nl"
}
],
"readme": "https://github.com/LanguageMachines/ticcltools/blob/master/README.md",
"issueTracker": "https://github.com/LanguageMachines/ticcltools/issues",
"contIntegration": "https://travis-ci.org/LanguageMachines/ticcltools",
"releaseNotes": "https://github.com/LanguageMachines/ticcltools/releases",
"developmentStatus": "unsupported",
"keywords": [
"nlp",
"natural language processing",
"ocr",
"normalization"
],
"referencePublication": [
{
"@type": "ScholarlyArticle",
"name": "PICCL: Philosophical Integrator of Computational and Corpus Libraries",
"author": [
"Martin Reynaert",
"Maarten van Gompel",
"Ko van der Sloot",
"Antal van den Bosch"
],
"pageStart": "75",
"pageEnd": "79",
"isPartOf": {
"@type": "PublicationIssue",
"datePublished": "2015",
"name": "Proceedings of CLARIN Annual Conference 2015",
"location": "Wrocław, Poland"
},
"url": "http://www.nederlab.nl/cms/wp-content/uploads/2015/10/Reynaert_PICCL-Philosophical-Integrator-of-Computational-and-Corpus-Libraries.pdf"
}
],
"dateCreated": "2015",
"entryPoints": [
{
"@type": "EntryPoint",
"urlTemplate": "file:///TICCL-indexer",
"name": "TICCL-indexer",
"description": "A tool to create an exhaustive index to all lexical variants given a particular Levenshtein or edit distance in a corpus.",
"interfaceType": "CLI"
},
{
"@type": "EntryPoint",
"urlTemplate": "file:///TICCL-indexerNT",
"name": "TICCL-indexerNT",
"description": "A tool to create an exhaustive index to all lexical variants given a particular Levenshtein or edit distance in a corpus.",
"interfaceType": "CLI"
},
{
"@type": "EntryPoint",
"urlTemplate": "file:///TICCL-anahash",
"name": "TICCL-anahash",
"description": "A tool to create anagram hashes from a word frequency file. Also creates an 'alphabet' file of the Unicode characters that are present in the corpus.",
"interfaceType": "CLI"
},
{
"@type": "EntryPoint",
"urlTemplate": "file:///TICCL-LDcalc",
"name": "TICCL-LDcalc",
"description": "A pre-processing tool for TICCL-rank. Gathers the information from TICCL-anahash, TICCL-indexer, TICCL-lexstat and TICCL-unk.",
"interfaceType": "CLI"
},
{
"@type": "EntryPoint",
"urlTemplate": "file:///TICCL-rank",
"name": "TICCL-rank",
"description": "Ranks a word variant list according to a large number of criteria.",
"interfaceType": "CLI"
},
{
"@type": "EntryPoint",
"urlTemplate": "file:///TICCL-unk",
"name": "TICCL-unk",
"description": "A cleanup tool for word frequency lists. Creates a 'clean' file with desirable words, an 'unk' file with uncorrectable words and a 'punct' file with words that would be clean after removing punctuation.",
"interfaceType": "CLI"
},
{
"@type": "EntryPoint",
"urlTemplate": "file:///TICCL-lexstat",
"name": "TICCL-lexstat",
"description": "Converts an 'alphabet' file (from TICCL-anahash) into a frequency list of hashes and optionally a list of confusions.",
"interfaceType": "CLI"
}
]
}
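Several of the entry points above rely on anagram hashes. As a rough illustration only, the sketch below assumes a hash that sums the fifth power of each code point, so that all anagrams of a word share one value. The function name `anagram_hash` and the choice of exponent are assumptions made for this example; the value actually computed by TICCL-anahash may differ.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of ticcltools): an order-independent hash.
// Every code point contributes value^5 to the sum, so any permutation of
// the same characters produces the same result.
std::uint64_t anagram_hash(const std::u32string& word) {
    std::uint64_t sum = 0;
    for (char32_t cp : word) {
        const std::uint64_t v = static_cast<std::uint64_t>(cp);
        // Unsigned arithmetic wraps modulo 2^64 for very large code points;
        // the result remains independent of character order.
        sum += v * v * v * v * v;
    }
    return sum;
}
```

With this sketch, `anagram_hash(U"listen")` and `anagram_hash(U"silent")` return the same value, which is the property that lets character-level variants be looked up by numeric key rather than by string comparison.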
GitHub Events
Total
- Release event: 1
- Push event: 2
- Fork event: 1
- Create event: 1
Last Year
- Release event: 1
- Push event: 2
- Fork event: 1
- Create event: 1
Committers
Last synced: almost 3 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Ko van der Sloot | K****t@l****l | 700 |
| Ko van der Sloot | K****t@z****l | 48 |
| martinreynaert | m****t@u****m | 24 |
| Maarten van Gompel | p****n@a****l | 10 |
| sloot | s****t@1****3 | 8 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 47
- Total pull requests: 0
- Average time to close issues: 7 months
- Average time to close pull requests: N/A
- Total issue authors: 7
- Total pull request authors: 0
- Average comments per issue: 4.47
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: about 4 hours
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- martinreynaert (21)
- kosloot (18)
- proycon (4)
- egpbos (1)
- tokee (1)
- VincentCCL (1)
- peterdekker (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 4
conda-forge.org: ticcltools
TicclTools is a collection of programs to process datafiles; together they constitute the bulk of TICCL: Text Induced Corpus-Cleanup. The main programs in this collection are:
* TICCL-indexer and TICCL-indexerNT: tools to create an exhaustive index to all lexical variants given a particular Levenshtein or edit distance in a corpus.
* TICCL-anahash: a tool to create anagram hashes from a word frequency file. Also creates an 'alphabet' file of the Unicode characters that are present in the corpus.
* TICCL-LDcalc: a preprocessing tool for TICCL-rank. Gathers the information from TICCL-anahash, TICCL-indexer, TICCL-lexstat and TICCL-unk.
* TICCL-rank: ranks a word variant list according to a large number of criteria.
* TICCL-unk: a cleanup tool for word frequency lists. Creates a 'clean' file with desirable words, an 'unk' file with uncorrectable words and a 'punct' file with words that would be clean after removing punctuation.
* TICCL-lexstat: converts an 'alphabet' file (from TICCL-anahash) into a frequency list of hashes and optionally a list of confusions.
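The description above bounds lexical variants by a Levenshtein (edit) distance. For readers unfamiliar with the measure, here is a minimal sketch of the standard dynamic-programming computation over code points. The function name `levenshtein` is chosen for this example and is not taken from the ticcltools sources, which may compute or approximate the distance differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative helper (not from ticcltools): classic two-row Levenshtein
// distance counting insertions, deletions and substitutions, each at cost 1.
std::size_t levenshtein(const std::u32string& a, const std::u32string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    // Distance from the empty prefix of a to each prefix of b.
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // deleting the first i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}
```

A variant pair would then count as a candidate when `levenshtein(x, y)` does not exceed the chosen distance limit.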
- Homepage: https://github.com/LanguageMachines/ticcltools
- License: GPL-3.0-only
- Latest release: 0.7.1 (published about 5 years ago)
Rankings
Dependencies
- alpine latest build
- Mattraks/delete-workflow-runs v2 composite
- Gottox/irc-message-action v2 composite
- actions/checkout v3 composite
- styfle/cancel-workflow-action 0.11.0 composite