ucto
Unicode tokenizer. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers other basic preprocessing steps, such as case conversion, that prepare text for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenization rules for several languages and can be extended to suit other languages. It is used to tokenize Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
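The behaviour described above (separating words from punctuation, splitting sentences, optional case conversion) can be illustrated with a minimal Python sketch. This is not Ucto's API or its rule set: the regular expression, the sentence-ending set, and the function names `tokenize`, `split_sentences`, and `preprocess` are assumptions made purely for illustration.

```python
import re

# Hypothetical, heavily simplified rules; Ucto's own language-specific
# rule files are far richer than this single pattern.
# A token is either a word (optionally with internal hyphens/apostrophes)
# or a single non-space, non-word character (punctuation).
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def tokenize(text):
    """Split text into word tokens and punctuation tokens."""
    return TOKEN_RE.findall(text)


def split_sentences(tokens):
    """Group tokens into sentences, closing a sentence at '.', '!' or '?'."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without final punctuation
        sentences.append(current)
    return sentences


def preprocess(text, lowercase=False):
    """Tokenize, optionally lowercase, then split into sentences."""
    tokens = tokenize(text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    return split_sentences(tokens)
```

Calling `preprocess("Hello, world! It's fine.")` yields two sentences, with the comma and the final punctuation as separate tokens. A naive rule like this mishandles abbreviations and quoted speech, which is the kind of case a dedicated rule-based tokenizer is meant to cover.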
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (3.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: LanguageMachines
- License: gpl-3.0
- Language: C++
- Default Branch: master
- Homepage: https://languagemachines.github.io/ucto
- Size: 6.14 MB
Statistics
- Stars: 70
- Watchers: 12
- Forks: 14
- Open Issues: 12
- Releases: 47
Metadata Files
README
Please see README.md for more information
Owner
- Name: Language Machines
- Login: LanguageMachines
- Kind: organization
- Email: proycon@anaproy.nl
- Location: Nijmegen, The Netherlands
- Website: http://cls.ru.nl/languagemachines
- Repositories: 53
- Profile: https://github.com/LanguageMachines
NLP Research group at Centre for Language Studies, Radboud University Nijmegen
CodeMeta (codemeta.json)
{
"@context": [
"https://doi.org/10.5063/schema/codemeta-2.0",
"http://schema.org",
"https://w3id.org/software-types",
"https://w3id.org/software-iodata"
],
"@type": "SoftwareSourceCode",
"identifier": "ucto",
"name": "ucto",
"version": "0.35",
"description": "Ucto tokenizes text files: it separates words from punctuation, and splits sentences. This is one of the first tasks for almost any Natural Language Processing application. Ucto offers several other basic preprocessing steps such as changing case that you can all use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation.",
"license": "https://spdx.org/licenses/GPL-3.0-only",
"url": "https://languagemachines.github.io/ucto",
"thumbnailUrl": "https://raw.githubusercontent.com/LanguageMachines/ucto/master/logo.svg",
"producer": {
"@id": "https://huc.knaw.nl",
"@type": "Organization",
"name": "KNAW Humanities Cluster",
"url": "https://huc.knaw.nl",
"parentOrganization": {
"@id": "https://knaw.nl",
"@type": "Organization",
"name": "KNAW",
"url": "https://knaw.nl",
"location": {
"@type": "Place",
"name": "Amsterdam"
}
}
},
"author": [
{
"@id": "https://orcid.org/0000-0002-1046-0006",
"@type": "Person",
"givenName": "Maarten",
"familyName": "van Gompel",
"email": "proycon@anaproy.nl",
"affiliation": {
"@id": "https://huc.knaw.nl"
}
},
{
"@type": "Person",
"givenName": "Ko",
"familyName": "van der Sloot",
"email": "ko.vandersloot@let.ru.nl",
"affiliation": {
"@id": "https://www.ru.nl/clst",
"@type": "Organization",
"name": "Centre for Language and Speech Technology",
"url": "https://www.ru.nl/clst",
"parentOrganization": {
"@id": "https://www.ru.nl/cls",
"@type": "Organization",
"name": "Centre for Language Studies",
"url": "https://www.ru.nl/cls",
"parentOrganization": {
"@id": "https://www.ru.nl",
"name": "Radboud University",
"@type": "Organization",
"url": "https://www.ru.nl",
"location": {
"@type": "Place",
"name": "Nijmegen"
}
}
}
}
}
],
"programmingLanguage": {
"@type": "ComputerLanguage",
"identifier": "c++",
"name": "C++"
},
"operatingSystem": [
"Linux",
"BSD",
"macOS"
],
"codeRepository": "https://github.com/LanguageMachines/ucto",
"softwareRequirements": [
{
"@type": "SoftwareApplication",
"identifier": "icu",
"name": "icu"
},
{
"@type": "SoftwareApplication",
"identifier": "libxml2",
"name": "libxml2"
},
{
"@type": "SoftwareApplication",
"identifier": "ticcutils",
"name": "ticcutils"
},
{
"@type": "SoftwareApplication",
"identifier": "libfolia",
"name": "libfolia"
}
],
"funding": [
{
"@type": "Grant",
"name": "CLARIN-NL (NWO grant 184.021.003)",
"url": "https://www.clariah.nl",
"funder": {
"@type": "Organization",
"name": "NWO",
"url": "https://www.nwo.nl"
}
},
{
"@type": "Grant",
"name": "CLARIAH-CORE (NWO grant 184.033.101)",
"url": "https://www.clariah.nl",
"funder": {
"@type": "Organization",
"name": "NWO",
"url": "https://www.nwo.nl"
}
},
{
"@type": "Grant",
"name": "CLARIAH-PLUS (NWO grant 184.034.023)",
"funder": {
"@type": "Organization",
"name": "NWO",
"url": "https://www.nwo.nl"
}
}
],
"readme": "https://github.com/LanguageMachines/ucto/blob/master/README.md",
"softwareHelp": [
{
"@id": "https://ucto.readthedocs.io",
"@type": "WebSite",
"name": "Ucto documentation",
"url": "https://ucto.readthedocs.io"
}
],
"issueTracker": "https://github.com/LanguageMachines/ucto/issues",
"contIntegration": "https://github.com/LanguageMachines/ucto/actions/workflows/ucto.yml",
"releaseNotes": "https://github.com/LanguageMachines/ucto/releases",
"developmentStatus": [
"https://www.repostatus.org/#active",
"https://w3id.org/research-technology-readiness-levels#Level9Proven"
],
"keywords": [
"nlp",
"natural language processing",
"tokenization",
"tokenizer"
],
"dateCreated": "2011-03-27",
"dateModified": "2023-02-22T12:17:06Z+0100",
"applicationCategory": [
"https://vocabs.dariah.eu/tadirah/annotating",
"https://vocabs.dariah.eu/tadirah/tagging",
"https://w3id.org/nwo-research-fields#Linguistics",
"https://w3id.org/nwo-research-fields#TextualAndContentAnalysis"
],
"targetProduct": [
{
"@type": "SoftwareLibrary",
"executableName": "libucto",
"name": "libucto",
"runtimePlatform": [
"Linux",
"BSD",
"macOS"
],
"description": "Ucto Library with API for C++"
},
{
"@type": "CommandLineApplication",
"executableName": "ucto",
"name": "ucto",
"runtimePlatform": [
"Linux",
"BSD",
"macOS"
],
"description": "Command-line interface to the tokenizer",
"consumesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/nld",
"@type": "Language",
"name": "Dutch",
"identifier": "nld"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/eng",
"@type": "Language",
"name": "English",
"identifier": "eng"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/ita",
"@type": "Language",
"name": "Italian",
"identifier": "ita"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/fra",
"@type": "Language",
"name": "French",
"identifier": "fra"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/spa",
"@type": "Language",
"name": "Spanish",
"identifier": "spa"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/por",
"@type": "Language",
"name": "Portuguese",
"identifier": "por"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/deu",
"@type": "Language",
"name": "German",
"identifier": "deu"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/fry",
"@type": "Language",
"name": "Frisian",
"identifier": "fry"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/swe",
"@type": "Language",
"name": "Swedish",
"identifier": "swe"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/rus",
"@type": "Language",
"name": "Russian",
"identifier": "rus"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/tur",
"@type": "Language",
"name": "Turkish",
"identifier": "tur"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/nld"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/eng"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/ita"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/fra"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/spa"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/por"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/deu"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/fry"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/swe"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/rus"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/tur"
}
]
}
],
"producesData": [
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/nld"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/eng"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/ita"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/fra"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/spa"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/por"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/deu"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/fry"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/swe"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/rus"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "text/plain",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/tur"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/nld"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/eng"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/ita"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/fra"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia+xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/spa"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/por"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/deu"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/fry"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/swe"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/rus"
}
]
},
{
"@type": "TextDigitalDocument",
"encodingFormat": "application/folia.xml",
"inLanguage": [
{
"@id": "https://iso639-3.sil.org/code/tur"
}
]
}
]
}
]
}
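As a rough illustration of how the `consumesData` entries in the record above could be read programmatically, the following Python sketch groups the declared input languages by encoding format. It assumes a trimmed-down excerpt of the record (the `EXCERPT` string) and a hypothetical helper named `input_languages`; neither is part of Ucto.

```python
import json

# Trimmed excerpt modelled on the codemeta.json record above (assumption:
# only one language is kept to keep the example short).
EXCERPT = """
{
  "name": "ucto",
  "targetProduct": [
    {"@type": "SoftwareLibrary", "name": "libucto"},
    {
      "@type": "CommandLineApplication",
      "name": "ucto",
      "consumesData": [
        {"encodingFormat": "text/plain",
         "inLanguage": [{"@id": "https://iso639-3.sil.org/code/nld",
                         "name": "Dutch", "identifier": "nld"}]},
        {"encodingFormat": "application/folia+xml",
         "inLanguage": [{"@id": "https://iso639-3.sil.org/code/nld"}]}
      ]
    }
  ]
}
"""


def input_languages(codemeta):
    """Map each encodingFormat to the sorted language codes declared for it."""
    formats = {}
    for product in codemeta.get("targetProduct", []):
        for item in product.get("consumesData", []):
            fmt = item.get("encodingFormat")
            for lang in item.get("inLanguage", []):
                # Prefer the explicit identifier; otherwise fall back to the
                # last path segment of the "@id" URL.
                code = lang.get("identifier") or lang.get("@id", "").rsplit("/", 1)[-1]
                formats.setdefault(fmt, set()).add(code)
    return {fmt: sorted(codes) for fmt, codes in formats.items()}


result = input_languages(json.loads(EXCERPT))
```

The fallback to the `@id` URL matters because the FoLiA entries in the record reference a language only by `@id`, without repeating `name` or `identifier`.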
GitHub Events
Total
- Watch event: 5
- Push event: 13
- Fork event: 1
- Create event: 1
Last Year
- Watch event: 5
- Push event: 13
- Fork event: 1
- Create event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Ko van der Sloot | K****t@l****l | 692 |
| sloot | s****t@1****3 | 608 |
| mvgompel | m****l@1****3 | 213 |
| Iris Hendrickx | i****s@i****l | 6 |
| fkarsdorp | f****p@1****3 | 5 |
| Sander Maijers | S****s@g****m | 3 |
| antalb | a****b@1****3 | 1 |
| Kobus van der Sloot | s****t@a****l | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 93
- Total pull requests: 3
- Average time to close issues: 4 months
- Average time to close pull requests: 19 days
- Total issue authors: 17
- Total pull request authors: 2
- Average comments per issue: 4.52
- Average comments per pull request: 5.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- kosloot (36)
- proycon (22)
- martinreynaert (6)
- pirolen (5)
- emanjavacas (4)
- JessedeDoes (4)
- Irishx (3)
- sanmai-NL (3)
- mhkuu (2)
- yurivict (1)
- a-tsioh (1)
- fkunneman (1)
- asharkinasuit (1)
- alabrashJr (1)
- marijnschraagen (1)
Pull Request Authors
- sanmai-NL (2)
- 0mp (1)
Packages
- Total packages: 34
- Total downloads: unknown
- Total dependent packages: 16 (may contain duplicates)
- Total dependent repositories: 0 (may contain duplicates)
- Total versions: 90
- Total maintainers: 1
proxy.golang.org: github.com/LanguageMachines/ucto
- Documentation: https://pkg.go.dev/github.com/LanguageMachines/ucto#section-documentation
- License: gpl-3.0
- Latest release: v0.32.1 (published almost 2 years ago)
Rankings
alpine-v3.18: ucto-dev
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.29-r1 (published almost 3 years ago)
Rankings
Maintainers (1)
alpine-v3.18: ucto-doc
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.29-r1 (published almost 3 years ago)
Rankings
Maintainers (1)
alpine-v3.18: frog-dev
Integration of natural language processing models for Dutch (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.27.1-r3 (published almost 3 years ago)
Rankings
Maintainers (1)
alpine-v3.18: ucto
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.29-r1 (published almost 3 years ago)
Rankings
Maintainers (1)
alpine-v3.18: frog
Integration of natural language processing models for Dutch
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.27.1-r3 (published almost 3 years ago)
Rankings
Maintainers (1)
alpine-v3.18: frog-doc
Integration of natural language processing models for Dutch (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.27.1-r3 (published almost 3 years ago)
Rankings
Maintainers (1)
alpine-v3.17: ucto
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.25-r1 (published over 3 years ago)
Rankings
Maintainers (1)
alpine-v3.17: frog
Integrator of natural language processing modules for Dutch
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.25-r1 (published over 3 years ago)
Rankings
Maintainers (1)
alpine-v3.16: frog-doc
Integrator of natural language processing modules for Dutch (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.24-r2 (published almost 4 years ago)
Rankings
Maintainers (1)
alpine-v3.16: ucto-doc
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.24.1-r2 (published almost 4 years ago)
Rankings
Maintainers (1)
alpine-v3.16: frog-dev
Integrator of natural language processing modules for Dutch (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.24-r2 (published almost 4 years ago)
Rankings
Maintainers (1)
alpine-v3.16: ucto
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.24.1-r2 (published almost 4 years ago)
Rankings
Maintainers (1)
alpine-v3.16: ucto-dev
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.24.1-r2 (published almost 4 years ago)
Rankings
Maintainers (1)
alpine-v3.16: frog
Integrator of natural language processing modules for Dutch
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.24-r2 (published almost 4 years ago)
Rankings
Maintainers (1)
alpine-edge: ucto
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.35-r1 (published 11 months ago)
Rankings
Maintainers (1)
alpine-edge: ucto-doc
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.35-r1 (published 11 months ago)
Rankings
Maintainers (1)
alpine-edge: ucto-dev
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.35-r1 (published 11 months ago)
Rankings
Maintainers (1)
alpine-v3.17: frog-dev
Integrator of natural language processing modules for Dutch (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.25-r1 (published over 3 years ago)
Rankings
Maintainers (1)
alpine-v3.17: ucto-doc
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.25-r1 (published over 3 years ago)
Rankings
Maintainers (1)
alpine-v3.17: frog-doc
Integrator of natural language processing modules for Dutch (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.25-r1 (published over 3 years ago)
Rankings
Maintainers (1)
alpine-v3.17: ucto-dev
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.25-r1 (published over 3 years ago)
Rankings
Maintainers (1)
alpine-v3.21: ucto
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.34-r0 (published over 1 year ago)
Rankings
Maintainers (1)
alpine-v3.22: ucto-dev
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.35-r1 (published 11 months ago)
Rankings
Maintainers (1)
alpine-v3.21: ucto-doc
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.34-r0 (published over 1 year ago)
Rankings
Maintainers (1)
alpine-v3.20: ucto-doc
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.33-r0 (published almost 2 years ago)
Rankings
Maintainers (1)
alpine-v3.22: ucto
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.35-r1 (published 11 months ago)
Rankings
Maintainers (1)
alpine-v3.21: ucto-dev
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.34-r0 (published over 1 year ago)
Rankings
Maintainers (1)
alpine-v3.19: ucto
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.30-r1 (published over 2 years ago)
Rankings
alpine-v3.20: ucto
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.33-r0 (published almost 2 years ago)
Rankings
Maintainers (1)
alpine-v3.20: ucto-dev
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.33-r0 (published almost 2 years ago)
Rankings
Maintainers (1)
alpine-v3.22: ucto-doc
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.35-r1 (published 11 months ago)
Rankings
Maintainers (1)
alpine-v3.19: ucto-dev
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (development files)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.30-r1 (published over 2 years ago)
Rankings
Maintainers (1)
alpine-v3.19: ucto-doc
advanced rule-based (regular-expression) and unicode-aware tokenizer for various languages (documentation)
- Homepage: https://github.com/LanguageMachines/ucto
- License: GPL-3.0-only
- Latest release: 0.30-r1 (published over 2 years ago)
Rankings
Dependencies
- Gottox/irc-message-action v2 composite
- actions/checkout v2 composite
- styfle/cancel-workflow-action 0.11.0 composite
- alpine latest build
- Mattraks/delete-workflow-runs v2 composite