https://github.com/apertium/apertium
Core tools (driver script, transfer, tagger, formatters) for the FOSS RBMT system Apertium
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 40 committers (2.5%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.3%) to scientific vocabulary
Keywords
apertium-core
Keywords from Contributors
apertium-trunk
apertium-tools
apertium-languages
apertium-incubator
apertium-nursery
apertium
nynorsk
norwegian
bokmal
web-app
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: apertium
- License: gpl-2.0
- Language: C++
- Default Branch: main
- Homepage: https://apertium.org/
- Size: 1.95 MB
Statistics
- Stars: 99
- Watchers: 18
- Forks: 30
- Open Issues: 55
- Releases: 14
Topics
apertium-core
Created almost 8 years ago
Last pushed 7 months ago
Metadata Files
Readme
Changelog
License
Authors
README
# Apertium

Apertium is an open-source rule-based machine translation toolchain and ecosystem. It facilitates the creation of consistent and transparent machine translation systems by relying on deterministic linguistic rules rather than statistical or neural models. Apertium's tools are designed to be language-agnostic and platform-independent, making them suitable for a wide range of languages and applications.

## Project Overview

Apertium's framework is based on finite-state transducers, which enable efficient and accurate processing of natural languages. The language data used by Apertium is stored in XML and other human-readable text formats, organized into modular single-language packages and translation pairs. This modularity allows for the reuse of language data across multiple translation systems.

## Features

- **Rule-Based Translation**: Consistent and understandable translations based on deterministic rules.
- **Finite-State Transducers**: Efficient language processing using advanced computational models.
- **Language-Agnostic Tools**: Broad applicability across multiple languages.
- **Modular Design**: Reusable language packages simplify the development of new translation pairs.

## Installation

Apertium provides binaries for several platforms, including Debian, Ubuntu, Fedora, CentOS, OpenSUSE, Windows, and macOS. Both nightly builds and official releases are available. If you are on a supported platform, it is recommended to use the pre-built binaries.

For more information, see the [Apertium Installation Guide](https://wiki.apertium.org/wiki/Installation).

## Building from Source

If you need to modify Apertium’s behavior or are on a platform that is not officially supported, follow these steps to build from source.

### Requirements

- [lttoolbox](https://github.com/apertium/lttoolbox)
- libxml2
- ICU

### Compiling

```bash
$ autoreconf -fvi
$ ./configure
$ make
```

## Usage

Apertium can be used to translate text between supported languages.
Assuming the relevant language data (here the Spanish-Catalan translator) has been installed, translation can be achieved with the following command:

```bash
$ apertium spa-cat input.txt output.txt
```

The `apertium` executable can also use piped streams:

```bash
$ echo "La casa es roja." | apertium spa-cat
```

Language data which has been compiled but not installed can be used with the `-d` flag:

```bash
$ echo "La casa es roja." | apertium -d ./apertium-spa-cat spa-cat
```

Formats other than plaintext can be specified with the `-f` flag:

```bash
$ apertium -f html spa-cat input.html output.html
```

Data packages may provide modes besides the main translation mode. Use the `-l` flag to list them.

```bash
$ apertium -l
$ apertium -l -d ./apertium-spa-cat
```

## Additional Tools

This repository also provides the following executables:

### Pipeline Modules

- `apertium-extract-caps`, `apertium-restore-caps`: Handle capitalization
- `apertium-pretransfer`: Split compound analyses into separate words for processing by `apertium-transfer`
- `apertium-posttransfer`: Clean up repeated spaces
- `apertium-tagger`: Perform statistical part-of-speech tagging
- `apertium-transfer`, `apertium-interchunk`, `apertium-postchunk`: Structural transfer modules ([documentation](https://wiki.apertium.org/wiki/A_long_introduction_to_transfer_rules))
- `apertium-wblank-attach`, `apertium-wblank-detach`, `apertium-wblank-mode`: Handle word-bound blanks

### Build Tools

These programs are used in the process of compiling linguistic data packages.

- `apertium-compile-caps`: Compile capitalization-handling rules for use by `apertium-restore-caps` ([documentation](https://wiki.apertium.org/wiki/Capitalization_restoration))
- `apertium-gen-modes`: Process the `modes.xml` file, which specifies what translation and analysis modes a data package provides
- `apertium-preprocess-transfer`: Process structural transfer rule files for use by `apertium-transfer`
- `apertium-validate-acx`, `apertium-validate-crx`, `apertium-validate-dictionary`, `apertium-validate-interchunk`, `apertium-validate-modes`, `apertium-validate-postchunk`, `apertium-validate-tagger`, `apertium-validate-transfer`: Validators for various XML rule formats

### Format Handlers

For each supported file format, there is a deformatter named `apertium-des[NAME]` (e.g. `apertium-deshtml`) which reads formatted text from standard input and writes [Apertium stream format](https://wiki.apertium.org/wiki/Apertium_stream_format) to standard output. There is also a corresponding set of reformatters which do the reverse and are named `apertium-re[NAME]` (e.g. `apertium-rehtml`).

These programs rarely need to be invoked directly, since they are handled by the `apertium` executable. Most of the format handlers are currently deprecated in favor of [Transfuse](https://github.com/TinoDidriksen/transfuse).

## License

This project is licensed under the GNU General Public License v2.0. See the [COPYING](COPYING) file for details.

For more information, visit [Apertium](https://apertium.org) or the [Apertium Wiki](https://wiki.apertium.org).
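
The `-d` and `-f` flags described under Usage can be combined in a small wrapper. The following is a minimal sketch and not part of Apertium: the function name `translate` and the `DRY_RUN` variable are hypothetical, introduced only for illustration. It assumes the `apertium` driver accepts `-d` and `-f` together ahead of the mode name, as the separate examples in the README suggest.

```bash
# Hypothetical helper (not shipped with Apertium).
# Usage: translate MODE [DATA_DIR] [FORMAT]
# Text to translate is read from standard input, as in the piped examples.
translate() {
  mode=$1
  data_dir=${2:-}
  format=${3:-}

  # Build the argument list in the positional parameters.
  set -- apertium
  # -d: use language data that is compiled but not installed
  if [ -n "$data_dir" ]; then set -- "$@" -d "$data_dir"; fi
  # -f: select a format other than plain text (e.g. html)
  if [ -n "$format" ]; then set -- "$@" -f "$format"; fi
  set -- "$@" "$mode"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Only print the command that would be run.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

With the Spanish-Catalan data compiled in `./apertium-spa-cat`, `echo "La casa es roja." | translate spa-cat ./apertium-spa-cat` would run the same command as the `-d` example under Usage. Setting `DRY_RUN=1` prints the assembled command instead of running it, which is useful for checking the flags on a machine where Apertium is not installed.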
Owner
- Name: Apertium
- Login: apertium
- Kind: organization
- Email: apertium-contact@lists.sourceforge.net
- Website: https://wiki.apertium.org/
- Repositories: 630
- Profile: https://github.com/apertium
Free/open-source platform for developing rule-based machine translation systems and language technology
GitHub Events
Total
- Create event: 2
- Release event: 1
- Issues event: 3
- Watch event: 11
- Issue comment event: 14
- Push event: 5
- Pull request event: 2
- Fork event: 6
Last Year
- Create event: 2
- Release event: 1
- Issues event: 3
- Watch event: 11
- Issue comment event: 14
- Push event: 5
- Pull request event: 2
- Fork event: 6
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Sergio Ortiz Rojas | s****z@g****m | 196 |
| Kevin Brubeck Unhammer | u****r@f****g | 165 |
| Jim O'Regan | j****n@g****m | 138 |
| Tino Didriksen | m****l@t****m | 95 |
| Francis M. Tyers | f****s@p****m | 82 |
| Daniel Swanson | p****e@g****m | 69 |
| Frankie Robertson | f****e@r****e | 64 |
| Tanmai Khanna | k****i@g****m | 15 |
| Matthew | m****g@e****t | 13 |
| Jacob Nordfalk | j****k@g****m | 9 |
| Pasquale Minervini | p****i@g****m | 9 |
| Xavi Ivars | x****s@g****m | 9 |
| Felipe Sánchez Martínez | f****z@d****s | 7 |
| Sergio Ortiz Rojas | s****o@p****m | 7 |
| Benedikt Freisen | b****n@g****t | 4 |
| Bernard Chardonneau | b****m@f****r | 4 |
| Lokendra Singh | l****8@g****m | 4 |
| Kartik Mistry | k****y@g****m | 4 |
| Sjur Nørstebø Moshagen | s****n@u****o | 3 |
| Abu Zaher | z****4@g****m | 2 |
| Hrvoje Peradin | h****n@g****m | 2 |
| Sushain Cherivirala | s****n@s****e | 2 |
| Techievena | a****a@g****m | 2 |
| Flammie Pirinen | f****e@i****i | 2 |
| Marc Riera | m****n@g****m | 2 |
| aboelhamd | a****a@g****m | 1 |
| Daniel Emilio Beck | b****l@u****t | 1 |
| Flammie Pirinen | t****n@u****i | 1 |
| Julen Ruiz Aizpuru | m****a@u****t | 1 |
| Leonardo F. S. Boiko | l****o@i****r | 1 |
| and 10 more... | | |
Committer Domains (Top 20 + Academic)
prompsit.com: 2
users.sourceforge.net: 2
fsfe.org: 1
tinodidriksen.com: 1
robertson.name: 1
earthlink.net: 1
dlsi.ua.es: 1
gmx.net: 1
free.fr: 1
uit.no: 1
skc.name: 1
iki.fi: 1
uit.fi: 1
ime.usp.br: 1
tudelft.nl: 1
gentoo.org: 1
selimcan.org: 1
anjbe.name: 1
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 108
- Total pull requests: 30
- Average time to close issues: 11 months
- Average time to close pull requests: 14 days
- Total issue authors: 27
- Total pull request authors: 12
- Average comments per issue: 2.92
- Average comments per pull request: 1.93
- Merged pull requests: 23
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 9
- Pull requests: 6
- Average time to close issues: 2 days
- Average time to close pull requests: 3 days
- Issue authors: 3
- Pull request authors: 5
- Average comments per issue: 2.44
- Average comments per pull request: 1.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- unhammer (35)
- hectoralos (16)
- ftyers (10)
- MarcRiera (7)
- TinoDidriksen (6)
- flammie (5)
- mr-martian (4)
- sushain97 (2)
- bentley (2)
- maddin200 (2)
- jelmervdl (2)
- mortenivar (1)
- brad0 (1)
- buhomec (1)
- chiru200513 (1)
Pull Request Authors
- mr-martian (11)
- unhammer (10)
- khannatanmai (4)
- Atharv-2004 (2)
- marcriera (2)
- ahmedsiam0 (1)
- xavivars (1)
- aboelhamd (1)
- chiru200513 (1)
- TinoDidriksen (1)
- thesamesam (1)
- dhruvak001 (1)
Top Labels
Issue Labels
bug (24)
enhancement (19)
tagger (12)
good first issue (7)
formatters (6)
capitalisation (3)
cleanup (2)
help wanted (2)
pretransfer (2)
invalid (1)
wontfix (1)
question (1)
Pull Request Labels
enhancement (2)
pretransfer (1)