mishkal

Mishkal is Arabic text vocalization software.

https://github.com/linuxscout/mishkal

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 13 committers (7.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary

Keywords

arabic natural-language-processing python webapp

Keywords from Contributors

arabic-language text-processing
Last synced: 4 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: linuxscout
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 64.7 MB
Statistics
  • Stars: 294
  • Watchers: 20
  • Forks: 72
  • Open Issues: 14
  • Releases: 1
Topics
arabic natural-language-processing python webapp
Created over 11 years ago · Last pushed over 2 years ago
Metadata Files
Readme Funding License Citation Support

README.md

Mishkal

Mishkal Arabic text vocalization software


Developers: Taha Zerrouki, http://tahadz.com, taha dot zerrouki at gmail dot com

| Features     | Value                             |
|--------------|-----------------------------------|
| Authors      | Authors.md                        |
| Release      | 1.10 Bouira                       |
| License      | GPL                               |
| Tracker      | linuxscout/mishkal/Issues         |
| Mailing list | mishkal@googlegroups.com          |
| Website      | tahadz.com/mishkal                |
| Source       | GitHub                            |
| Download     | SourceForge                       |
| Feedback     | Comments                          |
| Accounts     | @Facebook @Twitter @Sourceforge   |


Citation

If you want to cite this software, please use the following BibTeX entry:

    @thesis{zerrouki2020adawat,
      author      = {Taha Zerrouki},
      title       = {Towards An Open Platform For Arabic Language Processing},
      type        = {PhD thesis},
      institution = {Ecole Nationale Supérieure d'informatique, Alger, Algérie},
      date        = {2020},
    }

Install

You can install Mishkal as a library or as standalone software.

Python library

pip install mishkal

Install from GitHub

  1. Clone the Mishkal project from GitHub:

git clone https://github.com/linuxscout/mishkal.git

  2. Install the required packages:

pip install -r mishkal/requirements.txt

Requirements

- pyarabic: basic Arabic library
- sylajone: ArAnaSyn syntactic analyzer
- arramooz: Arabic morphological dictionary
- asmai: semantic analyzer
- CodernityDB: fast, pure-Python NoSQL database, used as a cache to reduce the load on the morphological analyzer
- collocations: collocation library (deprecated)
- libqutrub: verb conjugation library used by the morphological analyzer
- maskouk: collocation library
- naftawayh: word tagging library
- qalsadi: morphological analyzer
- tashaphyne: light stemmer used by the morphological analyzer
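To check which of these dependencies are present in an environment, a small script can query the installed distributions. This is a sketch that assumes the PyPI distribution names from requirements.txt, as listed in the Dependencies section of this page (for example `arramooz-pysqlite` rather than `arramooz`); those names may differ between releases, and the helper `installed_versions` is hypothetical, not part of Mishkal.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names assumed from Mishkal's requirements.txt (Dependencies section).
REQUIRED = [
    "pyarabic", "sylajone", "arramooz-pysqlite", "asmai", "codernitydb3",
    "libqutrub", "maskouk-pysqlite", "naftawayh", "qalsadi", "tashaphyne",
    "alyahmor", "mysam-tagmanager", "pickledb",
]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    missing = [name for name, ver in installed_versions(REQUIRED).items() if ver is None]
    print("missing:", ", ".join(missing) if missing else "none")
```

Querying package metadata instead of importing each module avoids guessing import names, which do not always match distribution names.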

Usage

Mishkal provides:

  • a console command line
  • a Python library
  • a GUI
  • a web interface
  • an API

GUI

  • Windows: MishkalGui.exe
  • Linux: python interfaces/gui/mishkal-gui.py

Web server (Linux, Windows)

    python3 interfaces/web/mishkal-webserver

  • The server listens on 0.0.0.0:8080; open http://127.0.0.1:8080 in your browser.

Console (Linux/Windows)

```shell
$ python3 bin/mishkal-console.py -f filename

Usage: bin/mishkal-console.py -f filename [OPTIONS]
       bin/mishkal-console.py ' ' [OPTIONS]

    [-f | --file = filename]      input file
    [-o | --outfile = filename]   output file to write vocalized text to, '$FILENAME (Tashkeel).txt' by default

    [-h | --help]                 outputs this usage message
    [-v | --version]              program version
    [-p | --progress]             display progress status
    [-a | --verbose]              enable verbosity

    * Tashkeel Actions
    -------------------
    [-r | --reduced]              Reduced Tashkeel.
    [-s | --strip]                Strip tashkeel (remove harakat).
    [-c | --compare]              compare the vocalized text with the program output

    * Tashkeel Options
    ------------------
    [-l | --limit]                vocalize only a limited number of lines
    [-x | --syntax]               disable syntactic analysis
    [-m | --semantic]             disable semantic analysis
    [-g | --train]                enable training option
    [-i | --ignore]               ignore the last mark on output words
    [-t | --stat]                 disable statistical tashkeel

This program is licensed under the GPL License
```
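When Mishkal is driven from another Python program, the console tool can be wrapped with `subprocess`. The sketch below only uses flags shown in the usage message (`-f`, `-o`, `-r`, `-s`, `-p`); the helpers `build_command` and `vocalize_file` are hypothetical, and it assumes the script is run from the repository root and that short options take their value as the next argument.

```python
import subprocess
import sys
from typing import List, Optional

# Console script path as given in the usage message; assumes the repository root as CWD.
CONSOLE_SCRIPT = "bin/mishkal-console.py"


def build_command(infile: str,
                  outfile: Optional[str] = None,
                  reduced: bool = False,
                  strip: bool = False,
                  progress: bool = False) -> List[str]:
    """Build an argument list for mishkal-console.py from the documented flags."""
    cmd = [sys.executable, CONSOLE_SCRIPT, "-f", infile]
    if outfile is not None:
        cmd += ["-o", outfile]   # otherwise the tool picks '$FILENAME (Tashkeel).txt'
    if reduced:
        cmd.append("-r")         # reduced tashkeel
    if strip:
        cmd.append("-s")         # strip harakat instead of adding them
    if progress:
        cmd.append("-p")         # display progress status
    return cmd


def vocalize_file(infile: str, outfile: str) -> None:
    """Run the console tool and raise CalledProcessError if it fails."""
    subprocess.run(build_command(infile, outfile, progress=True), check=True)
```

For example, `build_command("input.txt", "output.txt", reduced=True)` produces the argument list for a reduced-tashkeel run without starting a process, which keeps the flag logic testable.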

Example:

```python
>>> import mishkal.tashkeel
>>> vocalizer = mishkal.tashkeel.TashkeelClass()
>>> text = u" "
>>> vocalizer.tashkeel(text)
' '
```

JSON connection API

Mishkal can be called through a JSON API using AJAX.

  • 1- JSON with jQuery

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
```

    $.getJSON("http://tahadz.com/mishkal/ajax...", {text: " \n \n ", action: "TashkeelText"},

  • text: the Arabic text to vocalize.
  • action: the action to perform, here TashkeelText.

    {"result": " ", "order": "0"}

  • result: the vocalized text.
  • order: .
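The same request can be sent from Python with only the standard library. This is a sketch under stated assumptions: the endpoint URL is truncated in this copy, so the caller must supply it; the `text` and `action` parameters and the `result` field come from the example above; and the GET query encoding mirrors what jQuery's `$.getJSON` does with a data object. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_request_url(endpoint: str, text: str, action: str = "TashkeelText") -> str:
    """Encode the text and action parameters as a GET query string, as $.getJSON does."""
    return endpoint + "?" + urlencode({"text": text, "action": action})


def parse_response(body: str) -> str:
    """Return the vocalized text from a reply such as {"result": "...", "order": "0"}."""
    return json.loads(body)["result"]


def tashkeel_remote(endpoint: str, text: str, timeout: float = 10.0) -> str:
    """Send one request to a caller-supplied endpoint (hypothetical) and return the result."""
    with urlopen(build_request_url(endpoint, text), timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))
```

Splitting URL building and response parsing from the network call keeps both halves checkable without a running server.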

How does Mishkal work?

Mishkal uses a rule-based method to detect relations between words and to choose diacritics. First, it analyzes every possible morphological case: it detects all affixes, checks the remaining stems against a dictionary, and generates every possible diacritized form of each word. Second, it attaches a frequency to each word form.

Both steps are performed by support/Qalsadi (Arabic morphological analyzer); the dictionary it uses is a separate project named "Arramooz: Arabic dictionary for morphology".
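The first two steps can be illustrated with a toy analyzer: try every prefix/suffix pair, look up the remaining stem in a dictionary, and keep each vocalized form together with its frequency. The affix lists, dictionary entries, and frequencies below are made up for illustration; Qalsadi and Arramooz use much larger resources and a different API.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources; affixes are left unvocalized for simplicity.
PREFIXES = ["", "و", "ال", "وال"]
SUFFIXES = ["", "ه", "ها"]
# Unvocalized stem -> list of (vocalized stem, frequency); values are invented.
DICTIONARY: Dict[str, List[Tuple[str, int]]] = {
    "كتاب": [("كِتَاب", 120)],
    "علم": [("عِلْم", 300), ("عَلَم", 40), ("عُلِمَ", 5)],
}

Analysis = Tuple[str, str, str, int]  # (prefix, vocalized stem, suffix, frequency)


def analyze(word: str) -> List[Analysis]:
    """Return every dictionary-backed segmentation of `word`, most frequent first."""
    candidates: List[Analysis] = []
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            if len(prefix) + len(suffix) >= len(word):
                continue  # nothing would be left for the stem
            if not (word.startswith(prefix) and word.endswith(suffix)):
                continue
            stem = word[len(prefix):len(word) - len(suffix)]
            # Step 1: only stems found in the dictionary produce candidates.
            for vocalized, freq in DICTIONARY.get(stem, []):
                # Step 2: keep the frequency with each candidate form.
                candidates.append((prefix, vocalized, suffix, freq))
    return sorted(candidates, key=lambda c: c[3], reverse=True)
```

For the word "والعلم", only the prefix "وال" leaves a known stem, so the three readings of "علم" come back ordered by frequency.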

Third, a syntax analyzer detects all possible relations between words. The syntax library is named support/ArAnaSyn. This analyzer is basic for the moment: it uses only linear relations between adjacent words.
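A minimal sketch of this linear, adjacent-word filtering: each word has several candidate analyses, and only candidate pairs accepted by an adjacency rule are kept. The single rule shown (a preposition governs the genitive) and the tuple layout are illustrative assumptions, not ArAnaSyn's rule set or API.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical candidate layout: (vocalized form, part of speech, grammatical case).
Candidate = Tuple[str, str, str]


def compatible(left: Candidate, right: Candidate) -> bool:
    """A single illustrative adjacency rule: a preposition governs the genitive."""
    if left[1] == "prep":
        return right[1] == "noun" and right[2] == "genitive"
    return True  # no rule applies, so the pair is kept


def adjacent_relations(sentence: List[List[Candidate]]) -> List[List[Tuple[Candidate, Candidate]]]:
    """For each pair of neighbouring words, keep the candidate pairs the rule accepts."""
    return [
        [(a, b) for a, b in product(left, right) if compatible(a, b)]
        for left, right in zip(sentence, sentence[1:])
    ]


# Usage: "في البيت" ("in the house").
fi = [("فِي", "prep", "")]
bayt = [("البَيْتُ", "noun", "nominative"),
        ("البَيْتَ", "noun", "accusative"),
        ("البَيْتِ", "noun", "genitive")]
print(adjacent_relations([fi, bayt]))  # only the genitive reading survives
```

Because relations are checked only between neighbours, the cost grows linearly with sentence length, which matches the "linear relations" limitation noted above.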

Fourth, all generated data and relations are analyzed semantically to detect semantic relations and reduce ambiguity. The library used is support/asmai (Arabic semantic analysis). Semantic relation extraction is based on a corpus named "Tashkeela: Arabic vocalized texts corpus".
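The corpus-based step can be approximated by counting how often vocalized word pairs occur next to each other in a vocalized corpus and preferring the candidate pair seen most often. This is a simplified stand-in assumed for illustration, not the actual method used by asmai; in practice the counts would come from a corpus such as Tashkeela.

```python
from collections import Counter
from itertools import product
from typing import List, Tuple


def cooccurrence_counts(corpus: List[List[str]]) -> Counter:
    """Count adjacent vocalized word pairs in a (toy) vocalized corpus."""
    counts: Counter = Counter()
    for sentence in corpus:
        counts.update(zip(sentence, sentence[1:]))
    return counts


def best_pair(left: List[str], right: List[str], counts: Counter) -> Tuple[str, str]:
    """Pick the candidate pair seen most often; on a tie the first pair wins."""
    return max(product(left, right), key=lambda pair: counts[pair])
```

Unseen pairs score zero because a `Counter` returns 0 for missing keys, so the function degrades gracefully when the corpus has no evidence.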

In the final stage, the mishkal/tashkeel module tries to select the suitable form of each word in context. It first looks for evident cases, then for the most related cases; otherwise, it selects the most probable case using rules such as choosing a stop word by default or choosing the Mansoub (accusative) case by default.
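The fallback order of the final stage can be sketched as a single function. The candidate fields (`form`, `lemma`, `case`, `score`, `freq`) and the exact ordering of the default rules are assumptions made for this example, not the mishkal/tashkeel implementation.

```python
from typing import Dict, List, Optional, Set


def select(candidates: List[Dict], stop_words: Set[str]) -> Optional[str]:
    """Pick one vocalized form, falling back through the stages described above."""
    if not candidates:
        return None
    # Evident case: only one candidate survived the earlier analyses.
    if len(candidates) == 1:
        return candidates[0]["form"]
    # Most related case: highest relation score from the syntactic/semantic steps.
    related = [c for c in candidates if c.get("score", 0) > 0]
    if related:
        return max(related, key=lambda c: c["score"])["form"]
    # Default rules: prefer a stop-word reading, then the Mansoub (accusative) case.
    for c in candidates:
        if c.get("lemma") in stop_words:
            return c["form"]
    for c in candidates:
        if c.get("case") == "mansoub":
            return c["form"]
    # Otherwise take the most frequent candidate.
    return max(candidates, key=lambda c: c.get("freq", 0))["form"]
```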

The rest of the program provides functions that handle the interfaces and APIs for the web, desktop, and command line.


Owner

  • Name: Taha Zerrouki (طه زروقي )
  • Login: linuxscout
  • Kind: user
  • Location: Bouira, Algeria
  • Company: Bouira University

PhD, Computer Science Professor; interests: Arabic natural language processing

Citation (CITATION.cff)

abstract: Mishkal is an arabic text vocalization software.
authors:
  - family-names: Zerrouki
    given-names: Taha
cff-version: 1.0.0
message: If you use this software, please cite it using these metadata.
title: Towards An Open Platform For Arabic Language Processing

GitHub Events

Total
  • Issues event: 8
  • Watch event: 18
  • Issue comment event: 11
  • Fork event: 5
Last Year
  • Issues event: 8
  • Watch event: 18
  • Issue comment event: 11
  • Fork event: 5

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 178
  • Total Committers: 13
  • Avg Commits per committer: 13.692
  • Development Distribution Score (DDS): 0.129
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
linuxscout t****i@h****m 155
Abdelhak Bougouffa a****k@c****t 6
Valdis Vitolins v****s@o****v 3
Muhammad Al-Barham m****8@g****m 3
harabat y****e@g****m 2
Youssef Sherif s****f@a****u 2
harabat 4****t 1
PAHXO i****z@g****m 1
Mehdi Nassim KHODJA k****m@g****m 1
Karl Wettin k****n@k****e 1
Fahad Al-Saidi f****i@g****m 1
Assem Chelli a****h@g****m 1
André Costa a****a@w****e 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 53
  • Total pull requests: 18
  • Average time to close issues: 10 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 21
  • Total pull request authors: 13
  • Average comments per issue: 1.58
  • Average comments per pull request: 0.94
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.67
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • linuxscout (26)
  • muotaz (3)
  • Fahad-Alsaidi (3)
  • yoosif0 (2)
  • abdoutech93 (2)
  • othmanalikhan (1)
  • alihabib80 (1)
  • Coder-ACJHP (1)
  • burhansvural (1)
  • donny08 (1)
  • zaidalyafeai (1)
  • anasram (1)
  • firas-jolha (1)
  • naskio (1)
  • alaanousir (1)
Pull Request Authors
  • abougouffa (3)
  • mohammad-albarham (3)
  • harabat (2)
  • reedy (1)
  • naskio (1)
  • Fahad-Alsaidi (1)
  • yoosif0 (1)
  • karlwettin (1)
  • hajabiGlob (1)
  • assem-ch (1)
  • PAHXO (1)
  • valdisvi (1)
  • lokal-profil (1)
Top Labels
Issue Labels
bug (4) help wanted (3) question (1) enhancement (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 1,125 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 4
  • Total versions: 5
  • Total maintainers: 1
pypi.org: mishkal

Mishkal: Arabic text diacritization library for Python

  • Versions: 5
  • Dependent Packages: 1
  • Dependent Repositories: 4
  • Downloads: 1,125 Last month
Rankings
Dependent packages count: 4.8%
Dependent repos count: 7.5%
Average: 8.8%
Downloads: 14.2%
Maintainers (1)
Last synced: 4 months ago

Dependencies

requirements.txt pypi
  • alyahmor >=0.1
  • arramooz-pysqlite >=0.1
  • asmai >=0.1
  • codernitydb3 ==0.6.0
  • libqutrub >=1.0
  • maskouk-pysqlite >=0.1
  • mysam-tagmanager >=0.3.3
  • naftawayh >=0.2
  • pickledb >=0.9.0
  • pyarabic >=0.6.2
  • qalsadi >=0.2
  • sylajone >=0.1
  • tashaphyne >=0.3.1
setup.py pypi