pyuca

pyuca: a Python implementation of the Unicode Collation Algorithm - Published in JOSS (2016)

https://github.com/jtauber/pyuca

Keywords

unicode unicode-collation-algorithm

Last synced: 6 months ago

Repository

a Python implementation of the Unicode Collation Algorithm

Basic Info

Host: GitHub
Owner: jtauber
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 14.8 MB

Statistics

Stars: 221
Watchers: 12
Forks: 24
Open Issues: 15
Releases: 3

Topics

unicode unicode-collation-algorithm

Created over 13 years ago · Last pushed almost 2 years ago

Metadata Files

Readme, Contributing, Funding, License, Authors

pyuca: Python Unicode Collation Algorithm implementation

This is a Python implementation of the Unicode Collation Algorithm (UCA). It passes 100% of the UCA conformance tests for Unicode 5.2.0 (Python 2.7), Unicode 6.3.0 (Python 3.3+), Unicode 8.0.0 (Python 3.5+), Unicode 9.0.0 (Python 3.6+), and Unicode 10.0.0 (Python 3.7+) with a variable-weighting setting of Non-ignorable.

What do you use it for?

In short, sorting non-English strings properly.

The core of the algorithm involves multi-level comparison. For example, café comes before caff because, at the primary level, the accent is ignored and the first word is treated as if it were cafe. The secondary level (which considers accents) then applies only to words that are equivalent at the primary level.
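The two levels described above can be sketched with the standard library alone. This is a toy illustration under stated assumptions, not pyuca's implementation: the name two_level_key is hypothetical, and base letters are compared by code point rather than by weights from a collation element table.

```python
import unicodedata


def two_level_key(text):
    """Build a (primary, secondary) key: base letters first, accents second."""
    primary = []    # base characters with accents stripped
    secondary = []  # combining marks attached to each base character
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and primary:
            secondary[-1] += ch  # accent belongs to the previous base character
        else:
            primary.append(ch)
            secondary.append("")
    # Tuples compare left to right, so the secondary level only matters
    # when the primary levels are equal.
    return (tuple(primary), tuple(secondary))


words = ["cafe", "caff", "café"]
result = sorted(words, key=two_level_key)
print(result)
```

With this key, café lands between cafe and caff: its primary level ties with cafe, and only then does the accent break the tie.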

The Unicode Collation Algorithm and pyuca also support contraction and expansion. Contraction is where multiple letters are treated as a single unit. In Spanish, ch is treated as a letter coming between c and d so that, for example, words beginning with ch should sort after all other words beginning with c. Expansion is where a single letter is treated as though it were multiple letters. In German, ä is sorted as if it were ae, i.e. after ad but before af.
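Contraction and expansion can be illustrated with a small single-level sketch. Everything here is an assumption made for the example: the function toy_key and the SPANISH and GERMAN tables are hypothetical, they are not taken from pyuca or allkeys.txt, and real collation element tables carry several weight levels per element.

```python
def toy_key(word, contractions=None, expansions=None):
    """Return a sort key built from (weight, tie_breaker) pairs."""
    contractions = contractions or {}
    expansions = expansions or {}
    # Expansion: one letter is replaced by several before weighting.
    expanded = "".join(expansions.get(ch, ch) for ch in word)
    key = []
    i = 0
    while i < len(expanded):
        pair = expanded[i:i + 2]
        if pair in contractions:
            # Contraction: two letters are weighted as a single unit.
            key.append(contractions[pair])
            i += 2
        else:
            key.append((ord(expanded[i]), 0))
            i += 1
    return tuple(key)


# Hypothetical tailoring tables for illustration only.
SPANISH = {"ch": (ord("c"), 1)}  # "ch" sorts after plain "c", before "d"
GERMAN = {"ä": "ae"}             # "ä" is weighted as "ae"

result_es = sorted(["chico", "cuna", "dama", "casa"],
                   key=lambda w: toy_key(w, contractions=SPANISH))
result_de = sorted(["af", "ä", "ad"],
                   key=lambda w: toy_key(w, expansions=GERMAN))
print(result_es)
print(result_de)
```

In the Spanish list, chico sorts after cuna because ch is one unit placed after every plain c; in the German list, ä falls between ad and af.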

How to use it

Here is how to use the pyuca module.

pip install pyuca

Usage example:

from pyuca import Collator
c = Collator()

assert sorted(["cafe", "caff", "café"]) == ["cafe", "caff", "café"]
assert sorted(["cafe", "caff", "café"], key=c.sort_key) == ["cafe", "café", "caff"]

Collator can also take an optional filename for specifying a custom collation element table.

You can also import collators for specific Unicode versions, e.g. from pyuca.collator import Collator_8_0_0. Using just from pyuca import Collator, however, ensures that the collator version matches the version of unicodedata provided by the standard library for your version of Python.
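To see which Unicode version your interpreter's unicodedata module carries, the standard library exposes it directly. The helper name unicode_version_tuple is hypothetical, and this sketch only inspects the standard library; it does not show how pyuca itself selects a collator.

```python
import unicodedata


def unicode_version_tuple():
    """Return the Unicode version of the stdlib database as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


version = unicode_version_tuple()
print(unicodedata.unidata_version, version)
```

Comparing this value with the Unicode versions listed at the top of the README indicates whether a version-specific collator is available for your interpreter.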

How to cite it

Tauber, J. K. (2016). pyuca: a Python implementation of the Unicode Collation Algorithm. The Journal of Open Source Software. DOI: 10.21105/joss.00021

License

Python code is made available under an MIT license (see LICENSE). allkeys.txt is made available under the similar license defined in LICENSE-allkeys.

Contacting the Developer

If you have any problems, questions or suggestions, it's best to file an issue on GitHub, although you can also contact me at jtauber@jtauber.com.

For more of my work on linguistics and Ancient Greek, see http://jktauber.com/.

Owner

Name: James Tauber
Login: jtauber
Kind: user
Location: Greater Boston Area, US

Website: https://jtauber.com/
Repositories: 140
Profile: https://github.com/jtauber

Python and Web developer using linguistics, data science, and open source software to help people better understand languages and texts.

JOSS Publication

pyuca: a Python implementation of the Unicode Collation Algorithm

Published

May 18, 2016

DOI

10.21105/joss.00021

Volume 1, Issue 1, Page 21

Authors

J. K. Tauber

Editor

Arfon Smith

GitHub Events

Total

Watch event: 6
Pull request event: 1
Fork event: 1

Last Year

Watch event: 6
Pull request event: 1
Fork event: 1

Committers

Last synced: 7 months ago

All Time

Total Commits: 137
Total Committers: 5
Avg Commits per committer: 27.4
Development Distribution Score (DDS): 0.139

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
James Tauber	j**r@j**m	118
Chris Beaven	s**s@g**m	12
Michal Čihař	m**l@c**m	3
Paul McLanahan	p**c@m**m	2
Bruno Oliveira	n**s@g**m	2

Committer Domains (Top 20 + Academic)

mozilla.com: 1, cihar.com: 1, jtauber.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 13
Total pull requests: 16
Average time to close issues: 3 months
Average time to close pull requests: 21 days
Total issue authors: 9
Total pull request authors: 9
Average comments per issue: 4.62
Average comments per pull request: 1.38
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

jtauber (4)
ChameleonRed (2)
pmclanahan (1)
penguinpee (1)
jtojnar (1)
santhoshtr (1)
href (1)
filak (1)
Hultner (1)

Pull Request Authors

lucafavatella (5)
jtauber (3)
penguinpee (2)
bryanforbes (2)
nicoddemus (1)
nijel (1)
feanil (1)
SmileyChris (1)
eric-wieser (1)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 4
Total downloads:
- pypi: 222,329 last month
Total docker downloads: 295,081

Total dependent packages: 11
(may contain duplicates)
Total dependent repositories: 263
(may contain duplicates)
Total versions: 16
Total maintainers: 1

pypi.org: pyuca

a Python implementation of the Unicode Collation Algorithm

Homepage: http://github.com/jtauber/pyuca
Documentation: https://pyuca.readthedocs.io/
License: MIT
Latest release: 1.1.2
published almost 10 years ago

Versions: 11
Dependent Packages: 10
Dependent Repositories: 263
Downloads: 222,329 Last month
Docker Downloads: 295,081

Rankings

Docker downloads count: 0.9%

Dependent repos count: 0.9%

Dependent packages count: 1.3%

Downloads: 1.5%

Average: 2.9%

Stargazers count: 4.7%

Forks count: 8.0%

Maintainers (1)

jtauber

Last synced: 6 months ago

proxy.golang.org: github.com/jtauber/pyuca

Documentation: https://pkg.go.dev/github.com/jtauber/pyuca#section-documentation
License: mit
Latest release: v1.1.2
published almost 10 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Stargazers count: 3.6%

Forks count: 4.6%

Average: 7.1%

Dependent packages count: 9.6%

Dependent repos count: 10.8%

Last synced: 6 months ago

conda-forge.org: pyuca

Homepage: https://pypi.org/project/pyuca
License: MIT
Latest release: 1.1.2
published over 3 years ago

Versions: 2
Dependent Packages: 1
Dependent Repositories: 0

Rankings

Stargazers count: 26.1%

Dependent packages count: 28.9%

Average: 32.2%

Forks count: 35.2%

Dependent repos count: 38.4%

Last synced: 6 months ago

anaconda.org: pyuca

Homepage: https://pypi.org/project/pyuca
License: MIT AND Unicode-3.0
Latest release: 1.2
published over 1 year ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 50.9%

Average: 53.5%

Dependent repos count: 56.0%

Last synced: 6 months ago

pyuca

Science Score: 93.0%
