pyuca
pyuca: a Python implementation of the Unicode Collation Algorithm - Published in JOSS (2016)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
a Python implementation of the Unicode Collation Algorithm
Statistics
- Stars: 221
- Watchers: 12
- Forks: 24
- Open Issues: 15
- Releases: 3
README.md
pyuca: Python Unicode Collation Algorithm implementation
This is a Python implementation of the Unicode Collation Algorithm (UCA). It passes 100% of the UCA conformance tests for Unicode 5.2.0 (Python 2.7), Unicode 6.3.0 (Python 3.3+), Unicode 8.0.0 (Python 3.5+), Unicode 9.0.0 (Python 3.6+), and Unicode 10.0.0 (Python 3.7+) with a variable-weighting setting of Non-ignorable.
What do you use it for?
In short, sorting non-English strings properly.
The core of the algorithm involves multi-level comparison. For example,
café comes before caff because at the primary level, the accent is
ignored and the first word is treated as if it were cafe. The secondary
level (which considers accents) only applies then to words that are equivalent
at the primary level.
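The multi-level idea can be sketched with the standard library alone. This is a toy illustration, not pyuca's implementation or the full UCA: the helper name `toy_sort_key` is hypothetical, and it assumes that stripping combining marks is a good enough stand-in for the primary level.

```python
import unicodedata


def toy_sort_key(word):
    """Toy two-level key: base letters first, accents only as a tie-breaker."""
    decomposed = unicodedata.normalize("NFD", word)
    # Primary level: base characters only, combining marks dropped.
    primary = tuple(ch for ch in decomposed if not unicodedata.combining(ch))
    # Secondary level: the combining marks, consulted only on primary ties.
    secondary = tuple(ch for ch in decomposed if unicodedata.combining(ch))
    return (primary, secondary)


words = ["caff", "café", "cafe"]
result = sorted(words, key=toy_sort_key)
print(result)  # cafe, café, caff
```

Because Python compares tuples element by element, the secondary component only matters when two words have identical primary components, which mirrors the behaviour described above.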
The Unicode Collation Algorithm and pyuca also support contraction and
expansion. Contraction is where multiple letters are treated as a single
unit. In Spanish, ch is treated as a letter coming between c and d
so that, for example, words beginning with ch sort after all other words
beginning with c. Expansion is where a single letter is treated as
though it were multiple letters. In German, ä is sorted as if it were
ae, i.e. after ad but before af.
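Both ideas can be illustrated with two small key functions. These are toy sketches written for this explanation, not part of pyuca: the names `spanish_toy_key` and `german_toy_key` are hypothetical, they handle only lowercase input, and real collation relies on a collation element table rather than hand-written rules.

```python
def spanish_toy_key(word):
    """Toy contraction: treat 'ch' as one unit sorting between 'c' and 'd'."""
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            # Same base as 'c' but ranked after it, so still before 'd'.
            key.append((ord("c"), 1))
            i += 2
        else:
            key.append((ord(word[i]), 0))
            i += 1
    return key


def german_toy_key(word):
    """Toy expansion: treat a-umlaut (U+00E4) as the two letters 'ae'."""
    return word.replace("\u00e4", "ae")


spanish = sorted(["chico", "cuna", "dama", "casa"], key=spanish_toy_key)
german = sorted(["baf", "b\u00e4", "bad"], key=german_toy_key)
print(spanish)  # casa, cuna, chico, dama
print(german)   # bad, bä, baf
```

In the contraction case one key element stands for two input characters; in the expansion case one input character contributes two key elements.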
How to use it
Here is how to use the pyuca module.
pip install pyuca
Usage example:
from pyuca import Collator
c = Collator()
assert sorted(["cafe", "caff", "café"]) == ["cafe", "caff", "café"]
assert sorted(["cafe", "caff", "café"], key=c.sort_key) == ["cafe", "café", "caff"]
Collator can also take an optional filename for specifying a custom
collation element table.
You can also import collators for specific Unicode versions,
e.g. from pyuca.collator import Collator_8_0_0.
But just from pyuca import Collator will ensure that the collator version
matches the version of unicodedata provided by the standard library for your
version of Python.
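To see which Unicode version your interpreter's unicodedata module reports, a quick check like the following may help. It uses only the standard library; the helper name `unicode_version_tuple` is hypothetical, and how pyuca maps this version to a collator class is not shown here.

```python
import unicodedata


def unicode_version_tuple():
    """Return the Unicode database version as a tuple of ints, e.g. (10, 0, 0)."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


version = unicode_version_tuple()
print("unicodedata reports Unicode", unicodedata.unidata_version)
```

Comparing this value against the versions listed at the top of this README shows which conformance tests are relevant to your Python version.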
How to cite it
Tauber, J. K. (2016). pyuca: a Python implementation of the Unicode Collation Algorithm. The Journal of Open Source Software. DOI: 10.21105/joss.00021
License
Python code is made available under an MIT license (see LICENSE).
allkeys.txt is made available under the similar license defined in
LICENSE-allkeys.
Contacting the Developer
If you have any problems, questions or suggestions, it's best to file an issue on GitHub, although you can also contact me at jtauber@jtauber.com.
For more of my work on linguistics and Ancient Greek, see http://jktauber.com/.
Owner
- Name: James Tauber
- Login: jtauber
- Kind: user
- Location: Greater Boston Area, US
- Website: https://jtauber.com/
- Repositories: 140
- Profile: https://github.com/jtauber
Python and Web developer using linguistics, data science, and open source software to help people better understand languages and texts.
JOSS Publication
pyuca: a Python implementation of the Unicode Collation Algorithm
GitHub Events
Total
- Watch event: 6
- Pull request event: 1
- Fork event: 1
Last Year
- Watch event: 6
- Pull request event: 1
- Fork event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| James Tauber | j****r@j****m | 118 |
| Chris Beaven | s****s@g****m | 12 |
| Michal Čihař | m****l@c****m | 3 |
| Paul McLanahan | p****c@m****m | 2 |
| Bruno Oliveira | n****s@g****m | 2 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 13
- Total pull requests: 16
- Average time to close issues: 3 months
- Average time to close pull requests: 21 days
- Total issue authors: 9
- Total pull request authors: 9
- Average comments per issue: 4.62
- Average comments per pull request: 1.38
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jtauber (4)
- ChameleonRed (2)
- pmclanahan (1)
- penguinpee (1)
- jtojnar (1)
- santhoshtr (1)
- href (1)
- filak (1)
- Hultner (1)
Pull Request Authors
- lucafavatella (5)
- jtauber (3)
- penguinpee (2)
- bryanforbes (2)
- nicoddemus (1)
- nijel (1)
- feanil (1)
- SmileyChris (1)
- eric-wieser (1)
Packages
- Total packages: 4
- Total downloads:
  - pypi 222,329 last-month
- Total docker downloads: 295,081
- Total dependent packages: 11 (may contain duplicates)
- Total dependent repositories: 263 (may contain duplicates)
- Total versions: 16
- Total maintainers: 1
pypi.org: pyuca
a Python implementation of the Unicode Collation Algorithm
- Homepage: http://github.com/jtauber/pyuca
- Documentation: https://pyuca.readthedocs.io/
- License: MIT
- Latest release: 1.1.2 (published over 9 years ago)
proxy.golang.org: github.com/jtauber/pyuca
- Documentation: https://pkg.go.dev/github.com/jtauber/pyuca#section-documentation
- License: mit
- Latest release: v1.1.2 (published over 9 years ago)
conda-forge.org: pyuca
- Homepage: https://pypi.org/project/pyuca
- License: MIT
- Latest release: 1.1.2 (published over 3 years ago)
anaconda.org: pyuca
- Homepage: https://pypi.org/project/pyuca
- License: MIT AND Unicode-3.0
- Latest release: 1.2 (published over 1 year ago)
