Recent Releases of wordfreq

wordfreq - v3.0.2: packaging fixes

  • Updated the range of allowable versions of regex. Versions before 2021.7.6 don't have the regex.Match class.

  • Declared the dependencies used by the extras as optional dependencies in pyproject.toml.

- Python
Published by rspeer almost 4 years ago

wordfreq - v3.0: The "handle numbers better" release

Previously, wordfreq grouped all digit sequences of the same 'shape' (length 2 or more) into a single token and returned the frequency of that token for any number of that shape, which vastly overestimated the frequency of each individual number.

Now it distributes the frequency over all numbers of that shape, with an estimated distribution that allows for Benford's law (lower numbers are more frequent) and a special frequency distribution for 4-digit numbers that look like years (2010 is more frequent than 1020).
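To make the idea concrete, here is a minimal sketch of spreading a shape's total frequency over individual numbers. It assumes Benford's law for the leading digit, uniformly distributed remaining digits, and, for 4-digit numbers, a hypothetical share of the frequency reserved for a hypothetical range of years. The names `benford_weight`, `number_weight`, and `number_frequency` and both constants are illustrative assumptions, not wordfreq's internal code or its actual parameters.

```python
import math

# Hypothetical parameters for illustration only; wordfreq's real values differ.
YEAR_SHARE = 0.5           # share of 4-digit frequency assumed to belong to years
YEAR_RANGE = (1900, 2099)  # years assumed to receive that share uniformly


def benford_weight(number: str) -> float:
    """Share of its shape's frequency that a digit string gets under Benford's law.

    The leading digit d gets log10(1 + 1/d); the remaining digits are
    assumed uniform. Strings with a leading zero get 0 in this sketch.
    """
    lead = int(number[0])
    if lead == 0:
        return 0.0
    return math.log10(1 + 1 / lead) / 10 ** (len(number) - 1)


def number_weight(number: str) -> float:
    """Share of the shape's frequency assigned to this exact digit string."""
    base = benford_weight(number)
    if len(number) != 4:
        return base
    # Mix the Benford weight with a uniform distribution over the year range.
    lo, hi = YEAR_RANGE
    year = YEAR_SHARE / (hi - lo + 1) if lo <= int(number) <= hi else 0.0
    return (1 - YEAR_SHARE) * base + year


def number_frequency(number: str, shape_frequency: float) -> float:
    """Give one number its portion of the shape's total frequency (e.g. '0000')."""
    return shape_frequency * number_weight(number)
```

Under these assumptions, `number_frequency("2010", f)` is larger than `number_frequency("1020", f)` for any positive `f`, and the weights over all strings of one shape sum to 1, so spreading the frequency neither creates nor loses any of it.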

More changes related to digits:

  • Functions such as iter_wordlist and top_n_list no longer return multi-digit numbers (they used to return them in their "smashed" form, such as "0000").

  • lossy_tokenize no longer replaces digit sequences with 0s. That replacement now happens internally in the word_frequency function, so the actual digit values can be examined before they're replaced.
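The split between tokenizing and smashing described above can be sketched as follows: tokens keep their real digits, and only the frequency lookup smashes them, so the original digits are still available to choose each number's share. The functions `smash_numbers` and `token_frequency`, the table layout, and the `weight` callback are hypothetical, for illustration only; this is not wordfreq's actual implementation.

```python
import re
from typing import Callable, Dict

# Runs of two or more digits; single digits are not smashed.
MULTI_DIGIT_RE = re.compile(r"\d{2,}")


def smash_numbers(text: str) -> str:
    """Replace each digit in a multi-digit run with '0', keeping its shape."""
    return MULTI_DIGIT_RE.sub(lambda m: "0" * len(m.group()), text)


def token_frequency(
    token: str,
    table: Dict[str, float],
    weight: Callable[[str], float],
) -> float:
    """Look up a token that still carries its real digits.

    The table is assumed to store multi-digit numbers only in smashed
    form, so the token is smashed just for the lookup. Each original
    digit run is then passed to `weight` to pick this number's share of
    the shape's frequency.
    """
    freq = table.get(smash_numbers(token), 0.0)
    for run in MULTI_DIGIT_RE.finditer(token):
        freq *= weight(run.group())
    return freq
```

Keeping the smashing out of the tokenizer means callers that only tokenize see the text as written, while the lookup still matches the smashed keys in the frequency table.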

Other changes:

  • wordfreq is now developed using poetry as its package manager, and with pyproject.toml as the source of configuration instead of setup.py.

  • The minimum version of Python supported is 3.7.

  • Type information is exported using py.typed.

- Python
Published by rspeer almost 4 years ago

wordfreq - v2.5.1

Version 2.5.1 (2021-09-02)

  • Import ftfy and use its uncurl_quotes function to turn curly quotes into straight ones, so that different forms of apostrophes are handled consistently.

  • Set minimum version requirements on regex, jieba, and langcodes so that tokenization gives consistent results.

  • Work around an inconsistency in the msgpack API involving strict_map_key=False.
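The quote normalization mentioned in the first item above is done by ftfy (its uncurl_quotes function, found in `ftfy.fixes`). As a dependency-free illustration of what that step does, here is a sketch using `str.translate`. The function name `uncurl` is hypothetical, and the exact set of quote characters below is an assumption that may not match ftfy's list.

```python
# Assumed set of curly/variant single quotes and apostrophes:
# U+02BC modifier letter apostrophe, U+2018 to U+201B single quotation marks.
_SINGLE_QUOTES = "\u02bc\u2018\u2019\u201a\u201b"
# Assumed set of curly double quotes: U+201C to U+201F.
_DOUBLE_QUOTES = "\u201c\u201d\u201e\u201f"

# Map every variant to its plain ASCII counterpart.
UNCURL_TABLE = str.maketrans(
    {**{c: "'" for c in _SINGLE_QUOTES}, **{c: '"' for c in _DOUBLE_QUOTES}}
)


def uncurl(text: str) -> str:
    """Replace curly quotes and apostrophes with straight ASCII ones."""
    return text.translate(UNCURL_TABLE)
```

After this step, "don’t" and "don't" reach the tokenizer as the same string, which is the consistency the release note describes.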

Version 2.5 (2021-04-15)

  • Incorporate data from the OSCAR corpus.

- Python
Published by rspeer almost 5 years ago

wordfreq - v2.2

- Python
Published by rspeer over 7 years ago

wordfreq - v1.7

This release of wordfreq gives word frequencies in 32 languages from a variety of data sources, which it checks against each other to mitigate outliers.

See CHANGELOG.md for more details on the version history.

- Python
Published by rspeer almost 9 years ago

wordfreq - v1.5.1

This release of wordfreq gives word frequencies in 27 languages from a variety of data sources, which it checks against each other to mitigate outliers.

See CHANGELOG.md for more details on the version history.

- Python
Published by rspeer almost 10 years ago