https://github.com/chasmani/zipfanalysis
Tools for analysing Zipf's law from text samples
Science Score: 10.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.1%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: chasmani
- License: mit
- Language: Python
- Default Branch: master
- Size: 2.3 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Created almost 6 years ago · Last pushed almost 4 years ago
Metadata Files
Readme
License
README.rst
============
zipfanalysis
============
Python tools for analysing Zipf's law from text samples.
The package is published on the Python Package Index (PyPI) and can be installed from a terminal with:
::

    pip install zipfanalysis

WARNING: This tool is still in development and should not be relied upon as the sole basis for academic research.
-----
Usage
-----
The package can be used from within Python scripts to estimate Zipf exponents, assuming a simple power law model in
which a word's frequency falls off as a power of its rank. To use the package, import it:
::

    import zipfanalysis

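The power law model is not written out above. One common form, matching the `p(rank) = C prob_rank^(-alpha)` model stated for the ABC estimator further down, gives the word at rank r the probability p(r) = C r^(-alpha) over a finite vocabulary of V words, with C chosen so that the probabilities sum to one. The sketch below illustrates that model under this assumption; `zipf_pmf` is a hypothetical helper, not part of zipfanalysis.

```python
import math


def zipf_pmf(alpha: float, vocab_size: int) -> list[float]:
    """Return p(r) = C * r**(-alpha) for ranks r = 1..vocab_size.

    Hypothetical helper illustrating the power law model; not part of zipfanalysis.
    """
    weights = [r ** -alpha for r in range(1, vocab_size + 1)]
    normaliser = sum(weights)  # equals 1 / C
    return [w / normaliser for w in weights]


probs = zipf_pmf(alpha=1.0, vocab_size=1000)

# On log-log axes the model is a straight line with slope -alpha:
# log p(r) = log C - alpha * log r
# probs[0] is rank 1 and probs[9] is rank 10
slope = (math.log(probs[9]) - math.log(probs[0])) / (math.log(10) - math.log(1))
print(round(slope, 6))  # -1.0
```

This straight-line relationship on log-log axes is what the regression-based estimators exploit, while the Clauset and ABC estimators fit the same exponent by other means.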
-------------
Simple Method
-------------
The simplest way to analyse a book or text file with each of the available estimators is:
::

    import zipfanalysis

    # Each call takes the path to a text file and returns an estimate of the Zipf exponent
    alpha_clauset = zipfanalysis.clauset("path_to_book.txt")
    alpha_pdf = zipfanalysis.ols_pdf("path_to_book.txt", min_frequency=3)
    alpha_cdf = zipfanalysis.ols_cdf("path_to_book.txt", min_frequency=3)
    alpha_abc = zipfanalysis.abc("path_to_book.txt")

---------------
In Depth Method
---------------
Convert a book or text file to word frequencies, ranked from highest to lowest:
::

    word_counts = zipfanalysis.preprocessing.preprocessing.get_rank_frequency_from_text("path_to_book.txt")

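The tokenisation used by `get_rank_frequency_from_text` is not documented here. As a minimal sketch under stated assumptions, a rank-frequency list can be built by lowercasing the text, extracting words with a regular expression, counting them with `collections.Counter`, and sorting the counts from highest to lowest. `rank_frequency` is a hypothetical name, and treating a word as a run of letters and apostrophes is an illustrative choice that may differ from the package's own rule.

```python
import re
from collections import Counter


def rank_frequency(text: str) -> list[int]:
    """Word counts ordered from most to least frequent (hypothetical helper).

    Assumes a word is a run of ASCII letters and apostrophes, compared case-insensitively.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() with no argument returns every (word, count) pair, highest count first
    return [count for _, count in counts.most_common()]


sample = "The cat sat on the mat. The dog sat too."
print(rank_frequency(sample))  # [3, 2, 1, 1, 1, 1, 1]
```

To apply it to a file, read the text first, for example `with open("path_to_book.txt", encoding="utf-8") as fh: counts = rank_frequency(fh.read())`. Words with equal counts appear in first-seen order, which does not affect any estimator that only uses the counts.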
Fit a power law to the data with any of the available estimators:
::

    # Clauset et al. estimator
    alpha_clauset = zipfanalysis.estimators.clauset.clauset_estimator(word_counts)

    # Ordinary least squares regression on log(frequency) ~ log(rank)
    # Optional low-frequency cut-off
    alpha_pdf = zipfanalysis.estimators.ols_regression_pdf.ols_regression_pdf_estimator(word_counts, min_frequency=2)

    # Ordinary least squares regression on the complementary cumulative distribution function of ranks
    # OLS on log(P(R > rank)) ~ log(rank)
    # Optional low-frequency cut-off
    alpha_cdf = zipfanalysis.estimators.ols_regression_cdf.ols_regression_cdf_estimator(word_counts)

    # Approximate Bayesian computation (regression method)
    # Assumes the model p(rank) = C * prob_rank^(-alpha),
    # where prob_rank is a word's rank in the underlying probability distribution
    alpha_abc = zipfanalysis.estimators.approximate_bayesian_computation.abc_estimator(word_counts)

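The estimators above are named but not defined. The following dependency-free sketch shows how each kind of estimator could work on a list of word counts. It is not the package's implementation, all function names are hypothetical, and several choices are assumptions. The Clauset-style function uses the discrete approximate maximum-likelihood formula of Clauset, Shalizi and Newman on the word frequencies. That gives the exponent lambda of the frequency distribution rather than the rank exponent alpha; for an ideal power law the two are related by alpha = 1 / (lambda - 1). The CDF fit converts its slope with alpha = 1 - slope, which assumes alpha > 1. The ABC function uses plain rejection sampling with a log-log slope as its summary statistic, swapped in for the regression method named above.

```python
import math
import random
from collections import Counter


def _ols_slope(xs, ys):
    """Slope b of the ordinary least squares line y = a + b * x (needs two distinct xs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def clauset_frequency_exponent(counts, x_min=1):
    """Approximate discrete MLE (Clauset et al.) of lambda in P(f) ~ f^(-lambda).

    This is the exponent of the frequency distribution, not the rank exponent.
    """
    tail = [c for c in counts if c >= x_min]
    return 1 + len(tail) / sum(math.log(c / (x_min - 0.5)) for c in tail)


def ols_pdf_alpha(counts, min_frequency=1):
    """OLS fit of log(frequency) ~ log(rank); alpha is minus the slope."""
    ranked = sorted(counts, reverse=True)
    # Ranks are assigned before the cut-off, so the kept words keep their original ranks
    points = [(r, c) for r, c in enumerate(ranked, start=1) if c >= min_frequency]
    return -_ols_slope([math.log(r) for r, _ in points],
                       [math.log(c) for _, c in points])


def ols_cdf_alpha(counts):
    """OLS fit of log P(R > rank) ~ log(rank), converted with alpha = 1 - slope.

    Assumes p(rank) ~ rank^(-alpha) with alpha > 1, so P(R > rank) ~ rank^(1 - alpha).
    """
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    xs, ys, remaining = [], [], total
    for rank, count in enumerate(ranked, start=1):
        remaining -= count  # tokens whose rank is strictly greater than `rank`
        if remaining > 0:  # the last rank has P = 0, and log(0) is undefined
            xs.append(math.log(rank))
            ys.append(math.log(remaining / total))
    return 1 - _ols_slope(xs, ys)


def abc_alpha(counts, vocab_size=None, n_draws=300, keep=30, seed=0):
    """Rejection ABC: simulate texts from p(prob_rank) ~ prob_rank^(-alpha),
    re-rank words by their simulated counts, and average the alphas whose
    log-log rank-frequency slope is closest to the observed one."""
    rng = random.Random(seed)
    n_tokens = sum(counts)
    # Assumption: if not given, the underlying vocabulary is twice the observed one
    vocab_size = vocab_size or 2 * len(counts)
    prob_ranks = range(1, vocab_size + 1)

    def summary(sample_counts):
        ranked = sorted(sample_counts, reverse=True)
        return _ols_slope([math.log(r) for r in range(1, len(ranked) + 1)],
                          [math.log(c) for c in ranked])

    observed = summary(counts)
    scored = []
    for _ in range(n_draws):
        alpha = rng.uniform(0.5, 2.0)  # assumed uniform prior on alpha
        cum_weights, running = [], 0.0
        for r in prob_ranks:
            running += r ** -alpha
            cum_weights.append(running)
        tokens = rng.choices(prob_ranks, cum_weights=cum_weights, k=n_tokens)
        scored.append((abs(summary(Counter(tokens).values()) - observed), alpha))
    scored.sort()
    return sum(alpha for _, alpha in scored[:keep]) / keep
```

Each function takes a list of word counts such as `word_counts` from the preprocessing step, for example `ols_pdf_alpha(word_counts, min_frequency=2)`. The ABC sketch simulates from the underlying `prob_rank` distribution and then re-ranks by observed counts, which is why it can account for the mismatch between a word's observed rank and its rank in the model. A regression-adjusted ABC would additionally correct the accepted alphas by regressing them on the summary statistic; that step is not shown.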
------------------
Development Notes
------------------
The intended general workflow is:

1. Import data to an n vector, e.g.::

       n = zipfanalysis.import_book("filename.txt")
       n = zipfanalysis.import_list([list of words])
       n = zipfanalysis.import_counter(counter_of_words)

2. Carry out analysis on the data, e.g.::

       zipfanalysis.n_pdf_regression(n)

3. Convert the data to other representations, e.g.::

       zipfanalysis.convert_to_f(n)

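The development notes do not define the n and f representations. One plausible reading, and it is only an assumption, is that n holds word counts ordered by rank and f holds the frequency of frequencies, i.e. how many distinct words occur exactly k times. Under that assumption, the hypothetical helpers below mirror the roles of `import_list`, `import_counter` and `convert_to_f` using only the standard library; they are not the package's API.

```python
from collections import Counter


def n_from_counter(word_counts: Counter) -> list[int]:
    """Rank-ordered count vector n: n[0] is the count of the most common word (hypothetical)."""
    return sorted(word_counts.values(), reverse=True)


def n_from_list(words: list[str]) -> list[int]:
    """Build n directly from a list of word tokens (hypothetical)."""
    return n_from_counter(Counter(words))


def convert_n_to_f(n: list[int]) -> dict[int, int]:
    """Frequency of frequencies f: maps a count k to the number of words seen k times (hypothetical)."""
    return dict(sorted(Counter(n).items()))


n = n_from_list(["a", "b", "a", "c", "a", "b", "d"])
print(n)  # [3, 2, 1, 1]
print(convert_n_to_f(n))  # {1: 2, 2: 1, 3: 1}
```

The f representation is the natural input for estimators that model the distribution of word frequencies rather than ranks.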
Owner
- Name: Chasmani
- Login: chasmani
- Kind: user
- Repositories: 6
- Profile: https://github.com/chasmani
Committers
Last synced: over 3 years ago
All Time
- Total Commits: 37
- Total Committers: 2
- Avg Commits per committer: 18.5
- Development Distribution Score (DDS): 0.027
Top Committers
| Name | Email | Commits |
|---|---|---|
| chasmani | p****2@g****m | 36 |
| Charlie Pilgrim | m****m@t****k | 1 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 1
- Total downloads:
  - pypi: 17 last month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 3
- Total maintainers: 1
pypi.org: zipfanalysis
Tools for analysing Zipf's law from text samples
- Homepage: https://github.com/chasmani/zipfanalysis
- Documentation: https://zipfanalysis.readthedocs.io/
- License: MIT License
- Latest release: 0.5, published almost 6 years ago
Rankings
Dependent packages count: 10.0%
Dependent repos count: 21.7%
Forks count: 29.8%
Average: 32.1%
Stargazers count: 38.8%
Downloads: 60.2%
Maintainers (1)
Last synced: 10 months ago