Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: joaopcnogueira
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 2.47 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme Contributing License Citation

README.rst

The ``pyzipf`` package tallies the occurrences of words in text
files and plots each word's rank versus its frequency, together
with a line showing the distribution predicted by Zipf's Law.

Motivation
----------

Zipf's Law is often stated as an observational pattern in the
relationship between the frequency and rank of words in a text:

    "…the most frequent word will occur approximately twice as
    often as the second most frequent word, three times as often
    as the third most frequent word, etc."

    -- Wikipedia
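Put more compactly, the word at rank *r* is expected to occur roughly 1/*r* times as often as the most frequent word. The short Python sketch below turns that statement into numbers. ``zipf_expected_counts`` is a hypothetical helper written for illustration, not a function provided by ``pyzipf``.

```python
def zipf_expected_counts(top_count, n_ranks):
    """Return the counts predicted for ranks 1..n_ranks.

    Assumes the classic form of Zipf's Law: count(r) = count(1) / r.
    Hypothetical helper, not part of pyzipf.
    """
    return [top_count / rank for rank in range(1, n_ranks + 1)]


# If the most frequent word occurs 600 times, the words at ranks 2, 3
# and 4 are expected to occur about 300, 200 and 150 times.
print(zipf_expected_counts(600, 4))  # [600.0, 300.0, 200.0, 150.0]
```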

Many books are available to download in plain text format
from sites such as
Project Gutenberg,
so we created this package to qualitatively explore how well
different books align with the word frequencies predicted by
Zipf's Law.
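The package aims at a qualitative comparison, but one common way to put a number on the fit (not necessarily what ``pyzipf`` does) is to assume the generalized form ``count(r) ≈ C / r**alpha`` and estimate the exponent ``alpha``; values near 1 match the classic statement of the law. The sketch below fits an ordinary least-squares line through the log-log rank/frequency points, which is simple but generally less reliable than maximum-likelihood estimation. The function names and the toy counts are hypothetical.

```python
import math


def rank_frequency(counts):
    """Return (rank, count) pairs, with rank 1 for the most frequent word."""
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))


def fit_zipf_exponent(counts):
    """Estimate alpha in count(r) ~ C / r**alpha from a word -> count mapping.

    Fits an ordinary least-squares line to (log rank, log count); the
    slope of that line is -alpha. Needs at least two words with counts > 0.
    """
    pairs = rank_frequency(counts)
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return -sxy / sxx


# Toy counts that follow 600 / rank exactly give an exponent of 1.
toy_counts = {"the": 600, "of": 300, "and": 200, "to": 150}
print(round(fit_zipf_exponent(toy_counts), 6))  # 1.0
```

A book whose estimated exponent lands far from 1 deviates more noticeably from the classic form of the law.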

Installation
------------

``pip install pyzipf``

Usage
-----

After installing this package, the following three commands are
available from the command line:

- ``countwords`` for counting the occurrences of words in a text
- ``collate`` for collating multiple word count files together
- ``plotcounts`` for visualizing the word counts
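This README does not show how ``countwords`` splits text into words, so the following is only a sketch of the general idea under stated assumptions: words are split on whitespace, stripped of surrounding punctuation, lowercased, tallied with ``collections.Counter``, and written as headerless ``word,count`` CSV rows. ``count_words`` and ``write_counts`` are hypothetical names, not the package's API.

```python
import csv
import string
import sys
from collections import Counter


def count_words(text):
    """Tally word occurrences in a string (hypothetical helper).

    Assumes words are whitespace-separated; leading/trailing punctuation
    is stripped, words are lowercased, and empty tokens are discarded.
    """
    words = (word.strip(string.punctuation).lower() for word in text.split())
    return Counter(word for word in words if word)


def write_counts(counts, outfile=None):
    """Write counts as word,count CSV rows, most frequent first."""
    writer = csv.writer(outfile if outfile is not None else sys.stdout)
    for word, count in counts.most_common():
        writer.writerow([word, count])


counts = count_words("The cat saw the dog. The dog ran!")
write_counts(counts)
```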

A typical usage scenario would include running the following
from your terminal::

    countwords dracula.txt > dracula.csv
    countwords moby_dick.txt > moby_dick.csv
    collate dracula.csv moby_dick.csv > collated.csv
    plotcounts collated.csv --outfile zipf-drac-moby.jpg

Additional information on each function can be found in its
docstring, or by running the corresponding command with the
``-h`` flag, e.g., ``countwords -h``.
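A similar sketch for collating count files, assuming each file holds headerless ``word,count`` rows and that collating means summing each word's counts across files. The real ``collate`` command may differ. ``collate_counts`` is a hypothetical name, and the in-memory files below are toy stand-ins for ``dracula.csv`` and ``moby_dick.csv``.

```python
import csv
import io
from collections import Counter


def collate_counts(sources):
    """Sum word counts across several headerless word,count CSV sources.

    Each source can be anything csv.reader accepts, such as an open file.
    Hypothetical helper; the real collate command may behave differently.
    """
    total = Counter()
    for source in sources:
        for word, count in csv.reader(source):
            total[word] += int(count)
    return total


# Toy in-memory stand-ins for dracula.csv and moby_dick.csv.
dracula = io.StringIO("the,3\ndog,2\n")
moby_dick = io.StringIO("the,5\nwhale,4\n")
print(collate_counts([dracula, moby_dick]).most_common())
# [('the', 8), ('whale', 4), ('dog', 2)]
```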

Contributing
------------

Interested in contributing?
Check out the ``CONTRIBUTING.md``
file for guidelines on how to contribute.
Please note that this project is released with a
Contributor Code of Conduct (``CONDUCT.md``).
By contributing to this project,
you agree to abide by its terms.
Both of these files can be found in our
GitHub repository.

Owner

  • Name: João Paulo Nogueira
  • Login: joaopcnogueira
  • Kind: user
  • Location: Fortaleza, Ceará

Citation (CITATION.md)

# Citation

If you use the pyzipf package for work/research presented in a
publication, we ask that you please cite:

Khan, A., and Virtanen, S., 2020. pyzipf: A Python package for word
count analysis. *Journal of Important Software*, 5(51), 2317,
https://doi.org/10.21105/jois.02317

### BibTeX entry

    @article{Khan2020,
        title={pyzipf: A Python package for word count analysis.},
        author={Khan, Amira and Virtanen, Sami},
        journal={Journal of Important Software},
        volume={5},
        number={51},
        eid={2317},
        year={2020},
        doi={10.21105/jois.02317},
        url={https://doi.org/10.21105/jois.02317},
    }

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

environment.yml pypi
  • coverage ==5.3
  • pip ==20.3
  • pypandoc ==1.5
  • random-word-generator ==1.2
requirements.txt pypi
  • matplotlib *
  • numpy *
  • pandas *
  • pyyaml *
  • scipy *
setup.py pypi
  • matplotlib *
  • pandas *
  • pytest *
  • pyyaml *
  • scipy *