werpy

🐍📦 Ultra-fast Python package for calculating and analyzing the Word Error Rate (WER). Built for the scalable evaluation of speech and transcription accuracy.

https://github.com/analyticsinmotion/werpy

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Keywords

asr asr-evaluation automatic-speech-recognition levenshtein-distance metrics nlp python python-package speech-to-text stt stt-benchmark wer werpy word-error-rate

Keywords from Contributors

interactive mesh interpretability profiles sequences generic projection standardization optim embedded
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 16
  • Watchers: 3
  • Forks: 4
  • Open Issues: 3
  • Releases: 17
Topics
asr asr-evaluation automatic-speech-recognition levenshtein-distance metrics nlp python python-package speech-to-text stt stt-benchmark wer werpy word-error-rate
Created almost 3 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Security

README.md

Word Error Rate for Python

| Category | Badges |
| --- | --- |
| Meta | Python Version, Black Code Style, Documentation Status, Analytics in Motion |
| License | werpy License, FOSSA Status, REUSE status |
| Security | CodeQL, Codacy Security Scan, Bandit |
| Testing | CodeFactor, CircleCI, codecov |
| Package | PyPI, PyPI Downloads, Downloads, PyPI - Trusted Publisher |

What is werpy?

werpy is an ultra-fast, lightweight Python package for calculating and analyzing Word Error Rate (WER) between two sets of text.
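For reference, the standard definition of WER counts the substitutions $S$, deletions $D$ and insertions $I$ needed to turn the reference into the hypothesis and divides by the number of reference words $N$; the sum $S + D + I$ is the word-level Levenshtein distance. This is the textbook formula rather than one quoted from werpy's documentation, though the results in the Usage examples are consistent with it.

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```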

Built for flexibility and ease of use, it supports multiple input types such as strings, lists, and NumPy arrays. This makes it ideal for everything from quick experiments to large-scale evaluations.

werpy uses C optimizations to accelerate processing, so it stays fast at every scale, from small datasets to enterprise-level workloads.

It also comes packed with powerful features, including:
- 🔤 Built-in text normalization to handle data inconsistencies
- ⚙️ Customizable error penalties for insertions, deletions, and substitutions
- 📋 A detailed summary output for in-depth error analysis

werpy is a quality-focused package, built to production-grade standards for reliability and robustness.

Functions available in werpy

The following table provides an overview of the functions that can be used in werpy.

| Function | Description |
| ------------- | ------------- |
| normalize(text) | Preprocesses input text: removes punctuation, duplicated spaces and leading/trailing blanks, and converts all words to lowercase. |
| wer(reference, hypothesis) | Calculates the overall Word Error Rate for the entire set of reference and hypothesis texts. |
| wers(reference, hypothesis) | Calculates a list of Word Error Rates, one for each reference and hypothesis pair. |
| werp(reference, hypothesis) | Calculates a weighted Word Error Rate for the entire set of reference and hypothesis texts. |
| werps(reference, hypothesis) | Calculates a list of weighted Word Error Rates, one for each reference and hypothesis pair. |
| summary(reference, hypothesis) | Provides a comprehensive breakdown of the results, including the WER, Levenshtein distance, and all insertion, deletion and substitution errors. |
| summaryp(reference, hypothesis) | Provides the same in-depth breakdown as summary, including the weighted WER. |

Installation

You can install the latest werpy release with Python's pip package manager:

```shell
# Install werpy from PyPI
pip install werpy
```

Usage

Import the werpy package

Python Code:

```python
import werpy
```

Example 1 - Normalize a list of text

Python Code:

```python
input_data = ["It's very popular in Antarctica.", "The Sugar Bear character"]
reference = werpy.normalize(input_data)
print(reference)
```

Results Output: ['its very popular in antarctica', 'the sugar bear character']
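The normalization steps listed in the function table can be approximated with the Python standard library alone. This is a hypothetical sketch (`normalize_sketch` is not part of werpy), and it strips only the ASCII characters in `string.punctuation`, so it may treat hyphens, curly quotes or other non-ASCII punctuation differently from `werpy.normalize`.

```python
import re
import string


def normalize_sketch(text):
    """Hypothetical stand-in for werpy.normalize; accepts a string or a list of strings."""
    if isinstance(text, list):
        return [normalize_sketch(item) for item in text]
    # Remove ASCII punctuation (e.g. the apostrophe in "It's" and the final period)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace into a single space and trim both ends
    text = re.sub(r"\s+", " ", text).strip()
    # Lowercase every word
    return text.lower()


print(normalize_sketch(["It's very popular in Antarctica.", "The Sugar Bear character"]))
```

On the Example 1 input this produces the same list as the werpy output shown above.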

Example 2 - Calculate the overall Word Error Rate on a set of strings

Python Code:

```python
wer = werpy.wer('i love cold pizza', 'i love pizza')
print(wer)
```

Results Output: 0.25
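To illustrate where the 0.25 comes from, here is a minimal pure-Python sketch of word-level WER, assuming whitespace tokenization and a non-empty reference. It is not werpy's C-accelerated implementation, and the names `word_edit_distance` and `wer_sketch` are hypothetical. It uses the two-row Levenshtein dynamic program, so memory grows only with the hypothesis length.

```python
def word_edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance with unit cost for insert, delete and substitute."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev is the previous DP row: distances from the reference prefix to each hypothesis prefix
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, free when the words match
            )
        prev = curr
    return prev[-1]


def wer_sketch(reference, hypothesis):
    """Edit distance divided by the number of reference words."""
    return word_edit_distance(reference, hypothesis) / len(reference.split())


print(wer_sketch("i love cold pizza", "i love pizza"))
```

Deleting "cold" is the only edit and the reference has four words, so the sketch returns 1/4 = 0.25.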

Example 3 - Calculate the overall Word Error Rate on a set of lists

Python Code:

```python
ref = ['i love cold pizza', 'the sugar bear character was popular']
hyp = ['i love pizza', 'the sugar bare character was popular']
wer = werpy.wer(ref, hyp)
print(wer)
```

Results Output: 0.2
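The 0.2 in Example 3 is consistent with pooling the errors of both pairs before dividing, rather than averaging the per-pair rates. Working it by hand (our own arithmetic, with $S_k$, $D_k$, $I_k$ the substitutions, deletions and insertions in pair $k$ and $N_k$ its reference word count): pair 1 has one deletion ("cold") over 4 words, and pair 2 has one substitution ("bear" to "bare") over 6 words.

```latex
\frac{\sum_k (S_k + D_k + I_k)}{\sum_k N_k}
  = \frac{(0 + 1 + 0) + (1 + 0 + 0)}{4 + 6} = \frac{2}{10} = 0.2
\qquad\text{whereas}\qquad
\frac{1}{2}\left(\frac{1}{4} + \frac{1}{6}\right) = \frac{5}{24} \approx 0.208
```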

Example 4 - Calculate the Word Error Rates for each set of texts

Python Code:

```python
ref = ['no one else could claim that', 'she cited multiple reasons why']
hyp = ['no one else could claim that', 'she sighted multiple reasons why']
wers = werpy.wers(ref, hyp)
print(wers)
```

Results Output: [0.0, 0.2]

Example 5 - Calculate the weighted Word Error Rate for the entire set of texts

Python Code:

```python
ref = ['it was beautiful and sunny today']
hyp = ['it was a beautiful and sunny day']
werp = werpy.werp(ref, hyp, insertions_weight=0.5, deletions_weight=0.5, substitutions_weight=1)
print(werp)
```

Results Output: 0.25
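The weighted result can be reproduced by scaling each error count by its weight ($w_I$, $w_D$, $w_S$) before dividing by the $N$ reference words. This formula is inferred from the numbers rather than quoted from werpy's documentation. In Example 5 the hypothesis inserts "a" and substitutes "day" for "today", over a six-word reference:

```latex
\mathrm{WER}_w = \frac{w_I I + w_D D + w_S S}{N}
  = \frac{0.5 \cdot 1 + 0.5 \cdot 0 + 1 \cdot 1}{6} = \frac{1.5}{6} = 0.25
```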

Example 6 - Calculate a list of weighted Word Error Rates for each of the reference and hypothesis texts

Python Code:

```python
ref = ['it blocked sight lines of central park',
       'her father was an alderman in the city government']
hyp = ['it blocked sightlines of central park',
       'our father was an elder man in the city government']
werps = werpy.werps(ref, hyp, insertions_weight=0.5, deletions_weight=0.5, substitutions_weight=1)
print(werps)
```

Results Output: [0.21428571428571427, 0.2777777777777778]
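A per-pair weighted WER can be sketched with a weighted Levenshtein dynamic program: insertions, deletions and substitutions cost their respective weights, and the cheapest total cost is divided by the reference length. This is one plausible approach, not necessarily werpy's. Another is to count errors along a unit-cost alignment and weight them afterwards, and the two can disagree when alignments tie. The function `weighted_wer_sketch` is hypothetical; its keyword names simply mirror werpy's arguments.

```python
def weighted_wer_sketch(reference, hypothesis, insertions_weight=1.0,
                        deletions_weight=1.0, substitutions_weight=1.0):
    """Hypothetical weighted Levenshtein WER for one reference/hypothesis pair."""
    ref, hyp = reference.split(), hypothesis.split()
    # Row 0: building j hypothesis words from an empty reference costs j insertions
    prev = [j * insertions_weight for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        # Column 0: the first i reference words must all be deleted
        curr = [i * deletions_weight] + [0.0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = 0.0 if r == h else substitutions_weight
            curr[j] = min(
                prev[j] + deletions_weight,      # delete reference word r
                curr[j - 1] + insertions_weight,  # insert hypothesis word h
                prev[j - 1] + substitution,       # match or substitute
            )
        prev = curr
    # Divide the cheapest weighted cost by the number of reference words
    return prev[-1] / len(ref)


ref = ['it blocked sight lines of central park',
       'her father was an alderman in the city government']
hyp = ['it blocked sightlines of central park',
       'our father was an elder man in the city government']
print([weighted_wer_sketch(r, h, 0.5, 0.5, 1) for r, h in zip(ref, hyp)])
```

For these inputs the cheapest costs are 1.5 and 2.5, giving 1.5/7 and 2.5/9 as in Example 6. With weights 0.5/0.5/1 a substitution costs exactly as much as a deletion plus an insertion, so the program is indifferent between those two repairs.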

Example 7 - Provide a complete breakdown of the Word Error Rate calculations for each of the reference and hypothesis texts

Python Code:

```python
ref = [
    'it is consumed domestically and exported to other countries',
    'rufino street in makati right inside the makati central business district',
    'its estuary is considered to have abnormally low rates of dissolved oxygen',
    'he later cited his first wife anita as the inspiration for the song',
    'no one else could claim that',
]
hyp = [
    'it is consumed domestically and exported to other countries',
    'rofino street in mccauti right inside the macasi central business district',
    'its estiary is considered to have a normally low rates of dissolved oxygen',
    'he later sighted his first wife anita as the inspiration for the song',
    'no one else could claim that',
]
summary = werpy.summary(ref, hyp)
print(summary)
```

Results Output:

[Figure: werpy summary DataFrame showing the Word Error Rate breakdown for each reference and hypothesis pair]
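`summary` returns a pandas DataFrame whose exact columns are not reproduced in this text. As an illustration of how such a breakdown can be derived, the sketch below builds the full word-level Levenshtein table and then backtraces from the bottom-right corner to recover one optimal alignment. The function `error_breakdown` and its dictionary keys are hypothetical, not werpy's column names, and when several alignments tie, the words it attributes to each error type may differ from werpy's. It assumes a non-empty reference.

```python
def error_breakdown(reference, hypothesis):
    """Hypothetical per-pair breakdown: WER, Levenshtein distance and the words involved."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # Full DP table: d[i][j] is the distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    # Walk back through the table to recover one optimal alignment
    inserted, deleted, substituted = [], [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1  # words match, no error
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            substituted.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            deleted.append(ref[i - 1])
            i -= 1
        else:
            inserted.append(hyp[j - 1])
            j -= 1
    # The backtrace runs right to left, so reverse each list into reading order
    return {
        "wer": d[n][m] / n,
        "ld": d[n][m],
        "reference_words": n,
        "insertions": inserted[::-1],
        "deletions": deleted[::-1],
        "substitutions": substituted[::-1],
    }


ref = 'rufino street in makati right inside the makati central business district'
hyp = 'rofino street in mccauti right inside the macasi central business district'
print(error_breakdown(ref, hyp))
```

For the second pair of Example 7 this reports three substitutions over eleven reference words.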


Example 8 - Provide a complete breakdown of the Weighted Word Error Rate for each of the input texts

Python Code:

```python
ref = ['the tower caused minor discontent because it blocked sight lines of central park',
       'her father was an alderman in the city government',
       'he was commonly referred to as the blacksmith of ballinalee']
hyp = ['the tower caused minor discontent because it blocked sightlines of central park',
       'our father was an alderman in the city government',
       'he was commonly referred to as the blacksmith of balen alley']
weighted_summary = werpy.summaryp(ref, hyp, insertions_weight=0.5, deletions_weight=0.5, substitutions_weight=1)
print(weighted_summary)
```

Results Output:

[Figure: werpy summaryp DataFrame showing the weighted Word Error Rate breakdown for each reference and hypothesis pair]
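Assuming `summaryp` applies the weighted formula $\mathrm{WER}_w = (w_I I + w_D D + w_S S)/N$ that reproduces Examples 5 and 6, working the three pairs by hand with the weights above gives the values below. They are our own arithmetic, not copied from werpy's output. Pair 1 has one substitution and one deletion ("sight lines" vs "sightlines") over 13 words, pair 2 one substitution ("her" vs "our") over 9 words, and pair 3 one substitution and one insertion ("ballinalee" vs "balen alley") over 10 words.

```latex
\begin{aligned}
\text{pair 1: } & \frac{0.5 \cdot 0 + 0.5 \cdot 1 + 1 \cdot 1}{13} = \frac{1.5}{13} \approx 0.1154 \\
\text{pair 2: } & \frac{0.5 \cdot 0 + 0.5 \cdot 0 + 1 \cdot 1}{9} = \frac{1}{9} \approx 0.1111 \\
\text{pair 3: } & \frac{0.5 \cdot 1 + 0.5 \cdot 0 + 1 \cdot 1}{10} = \frac{1.5}{10} = 0.15
\end{aligned}
```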


Dependencies

  • NumPy - Provides an assortment of routines for fast operations on arrays
  • Pandas - Powerful data structures for data analysis, time series, and statistics

Licensing

werpy is released under the terms of the BSD 3-Clause License. Please refer to the LICENSE file for full details.

This project uses standard scientific Python libraries, including NumPy and Pandas. For their license details, please refer to their official repositories.

Owner

  • Name: Analytics in Motion
  • Login: analyticsinmotion
  • Kind: organization
  • Email: pi@analyticsinmotion.com

Analytics in Motion ❤️ Open Source Programming, Data Science & AI/ML Projects

Citation (CITATION.cff)

cff-version: 1.2.0
message: 'If you use this software, please cite it as below.'
authors:
- family-names: "Armstrong"
  given-names: "Ross"
title: 'werpy - Word Error Rate for Python'
abstract: "A powerful Python package that rapidly calculates and analyzes the Word Error Rate (WER)."
license: BSD-3-Clause
license-url: "https://github.com/analyticsinmotion/werpy/blob/main/LICENSE"
repository-code: "https://github.com/analyticsinmotion/werpy"
keywords:
  - word error rate
  - wer
  - levenshtein distance
  - speech recognition
  - speech-to-text
  - stt
  - metrics
  - natural language processing
  - data science
  - python
  - python package
type: software
url: "https://github.com/analyticsinmotion/werpy"

GitHub Events

Total
  • Release event: 4
  • Watch event: 5
  • Delete event: 6
  • Issue comment event: 10
  • Push event: 150
  • Pull request event: 13
  • Create event: 10
Last Year
  • Release event: 4
  • Watch event: 5
  • Delete event: 6
  • Issue comment event: 10
  • Push event: 150
  • Pull request event: 13
  • Create event: 10

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 463
  • Total Committers: 3
  • Avg Commits per committer: 154.333
  • Development Distribution Score (DDS): 0.035
Past Year
  • Commits: 175
  • Committers: 3
  • Avg Commits per committer: 58.333
  • Development Distribution Score (DDS): 0.057
Top Committers
Name Email Commits
Ross Armstrong 5****g 447
dependabot[bot] 4****] 13
doubleinfinity r****g@z****m 3
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 29
  • Average time to close issues: N/A
  • Average time to close pull requests: about 1 month
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 1.28
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 27
Past Year
  • Issues: 0
  • Pull requests: 17
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 months
  • Issue authors: 0
  • Pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 1.29
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 15
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (51)
  • fossabot (2)
  • LouisJalouzot (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (51) python (15)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 4,166 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 17
  • Total maintainers: 1
pypi.org: werpy

A powerful yet lightweight Python package to calculate and analyze the Word Error Rate (WER).

  • Documentation: https://werpy.readthedocs.io/
  • License: BSD 3-Clause License Copyright (c) 2023-2025, Analytics in Motion Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • Latest release: 3.1.0
    published 10 months ago
  • Versions: 17
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 4,166 Last month
Rankings
Downloads: 5.6%
Dependent packages count: 10.1%
Average: 15.0%
Stargazers count: 18.5%
Forks count: 19.1%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 6 months ago