RapidFuzz
Provides a high-performance interface for calculating string similarities and distances, leveraging the efficient C++ library RapidFuzz <https://github.com/rapidfuzz/rapidfuzz-cpp> developed by Max Bachmann and Adam Cohen.
Repository
Basic Info
- Host: GitHub
- Owner: StrategicProjects
- License: other
- Language: C++
- Default Branch: main
- Size: 650 KB
Statistics
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
- Created: over 1 year ago
- Last pushed: over 1 year ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
library(RapidFuzz)
```
# RapidFuzz



Provides a high-performance interface for calculating string similarities and distances, leveraging the efficient C++ library [RapidFuzz](https://github.com/rapidfuzz/rapidfuzz-cpp) developed by Max Bachmann and Adam Cohen. This package integrates the C++ implementation, allowing R users to access cutting-edge algorithms for fuzzy matching and text analysis.
## Installation
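You can install the released version of RapidFuzz from CRAN, or the development version from [GitHub](https://github.com/), with:
```{r eval=FALSE}
install.packages("RapidFuzz")            # released version from CRAN
# install.packages("pak")
pak::pak("StrategicProjects/RapidFuzz")  # development version from GitHub
library(RapidFuzz)
```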
## Overview
The `RapidFuzz` package is an R wrapper around the highly efficient RapidFuzz C++ library. It provides implementations of multiple string comparison and similarity metrics, such as Levenshtein, Jaro-Winkler, and Damerau-Levenshtein distances. This package is particularly useful for applications like record linkage, approximate string matching, and fuzzy text processing.
String comparison algorithms calculate distances and similarities between two sequences of characters. These distances help to quantify how similar two strings are. For example, the Levenshtein distance measures the minimum number of single-character edits required to transform one string into another.
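To make the definition of the Levenshtein distance concrete, here is a minimal base-R sketch of the classic dynamic-programming recurrence. It is only an illustration and not the package's implementation. The helper name `lev_sketch()` is hypothetical, and the result is cross-checked against `utils::adist()` from base R.

```r
# Illustrative sketch only; lev_sketch() is a hypothetical helper, not part of RapidFuzz.
lev_sketch <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # prev[j + 1] is the distance between the prefix of x processed so far
  # and the first j characters of y
  prev <- 0:length(y)
  for (i in seq_along(x)) {
    curr <- c(i, integer(length(y)))
    for (j in seq_along(y)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      curr[j + 1] <- min(
        prev[j + 1] + 1L,  # delete x[i]
        curr[j] + 1L,      # insert y[j]
        prev[j] + cost     # substitute x[i] by y[j], or keep a match
      )
    }
    prev <- curr
  }
  prev[length(y) + 1]
}

lev_sketch("kitten", "sitting")
# 3

# Cross-check with base R's generalized edit distance
drop(utils::adist("kitten", "sitting"))
# 3
```

The three edits here are two substitutions (`k` to `s`, `e` to `i`) and one insertion (`g`).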
The underlying library relies on optimized C++ implementations of these metrics to achieve high performance without sacrificing accuracy. It is open source and available in the [RapidFuzz GitHub repository](https://github.com/rapidfuzz/RapidFuzz).
---
## Functions
### Process String Function
- `processString()`: Process a string with options to trim, convert to lowercase, and transliterate to ASCII.
### Opcode Functions
- `opcodes_apply_str()`: Apply Opcodes to transform a string.
- `opcodes_apply_vec()`: Apply Opcodes to transform a string into a character vector.
### Edit Operation Utilities
- `get_editops()`: Retrieve Edit Operations between two strings.
### Edit Operations Functions
- `editops_apply_str()`: Apply Edit Operations to transform a string.
- `editops_apply_vec()`: Apply Edit Operations to transform a string into a character vector.
### Damerau-Levenshtein Functions
- `damerau_levenshtein_distance()`: Calculate the Damerau-Levenshtein Distance.
- `damerau_levenshtein_normalized_distance()`: Calculate the Normalized Damerau-Levenshtein Distance.
- `damerau_levenshtein_normalized_similarity()`: Calculate the Normalized Damerau-Levenshtein Similarity.
- `damerau_levenshtein_similarity()`: Calculate the Damerau-Levenshtein Similarity.
### Fuzz Ratio Functions
- `fuzz_QRatio()`: Perform a Quick Ratio Calculation.
- `fuzz_WRatio()`: Perform a Weighted Ratio Calculation.
- `fuzz_partial_ratio()`: Calculate Partial Ratio.
- `fuzz_ratio()`: Calculate a Simple Ratio.
- `fuzz_token_ratio()`: Calculate Combined Token Ratio.
- `fuzz_token_set_ratio()`: Perform Token Set Ratio Calculation.
- `fuzz_token_sort_ratio()`: Perform Token Sort Ratio Calculation.
### Hamming Functions
- `hamming_distance()`: Calculate Hamming Distance.
- `hamming_normalized_distance()`: Calculate Normalized Hamming Distance.
- `hamming_normalized_similarity()`: Calculate Normalized Hamming Similarity.
- `hamming_similarity()`: Calculate Hamming Similarity.
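As a rough illustration of how the four Hamming variants relate to each other, the base-R sketch below counts mismatching positions and derives the other three values from that count. The helper `hamming_sketch()` is hypothetical. It assumes equal-length inputs and assumes that the normalized values are obtained by dividing by the string length, so the package's functions may treat edge cases differently.

```r
# Illustrative sketch only; hamming_sketch() is a hypothetical helper, not part of RapidFuzz.
hamming_sketch <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))  # assumption: equal-length inputs only
  len <- length(x)
  d <- sum(x != y)                   # number of mismatching positions
  c(
    distance = d,
    similarity = len - d,
    normalized_distance = if (len == 0) 0 else d / len,
    normalized_similarity = if (len == 0) 1 else 1 - d / len
  )
}

hamming_sketch("karolin", "kathrin")[["distance"]]
# 3
```

The two strings differ at positions 3, 4, and 5, so under these assumptions the similarity is 4 and the normalized distance is 3/7.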
### Indel Functions
- `indel_distance()`: Calculate Indel Distance.
- `indel_normalized_distance()`: Calculate Normalized Indel Distance.
- `indel_normalized_similarity()`: Calculate Normalized Indel Similarity.
- `indel_similarity()`: Calculate Indel Similarity.
### Jaro Functions
- `jaro_distance()`: Calculate Jaro Distance.
- `jaro_normalized_distance()`: Calculate Normalized Jaro Distance.
- `jaro_normalized_similarity()`: Calculate Normalized Jaro Similarity.
- `jaro_similarity()`: Calculate Jaro Similarity.
### Jaro-Winkler Functions
- `jaro_winkler_distance()`: Calculate Jaro-Winkler Distance.
- `jaro_winkler_normalized_distance()`: Calculate Normalized Jaro-Winkler Distance.
- `jaro_winkler_normalized_similarity()`: Calculate Normalized Jaro-Winkler Similarity.
- `jaro_winkler_similarity()`: Calculate Jaro-Winkler Similarity.
### Longest Common Subsequence (LCSseq) Functions
- `lcs_seq_distance()`: Calculate LCSseq Distance.
- `lcs_seq_editops()`: Retrieve LCSseq Edit Operations.
- `lcs_seq_normalized_distance()`: Calculate Normalized LCSseq Distance.
- `lcs_seq_normalized_similarity()`: Calculate Normalized LCSseq Similarity.
- `lcs_seq_similarity()`: Calculate LCSseq Similarity.
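The LCSseq and Indel families can both be understood through the length of the longest common subsequence. The base-R sketch below computes that length with a standard dynamic-programming table and then derives a distance for each family. The helper `lcs_length_sketch()` is hypothetical, and the two derived formulas are stated here as assumptions about the usual definitions, not as a description of the package internals.

```r
# Illustrative sketch only; lcs_length_sketch() is a hypothetical helper, not part of RapidFuzz.
lcs_length_sketch <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # prev[j + 1] is the LCS length of the processed prefix of x and the first j characters of y
  prev <- integer(length(y) + 1)
  for (i in seq_along(x)) {
    curr <- integer(length(y) + 1)
    for (j in seq_along(y)) {
      curr[j + 1] <- if (x[i] == y[j]) {
        prev[j] + 1L                 # extend the common subsequence
      } else {
        max(prev[j + 1], curr[j])    # skip a character from x or from y
      }
    }
    prev <- curr
  }
  prev[length(y) + 1]
}

a <- "AGGTAB"
b <- "GXTXAYB"
lcs <- lcs_length_sketch(a, b)
lcs
# 4  (the subsequence "GTAB")

# Assumed definitions of the derived distances:
max(nchar(a), nchar(b)) - lcs   # LCSseq distance
# 3
nchar(a) + nchar(b) - 2 * lcs   # Indel distance (insertions and deletions only)
# 5
```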
### Levenshtein Functions
- `levenshtein_distance()`: Calculate Levenshtein Distance.
- `levenshtein_normalized_distance()`: Calculate Normalized Levenshtein Distance.
- `levenshtein_normalized_similarity()`: Calculate Normalized Levenshtein Similarity.
- `levenshtein_similarity()`: Calculate Levenshtein Similarity.
### Optimal String Alignment (OSA) Functions
- `osa_distance()`: Calculate Distance Using OSA.
- `osa_editops()`: Retrieve Edit Operations Using OSA.
- `osa_normalized_distance()`: Calculate Normalized Distance Using OSA.
- `osa_normalized_similarity()`: Calculate Normalized Similarity Using OSA.
- `osa_similarity()`: Calculate Similarity Using OSA.
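Optimal String Alignment extends the Levenshtein recurrence with one extra case: swapping two adjacent characters costs a single edit, but no substring may be edited more than once. The base-R sketch below illustrates that recurrence. The helper `osa_sketch()` is hypothetical and is not the package's implementation.

```r
# Illustrative sketch only; osa_sketch() is a hypothetical helper, not part of RapidFuzz.
osa_sketch <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] is the distance between the first i characters of x
  # and the first j characters of y
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      best <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution or match
      )
      # transposition of two adjacent characters
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        best <- min(best, d[i - 1, j - 1] + 1L)
      }
      d[i + 1, j + 1] <- best
    }
  }
  d[n + 1, m + 1]
}

osa_sketch("ab", "ba")
# 1  (one transposition)
osa_sketch("ca", "abc")
# 3
```

The second call is the standard example where OSA differs from the unrestricted Damerau-Levenshtein distance, which is 2 for the same pair (transpose `ca` to `ac`, then insert `b`).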
### Prefix Functions
- `prefix_distance()`: Calculate the Prefix Distance between two strings.
- `prefix_normalized_distance()`: Calculate the Normalized Prefix Distance between two strings.
- `prefix_normalized_similarity()`: Calculate the Normalized Prefix Similarity between two strings.
- `prefix_similarity()`: Calculate the Prefix Similarity between two strings.
---
## Example Usage
### Prefix Functions
```R
prefix_distance("abcdef", "abcxyz")
# Output: 3
prefix_normalized_similarity("abcdef", "abcxyz", score_cutoff = 0.0)
# Output: 0.5
```
### Postfix Functions
```R
postfix_distance("abcdef", "xyzdef")
# Output: 3
```
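The prefix and postfix results above are consistent with a simple reading: the similarity is the length of the shared prefix (or suffix), and the distance is the longer string's length minus that similarity. The base-R sketch below works under that assumption. The helper `affix_sketch()` and its `from_end` argument are hypothetical and only illustrate the arithmetic.

```r
# Illustrative sketch only; affix_sketch() is a hypothetical helper, not part of RapidFuzz.
affix_sketch <- function(a, b, from_end = FALSE) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  if (from_end) {
    # a common suffix is a common prefix of the reversed strings
    x <- rev(x)
    y <- rev(y)
  }
  n <- min(length(x), length(y))
  mismatch <- which(x[seq_len(n)] != y[seq_len(n)])
  sim <- if (length(mismatch) == 0) n else mismatch[1] - 1L
  max_len <- max(length(x), length(y))
  c(
    similarity = sim,
    distance = max_len - sim,
    normalized_similarity = if (max_len == 0) 1 else sim / max_len
  )
}

affix_sketch("abcdef", "abcxyz")[["distance"]]
# 3
affix_sketch("abcdef", "abcxyz")[["normalized_similarity"]]
# 0.5
affix_sketch("abcdef", "xyzdef", from_end = TRUE)[["distance"]]
# 3
```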
### Damerau-Levenshtein Functions
```R
damerau_levenshtein_distance("abcdef", "abcfed")
# Output: 2
```
### Extract Matches
```R
# Example data
query <- "new york jets"
choices <- c("Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys")
score_cutoff <- 0.0
# Rank the choices by their score against the query
extract_matches(query, choices, score_cutoff, scorer = "PartialRatio")
# Output:
#            choice     score
# 1   New York Jets 100.00000
# 2 New York Giants  81.81818
# 3 Atlanta Falcons  33.33333
```
---
### Original Library
The `RapidFuzz` package is a wrapper around the [RapidFuzz](https://github.com/rapidfuzz/rapidfuzz-cpp) C++ library, developed by Max Bachmann and Adam Cohen. The library implements efficient algorithms for approximate string matching and comparison.
Documentation: <https://rapidfuzz.github.io/RapidFuzz/>
Owner
- Name: Secretaria de Projetos Estratégicos
- Login: StrategicProjects
- Kind: organization
- Email: andre.leite@sepe.pe.gov.br
- Location: Brazil
- Website: monitoramento.sepe.pe.gov.br
- Repositories: 1
- Profile: https://github.com/StrategicProjects
GitHub Events
Total
- Watch event: 4
- Push event: 1
- Fork event: 1
Last Year
- Watch event: 4
- Push event: 1
- Fork event: 1
Packages
- Total packages: 1
- Total downloads:
  - cran: 512 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
cran.r-project.org: RapidFuzz
String Similarity Computation Using 'RapidFuzz'
- Homepage: <https://github.com/StrategicProjects/RapidFuzz>
- Documentation: http://cran.r-project.org/web/packages/RapidFuzz/RapidFuzz.pdf
- License: MIT + file LICENSE
- Latest release: 1.0 (published over 1 year ago)
Rankings
Dependent packages count: 27.6%
Dependent repos count: 34.0%
Average: 49.5%
Downloads: 86.9%
Maintainers (1)
Last synced: 10 months ago
Dependencies
DESCRIPTION (cran)
- Rcpp >= 1.0.13 (imports)