pkgmatch
Find R packages matching either descriptions or other R packages
Keywords
embeddings
llms
natural-language-processing
r
Repository
Basic Info
- Host: GitHub
- Owner: ropensci-review-tools
- License: other
- Language: R
- Default Branch: main
- Homepage: http://docs.ropensci.org/pkgmatch/
- Size: 2.7 MB
Statistics
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 3
- Releases: 0
Created over 1 year ago
· Last pushed 6 months ago
Metadata Files
Readme
Contributing
License
Codemeta
README.Rmd
---
title: "pkgmatch"
output:
  md_document:
    variant: gfm
  rmarkdown::html_vignette:
    self_contained: no
---
# pkgmatch
A tool that uses language models to help find R packages, by matching packages
either to a text description, or to entire packages. Can find matching
packages either from rOpenSci's [suite of
packages](https://ropensci.org/packages), or from all packages currently on
[CRAN](https://cran.r-project.org).
## Installation
This package relies on a locally-running instance of
[ollama](https://ollama.com). Procedures for setting that up are described in a
[separate vignette](https://docs.ropensci.org/pkgmatch/articles/B_ollama.html)
(`vignette("ollama", package = "pkgmatch")`). Although some functionality of
this package may be used without ollama, the main functions require ollama to
be installed.
Once ollama is running, the easiest way to install this package is via the
[associated
`r-universe`](https://ropensci-review-tools.r-universe.dev/). As shown
there, simply enable the universe with
```{r options, eval = FALSE}
options (repos = c (
    ropenscireviewtools = "https://ropensci-review-tools.r-universe.dev",
    CRAN = "https://cloud.r-project.org"
))
```
Then install in the usual way with:
```{r install, eval = FALSE}
install.packages ("pkgmatch")
```
Alternatively, the package can be installed by first installing either the
[remotes](https://remotes.r-lib.org) or [pak](https://pak.r-lib.org/) packages
and running one of the following lines:
```{r remotes, eval = FALSE}
# Either, with remotes:
remotes::install_github ("ropensci-review-tools/pkgmatch")
# Or, with pak:
pak::pkg_install ("ropensci-review-tools/pkgmatch")
```
The package can then be loaded for use with
```{r library, eval = TRUE}
library (pkgmatch)
```
The [`ollama_check()`
function](https://docs.ropensci.org/pkgmatch/reference/ollama_check.html) can
then be used to confirm that [ollama](https://ollama.com) is up and running as
expected.
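A minimal sketch of running that check defensively before calling the main
functions. The README does not describe what `ollama_check()` returns or how it
reports failure, so this sketch assumes only that a failure may be signalled as
an R error. It captures any such error rather than stopping the session.

```r
# Call `ollama_check()` and capture any error instead of halting.
# Assumption: failures (including pkgmatch not being installed) surface as
# R errors; the actual return value is not documented in this README.
res <- tryCatch (
    pkgmatch::ollama_check (),
    error = function (e) e
)

if (inherits (res, "error")) {
    message ("ollama check did not succeed: ", conditionMessage (res))
} else {
    print (res)
}
```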
## Using the `pkgmatch` package
The `pkgmatch` package takes input as either a text description or a local path
to an R package, and finds matching packages based on both Language Model (LM)
embeddings and more traditional text and code matching algorithms.
The package has two main functions:
- [`pkgmatch_similar_pkgs()`](https://docs.ropensci.org/pkgmatch/reference/pkgmatch_similar_pkgs.html)
to find similar rOpenSci or CRAN packages based on input as either a local path to
an entire package, the name of an installed package, or as a single descriptive
text string; and
- [`pkgmatch_similar_fns()`](https://docs.ropensci.org/pkgmatch/reference/pkgmatch_similar_fns.html)
to find similar functions from rOpenSci packages based on descriptive text
input. (Not available for functions from CRAN packages.)
The following code demonstrates how these functions work, first matching a
general text string against packages from rOpenSci:
```{r pkgs1}
input <- "
Packages for analysing evolutionary trees, with a particular focus
on visualising inter-relationships among distinct trees.
"
pkgmatch_similar_pkgs (input, corpus = "ropensci")
```
The `corpus` parameter must be specified as one of `"ropensci"` or `"cran"`
(case-insensitive). The CRAN corpus is much larger than the rOpenSci corpus,
and matching for `corpus = "cran"` will generally take notably longer.
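To illustrate what "case-insensitive" means here, the following sketch
normalises a corpus name in base R. The helper `normalise_corpus()` is
hypothetical and is not part of `pkgmatch`. It only shows one way such
validation could be written, not how the package implements it.

```r
# Hypothetical helper (not a pkgmatch function): lower-case the input and
# check it against the two corpus names given in the text above.
normalise_corpus <- function (corpus) {
    stopifnot (is.character (corpus), length (corpus) == 1L)
    match.arg (tolower (corpus), c ("ropensci", "cran"))
}

normalise_corpus ("CRAN")
normalise_corpus ("rOpenSci")
```

Any other value, such as `"bioc"`, makes `match.arg()` signal an error.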
Websites of packages returned by [the `pkgmatch_similar_pkgs()`
function](https://docs.ropensci.org/pkgmatch/reference/pkgmatch_similar_pkgs.html)
can be automatically opened, either by calling the function with `browse =
TRUE`, or by storing the return value of [the `pkgmatch_similar_pkgs()`
function](https://docs.ropensci.org/pkgmatch/reference/pkgmatch_similar_pkgs.html)
as an object and passing that to [the `pkgmatch_browse()`
function](https://docs.ropensci.org/pkgmatch/reference/pkgmatch_browse.html).
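The second approach, storing the result and browsing it afterwards, might look
like the sketch below. It is guarded so that it only runs in an interactive
session with `pkgmatch` installed. The variable name `matches` is arbitrary, and
the call still needs a running ollama instance.

```r
input <- "Packages for analysing evolutionary trees"

# Only attempt this interactively, because it opens browser pages.
if (interactive () && requireNamespace ("pkgmatch", quietly = TRUE)) {
    # Store the matches first ...
    matches <- pkgmatch::pkgmatch_similar_pkgs (input, corpus = "ropensci")
    # ... then open the websites of the matched packages.
    pkgmatch::pkgmatch_browse (matches)
}
```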
### Matching entire packages
The `input` parameter can also specify an entire package, either as a local
path to a package directory, or the name of an installed package. To
demonstrate that, the following code downloads a `.tar.gz` file of the `httr2`
package from CRAN:
```{r download-httr2}
pkg <- "httr2"
p <- available.packages () |>
    data.frame () |>
    dplyr::filter (Package == pkg)
url_base <- "https://cran.r-project.org/src/contrib/"
url <- paste0 (url_base, p$Package, "_", p$Version, ".tar.gz")
path <- fs::path (fs::path_temp (), basename (url))
download.file (url, destfile = path, quiet = TRUE)
```
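The same download can be sketched with base R only, for readers who prefer not
to load `dplyr` or `fs`. Both helper names below, `cran_tarball_url()` and
`download_cran_tarball()`, are hypothetical and introduced here for
illustration. The URL pattern is the one used in the code above.

```r
# Hypothetical helper: build the CRAN source tarball URL for a package.
cran_tarball_url <- function (pkg, version,
                              url_base = "https://cran.r-project.org/src/contrib/") {
    paste0 (url_base, pkg, "_", version, ".tar.gz")
}

# Hypothetical helper: look up the current version and download the tarball
# to the session's temporary directory. Needs network access and a
# configured CRAN mirror, so it is defined here but not called.
download_cran_tarball <- function (pkg) {
    ap <- utils::available.packages ()
    version <- ap [pkg, "Version"] # rows are named by package
    url <- cran_tarball_url (pkg, version)
    path <- file.path (tempdir (), basename (url))
    utils::download.file (url, destfile = path, quiet = TRUE)
    path
}
```

Calling `download_cran_tarball ("httr2")` should then return a local path
equivalent to the `path` object created above.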
The path to that package (in this case as a compressed tarball) can then be
passed to the
[`pkgmatch_similar_pkgs()`](https://docs.ropensci.org/pkgmatch/reference/pkgmatch_similar_pkgs.html)
function:
```{r pkgmatch-similar-cran}
pkgmatch_similar_pkgs (path, corpus = "cran")
```
The result includes the top five matches based on both text and code of the
input package. The input package itself is the second-placed match in both
cases, and not the top match. This happens because embeddings are "chunked" or
randomly permuted, and because matches are statistical and not deterministic.
Nevertheless, the only two packages which appear in the top five matches on
both lists are the package itself, `httr2`, and the very closely related
`httptest2` package for testing output of `httr2`. See the [vignette on _Why
are the results not what I
expect?_](https://docs.ropensci.org/pkgmatch/articles/F_why-are-the-results-not-what-i-expect.html)
for more detail on how matches are generated.
## Finding functions
There is an additional function to find functions within packages which best
match a text description.
```{r fns1}
input <- "A function to label a set of geographic coordinates"
pkgmatch_similar_fns (input)
```
```{r fns2}
input <- "Identify genetic sequences matching a given input fragment"
pkgmatch_similar_fns (input)
```
Setting `browse = TRUE` will then open the documentation pages corresponding to
those best-matching functions.
## Package vignettes
The `pkgmatch` package includes the following vignettes:
- [A main _pkgmatch_
vignette](https://docs.ropensci.org/pkgmatch/articles/pkgmatch.html) which
gives an overview of how to use the package.
- [_Example
applications_](https://docs.ropensci.org/pkgmatch/articles/A_extended-use-case.html)
which describes several different example applications of `pkgmatch`, and
illustrates how the results this package provides differ from those of
search engines and general language model interfaces.
- [_Before you begin: ollama
installation_](https://docs.ropensci.org/pkgmatch/articles/B_ollama.html)
which describes how to install and set up the [`ollama`](https://ollama.com)
software needed to download and run the language models.
- [_How does pkgmatch
work?_](https://docs.ropensci.org/pkgmatch/articles/C_how-does-it-work.html)
which provides detailed explanations of the matching algorithms implemented in
the package.
- [_Data caching and
updating_](https://docs.ropensci.org/pkgmatch/articles/D_data-caching-and-updating.html)
which describes how `pkgmatch` caches and updates the language model results
for the individual corpora.
- [_Why local language models
(LMs)?_](https://docs.ropensci.org/pkgmatch/articles/E_why-local-lms.html)
which explains why `pkgmatch` uses locally-running language models, instead of
relying on external APIs.
- [_Why are the results not what I
expect?_](https://docs.ropensci.org/pkgmatch/articles/F_why-are-the-results-not-what-i-expect.html)
which explains in detail why matches generated by `pkgmatch` may sometimes
differ from what you might expect, and includes advice for how to improve
matches.
## Prior Art
- The [`utils::RSiteSearch()`
function](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/RSiteSearch.html).
- The [`sos` package](https://github.com/sbgraves237/sos) that queries the
"RSiteSearch" database.
## Contributors
All contributions to this project are gratefully acknowledged using the [`allcontributors` package](https://github.com/ropensci/allcontributors) following the [allcontributors](https://allcontributors.org) specification. Contributions of any kind are welcome!
### Code
- mpadge
- Bisaloo
- MargaretSiple-NOAA
- maelle
- Selbosh
- nhejazi
- agricolamz
Owner
- Name: ropensci-review-tools
- Login: ropensci-review-tools
- Kind: organization
- Website: https://ropensci-review-tools.readthedocs.io/
- Repositories: 10
- Profile: https://github.com/ropensci-review-tools
Tools for automation of software review at rOpenSci
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"@type": "SoftwareSourceCode",
"identifier": "pkgmatch",
"description": "Find R packages matching either descriptions or other R packages.",
"name": "pkgmatch: Find R Packages Matching Either Descriptions or Other R Packages",
"relatedLink": "https://docs.ropensci.org/pkgmatch/",
"codeRepository": "https://github.com/ropensci-review-tools/pkgmatch",
"issueTracker": "https://github.com/ropensci-review-tools/pkgmatch/issues",
"license": "https://spdx.org/licenses/MIT",
"version": "0.5.0.098",
"programmingLanguage": {
"@type": "ComputerLanguage",
"name": "R",
"url": "https://r-project.org"
},
"runtimePlatform": "R version 4.5.1 (2025-06-13)",
"author": [
{
"@type": "Person",
"givenName": "Mark",
"familyName": "Padgham",
"email": "mark.padgham@email.com",
"@id": "https://orcid.org/0000-0003-2172-5265"
}
],
"contributor": [
{
"@type": "Person",
"givenName": "Davis",
"familyName": "Vaughan",
"email": "davis@posit.co"
}
],
"maintainer": [
{
"@type": "Person",
"givenName": "Mark",
"familyName": "Padgham",
"email": "mark.padgham@email.com",
"@id": "https://orcid.org/0000-0003-2172-5265"
}
],
"softwareSuggestions": [
{
"@type": "SoftwareApplication",
"identifier": "gert",
"name": "gert",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=gert"
},
{
"@type": "SoftwareApplication",
"identifier": "hms",
"name": "hms",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=hms"
},
{
"@type": "SoftwareApplication",
"identifier": "httptest2",
"name": "httptest2",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=httptest2"
},
{
"@type": "SoftwareApplication",
"identifier": "jsonlite",
"name": "jsonlite",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=jsonlite"
},
{
"@type": "SoftwareApplication",
"identifier": "pkgbuild",
"name": "pkgbuild",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=pkgbuild"
},
{
"@type": "SoftwareApplication",
"identifier": "rappdirs",
"name": "rappdirs",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rappdirs"
},
{
"@type": "SoftwareApplication",
"identifier": "roxygen2",
"name": "roxygen2",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=roxygen2"
},
{
"@type": "SoftwareApplication",
"identifier": "testthat",
"name": "testthat",
"version": ">= 3.0.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=testthat"
},
{
"@type": "SoftwareApplication",
"identifier": "withr",
"name": "withr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=withr"
},
{
"@type": "SoftwareApplication",
"identifier": "knitr",
"name": "knitr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=knitr"
},
{
"@type": "SoftwareApplication",
"identifier": "rmarkdown",
"name": "rmarkdown",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rmarkdown"
}
],
"softwareRequirements": {
"1": {
"@type": "SoftwareApplication",
"identifier": "R",
"name": "R",
"version": ">= 4.1.0"
},
"2": {
"@type": "SoftwareApplication",
"identifier": "brio",
"name": "brio",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=brio"
},
"3": {
"@type": "SoftwareApplication",
"identifier": "checkmate",
"name": "checkmate",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=checkmate"
},
"4": {
"@type": "SoftwareApplication",
"identifier": "cli",
"name": "cli",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=cli"
},
"5": {
"@type": "SoftwareApplication",
"identifier": "curl",
"name": "curl",
"version": ">= 6.0.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=curl"
},
"6": {
"@type": "SoftwareApplication",
"identifier": "dplyr",
"name": "dplyr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=dplyr"
},
"7": {
"@type": "SoftwareApplication",
"identifier": "fs",
"name": "fs",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=fs"
},
"8": {
"@type": "SoftwareApplication",
"identifier": "httr2",
"name": "httr2",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=httr2"
},
"9": {
"@type": "SoftwareApplication",
"identifier": "memoise",
"name": "memoise",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=memoise"
},
"10": {
"@type": "SoftwareApplication",
"identifier": "methods",
"name": "methods"
},
"11": {
"@type": "SoftwareApplication",
"identifier": "pbapply",
"name": "pbapply",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=pbapply"
},
"12": {
"@type": "SoftwareApplication",
"identifier": "piggyback",
"name": "piggyback",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=piggyback"
},
"13": {
"@type": "SoftwareApplication",
"identifier": "Rcpp",
"name": "Rcpp",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=Rcpp"
},
"14": {
"@type": "SoftwareApplication",
"identifier": "rvest",
"name": "rvest",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rvest"
},
"15": {
"@type": "SoftwareApplication",
"identifier": "tibble",
"name": "tibble",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=tibble"
},
"16": {
"@type": "SoftwareApplication",
"identifier": "tidyr",
"name": "tidyr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=tidyr"
},
"17": {
"@type": "SoftwareApplication",
"identifier": "tokenizers",
"name": "tokenizers",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=tokenizers"
},
"18": {
"@type": "SoftwareApplication",
"identifier": "treesitter",
"name": "treesitter",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=treesitter"
},
"19": {
"@type": "SoftwareApplication",
"identifier": "treesitter.r",
"name": "treesitter.r",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=treesitter.r"
},
"20": {
"@type": "SoftwareApplication",
"identifier": "vctrs",
"name": "vctrs",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=vctrs"
},
"SystemRequirements": {}
},
"fileSize": "3466.832KB",
"readme": "https://github.com/ropensci-review-tools/pkgmatch/blob/main/README.md",
"contIntegration": [
"https://github.com/ropensci-review-tools/pkgmatch/actions?query=workflow%3AR-CMD-check",
"https://app.codecov.io/gh/ropensci-review-tools/pkgmatch"
],
"developmentStatus": "https://www.repostatus.org/#active",
"keywords": [
"embeddings",
"llms",
"natural-language-processing",
"r"
]
}
GitHub Events
Total
- Issues event: 60
- Watch event: 4
- Delete event: 94
- Issue comment event: 119
- Push event: 188
- Pull request event: 172
- Fork event: 1
- Create event: 90
Last Year
- Issues event: 60
- Watch event: 4
- Delete event: 94
- Issue comment event: 119
- Push event: 188
- Pull request event: 172
- Fork event: 1
- Create event: 90
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 23
- Total pull requests: 65
- Average time to close issues: 9 days
- Average time to close pull requests: about 4 hours
- Total issue authors: 5
- Total pull request authors: 1
- Average comments per issue: 0.65
- Average comments per pull request: 0.43
- Merged pull requests: 53
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 23
- Pull requests: 65
- Average time to close issues: 9 days
- Average time to close pull requests: about 4 hours
- Issue authors: 5
- Pull request authors: 1
- Average comments per issue: 0.65
- Average comments per pull request: 0.43
- Merged pull requests: 53
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mpadge (34)
- Bisaloo (2)
- maelle (1)
- MargaretSiple-NOAA (1)
- Selbosh (1)
- nhejazi (1)
Pull Request Authors
- mpadge (117)
- Bisaloo (1)
Top Labels
Issue Labels
bug (2)
Pull Request Labels
Dependencies
.github/workflows/check-standard.yaml
actions
- actions/checkout v4 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v4 composite
- actions/upload-artifact v4 composite
- codecov/codecov-action v4 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.hooks/description
cran
DESCRIPTION
cran
- testthat >= 3.0.0 suggests