joyn
joyn provides a set of tools to analyze the quality of merging (i.e., joining) data frames. It is a JOY to join with joyn
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.0%) to scientific vocabulary
Keywords
join
merge
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: randrescastaneda
- License: other
- Language: R
- Default Branch: master
- Homepage: https://randrescastaneda.github.io/joyn/
- Size: 12.1 MB
Statistics
- Stars: 9
- Watchers: 1
- Forks: 4
- Open Issues: 3
- Releases: 13
Topics
join
merge
Created almost 5 years ago · Last pushed 10 months ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
editor_options:
  markdown:
    wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# joyn
`r badger::badge_cran_checks("joyn")` `r badger::badge_cran_release("joyn", "orange")` `r badger::badge_devel("randrescastaneda/joyn", "blue")` `r badger::badge_codecov("randrescastaneda/joyn")` `r badger::badge_lifecycle("maturing", "green")`
`joyn` helps you assess the results of joining data frames, making it easier and safer to combine your tables. Similar in philosophy to the `merge` command in `Stata`, `joyn` joins on key variables and returns a detailed report of how the observations matched, so you can verify that the result is what you expected.
## Motivation
Merging tables in R can be tricky. Ensuring accuracy and understanding the joined data fully can be tedious tasks. That's where `joyn` comes in. Inspired by Stata's informative approach to merging, `joyn` makes the process smoother and more insightful.
While standard R merge functions are powerful, they often lack features like assessing join accuracy, detecting potential issues, and providing detailed reports. `joyn` fills this gap by offering:
* **Intuitive join handling:** Whether you're dealing with one-to-one, one-to-many, many-to-one, or many-to-many relationships, you declare the relationship you expect and `joyn` checks it against the data.
* **Informative reports:** Get clear insights into the join process with helpful reports that identify duplicate observations, missing values, and potential inconsistencies.
## What makes `joyn` special?
While standard R merge functions offer basic functionality, `joyn` goes above and beyond by providing comprehensive tools and features tailored to your data joining needs:
**1. Flexibility in join types:** Choose the join type ("left", "right", or "inner") with the `keep` argument. Unlike base R's `merge()`, which performs an inner join by default, `joyn` performs a full join by default so that no observation is silently dropped; `keep` lets you restrict the result when that is what you need.
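A minimal sketch of how `keep` changes the result, reusing the `x1` and `y1` tables from the Examples section (redefined here so the chunk runs on its own). The row counts in the last comment follow from the keys alone: `id` 1 and 2 appear in both tables, `id` 3 and `NA` only in `x1`, and `id` 4 only in `y1`. The join report that `joyn` prints is not shown.

```r
library(joyn)
library(data.table)

x1 <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
                 t  = c(1L, 2L, 1L, 2L, NA_integer_),
                 x  = 11:15)
y1 <- data.table(id = c(1, 2, 4),
                 y  = c(11L, 15L, 16L))

# Same key and relationship; only `keep` changes between calls
full  <- joyn(x1, y1, by = "id", match_type = "m:1")  # default: full join
left  <- joyn(x1, y1, by = "id", match_type = "m:1", keep = "left")
right <- joyn(x1, y1, by = "id", match_type = "m:1", keep = "right")
inner <- joyn(x1, y1, by = "id", match_type = "m:1", keep = "inner")

sapply(list(full = full, left = left, right = right, inner = inner), nrow)
# full 6, left 5, right 4, inner 3
```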
**2. Seamless variable handling:** No more wrestling with duplicate variable names! `joyn` offers multiple options:
* **Update values:** Use `update_NAs` to fill missing values in the left table's common variables with values from the right table, or `update_values` to replace the left table's values with those from the right table.
* **Keep both (with different names):** Enable `keep_common_vars = TRUE` to retain both variables, each with a unique suffix.
* **Selective inclusion:** Choose specific variables from the right table with `y_vars_to_keep`, ensuring you get only the data you need.
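The difference between the two update options can be sketched in base R. `update_common_var()` is a hypothetical helper, not part of `joyn`; it assumes that `update_NAs` fills only the missing values of the left-table variable, while `update_values` overwrites left-table values with right-table values except where the right table is itself missing (an assumption about how `NA`s in `y` are treated; check `?joyn::joyn`).

```r
# Hypothetical helper illustrating update semantics on already-matched rows.
# x_val and y_val hold the same variable as found in table x and in table y.
update_common_var <- function(x_val, y_val, mode = c("NAs", "values")) {
  mode <- match.arg(mode)
  if (mode == "NAs") {
    # Keep x unless it is missing; only then take the value from y
    ifelse(is.na(x_val), y_val, x_val)
  } else {
    # Take y wherever it is available; fall back to x when y is missing
    ifelse(is.na(y_val), x_val, y_val)
  }
}

x_val <- c(16, NA, 15)
y_val <- c(20, 17, NA)

update_common_var(x_val, y_val, "NAs")     # 16 17 15
update_common_var(x_val, y_val, "values")  # 20 17 15
```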
**3. Relationship awareness:** `joyn` recognizes one-to-one, one-to-many, many-to-one, and many-to-many relationships between tables. While it defaults to many-to-many for compatibility, **remember this is often not ideal**. **Always specify the correct relationship with the `match_type` argument** (and the key variables with `by`) for accurate and meaningful results.
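Declaring the relationship pays off because the data can be checked against it. A minimal base-R sketch of such a check follows; `infer_match_type()` is a hypothetical helper, not `joyn`'s internal code. It labels each table `"1"` when its key values are unique and `"m"` otherwise, so comparing the result with the `match_type` you intend flags surprises before joining.

```r
# Hypothetical helper: infer the relationship implied by the data.
# A side is "1" when its key combinations are unique and "m" otherwise.
infer_match_type <- function(x, y, by) {
  side <- function(d) if (anyDuplicated(d[by]) == 0L) "1" else "m"
  paste0(side(x), ":", side(y))
}

x1 <- data.frame(id = c(1L, 1L, 2L, 3L, NA), t = c(1L, 2L, 1L, 2L, NA))
y1 <- data.frame(id = c(1, 2, 4), y = c(11L, 15L, 16L))

infer_match_type(x1, y1, by = "id")         # "m:1": id repeats in x1
infer_match_type(x1, x1, by = c("id", "t")) # "1:1": id and t are jointly unique
```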
**4. Join success at a glance:** Get instant feedback on your join with the automatically generated reporting variable. Identify potential issues like unmatched observations or missing values to ensure data integrity and informed decision-making.
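The idea behind the reporting variable can be sketched in base R, in the spirit of Stata's `_merge`. `join_with_report()` is a hypothetical helper, not `joyn`'s implementation, and its labels (`"x"`, `"y"`, `"x & y"`) are illustrative; the name and values of `joyn`'s own reporting variable may differ.

```r
# Hypothetical helper: full join plus a Stata-like report column
join_with_report <- function(x, y, by) {
  x$.in_x <- TRUE                          # flag every row coming from x
  y$.in_y <- TRUE                          # flag every row coming from y
  out <- merge(x, y, by = by, all = TRUE)  # full join in base R
  in_x <- !is.na(out$.in_x)
  in_y <- !is.na(out$.in_y)
  out$report <- ifelse(in_x & in_y, "x & y", ifelse(in_x, "x", "y"))
  out$.in_x <- NULL                        # drop the helper flags
  out$.in_y <- NULL
  out
}

x1 <- data.frame(id = c(1L, 1L, 2L, 3L, NA), x = 11:15)
y1 <- data.frame(id = c(1, 2, 4), y = c(11L, 15L, 16L))

res <- join_with_report(x1, y1, by = "id")
table(res$report)  # x: 2 rows, x & y: 3 rows, y: 1 row
```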
By addressing these common pain points and offering enhanced flexibility, `joyn` empowers you to confidently and effectively join your data frames, paving the way for deeper insights and data-driven success.
## Performance and flexibility
### The cost of reliability
Raw speed matters, but understanding your joins every step of the way is equally crucial. `joyn` prioritizes providing **insightful information** and preventing errors over raw speed. Unlike lower-level join functions, it adds:
* **Meticulous checks:** `joyn` performs comprehensive checks to ensure your join is accurate and avoids potential missteps, like unmatched observations or missing values.
* **Detailed reporting:** Get a clear picture of your join with a dedicated report, highlighting any issues you should be aware of.
* **User-friendly summary:** Quickly grasp the join's outcome with a concise overview presented in a clear table.
These features make `joyn` somewhat slower than functions like `data.table::merge.data.table()` or `collapse::join()`. In most workflows, however, **preventing errors and gaining insight** into the join is worth the extra run time.
### Know your needs, choose your tool
* **Speed is your top priority for massive datasets?** Consider using `data.table` or `collapse` directly.
* **Seek clear understanding and error prevention for your joins?** `joyn` is your trusted guide.
### Protective by design
`joyn` intentionally restricts certain actions and provides clear messages when encountering unexpected data configurations. This might seem **opinionated**, but it's designed to **protect you from accidentally creating inaccurate or misleading joins**. This "safety net" empowers you to confidently merge your data, knowing `joyn` has your back.
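A minimal sketch of this safety net, assuming that `joyn()` stops with an error when the declared `match_type` contradicts the data. Here `id` repeats in `x1`, so a one-to-one join is impossible; the tables are trimmed versions of those in the Examples section.

```r
library(joyn)
library(data.table)

x1 <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
y1 <- data.table(id = c(1, 2, 4), y = c(11L, 15L, 16L))

# Claiming "1:1" is wrong because id is duplicated in x1;
# capture the condition instead of letting it stop the script
res <- tryCatch(
  joyn(x1, y1, by = "id", match_type = "1:1"),
  error = function(e) e
)

inherits(res, "error")  # TRUE when joyn refuses the inconsistent join
```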
### Flexibility
Currently, `joyn` focuses on the most common and valuable join types. Future development might explore expanding its flexibility based on user needs and feedback.
## `joyn` as wrapper: Familiar Syntax, Familiar Power
While `joyn::join()` offers the core functionality and Stata-inspired arguments, you might prefer a syntax more aligned with your existing workflow. `joyn` has you covered!
**Embrace base R and `data.table`:**
* `joyn::merge()`: Leverage familiar base R and `data.table` syntax for seamless integration with your existing code.
**Join with flair using `dplyr`:**
* `joyn::left_join()`, `joyn::right_join()`, `joyn::full_join()`, and `joyn::inner_join()`: Enjoy the intuitive [verb-based](https://dplyr.tidyverse.org/reference/mutate-joins.html) syntax of `dplyr` for a powerful and expressive way to perform joins.
**Dive deeper:** Explore the corresponding vignettes to unlock the full potential of these alternative interfaces and find the perfect fit for your data manipulation style.
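A minimal sketch of the `dplyr`-style interface. It assumes that `joyn::left_join()` mirrors `dplyr::left_join()`, including dplyr's `relationship` argument (`"many-to-one"` here, because `id` repeats in `x1`); check the package vignettes for the exact arguments.

```r
library(joyn)
library(data.table)

x1 <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
y1 <- data.table(id = c(1, 2, 4), y = c(11L, 15L, 16L))

# dplyr vocabulary for the relationship, joyn checks and report underneath
res <- joyn::left_join(x1, y1, by = "id", relationship = "many-to-one")

nrow(res)  # 5: every row of x1 is kept
```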
## Installation
You can install the stable version of `joyn` from
[CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("joyn")
```
You can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("randrescastaneda/joyn")
```
## Examples
```{r example}
library(joyn)
library(data.table)
x1 <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
                 t  = c(1L, 2L, 1L, 2L, NA_integer_),
                 x  = 11:15)

y1 <- data.table(id = c(1, 2, 4),
                 y  = c(11L, 15L, 16L))

x2 <- data.table(id = c(1, 4, 2, 3, NA),
                 t  = c(1L, 2L, 1L, 2L, NA_integer_),
                 x  = c(16, 12, NA, NA, 15))

y2 <- data.table(id = c(1, 2, 5, 6, 3),
                 yd = c(1, 2, 5, 6, 3),
                 y  = c(11L, 15L, 20L, 13L, 10L),
                 x  = 16:20)

# `id` is the only variable common to x1 and y1, so it is used as the key
joyn(x = x1,
     y = y1,
     match_type = "m:1")

# keep only the observations that match
joyn(x = x1,
     y = y1,
     match_type = "m:1",
     keep = "inner")

# Bad merge: without `by`, joyn uses all common variables (`id` and `x`) as keys
joyn(x = x2,
     y = y2,
     match_type = "1:1")

# Good merge: join on `id` only; the common variable `x` from y2 is not brought in
joyn(x = x2,
     y = y2,
     by = "id",
     match_type = "1:1")

# fill NAs in variable `x` of x2 with the values of `x` from y2
joyn(x = x2,
     y = y2,
     by = "id",
     update_NAs = TRUE)

# replace the values of variable `x` in x2 with the values of `x` from y2
joyn(x = x2,
     y = y2,
     by = "id",
     update_values = TRUE)

# bring no variables from y2 into x2; keep only the join report
joyn(x = x2,
     y = y2,
     by = "id",
     y_vars_to_keep = NULL)
```
Owner
- Name: R.Andrés Castañeda
- Login: randrescastaneda
- Kind: user
- Location: Washington DC
- Company: The World Bank
- Website: https://randrescastaneda.rbind.io/
- Repositories: 57
- Profile: https://github.com/randrescastaneda
Economist/Data Scientist
GitHub Events
Total
- Create event: 3
- Release event: 1
- Issues event: 3
- Watch event: 2
- Delete event: 1
- Push event: 26
- Pull request review event: 3
- Pull request event: 7
- Fork event: 1
Last Year
- Create event: 3
- Release event: 1
- Issues event: 3
- Watch event: 2
- Delete event: 1
- Push event: 26
- Pull request review event: 3
- Pull request event: 7
- Fork event: 1
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| R.Andres Castaneda | a****a@w****g | 179 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 11
- Total pull requests: 70
- Average time to close issues: 2 months
- Average time to close pull requests: 5 days
- Total issue authors: 3
- Total pull request authors: 5
- Average comments per issue: 0.73
- Average comments per pull request: 0.26
- Merged pull requests: 58
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 14
- Average time to close issues: 9 days
- Average time to close pull requests: 7 days
- Issue authors: 2
- Pull request authors: 4
- Average comments per issue: 0.25
- Average comments per pull request: 0.21
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- randrescastaneda (9)
- zander-prinsloo (1)
- krlmlr (1)
- SebKrantz (1)
Pull Request Authors
- randrescastaneda (47)
- RossanaTat (41)
- zander-prinsloo (17)
- krlmlr (2)
- olivroy (2)
Packages
- Total packages: 1
- Total downloads:
  - cran: 381 last month
- Total dependent packages: 0
- Total dependent repositories: 2
- Total versions: 7
- Total maintainers: 1
cran.r-project.org: joyn
Tool for Diagnosis of Tables Joins and Complementary Join Features
- Homepage: https://github.com/randrescastaneda/joyn
- Documentation: http://cran.r-project.org/web/packages/joyn/joyn.pdf
- License: MIT + file LICENSE
- Latest release: 0.2.4 (published about 1 year ago)
Rankings
Dependent repos count: 19.1%
Forks count: 21.0%
Average: 27.9%
Dependent packages count: 28.6%
Stargazers count: 30.8%
Downloads: 39.9%
Maintainers (1)
Last synced: 6 months ago