joyn

joyn provides a set of tools to analyze the quality of merging (i.e., joining) data frames. It is a JOY to join with joyn.

https://github.com/randrescastaneda/joyn

Keywords

join merge

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: randrescastaneda
License: other
Language: R
Default Branch: master
Homepage: https://randrescastaneda.github.io/joyn/
Size: 12.1 MB

Statistics

Stars: 9
Watchers: 1
Forks: 4
Open Issues: 3
Releases: 13

Created almost 5 years ago · Last pushed 10 months ago

README.Rmd

---
output: github_document
editor_options: 
  markdown: 
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# joyn



`r badger::badge_cran_checks("joyn")` `r badger::badge_cran_release("joyn", "orange")` `r badger::badge_devel("randrescastaneda/joyn", "blue")` `r badger::badge_codecov("randrescastaneda/joyn")` `r badger::badge_lifecycle("maturing", "green")`




`joyn` helps you assess the results of joining data frames, making it easier and more efficient to combine your tables. Similar in philosophy to the `merge` command in Stata, `joyn` checks how the key variables match and produces detailed join reports, so you can verify that a join did what you intended.

## Motivation

Merging tables in R can be tricky. Ensuring accuracy and understanding the joined data fully can be tedious tasks. That's where `joyn` comes in. Inspired by Stata's informative approach to merging, `joyn` makes the process smoother and more insightful.

While standard R merge functions are powerful, they often lack features like assessing join accuracy, detecting potential issues, and providing detailed reports. `joyn` fills this gap by offering:

* **Intuitive join handling:** Whether you're dealing with one-to-one, one-to-many, or many-to-many relationships, `joyn` helps you navigate them confidently.
* **Informative reports:** Get clear insights into the join process with helpful reports that identify duplicate observations, missing values, and potential inconsistencies.

## What makes `joyn` special?

While standard R merge functions offer basic functionality, `joyn` goes above and beyond by providing comprehensive tools and features tailored to your data joining needs:

**1. Flexibility in join types:** Choose your join type (`"left"`, `"right"`, or `"inner"`) with the `keep` argument. Unlike base R's `merge()`, which returns only matching rows unless told otherwise, `joyn` performs a full join by default, so all observations from both tables are included unless you restrict the result.

**2. Seamless variable handling:** No more wrestling with duplicate variable names! `joyn` offers multiple options:

* **Update values:** Use `update_values` or `update_NAs` to automatically update conflicting variables in the left table with values from the right table.

* **Keep both (with different names):** Enable `keep_common_vars = TRUE` to retain both variables, each with a unique suffix.

* **Selective inclusion:** Choose specific variables from the right table with `y_vars_to_keep`, ensuring you get only the data you need.
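
As a rough illustration of the options above, the sketch below joins two small tables that share a non-key variable `x`. The tables `left_tbl` and `right_tbl` are made up for this example, and the exact column names produced by `keep_common_vars = TRUE` may differ across `joyn` versions, so the code only inspects them with `names()` rather than assuming them.

```r
library(joyn)
library(data.table)

# Hypothetical tables: both contain `id` (the key) and `x` (a conflicting variable)
left_tbl  <- data.table(id = 1:3, x = c(10, NA, 30))
right_tbl <- data.table(id = 1:3,
                        x  = c(11, 22, 33),
                        z  = c("a", "b", "c"),
                        w  = c(TRUE, FALSE, TRUE))

# Default behavior: `x` from the right table is not added to the result
res_default <- joyn(left_tbl, right_tbl, by = "id", match_type = "1:1")

# Keep both versions of `x`, distinguished by a suffix
res_both <- joyn(left_tbl, right_tbl, by = "id", match_type = "1:1",
                 keep_common_vars = TRUE)

# Bring only `z` from the right table
res_z <- joyn(left_tbl, right_tbl, by = "id", match_type = "1:1",
              y_vars_to_keep = "z")

# Compare the resulting columns
names(res_default)
names(res_both)
names(res_z)
```

Comparing the three `names()` results shows which columns each option keeps.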

**3. Relationship awareness:** `joyn` recognizes one-to-one, one-to-many, many-to-one, and many-to-many relationships between tables. While it defaults to many-to-many for compatibility, **remember this is often not ideal**. **Always specify the correct relationship with the `match_type` argument** (`"1:1"`, `"1:m"`, `"m:1"`, or `"m:m"`) for accurate and meaningful results.

**4. Join success at a glance:** Get instant feedback on your join with the automatically generated reporting variable. Identify potential issues like unmatched observations or missing values to ensure data integrity and informed decision-making.

By addressing these common pain points and offering enhanced flexibility, `joyn` empowers you to confidently and effectively join your data frames, paving the way for deeper insights and data-driven success.
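
To make the reporting variable concrete, here is a minimal sketch of a left join whose report is then tabulated. The tables `surveys` and `countries` are hypothetical, and the report column is named explicitly through `reportvar` so the code does not rely on the default name, which may vary by version.

```r
library(joyn)
library(data.table)

# Hypothetical tables: several survey rows per country, one attribute row per country
surveys   <- data.table(country = c("A", "A", "B", "C"),
                        year    = c(2020, 2021, 2020, 2021))
countries <- data.table(country = c("A", "B", "D"),
                        region  = c("North", "South", "East"))

# Left join, declaring that many rows in `surveys` match at most one row in `countries`
res <- joyn(surveys, countries,
            by         = "country",
            match_type = "m:1",
            keep       = "left",
            reportvar  = "report")

# Count rows by join status: found in both tables vs. only in the left table
table(res$report)
```

Rows for country "C" have no counterpart in `countries`, so they show up in the report as coming from the left table only.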


## Performance and flexibility

### The cost of reliability

While raw speed matters, understanding your joins every step of the way is equally crucial. `joyn` prioritizes providing **insightful information** and preventing errors over raw speed. On top of the join itself, it adds:

* **Meticulous checks:** `joyn` performs comprehensive checks to ensure your join is accurate and avoids potential missteps, like unmatched observations or missing values.
* **Detailed reporting:** Get a clear picture of your join with a dedicated report, highlighting any issues you should be aware of.
* **User-friendly summary:** Quickly grasp the join's outcome with a concise overview presented in a clear table.

These features make `joyn` slightly slower than functions like `data.table::merge.data.table()` or `collapse::join()`. However, the benefits of **preventing errors and gaining insight into the join** generally outweigh the speed difference.
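
If you want to gauge the overhead on your own machine, a rough sketch like the following can help. The tables `big_x` and `big_y` and the size `n` are arbitrary choices for illustration, and a single `system.time()` run is only indicative, not a proper benchmark.

```r
library(joyn)
library(data.table)

set.seed(1)
n <- 1e5  # hypothetical size; increase it to stress-test

# Hypothetical tables with a unique integer key
big_x <- data.table(id = sample.int(n), a = rnorm(n))
big_y <- data.table(id = sample.int(n), b = rnorm(n))

# Time a full join with joyn (checks and report included)
t_joyn <- system.time(
  res_joyn <- joyn(big_x, big_y, by = "id", match_type = "1:1")
)

# Time the equivalent full join with data.table's merge method,
# called explicitly because attaching joyn also provides a `merge()` function
t_dt <- system.time(
  res_dt <- data.table::merge.data.table(big_x, big_y, by = "id", all = TRUE)
)

# Elapsed seconds for each approach
c(joyn = t_joyn[["elapsed"]], data.table = t_dt[["elapsed"]])
```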

### Know your needs, choose your tool

* **Speed is your top priority for massive datasets?** Consider using `data.table` or `collapse` directly.
* **Seek clear understanding and error prevention for your joins?** `joyn` is your trusted guide.

### Protective by design

`joyn` intentionally restricts certain actions and provides clear messages when encountering unexpected data configurations. This might seem **opinionated**, but it's designed to **protect you from accidentally creating inaccurate or misleading joins**. This "safety net" empowers you to confidently merge your data, knowing `joyn` has your back.
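
The sketch below shows what this safety net can look like in practice: a join declared as one-to-one on data where the key is duplicated. The tables `dup_x` and `uni_y` are hypothetical, and the wording of the message depends on the installed `joyn` version, so the code captures the condition rather than assuming its text.

```r
library(joyn)
library(data.table)

# Hypothetical tables: `id` is duplicated in `dup_x`, so it is not a unique key there
dup_x <- data.table(id = c(1L, 1L, 2L), x = 1:3)
uni_y <- data.table(id = c(1L, 2L),     y = c(10, 20))

# Declaring a one-to-one match is wrong for these data;
# capture the error instead of stopping the script
outcome <- tryCatch(
  joyn(dup_x, uni_y, by = "id", match_type = "1:1"),
  error = function(e) e
)

# Was the join refused, and why?
if (inherits(outcome, "error")) {
  message("joyn refused the join: ", conditionMessage(outcome))
}

# Declaring the many-to-one relationship that the data actually have
ok <- joyn(dup_x, uni_y, by = "id", match_type = "m:1")
```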

### Flexibility

Currently, `joyn` focuses on the most common and valuable join types. Future development might explore expanding its flexibility based on user needs and feedback.

## `joyn` as a wrapper: familiar syntax, familiar power

While `joyn::joyn()` offers the core functionality and Stata-inspired arguments, you might prefer a syntax more aligned with your existing workflow. `joyn` has you covered!

**Embrace base R and `data.table`:**

* `joyn::merge()`: Leverage familiar base R and `data.table` syntax for seamless integration with your existing code.

**Join with flair using `dplyr`:**

* `joyn::left_join()`, `joyn::right_join()`, `joyn::full_join()`, and `joyn::inner_join()`: Enjoy the intuitive [verb-based](https://dplyr.tidyverse.org/reference/mutate-joins.html) syntax of `dplyr` for a powerful and expressive way to perform joins.

**Dive deeper:** Explore the corresponding vignettes to unlock the full potential of these alternative interfaces and find the perfect fit for your data manipulation style.
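
As a minimal sketch of the two alternative interfaces, the example below runs the same left join through a `dplyr`-style verb and through the `merge()` wrapper. The tables `people` and `scores` are hypothetical. The `relationship` and `match_type` arguments are written as I understand them from recent `joyn` releases; check the help pages of your installed version before relying on them.

```r
library(joyn)
library(data.table)

# Hypothetical one-to-one tables
people <- data.table(id = 1:3, name  = c("Ana", "Ben", "Cy"))
scores <- data.table(id = 2:4, score = c(80, 90, 70))

# dplyr-style verb, with the relationship stated explicitly
lj <- joyn::left_join(people, scores,
                      by           = "id",
                      relationship = "one-to-one")

# base R / data.table-style arguments: `all.x = TRUE` requests a left join
mg <- joyn::merge(people, scores,
                  by         = "id",
                  all.x      = TRUE,
                  match_type = "1:1")

lj
mg
```

Both calls are namespaced with `joyn::` so they are not confused with the `dplyr` or base R functions of the same name.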


## Installation

You can install the stable version of `joyn` from
[CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("joyn")
```

You can install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("randrescastaneda/joyn")
```

## Examples

```{r example}

library(joyn)
library(data.table)

x1 = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
                t  = c(1L, 2L, 1L, 2L, NA_integer_),
                x  = 11:15)

y1 = data.table(id = c(1,2, 4),
                y  = c(11L, 15L, 16))


x2 = data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA_integer_),
                x  = c(16, 12, NA, NA, 15))


y2 = data.table(id = c(1, 2, 5, 6, 3),
                yd = c(1, 2, 5, 6, 3),
                y  = c(11L, 15L, 20L, 13L, 10L),
                x  = c(16:20))

# Use the only common variable, `id`, as key (many-to-one match)
joyn(x = x1, 
     y = y1,
     match_type = "m:1")

# keep just those observations that match
joyn(x = x1, 
     y = y1, 
     match_type = "m:1",
     keep = "inner")

# Bad merge: `by` is not specified, so all common variables (`id` and `x`) are used as keys
joyn(x = x2, 
     y = y2,
     match_type = "1:1")

# Good merge: join on `id` only, ignoring variable `x` from `y`
joyn(x = x2, 
     y = y2,
     by = "id",
     match_type = "1:1")

# Fill NAs in variable `x` of the left table with values of `x` from the right table
joyn(x = x2, 
     y = y2, 
     by = "id", 
     update_NAs = TRUE)

# Overwrite values of variable `x` in the left table with values of `x` from the right table
joyn(x = x2, 
     y = y2, 
     by = "id", 
     update_values = TRUE)


# do not bring any variable from y into x, just the report
joyn(x = x2, 
     y = y2, 
     by = "id", 
     y_vars_to_keep = NULL)

```

Owner

Name: R.Andrés Castañeda
Login: randrescastaneda
Kind: user
Location: Washington DC
Company: The World Bank

Website: https://randrescastaneda.rbind.io/
Repositories: 57
Profile: https://github.com/randrescastaneda

Economist/Data Scientist

GitHub Events

Total

Create event: 3
Release event: 1
Issues event: 3
Watch event: 2
Delete event: 1
Push event: 26
Pull request review event: 3
Pull request event: 7
Fork event: 1

Last Year

Create event: 3
Release event: 1
Issues event: 3
Watch event: 2
Delete event: 1
Push event: 26
Pull request review event: 3
Pull request event: 7
Fork event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 179
Total Committers: 1
Avg Commits per committer: 179.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
R.Andres Castaneda	a**a@w**g	179

Committer Domains (Top 20 + Academic)

worldbank.org: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 11
Total pull requests: 70
Average time to close issues: 2 months
Average time to close pull requests: 5 days
Total issue authors: 3
Total pull request authors: 5
Average comments per issue: 0.73
Average comments per pull request: 0.26
Merged pull requests: 58
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 4
Pull requests: 14
Average time to close issues: 9 days
Average time to close pull requests: 7 days
Issue authors: 2
Pull request authors: 4
Average comments per issue: 0.25
Average comments per pull request: 0.21
Merged pull requests: 12
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

randrescastaneda (9)
zander-prinsloo (1)
krlmlr (1)
SebKrantz (1)

Pull Request Authors

randrescastaneda (47)
RossanaTat (41)
zander-prinsloo (17)
krlmlr (2)
olivroy (2)

Packages

Total packages: 1
Total downloads:
- cran 381 last-month

Total dependent packages: 0
Total dependent repositories: 2
Total versions: 7
Total maintainers: 1

cran.r-project.org: joyn

Tool for Diagnosis of Tables Joins and Complementary Join Features

Homepage: https://github.com/randrescastaneda/joyn
Documentation: http://cran.r-project.org/web/packages/joyn/joyn.pdf
License: MIT + file LICENSE
Latest release: 0.2.4
published about 1 year ago

Versions: 7
Dependent Packages: 0
Dependent Repositories: 2
Downloads: 381 Last month

Rankings

Dependent repos count: 19.1%

Forks count: 21.0%

Average: 27.9%

Dependent packages count: 28.6%

Stargazers count: 30.8%

Downloads: 39.9%

Maintainers (1)

acastanedaa@worldbank.org

Last synced: 6 months ago
