joyn

joyn provides a set of tools to analyze the quality of merging (i.e., joining) data frames. It is a JOY to join with joyn.

https://github.com/randrescastaneda/joyn

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary

Keywords

join merge
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 9
  • Watchers: 1
  • Forks: 4
  • Open Issues: 3
  • Releases: 13
Topics
join merge
Created almost 5 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
editor_options: 
  markdown: 
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# joyn



`r badger::badge_cran_checks("joyn")` `r badger::badge_cran_release("joyn", "orange")` `r badger::badge_devel("randrescastaneda/joyn", "blue")` `r badger::badge_codecov("randrescastaneda/joyn")` `r badger::badge_lifecycle("maturing", "green")`




`joyn` helps you assess the results of joining data frames, making it easier and more efficient to combine your tables. Similar in philosophy to the `merge` command in `Stata`, `joyn` checks how the key variables match and produces a detailed join report, so you can verify that the result is what you intended.

## Motivation

Merging tables in R can be tricky. Ensuring accuracy and understanding the joined data fully can be tedious tasks. That's where `joyn` comes in. Inspired by Stata's informative approach to merging, `joyn` makes the process smoother and more insightful.

While standard R merge functions are powerful, they often lack features like assessing join accuracy, detecting potential issues, and providing detailed reports. `joyn` fills this gap by offering:

* **Intuitive join handling:** Whether you're dealing with one-to-one, one-to-many, or many-to-many relationships, `joyn` helps you navigate them confidently.
* **Informative reports:** Get clear insights into the join process with helpful reports that identify duplicate observations, missing values, and potential inconsistencies.

## What makes `joyn` special?

While standard R merge functions offer basic functionality, `joyn` goes above and beyond by providing comprehensive tools and features tailored to your data joining needs:

**1. Flexibility in join types:** Choose your ideal join type ("left", "right", or "inner") with the `keep` argument. Unlike base R's `merge()`, which performs an inner join by default, `joyn` performs a full join by default so that all observations are included. The `keep` argument gives you full control to narrow the result.

**2. Seamless variable handling:** No more wrestling with duplicate variable names! `joyn` offers multiple options:

* **Update values:** Use `update_values` to overwrite the values of conflicting variables in the left table with values from the right table, or `update_NAs` to fill in only the missing values.

* **Keep both (with different names):** Enable `keep_common_vars = TRUE` to retain both variables, each with a unique suffix.

* **Selective inclusion:** Choose specific variables from the right table with `y_vars_to_keep`, ensuring you get only the data you need.

**3. Relationship awareness:** `joyn` recognizes one-to-one, one-to-many, many-to-one, and many-to-many relationships between tables. While it defaults to many-to-many for compatibility, **remember this is often not ideal**. **Always specify the correct relationship with the `match_type` argument** (for example, `match_type = "m:1"`) for accurate and meaningful results.

**4. Join success at a glance:** Get instant feedback on your join with the automatically generated reporting variable. Identify potential issues like unmatched observations or missing values to ensure data integrity and informed decision-making.

By addressing these common pain points and offering enhanced flexibility, `joyn` empowers you to confidently and effectively join your data frames, paving the way for deeper insights and data-driven success.
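As a rough illustration of points 1, 3, and 4, the sketch below joins two small tables and inspects the reporting variable. The tables `sales` and `regions` are hypothetical, and the report column is named explicitly through `reportvar` because its default name may differ between `joyn` versions.

```r
library(joyn)
library(data.table)

# Hypothetical toy tables: `sales` repeats ids, `regions` has unique ids
sales   <- data.table(id = c(1L, 1L, 2L, 3L), amount = c(10, 20, 30, 40))
regions <- data.table(id = c(1L, 2L, 4L), region = c("north", "south", "west"))

# Full join (the default for `keep`), declared as many-to-one,
# with the reporting variable stored in a column called `report`
full <- joyn(sales, regions,
             by         = "id",
             match_type = "m:1",
             reportvar  = "report")

# Tabulate the report column to see which rows matched
# and which came from only one of the two tables
table(full$report)

# Same join, keeping only the observations that matched
inner <- joyn(sales, regions,
              by         = "id",
              match_type = "m:1",
              keep       = "inner",
              reportvar  = "report")
```

Naming the relationship up front (`"m:1"` here) documents what you expect the keys to look like, which is the point of the relationship argument.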


## Performance and flexibility

### The cost of reliability

While raw speed is essential, understanding your joins every step of the way is equally crucial. `joyn` prioritizes providing **insightful information** and preventing errors over solely focusing on speed. Unlike other functions, it adds:

* **Meticulous checks:** `joyn` performs comprehensive checks to ensure your join is accurate and avoids potential missteps, like unmatched observations or missing values.
* **Detailed reporting:** Get a clear picture of your join with a dedicated report, highlighting any issues you should be aware of.
* **User-friendly summary:** Quickly grasp the join's outcome with a concise overview presented in a clear table.

These valuable features contribute to a slightly slower performance compared to functions like `data.table::merge.data.table()` or `collapse::join()`. However, the benefits of **preventing errors and gaining invaluable insights** far outweigh the minor speed difference.
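If you want to gauge the overhead on your own machine, a minimal timing sketch along the following lines may help. The data are simulated, the table size `n` is an arbitrary assumption, and no particular timing result is implied. `base::merge()` is called explicitly because attaching `joyn` masks `merge()`, and it dispatches to the `data.table` method for `data.table` inputs.

```r
library(joyn)
library(data.table)

set.seed(1)
n <- 1e5  # arbitrary size, adjust to resemble your own data

# Simulated tables with unique keys on both sides
x <- data.table(id = seq_len(n), x = rnorm(n))
y <- data.table(id = sample(n, n / 2), y = runif(n / 2))

# Full join with joyn, with messages silenced
t_joyn <- system.time(
  j <- joyn(x, y, by = "id", match_type = "1:1", verbose = FALSE)
)

# Equivalent full join through the data.table merge method
t_dt <- system.time(
  d <- base::merge(x, y, by = "id", all = TRUE)
)

# Compare elapsed times (in seconds) side by side
rbind(joyn = t_joyn, data.table = t_dt)[, "elapsed"]
```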

### Know your needs, choose your tool

* **Speed is your top priority for massive datasets?** Consider using `data.table` or `collapse` directly.
* **Seek clear understanding and error prevention for your joins?** `joyn` is your trusted guide.

### Protective by design

`joyn` intentionally restricts certain actions and provides clear messages when encountering unexpected data configurations. This might seem **opinionated**, but it's designed to **protect you from accidentally creating inaccurate or misleading joins**. This "safety net" empowers you to confidently merge your data, knowing `joyn` has your back.
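A small sketch of this behaviour, assuming the installed version of `joyn` raises an error when the declared `match_type` does not agree with the data (the tables are hypothetical):

```r
library(joyn)
library(data.table)

# `id` is duplicated in x, so the relationship is really many-to-one
x <- data.table(id = c(1L, 1L, 2L), x = 1:3)
y <- data.table(id = c(1L, 2L), y = c("a", "b"))

# Quick pre-check with data.table: a non-zero value means `id` is not unique in x
anyDuplicated(x, by = "id")

# Declaring the join as one-to-one contradicts the data;
# the error is caught here so the script can continue
res <- tryCatch(
  joyn(x, y, by = "id", match_type = "1:1"),
  error = function(e) {
    message("joyn refused the join: ", conditionMessage(e))
    NULL
  }
)

# Declaring the relationship correctly lets the join go through
ok <- joyn(x, y, by = "id", match_type = "m:1")
```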

### Flexibility

Currently, `joyn` focuses on the most common and valuable join types. Future development might explore expanding its flexibility based on user needs and feedback.

## `joyn` as a wrapper: familiar syntax, familiar power

While `joyn::joyn()` offers the core functionality and Stata-inspired arguments, you might prefer a syntax more aligned with your existing workflow. `joyn` has you covered!

**Embrace base R and `data.table`:**

* `joyn::merge()`: Leverage familiar base R and `data.table` syntax for seamless integration with your existing code.

**Join with flair using `dplyr`:**

* `joyn::{dplyr verbs}()`: Enjoy the intuitive [verb-based](https://dplyr.tidyverse.org/reference/mutate-joins.html) syntax of `dplyr` for a powerful and expressive way to perform joins.

**Dive deeper:** Explore the corresponding vignettes to unlock the full potential of these alternative interfaces and find the perfect fit for your data manipulation style.
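The two interfaces can be sketched as follows. This assumes that `joyn::merge()` accepts the usual `all.x` argument and that `joyn::left_join()` accepts a `relationship` argument in the style of `dplyr`. The tables are hypothetical, so check the function documentation of your installed version before relying on it.

```r
library(joyn)
library(data.table)

# Hypothetical tables with unique keys on both sides
x <- data.table(id = c(1L, 2L, 3L), x = c(10, 20, 30))
y <- data.table(id = c(1L, 2L, 4L), y = c("a", "b", "d"))

# base R / data.table style: all.x = TRUE requests a left join
m <- joyn::merge(x, y, by = "id", all.x = TRUE)

# dplyr style: the relationship is spelled out in words
l <- joyn::left_join(x, y, by = "id", relationship = "one-to-one")
```

Calling the functions with the `joyn::` prefix makes it explicit which implementation runs, since the same names also exist in base R and `dplyr`.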


## Installation

You can install the stable version of `joyn` from
[CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("joyn")
```

You can install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("randrescastaneda/joyn")
```

## Examples

```{r example}

library(joyn)
library(data.table)

x1 = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
                t  = c(1L, 2L, 1L, 2L, NA_integer_),
                x  = 11:15)

y1 = data.table(id = c(1,2, 4),
                y  = c(11L, 15L, 16))


x2 = data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA_integer_),
                x  = c(16, 12, NA, NA, 15))


y2 = data.table(id = c(1, 2, 5, 6, 3),
                yd = c(1, 2, 5, 6, 3),
                y  = c(11L, 15L, 20L, 13L, 10L),
                x  = c(16:20))

# using common variable `id` as key.
joyn(x = x1, 
     y = y1,
     match_type = "m:1")

# keep just those observations that match
joyn(x = x1, 
     y = y1, 
     match_type = "m:1",
     keep = "inner")

# Bad merge: without `by`, every common variable (`id` and `x`) is used as a key
joyn(x = x2, 
     y = y2,
     match_type = "1:1")

# good merge, ignoring variable x from y
joyn(x = x2, 
     y = y2,
     by = "id",
     match_type = "1:1")

# update NAs in var x in table x from var x in y
joyn(x = x2, 
     y = y2, 
     by = "id", 
     update_NAs = TRUE)

# update values in var x in table x from var x in y
joyn(x = x2, 
     y = y2, 
     by = "id", 
     update_values = TRUE)


# do not bring any variable from y into x, just the report
joyn(x = x2, 
     y = y2, 
     by = "id", 
     y_vars_to_keep = NULL)

```

Owner

  • Name: R.Andrés Castañeda
  • Login: randrescastaneda
  • Kind: user
  • Location: Washington DC
  • Company: The World Bank

Economist/Data Scientist

GitHub Events

Total
  • Create event: 3
  • Release event: 1
  • Issues event: 3
  • Watch event: 2
  • Delete event: 1
  • Push event: 26
  • Pull request review event: 3
  • Pull request event: 7
  • Fork event: 1
Last Year
  • Create event: 3
  • Release event: 1
  • Issues event: 3
  • Watch event: 2
  • Delete event: 1
  • Push event: 26
  • Pull request review event: 3
  • Pull request event: 7
  • Fork event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 179
  • Total Committers: 1
  • Avg Commits per committer: 179.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
R.Andres Castaneda a****a@w****g 179

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 11
  • Total pull requests: 70
  • Average time to close issues: 2 months
  • Average time to close pull requests: 5 days
  • Total issue authors: 3
  • Total pull request authors: 5
  • Average comments per issue: 0.73
  • Average comments per pull request: 0.26
  • Merged pull requests: 58
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 14
  • Average time to close issues: 9 days
  • Average time to close pull requests: 7 days
  • Issue authors: 2
  • Pull request authors: 4
  • Average comments per issue: 0.25
  • Average comments per pull request: 0.21
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • randrescastaneda (9)
  • zander-prinsloo (1)
  • krlmlr (1)
  • SebKrantz (1)
Pull Request Authors
  • randrescastaneda (47)
  • RossanaTat (41)
  • zander-prinsloo (17)
  • krlmlr (2)
  • olivroy (2)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 381 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 2
  • Total versions: 7
  • Total maintainers: 1
cran.r-project.org: joyn

Tool for Diagnosis of Tables Joins and Complementary Join Features

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 381 Last month
Rankings
Dependent repos count: 19.1%
Forks count: 21.0%
Average: 27.9%
Dependent packages count: 28.6%
Stargazers count: 30.8%
Downloads: 39.9%
Maintainers (1)
Last synced: 6 months ago