Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.6%) to scientific vocabulary
Repository
Optimal transport for data recoding
Basic Info
- Host: GitHub
- Owner: otrecoding
- License: lgpl-3.0
- Language: Julia
- Default Branch: master
- Homepage: https://otrecoding.github.io/OTRecod.jl/dev
- Size: 838 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
OTRecod.jl
Valérie Garès & Jérémy Omer, 2022. "Regularized Optimal Transport of Covariates and Outcomes in Data Recoding," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 117(537), pages 320-333, January.
Abstract: When databases are constructed from heterogeneous sources, it is not unusual that different encodings are used for the same outcome. In such case, it is necessary to recode the outcome variable before merging two databases. The method proposed for the recoding is an application of optimal transportation where we search for a bijective mapping between the distributions of such variable in two databases. In this article, we build upon the work by Garés et al., where they transport the distributions of categorical outcomes assuming that they are distributed equally in the two databases. Here, we extend the scope of the model to treat all the situations where the covariates explain the outcomes similarly in the two databases. In particular, we do not require that the outcomes be distributed equally. For this, we propose a model where joint distributions of outcomes and covariates are transported. We also propose to enrich the model by relaxing the constraints on marginal distributions and adding an L1 regularization term. The performances of the models are evaluated in a simulation study, and they are applied to a real dataset.
Keywords: https://ideas.repec.org/a/taf/jnlasa/v117y2022i537p320-333.html
Installation
The package runs on Julia 1.1 and above.
In a Julia session, switch to pkg> mode to add the package:
```julia
julia> ] # switch to pkg> mode
pkg> add https://github.com/otrecoding/OTRecod.jl
```
Alternatively, you can achieve the same result using the Pkg API:
```julia
julia> import Pkg
julia> Pkg.add(url = "https://github.com/otrecoding/OTRecod.jl")
```
When finished, make sure that you are back at the Julian prompt (julia>)
and bring OTRecod into scope:
```julia
julia> using OTRecod
```
You can test the package with:
```julia
julia> ] # switch to pkg> mode
pkg> test OTRecod
```
To run an example from a dataset:
```julia
julia> using OTRecod

help?> run_directory
search: run_directory

  run_directory(path, method; outname="result.out", maxrelax=0.0, lambda_reg=0.0, nbfiles=0, norme=0, percent_closest=0.2)

  Run one given method on a given number of data files of a given directory. The data files must be the only files with extension ".txt" in the directory.

    - path: name of the directory
    - method: :group or :joint
    - maxrelax: maximum percentage of deviation from expected probability masses
    - lambda_reg: coefficient measuring the importance of the regularization term
    - nbfiles: number of files considered, 0 if all the data files are tested
    - norme: 0, 1 or 2, norm used for distances in the space of covariates
    - percent_closest: percent of closest neighbors taken in the computation of the costs (both distance and regularization related)
    - observed: if nonempty, list of indices of the observed covariates; this allows to exclude some latent variables.
```
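As a rough illustration of the call described in the help text, the sketch below first lists the `.txt` files in a directory (the help text requires that the data files be the only `.txt` files there) and then calls the function with the `:joint` method. The helpers `list_data_files` and `run_joint_example`, the output file name, and all parameter values are hypothetical and not part of OTRecod. The sketch also assumes that the function is exported as `run_directory` and accepts the keyword `lambda_reg`, as spelled in the parameter list.

```julia
# Hypothetical helper (not part of OTRecod): list the ".txt" files that
# would be treated as data files in `path`.
function list_data_files(path::AbstractString)
    return sort(filter(f -> endswith(f, ".txt"), readdir(path)))
end

# Hypothetical wrapper: check the directory, then call run_directory with the
# keyword names taken from the help text. All values are placeholders.
function run_joint_example(path::AbstractString)
    files = list_data_files(path)
    isempty(files) && error("no .txt data files found in $path")
    println("Data files: ", join(files, ", "))
    if @isdefined(OTRecod)  # only call the package if `using OTRecod` was run
        # Assumed name and keywords; adjust if the installed version differs.
        return OTRecod.run_directory(path, :joint;
                                     outname = "result_joint.out",
                                     maxrelax = 0.0,
                                     lambda_reg = 0.0,
                                     nbfiles = 1,
                                     norme = 1,
                                     percent_closest = 0.2)
    end
    return nothing
end
```

With `using OTRecod` loaded, `run_joint_example("data")` would process one file from a hypothetical `data` directory. Without the package loaded, the function only prints the data files it found.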
Copyright © 2020 Jeremy Omer jeremy.omer@insa-rennes.fr.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License version 3 as published by the Free Software Foundation. See LICENSE file.
Owner
- Name: Optimal transport to recode data variables
- Login: otrecoding
- Kind: organization
- Location: Rennes
- Repositories: 3
- Profile: https://github.com/otrecoding
Citation (CITATION.bib)
@article{doi:10.1080/01621459.2020.1775615,
author = {Valérie Garès and Jérémy Omer},
title = {Regularized Optimal Transport of Covariates and Outcomes in Data Recoding},
journal = {Journal of the American Statistical Association},
volume = {117},
number = {537},
pages = {320-333},
year = {2022},
publisher = {Taylor & Francis},
doi = {10.1080/01621459.2020.1775615},
URL = { https://doi.org/10.1080/01621459.2020.1775615 },
eprint = { https://doi.org/10.1080/01621459.2020.1775615 },
abstract = { When databases are constructed from heterogeneous sources, it is not unusual that different encodings are used for the same outcome. In such case, it is necessary to recode the outcome variable before merging two databases. The method proposed for the recoding is an application of optimal transportation where we search for a bijective mapping between the distributions of such variable in two databases. In this article, we build upon the work by Garés et al., where they transport the distributions of categorical outcomes assuming that they are distributed equally in the two databases. Here, we extend the scope of the model to treat all the situations where the covariates explain the outcomes similarly in the two databases. In particular, we do not require that the outcomes be distributed equally. For this, we propose a model where joint distributions of outcomes and covariates are transported. We also propose to enrich the model by relaxing the constraints on marginal distributions and adding an L1 regularization term. The performances of the models are evaluated in a simulation study, and they are applied to a real dataset. The code used in the computational assessment and in the simulation of test cases is publicly available on Github repository: https://github.com/otrecoding/OTRecod.jl. }
}
GitHub Events
Total
- Push event: 1
- Fork event: 1
Last Year
- Push event: 1
- Fork event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 1
- Total pull requests: 88
- Average time to close issues: N/A
- Average time to close pull requests: about 2 months
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 45
- Bot issues: 0
- Bot pull requests: 86
Past Year
- Issues: 0
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- pnavaro (1)
Pull Request Authors
- github-actions[bot] (45)
- pnavaro (1)
Dependencies
- julia-actions/setup-julia v1 composite
- actions/cache v1 composite
- actions/checkout v3 composite
- codecov/codecov-action v1 composite
- julia-actions/julia-buildpkg latest composite
- julia-actions/julia-docdeploy latest composite
- julia-actions/julia-processcoverage v1 composite
- julia-actions/julia-runtest latest composite
- julia-actions/setup-julia v1 composite