multicolumncorruption

https://github.com/ptsialis/multicolumncorruption

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found CITATION.cff file
✓ codemeta.json file: found codemeta.json file
○ .zenodo.json file
✓ DOI references: found 1 DOI reference in README
✓ Academic publication links: links to frontiersin.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (13.4%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: ptsialis
License: apache-2.0
Language: Jupyter Notebook
Default Branch: main
Size: 22.8 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created almost 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog License Citation Authors

Source code for the master's thesis: Assessing and Predicting the Optimal Imputation Method Regarding the Predictive Performance of Machine Learning Models

Objective

An optimization approach for decision making in data science based on benchmarking the impact of different data imputation methods on the predictive performance of machine learning models.
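Each benchmark run starts by corrupting clean data and then imputing it before a model is trained. The following is a minimal sketch of that first step only, assuming a single categorical column, the MCAR missingness pattern, and the mode imputation method named in the usage example further down. The function names are hypothetical and this is not the jenga API used by the repository; in the thesis, the imputed data would then be used to train a model whose predictive performance is compared across methods.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def corrupt_mcar(values: Sequence[T], fraction: float, seed: int = 0) -> List[Optional[T]]:
    """MCAR: drop each entry independently with probability `fraction` (None marks missing)."""
    rng = random.Random(seed)
    return [None if rng.random() < fraction else v for v in values]


def impute_mode(values: Sequence[Optional[T]]) -> List[T]:
    """Replace every missing entry with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column in which every value is missing")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


if __name__ == "__main__":
    clean = ["red"] * 60 + ["blue"] * 30 + ["green"] * 10
    for fraction in (0.3, 0.5):
        corrupted = corrupt_mcar(clean, fraction, seed=42)
        imputed = impute_mode(corrupted)
        missing = [i for i, v in enumerate(corrupted) if v is None]
        restored = sum(imputed[i] == clean[i] for i in missing)
        print(f"fraction={fraction}: {len(missing)} missing, {restored} restored correctly")
```

Under MAR, the probability of dropping a value would instead depend on the values of another column, which this sketch does not model.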

Disclaimer

This is a research project and is not intended for production use.

This master's thesis builds on the work of Sebastian Jäger, Arndt Allhorn, and Felix Bießmann, as described in their paper "A Benchmark for Data Imputation Methods": https://www.frontiersin.org/article/10.3389/fdata.2021.693674

Installation

Steps to set up the required conda environment:

1. Create a conda environment named Data-Imputation-Thesis:

    conda env create -f environment.yaml

2. Activate the new environment:

    conda activate Data-Imputation-Thesis

3. Install jenga:

    cd src/jenga
    python setup.py develop

4. Install data-imputation-paper:

    cd ../..
    python setup.py develop  # or `install`

It might be necessary to install the required CUDA libraries manually (the versions may differ depending on the hardware used):

    conda install -c conda-forge cudatoolkit=11.7.0
    pip install nvidia-cudnn-cu11==8.6.0.163

Set the library paths every time you activate the environment (see https://www.tensorflow.org/install/pip):

    CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
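The two shell lines that set the library paths locate the pip-installed cuDNN package and extend LD_LIBRARY_PATH. A minimal Python sketch of the same lookup, useful for checking the environment before starting experiments, assuming that nvidia-cudnn-cu11 provides the nvidia.cudnn package (as the import in the shell command implies). The helper names are hypothetical and not part of the repository.

```python
import importlib.util
import os
from pathlib import Path
from typing import Optional


def find_cudnn_dir() -> Optional[Path]:
    """Return the directory of the nvidia.cudnn package (the shell's $CUDNN_PATH), or None."""
    try:
        spec = importlib.util.find_spec("nvidia.cudnn")
    except ModuleNotFoundError:
        # The parent "nvidia" package is not installed at all.
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def ld_library_path(existing: str, conda_prefix: str, cudnn_dir: Path) -> str:
    """Append $CONDA_PREFIX/lib and $CUDNN_PATH/lib, skipping empty existing entries."""
    parts = [p for p in existing.split(os.pathsep) if p]
    parts += [str(Path(conda_prefix) / "lib"), str(cudnn_dir / "lib")]
    return os.pathsep.join(parts)


if __name__ == "__main__":
    cudnn = find_cudnn_dir()
    prefix = os.environ.get("CONDA_PREFIX")
    if cudnn is None or prefix is None:
        print("# nvidia.cudnn is not importable or no conda environment is active")
    else:
        value = ld_library_path(os.environ.get("LD_LIBRARY_PATH", ""), prefix, cudnn)
        # Print an export line for the shell; the dynamic loader generally only
        # honors LD_LIBRARY_PATH for processes started after it is set.
        print(f"export LD_LIBRARY_PATH={value}")
```

Unlike the shell version, the sketch drops empty entries, so an unset LD_LIBRARY_PATH does not produce a leading colon.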

Usage

Imputation experiments: execute run-experiment.py or run-experiment-subset.py with the required settings (explained below). The experiment name must contain `corrupted`.
Baseline experiments: execute run-experiment.py or run-experiment-subset.py with the required settings (explained below). The experiment name must not contain `corrupted`.
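The naming convention for imputation and baseline experiments can be checked before a run is launched. A minimal sketch, assuming the scripts only test whether the substring `corrupted` appears in the experiment name (the exact check inside run-experiment.py is not shown here). Both helper names are hypothetical.

```python
def is_imputation_experiment(experiment_name: str) -> bool:
    """Hypothetical helper: imputation experiments carry 'corrupted' in their name."""
    return "corrupted" in experiment_name


def check_experiment_name(experiment_name: str, imputation: bool) -> None:
    """Raise early if the name does not match the intended experiment type."""
    if imputation and not is_imputation_experiment(experiment_name):
        raise ValueError(f"imputation experiment name must contain 'corrupted': {experiment_name!r}")
    if not imputation and is_imputation_experiment(experiment_name):
        raise ValueError(f"baseline experiment name must not contain 'corrupted': {experiment_name!r}")


if __name__ == "__main__":
    check_experiment_name("knn_corrupted", imputation=True)  # passes
    check_experiment_name("baseline_run", imputation=False)  # passes
```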
Example: starting an experiment requires the dataset ID (737), the imputation method (mode), the experiment name (test_experiment), the missing fractions (0.3, 0.5), the missing patterns (MAR, MCAR), the strategies (single_single), the number of repetitions (3), and the path to the folder in which results are stored (../results):

    python run-experiment.py 737 mode test_experiment --missing-fractions 0.3,0.5 --missing-types MAR,MCAR --strategies single_single --num-repetitions 3 --base-path ../results
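When sweeping several datasets or imputation methods, the same command can be assembled programmatically. A minimal sketch, using only the positional arguments and flags from the example command and assuming that list-valued options are comma-separated as shown there. build_command is a hypothetical helper, not part of the repository.

```python
import shlex
from typing import List, Sequence


def build_command(
    dataset_id: int,
    imputation_method: str,
    experiment_name: str,
    missing_fractions: Sequence[float],
    missing_types: Sequence[str],
    strategies: Sequence[str],
    num_repetitions: int,
    base_path: str,
    script: str = "run-experiment.py",
) -> List[str]:
    """Assemble the argument list for one experiment run (hypothetical helper)."""
    return [
        "python", script,
        str(dataset_id), imputation_method, experiment_name,  # positional arguments
        "--missing-fractions", ",".join(str(f) for f in missing_fractions),
        "--missing-types", ",".join(missing_types),
        "--strategies", ",".join(strategies),
        "--num-repetitions", str(num_repetitions),
        "--base-path", base_path,
    ]


if __name__ == "__main__":
    cmd = build_command(737, "mode", "test_experiment", [0.3, 0.5], ["MAR", "MCAR"],
                        ["single_single"], 3, "../results")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) runs the experiment without shell-quoting concerns; set script="run-experiment-subset.py" for the subset variant.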

Note from Sebastian Jäger

This project has been set up using PyScaffold 3.2.2 and the dsproject extension 0.4. For details and usage information on PyScaffold see https://pyscaffold.org/.

Owner

Login: ptsialis
Kind: user

Repositories: 1
Profile: https://github.com/ptsialis

Citation (CITATION.bib)

@ARTICLE{optimal_imputation_dittrich_2023,
	AUTHOR={Dittrich, Pascal},
	TITLE={Assessing and Predicting the Optimal Imputation Method Regarding the Predictive Performance of Machine Learning Models},
	YEAR={2023},
	MONTH={May},
	URL={-},
	ABSTRACT={-}
}


Dependencies

cluster/docker/Dockerfile docker

python 3.8.8 build

.eggs/PyScaffold-3.2.3-py3.8.egg/EGG-INFO/requires.txt pypi

cookiecutter *
django *
flake8 *
pyscaffoldext-custom-extension *
pyscaffoldext-dsproject *
pyscaffoldext-markdown *
pyscaffoldext-pyproject *
pytest *
pytest-cov *
pytest-fixture-config *
pytest-shutil *
pytest-virtualenv *
pytest-xdist *
setuptools >=38.3
sphinx *

condaenv.q_vpwh2j.requirements.txt pypi

autokeras *
flake8 *
flake8-mypy *
jedi ==0.17.2
openml *
pydocstyle *
tensorflow *
typer *

environment.yaml pypi

autokeras *
flake8 *
flake8-mypy *
jedi ==0.17.2
openml *
plotly ==5.14.1
pydocstyle *
tensorflow ==2.10
tensorflow-text ==2.10
typer *

environment.yml pypi

absl-py ==2.1.0
accelerate ==0.21.0
aiohttp ==3.9.5
aiohttp-cors ==0.7.0
aiosignal ==1.3.1
annotated-types ==0.7.0
antlr4-python3-runtime ==4.9.3
astunparse ==1.6.3
async-timeout ==4.0.3
autogluon ==1.1.1
autogluon-common ==1.1.1
autogluon-core ==1.1.1
autogluon-features ==1.1.1
autogluon-multimodal ==1.1.1
autogluon-tabular ==1.1.1
autogluon-timeseries ==1.1.1
autokeras ==1.1.0
blis ==0.7.11
boto3 ==1.34.144
botocore ==1.34.144
catalogue ==2.0.10
catboost ==1.2.5
category-encoders ==2.6.3
cloudpathlib ==0.18.1
cloudpickle ==3.0.0
coloredlogs ==15.0.1
colorful ==0.5.6
confection ==0.1.5
cymem ==2.0.8
dask ==2023.5.0
datasets ==2.20.0
dill ==0.3.8
distributed ==2023.5.0
dm-tree ==0.1.8
evaluate ==0.4.2
fastai ==2.7.15
fastcore ==1.5.54
fastdownload ==0.0.7
fastprogress ==1.0.3
flake8 ==7.1.0
flake8-mypy ==17.8.0
flatbuffers ==24.3.25
frozenlist ==1.4.1
fsspec ==2024.6.1
gast ==0.4.0
gdown ==5.2.0
gluonts ==0.15.1
google-api-core ==2.19.1
google-auth ==2.30.0
google-auth-oauthlib ==0.4.6
google-pasta ==0.2.0
googleapis-common-protos ==1.63.2
grpcio ==1.64.1
h5py ==3.11.0
huggingface-hub ==0.23.4
humanfriendly ==10.0
hyperopt ==0.2.7
imagecorruptions ==1.1.2
imageio ==2.34.1
imgaug ==0.4.0
jedi ==0.17.2
jmespath ==1.0.1
keras ==2.10.0
keras-core ==0.1.5
keras-nlp ==0.6.1
keras-preprocessing ==1.1.2
keras-tuner ==1.4.7
kt-legacy ==1.0.5
langcodes ==3.4.0
language-data ==1.2.0
lazy-loader ==0.4
libclang ==18.1.1
lightgbm ==4.3.0
lightning ==2.3.3
lightning-utilities ==0.11.5
llvmlite ==0.41.1
marisa-trie ==1.2.0
markdown ==3.6
markdown-it-py ==3.0.0
mccabe ==0.7.0
mdurl ==0.1.2
minio ==7.2.7
mlforecast ==0.10.0
model-index ==0.1.11
mpmath ==1.3.0
msgpack ==1.0.8
multidict ==6.0.5
multiprocess ==0.70.16
murmurhash ==1.0.10
mypy ==1.10.0
mypy-extensions ==1.0.0
namex ==0.0.8
networkx ==3.1
nlpaug ==1.1.11
nltk ==3.8.1
nptyping ==2.4.1
numba ==0.58.1
nvidia-cublas-cu11 ==11.11.3.6
nvidia-cublas-cu12 ==12.1.3.1
nvidia-cuda-cupti-cu12 ==12.1.105
nvidia-cuda-nvrtc-cu12 ==12.1.105
nvidia-cuda-runtime-cu12 ==12.1.105
nvidia-cudnn-cu11 ==8.6.0.163
nvidia-cudnn-cu12 ==8.9.2.26
nvidia-cufft-cu12 ==11.0.2.54
nvidia-curand-cu12 ==10.3.2.106
nvidia-cusolver-cu12 ==11.4.5.107
nvidia-cusparse-cu12 ==12.1.0.106
nvidia-ml-py3 ==7.352.0
nvidia-nccl-cu12 ==2.20.5
nvidia-nvjitlink-cu12 ==12.5.82
nvidia-nvtx-cu12 ==12.1.105
oauthlib ==3.2.2
omegaconf ==2.2.3
onnx ==1.16.1
onnxruntime ==1.18.1
opencensus ==0.11.4
opencensus-context ==0.1.3
opencv-python ==4.10.0.84
opendatalab ==0.0.10
openmim ==0.3.9
openml ==0.14.2
openxlab ==0.0.11
opt-einsum ==3.3.0
optimum ==1.18.1
ordered-set ==4.1.0
orjson ==3.10.6
panda ==0.3.1
parso ==0.7.1
patsy ==0.5.6
pdf2image ==1.17.0
plotly ==5.14.1
preshed ==3.0.9
proto-plus ==1.24.0
protobuf ==3.19.6
py-spy ==0.3.14
py4j ==0.10.9.7
pyarrow ==16.1.0
pyarrow-hotfix ==0.6
pyasn1 ==0.6.0
pyasn1-modules ==0.4.0
pycodestyle ==2.12.0
pycryptodome ==3.20.0
pydantic ==2.8.2
pydantic-core ==2.20.1
pydocstyle ==6.3.0
pyflakes ==3.2.0
pytesseract ==0.3.10
python-graphviz ==0.20.3
pytorch-lightning ==2.3.3
pytorch-metric-learning ==2.3.0
pywavelets ==1.4.1
ray ==2.10.0
regex ==2024.5.15
requests-oauthlib ==2.0.0
rich ==13.7.1
rsa ==4.9
s3transfer ==0.10.2
safetensors ==0.4.3
scikit-image ==0.20.0
scikit-learn ==1.3.2
scipy ==1.9.1
sentencepiece ==0.2.0
seqeval ==1.2.2
setuptools ==70.3.0
shapely ==2.0.4
shellingham ==1.5.4
smart-open ==7.0.4
spacy ==3.7.5
spacy-legacy ==3.0.12
spacy-loggers ==1.0.5
srsly ==2.4.8
statsforecast ==1.4.0
statsmodels ==0.14.1
sympy ==1.13.0
tabulate ==0.9.0
tblib ==3.0.0
tenacity ==8.4.1
tensorboard ==2.10.0
tensorboard-data-server ==0.6.1
tensorboard-plugin-wit ==1.8.1
tensorboardx ==2.6.2.2
tensorflow ==2.10.0
tensorflow-estimator ==2.10.0
tensorflow-hub ==0.16.1
tensorflow-io-gcs-filesystem ==0.34.0
tensorflow-text ==2.10.0
termcolor ==2.4.0
text-unidecode ==1.3
tf-keras ==2.15.0
thinc ==8.2.5
tifffile ==2023.7.10
timm ==0.9.16
tokenizers ==0.15.2
toolz ==0.12.1
torch ==2.3.1
torchmetrics ==1.2.1
torchvision ==0.18.1
transformers ==4.39.3
triton ==2.3.1
typer ==0.12.3
urllib3 ==1.26.19
utilsforecast ==0.0.10
wasabi ==1.1.3
weasel ==0.4.1
werkzeug ==3.0.3
window-ops ==0.0.15
wrapt ==1.16.0
xgboost ==2.1.0
xmltodict ==0.13.0
xxhash ==3.4.1
yarl ==1.9.4

requirements_new.txt pypi

Keras-Preprocessing ==1.1.2
Markdown ==3.6
PyQt5 ==5.15.10
PyWavelets ==1.4.1
Werkzeug ==3.0.3
absl-py ==2.1.0
accelerate ==0.21.0
aiohttp ==3.9.5
aiohttp-cors ==0.7.0
aiosignal ==1.3.1
annotated-types ==0.7.0
antlr4-python3-runtime ==4.9.3
astunparse ==1.6.3
async-timeout ==4.0.3
autogluon ==1.1.1
autogluon.common ==1.1.1
autogluon.core ==1.1.1
autogluon.features ==1.1.1
autogluon.multimodal ==1.1.1
autogluon.tabular ==1.1.1
autogluon.timeseries ==1.1.1
autokeras ==1.1.0
blis ==0.7.11
boto3 ==1.34.144
botocore ==1.34.144
catalogue ==2.0.10
catboost ==1.2.5
category-encoders ==2.6.3
cloudpathlib ==0.18.1
coloredlogs ==15.0.1
colorful ==0.5.6
colorlog ==5.0.1
confection ==0.1.5
cymem ==2.0.8
datasets ==2.20.0
dill ==0.3.8
dm-tree ==0.1.8
evaluate ==0.4.2
fastai ==2.7.15
fastcore ==1.5.54
fastdownload ==0.0.7
fastprogress ==1.0.3
flake8 ==7.1.0
flake8-mypy ==17.8.0
flatbuffers ==24.3.25
frozenlist ==1.4.1
gast ==0.4.0
gdown ==5.2.0
gluonts ==0.15.1
google-api-core ==2.19.1
google-auth ==2.30.0
google-auth-oauthlib ==1.0.0
google-pasta ==0.2.0
googleapis-common-protos ==1.63.2
graphviz ==0.20.3
grpcio ==1.64.1
h5py ==3.11.0
huggingface-hub ==0.23.4
humanfriendly ==10.0
hyperopt ==0.2.7
imagecorruptions ==1.1.2
imageio ==2.34.1
imgaug ==0.4.0
jedi ==0.17.2
jmespath ==1.0.1
kaleido ==0.2.1
keras ==2.13.1
keras-core ==0.1.5
keras-nlp ==0.6.1
keras-tuner ==1.4.7
kt-legacy ==1.0.5
langcodes ==3.4.0
language_data ==1.2.0
lazy_loader ==0.4
libclang ==18.1.1
lightgbm ==4.3.0
lightning ==2.3.3
lightning-utilities ==0.11.5
llvmlite ==0.41.1
marisa-trie ==1.2.0
markdown-it-py ==3.0.0
mccabe ==0.7.0
mdurl ==0.1.2
minio ==7.2.7
mkl-service ==2.4.0
mlforecast ==0.10.0
model-index ==0.1.11
multidict ==6.0.5
multiprocess ==0.70.16
murmurhash ==1.0.10
mypy ==1.10.0
mypy-extensions ==1.0.0
namex ==0.0.8
nlpaug ==1.1.11
nltk ==3.8.1
nptyping ==2.4.1
numba ==0.58.1
nvidia-cublas-cu11 ==11.11.3.6
nvidia-cublas-cu12 ==12.1.3.1
nvidia-cuda-cupti-cu12 ==12.1.105
nvidia-cuda-nvrtc-cu12 ==12.1.105
nvidia-cuda-runtime-cu12 ==12.1.105
nvidia-cudnn-cu11 ==8.6.0.163
nvidia-cudnn-cu12 ==8.9.2.26
nvidia-cufft-cu12 ==11.0.2.54
nvidia-curand-cu12 ==10.3.2.106
nvidia-cusolver-cu12 ==11.4.5.107
nvidia-cusparse-cu12 ==12.1.0.106
nvidia-ml-py3 ==7.352.0
nvidia-nccl-cu12 ==2.20.5
nvidia-nvjitlink-cu12 ==12.5.82
nvidia-nvtx-cu12 ==12.1.105
oauthlib ==3.2.2
omegaconf ==2.2.3
onnx ==1.16.1
onnxruntime ==1.18.1
opencensus ==0.11.4
opencensus-context ==0.1.3
opencv-python ==4.10.0.84
opendatalab ==0.0.10
openmim ==0.3.9
openml ==0.14.2
openxlab ==0.0.11
opt-einsum ==3.3.0
optimum ==1.18.1
ordered-set ==4.1.0
orjson ==3.10.6
panda ==0.3.1
parso ==0.7.1
patsy ==0.5.6
pdf2image ==1.17.0
plotly ==5.14.1
ply ==3.11
preshed ==3.0.9
proto-plus ==1.24.0
protobuf ==4.25.4
py-spy ==0.3.14
py4j ==0.10.9.7
pyarrow ==16.1.0
pyarrow-hotfix ==0.6
pyasn1 ==0.6.0
pyasn1_modules ==0.4.0
pycodestyle ==2.12.0
pycryptodome ==3.20.0
pydantic ==2.8.2
pydantic_core ==2.20.1
pydocstyle ==6.3.0
pyflakes ==3.2.0
pytesseract ==0.3.10
pytorch-lightning ==2.3.3
pytorch-metric-learning ==2.3.0
ray ==2.10.0
regex ==2024.5.15
requests-oauthlib ==2.0.0
rich ==13.7.1
rsa ==4.9
s3transfer ==0.10.2
safetensors ==0.4.3
scikit-image ==0.20.0
scikit-learn ==1.3.2
scipy ==1.9.1
sentencepiece ==0.2.0
seqeval ==1.2.2
shapely ==2.0.4
shellingham ==1.5.4
smart-open ==7.0.4
spacy ==3.7.5
spacy-legacy ==3.0.12
spacy-loggers ==1.0.5
srsly ==2.4.8
statsforecast ==1.4.0
statsmodels ==0.14.1
tabulate ==0.9.0
tenacity ==8.4.1
tensorboard ==2.13.0
tensorboard-data-server ==0.7.2
tensorboard-plugin-wit ==1.8.1
tensorboardX ==2.6.2.2
tensorflow ==2.13.1
tensorflow-estimator ==2.13.0
tensorflow-hub ==0.16.1
tensorflow-io-gcs-filesystem ==0.34.0
tensorflow-text ==2.10.0
tensorrt ==10.3.0
tensorrt-cu12 ==10.3.0
tensorrt-cu12-bindings ==10.3.0
tensorrt-cu12-libs ==10.3.0
termcolor ==2.4.0
text-unidecode ==1.3
tf-keras ==2.15.0
thinc ==8.2.5
tifffile ==2023.7.10
timm ==0.9.16
tokenizers ==0.15.2
torch ==2.3.1
torchaudio ==2.3.1
torchmetrics ==1.2.1
torchvision ==0.18.1
transformers ==4.39.3
triton ==2.3.1
typer ==0.12.3
typing_extensions ==4.5.0
urllib3 ==1.26.19
utilsforecast ==0.0.10
wasabi ==1.1.3
weasel ==0.4.1
webencodings ==0.5.1
window_ops ==0.0.15
wrapt ==1.16.0
xarray ==2023.1.0
xgboost ==2.1.0
xmltodict ==0.13.0
xxhash ==3.4.1
yarl ==1.9.4
zstandard ==0.22.0

setup.py pypi

src/data_imputation_paper.egg-info/requires.txt pypi

pytest *
pytest-cov *

src/jenga/environment.yaml pypi

flake8 *
flake8-mypy *
imagecorruptions *
imgaug *
jsonpickle *
tensorflow-data-validation *

src/jenga/setup.py pypi
