paper_2023_ci_for_ge
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: slds-lmu
- Language: HTML
- Default Branch: main
- Size: 459 MB
Statistics
- Stars: 1
- Watchers: 7
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Constructing CIs for 'the' Generalization Error
This is the code to reproduce the experiments from the paper "Constructing CIs for 'the' Generalization Error".
The content of this repository is:
- ./analysis/ contains the code to process the results and to reproduce all the figures.
- ./data/ contains some input data and is the folder where generated datasets are stored.
- ./datamodels/ contains the code related to the generation of the datasets.
- ./experiments/ contains the code for the main experiments to investigate the inference methods.
- ./figures/ is the folder where the figures are stored. Only the figures from the main paper are stored here, some additional l…
- ./inferGE/ is the R package that implements the confidence interval methods that are being compared. This is research code. If you want to use the well-performing inference methods in R, use this repository: https://github.com/mlr-org/mlr3inferr.
- ./renv/ and renv.lock are for the reproducible R environment.
- ./results/ contains the final results (such as figures, tables) included in the paper.
Reproducibility
The instructions to reproduce the experiments are separated into:
- The dataset generation, see ./datamodels/README.md. Note that the resulting datasets are also made available on OpenML, so the main experiments can be reproduced without this step. Note: The code and instructions to reproduce the results are still being cleaned up.
- The main experiments, see ./experiments/README.md. Note: The code and instructions to reproduce the results are still being cleaned up.
- To recreate the preprocessing and the figures from the paper, you first need to download the additional material from Zenodo: https://zenodo.org/records/13744382. Specifically, move the content of the results data into the ./results subdirectory of this repository.
Downloading the datasets
Because of the large size of the benchmark datasets, it is important to download them in parquet format. However, the download might still fail. In this case, simply retry until it works.
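The "retry until it works" advice can be automated with a small wrapper. The following is a minimal sketch in Python, not part of this repository: the `retry` helper and its parameters are hypothetical names, and the actual download call is left to the caller because the README does not specify it.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(download: Callable[[], T], max_attempts: int = 5,
          wait_seconds: float = 2.0) -> T:
    """Call `download` until it succeeds or `max_attempts` is reached.

    `download` is a hypothetical zero-argument callable that performs one
    download attempt and raises an exception on failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return download()
        except Exception as err:
            # Caught broadly on purpose: the README does not document
            # which error a failed download produces.
            if attempt == max_attempts:
                raise  # give up and surface the last error
            print(f"Attempt {attempt} failed ({err}); retrying")
            time.sleep(wait_seconds)
    raise AssertionError("unreachable")
```

Bounding the number of attempts avoids looping forever when the failure is permanent, for example a wrong dataset identifier rather than a flaky connection.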
Extending the experiment
In order to evaluate new inference methods, the following steps need to be followed:
- In case the inference method requires a new resampling method that is not yet implemented in mlr3, you need to implement a new Resampling class, e.g. by adding it to the inferGE R package in the folder with the same name. For an example, see ResamplingNestedCV.
- Implement the inference method itself, e.g. in the inferGE package. As an example, see the infer_bates.R file, which uses the resample result created by ResamplingNestedCV.
- Add the resampling method to the experiment definition from ./experiments/resample.
- Add the inference method to the definitions from ./experiments/ci.R.
- Run the resample experiment and then the CIs.
Don't hesitate to contact us if you want to reuse this code!
Converting the Files to another format
If you don't want to work in R but still want to use the results, e.g. from Python, you can convert them as follows:
- Start the R interpreter.
- Read in the relevant .rds file using readRDS(<path>).
- Write the data, e.g. to CSV, using the write.csv function.
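Once a result has been exported this way, it can be read from Python. The sketch below uses only the standard library and rests on several assumptions: the file was written with the defaults of write.csv (which add an unnamed row-name column and write missing values as NA), it is UTF-8 encoded, and `read_r_csv` is a hypothetical helper name rather than anything shipped with this repository.

```python
import csv
from pathlib import Path


def read_r_csv(path):
    """Read a CSV written by R's write.csv into a list of dicts.

    All values are returned as strings, except that the literal NA
    is mapped to None. This cannot tell a missing value apart from a
    genuine string "NA", which is acceptable for a quick look.
    """
    rows = []
    with Path(path).open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # write.csv writes row names into a column with an empty
            # header by default; drop it if present.
            row.pop("", None)
            rows.append({key: (None if value == "NA" else value)
                         for key, value in row.items()})
    return rows
```

Passing row.names = FALSE to write.csv on the R side avoids the extra column altogether. Numeric columns still arrive as strings here and need an explicit conversion such as float(...).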
Owner
- Name: slds-lmu
- Login: slds-lmu
- Kind: organization
- Repositories: 34
- Profile: https://github.com/slds-lmu
GitHub Events
Total
- Issues event: 6
- Watch event: 1
- Push event: 20
- Pull request event: 2
Last Year
- Issues event: 6
- Watch event: 1
- Push event: 20
- Pull request event: 2
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 4
- Total pull requests: 2
- Average time to close issues: 3 months
- Average time to close pull requests: about 2 months
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 2
- Average time to close issues: 3 months
- Average time to close pull requests: about 2 months
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sebffischer (106)
Pull Request Authors
- sebffischer (6)
Dependencies
- R >= 4.0.0 depends
- mlr3 * depends
- R6 * imports
- data.table * imports
- testthat >= 3.0.0 suggests
- accelerate ==0.21.0
- aiohttp ==3.8.4
- aiosignal ==1.3.1
- async-timeout ==4.0.2
- attrs ==23.1.0
- be-great ==0.0.5
- certifi ==2023.5.7
- charset-normalizer ==3.2.0
- cmake ==3.26.4
- datasets ==2.13.1
- dill ==0.3.6
- filelock ==3.12.2
- frozenlist ==1.4.0
- fsspec ==2023.6.0
- huggingface-hub ==0.16.4
- idna ==3.4
- jinja2 ==3.1.2
- joblib ==1.3.1
- lit ==16.0.6
- markupsafe ==2.1.3
- mpmath ==1.3.0
- multidict ==6.0.4
- multiprocess ==0.70.14
- networkx ==3.1
- numpy ==1.25.1
- nvidia-cublas-cu11 ==11.10.3.66
- nvidia-cuda-cupti-cu11 ==11.7.101
- nvidia-cuda-nvrtc-cu11 ==11.7.99
- nvidia-cuda-runtime-cu11 ==11.7.99
- nvidia-cudnn-cu11 ==8.5.0.96
- nvidia-cufft-cu11 ==10.9.0.58
- nvidia-curand-cu11 ==10.2.10.91
- nvidia-cusolver-cu11 ==11.4.0.1
- nvidia-cusparse-cu11 ==11.7.4.91
- nvidia-nccl-cu11 ==2.14.3
- nvidia-nvtx-cu11 ==11.7.91
- packaging ==23.1
- pandas ==2.0.3
- peft ==0.4.0
- psutil ==5.9.5
- pyarrow ==12.0.1
- pyprojroot ==0.3.0
- python-dateutil ==2.8.2
- pytz ==2023.3
- pyyaml ==6.0.1
- regex ==2023.6.3
- requests ==2.31.0
- safetensors ==0.3.1
- scikit-learn ==1.3.0
- scipy ==1.11.1
- six ==1.16.0
- sympy ==1.12
- threadpoolctl ==3.2.0
- tokenizers ==0.13.3
- torch ==2.0.1
- tqdm ==4.65.0
- transformers ==4.27.3
- triton ==2.0.0
- typing-extensions ==4.7.1
- tzdata ==2023.3
- urllib3 ==2.0.3
- xxhash ==3.2.0
- yarl ==1.9.2