https://github.com/ai-readi/dataset-documentation-paper-code
Code associated with the data documentation paper
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 7 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: AI-READI
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 7.81 MB
Statistics
- Stars: 0
- Watchers: 5
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Code: Dataset Documentation for AI Paper
About
This is the code associated with our paper where we analyzed various dataset documentation approaches that can help with the responsible development of AI models. See this inventory for all related resources, including the paper.
Standards followed
The overall code is structured according to the FAIR-BioRS guidelines. The Python code in the various Jupyter notebooks follows the PEP8 guidelines. All the dependencies are documented in the environment.yml file.
Using the Jupyter notebooks
Prerequisites
We recommend using Anaconda to create and manage your development environment and using JupyterLab to run the notebook. All the subsequent instructions are provided assuming you are using Anaconda (Python 3 version) and JupyterLab.
Clone repo
Clone the repo or download as a zip and extract.
cd into the code folder
Open Anaconda Prompt (Windows) or the system command-line interface, then navigate to the code folder:
```sh
cd dataset-documentation-paper-code
```
Setup conda env
```sh
$ conda env create -f environment.yml
```
Set up kernel for JupyterLab
```sh
$ conda activate dataset-documentation-env
$ conda install ipykernel
$ ipython kernel install --user --name=dataset-documentation
$ conda deactivate
```
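Once the kernel is selected in a notebook, a quick sanity check can confirm that the interpreter actually lives inside the conda environment created earlier. The following is a minimal sketch, not part of the repository: the helper name `running_in_conda_env` is hypothetical, and it assumes a named conda environment whose prefix directory ends with the environment name (`dataset-documentation-env`).

```python
import sys
from pathlib import Path


def running_in_conda_env(expected_name: str, prefix: str = sys.prefix) -> bool:
    """Return True if the interpreter prefix directory is named `expected_name`.

    Assumption: a named conda environment lives at <conda root>/envs/<name>,
    so the last path component of sys.prefix equals the environment name.
    """
    return Path(prefix).name == expected_name


# Hypothetical usage in the first notebook cell: warn rather than fail.
if not running_in_conda_env("dataset-documentation-env"):
    print(f"Warning: kernel is running from {sys.prefix}")
```

If the warning appears, the notebook is probably still attached to a different kernel than "dataset-documentation".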
Setup env vars
The required environment variables are listed in the table below, along with information on how to obtain them.
| Suggested name | Value or instructions for obtaining it | Purpose |
|---|---|---|
| GITHUB_ACCESS_TOKEN | https://docs.github.com/en/rest/authentication/authenticating-to-the-rest-api | Required to run the GitHub search code in real-world-usage.ipynb |
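For illustration, a notebook could read the token from the environment and fail early with a clear message when it is missing. This is a sketch under assumptions: the helper names `get_github_token` and `make_github_client` are hypothetical, and the way real-world-usage.ipynb actually loads the token is not shown here. The client is built with PyGithub's `Auth.Token`, with the import deferred so that the token check works even before PyGithub is installed.

```python
import os


def get_github_token(var_name: str = "GITHUB_ACCESS_TOKEN") -> str:
    """Read the GitHub token from the environment, failing early if absent."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "see the table above for how to obtain a token."
        )
    return token


def make_github_client(token: str):
    """Build an authenticated PyGithub client (import deferred until needed)."""
    from github import Auth, Github  # provided by the PyGithub package

    return Github(auth=Auth.Token(token))
```

A cell would then call `make_github_client(get_github_token())` before running any search. Keeping the token in an environment variable avoids committing it to the notebook.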
Launch Jupyter lab
Launch JupyterLab and navigate to the Jupyter notebook of interest. Make sure to change the kernel to the one created above, called "dataset-documentation" (e.g., see here). If you are editing the notebook, we recommend using the JupyterLab code formatter along with the Black and isort formatters to facilitate compliance with PEP8.
Inputs/outputs
The Jupyter notebooks make use of files in the dataset associated with the paper (see here). You will need to download the dataset and add it to the inputs folder (name the dataset folder 'dataset' after downloading it).
Outputs of the code include plots and tables displayed in the notebook but also saved as files. These saved plot files are included in the outputs folder.
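The expected folder layout can be checked before running a notebook. The sketch below is illustrative only: the helper name `resolve_io_dirs` is hypothetical, and it assumes the `inputs` and `outputs` folders sit directly under the repository root, as the description above suggests.

```python
from pathlib import Path


def resolve_io_dirs(repo_root: str = ".") -> tuple[Path, Path]:
    """Return (dataset_dir, outputs_dir), checking the layout described above."""
    root = Path(repo_root)
    dataset_dir = root / "inputs" / "dataset"
    outputs_dir = root / "outputs"
    if not dataset_dir.is_dir():
        raise FileNotFoundError(
            f"Expected the downloaded dataset at {dataset_dir}; "
            "download it and rename the folder to 'dataset'."
        )
    # The outputs folder ships with the repository; create it only if missing.
    outputs_dir.mkdir(exist_ok=True)
    return dataset_dir, outputs_dir
```

Calling `resolve_io_dirs()` from the repository root raises a descriptive error if the dataset has not been placed in the inputs folder yet.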
License
This work is licensed under MIT. See LICENSE for more information.
Feedback and contribution
Use GitHub issues to submit feedback or make suggestions. You can also fork the repository and submit a pull request with suggestions.
How to cite
If you use this code, please cite the related paper (it will be listed here when available) and also cite this repository as:
```bash
Simpkins, Kyongmi, Patel, Bhavesh. Code: Dataset Documentation for AI Paper [Software]. Zenodo. https://doi.org/10.5281/zenodo.14583673
```
Owner
- Name: AI-READI
- Login: AI-READI
- Kind: organization
- Location: United States of America
- Repositories: 22
- Profile: https://github.com/AI-READI
Organization of the AI-READI data generation project of the NIH-funded Bridge2AI Program
GitHub Events
Total
- Push event: 21
- Pull request event: 2
Last Year
- Push event: 21
- Pull request event: 2
Dependencies
- PyGithub ==2.5.0
- PyPDF2 ==3.0.1
- matplotlib ==3.10.0
- pandas ==2.2.3
- seaborn ==0.13.2
- sentence-transformers ==3.3.1
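To confirm that an environment matches these pins, the installed versions can be compared against the list with the standard library's `importlib.metadata`. This is a minimal sketch and not part of the repository: the `PINNED` dictionary simply restates the list above, and the helper name `find_mismatches` is hypothetical.

```python
from importlib import metadata

# Pinned versions, copied from the dependency list above.
PINNED = {
    "PyGithub": "2.5.0",
    "PyPDF2": "3.0.1",
    "matplotlib": "3.10.0",
    "pandas": "2.2.3",
    "seaborn": "0.13.2",
    "sentence-transformers": "3.3.1",
}


def find_mismatches(pins, get_version=metadata.version):
    """Map each package to (pinned, installed) where the two differ.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Running `find_mismatches(PINNED)` inside the activated environment returns an empty dictionary when every listed package is installed at its pinned version.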