https://github.com/hauselin/docdata
R package to generate dataset documentation semi-automatically https://hauselin.github.io/docdata/
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.0%) to scientific vocabulary
Keywords
data-docs
data-management
data-sharing
documentation
documentation-tool
open-science
Repository
Basic Info
- Host: GitHub
- Owner: hauselin
- License: other
- Language: R
- Default Branch: master
- Homepage: https://hauselin.github.io/docdata/
- Size: 271 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created about 6 years ago
· Last pushed about 6 years ago
Metadata Files
Readme
License
README.Rmd
---
output: github_document
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures",
out.width = "100%"
)
```
# docdata
docdata is an R package that **generates documentation for datasets semi-automatically**. It streamlines recording the what, who, when, where, how, and why of a dataset, and it **standardizes documentation** across datasets.
Ideally, every dataset (e.g., csv/txt file) with tabular data should have a corresponding documentation file that describes the rows and columns of that dataset and other information about the dataset. `docdata` helps you accomplish all that.
`docdata` aims to make data documentation and sharing easier. It helps you avoid being **that** person who shares data that no one else can use because nothing was documented.
[Travis CI build status](https://travis-ci.org/hauselin/docdata)
[AppVeyor build status](https://ci.appveyor.com/project/hauselin/docdata)
## Examples
Below are examples of documentation generated by `docdata`:
* [Data from experimental research](https://github.com/hauselin/depletion_bayes/tree/master/Data)
* Cognitive task data in [GitHub repository](https://github.com/hauselin/depletion_bayes/blob/master/Data/stroop_single_trial.md) and as a [raw markdown file](https://raw.githubusercontent.com/hauselin/depletion_bayes/master/Data/stroop_single_trial.md)
## Installation
To install the package, type the following commands into the R console:
``` r
# install.packages("devtools")
devtools::install_github("hauselin/docdata") # you might have to install devtools first (see above)
```
## How to use docdata?
**Step 1: use `doc_data()` to generate a documentation file (markdown)**
* Example: `doc_data("mtcars.csv")` (assuming `mtcars.csv` is a dataset in your working directory.)
**Step 2: use `disp_doc()` to print the doc in your console**
* Example: `disp_doc("mtcars.csv")` or `disp_doc("mtcars.md")`
**Step 3: use `doc_open()` to open the doc to edit it**
* Example: `doc_open("mtcars.csv")` or `doc_open("mtcars.md")`
**Step 4: use `doc_refresh()` to refresh/update your documentation**
* Example: `doc_refresh("mtcars.csv")` or `doc_refresh("mtcars.md")`
**Step 5: share your dataset and documentation file with others or your future self(!)**
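The steps above can be tried end to end with a sketch like the one below. It assumes `docdata` is installed and that its functions accept a file path the same way they accept a file name in the examples above. Writing four columns of the built-in `mtcars` data to a temporary directory is only an illustrative way to get a small example file.

```r
# Create a small example dataset in a temporary directory (illustrative setup only).
csv_path <- file.path(tempdir(), "mtcars.csv")
write.csv(mtcars[, c("mpg", "cyl", "disp", "hp")], csv_path, row.names = FALSE)

# The docdata calls mirror the steps above; they run only if the package is installed.
# Passing a full path instead of a bare file name is an assumption.
if (requireNamespace("docdata", quietly = TRUE)) {
  docdata::doc_data(csv_path)     # Step 1: generate the documentation file
  docdata::disp_doc(csv_path)     # Step 2: print the documentation in the console
  # docdata::doc_open(csv_path)   # Step 3: open the documentation for editing (interactive)
  docdata::doc_refresh(csv_path)  # Step 4: tidy and update the documentation after editing
}
```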
### Step 1: `doc_data()`
`doc_data()` generates a markdown file that looks like the one shown below. If your dataset is `mtcars.csv`, the markdown file will be named `mtcars.md` and will be located in the same directory as `mtcars.csv`.
Example usage: `doc_data("mtcars.csv")` (assuming `mtcars.csv` is a dataset in your working directory.)
```
A GitHub flavored Markdown textfile documenting a dataset.
Generated using [docdata package](https://hauselin.github.io/docdata/) on 2019-12-08 18:16:46.
To cite this package, type citations("docdata") in console.
## Data source
mtcars.csv
## About this file
* What (is the data):
* Who (generated this documentation):
* Who (collected the data):
* When (was the data collected):
* Where (was the data collected):
* How (was the data collected):
* Why (was the data collected):
## Additional information
* Contact: XXX@XXX.com
* Registration: https://osf.io
## Columns
* Rows: 32
* Columns: 4
| Column  | Type     | Description |
| ------- | -------- | ----------- |
| mpg     | numeric  |             |
| cyl     | numeric  |             |
| disp    | numeric  |             |
| hp      | numeric  |             |
End of documentation.
```
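To make the Columns section of the generated file more concrete, here is a minimal base R sketch of how such a table could be built from a data frame. The helper `sketch_column_table()` is hypothetical and is not part of `docdata`; the package's actual implementation and column widths may differ.

```r
# Hypothetical helper (not part of docdata): build a markdown column table
# with empty descriptions from a data frame.
sketch_column_table <- function(df) {
  # One row per column: name, first class of the column, empty description.
  cells <- data.frame(
    Column = names(df),
    Type = vapply(df, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
    Description = "",
    stringsAsFactors = FALSE
  )
  # Width of each table column: the longest of the header and its cells.
  widths <- vapply(
    names(cells),
    function(h) max(nchar(c(h, cells[[h]]))),
    integer(1)
  )
  # Left-justify every cell to its column width so the pipes line up.
  pad_row <- function(values) {
    padded <- mapply(function(v, w) formatC(v, width = w, flag = "-"), values, widths)
    paste0("| ", paste(padded, collapse = " | "), " |")
  }
  c(
    pad_row(names(cells)),
    pad_row(strrep("-", widths)),
    apply(cells, 1, pad_row)
  )
}

# Usage: the first four columns of the built-in mtcars data
column_table <- sketch_column_table(mtcars[, c("mpg", "cyl", "disp", "hp")])
cat(column_table, sep = "\n")
```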
### Step 2: `disp_doc()`
`disp_doc()` prints the documentation in your console. An example (truncated) output is shown below.
Example usage: `disp_doc("mtcars.csv")` or `disp_doc("mtcars.md")`
```
--- DOCUMENTATION BEGIN ---
1 A GitHub flavored Markdown textfile documenting a dataset.
2
3 Generated using docdata package on 2019-12-08 12:50:50.
4 To cite this package, type citations("docdata") in console.
5
6 ## Data source
7
8 mtcars.csv
9
10 ## About this file
...
--- DOCUMENTATION END ---
```
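The numbered console output above can be imitated in a few lines of base R. The helper `sketch_display()` below is hypothetical and only illustrates the idea of printing a file with line numbers between begin and end markers; it is not the code behind `disp_doc()`.

```r
# Hypothetical helper (not docdata's implementation): print a text file with
# line numbers between begin/end markers and return the printed lines invisibly.
sketch_display <- function(path) {
  lines <- readLines(path, warn = FALSE)
  numbered <- paste(seq_along(lines), lines)
  out <- c("--- DOCUMENTATION BEGIN ---", numbered, "--- DOCUMENTATION END ---")
  cat(out, sep = "\n")
  invisible(out)
}

# Usage with a throwaway markdown file
doc_path <- tempfile(fileext = ".md")
writeLines(c("## Data source", "", "mtcars.csv"), doc_path)
shown <- sketch_display(doc_path)
```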
### Step 3: `doc_open()`
`doc_open()` opens the documentation in R or RStudio so you can edit it and fill in the details.
Example usage: `doc_open("mtcars.csv")` or `doc_open("mtcars.md")`
### Step 4: `doc_refresh()`
If your documentation looks messy after you've edited it (especially if the description column is no longer aligned), run `doc_refresh()` to clean it up. If the columns or rows of your dataset have changed since the documentation was last generated, run `doc_refresh()` as well: it merges your previous documentation with a refreshed version, so existing descriptions are kept.
Example usage: `doc_refresh("mtcars.csv")` or `doc_refresh("mtcars.md")`
* Before (messy)
```
| Column | Type | Description |
| ------- | -------- | --------------------- |
| mpg | numeric | miles per gallon |
| cyl | numeric | number of cylinders |
| disp | numeric | displacement (cu.in.) |
| fakecolumn | numeric | non-existent column |
```
* After running `doc_refresh()`: spacing is cleaned up, columns that no longer exist in the dataset are removed, and new columns are added
```
| Column  | Type     | Description            |
| ------- | -------- | ---------------------- |
| mpg     | numeric  | miles per gallon       |
| cyl     | numeric  | number of cylinders    |
| disp    | numeric  | displacement (cu.in.)  |
| hp      | numeric  |                        |
| drat    | numeric  |                        |
```
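The before/after tables illustrate a merge: descriptions are kept for columns that still exist, stale columns are dropped, and new columns get an empty description. A minimal sketch of that logic in base R follows. The helper `sketch_merge_descriptions()` is hypothetical and is not how `doc_refresh()` is necessarily implemented.

```r
# Hypothetical helper (not docdata's implementation): carry over existing
# descriptions for columns still in the data, drop stale ones, add new ones.
sketch_merge_descriptions <- function(old_docs, df) {
  current <- names(df)
  # Look up each current column in the old documentation (NA if it is new).
  descriptions <- old_docs$Description[match(current, old_docs$Column)]
  descriptions[is.na(descriptions)] <- ""
  data.frame(
    Column = current,
    Type = vapply(df, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
    Description = descriptions,
    stringsAsFactors = FALSE
  )
}

# Usage: the "before" table above, merged with a dataset that has gained
# hp and drat and never had fakecolumn
old_docs <- data.frame(
  Column = c("mpg", "cyl", "disp", "fakecolumn"),
  Description = c("miles per gallon", "number of cylinders",
                  "displacement (cu.in.)", "non-existent column"),
  stringsAsFactors = FALSE
)
merged <- sketch_merge_descriptions(
  old_docs,
  mtcars[, c("mpg", "cyl", "disp", "hp", "drat")]
)
print(merged)
```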
### Step 5: Share your dataset + documentation
Owner
- Name: Hause Lin
- Login: hauselin
- Kind: user
- Website: hauselin.com
- Twitter: hauselin
- Repositories: 182
- Profile: https://github.com/hauselin
Researcher at MIT & World Bank
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0