https://github.com/hauselin/docdata

R package to generate dataset documentation semi-automatically https://hauselin.github.io/docdata/

Keywords

data-docs data-management data-sharing documentation documentation-tool open-science

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 6 years ago · Last pushed about 6 years ago

README.Rmd

---
output: github_document
---



```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures",
  out.width = "100%"
)
```

# docdata

docdata is an R package that **generates documentation for datasets semi-automatically**. It streamlines the process of recording what a dataset contains and who collected it, when, where, how, and why. It also **standardizes documentation**.

Ideally, every dataset (e.g., csv/txt file) with tabular data should have a corresponding documentation file that describes the rows and columns of that dataset and other information about the dataset. `docdata` helps you accomplish all that.

`docdata` aims to make data documentation and sharing easier. It helps you avoid being **that** person who shares data that no one else can use because nothing was documented.


[![Travis build status](https://travis-ci.org/hauselin/docdata.svg?branch=master)](https://travis-ci.org/hauselin/docdata)
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/hauselin/docdata?branch=master&svg=true)](https://ci.appveyor.com/project/hauselin/docdata)


## Examples

Below are examples of documentation generated by `docdata`:

* [Data from experimental research](https://github.com/hauselin/depletion_bayes/tree/master/Data)
* Cognitive task data in [GitHub repository](https://github.com/hauselin/depletion_bayes/blob/master/Data/stroop_single_trial.md) and as a [raw markdown file](https://raw.githubusercontent.com/hauselin/depletion_bayes/master/Data/stroop_single_trial.md)

## Installation

To install the package, type the following commands into the R console:

``` r
# install.packages("devtools")
devtools::install_github("hauselin/docdata") # you might have to install devtools first (see above)
```

## How to use docdata?

**Step 1: use `doc_data()` to generate a documentation file (Markdown)**

* Example: `doc_data("mtcars.csv")` (assuming `mtcars.csv` is a dataset in your working directory.)

**Step 2: use `disp_doc()` to print the doc in your console**

* Example: `disp_doc("mtcars.csv")` or `disp_doc("mtcars.md")`

**Step 3: use `doc_open()` to open the doc to edit it**

* Example: `doc_open("mtcars.csv")` or `doc_open("mtcars.md")`

**Step 4: use `doc_refresh()` to refresh/update your documentation**

* Example: `doc_refresh("mtcars.csv")` or `doc_refresh("mtcars.md")`

**Step 5: share your dataset and documentation file with others or your future self(!)**
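
The steps above can be run end to end as in the minimal sketch below. It writes the built-in `mtcars` data to `mtcars.csv` in a temporary working directory so that it stays self-contained. The temporary directory, the `requireNamespace()` guard, and the `interactive()` check are assumptions added for illustration; the `docdata` calls follow the usage shown in this README.

``` r
# Work in a temporary directory so the sketch does not touch your own files
# (an assumption made for illustration).
data_dir <- tempfile("docdata_demo_")
dir.create(data_dir)
old_wd <- setwd(data_dir)

# Save the built-in mtcars data as a CSV file in the working directory.
write.csv(mtcars, "mtcars.csv", row.names = FALSE)

# Only call docdata if it is installed, so the sketch also runs without it.
if (requireNamespace("docdata", quietly = TRUE)) {
  docdata::doc_data("mtcars.csv")     # Step 1: generate mtcars.md
  docdata::disp_doc("mtcars.csv")     # Step 2: print the doc in the console
  if (interactive()) {
    docdata::doc_open("mtcars.csv")   # Step 3: open the doc for editing
  }
  docdata::doc_refresh("mtcars.csv")  # Step 4: refresh/update the doc
}

# Restore the original working directory.
setwd(old_wd)
```

Step 5 is then a matter of sharing `mtcars.csv` together with `mtcars.md`.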

### Step 1: `doc_data()`

`doc_data()` generates a markdown file that looks like the one shown below. If your dataset is `mtcars.csv`, the markdown file will be named `mtcars.md` and will be located in the same directory as `mtcars.csv`.

Example usage: `doc_data("mtcars.csv")` (assuming `mtcars.csv` is a dataset in your working directory.)

```
A GitHub flavored Markdown textfile documenting a dataset.

Generated using [docdata package](https://hauselin.github.io/docdata/) on 2019-12-08 18:16:46.
To cite this package, type citations("docdata") in console.

## Data source

mtcars.csv

## About this file

* What (is the data): 
* Who (generated this documentation): 
* Who (collected the data):
* When (was the data collected): 
* Where (was the data collected):
* How (was the data collected):
* Why (was the data collected): 

## Additional information

* Contact: XXX@XXX.com
* Registration: https://osf.io

## Columns

* Rows: 32
* Columns: 4

| Column  | Type     | Description |
| ------- | -------- | ----------- |
| mpg     | numeric  |             |
| cyl     | numeric  |             |
| disp    | numeric  |             |
| hp      | numeric  |             |

End of documentation.

```
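
To make the Columns table in the template more concrete, here is a minimal base-R sketch of how such a table could be built from a data frame. `column_table()` is a hypothetical helper written for this illustration, not a `docdata` function, and it is not meant to describe how the package is implemented.

``` r
# Hypothetical helper (not part of docdata): build a Markdown table listing
# each column of a data frame with its type and an empty description.
column_table <- function(df) {
  # Use the first class of each column as its type, e.g. "numeric".
  types <- unname(vapply(df, function(x) class(x)[1], character(1)))
  # Left-align each cell and pad it to the width of the header cells.
  rows <- sprintf("| %-7s | %-8s | %-11s |", names(df), types, "")
  c("| Column  | Type     | Description |",
    "| ------- | -------- | ----------- |",
    rows)
}

# Print the table for the four columns used in the example above.
writeLines(column_table(mtcars[, c("mpg", "cyl", "disp", "hp")]))
```

The fixed widths match the example template; column names longer than seven characters would still print, but the table would no longer be aligned.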

### Step 2: `disp_doc()`

`disp_doc()` prints the documentation in your console. An example (truncated) output is shown below.

Example usage: `disp_doc("mtcars.csv")` or `disp_doc("mtcars.md")`

```
--- DOCUMENTATION BEGIN ---
    1     A GitHub flavored Markdown textfile documenting a dataset.
    2     
    3     Generated using docdata package on 2019-12-08 12:50:50.
    4     To cite this package, type citations("docdata") in console.
    5     
    6     ## Data source
    7     
    8     mtcars.csv
    9     
   10     ## About this file
   ...
--- DOCUMENTATION END ---
```
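
The numbered listing above can be approximated in base R as follows. `number_lines()` is a hypothetical helper written for this illustration and is not part of `docdata`; the temporary file and its contents are likewise assumptions.

``` r
# Hypothetical helper (not part of docdata): read a text file and return its
# lines prefixed with line numbers, wrapped in begin/end markers.
number_lines <- function(path) {
  lines <- readLines(path)
  c("--- DOCUMENTATION BEGIN ---",
    sprintf("%5d     %s", seq_along(lines), lines),
    "--- DOCUMENTATION END ---")
}

# Write a small stand-in documentation file and print it with line numbers.
doc_path <- tempfile(fileext = ".md")
writeLines(c("## Data source", "", "mtcars.csv"), doc_path)
writeLines(number_lines(doc_path))
```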

### Step 3: `doc_open()`

`doc_open()` opens the documentation in R or RStudio so you can edit it and fill in the details.

Example usage: `doc_open("mtcars.csv")` or `doc_open("mtcars.md")`

### Step 4: `doc_refresh()`

If your documentation looks messy after editing (for example, the Description column is no longer aligned), run `doc_refresh()` to clean it up. If the rows or columns of your dataset have changed since the documentation was generated, run `doc_refresh()` as well: it merges your previous documentation with a refreshed version, so descriptions you have already written are kept.

Example usage: `doc_refresh("mtcars.csv")` or `doc_refresh("mtcars.md")`

* Before (messy)

```
| Column  | Type     | Description           |
| ------- | -------- | --------------------- |
| mpg     | numeric  | miles per gallon           |
| cyl     | numeric  | number of cylinders            |
| disp    | numeric  |       displacement (cu.in.)           |
| fakecolumn     | numeric  | non-existent column            |
```

* After running `doc_refresh()`: spacing is cleaned up, columns that no longer exist in the dataset are removed, and new columns are added

```
| Column  | Type     | Description            |
| ------- | -------- | ---------------------- |
| mpg     | numeric  | miles per gallon       |
| cyl     | numeric  | number of cylinders    |
| disp    | numeric  | displacement (cu.in.)  |
| hp      | numeric  |                        |
| drat    | numeric  |                        |
```
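
The merge shown in the before/after tables can be sketched in base R by matching on column names. `merge_descriptions()` is a hypothetical helper written for this illustration; it is an assumption about one way to get this behavior, not the package's actual implementation.

``` r
# Hypothetical helper (not part of docdata): keep descriptions for columns
# that still exist, drop rows for columns that are gone, and add empty rows
# for new columns.
merge_descriptions <- function(old, df) {
  types <- unname(vapply(df, function(x) class(x)[1], character(1)))
  refreshed <- data.frame(Column = names(df),
                          Type = types,
                          Description = rep("", length(types)),
                          stringsAsFactors = FALSE)
  # Look up each current column in the old documentation by name.
  idx <- match(refreshed$Column, old$Column)
  found <- !is.na(idx)
  # Copy over existing descriptions, trimming stray whitespace.
  refreshed$Description[found] <- trimws(old$Description[idx[found]])
  refreshed
}

# Old documentation with messy spacing and a column that no longer exists.
old <- data.frame(
  Column = c("mpg", "cyl", "disp", "fakecolumn"),
  Description = c("miles per gallon", "number of cylinders",
                  "      displacement (cu.in.)", "non-existent column"),
  stringsAsFactors = FALSE
)

merge_descriptions(old, mtcars[, c("mpg", "cyl", "disp", "hp", "drat")])
```

Matching by name rather than by position means that reordering the columns of the dataset does not attach a description to the wrong column.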

### Step 5: Share your dataset + documentation

Owner

  • Name: Hause Lin
  • Login: hauselin
  • Kind: user

Researcher at MIT & World Bank

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 21
  • Total Committers: 1
  • Avg Commits per committer: 21.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Hause Lin h****n@g****m 21
