portawaterperu

Data about Peruvian Portable Water System

https://github.com/openwashdata/portawaterperu

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.5%) to scientific vocabulary
Last synced: 9 months ago · JSON representation ·

Repository

Data about Peruvian Portable Water System

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 1
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation Zenodo

README.Rmd

---
output: github_document
always_allow_html: true
editor_options: 
  markdown: 
    wrap: 72
  chunk_output_type: console
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  message = FALSE,
  warning = FALSE,
  fig.retina = 2,
  fig.align = 'center'
)
```

# portawaterperu



[![License: CC BY
4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![R-CMD-check](https://github.com/openwashdata/portawaterperu/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/openwashdata/portawaterperu/actions/workflows/R-CMD-check.yaml)


The goal of the package `portawaterperu` is to provide access to data about community portable water systems in Peru. The data is collected from SIASAR database consisted of information and surveys about water catchments, storage system, treatment, distribution networks and maintainance. 

## Installation

You can install the development version of portawaterperu from
[GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("openwashdata/portawaterperu")
```

```{r}
## Run the following code in console if you don't have the packages
## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra"))
library(dplyr)
library(knitr)
library(readr)
library(stringr)
library(gt)
library(kableExtra)
```

Alternatively, you can download the individual datasets as a CSV or XLSX
file from the table below.

```{r, echo=FALSE, message=FALSE, warning=FALSE}

extdata_path <- "https://github.com/openwashdata/portawaterperu/raw/main/inst/extdata/"

read_csv("data-raw/dictionary.csv") |> 
  distinct(file_name) |> 
  dplyr::mutate(file_name = str_remove(file_name, ".rda")) |> 
  dplyr::rename(dataset = file_name) |> 
  mutate(
    CSV = paste0("[Download CSV](", extdata_path, dataset, ".csv)"),
    XLSX = paste0("[Download XLSX](", extdata_path, dataset, ".xlsx)")
  ) |> 
  knitr::kable()

```

## Data

The package provides access to one dataset `portawaterperu`.

```{r}
library(portawaterperu)
```

### portawaterperu

The dataset `portawaterperu` contains data abour portable water system from 32 communities in Peru. It has
`r nrow(portawaterperu)` observations and `r ncol(portawaterperu)`
variables

```{r}
portawaterperu |> 
  head(3) |> 
  gt::gt() |>
  gt::as_raw_html()
```

For an overview of the variable names, see the following table.

```{r echo=FALSE, message=FALSE, warning=FALSE}
readr::read_csv("data-raw/dictionary.csv") |>
  dplyr::filter(file_name == "portawaterperu.rda") |>
  dplyr::select(variable_name:description) |> 
  knitr::kable() |> 
  kableExtra::kable_styling("striped") |> 
  kableExtra::scroll_box(height = "200px")
```

## Example

```{r}
library(portawaterperu)
library(ggplot2)
# Provide some example code here
portawaterperu |> 
  #dplyr::filter(stringr::str_starts(divisiones, "AMAZONAS")) |>
  #dplyr::group_by(divisiones) |> 
  #dplyr::summarise(mean = mean(pob_servida)) |> 
  ggplot(aes(y = pop_serviced, color = type_gravity))+
  geom_boxplot(outliers = F)+
  labs(title = "Population served given different gravity types",
       y= "Population") +
  theme_classic()
  
```

## Capstone Project
This dataset is shared as part of a capstone project in Data Science for openwashdata. For more information about the project and to explore further insights, please visit the project page at https://ds4owd-001.github.io/project-laurenjudah/ (to be public available)


## Methodology
The data was obtained from @SIASAR, an information system containing data on rural water supply and sanitation services. Using SIASAR's "download data by country" tool, all available data for Peru (10 excel files) were downloaded. After examining the 10 excel files, only 5 pertained to potable water systems. Those 5 data sets were imported into R and subsequently empty values and unnecessary columns were deleted from them. Finally, the 5 data sets were combined into 1 data frame based on community ID. The combined, cleaned data set contains data from 32 communities.

SIASAR does not provide a matching data dictionary. openwashdata developer went through the attachements of the original questionnaire:

- https://globalsiasar.org/es/content/documentacion-tecnica
- https://globalsiasar.org/en/content/technical-documentation

The variable description is written with our best guess with the information from the attachements.

## License

Data are available as
[CC-BY](https://github.com/openwashdata/portawaterperu/blob/main/LICENSE.md).

## Citation

Please cite this package using:

```{r}
citation("portawaterperu")
```

Owner

  • Name: openwashdata
  • Login: openwashdata
  • Kind: organization

Citation (CITATION.cff)

# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "portawaterperu" in publications use:'
type: software
license: CC-BY-4.0
title: 'portawaterperu: A Preliminary Review of Peruvian Potable Water System Data'
version: 0.0.0.9000
abstract: Data to gain preliminary understanding of community potable water systems
  in Peru.
authors:
- family-names: Judah
  given-names: Lauren
- family-names: Loos
  given-names: Sebastian Camilo
  email: seba.loos@hotmail.com
  orcid: https://orcid.org/0000-0002-8830-1734
- family-names: Zhong
  given-names: Mian
  email: mzhong@ethz.ch
  orcid: https://orcid.org/0009-0009-4546-7214
repository-code: https://github.com/openwashdata/portawaterperu
url: https://github.com/openwashdata/portawaterperu
date-released: '2024-06-21'
contact:
- family-names: Loos
  given-names: Sebastian Camilo
  email: seba.loos@hotmail.com
  orcid: https://orcid.org/0000-0002-8830-1734
references:
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  institution:
    name: R Foundation for Statistical Computing
    address: Vienna, Austria
  year: '2024'
  version: '>= 2.10'

GitHub Events

Total
Last Year

Dependencies

DESCRIPTION cran
  • R >= 2.10 depends
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v4 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite