statcanR

R Package to connect to Statistics Canada's open data portal

https://github.com/warint/statcanr

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.3%) to scientific vocabulary

Keywords

statistics-canada
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 22
  • Watchers: 4
  • Forks: 6
  • Open Issues: 3
  • Releases: 2
Topics
statistics-canada
Created over 6 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License Citation

README.md

statcanR


Overview

Easily connect to Statistics Canada’s Web Data Service with R. Find and access open economic data tables (formerly known as CANSIM tables and now identified by Product IDs, or PIDs), which are returned as a data frame directly in the user’s R environment.

Shiny App: statcanR ExploR

For people less comfortable with R, and to make our package accessible to more people, we have also developed a Shiny application. Following the same logic as the package, it lets researchers retrieve data from Statistics Canada.

statcanR ExploR is available [here]

Installation

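The released version of statcanR is available on CRAN, and the development version can be installed from GitHub with devtools.

```r
# Released version from CRAN
install.packages("statcanR")

# Development version from GitHub
install.packages("devtools")
devtools::install_github("warint/statcanR")
```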

Example

This section presents an example of how to use the statcanR R package and its functions: statcan_search(), statcan_data(), and statcan_download_data().

The following example illustrates how to use these functions. It consists of retrieving data on federal expenditures on science and technology, by socio-economic objectives.

To identify a relevant table, call statcan_search() with a keyword or a set of keywords, and specify the language in which the results are presented ("eng" for English or "fra" for French). The example below returns the data tables that could be of interest:

```r
library(statcanR)
# Search table titles for all three keywords, with results in English
statcan_search(c("federal", "expenditures", "objectives"), "eng")
```

Notice that each table is listed with its unique table number. Let's focus on the first of the two tables returned, which contains data on federal expenditures on science and technology, by socio-economic objectives. Once its table number (‘27-10-0014-01’) is identified, collecting the data with statcan_data() is straightforward:

```r
library(statcanR)
# Retrieve table 27-10-0014-01 in English as a data frame
mydata <- statcan_data("27-10-0014-01", "eng")
```

The statcan_download_data() function is used in exactly the same way. The only difference is that, in addition to returning the data in your R environment, it also saves the data to a CSV file.

```r
library(statcanR)
# Same call as statcan_data(), but the data are also written to a CSV file
mydata <- statcan_download_data("27-10-0014-01", "eng")
```

Video Tutorial

Tutorial made by Professor Charles Saunders, Director of the Master of Financial Economics Program at Western University.

Thanks!

https://www.youtube.com/embed/z9TDUlgT5lc

Statistics Canada Open Licence

This licence is issued on behalf of His Majesty the King in Right of Canada, as represented by the Minister for Statistics Canada (“Statistics Canada”) to you (an individual or a legal entity that you are authorized to represent).

Statistics Canada may modify this licence at any time, and such modifications shall be effective immediately upon posting of the modified licence on the Statistics Canada website. Your use of the Information will be governed by the terms of the licence in force as of the date and time you accessed the Information.

Please refer to the terms of licence before using the Information.

Acknowledgment of Source according to Statistics Canada Open Licence Agreement

Statistics Canada has a specific procedure regarding the acknowledgment of source:

You shall include and maintain the following notice on all licensed rights of the Information:

Source: Statistics Canada, name of product, reference date. Reproduced and distributed on an "as is" basis with the permission of Statistics Canada.

Where any Information is contained within a Value-added Product, you shall include on such Value-added Product the following notice:

Adapted from Statistics Canada, name of product, reference date. This does not constitute an endorsement by Statistics Canada of this product.
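Both notices follow a fixed pattern, so they can be generated programmatically when producing reports. The sketch below is a minimal illustration only: statcan_source_notice() is a hypothetical helper that is not part of statcanR, and the product name and reference date in the usage line are placeholders to be replaced with the actual values for your data.

```r
# Hypothetical helper, not part of statcanR: fills in the acknowledgment
# notices required by the Statistics Canada Open Licence.
statcan_source_notice <- function(product, ref_date, adapted = FALSE) {
  template <- if (adapted) {
    # Notice for Information contained within a Value-added Product
    paste("Adapted from Statistics Canada, %s, %s.",
          "This does not constitute an endorsement by Statistics Canada of this product.")
  } else {
    # Notice for the Information itself
    paste("Source: Statistics Canada, %s, %s.",
          "Reproduced and distributed on an \"as is\" basis with the permission of Statistics Canada.")
  }
  sprintf(template, product, ref_date)
}

# Placeholder values only: substitute the real product name and reference date
statcan_source_notice("Table 27-10-0014-01", "2023")
statcan_source_notice("Table 27-10-0014-01", "2023", adapted = TRUE)
```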

Cite statcanR

To cite the statcanR package in your work:

Warin, T. (2024). Access Statistics Canada’s Open Economic Data for Statistics and Data Science Courses. Technology Innovations in Statistics Education, 15(1). http://dx.doi.org/10.5070/T5.1868 Retrieved from https://escholarship.org/uc/item/9jr7k5hp

```bibtex
@article{warin_access_2024,
  title = {Access {Statistics} {Canada}’s {Open} {Economic} {Data} for {Statistics} and {Data} {Science} {Courses}},
  author = {Warin, Thierry},
  journal = {Technology Innovations in Statistics Education},
  volume = {15},
  number = {1},
  year = {2024},
  month = jan,
  doi = {10.5070/T5.1868},
  url = {https://escholarship.org/uc/item/9jr7k5hp},
  urldate = {2024-01-17},
  language = {en},
  abstract = {This article is about the two conflicting goals when teaching statistics or data science courses based on real-world data in a business school environment. We propose to look at structured socio-economic data about the Canadian economy. Canada was ranked 8th in 2017 by Open Data Watch (Government of Canada) for its data accessibility policy. Statistics Canada offers several ways to access data across its over 11,000 data tables. We built an R package to ease access to Statistics Canada's open economic data. With this package, we offer students another option to collect data about the Canadian economy.},
}
```

Acknowledgments

A previous version of this package was developed with Romain Le Duc. This version has benefitted from Thibault Senegas's contribution. The author would like to thank the Center for Interuniversity Research and Analysis of Organizations (CIRANO, Montreal) for its support, as well as Thibault Senegas, Jeremy Schneider, Marine Leroi, Martin Paquette and Romain Le Duc. However, errors and omissions are his.

Contributing to the package

Bug reports

When you file a bug report, please spend some time making it easy for me to follow and reproduce. The more time you spend making the bug report coherent, the more time I can dedicate to investigating the bug rather than the bug report.

Contributing to the package development

To get started, consider either adding a new example or enhancing the existing documentation.

If you're interested in submitting a Pull Request to include your own functions, please include the following:

  • The code for the new function(s), complete with roxygen annotations and sample usage.
  • A dedicated section in the relevant vignette that explains how to utilize the new function.
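As an illustration of the expected format, here is a minimal sketch of a documented function with roxygen annotations and sample usage. The function pid_from_table_number() is hypothetical and not part of statcanR; it assumes that the Web Data Service identifies a table by the first eight digits of its table number (for example, 27100014 for table 27-10-0014-01).

```r
#' Convert a table number to an 8-digit Product ID (hypothetical example)
#'
#' @param tableNumber A character vector of table numbers such as
#'   "27-10-0014-01", or 8-digit Product IDs such as "27100014".
#' @return A character vector of 8-digit Product IDs.
#' @examples
#' pid_from_table_number("27-10-0014-01")
#' @export
pid_from_table_number <- function(tableNumber) {
  stopifnot(is.character(tableNumber))
  # Drop the dashes, e.g. "27-10-0014-01" becomes "2710001401"
  digits <- gsub("-", "", tableNumber, fixed = TRUE)
  # Accept 8 digits, optionally followed by a 2-digit view suffix
  if (any(!grepl("^[0-9]{8}([0-9]{2})?$", digits))) {
    stop("tableNumber must look like '27-10-0014-01' or '27100014'")
  }
  # Keep the first 8 digits, which form the Product ID
  substr(digits, 1, 8)
}

pid_from_table_number("27-10-0014-01")  # "27100014"
```

Validating the input early, with an explicit error message, keeps failures readable for users who paste a malformed table number.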

To check that your changes comply with CRAN policies, run rhub::check_for_cran() from the rhub package (in rhub 2.0 and later, the equivalent is rhub::rhub_check()). After submission, your Pull Request will be evaluated automatically via GitHub Actions, so you can monitor it for any issues.

Owner

  • Name: Thierry Warin
  • Login: warint
  • Kind: user
  • Location: Montreal
  • Company: HEC Montréal

Professor of Data Science

Citation (CITATION.cff)

# -----------------------------------------------------------
# CITATION file created with {cffr} R package, v0.4.1
# See also: https://docs.ropensci.org/cffr/
# -----------------------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "statcanR" in publications use:'
type: software
license: MIT
title: 'Access Statistics Canada’s Open Economic Data for Statistics and Data Science Courses'
version: 0.2.5
abstract: This article is about the two conflicting goals when teaching statistics or data science courses based on real-world data in a business school environment. We propose to look at structured socio-economic data about the Canadian economy. Canada was ranked 8th in 2017 by Open Data Watch (Government of Canada) for its data accessibility policy. Statistics Canada offers several ways to access data across its over 11,000 data tables. We built an R package to ease access to Statistics Canada's open economic data. With this package, we offer students another option to collect data about the Canadian economy. Warin (2024) <doi:10.5070/T5.1868>.
authors:
- family-names: Warin
  given-names: Thierry
  email: thierry.warin@hec.ca
  orcid: https://orcid.org/0000-0002-5921-3428
repository: https://CRAN.R-project.org/package=statcanR
repository-code: https://github.com/warint/statcanR/issues/
url: https://github.com/warint/statcanR/
contact:
- family-names: Warin
  given-names: Thierry
  email: thierry.warin@hec.ca
  orcid: https://orcid.org/0000-0002-5921-3428
references:
- type: software
  title: knitr
  abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R'
  notes: Suggests
  url: https://yihui.org/knitr/
  repository: https://CRAN.R-project.org/package=knitr
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  year: '2023'
- type: software
  title: rmarkdown
  abstract: 'rmarkdown: Dynamic Documents for R'
  notes: Suggests
  url: https://pkgs.rstudio.com/rmarkdown/
  repository: https://CRAN.R-project.org/package=rmarkdown
  authors:
  - family-names: Allaire
    given-names: JJ
    email: jj@rstudio.com
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: McPherson
    given-names: Jonathan
    email: jonathan@rstudio.com
  - family-names: Luraschi
    given-names: Javier
    email: javier@rstudio.com
  - family-names: Ushey
    given-names: Kevin
    email: kevin@rstudio.com
  - family-names: Atkins
    given-names: Aron
    email: aron@rstudio.com
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  - family-names: Cheng
    given-names: Joe
    email: joe@rstudio.com
  - family-names: Chang
    given-names: Winston
    email: winston@rstudio.com
  - family-names: Iannone
    given-names: Richard
    email: rich@rstudio.com
    orcid: https://orcid.org/0000-0003-3925-190X
  year: '2023'
- type: software
  title: data.table
  abstract: 'data.table: Extension of `data.frame`'
  notes: Imports
  url: https://r-datatable.com
  repository: https://CRAN.R-project.org/package=data.table
  authors:
  - family-names: Dowle
    given-names: Matt
    email: mattjdowle@gmail.com
  - family-names: Srinivasan
    given-names: Arun
    email: asrini@pm.me
  year: '2023'
- type: software
  title: qpdf
  abstract: 'qpdf: Split, Combine and Compress PDF Files'
  notes: Imports
  url: https://docs.ropensci.org/qpdf/
  repository: https://CRAN.R-project.org/package=qpdf
  authors:
  - family-names: Ooms
    given-names: Jeroen
    email: jeroen@berkeley.edu
    orcid: https://orcid.org/0000-0002-4035-0289
  year: '2023'
- type: software
  title: DT
  abstract: 'DT: A Wrapper of the JavaScript Library ''DataTables'''
  notes: Imports
  url: https://github.com/rstudio/DT
  repository: https://CRAN.R-project.org/package=DT
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
  - family-names: Cheng
    given-names: Joe
  - family-names: Tan
    given-names: Xianying
  year: '2023'
- type: software
  title: curl
  abstract: 'curl: A Modern and Flexible Web Client for R'
  notes: Imports
  url: https://curl.se/libcurl/
  repository: https://CRAN.R-project.org/package=curl
  authors:
  - family-names: Ooms
    given-names: Jeroen
    email: jeroen@berkeley.edu
    orcid: https://orcid.org/0000-0002-4035-0289
  year: '2023'
- type: software
  title: httr
  abstract: 'httr: Tools for Working with URLs and HTTP'
  notes: Imports
  url: https://httr.r-lib.org/
  repository: https://CRAN.R-project.org/package=httr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2023'
- type: software
  title: readr
  abstract: 'readr: Read Rectangular Text Data'
  notes: Imports
  url: https://readr.tidyverse.org
  repository: https://CRAN.R-project.org/package=readr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  - family-names: Hester
    given-names: Jim
  - family-names: Bryan
    given-names: Jennifer
    email: jenny@rstudio.com
    orcid: https://orcid.org/0000-0002-6983-2759
  year: '2023'
- type: software
  title: tibble
  abstract: 'tibble: Simple Data Frames'
  notes: Imports
  url: https://tibble.tidyverse.org/
  repository: https://CRAN.R-project.org/package=tibble
  authors:
  - family-names: Müller
    given-names: Kirill
    email: krlmlr+r@mailbox.org
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2023'

GitHub Events

Total
  • Watch event: 2
  • Push event: 1
  • Fork event: 1
Last Year
  • Watch event: 2
  • Push event: 1
  • Fork event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 192
  • Total Committers: 7
  • Avg Commits per committer: 27.429
  • Development Distribution Score (DDS): 0.635
Past Year
  • Commits: 36
  • Committers: 2
  • Avg Commits per committer: 18.0
  • Development Distribution Score (DDS): 0.139
Top Committers
Name Email Commits
Martin Paquette p****m@s****o 70
Thierry Warin t****n@g****m 43
Thierry Warin t****n@n****m 31
Thierry Warin t****u@g****m 31
Thierry Warin t****y@n****m 14
paquettem m****e@n****m 2
Martin Paquette p****m@n****m 1
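The summary statistics above can be reproduced from the committer list, assuming the Development Distribution Score is defined as one minus the top committer's share of all commits (an assumption, but one that matches the all-time figure). Each email address appears to count as a separate committer, which is why Martin Paquette's 70 commits rank first even though Thierry Warin's four addresses together account for more. The dds() function below is a hypothetical helper written for this check.

```r
# Hypothetical helper: DDS = 1 - (top committer's commits / total commits)
dds <- function(top_commits, total_commits) {
  stopifnot(total_commits > 0, top_commits <= total_commits)
  1 - top_commits / total_commits
}

# All-time figures from the lists above, one entry per email identity
total_commits <- 192
commits_by_committer <- c(70, 43, 31, 31, 14, 2, 1)
stopifnot(sum(commits_by_committer) == total_commits)

round(total_commits / length(commits_by_committer), 3)   # 27.429
round(dds(max(commits_by_committer), total_commits), 3)  # 0.635
```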

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 4
  • Total pull requests: 2
  • Average time to close issues: about 22 hours
  • Average time to close pull requests: about 1 year
  • Total issue authors: 4
  • Total pull request authors: 2
  • Average comments per issue: 1.75
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • st501351 (1)
  • AgriExam (1)
  • GuerraSa (1)
  • dmurdoch (1)
Pull Request Authors
  • olivroy (2)
  • dmurdoch (2)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 382 last-month
  • Total docker downloads: 41,971
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
cran.r-project.org: statcanR

Client for Statistics Canada's Open Economic Data

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 382 Last month
  • Docker Downloads: 41,971
Rankings
Forks count: 11.3%
Stargazers count: 16.3%
Downloads: 23.2%
Average: 23.2%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 7 months ago

Dependencies

DESCRIPTION cran
  • curl * imports
  • data.table * imports
  • httr * imports
  • readr * imports
  • tibble * imports
  • knitr * suggests
  • rmarkdown * suggests