jstor

jstor: Import and Analyse Data from Scientific Texts - Published in JOSS (2018)

https://github.com/ropensci/jstor

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

jstor peer-reviewed r r-package rstats text-analysis text-mining

Keywords from Contributors

tidyverse exploratory-data-analysis missingness ropensci visualisation tidy-data
Last synced: 6 months ago

Repository

Import journal data from DfR (JSTOR)

Statistics
  • Stars: 47
  • Watchers: 5
  • Forks: 10
  • Open Issues: 17
  • Releases: 9
Created about 8 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog

README.Rmd

---
output: github_document
---



```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# jstor: Import and Analyse Data from Scientific Articles

**Author:** [Thomas Klebel](https://thomasklebel.eu) 
**License:** [GPL v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html)

[![R-CMD-check](https://github.com/ropensci/jstor/actions/workflows/check-standard.yaml/badge.svg)](https://github.com/ropensci/jstor/actions/workflows/check-standard.yaml)
[![AppVeyor Build status](https://ci.appveyor.com/api/projects/status/sry2gtwam7qyfw6l?svg=true)](https://ci.appveyor.com/project/tklebel/jstor)
[![Coverage status](https://codecov.io/gh/ropensci/jstor/branch/master/graph/badge.svg)](https://codecov.io/github/ropensci/jstor?branch=master)
[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
[![CRAN status](http://www.r-pkg.org/badges/version/jstor)](https://cran.r-project.org/package=jstor)
[![CRAN\_Download\_Badge](http://cranlogs.r-pkg.org/badges/grand-total/jstor)](https://CRAN.R-project.org/package=jstor)
[![rOpenSci badge](https://badges.ropensci.org/189_status.svg)](https://github.com/ropensci/onboarding/issues/189)
[![JOSS badge](http://joss.theoj.org/papers/ba29665c4bff35c37c0ef68cfe356e44/status.svg)](http://joss.theoj.org/papers/ba29665c4bff35c37c0ef68cfe356e44)
[![Zenodo DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1169861.svg)](https://doi.org/10.5281/zenodo.1169861)

The tool [Data for Research (DfR)](http://www.jstor.org/dfr/) by JSTOR is a
valuable source for citation analysis and text mining. `jstor` provides
functions and suggests workflows for importing datasets from DfR. It was
developed to deal with very large datasets which require an agreement, but can
be used with smaller ones as well.

**Note**: As of 2021, JSTOR has changed the way they provide data, moving to a
new platform called [Constellate](https://constellate.org/). The package
`jstor` has not been adapted to this change, and might therefore only be useful
for legacy data that was obtained from the old DfR platform.

The most important set of functions is a group of `jst_get_*` functions:

- `jst_get_article`
- `jst_get_authors`
- `jst_get_references`
- `jst_get_footnotes`
- `jst_get_book`
- `jst_get_chapters`
- `jst_get_full_text`
- `jst_get_ngram`

All functions which are concerned with metadata (therefore excluding
`jst_get_full_text` and `jst_get_ngram`) operate along the same lines:

1. The file is read with `xml2::read_xml()`.
2. Content of the file is extracted via XPath or CSS expressions.
3. The resulting data is returned in a `tibble`.

## Installation

To install the package use:

```{r, eval=FALSE}
install.packages("jstor")
```

You can install the development version from GitHub with:

```{r gh-installation, eval = FALSE}
# install.packages("remotes")
remotes::install_github("ropensci/jstor")
```

## Usage

In order to use `jstor`, you first need to load it:

```{r}
library(jstor)
library(magrittr)
```

The basic usage is simple: supply one of the `jst_get_*`-functions with a path
and it will return a tibble with the extracted information.

```{r, results='asis'}
jst_get_article(jst_example("article_with_references.xml")) %>% knitr::kable()

jst_get_authors(jst_example("article_with_references.xml")) %>% knitr::kable()
```

Further explanations, especially on how to use jstor's functions for importing
many files, can be found in the vignettes.

## Getting started

In order to use `jstor`, you need some data from DfR. From the
[main page](http://www.jstor.org/dfr/) you can create a dataset by searching for
terms and restricting the search regarding time, subject and content type. After
you have created an account, you can download your selection. Alternatively, you
can download
[sample datasets](http://www.jstor.org/dfr/about/sample-datasets) with documents
from before 1923 for the US, and before 1870 for all other countries.

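
The three steps that the metadata functions share (read the file with
`xml2::read_xml()`, extract content via XPath, return a `tibble`) can be
illustrated with a minimal sketch. This is not the package's actual
implementation: the inline XML is a hypothetical, heavily simplified stand-in
for a DfR article file, and `extract_first()` is a helper made up for this
example. Only `xml2` and `tibble` are assumed to be installed.

```r
library(xml2)
library(tibble)

# Hypothetical, simplified stand-in for a DfR article file (assumption:
# real files contain many more fields and may be structured differently).
example_xml <- '<article>
  <front>
    <journal-meta>
      <journal-id journal-id-type="jstor">examplejournal</journal-id>
    </journal-meta>
    <article-meta>
      <volume>12</volume>
      <title-group>
        <article-title>An Example Title</article-title>
      </title-group>
    </article-meta>
  </front>
</article>'

# Hypothetical helper: text of the first node matching `xpath`,
# or NA if no such node exists.
extract_first <- function(doc, xpath) {
  xml_text(xml_find_first(doc, xpath))
}

# Step 1: read the XML (read_xml() also accepts a file path).
doc <- read_xml(example_xml)

# Steps 2 and 3: extract fields via XPath and collect them in a tibble.
article_meta <- tibble(
  journal_id    = extract_first(doc, ".//journal-id[@journal-id-type='jstor']"),
  article_title = extract_first(doc, ".//article-title"),
  volume        = extract_first(doc, ".//volume"),
  issue         = extract_first(doc, ".//issue")  # not in the example: NA
)

print(article_meta)
```

Returning one row per file with `NA` for absent fields keeps the results of
many files easy to combine, since fields that are not reliably present simply
show up as missing values.
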
## Supported Elements

In their
[technical specifications](http://www.jstor.org/dfr/about/technical-specifications),
DfR lists fields which should be reliably present in all articles and books.
The following tables give an overview of which elements are supported by
`jstor`.

### Articles

|`xml`-field                       |reliably present |supported in `jstor`|
|:---------------------------------|:----------------|:-------------------|
|journal-id (type="jstor")         |x                |x                   |
|journal-id (type="publisher-id")  |x                |x                   |
|journal-id (type="doi")           |                 |x                   |
|issn                              |x                |                    |
|journal-title                     |x                |x                   |
|publisher-name                    |x                |                    |
|article-id (type="doi")           |x                |x                   |
|article-id (type="jstor")         |x                |x                   |
|article-id (type="publisher-id")  |                 |x                   |
|article-type                      |                 |x                   |
|volume                            |                 |x                   |
|issue                             |                 |x                   |
|article-categories                |x                |                    |
|article-title                     |x                |x                   |
|contrib-group                     |x                |x                   |
|pub-date                          |x                |x                   |
|fpage                             |x                |x                   |
|lpage                             |                 |x                   |
|page-range                        |                 |x                   |
|product                           |x                |                    |
|self-uri                          |x                |                    |
|kwd-group                         |x                |                    |
|custom-meta-group                 |x                |x                   |
|fn-group (footnotes)              |                 |x                   |
|ref-list (references)             |                 |x                   |

### Books

|`xml`-field                       |reliably present |supported in `jstor`|
|:---------------------------------|:----------------|:-------------------|
|book-id (type="jstor")            |x                |x                   |
|discipline                        |x                |x                   |
|call-number                       |x                |                    |
|lcsh                              |x                |                    |
|book-title                        |x                |x                   |
|book-subtitle                     |                 |x                   |
|contrib-group                     |x                |x                   |
|pub-date                          |x                |x                   |
|isbn                              |x                |x                   |
|publisher-name                    |x                |x                   |
|publisher-loc                     |x                |x                   |
|permissions                       |x                |                    |
|self-uri                          |x                |                    |
|counts                            |x                |x                   |
|custom-meta-group                 |x                |x                   |

### Book Chapters

|`xml`-field                       |reliably present |supported in `jstor`|
|:---------------------------------|:----------------|:-------------------|
|book-id (type="jstor")            |x                |x                   |
|part_id                           |x                |x                   |
|part_label                        |x                |x                   |
|part-title                        |x                |x                   |
|part-subtitle                     |                 |x                   |
|contrib-group                     |x                |x                   |
|fpage                             |x                |x                   |
|abstract                          |x                |x                   |

## Code of conduct

Please note that this project is released with a
[Contributor Code of Conduct](CONDUCT.md). By participating in this project you
agree to abide by its terms.

## Citation

To cite `jstor`, please refer to `citation(package = "jstor")`:

```
Klebel (2018). jstor: Import and Analyse Data from Scientific Texts. Journal of
Open Source Software, 3(28), 883, https://doi.org/10.21105/joss.00883
```

## Acknowledgements

Work on `jstor` benefited from financial support for the project "Academic
Super-Elites in Sociology and Economics" by the Austrian Science Fund (FWF),
project number "P 29211 Einzelprojekte".

Some internal functions regarding file paths and example files were adapted from
the package `readr`.

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

Owner

  • Name: rOpenSci
  • Login: ropensci
  • Kind: organization
  • Email: info@ropensci.org
  • Location: Berkeley, CA

JOSS Publication

jstor: Import and Analyse Data from Scientific Texts
Published
August 08, 2018
Volume 3, Issue 28, Page 883
Authors
Thomas Klebel
Department of Sociology, University of Graz
Editor
Karthik Ram
Tags
JSTOR DfR Data for Research scientometrics bibliometrics text mining text analysis citation analysis

GitHub Events

Total
  • Push event: 2
  • Fork event: 1
Last Year
  • Push event: 2
  • Fork event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 746
  • Total Committers: 6
  • Avg Commits per committer: 124.333
  • Development Distribution Score (DDS): 0.012
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Thomas Klebel t****l@h****m 737
Maëlle Salmon m****n@y****e 3
Jim Hester j****r@g****m 3
Nishank Saini n****0@g****m 1
Jeroen Ooms j****s@g****m 1
Benjamin Klebel 3****l 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 81
  • Total pull requests: 9
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 8 days
  • Total issue authors: 9
  • Total pull request authors: 6
  • Average comments per issue: 0.65
  • Average comments per pull request: 2.44
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tklebel (73)
  • elinw (1)
  • HenrikBengtsson (1)
  • BillyHall5 (1)
  • krlmlr (1)
  • stefaniebutland (1)
  • yonicd (1)
  • jeroen (1)
  • lionel- (1)
Pull Request Authors
  • jimhester (3)
  • maelle (2)
  • krlmlr (2)
  • tklebel (1)
  • starship9 (1)
  • bklebel (1)
Top Labels
Issue Labels
parallel (2) help wanted (2) hacktoberfest (2) good first issue (1) enhancement (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 309 last-month
  • Total docker downloads: 42,767
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 11
  • Total maintainers: 1
cran.r-project.org: jstor

Read Data from JSTOR/DfR

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 309 Last month
  • Docker Downloads: 42,767
Rankings
Forks count: 7.1%
Stargazers count: 7.4%
Average: 23.2%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Downloads: 36.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.1 depends
  • cli * imports
  • crayon * imports
  • dplyr >= 1.0.0 imports
  • furrr >= 0.1.0 imports
  • magrittr * imports
  • pryr * imports
  • purrr >= 0.2.4 imports
  • readr >= 2.0.0 imports
  • rlang >= 0.2.0 imports
  • stringr >= 1.3.0 imports
  • tibble >= 3.0.0 imports
  • tidyr >= 0.7.2 imports
  • xml2 >= 1.2.0 imports
  • covr * suggests
  • future * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • testthat * suggests
.github/workflows/r-cmd-check.yml actions
  • actions/checkout v2 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite