refsplitr

refsplitr: Author name disambiguation, author georeferencing, and mapping of coauthorship networks with Web of Science data - Published in JOSS (2020)

https://github.com/ropensci/refsplitr

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    1 of 7 committers (14.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Biology Life Sciences - 40% confidence
Last synced: 4 months ago

Repository

R package for processing, organizing, and visualizing reference records downloaded from the Web of Science.

Basic Info
Statistics
  • Stars: 55
  • Watchers: 6
  • Forks: 6
  • Open Issues: 17
  • Releases: 3
Created about 8 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Codemeta

README.Rmd

---
fig_caption: yes
---

# refsplitr 



[![Project Status: Active The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
[![](https://badges.ropensci.org/256_status.svg)](https://github.com/ropensci/onboarding/issues/256)
[![Coverage Status](https://coveralls.io/repos/github/ropensci/refsplitr/badge.svg?branch=master)](https://coveralls.io/github/ropensci/refsplitr?branch=master)
[![status](https://joss.theoj.org/papers/3e46a4970fea0da996251617d2fa85ca/status.svg)](https://joss.theoj.org/papers/3e46a4970fea0da996251617d2fa85ca)
[![R-CMD-check](https://github.com/embruna/refsplitr/workflows/R-CMD-check/badge.svg)](https://github.com/embruna/refsplitr/actions)


refsplitr: author name disambiguation, author georeferencing, and mapping of coauthorship networks with Web of Science data.

refsplitr is an R package to parse and organize reference records downloaded from the Web of Science (WOS) citation database, disambiguate the names of authors, geocode their locations, and generate/visualize coauthorship networks. The WOS is a toll-access literature and citation database maintained by Clarivate Analytics that indexes articles from \~12,000 academic journals. WOS records include a diversity of data about each article (e.g., article title, journal name, author names, author addresses, number of times the article has been cited, funding sources), making them very useful for studying patterns of scientific productivity, coauthorship, research impact, and other Science of Science topics. Because bulk WOS records and API access to the WOS are very expensive, researchers at WOS-subscribing institutions often gather data by conducting WOS searches and downloading reference records in batches. However, this requires a cumbersome process of extracting, merging, and correcting data from the downloaded records prior to conducting any analyses. refsplitr rapidly merges and processes reference data files downloaded from the WOS, and then organizes them in a format amenable to use in scientometric, social network, and Science of Science analyses.
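To illustrate the kind of extraction this involves, here is a minimal base-R sketch. It assumes the plain-text WOS export layout in which each line starts with a two-character field tag (e.g., `AU`, `TI`), continuation lines are indented, and `ER` closes a record. The sample record and the `parse_wos_record()` helper are hypothetical and are not part of refsplitr; in the package this step is handled by `references_read()`.

```r
# Hypothetical, hand-written record in the field-tagged WOS plain-text layout.
wos_lines <- c(
  "PT J",
  "AU Smith, J",
  "   Lee, K",
  "TI An example title",
  "SO JOURNAL OF EXAMPLES",
  "PY 2019",
  "ER"
)

# Hypothetical helper: collect the values of one record into a named list.
parse_wos_record <- function(lines) {
  fields <- list()
  current_tag <- NULL
  for (line in lines) {
    tag <- substr(line, 1, 2)
    value <- trimws(substring(line, 4))
    if (tag == "ER") break                      # end-of-record marker
    if (grepl("^[A-Z][A-Z0-9]$", tag)) {        # a new field starts here
      current_tag <- tag
      fields[[current_tag]] <- value
    } else if (!is.null(current_tag)) {         # indented continuation line
      fields[[current_tag]] <- c(fields[[current_tag]], value)
    }
  }
  fields
}

record <- parse_wos_record(wos_lines)
record$AU   # both authors, gathered from the tagged line and its continuation
```

Real exports contain many more fields and many records per file, which is why merging and correcting them by hand quickly becomes cumbersome.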

Support for the development of refsplitr was provided by grants from the [University of Florida Center for Latin American Studies](http://www.latam.ufl.edu/) and the [University of Florida Informatics Institute](https://informatics.institute.ufl.edu/).

## Installation

You can install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("ropensci/refsplitr")
```


## Workflow

There are four steps in the `refsplitr` package's workflow:

1.  Importing and tidying Web of Science reference records (be sure to download records using the procedure in Appendix 1 of the [vignette](https://docs.ropensci.org/refsplitr/articles/refsplitr.html))
2.  Author name disambiguation and parsing of author addresses
3.  Georeferencing of author institutions using either the [Nominatim](https://nominatim.org/) service, which uses OpenStreetMap data and which `refsplitr` queries via the [`tidygeocoder`](https://jessecambon.github.io/tidygeocoder/) package (default; free) _OR_ the [Data Science Toolkit](http://www.datasciencetoolkit.org/), which uses the Google Maps API (limited number of free queries, after which users must pay); for additional details on pricing and how to register with Google to use their API, see the `refsplitr` [vignette](https://docs.ropensci.org/refsplitr/articles/refsplitr.html).
4.  Data visualization

The procedures required for these four steps, each of which is implemented with a simple command, are described in detail in the `refsplitr` [vignette](https://docs.ropensci.org/refsplitr/articles/refsplitr.html). An example of this workflow is provided below:

```{r example1, eval=FALSE}

# load the Web of Science records into a dataframe
dat1 <- references_read(data = system.file("extdata", "example_data.txt", package = "refsplitr"), dir = FALSE)

# disambiguate author names and parse author address
dat2 <- authors_clean(references = dat1)

# after reviewing disambiguation, merge any necessary corrections
dat3 <- authors_refine(dat2$review, dat2$prelim)

# georeference the author locations
dat4 <- authors_georef(dat3)

# generate a map of coauthorships; this is only one of the five possible visualizations  
plot_net_address(dat4$addresses)
```
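The example above draws only one of the five visualizations. As a rough sketch, the remaining ones could be generated in a loop. The function names below are taken from the package vignette, and the sketch assumes each accepts the `addresses` data frame as its first argument, as `plot_net_address()` does above; check both against your installed version. The code does nothing unless `refsplitr` is installed and `dat4` exists from the workflow above.

```r
# Plotting functions as named in the refsplitr vignette (verify against your version).
viz_functions <- c(
  "plot_addresses_country",
  "plot_addresses_points",
  "plot_net_coauthor",
  "plot_net_country",
  "plot_net_address"
)

# Run only when the package is available and the georeferenced data exist.
if (requireNamespace("refsplitr", quietly = TRUE) && exists("dat4")) {
  plots <- lapply(viz_functions, function(fn) {
    plot_fun <- getExportedValue("refsplitr", fn)  # look up the exported function
    plot_fun(dat4$addresses)                       # assumed first argument: addresses
  })
  names(plots) <- viz_functions
}
```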

## Improvements & Suggestions

We welcome any suggestions for package improvement or ideas for features to include in future versions. If you have suggestions, [here is how to contribute](https://github.com/ropensci/refsplitr/blob/master/CONTRIBUTING.md). We expect everyone contributing to the package to abide by our [Code of Conduct](https://github.com/ropensci/refsplitr/blob/master/CODE_OF_CONDUCT.md).

Map of georeferenced article coauthorships generated with refsplitr.

## Contributors

- [Auriel Fournier](https://github.com/aurielfournier), Porzana Solutions
- [Matt Boone](https://github.com/birderboone), Porzana Solutions
- [Forrest Stevens](http://forreststevens.com/teaching/research.html), University of Louisville
- [Emilio M. Bruna](https://github.com/embruna), University of Florida

## Citation

The Refsplitr package has been described in an article in the [_Journal of Open Source Software_](https://joss.theoj.org/papers/10.21105/joss.02028). We request that you cite both the package and the publication when using Refsplitr in your work.

### Citation: Refsplitr Package

Fournier, Auriel M.V., Matthew E. Boone, Forrest R. Stevens, and Emilio M. Bruna (2025). refsplitr: author name disambiguation, author georeferencing, and mapping of coauthorship networks with Web of Science data. R package version 1.2.0.

    @Manual{refsplitr2025,
      title = {refsplitr: author name disambiguation, author georeferencing, and mapping of coauthorship networks with Web of Science data.},
      author = {Fournier, Auriel M.V., Matthew E. Boone, Forrest R. Stevens, and Emilio M. Bruna},
      year = {2020},
      note = {R package version 1.2.0.},
      url = {https://github.com/ropensci/refsplitr}
    }

### Citation: _JOSS_ paper

Fournier et al., (2020). refsplitr: Author name disambiguation, author georeferencing, and mapping of coauthorship networks with Web of Science data. Journal of Open Source Software, 5(45), 2028, https://doi.org/10.21105/joss.02028

    @article{Fournier2020,
      doi = {10.21105/joss.02028},
      url = {https://doi.org/10.21105/joss.02028},
      year = {2020},
      publisher = {The Open Journal},
      volume = {5},
      number = {45},
      pages = {2028},
      author = {Auriel M.v. Fournier and Matthew E. Boone and Forrest R. Stevens and Emilio M. Bruna},
      title = {refsplitr: Author name disambiguation, author georeferencing, and mapping of coauthorship networks with Web of Science data},
      journal = {Journal of Open Source Software}
    }

## License

[GPL3](https://www.r-project.org/Licenses/GPL-3)

---

**_Note regarding early package development_**: The early development of refsplitr - initially known as `refnet` - was by Forrest Stevens and Emilio M. Bruna and was on [r-forge](https://r-forge.r-project.org/projects/refnet/). In December 2017 Bruna moved it to Github and hired [Porzana Solutions](https://github.com/aurielfournier) to finalize the package and prepare it for submission to rOpenSci. *Please make all suggestions for changes via this Github repository - do not* make a repo mirror of the R-forge version.

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

Owner

  • Name: rOpenSci
  • Login: ropensci
  • Kind: organization
  • Email: info@ropensci.org
  • Location: Berkeley, CA

JOSS Publication

refsplitr: Author name disambiguation, author georeferencing, and mapping of coauthorship networks with Web of Science data
Published
January 22, 2020
Volume 5, Issue 45, Page 2028
Authors
Auriel M.v. Fournier ORCID
Porzana Solutions, Marquette Heights, IL, 61554, USA
Matthew E. Boone ORCID
Porzana Solutions, Marquette Heights, IL, 61554, USA
Forrest R. Stevens ORCID
Department of Geography & Geosciences, University of Louisville, Louisville, KY, 40292, USA
Emilio M. Bruna ORCID
Center for Latin American Studies, University of Florida, Gainesville, FL, 32611-5530, USA, Department of Wildlife Ecology & Conservation, University of Florida, Gainesville, FL, 32611-4430, USA
Editor
Kyle Niemeyer ORCID
Tags
name disambiguation bibliometrics coauthorship collaboration georeferencing metascience scientometrics science of science Web of Science

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "identifier": "refsplitr",
  "description": "Tools to parse and organize reference records downloaded from the 'Web of Science' citation database into an R-friendly format, disambiguate the names of authors, geocode their locations, and generate/visualize coauthorship networks. This package has been peer-reviewed by rOpenSci (v. 1.0).",
  "name": "refsplitr: author name disambiguation, author georeferencing, and mapping of \n    coauthorship networks with 'Web of Science' data ",
  "relatedLink": "https://docs.ropensci.org/refsplitr/",
  "codeRepository": "https://github.com/ropensci/refsplitr",
  "issueTracker": "https://github.com/ropensci/refsplitr/issues",
  "license": "https://spdx.org/licenses/GPL-3.0",
  "version": "1.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R version 4.4.1 (2024-06-14)",
  "author": [
    {
      "@type": "Person",
      "givenName": "Auriel M.V.",
      "familyName": "Fournier",
      "email": "aurielfournier@gmail.com"
    },
    {
      "@type": "Person",
      "givenName": "Matthew E.",
      "familyName": "Boone"
    },
    {
      "@type": "Person",
      "givenName": "Forrest R.",
      "familyName": "Stevens"
    },
    {
      "@type": "Person",
      "givenName": "Emilio",
      "familyName": "Bruna",
      "email": "embruna@ufl.edu"
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Emilio",
      "familyName": "Bruna",
      "email": "embruna@ufl.edu"
    }
  ],
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "covr",
      "name": "covr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=covr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "gdtools",
      "name": "gdtools",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=gdtools"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "knitr",
      "name": "knitr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=knitr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "mapproj",
      "name": "mapproj",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=mapproj"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rmarkdown",
      "name": "rmarkdown",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rmarkdown"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "testthat",
      "name": "testthat",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=testthat"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "utils",
      "name": "utils"
    }
  ],
  "softwareRequirements": {
    "1": {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": ">= 2.10"
    },
    "2": {
      "@type": "SoftwareApplication",
      "identifier": "dplyr",
      "name": "dplyr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=dplyr"
    },
    "3": {
      "@type": "SoftwareApplication",
      "identifier": "ggmap",
      "name": "ggmap",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://github.com/dkahle/ggmap"
    },
    "4": {
      "@type": "SoftwareApplication",
      "identifier": "ggplot2",
      "name": "ggplot2",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=ggplot2"
    },
    "5": {
      "@type": "SoftwareApplication",
      "identifier": "Hmisc",
      "name": "Hmisc",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=Hmisc"
    },
    "6": {
      "@type": "SoftwareApplication",
      "identifier": "igraph",
      "name": "igraph",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=igraph"
    },
    "7": {
      "@type": "SoftwareApplication",
      "identifier": "Matrix",
      "name": "Matrix",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=Matrix"
    },
    "8": {
      "@type": "SoftwareApplication",
      "identifier": "magrittr",
      "name": "magrittr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=magrittr"
    },
    "9": {
      "@type": "SoftwareApplication",
      "identifier": "network",
      "name": "network",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=network"
    },
    "10": {
      "@type": "SoftwareApplication",
      "identifier": "stringdist",
      "name": "stringdist",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=stringdist"
    },
    "11": {
      "@type": "SoftwareApplication",
      "identifier": "rworldmap",
      "name": "rworldmap",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rworldmap"
    },
    "12": {
      "@type": "SoftwareApplication",
      "identifier": "sna",
      "name": "sna",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=sna"
    },
    "SystemRequirements": null
  },
  "isPartOf": "https://ropensci.org",
  "keywords": [
    "namedisambiguation",
    "bibliometrics",
    "coauthorship",
    "collaboration",
    "georeferencing",
    "metascience",
    "references",
    "scientometrics",
    "scienceofscience",
    "WebofScience"
  ],
  "fileSize": "78870.734KB",
  "citation": [
    {
      "@type": "CreativeWork",
      "datePublished": "2020",
      "author": [
        {
          "@type": "Person",
          "givenName": [
            "Auriel",
            "M.V."
          ],
          "familyName": "Fournier"
        },
        {
          "@type": "Person",
          "givenName": [
            "Matthew",
            "E."
          ],
          "familyName": "Boone"
        },
        {
          "@type": "Person",
          "givenName": [
            "Forrest",
            "R."
          ],
          "familyName": "Stevens"
        },
        {
          "@type": "Person",
          "givenName": [
            "Emilio",
            "M."
          ],
          "familyName": "Bruna"
        }
      ],
      "name": "refsplitr: Author name disambiguation, author georeferencing, \n  and mapping of coauthorship networks with Web of Science data.",
      "url": "https://github.com/ropensci/refsplitr",
      "description": "R package version 1.0.0."
    }
  ],
  "releaseNotes": "https://github.com/ropensci/refsplitr/blob/master/NEWS.md",
  "readme": "https://github.com/ropensci/refsplitr/blob/master/README.md",
  "contIntegration": [
    "https://coveralls.io/github/ropensci/refsplitr?branch=master",
    "https://github.com/embruna/refsplitr/actions"
  ],
  "developmentStatus": "http://www.repostatus.org/#active",
  "review": {
    "@type": "Review",
    "url": "https://github.com/ropensci/software-review/issues/256",
    "provider": "https://ropensci.org"
  }
}

GitHub Events

Total
  • Create event: 2
  • Release event: 1
  • Issues event: 6
  • Watch event: 1
  • Issue comment event: 3
  • Push event: 33
  • Pull request event: 13
Last Year
  • Create event: 2
  • Release event: 1
  • Issues event: 6
  • Watch event: 1
  • Issue comment event: 3
  • Push event: 33
  • Pull request event: 13

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 870
  • Total Committers: 7
  • Avg Commits per committer: 124.286
  • Development Distribution Score (DDS): 0.562
Past Year
  • Commits: 51
  • Committers: 1
  • Avg Commits per committer: 51.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
embruna e****a@u****u 381
aurielfournier a****r@g****m 311
birderboone b****e@g****m 156
Najko Jahn n****n@g****m 9
Aariq s****r@g****m 6
Maëlle Salmon m****n@y****e 4
Auriel Fournier a****r@f****l 3
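
The all-time summary figures can be checked against the committer table above. The sketch below assumes the Development Distribution Score is defined as 1 minus the top committer's share of all commits; this definition is an assumption, not stated on this page, but it reproduces the listed values.

```r
# Commit counts copied from the Top Committers table, in table order.
commits <- c(381, 311, 156, 9, 6, 4, 3)

total_commits <- sum(commits)                   # total across all committers
avg_commits <- total_commits / length(commits)  # average commits per committer

# Assumed definition: DDS = 1 - (commits by top committer / total commits)
dds <- 1 - max(commits) / total_commits

total_commits          # 870
round(avg_commits, 3)  # 124.286
round(dds, 3)          # 0.562
```

Under the same assumption, a single committer with all 51 commits in the past year gives a DDS of 1 - 51/51 = 0, which matches the Past Year figure.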
Committer Domains (Top 20 + Academic)
ufl.edu: 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 70
  • Total pull requests: 38
  • Average time to close issues: 6 months
  • Average time to close pull requests: 2 days
  • Total issue authors: 9
  • Total pull request authors: 6
  • Average comments per issue: 1.91
  • Average comments per pull request: 0.16
  • Merged pull requests: 34
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 16
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 1 hour
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • embruna (37)
  • aurielfournier (21)
  • maelle (4)
  • birderboone (2)
  • tilltnet (2)
  • kellijohnson-NOAA (1)
  • oguzozbay (1)
  • danieltorrano (1)
  • codetsang (1)
Pull Request Authors
  • embruna (32)
  • aurielfournier (5)
  • maelle (4)
  • tilltnet (2)
  • njahn82 (2)
  • Aariq (1)
Top Labels
Issue Labels
enhancement (5) help wanted (3) bug (2)
Pull Request Labels

Dependencies

DESCRIPTION cran
  • R >= 2.10 depends
  • Hmisc * imports
  • Matrix * imports
  • ggmap * imports
  • ggplot2 * imports
  • igraph * imports
  • maptools * imports
  • network * imports
  • rworldmap * imports
  • sna * imports
  • stringdist * imports
  • covr * suggests
  • gdtools * suggests
  • knitr * suggests
  • mapproj * suggests
  • rmarkdown * suggests
  • testthat * suggests
  • utils * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/check-standard.yaml actions
  • actions/checkout v2 composite
  • actions/upload-artifact main composite
  • r-lib/actions/check-r-package master composite
  • r-lib/actions/setup-pandoc master composite
  • r-lib/actions/setup-r master composite
  • r-lib/actions/setup-r-dependencies master composite
.github/workflows/pr-commands.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/pr-fetch v2 composite
  • r-lib/actions/pr-push v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite