r-nameparser-lib

An R library allowing parsing of surname, first name, and gender based on US census data.

https://github.com/mjfii/r-nameparser-lib

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file (found)
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (8.9%) to scientific vocabulary

Keywords

algorithm census-data determination gender library parse r

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: mjfii
License: agpl-3.0
Language: R
Default Branch: master
Homepage:
Size: 1.88 MB

Statistics

Stars: 5
Watchers: 1
Forks: 0
Open Issues: 2
Releases: 0

Created over 9 years ago · Last pushed almost 2 years ago

R Name Parser

This R package, name.parser, uses U.S. Census data to parse full names of individuals by identifying surnames, stripping salutations and suffixes, and processing common naming conventions. Additionally, the names are evaluated for gender and the confidence level of that determination.

Motivation

It is common to receive individuals' names as a single string or a single attribute. Splitting such a name into a 'first' name, 'middle' name, 'last' name, etc., is essential for comparison and other analytic work. Because a simple rule, e.g. treating the leftmost 'word' as the 'first' name, does not always work, even within the same data set, this algorithm was built to pull a person's name apart into a 'best guess' set of strings. Additionally, non-alpha characters, duplicate spacing, control characters, etc., must be removed while the string is processed.
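The clean-up step described above can be illustrated with a minimal base-R sketch. `clean_name_sketch` is a hypothetical helper written for this illustration, not a function from name.parser, and the package's own `prep.name` may normalize names differently (for example in how it handles case).

```r
# Hypothetical helper, not part of name.parser: a base-R sketch of the clean-up.
clean_name_sketch <- function(x) {
  x <- gsub("[[:cntrl:]]", " ", x)           # control characters become spaces
  x <- gsub("[^[:alpha:][:space:]]", "", x)  # drop digits, punctuation, etc.
  x <- gsub("[[:space:]]+", " ", x)          # collapse duplicate spacing
  trimws(x)                                  # strip leading/trailing spaces
}

cleaned <- clean_name_sketch("livingston  III,\tMr. MICHAEL JOHN9")
tokens <- strsplit(cleaned, " ", fixed = TRUE)[[1]]
print(tokens)
```

In this input the leftmost token is the surname, which is the case where a "leftmost word is the first name" rule would go wrong.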

Prerequisites

There are two required packages, data.table and parallel, both of which are installed when this library is loaded. The census data uses the data.table library for look-ups and aggregation, while the parallel library is used when multiple names need to be processed.
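As a rough illustration of how the parallel package can spread per-name work across cores, the sketch below uses `parallel::mclapply`. Whether name.parser itself uses `mclapply` or a cluster is not stated here, so this is an assumption, and `count_tokens` is a hypothetical stand-in for the real per-name parsing.

```r
library(parallel)  # ships with R

# Hypothetical stand-in for per-name work; the real package would parse each name.
count_tokens <- function(x) length(strsplit(x, " ", fixed = TRUE)[[1]])

full_names <- c("Mr MICHAEL JOHN livingston", "JANE DOE", "JOHN Q PUBLIC")

# mclapply forks on Unix-alikes; on Windows only mc.cores = 1 is supported.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
token_counts <- unlist(mclapply(full_names, count_tokens, mc.cores = cores))
print(token_counts)
```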

Installation

Using the devtools `install_github` function, install with the below:

```r
devtools::install_github('mjfii/Name-Parser')
library('name.parser')
```

Examples

To parse a name:

```r
# returns a single pipe (`|`) delimited string, e.g. "salutation|first|middle|last|suffix|gender|confidence".
x <- 'livingston III, Mr. MICHAEL JOHN9'
parse.name(x)

# or, for multiple names in a `data.table` with similar attributes
parse.names(x)
```

To 'prepare' a name:

```r
x <- 'livingston III, Mr. MICHAEL JOHN9'
prep.name(x)
```

To get the census data:

```r
x <- 'livingston III, Mr. MICHAEL JOHN9'
x <- prep.name(x)
x <- strsplit(x, ' ')[[1]]
get.census.data(x)
```

To determine surname (last name) ordinal:

```r
x <- 'livingston III, Mr. MICHAEL JOHN9'
x <- prep.name(x)
x <- strsplit(x, ' ')[[1]]
cd <- get.census.data(x)
print(x)
determine.surname(cd)
```

To determine gender:

```r
x <- 'livingston III, Mr. MICHAEL JOHN9'
x <- prep.name(x)
x <- strsplit(x, ' ')[[1]]
cd <- get.census.data(x)
determine.gender(cd)
```
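Since `parse.name` is described as returning a single pipe-delimited string, a caller will usually want to split it back into named fields. The sketch below uses only base R; the field order comes from the example above, while the value of `parsed` is a made-up illustration rather than real package output.

```r
# Hypothetical output string; real values from parse.name() may differ.
parsed <- "MR|MICHAEL|JOHN|LIVINGSTON|III|M|0.99"
fields <- c("salutation", "first", "middle", "last",
            "suffix", "gender", "confidence")

# fixed = TRUE treats "|" literally rather than as regex alternation.
parts <- strsplit(parsed, "|", fixed = TRUE)[[1]]
stopifnot(length(parts) == length(fields))

record <- as.list(setNames(parts, fields))
record$confidence <- as.numeric(record$confidence)
str(record)
```

Note that `strsplit` drops a trailing empty field, so a result whose last field is blank would fail the length check and need padding first.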

Contributors

Michael Flanigan
email: mick.flanigan@gmail.com
twitter: @mjfii

Versioning

0.0.0.9000 - Initial deployment (2017-02-10)

Owner

Name: Michael Flanigan
Login: mjfii
Kind: user
Location: Denver, CO

Repositories: 2
Profile: https://github.com/mjfii

Data [Strategy | Architecture | Engineering] Professional

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Name & Gender Parser
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Michael
    family-names: Flanigan
    email: mick.flanigan@icloud.com
    orcid: 'https://orcid.org/0009-0004-6247-6538'
repository-code: 'https://github.com/mjfii/R-NameParser-Lib'
abstract: >-
  This R package, name.parser, uses U.S. Census data to
  parse full names of individuals by identifying surnames,
  stripping salutations and suffixes, and processing common
  naming conventions. Additionally, the names are evaluated
  for gender and the confidence level of that determination.
keywords:
  - R
  - R Library
  - Name Parse
  - Gender Parse
license: AGPL-3.0
commit: 78d6a5c6782b81385866bff891137c6596cce4b4
version: 0.0.0.9000
date-released: '2017-02-10'
