Nominally

Nominally: A Name Parser for Record Linkage - Published in JOSS (2021)

https://github.com/vaneseltine/nominally

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in JOSS metadata
✓ Academic publication links: Links to joss.theoj.org
✓ Committers with academic emails: 1 of 11 committers (9.1%) from academic institutions
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords

data-matching data-science deduplication entity-resolution human-name parser parsing record-linkage

Last synced: 6 months ago

Repository

A maximum-strength name parser for record linkage.

Basic Info

Host: GitHub
Owner: vaneseltine
License: agpl-3.0
Language: Python
Default Branch: master
Homepage:
Size: 1.1 MB

Statistics

Stars: 38
Watchers: 3
Forks: 1
Open Issues: 5
Releases: 0

Created over 6 years ago · Last pushed 6 months ago

Metadata Files

Readme, Contributing, License, Code of conduct, Citation

nominally: a maximum-strength name parser for record linkage

🔗 Names

Nominally simplifies and parses a personal name written in Western name order into six core fields: title, first, middle, last, suffix, and nickname.

Typically, nominally is used to parse entire lists or pd.Series of names en masse. This package includes a command line tool to parse a single name for convenient one-off testing and examples.
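Batch parsing can be sketched as a loop that applies a parser to each name and reuses results for repeated strings. The helper `parse_all` and the `stand_in_parser` below are hypothetical names introduced only for illustration; the stand-in is a trivial placeholder and does not reproduce nominally's parsing.

```python
from typing import Callable, Dict, Iterable, List


def parse_all(
    names: Iterable[str],
    parser: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Parse every name, calling `parser` only once per distinct raw string."""
    cache: Dict[str, Dict[str, str]] = {}
    results = []
    for raw in names:
        if raw not in cache:
            cache[raw] = parser(raw)
        # Duplicate inputs share the same cached dict object.
        results.append(cache[raw])
    return results


def stand_in_parser(raw: str) -> Dict[str, str]:
    """Hypothetical placeholder, NOT nominally's logic: splits 'Last, First'."""
    last, _, first = raw.lower().partition(",")
    return {"first": first.strip(), "last": last.strip()}


names = ["Vimes, Samuel", "Ramkin, Sybil", "Vimes, Samuel"]
parsed = parse_all(names, stand_in_parser)
print(parsed[0])  # {'first': 'samuel', 'last': 'vimes'}
```

With nominally installed, passing `parse_name` in place of the stand-in should fit the same shape, and for a pandas Series the usual idiom would be `series.map(parse_name)`.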

Human names can be difficult to work with in data. Varying quality and practices across institutions and datasets introduce noise and cause misrepresentation, increasing linkage and deduplication challenges. Common errors and discrepancies include, among others:

Arbitrarily split first and middle names.
Misplaced prefixes of last names such as "van" and "de la."
Multiple last names partitioned into middle name fields.
Titles and suffixes variously recorded in different fields, with or without separators.
Inconsistent capture of accents, the ʻokina, and other non-ASCII characters.
Single name fields arbitrarily concatenating name parts.
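To illustrate the accent problem in the list above, here is a minimal sketch of why inconsistent accent capture breaks exact matching and how folding accents restores it. It uses the standard library's `unicodedata` as a stand-in; `fold_accents` is a hypothetical helper and is not claimed to be how nominally cleans its input.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Drop combining accent marks and lowercase (illustrative only)."""
    # NFKD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


# The same person recorded with and without accents.
print("José Núñez" == "Jose Nunez")                              # False
print(fold_accents("José Núñez") == fold_accents("Jose Nunez"))  # True
```

Characters such as the ʻokina are letters rather than combining marks, so this sketch passes them through unchanged; handling them needs a separate decision.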

Nominally produces fields intended for comparisons between or within datasets. As such, names come out formatted for data without regard to human syntactic preference: de von ausfern, mr johann g rather than Mr. Johann G. de von Ausfern.
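As a rough illustration of why uniform fields help comparison, the sketch below checks two parsed records field by field. The dictionaries are written by hand in the six-field shape described above rather than produced by the library, and `agreement` and `is_candidate_match` are hypothetical helpers.

```python
from typing import Dict, Tuple

# Hand-written records in the six-field output shape (not library output).
record_a = {"title": "mr", "first": "samuel", "middle": "",
            "last": "vimes", "suffix": "jr", "nickname": "sam"}
record_b = {"title": "", "first": "samuel", "middle": "",
            "last": "vimes", "suffix": "jr", "nickname": ""}

# Assumption: title and nickname are ignored for this comparison.
KEY_FIELDS: Tuple[str, ...] = ("first", "middle", "last", "suffix")


def agreement(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, bool]:
    """Report, per key field, whether the two records hold the same value."""
    return {field: a[field] == b[field] for field in KEY_FIELDS}


def is_candidate_match(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Treat records as a candidate match when every key field agrees."""
    return all(agreement(a, b).values())


print(agreement(record_a, record_b))
print(is_candidate_match(record_a, record_b))  # True
```

Two empty fields count as agreeing here; a real linkage pipeline would likely treat missing values separately and use fuzzier comparisons.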

📜 Documentation

Full nominally documentation is maintained on ReadTheDocs: https://nominally.readthedocs.io/en/latest/

⛏️ Installation

Releases of nominally are distributed on PyPI, so the recommended approach is to install via pip:

$ python -m pip install -U nominally

📓 Getting Started

Call parse_name() to parse out the six core fields:

```
$ python -q
>>> from nominally import parse_name
>>> parse_name("Vimes, jr, Mr. Samuel 'Sam'")
{
  'title': 'mr',
  'first': 'samuel',
  'middle': '',
  'last': 'vimes',
  'suffix': 'jr',
  'nickname': 'sam'
}
```

Dive into the Name class to parse out a reformatted string...

```
>>> from nominally import Name
>>> n = Name("Vimes, jr, Mr. Samuel 'Sam'")
>>> n
Name({
  'title': 'mr',
  'first': 'samuel',
  'middle': '',
  'last': 'vimes',
  'suffix': 'jr',
  'nickname': 'sam'
})
>>> str(n)
'vimes, mr samuel (sam) jr'
```

...or use the dict...

```
>>> dict(n)
{
  'title': 'mr',
  'first': 'samuel',
  'middle': '',
  'last': 'vimes',
  'suffix': 'jr',
  'nickname': 'sam'
}
>>> list(n.values())
['mr', 'samuel', '', 'vimes', 'jr', 'sam']
```

...or retrieve a more elaborate set of attributes...

```
>>> n.report()
{
  'raw': "Vimes, jr, Mr. Samuel 'Sam'",
  'cleaned': {'jr', 'sam', 'vimes, mr samuel'},
  'parsed': 'vimes, mr samuel (sam) jr',
  'list': ['mr', 'samuel', '', 'vimes', 'jr', 'sam'],
  'title': 'mr',
  'first': 'samuel',
  'middle': '',
  'last': 'vimes',
  'suffix': 'jr',
  'nickname': 'sam'
}
```

...or capture individual attributes.

```
>>> n.first
'samuel'
>>> n['last']
'vimes'
>>> n.get('suffix')
'jr'
>>> n.raw
"Vimes, jr, Mr. Samuel 'Sam'"
```

🖥️ Command Line

For a quick report, invoke the nominally command line tool:

    $ nominally "Vimes, jr, Mr. Samuel 'Sam'"
    raw: Vimes, jr, Mr. Samuel 'Sam'
    cleaned: {'jr', 'vimes, mr samuel', 'sam'}
    parsed: vimes, mr samuel (sam) jr
    list: ['mr', 'samuel', '', 'vimes', 'jr', 'sam']
    title: mr
    first: samuel
    middle:
    last: vimes
    suffix: jr
    nickname: sam

🔬 Worked Examples

Binder hosts live Jupyter notebooks walking through examples of nominally.

These notebooks and additional examples reside in the Nominally Examples repository.

👩‍💻 Community

Interested in helping to improve nominally? Please see CONTRIBUTING.md.

CONTRIBUTING.md also includes directions to run tests, using a clone of the full repository.

Having problems with nominally? Need help or support? Feel free to open an issue here on GitHub, or contact me via email or Twitter (see my profile for links).

🧙‍ Author

💡 Acknowledgements

Nominally started as a fork of the python-nameparser package, and has benefitted considerably from this origin, especially the wealth of examples and tests developed for python-nameparser.

Owner

Name: Matt VanEseltine
Login: vaneseltine
Kind: user
Location: Ann Arbor, MI
Company: University of Michigan

Website: https://iris.isr.umich.edu/
Twitter: vaneseltine
Repositories: 5
Profile: https://github.com/vaneseltine

Sociologist, Programmer, Data Enthusiast @iris-umetrics

JOSS Publication

Nominally: A Name Parser for Record Linkage

Published

October 12, 2021

DOI

10.21105/joss.03440

Volume 6, Issue 66, Page 3440

Authors

Matthew VanEseltine

Institute for Social Research, University of Michigan

Editor

Mark A. Jensen

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "VanEseltine"
  given-names: "Matthew"
  orcid: "https://orcid.org/0000-0002-9520-1360"
title: "Nominally: A Name Parser for Record Linkage"
version: 1.10
doi: 10.5281/zenodo.5562628
date-released: 2021-10-11
url: "https://github.com/vaneseltine/nominally"
preferred-citation:
  type: article
  authors:
  - family-names: "VanEseltine"
    given-names: "Matthew"
    orcid: "https://orcid.org/0000-0002-9520-1360"
  doi: "10.21105/joss.03440"
  journal: "Journal of Open Source Software"
  month: 10
  start: 3440 # First page number
  title: "Nominally: A Name Parser for Record Linkage"
  issue: 66
  volume: 6
  year: 2021

GitHub Events

Total

Watch event: 8
Push event: 4
Pull request event: 3
Fork event: 1
Create event: 3

Last Year

Watch event: 8
Push event: 7
Pull request event: 5
Fork event: 1
Create event: 3

Committers

Last synced: 7 months ago

All Time

Total Commits: 621
Total Committers: 11
Avg Commits per committer: 56.455
Development Distribution Score (DDS): 0.469

Past Year

Commits: 2
Committers: 2
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.5

Top Committers

Name	Email	Commits
Matt VanEseltine	m**n@u**u	330
Derek Gulbranson	d**3@g**m	275
Peter Scott	p**r@g**m	4
Edward Betts	e**d@4**m	4
abnerjacobsen	a**r@a**r	2
snyk-bot	s**t@s**o	1
TyVik	t**8@g**m	1
Simeon Visser	s**7@g**m	1
Kelvin S. do Prado	k**w@g**m	1
Brian S. Corbin	c**s@g**m	1
Aru Sahni	a**i@g**m	1
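The all-time summary figures can be reproduced from the commit table above. This is a minimal sketch, assuming the Development Distribution Score is one minus the top committer's share of all commits; that definition is not stated on this page.

```python
# Commit counts copied from the Top Committers table.
commits = {
    "Matt VanEseltine": 330,
    "Derek Gulbranson": 275,
    "Peter Scott": 4,
    "Edward Betts": 4,
    "abnerjacobsen": 2,
    "snyk-bot": 1,
    "TyVik": 1,
    "Simeon Visser": 1,
    "Kelvin S. do Prado": 1,
    "Brian S. Corbin": 1,
    "Aru Sahni": 1,
}

total_commits = sum(commits.values())
committers = len(commits)
avg_per_committer = total_commits / committers

# Assumed definition: DDS = 1 - (top committer's commits / total commits).
dds = 1 - max(commits.values()) / total_commits

print(total_commits)                # 621
print(committers)                   # 11
print(round(avg_per_committer, 3))  # 56.455
print(round(dds, 3))                # 0.469
```

Under the same assumption, the past-year figures (2 commits from 2 committers) give 1 - 1/2 = 0.5, matching the reported value.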

Committer Domains (Top 20 + Academic)

snyk.io: 1
apoana.com.br: 1
4angle.com: 1
greplin.com: 1
umich.edu: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 56
Total pull requests: 11
Average time to close issues: about 2 months
Average time to close pull requests: 5 months
Total issue authors: 3
Total pull request authors: 2
Average comments per issue: 0.7
Average comments per pull request: 0.18
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 3

Past Year

Issues: 1
Pull requests: 6
Average time to close issues: N/A
Average time to close pull requests: about 1 month
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

vaneseltine (53)
rjalexa (2)
sap218 (1)

Pull Request Authors

vaneseltine (8)
dependabot[bot] (3)

Top Labels

Issue Labels

test__issues.py (12)
enhancement (10)
bug (9)
documentation (3)
refactor (3)
test_needed (3)
wontfix (1)
ci/cd (1)
maybe (1)

Pull Request Labels

dependencies (3)

Dependencies

requirements/common.txt pypi

unidecode ==1.2

requirements/dev.txt pypi

PyYAML ==5.4 development
black ==21.5b1 development
coveralls ==3.0.1 development
flake8 ==3.9.2 development
flake8-2020 ==1.6.0 development
flake8-isort ==4.0.0 development
mypy ==0.812 development
nox ==2020.12.31 development
pylint ==2.8.2 development
twine ==3.4.1 development
wheel ==0.36.2 development

requirements/docs.txt pypi

Sphinx ==2.2.0
doc8 ==0.8.0

requirements/test.txt pypi

coverage ==5.5 test
hypothesis ==6.13.0 test
pytest ==6.2.4 test
pytest-randomly ==3.8.0 test

pyproject.toml pypi

requirements/circleci.txt pypi

nox *

requirements.txt pypi
