Nominally
Nominally: A Name Parser for Record Linkage - Published in JOSS (2021)
Science Score: 100.0%
This score indicates how likely this project is to be science-related based on various indicators:
✓ CITATION.cff file
Found CITATION.cff file
✓ codemeta.json file
Found codemeta.json file
✓ .zenodo.json file
Found .zenodo.json file
✓ DOI references
Found 1 DOI reference(s) in JOSS metadata
✓ Academic publication links
Links to: joss.theoj.org
✓ Committers with academic emails
1 of 11 committers (9.1%) from academic institutions
○ Institutional organization owner
✓ JOSS paper metadata
Published in Journal of Open Source Software
Repository
A maximum-strength name parser for record linkage.
Statistics
- Stars: 38
- Watchers: 3
- Forks: 1
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
nominally: a maximum-strength name parser for record linkage
🔗 Names
Nominally simplifies and parses a personal name written in Western name order into six core fields: title, first, middle, last, suffix, and nickname.
Typically, nominally is used to parse entire lists or pd.Series of names en masse. This package includes a command line tool to parse a single name for convenient one-off testing and examples.
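Parsing en masse amounts to applying a single-name parser to every element and collecting the six fields as columns. The sketch below assumes only a callable that maps one raw name to a dict keyed by the six field names; parse_many and stub_parser are hypothetical helpers for illustration, not part of nominally.

```python
from typing import Callable, Dict, Iterable, List

# The six core fields, in the order the README lists them.
FIELDS = ("title", "first", "middle", "last", "suffix", "nickname")

Parser = Callable[[str], Dict[str, str]]


def parse_many(names: Iterable[str], parser: Parser) -> Dict[str, List[str]]:
    """Apply `parser` to each raw name and gather one list (column) per field.

    Missing keys become empty strings so every column has the same length.
    """
    columns: Dict[str, List[str]] = {field: [] for field in FIELDS}
    for raw in names:
        parsed = parser(raw)
        for field in FIELDS:
            columns[field].append(parsed.get(field, ""))
    return columns


def stub_parser(raw: str) -> Dict[str, str]:
    """Hypothetical stand-in for a real parser: handles 'Last, First' input only."""
    last, _, first = raw.partition(",")
    return {"first": first.strip().lower(), "last": last.strip().lower()}


if __name__ == "__main__":
    cols = parse_many(["Vimes, Samuel", "Ramkin, Sybil"], stub_parser)
    print(cols["last"])  # ['vimes', 'ramkin']
```

With nominally installed, its parse_name function would presumably be passed in place of the stub, and if pandas is in use the resulting dict of equal-length lists can be handed directly to pd.DataFrame.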
Human names can be difficult to work with in data. Varying quality and practices across institutions and datasets introduce noise and cause misrepresentation, increasing linkage and deduplication challenges. Common errors and discrepancies include (and this list is by no means exhaustive):
- Arbitrarily split first and middle names.
- Misplaced prefixes of last names such as "van" and "de la."
- Multiple last names partitioned into middle name fields.
- Titles and suffixes variously recorded in different fields, with or without separators.
- Inconsistent capture of accents, the ʻokina, and other non-ASCII characters.
- Single name fields arbitrarily concatenating name parts.
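One common way to neutralize inconsistently captured accents and marks such as the ʻokina is to fold text to ASCII before comparison. nominally lists unidecode as a dependency; the sketch below instead uses only the standard library's unicodedata module (a swapped-in technique, not nominally's implementation) to decompose characters and drop anything without an ASCII form.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Fold text to ASCII for comparison purposes.

    NFKD splits accented letters into a base letter plus combining marks
    (e.g. 'é' -> 'e' + U+0301); encoding with errors='ignore' then drops the
    marks and any other non-ASCII character, such as the ʻokina (U+02BB).
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for raw in ["José Núñez", "Hawaiʻi", "Zoë"]:
        print(ascii_fold(raw))  # Jose Nunez, Hawaii, Zoe
```

This is lossier than transliteration: characters with no decomposition, such as "ß" or "ø", are simply dropped rather than mapped to "ss" or "o", which is one reason a transliteration library like unidecode is preferable in practice.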
Nominally produces fields intended for comparisons between or within datasets.
As such, names come out formatted for data without regard to human syntactic preference: "de von ausfern, mr johann g" rather than "Mr. Johann G. de von Ausfern".
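Because every parsed name carries the same six lowercase fields, two records can be compared field by field. The sketch below is a hypothetical agreement check, not part of nominally, run on two made-up records: it treats an empty field as unknown rather than as a mismatch, since one source often omits a title or middle name that another records.

```python
from typing import Dict, List

FIELDS = ("title", "first", "middle", "last", "suffix", "nickname")


def compare_fields(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort the six fields into 'agree', 'disagree', and 'unknown'.

    A field is 'unknown' when either side is empty; otherwise it agrees only
    on an exact string match of the already-normalized values.
    """
    result: Dict[str, List[str]] = {"agree": [], "disagree": [], "unknown": []}
    for field in FIELDS:
        left, right = a.get(field, ""), b.get(field, "")
        if not left or not right:
            result["unknown"].append(field)
        elif left == right:
            result["agree"].append(field)
        else:
            result["disagree"].append(field)
    return result


if __name__ == "__main__":
    # Hypothetical records, written in the lowercase style shown above.
    one = {"title": "mr", "first": "johann", "middle": "g", "last": "de von ausfern"}
    two = {"title": "", "first": "johann", "middle": "gerhard", "last": "de von ausfern"}
    print(compare_fields(one, two))
```

A real linkage rule would likely also treat an initial such as "g" as compatible with "gerhard"; exact matching is kept here for brevity.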
📜 Documentation
Full nominally documentation is maintained on ReadTheDocs: https://nominally.readthedocs.io/en/latest/
⛏️ Installation
Releases of nominally are distributed on PyPI, so the recommended approach is to install via pip:
$ python -m pip install -U nominally
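To confirm which release pip installed, the standard library's importlib.metadata module (Python 3.8+) can report a distribution's version. The helper name installed_version below is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "nominally") -> Optional[str]:
    """Return the installed version string of `dist`, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "nominally is not installed")
```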
📓 Getting Started
Call parse_name() to parse out the six core fields:
```
$ python -q
>>> from nominally import parse_name
>>> parse_name("Vimes, jr, Mr. Samuel 'Sam'")
{
    'title': 'mr',
    'first': 'samuel',
    'middle': '',
    'last': 'vimes',
    'suffix': 'jr',
    'nickname': 'sam'
}
```
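Since parse_name() returns a plain dict keyed by the six field names, its output drops straight into standard tabular tools. As a sketch, the standard library's csv.DictWriter can write such dicts as rows; the record below is a hard-coded copy of the output shown above rather than a live call, and to_csv is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Iterable

FIELDS = ["title", "first", "middle", "last", "suffix", "nickname"]


def to_csv(records: Iterable[Dict[str, str]]) -> str:
    """Serialize parsed-name dicts to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    parsed = {  # copied from the parse_name() output above
        "title": "mr", "first": "samuel", "middle": "",
        "last": "vimes", "suffix": "jr", "nickname": "sam",
    }
    print(to_csv([parsed]), end="")
    # title,first,middle,last,suffix,nickname
    # mr,samuel,,vimes,jr,sam
```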
Dive into the Name class to parse out a reformatted string...
```
>>> from nominally import Name
>>> n = Name("Vimes, jr, Mr. Samuel 'Sam'")
>>> n
Name({
    'title': 'mr',
    'first': 'samuel',
    'middle': '',
    'last': 'vimes',
    'suffix': 'jr',
    'nickname': 'sam'
})
>>> str(n)
'vimes, mr samuel (sam) jr'
```
...or use the dict...
```
>>> dict(n)
{
    'title': 'mr',
    'first': 'samuel',
    'middle': '',
    'last': 'vimes',
    'suffix': 'jr',
    'nickname': 'sam'
}
>>> list(n.values())
['mr', 'samuel', '', 'vimes', 'jr', 'sam']
```
...or retrieve a more elaborate set of attributes...
```
>>> n.report()
{
    'raw': "Vimes, jr, Mr. Samuel 'Sam'",
    'cleaned': {'jr', 'sam', 'vimes, mr samuel'},
    'parsed': 'vimes, mr samuel (sam) jr',
    'list': ['mr', 'samuel', '', 'vimes', 'jr', 'sam'],
    'title': 'mr',
    'first': 'samuel',
    'middle': '',
    'last': 'vimes',
    'suffix': 'jr',
    'nickname': 'sam'
}
```
...or capture individual attributes.
```
>>> n.first
'samuel'
>>> n['last']
'vimes'
>>> n.get('suffix')
'jr'
>>> n.raw
"Vimes, jr, Mr. Samuel 'Sam'"
```
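The examples above show a Name object that answers to attribute access, item access, .get(), dict(), and str(). The class below is a hypothetical sketch of how such an interface can be built on collections.abc.Mapping; it is not nominally's implementation, and its string layout (last, title first middle (nickname) suffix) is inferred only from the two formatted examples in this README.

```python
from collections.abc import Mapping
from typing import Dict, Iterator

FIELDS = ("title", "first", "middle", "last", "suffix", "nickname")


class ParsedName(Mapping):
    """Read-only mapping over the six fields, with attribute access as well."""

    def __init__(self, raw: str, fields: Dict[str, str]) -> None:
        self.raw = raw
        # Keep a fixed key order and default missing fields to "".
        self._fields = {f: fields.get(f, "") for f in FIELDS}

    # Mapping needs only these three; it then supplies get(), keys(),
    # values(), items(), and __contains__, and makes dict(obj) work.
    def __getitem__(self, key: str) -> str:
        return self._fields[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._fields)

    def __len__(self) -> int:
        return len(self._fields)

    def __getattr__(self, name: str) -> str:
        # Only called when normal lookup fails, e.g. for n.first.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]
        raise AttributeError(name)

    def __repr__(self) -> str:
        return f"ParsedName({self._fields!r})"

    def __str__(self) -> str:
        # Inferred layout: "last, title first middle (nickname) suffix",
        # skipping empty parts.
        given = " ".join(p for p in (self["title"], self["first"], self["middle"]) if p)
        nick = f"({self['nickname']})" if self["nickname"] else ""
        tail = " ".join(p for p in (given, nick, self["suffix"]) if p)
        return f"{self['last']}, {tail}" if self["last"] else tail


if __name__ == "__main__":
    n = ParsedName(
        "Vimes, jr, Mr. Samuel 'Sam'",
        {"title": "mr", "first": "samuel", "last": "vimes", "suffix": "jr", "nickname": "sam"},
    )
    print(str(n))                                  # vimes, mr samuel (sam) jr
    print(n.first, n["last"], n.get("suffix"))     # samuel vimes jr
```

Subclassing Mapping keeps the object read-only and gives the dict-style helpers for free, while __getattr__ adds the attribute-style access on top.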
🖥️ Command Line
For a quick report, invoke the nominally command line tool:
$ nominally "Vimes, jr, Mr. Samuel 'Sam'"
raw: Vimes, jr, Mr. Samuel 'Sam'
cleaned: {'jr', 'vimes, mr samuel', 'sam'}
parsed: vimes, mr samuel (sam) jr
list: ['mr', 'samuel', '', 'vimes', 'jr', 'sam']
title: mr
first: samuel
middle:
last: vimes
suffix: jr
nickname: sam
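The command line report is a series of "key: value" lines, so it can be consumed from scripts. A minimal sketch, assuming the output layout shown above and using subprocess from the standard library; parse_report and run_cli are hypothetical helpers, and every value comes back as a string (the cleaned and list lines are Python reprs and are not parsed further here).

```python
import subprocess
from typing import Dict


def parse_report(text: str) -> Dict[str, str]:
    """Split each line at its first ':' into a key and a stripped value.

    A line such as 'middle:' with nothing after the colon maps to ''.
    """
    report: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            report[key.strip()] = value.strip()
    return report


def run_cli(name: str) -> Dict[str, str]:
    """Run the nominally CLI on one name (requires nominally on PATH)."""
    out = subprocess.run(
        ["nominally", name], capture_output=True, text=True, check=True
    )
    return parse_report(out.stdout)
```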
🔬 Worked Examples
Binder hosts live Jupyter notebooks walking through examples of nominally.
These notebooks and additional examples reside in the Nominally Examples repository.
👩💻 Community
Interested in helping to improve nominally? Please see CONTRIBUTING.md.
CONTRIBUTING.md also includes directions to run tests, using a clone of the full repository.
Having problems with nominally? Need help or support? Feel free to open an issue here on GitHub, or contact me via email or Twitter (see my profile for links).
🧙 Author
💡 Acknowledgements
Nominally started as a fork of the python-nameparser package, and has benefitted considerably from this origin, especially from the wealth of examples and tests developed for python-nameparser.
Owner
- Name: Matt VanEseltine
- Login: vaneseltine
- Kind: user
- Location: Ann Arbor, MI
- Company: University of Michigan
- Website: https://iris.isr.umich.edu/
- Twitter: vaneseltine
- Repositories: 5
- Profile: https://github.com/vaneseltine
Sociologist, Programmer, Data Enthusiast @iris-umetrics
JOSS Publication
Nominally: A Name Parser for Record Linkage
Tags
data science, record linkage, entity resolution, name parsing
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "VanEseltine"
    given-names: "Matthew"
    orcid: "https://orcid.org/0000-0002-9520-1360"
title: "Nominally: A Name Parser for Record Linkage"
version: 1.10
doi: 10.5281/zenodo.5562628
date-released: 2021-10-11
url: "https://github.com/vaneseltine/nominally"
preferred-citation:
  type: article
  authors:
    - family-names: "VanEseltine"
      given-names: "Matthew"
      orcid: "https://orcid.org/0000-0002-9520-1360"
  doi: "10.21105/joss.03440"
  journal: "Journal of Open Source Software"
  month: 10
  start: 3440 # First page number
  title: "Nominally: A Name Parser for Record Linkage"
  issue: 66
  volume: 6
  year: 2021
GitHub Events
Total
- Watch event: 8
- Push event: 4
- Pull request event: 3
- Fork event: 1
- Create event: 3
Last Year
- Watch event: 8
- Push event: 7
- Pull request event: 5
- Fork event: 1
- Create event: 3
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Matt VanEseltine | m****n@u****u | 330 |
| Derek Gulbranson | d****3@g****m | 275 |
| Peter Scott | p****r@g****m | 4 |
| Edward Betts | e****d@4****m | 4 |
| abnerjacobsen | a****r@a****r | 2 |
| snyk-bot | s****t@s****o | 1 |
| TyVik | t****8@g****m | 1 |
| Simeon Visser | s****7@g****m | 1 |
| Kelvin S. do Prado | k****w@g****m | 1 |
| Brian S. Corbin | c****s@g****m | 1 |
| Aru Sahni | a****i@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 56
- Total pull requests: 11
- Average time to close issues: about 2 months
- Average time to close pull requests: 5 months
- Total issue authors: 3
- Total pull request authors: 2
- Average comments per issue: 0.7
- Average comments per pull request: 0.18
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 1
- Pull requests: 6
- Average time to close issues: N/A
- Average time to close pull requests: about 1 month
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- vaneseltine (53)
- rjalexa (2)
- sap218 (1)
Pull Request Authors
- vaneseltine (8)
- dependabot[bot] (3)
Dependencies
- unidecode ==1.2
- PyYAML ==5.4 development
- black ==21.5b1 development
- coveralls ==3.0.1 development
- flake8 ==3.9.2 development
- flake8-2020 ==1.6.0 development
- flake8-isort ==4.0.0 development
- mypy ==0.812 development
- nox ==2020.12.31 development
- pylint ==2.8.2 development
- twine ==3.4.1 development
- wheel ==0.36.2 development
- Sphinx ==2.2.0
- doc8 ==0.8.0
- coverage ==5.5 test
- hypothesis ==6.13.0 test
- pytest ==6.2.4 test
- pytest-randomly ==3.8.0 test
- nox *
