https://github.com/appeler/clean-names

Deduplicate and parse a list of 'dirty names'


Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary

Keywords

firstname lastname
Last synced: 5 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: appeler
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 41 KB
Statistics
  • Stars: 20
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Topics
firstname lastname
Created about 11 years ago · Last pushed over 5 years ago
Metadata Files
Readme

ReadMe.md

Clean Names


The script takes a CSV file with a column (by default 'Name') containing 'dirty names': names in a mix of formats, such as lastname firstname, firstname lastname, or middlename lastname firstname (see the sample input file). It produces a CSV file that has all the columns of the original file plus the following columns: 'uniqid', 'FirstName', 'MiddleInitial/Name', 'LastName', 'RomanNumeral', 'Title', 'Suffix'. By default, the script removes duplicate names (see the sample output file).
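The split into output columns can be illustrated with a small, standard-library-only sketch. It is not the script's own logic (the script depends on nameparser, per its requirements); the word lists, the comma rule, and the `parse_name` helper are hypothetical. It assumes a comma marks "Last, First Middle" order and otherwise reads tokens as "First Middle Last", so unlike the script it makes no attempt to detect comma-less "lastname firstname" input.

```python
# Output columns named in the README ('uniqid' is assigned separately).
COLUMNS = ["FirstName", "MiddleInitial/Name", "LastName",
           "RomanNumeral", "Title", "Suffix"]

# Hypothetical, deliberately short word lists (not the script's).
TITLES = {"mr", "mrs", "ms", "dr", "hon", "rev"}
SUFFIXES = {"jr", "sr", "esq", "md", "phd"}
# Single-letter numerals such as "V" are left out so middle initials survive.
ROMAN_NUMERALS = {"II", "III", "IV", "VI", "VII", "VIII"}


def parse_name(raw: str) -> dict:
    """Split one raw name into the output columns (heuristic sketch)."""
    cols = {key: "" for key in COLUMNS}
    if "," in raw:
        # Assumed convention: "Last, First Middle ..." -> reorder to first-last.
        last_part, _, first_part = raw.partition(",")
        ordered = first_part.split() + last_part.split()
    else:
        # Assumed convention: "First Middle ... Last".
        ordered = raw.split()
    # Drop periods and stray commas, e.g. "J." -> "J".
    tokens = [t.strip(".,") for t in ordered if t.strip(".,")]

    rest = []
    for tok in tokens:
        if tok.lower() in TITLES and not cols["Title"]:
            cols["Title"] = tok
        elif tok.lower() in SUFFIXES:
            cols["Suffix"] = tok
        elif tok.upper() in ROMAN_NUMERALS:
            cols["RomanNumeral"] = tok.upper()
        else:
            rest.append(tok)

    # Remaining tokens: first name, optional middle name(s), last name.
    if rest:
        cols["FirstName"] = rest[0]
    if len(rest) > 1:
        cols["LastName"] = rest[-1]
        cols["MiddleInitial/Name"] = " ".join(rest[1:-1])
    return cols
```

For instance, `parse_name("Smith, John A.")` yields FirstName 'John', MiddleInitial/Name 'A', and LastName 'Smith', the same result as `parse_name("John A. Smith")`.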

Application

The script was used to clean the names in the CF-scores from the Database on Ideology, Money in Politics, and Elections (DIME). The processed database, with clean names, is posted on Harvard DVN (Dataverse Network).

Installation

  1. Clone this repository:

git clone https://github.com/soodoku/clean-names.git

  2. Navigate to the clean-names directory.

  3. Run python setup.py install

Using Clean Names

Usage: process_names.py [options]

Command Line Options

  -h, --help            show this help message and exit
  -o OUTFILE, --out=OUTFILE
                        Output file in CSV (default: sample_output.csv)
  -c COLUMN, --column=COLUMN
                        Column name in CSV that contains Names (default: Name)
  -a, --all             Export all names (do not take duplicate names out)
                        (default: False)
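The help text reads like optparse output. As a hedged sketch of how these options fit together, here is an equivalent parser written with argparse, the standard-library successor to optparse. The positional `infile` argument is inferred from the usage example and is an assumption, as is the `build_parser` name; only the -o, -c, and -a options and their defaults come from the help text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the documented options."""
    parser = argparse.ArgumentParser(prog="process_names.py")
    # Assumed positional argument: the input CSV file.
    parser.add_argument("infile", help="Input CSV file with a column of names")
    parser.add_argument("-o", "--out", dest="outfile", default="sample_output.csv",
                        help="Output file in CSV (default: %(default)s)")
    parser.add_argument("-c", "--column", default="Name",
                        help="Column name in CSV that contains Names (default: %(default)s)")
    # store_true: the flag is False unless -a/--all is given.
    parser.add_argument("-a", "--all", action="store_true",
                        help="Export all names (do not take duplicate names out)")
    return parser
```

With this parser, `build_parser().parse_args(["-a", "sample_input.csv"])` sets `all` to True while `outfile` and `column` keep their defaults.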

Example

 python process_names.py -a sample_input.csv 
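The example passes -a, so every row is exported. A minimal sketch of what the default deduplication and the -a switch might do follows. It assumes two names are duplicates when they contain the same words in any order, ignoring case and punctuation (so 'Smith, John' matches 'John Smith'), and that 'uniqid' numbers each distinct name. Both rules and the `dedup_key`/`process_csv` helpers are hypothetical, not taken from the script; the `column` parameter plays the role of -c.

```python
import csv
import io


def dedup_key(name: str) -> str:
    """Hypothetical duplicate rule: same words, any order, case-insensitive."""
    words = name.lower().replace(".", " ").replace(",", " ").split()
    return " ".join(sorted(words))


def process_csv(text: str, column: str = "Name", keep_all: bool = False) -> list:
    """Add a 'uniqid' column and, unless keep_all is True, drop repeat names."""
    ids = {}   # dedup key -> uniqid
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        key = dedup_key(row[column])
        if key not in ids:
            ids[key] = len(ids) + 1   # first time this name is seen
        elif not keep_all:
            continue                  # default: skip later duplicates
        rows.append({**row, "uniqid": ids[key]})
    return rows


sample = 'Name,State\n"Smith, John",CA\nJohn Smith,CA\nJane Doe,NY\n'
```

Here `process_csv(sample)` keeps two rows ('Smith, John' and 'Jane Doe'), while `process_csv(sample, keep_all=True)` keeps all three, with both John Smith rows sharing uniqid 1.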

License

Scripts are released under the MIT License

Owner

  • Name: appeler
  • Login: appeler
  • Kind: organization

Making sense of names.

GitHub Events

Total
  • Watch event: 4
Last Year
  • Watch event: 4

Dependencies

requirements.txt pypi
  • nameparser ==0.3.10
setup.py pypi
  • nameparser *