earnest

Statistical baby name preference learning

https://github.com/captainpete/earnest

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: captainpete
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 294 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago

README.md

Earnest

A quick baby-name preference learning app for unabashed data science nerds.

  • Uses the names dataset from https://pypi.org/project/names-dataset/
  • Converts each name into an 875-wide feature frame composed of:
    • 1: male prevalence
    • 1: female prevalence
    • 105: country prevalence
    • 768: embedding vector
  • Learns a ranking by soliciting preferences in repeated 1-vs-20 rounds
  • Displays the top and bottom 50 names
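
As a rough illustration of the 875-wide layout listed above, here is a minimal sketch that concatenates the four feature groups into one row. The function name `build_feature_row` and the column order are assumptions made for this example, not taken from the repository.

```python
from typing import Sequence

N_COUNTRIES = 105  # country prevalence columns, per the list above
EMBED_DIM = 768    # embedding vector width, per the list above


def build_feature_row(
    male: float,
    female: float,
    country: Sequence[float],
    embedding: Sequence[float],
) -> list[float]:
    """Concatenate the four feature groups into one 875-wide row.

    Hypothetical helper: the column order here is an assumption.
    """
    if len(country) != N_COUNTRIES:
        raise ValueError(f"expected {N_COUNTRIES} country values, got {len(country)}")
    if len(embedding) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} embedding values, got {len(embedding)}")
    return [male, female, *country, *embedding]


# Placeholder values only; real rows would come from the names dataset
# and an embedding model.
row = build_feature_row(0.98, 0.02, [0.0] * N_COUNTRIES, [0.0] * EMBED_DIM)
print(len(row))  # 875
```

The width check is just 1 + 1 + 105 + 768 = 875.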

More on preferences

This is an Active Learning approach. Active Learning is useful when labelling is expensive, but it can be prone to feedback loops depending on how the iterations are constructed. This app presents three columns of names, here referred to as A, B and C, from which the user selects the single most preferred name for the round.

  • Column A is sampled from the current top 200 names
  • Column B is sampled from the next 800 (200:1000) names
  • Column C is sampled uniformly at random
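
The three-column sampling above could look something like the sketch below. It assumes the names are held in a list sorted best-first by the current model; the function name `sample_round` and the `per_column` size are hypothetical, since the column length is not stated here.

```python
import random


def sample_round(ranked_names, per_column=7, rng=None):
    """Return (A, B, C) columns of names from a best-first ranking.

    per_column is an assumed value, not taken from the app.
    """
    rng = rng or random.Random()
    top = ranked_names[:200]         # column A pool: current top 200
    middle = ranked_names[200:1000]  # column B pool: the next 800
    col_a = rng.sample(top, min(per_column, len(top)))
    col_b = rng.sample(middle, min(per_column, len(middle)))
    # Column C draws from every name, so it may repeat a name from A or B.
    col_c = rng.sample(ranked_names, min(per_column, len(ranked_names)))
    return col_a, col_b, col_c


# Toy ranking of placeholder names, seeded for repeatability.
ranked = [f"name{i}" for i in range(5000)]
col_a, col_b, col_c = sample_round(ranked, rng=random.Random(0))
```

Mixing a top pool, a middle pool, and a uniform pool is one way to balance exploiting the current ranking against exploring names the model has not yet seen preferred.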

Clicking a name records the preference, retrains the model, and presents the user with a new set of names. A search box under the columns allows selection of a name not listed (useful for bootstrapping the model), and a button filled with flowers simply resamples using the current model.
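
The model behind the retraining step is not described here, so the following is only one possible sketch, not the app's actual method. It assumes a linear scorer trained on pairwise differences (the chosen name's features minus each unchosen name's features) with a logistic loss. All names in it (`pairwise_examples`, `fit_linear_ranker`, `score`) are hypothetical.

```python
import math


def pairwise_examples(chosen, rejected):
    """Turn one round into difference vectors: chosen minus each rejected."""
    return [[c - r for c, r in zip(chosen, other)] for other in rejected]


def fit_linear_ranker(diffs, epochs=200, lr=0.1):
    """Fit weights w so that w . diff tends to be positive (logistic loss)."""
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            margin = max(-30.0, min(30.0, margin))  # keep exp() well-behaved
            # Gradient step on log(1 + exp(-margin)): move w along d,
            # scaled by how badly this pair is currently ordered.
            g = 1.0 / (1.0 + math.exp(margin))
            w = [wi + lr * g * di for wi, di in zip(w, d)]
    return w


def score(w, x):
    """Linear preference score; higher means more preferred."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Toy two-feature round: one chosen name against two unchosen names.
chosen = [1.0, 0.0]
rejected = [[0.0, 1.0], [0.2, 0.8]]
w = fit_linear_ranker(pairwise_examples(chosen, rejected))
ranked_scores = sorted([chosen, *rejected], key=lambda x: score(w, x), reverse=True)
```

In a real round the vectors would be the 875-wide feature rows, and the difference examples from all rounds so far would be pooled before refitting.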

Iterative results

In my experience, this model learns a preference within a few rounds. The features seem to be quite informative, specifically the embedding vector, which encodes all manner of historical, literary, and cultural associations. Results are displayed after each round.

Bias

If your name is in "the worst" list, please don't take it personally! This model is designed to uncover your preference; mine was clearly very Victorian. It's also worth noting that the nomic embedding model used here will have a bias that reflects its training data.

This project is called Earnest after the preferences of Gwendolen and Cecily, who preferred that name over Jack or John. But what did this say about their preference for other names?

Running

Support

  • The code is pretty straightforward. Best of luck!

Owner

  • Name: Peter Hollows
  • Login: captainpete
  • Kind: user
  • Location: Melbourne, Australia

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: earnest
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Peter
    family-names: Hollows
    email: github@dojo7.com
repository-code: 'https://github.com/captainpete/earnest'
license: MIT

GitHub Events

Total
  • Watch event: 4
  • Push event: 1
  • Create event: 2
Last Year
  • Watch event: 4
  • Push event: 1
  • Create event: 2