https://github.com/centre-for-humanities-computing/gender-identification

Code and pipeline for gender identification based on names.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: centre-for-humanities-computing
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 6.84 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme License

README.md

gender-identification

Code and pipeline for gender identification based on names. The repo contains a CLI and a package for easily adding a gender column to tabular data.

Usage

Install the package:

```bash
pip install gender-identification
```

If you have tabular data in CSV, TSV or JSONL format, you can add a gender and a gender_confidence column to it using the CLI:

```bash
python3 -m gender_identification data.csv --name_column "first_name"
```

By default the input file is overwritten. To save the result to a different file instead, pass an output path with -o:

```bash
python3 -m gender_identification data.csv --name_column "first_name" -o results.csv
```
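The CLI calls above can also be assembled from a script, for example when processing several files. The following is a minimal sketch using only the standard library: it writes a small input file with a first_name column and builds the argument list for the command. The helper names `write_sample_csv` and `build_command` are hypothetical and not part of the package, and the call itself is left commented out because it requires the package to be installed.

```python
import csv
import sys
import tempfile
from pathlib import Path
from typing import List, Optional


def write_sample_csv(path: Path) -> None:
    """Write a tiny CSV with a single `first_name` column (illustrative data)."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["first_name"])
        writer.writerows([["Peter"], ["Malte"]])


def build_command(
    in_file: str, name_column: str, out_file: Optional[str] = None
) -> List[str]:
    """Assemble the CLI call from the README as an argument list."""
    cmd = [
        sys.executable, "-m", "gender_identification",
        in_file, "--name_column", name_column,
    ]
    # Without -o the README states that the input file is overwritten.
    if out_file is not None:
        cmd += ["-o", out_file]
    return cmd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"
        write_sample_csv(data)
        cmd = build_command(str(data), "first_name", str(Path(tmp) / "results.csv"))
        print(cmd)
        # To actually run it (needs the package installed):
        # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.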

You can also use the package directly in Python:

```python
import pandas as pd
from gender_identification import add_gender

df = pd.DataFrame({"name": ["Peter Jørgensen", "Malte Larsen"]})

# remove_last_name=True drops last names before prediction
df = add_gender(df, name_column="name", remove_last_name=True)
```

Parameters

| Parameter | Flag(s) | Description | Default Value |
|-----------|---------|-------------|---------------|
| `in_file` | | Input file path. | - |
| `name_column` | `--name_column`, `-n` | Column where names are contained. | - |
| `out_file` | `--out_file`, `-o` | Output file path. If not specified, the original file will be overwritten. | None |
| `remove_last_name` | `--remove_last_name`, `-r` | Indicates whether last names should be removed. | False |
| `drop_confidence` | `--drop_confidence`, `-d` | Indicates whether to drop the column indicating the model's confidence in its predictions. | False |
| `batch_size` | `--batch_size`, `-b` | Size of the batches to do inference in. | 32 |
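To make the `remove_last_name` and `batch_size` parameters more concrete, here is a rough sketch of what such preprocessing could look like. It is an illustration under stated assumptions, not the package's implementation: the README does not show how last names are detected, so the rule used here (drop the final whitespace-separated token) is a guess, and the helpers `strip_last_name` and `batches` are hypothetical names.

```python
from typing import Iterator, List


def strip_last_name(full_name: str) -> str:
    """Drop the final whitespace-separated token (assumed rule, see lead-in)."""
    parts = full_name.split()
    # Single-token names are returned unchanged.
    return " ".join(parts[:-1]) if len(parts) > 1 else full_name.strip()


def batches(items: List[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    names = ["Peter Jørgensen", "Malte Larsen", "Anna"]
    first_names = [strip_last_name(n) for n in names]
    print(first_names)
    # Each chunk would be passed to the model in one inference call.
    for chunk in batches(first_names, batch_size=2):
        print(chunk)
```

A larger batch size generally means fewer model calls at the cost of more memory per call.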

Owner

  • Name: Center for Humanities Computing Aarhus
  • Login: centre-for-humanities-computing
  • Kind: organization
  • Email: chcaa@cas.au.dk
  • Location: Aarhus, Denmark

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

pyproject.toml pypi
  • pandas ^2.0.0
  • python ^3.9
  • radicli 0.0.25
  • tqdm 4.66.0
  • transformers ^4.41.0