https://github.com/centre-for-humanities-computing/gender-identification
Code and pipeline for gender identification based on names.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: centre-for-humanities-computing
- License: MIT
- Language: Python
- Default Branch: main
- Size: 6.84 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
gender-identification
Code and pipeline for gender identification based on names. The repo contains a CLI and a package for easily adding a gender column to tabular data.
Usage
Install the package:
```bash
pip install gender-identification
```
If you have tabular data in CSV, TSV or JSONL format, you can add a gender and a gender_confidence column to it using the CLI:
```bash
python3 -m gender_identification data.csv --name_column "first_name"
```
By default the input file is overwritten. To save the result to a different file instead, pass an output path with -o:
```bash
python3 -m gender_identification data.csv --name_column "first_name" -o results.csv
```
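Once the output file exists, the gender_confidence column can be used to keep only the predictions you trust. The following is a minimal sketch using only the standard library. It assumes gender_confidence holds a number between 0 and 1 where higher means more confident, which the README does not state. The sample rows, the label strings and the `confident_rows` helper are hypothetical stand-ins, not actual model output.

```python
import csv
import io

# Hypothetical stand-in for the contents of results.csv. The label strings
# and confidence values are invented for illustration only.
SAMPLE = """first_name,gender,gender_confidence
Peter,male,0.98
Malte,male,0.61
"""


def confident_rows(handle, threshold=0.9):
    """Return the rows whose gender_confidence is at least `threshold`.

    Assumes gender_confidence parses as a float (an assumption, see above).
    """
    reader = csv.DictReader(handle)
    return [row for row in reader if float(row["gender_confidence"]) >= threshold]


# With a real file, pass open("results.csv", newline="") instead of StringIO.
rows = confident_rows(io.StringIO(SAMPLE))
print([row["first_name"] for row in rows])
```

The threshold of 0.9 is arbitrary; a suitable value depends on how the confidence scores are distributed in your data.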
You can also use the package directly in Python:

```python
import pandas as pd
from gender_identification import add_gender

df = pd.DataFrame({"name": ["Peter Jørgensen", "Malte Larsen"]})
# Add the gender columns, ignoring last names when predicting
df = add_gender(df, name_column="name", remove_last_name=True)
```
Parameters
| Parameter | Flag(s) | Description | Default Value |
|-----------|---------|-------------|---------------|
| in_file | | Input file path. | - |
| name_column | --name_column, -n | Column where names are contained. | - |
| out_file | --out_file, -o | Output file path. If not specified, the original file will be overwritten. | None |
| remove_last_name | --remove_last_name, -r | Indicates whether last names should be removed. | False |
| drop_confidence | --drop_confidence, -d | Indicates whether to drop the column indicating the model's confidence in its predictions. | False |
| batch_size | --batch_size, -b | Size of the batches to do inference in. | 32 |
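The flags in the table can also be assembled programmatically, for example when the CLI is called from another script. The sketch below is not part of the package: `build_command` is a hypothetical helper, and it assumes that the boolean options (--remove_last_name, --drop_confidence) are bare switches that take no value.

```python
import shlex
import sys


def build_command(in_file, name_column, out_file=None,
                  remove_last_name=False, drop_confidence=False,
                  batch_size=32):
    """Build an argument list for `python -m gender_identification`.

    Hypothetical helper; flag names are taken from the parameter table.
    """
    cmd = [sys.executable, "-m", "gender_identification", in_file,
           "--name_column", name_column]
    if out_file is not None:
        cmd += ["--out_file", out_file]
    # Assumption: boolean options are passed as bare switches.
    if remove_last_name:
        cmd.append("--remove_last_name")
    if drop_confidence:
        cmd.append("--drop_confidence")
    # Only pass the batch size when it differs from the documented default.
    if batch_size != 32:
        cmd += ["--batch_size", str(batch_size)]
    return cmd


cmd = build_command("data.csv", "first_name",
                    out_file="results.csv", remove_last_name=True)
print(shlex.join(cmd))

# To actually run it (requires the package to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names or column names that contain spaces.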
Owner
- Name: Center for Humanities Computing Aarhus
- Login: centre-for-humanities-computing
- Kind: organization
- Email: chcaa@cas.au.dk
- Location: Aarhus, Denmark
- Website: https://chc.au.dk/
- Repositories: 130
- Profile: https://github.com/centre-for-humanities-computing
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- pandas ^2.0.0
- python ^3.9
- radicli 0.0.25
- tqdm 4.66.0
- transformers ^4.41.0