https://github.com/appeler/outkast

Using data on more than 140M Indians from the SECC 2011, we map last names to caste (SC, ST, Other)

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
✓
DOI references
Found 2 DOI reference(s) in README
○
Academic publication links
○
Committers with academic emails
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (10.7%) to scientific vocabulary

Keywords

caste india names scheduled-caste scheduled-tribe

Keywords from Contributors

electoral-rolls gender-classification distributed-data-collection traveling-salesman ethnicity lstm race

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: appeler
License: mit
Language: Jupyter Notebook
Default Branch: master
Homepage:
Size: 40.4 MB

Statistics

Stars: 9
Watchers: 2
Forks: 1
Open Issues: 0
Releases: 0

Topics

caste india names scheduled-caste scheduled-tribe

Created over 6 years ago · Last pushed over 3 years ago

Metadata Files

Readme License

README.rst

outkast: estimate caste by last name, year, and state
-----------------------------------------------------

.. image:: https://travis-ci.org/appeler/outkast.svg?branch=master
    :target: https://travis-ci.org/appeler/outkast
.. image:: https://img.shields.io/pypi/v/outkast.svg
    :target: https://pypi.python.org/pypi/outkast
.. image:: https://pepy.tech/badge/outkast
    :target: https://pepy.tech/project/outkast


Using data on more than 140M Indians across 19 states from the `Socio-Economic Caste Census `__ (parsed data `here `__), we estimate the proportion of `scheduled caste, scheduled tribe, and other` for a given last name, year, and state.

Why?
====

We provide this package so that people can assess, highlight, and fight unfairness.

How is the underlying data produced?
====================================

1. The `script `__ downloads the `clean version `__ of the SECC posted `here `__.

2. `Produce base data frame `__ and `infer last names `__, applying the following filters:

   * remove names with non-alphabetical characters
   * remove records with missing last names
   * remove last names shorter than 2 characters
   * remove rows with birth_date < 1900
   * keep only last names shared by at least 1000 households

3. `Group by last name, state, and year `__ and produce the `underlying data `__
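
The filtering and grouping steps above can be sketched in pandas. This is a minimal illustration, not the project's actual script: the function name ``build_base_table`` and the column names ``last_name``, ``state``, ``birth_year``, ``caste``, and ``household_id`` are assumptions made for the example, and the household threshold is exposed as a parameter so the logic can be tried on small inputs.

::

    import pandas as pd


    def build_base_table(records, min_households=1000):
        """Filter person-level records and count castes per name/state/year.

        All column names here are hypothetical, chosen for illustration.
        """
        # Drop records with missing last names, then normalize the rest.
        df = records.dropna(subset=["last_name"]).copy()
        df["last_name"] = df["last_name"].str.strip().str.lower()

        # Keep purely alphabetical names with at least 2 characters.
        df = df[df["last_name"].str.fullmatch(r"[a-z]{2,}")]

        # Drop implausible birth years.
        df = df[df["birth_year"] >= 1900]

        # Keep last names shared by enough distinct households.
        n_hh = df.groupby("last_name")["household_id"].transform("nunique")
        df = df[n_hh >= min_households]

        # Count people per (last name, state, year) in each caste category.
        counts = (
            df.groupby(["last_name", "state", "birth_year", "caste"])
            .size()
            .unstack("caste", fill_value=0)
            .reindex(columns=["sc", "st", "other"], fill_value=0)
        )
        counts.columns = ["n_sc", "n_st", "n_other"]
        return counts.reset_index()

Calling ``build_base_table(records, min_households=2)`` on a toy data frame is a quick way to check each filter in isolation.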

Base Classifier
~~~~~~~~~~~~~~~

We start by providing a base model for last\_name that gives the Bayes
optimal solution: the proportion of `SC, ST, and Other` among people with that last name.
We also provide a series of base models for cases where the state of
residence is known.
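
As a rough sketch of what such a base model computes, the proportions can be derived from per-group counts by summing over the dimensions that are not known and dividing by the total. The helper ``base_proportions`` below is hypothetical and is not part of the package; it assumes a table with ``n_sc``, ``n_st``, and ``n_other`` count columns like those in the package output.

::

    import pandas as pd

    COUNT_COLS = ["n_sc", "n_st", "n_other"]


    def base_proportions(counts, by=("last_name",)):
        """Aggregate counts over the `by` columns and convert to shares."""
        totals = counts.groupby(list(by), as_index=False)[COUNT_COLS].sum()
        denom = totals[COUNT_COLS].sum(axis=1)
        for col in COUNT_COLS:
            # "n_sc" -> "prop_sc", "n_other" -> "prop_other"
            totals["prop_" + col[2:]] = totals[col] / denom
        return totals

Passing ``by=("last_name", "state")`` gives the variant in which the state of residence is known.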

Installation
~~~~~~~~~~~~

We strongly recommend installing `outkast` inside a Python virtual environment (see `venv documentation `__).

::

    pip install outkast


Usage
~~~~~

::

    usage: secc_caste [-h] -l LAST_NAME
                    [-s {arunachal pradesh,assam,bihar,chhattisgarh,gujarat,haryana,kerala,madhya pradesh,maharashtra,mizoram,odisha,nagaland,punjab,rajasthan,sikkim,tamilnadu,uttar pradesh,uttarakhand,west bengal}]
                    [-y YEAR] [-o OUTPUT]
                    input

    Appends SECC 2011 data columns for sc, st, and other by last name

    positional arguments:
    input                 Input file

    optional arguments:
    -h, --help            show this help message and exit
    -l LAST_NAME, --last-name LAST_NAME
                            Name or index location of column contains the last
                            name
    -s {arunachal pradesh,assam,bihar,chhattisgarh,gujarat,haryana,kerala,madhya pradesh,maharashtra,mizoram,odisha,nagaland,punjab,rajasthan,sikkim,tamilnadu,uttar pradesh,uttarakhand,west bengal}, --state {arunachal pradesh,assam,bihar,chhattisgarh,gujarat,haryana,kerala,madhya pradesh,maharashtra,mizoram,odisha,nagaland,punjab,rajasthan,sikkim,tamilnadu,uttar pradesh,uttarakhand,west bengal}
                            State name of SECC data (default=all)
    -y YEAR, --year YEAR  Birth year in SECC data (default=all)
    -o OUTPUT, --output OUTPUT
                            Output file with SECC data columns



Using outkast
~~~~~~~~~~~~~

::

    >>> import pandas as pd
    >>> from outkast import secc_caste
    >>>
    >>> names = [{'name': 'patel'},
    ...             {'name': 'zala'},
    ...             {'name': 'lal'},
    ...             {'name': 'agarwal'}]
    >>>
    >>> df = pd.DataFrame(names)
    >>>
    >>> secc_caste(df, 'name')
        name    n_sc    n_st  n_other   prop_sc   prop_st  prop_other
    0    patel    5681  112302   631393  0.007581  0.149861    0.842558
    1     zala     667      14    34550  0.018932  0.000397    0.980670
    2      lal  703595  241846  1314224  0.311371  0.107027    0.581601
    3  agarwal      39      12     4375  0.008812  0.002711    0.988477


    >>>
    >>> help(secc_caste)
    Help on method secc_caste in module outkast.secc_caste_ln:

    secc_caste(df, namecol, state=None, year=None) method of builtins.type instance
        Appends additional columns from SECC data to the input DataFrame
        based on the last name.

        Removes extra space. Checks if the name is the SECC data.
        If it is, outputs data from that row.

        Args:
            df (:obj:`DataFrame`): Pandas DataFrame containing the last name
                column.
            namecol (str or int): Column's name or location of the name in
                DataFrame.
            state (str): The state name of SECC data to be used.
                (default is None for all states)
            year (int): The year of SECC data to be used.
                (default is None for all years)

        Returns:
            DataFrame: Pandas DataFrame with additional columns:-
                'n_sc', 'n_st', 'n_other',
                'prop_sc', 'prop_st', 'prop_other' by last name


Authors
~~~~~~~

Suriyan Laohaprapanon and Gaurav Sood

License
~~~~~~~

The package is released under the `MIT
License `__.

Owner

Name: appeler
Login: appeler
Kind: organization

Website: https://appeler.github.io/
Repositories: 24
Profile: https://github.com/appeler

Making sense of names.

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Committers

Last synced: about 3 years ago

All Time

Total Commits: 22
Total Committers: 3
Avg Commits per committer: 7.333
Development Distribution Score (DDS): 0.5

Top Committers

Name	Email	Commits
*****	g**7@g**m	11
Suriyan Laohaprapanon	s**t@g**m	10
Snyk bot	g**t@s**o	1

Committer Domains (Top 20 + Academic)

snyk.io: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: 5 minutes
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Pull Request Authors

snyk-bot (1)

Packages

Total packages: 1
Total downloads:
- pypi 28 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 2
Total maintainers: 2

pypi.org: outkast

Infer Caste from Indian Names

Homepage: https://github.com/appeler/outkast
Documentation: https://outkast.readthedocs.io/
License: MIT
Latest release: 0.2.1
published almost 6 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 28 Last month

Rankings

Dependent packages count: 10.0%

Stargazers count: 19.3%

Dependent repos count: 21.7%

Forks count: 22.6%

Average: 23.8%

Downloads: 45.4%

Maintainers (2)

soodoku suriyan

Last synced: 10 months ago

Dependencies

requirements.txt pypi

pandas >=0.19.2

setup.py pypi

pandas >=0.19.2

.github/workflows/python-publish.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite
pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
