segregation

R package to calculate entropy-based segregation indices, with a focus on the Mutual Information Index (M) and Theil’s Information Index (H)

https://github.com/elbersb/segregation

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 19 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.1%) to scientific vocabulary

Keywords

entropy r r-package rstats segregation statistics
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 36
  • Watchers: 6
  • Forks: 3
  • Open Issues: 1
  • Releases: 9
Created almost 8 years ago · Last pushed 8 months ago
Metadata Files
Readme Changelog License Citation

README.Rmd

---
output:
  md_document:
    variant: gfm
editor_options: 
  markdown: 
    wrap: 72
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-"
)
options(scipen = 999)
options(digits = 3)
set.seed(69839)
```

# segregation

[![CRAN
Version](https://www.r-pkg.org/badges/version/segregation)](https://CRAN.R-project.org/package=segregation)
[![R build
status](https://github.com/elbersb/segregation/workflows/R-CMD-check/badge.svg)](https://github.com/elbersb/segregation/actions)
[![Coverage
status](https://codecov.io/gh/elbersb/segregation/branch/master/graph/badge.svg)](https://app.codecov.io/github/elbersb/segregation?branch=master)

An R package to calculate, visualize, and decompose various segregation indices. 
The package currently supports

-   the Mutual Information Index (M),
-   Theil's Information Index (H),
-   the index of Dissimilarity (D),
-   the isolation and exposure index.

Find more information in `vignette("segregation")`
and the [documentation](https://elbersb.de/segregation).

The package also supports

-   [standard error and confidence interval estimation via bootstrapping](https://elbersb.com/public/posts/2021-11-24-segregation-bias/),
    which also corrects for small-sample bias
-   decomposition of the M and H indices (within/between, local segregation)
-   decomposing differences in total segregation over time (Elbers 2021)
-   [segregation visualizations](https://elbersb.github.io/segregation/articles/plotting.html) (segregation curves and 'segplots')

Most methods return [tidy](https://vita.had.co.nz/papers/tidy-data.html)
[data.tables](https://rdatatable.gitlab.io/data.table/) for easy
post-processing and plotting. For speed, the package uses the [`data.table`](https://rdatatable.gitlab.io/data.table/)
package internally, and implements some functions in C++.

Most of the procedures implemented in this package are described in more
detail [in this SMR
paper](https://journals.sagepub.com/doi/full/10.1177/0049124121986204)
([Preprint](https://osf.io/preprints/socarxiv/ya7zs/)) and [in this
working paper](https://osf.io/preprints/socarxiv/ruw4g/).

## Usage

The package provides an easy way to calculate segregation measures
based on the Mutual Information Index (M) and Theil's Information Index (H).

```{r}
library(segregation)

# example dataset with fake data provided by the package
mutual_total(schools00, "race", "school", weight = "n")
```
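
To make the two indices concrete, here is a minimal base R sketch that computes M and H by hand for a small made-up table of counts. It assumes the usual definitions: M is the mutual information between unit and group, and H is M divided by the entropy of the group distribution. The helper `m_and_h` and the counts are illustrative only and are not part of the package, whose implementation may differ in detail.

```r
# Hypothetical helper (not part of the segregation package).
# Rows of `counts` are units (e.g. schools), columns are groups (e.g. races).
m_and_h <- function(counts) {
    p <- counts / sum(counts)               # joint proportions p_ug
    p_unit <- rowSums(p)                    # unit marginals p_u
    p_group <- colSums(p)                   # group marginals p_g
    expected <- outer(p_unit, p_group)      # proportions under independence
    # cells with zero observations contribute nothing to the sum
    m <- sum(ifelse(p > 0, p * log(p / expected), 0))
    nonzero <- p_group[p_group > 0]
    group_entropy <- -sum(nonzero * log(nonzero))
    c(M = m, H = m / group_entropy)
}

# made-up counts: two schools, two groups
counts <- matrix(c(90, 10, 10, 90),
    nrow = 2,
    dimnames = list(school = c("s1", "s2"), race = c("a", "b"))
)
m_and_h(counts)
```

If every school has the same group shares, both indices are 0. If two equally sized groups are completely separated, M equals `log(2)` and H equals 1.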

Standard errors in all functions can be estimated via bootstrapping. This
also applies a bias correction to the estimates:

```{r}
mutual_total(schools00, "race", "school",
    weight = "n",
    se = TRUE, CI = 0.90, n_bootstrap = 500
)
```
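
The idea behind the bootstrap can be sketched in base R. The sketch assumes a multinomial resampling scheme and the textbook bias correction (estimate minus estimated bias); the helper `m_index`, the counts, and the shifted percentile interval are illustrative choices, and the package may construct its estimates and intervals differently.

```r
set.seed(1)

# Hypothetical helper: M for a units-by-groups matrix of counts.
m_index <- function(counts) {
    p <- counts / sum(counts)
    expected <- outer(rowSums(p), colSums(p))
    sum(ifelse(p > 0, p * log(p / expected), 0))
}

# made-up counts: two schools, two groups
counts <- matrix(c(90, 10, 10, 90), nrow = 2)
n <- sum(counts)
est <- m_index(counts)

# Redraw n observations from the observed cell proportions, 500 times.
# rmultinom() normalizes `prob` internally, so raw counts can be passed.
boot <- replicate(500, {
    draw <- rmultinom(1, size = n, prob = as.vector(counts))
    m_index(matrix(draw, nrow = nrow(counts)))
})

bias <- mean(boot) - est                 # estimated bias of the plug-in estimate
c(est = est, est_debiased = est - bias, se = sd(boot))

# one simple option for a 90% interval: percentile interval shifted by the bias
quantile(boot, c(0.05, 0.95)) - bias
```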

Decompose segregation into a between-state and a within-state term (the
sum of these equals total segregation):

```{r}
# between states
mutual_total(schools00, "race", "state", weight = "n")

# within states
mutual_total(schools00, "race", "school", within = "state", weight = "n")
```
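
The additivity of the two terms can be checked by hand on a small made-up dataset. In this base R sketch, the within term is taken to be the state-share-weighted average of the M computed inside each state; the helper `m_index` and the data are hypothetical, and schools are assumed to be nested within states.

```r
# Hypothetical helper: M for a units-by-groups table of counts.
m_index <- function(counts) {
    counts <- matrix(as.numeric(counts), nrow = nrow(counts))
    p <- counts / sum(counts)
    expected <- outer(rowSums(p), colSums(p))
    sum(ifelse(p > 0, p * log(p / expected), 0))
}

# made-up data: two states, two schools each, two racial groups
d <- data.frame(
    state = rep(c("A", "B"), each = 4),
    school = rep(c("a1", "a2", "b1", "b2"), each = 2),
    race = rep(c("x", "y"), times = 4),
    n = c(80, 20, 30, 70, 50, 50, 10, 90)
)

total <- m_index(xtabs(n ~ school + race, d))
between <- m_index(xtabs(n ~ state + race, d))

# within term: M inside each state, weighted by the state's population share
state_share <- tapply(d$n, d$state, sum) / sum(d$n)
within_by_state <- sapply(
    split(d, d$state),
    function(s) m_index(xtabs(n ~ school + race, s))
)
within <- sum(state_share * within_by_state)

all.equal(total, between + within)
```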

Local segregation (`ls`) is a decomposition by units or groups (here
racial groups). This function also supports standard error and CI
estimation. The sum of the proportion-weighted local segregation scores
equals M:

```{r}
local <- mutual_local(schools00,
    group = "school", unit = "race", weight = "n",
    se = TRUE, CI = 0.90, n_bootstrap = 500, wide = TRUE
)
local[, c("race", "ls", "p", "ls_CI")]
sum(local$p * local$ls)
```
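
The identity between the weighted local scores and M can be verified by hand. This base R sketch assumes that the local segregation of a racial group is the divergence between that group's distribution across schools and the overall school distribution; the counts and variable names are made up for illustration and do not reflect the package internals.

```r
# made-up counts: three schools (rows), two racial groups (columns)
counts <- matrix(c(90, 10, 10, 90, 40, 60),
    nrow = 3, byrow = TRUE,
    dimnames = list(school = c("s1", "s2", "s3"), race = c("a", "b"))
)

p <- counts / sum(counts)
p_school <- rowSums(p)
p_race <- colSums(p)

# p(school | race): each column divided by its group total
p_school_given_race <- sweep(p, 2, p_race, "/")

# compare each group's school distribution with the overall school distribution
ratio <- sweep(p_school_given_race, 1, p_school, "/")
ls_race <- colSums(
    ifelse(p_school_given_race > 0, p_school_given_race * log(ratio), 0)
)

# total M computed directly from the joint proportions
m <- sum(ifelse(p > 0, p * log(p / outer(p_school, p_race)), 0))

data.frame(race = names(ls_race), ls = ls_race, p = p_race)
all.equal(sum(p_race * ls_race), m)
```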

Decompose the difference in M between 2000 and 2005, using iterative
proportional fitting (IPF) and the Shapley decomposition (see Elbers
2021 for details):

```{r}
mutual_difference(schools00, schools05,
    group = "race", unit = "school",
    weight = "n", method = "shapley"
)
```
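
The IPF step on its own can be illustrated in base R: the table from the first period is rescaled until its margins match those of the second period, which leaves its odds ratios unchanged. The helper `ipf_sketch` and both tables are hypothetical; this is not the package's implementation, and the Shapley part of the decomposition is not reproduced here.

```r
# Hypothetical helper: classic two-way iterative proportional fitting.
ipf_sketch <- function(seed, row_target, col_target,
                       tol = 1e-10, max_iter = 1000) {
    fitted <- seed
    for (i in seq_len(max_iter)) {
        # rescale rows, then columns, to the target margins
        fitted <- sweep(fitted, 1, row_target / rowSums(fitted), "*")
        fitted <- sweep(fitted, 2, col_target / colSums(fitted), "*")
        # columns match exactly after the last step, so only rows are checked
        if (max(abs(rowSums(fitted) - row_target)) < tol) break
    }
    fitted
}

# made-up 2x2 tables for two time points (rows: schools, columns: groups)
t1 <- matrix(c(90, 10, 10, 90), nrow = 2)
t2 <- matrix(c(60, 40, 20, 80), nrow = 2)

adjusted <- ipf_sketch(t1, rowSums(t2), colSums(t2))

# margins now match t2, while the odds ratio is still that of t1
rowSums(adjusted)
colSums(adjusted)
odds_ratio <- function(x) (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])
c(t1 = odds_ratio(t1), adjusted = odds_ratio(adjusted))
```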

Show a segplot:

```{r segplot}
segplot(schools00, group = "race", unit = "school", weight = "n")
```

Find more information in the
[documentation](https://elbersb.github.io/segregation/).

## How to install

To install the package from CRAN, use

```{r eval=FALSE}
install.packages("segregation")
```

To install the development version, use

```{r eval=FALSE}
devtools::install_github("elbersb/segregation")
```

## Citation

If you use this package for your research, please cite one of the following papers:

- Elbers, Benjamin (2021). A Method for Studying Differences in Segregation
Across Time and Space. Sociological Methods & Research.


- Elbers, Benjamin and Rob Gruijters (2023). Segplot: A New Method for Visualizing Patterns of Multi-Group Segregation.


## Some additional resources

-   The book *Analyzing US Census Data: Methods, Maps, and Models in R*
    by Kyle E. Walker contains [a discussion of this
    package](https://walker-data.com/census-r/modeling-us-census-data.html#indices-of-segregation-and-diversity),
    and is a great resource for anyone working with spatial data,
    especially U.S. Census data.
-   A paper that makes use of this package: [Did Residential Racial
    Segregation in the U.S. Really Increase? An Analysis Accounting for
    Changes in Racial
    Diversity](https://elbersb.com/public/posts/2021-07-23-segregation-increase/)
    ([Code and Data](https://osf.io/mg9q4/))
-   Some of the analyses [in this
    article](https://multimedia.tijd.be/diversiteit/) by the Belgian
    newspaper *De Tijd* used the package.
-   The analyses of [this article in the Wall Street
    Journal](https://www.wsj.com/articles/chicago-vs-dallas-why-the-north-lags-behind-the-south-and-west-in-racial-integration-11657936680)
    were produced using this package.

## References on entropy-based segregation indices

Deutsch, J., Flückiger, Y. & Silber, J. (2009). Analyzing Changes in
Occupational Segregation: The Case of Switzerland (1970--2000), in: Yves
Flückiger, Sean F. Reardon, Jacques Silber (eds.) Occupational and
Residential Segregation (Research on Economic Inequality, Volume 17),
171--202.

DiPrete, T. A., Eller, C. C., Bol, T., & van de Werfhorst, H. G. (2017).
School-to-Work Linkages in the United States, Germany, and France.
American Journal of Sociology, 122(6), 1869-1938.


Elbers, B. (2021). A Method for Studying Differences in Segregation
Across Time and Space. Sociological Methods & Research.


Forster, A. G., & Bol, T. (2017). Vocational education and employment
over the life course using a new measure of occupational specificity.
Social Science Research, 70, 176-197.


Theil, H. (1971). Principles of Econometrics. New York: Wiley.

Frankel, D. M., & Volij, O. (2011). Measuring school segregation.
Journal of Economic Theory, 146(1), 1-38.


Mora, R., & Ruiz-Castillo, J. (2003). Additively decomposable
segregation indexes. The case of gender segregation by occupations and
human capital levels in Spain. The Journal of Economic Inequality, 1(2),
147-179. 

Mora, R., & Ruiz-Castillo, J. (2009). The Invariance Properties of the
Mutual Information Index of Multigroup Segregation, in: Yves Flückiger,
Sean F. Reardon, Jacques Silber (eds.) Occupational and Residential
Segregation (Research on Economic Inequality, Volume 17), 33-53.

Mora, R., & Ruiz-Castillo, J. (2011). Entropy-based Segregation Indices.
Sociological Methodology, 41(1), 159--194.


Van Puyenbroeck, T., De Bruyne, K., & Sels, L. (2012). More than 'Mutual
Information': Educational and sectoral gender segregation and their
interaction on the Flemish labor market. Labour Economics, 19(1), 1-8.


Watts, M. The Use and Abuse of Entropy Based Segregation Indices.
Working Paper. URL:

Owner

  • Name: Benjamin Elbers
  • Login: elbersb
  • Kind: user
  • Location: New York, NY
  • Company: Columbia University

Citation (CITATION.cff)

cff-version: 1.2.0
preferred-citation:
  type: article
  message: "If you use {segregation} in your research, please cite the following paper."
  authors:
  - family-names: "Elbers"
    given-names: "Benjamin"
    orcid: "https://orcid.org/0000-0001-5392-3448"
  title: "A Method for Studying Differences in Segregation Across Time and Space"
  doi: "10.1177/0049124121986204"
  journal: "Sociological Methods & Research"
  year: 2021

GitHub Events

Total
  • Issues event: 2
  • Watch event: 1
  • Issue comment event: 5
  • Push event: 2
Last Year
  • Issues event: 2
  • Watch event: 1
  • Issue comment event: 5
  • Push event: 2

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 197
  • Total Committers: 3
  • Avg Commits per committer: 65.667
  • Development Distribution Score (DDS): 0.061
Past Year
  • Commits: 25
  • Committers: 2
  • Avg Commits per committer: 12.5
  • Development Distribution Score (DDS): 0.4
Top Committers
Name Email Commits
Benjamin Elbers e****b@g****m 185
Benjamin Elbers b****e@s****m 10
Matt Dowle m****e@g****m 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 13
  • Total pull requests: 2
  • Average time to close issues: 8 days
  • Average time to close pull requests: about 7 hours
  • Total issue authors: 12
  • Total pull request authors: 1
  • Average comments per issue: 3.23
  • Average comments per pull request: 2.5
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 3.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kaseyzapatka (2)
  • igjar36 (1)
  • mattdowle (1)
  • mfansler (1)
  • JustinVisagie (1)
  • YiyangGao (1)
  • hadley (1)
  • krlmlr (1)
  • jkaucic (1)
  • kaisarea (1)
  • bcongelio (1)
  • flaviocarvalhaes (1)
Pull Request Authors
  • mattdowle (2)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 439 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 15
  • Total maintainers: 1
cran.r-project.org: segregation

Entropy-Based Segregation Indices

  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 439 Last month
Rankings
Stargazers count: 9.4%
Forks count: 17.8%
Average: 24.7%
Dependent packages count: 29.8%
Downloads: 31.1%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: r-segregation
  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Stargazers count: 41.7%
Average: 45.3%
Dependent packages count: 51.2%
Forks count: 54.2%
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • checkmate * imports
  • data.table * imports
  • covr * suggests
  • dplyr * suggests
  • ggplot2 * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • scales * suggests
  • testthat * suggests
  • tidycensus * suggests
  • tigris * suggests