reg_breach

Have I Been Pwned? Yes. Evidence from HIBP and Emails From Voter Registration Files.

https://github.com/themains/reg_breach

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.5%) to scientific vocabulary

Keywords

cybersecurity data-breaches hibp online-safety privacy

Repository

Basic Info
  • Host: GitHub
  • Owner: themains
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 1.22 MB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed 10 months ago

README.md

I Have Been Pwned: Evidence from Florida Voter Registration Data

We query Have I Been Pwned (HIBP) with emails from the Florida voter registration database to estimate how often people's data has been breached. 83.9% of people have had their data breached at least once. The mean number of breaches per email is 6.2, and the median is 5. The mean number of serious breaches per email, i.e., breaches that exposed sensitive data such as audio recordings, drug habits, or photos, is 3.7; the median is 3. Because data from only a small sliver of breaches are public, and because these counts cover a single email address (people often have several), the true number of breaches per person is likely much higher.

|       | totalbreaches | seriousbreaches | nonfabbreaches |
|:------|--------------:|----------------:|---------------:|
| count | 1.34819e+06 | 1.34819e+06 | 1.34819e+06 |
| mean  | 6.22427 | 3.65431 | 6.22425 |
| std   | 5.865 | 3.90145 | 5.86484 |
| min   | 0 | 0 | 0 |
| 25%   | 1 | 1 | 1 |
| 50%   | 5 | 3 | 5 |
| 75%   | 10 | 6 | 10 |
| max   | 390 | 333 | 389 |
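The headline figures (share breached at least once, mean, median) are simple functions of the per-email breach counts. A minimal sketch using only the standard library; the function name `summarize` and the toy data are hypothetical and are not taken from the repository's notebooks:

```python
from statistics import mean, median


def summarize(breach_counts):
    """Return the share of emails with at least one breach, plus mean and median."""
    n = len(breach_counts)
    if n == 0:
        raise ValueError("breach_counts is empty")
    share_breached = sum(1 for c in breach_counts if c > 0) / n
    return {
        "share_breached": share_breached,
        "mean": mean(breach_counts),
        "median": median(breach_counts),
    }


# Toy counts for illustration only; not drawn from the study data.
toy_counts = [0, 1, 5, 10, 4]
summary = summarize(toy_counts)
print(summary)
```

Each element of `breach_counts` stands for one email address, so the result describes emails rather than people.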

Digital Divide: Sociodemographic Predictors of Breaches

The median number of breaches rises sharply with age, from 2.5 at 18 to over 6 at around 45, before steadily declining to 3 at older ages. This trend may reflect a combination of two things: 1. the total number of online accounts (which plausibly increases with age until you reach people who were too old to sign up for many services), and 2. digital savviness, which may be greatest among the youngest. (Winsorizing doesn't change the pattern much.)
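The age pattern and the winsorizing check can be sketched as follows. This is an illustration under assumptions, not the repository's code: the helper names, the 10-year bands, and the upper-tail-only cap at a nearest-rank percentile are all choices made here for brevity.

```python
from collections import defaultdict
from statistics import median


def winsorize_upper(values, upper_pct=99):
    """Cap values above the upper_pct percentile (nearest rank) at that percentile."""
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic for ceil(n * upper_pct / 100) - 1 avoids float rounding.
    idx = max(0, (n * upper_pct + 99) // 100 - 1)
    cap = ordered[idx]
    return [min(v, cap) for v in values]


def median_by_age_band(records, band_width=10):
    """records: iterable of (age, breach_count). Returns {band_lower_bound: median}."""
    bands = defaultdict(list)
    for age, count in records:
        bands[(age // band_width) * band_width].append(count)
    return {lower: median(counts) for lower, counts in sorted(bands.items())}


# Toy records for illustration only; not drawn from the study data.
toy_records = [(18, 2), (19, 3), (45, 6), (46, 8), (80, 3)]
by_band = median_by_age_band(toy_records)
capped = winsorize_upper([1, 2, 3, 4, 100], upper_pct=80)
print(by_band, capped)
```

Because the median is already robust to a few extreme counts, capping the upper tail would be expected to move means far more than medians.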

The differences across sex and race/ethnicity are not very stark. Men and women have the same median number of breaches (5), and the 75th percentiles differ by 1 (10 for women versus 9 for men). For race/ethnicity, NH White, NH Black, and 'Other' have a higher median (5) than the other groups (4).

Total Breaches by Self-Identified Gender

| gender | count | mean | std | min | 25% | 50% | 75% | max |
|:-------|------:|-----:|----:|----:|----:|----:|----:|----:|
| F | 721828 | 6.4 | 5.8 | 0 | 2 | 5 | 10 | 157 |
| M | 605040 | 6 | 5.9 | 0 | 1 | 5 | 9 | 390 |

Total Breaches by Self-Identified Race

| race_lit | count | mean | std | min | 25% | 50% | 75% | max |
|:-----------------|-------:|-----:|----:|----:|----:|----:|----:|----:|
| Asian | 30518 | 5.9 | 5.8 | 0 | 1 | 4 | 9 | 154 |
| Hispanic | 317399 | 5.8 | 5.7 | 0 | 1 | 4 | 9 | 210 |
| Multi-Racial | 10046 | 5.7 | 5.8 | 0 | 1 | 4 | 9 | 145 |
| NH Black | 184947 | 5.8 | 5.7 | 0 | 1 | 5 | 9 | 265 |
| NH White | 750527 | 6.5 | 6 | 0 | 2 | 5 | 10 | 390 |
| Native Americans | 3764 | 5.8 | 5.7 | 0 | 1 | 4 | 9 | 42 |
| Other | 27028 | 6.2 | 5.9 | 0 | 1 | 5 | 9 | 123 |
| Unknown | 23957 | 5.3 | 5.6 | 0 | 1 | 4 | 8 | 195 |
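The per-group quartile columns in the tables above can be reproduced in outline with the standard library. A minimal sketch, assuming records arrive as `(group, breach_count)` pairs; the function name and toy data are hypothetical, and `method="inclusive"` is chosen here because it interpolates linearly between observed values:

```python
from collections import defaultdict
from statistics import quantiles


def quartiles_by_group(records):
    """records: iterable of (group, breach_count). Returns {group: [q25, q50, q75]}."""
    groups = defaultdict(list)
    for group, count in records:
        groups[group].append(count)
    # statistics.quantiles needs at least two observations per group.
    return {
        group: quantiles(counts, n=4, method="inclusive")
        for group, counts in sorted(groups.items())
    }


# Toy records for illustration only; not drawn from the study data.
toy_records = [
    ("F", 0), ("F", 2), ("F", 5), ("F", 10), ("F", 8),
    ("M", 0), ("M", 1), ("M", 5), ("M", 9), ("M", 3),
]
result = quartiles_by_group(toy_records)
print(result)
```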

Scripts

  1. Get Emails from Florida Voter DB
  2. Valid Email Or Not
  3. Final Data
  4. Get HIBP Data
  5. Analysis
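Steps 2 and 4 of the pipeline above can be sketched as follows. This is an illustration under assumptions rather than the repository's code: the regular expression is a deliberately loose syntax check, the `SERIOUS_CLASSES` set is a hypothetical stand-in for whatever list of sensitive data classes the analysis uses, and reading `nonfabbreaches` as "breaches not flagged as fabricated" is a guess from the column name. The endpoint, the `hibp-api-key` and `user-agent` headers, and the `DataClasses` and `IsFabricated` fields come from the public HIBP v3 API. No network call is made here; the sketch only builds the request.

```python
import re
from urllib.parse import quote
from urllib.request import Request

# Loose syntactic check only; it does not confirm that the mailbox exists.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Assumption: an illustrative subset of sensitive data classes.
SERIOUS_CLASSES = {"Audio recordings", "Drug habits", "Photos"}


def is_plausible_email(text):
    """Return True if text looks like an email address."""
    return bool(EMAIL_RE.match(text.strip()))


def build_hibp_request(email, api_key):
    """Build (but do not send) a request for every breach tied to one account."""
    account = quote(email.strip().lower(), safe="")
    url = (
        "https://haveibeenpwned.com/api/v3/breachedaccount/"
        + account
        + "?truncateResponse=false"
    )
    headers = {"hibp-api-key": api_key, "user-agent": "reg-breach-sketch"}
    return Request(url, headers=headers)


def count_breaches(breaches):
    """Summarize one account's list of breach records (parsed JSON dicts)."""
    return {
        "total": len(breaches),
        "serious": sum(
            1 for b in breaches if SERIOUS_CLASSES & set(b.get("DataClasses", []))
        ),
        "non_fabricated": sum(
            1 for b in breaches if not b.get("IsFabricated", False)
        ),
    }


# Toy breach records for illustration only.
toy_breaches = [
    {"DataClasses": ["Email addresses", "Photos"], "IsFabricated": False},
    {"DataClasses": ["Email addresses"], "IsFabricated": True},
]
req = build_hibp_request(" A@b.com ", "KEY")
counts = count_breaches(toy_breaches)
print(req.full_url, counts)
```

The HIBP endpoint responds with 404 when an account appears in no breach, so a caller would treat that status as a count of zero rather than as an error.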

HIBP Data

https://doi.org/10.7910/DVN/NTN9EP

References

  1. https://gsood.com/research/papers/pwned.pdf
  2. https://github.com/themains/bad_domains

🔗 Adjacent Repositories

  • themains/pwned_pols — A third of the politicians have had their data breached at least once. More alarmingly, over one in five have had their sensitive data, such as bank account numbers, biometric data, browsing history, chat logs, credit card CVV, etc., breached.
  • themains/private_blacklight — Privacy Online and Digital Divide on Online Privacy
  • themains/pwned — How Often Are Americans' Accounts Breached?
  • themains/private_gov — How common are third-party cookies, trackers, key loggers, etc. on government websites?
  • themains/bad_domains — Exposure to Malicious Websites

Owner

  • Name: the mains
  • Login: themains
  • Kind: organization

making it easier to understand web traffic

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this analysis, please cite it as below."
authors:
- family-names: "Sood"
  given-names: "Gaurav"
title: "Have I Been Pwned? Yes. Evidence from Florida Voter Registration Data"
version: 0.0.1
date-released: 2023-08-19
url: "https://github.com/themains/reg_breach"

GitHub Events

Total
  • Watch event: 1
  • Push event: 7
Last Year
  • Watch event: 1
  • Push event: 7