bert-inferred_minor_status

Proof-of-concept research in using BERT NER to distinguish minors and adults within unstructured text

https://github.com/jgillette71/bert-inferred_minor_status

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.3%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: JGillette71
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 2.63 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed about 4 years ago
Metadata Files
Readme Citation

README.md

BERT-InferredMinorStatus

Update: Paper currently under review for publication (2022-05-31)

Project Abstract: Automated data protection has been widely studied, and privacy is a growing concern for the general public. Despite this attention, little work has extended these solutions to the vulnerabilities of minors and their personally identifiable information. The special vulnerability of children has been codified in legislation such as the GDPR in the European Union, yet research on data protection mechanisms such as anonymization does not distinguish between children and adults. This paper responds to that need and proposes using a BERT named entity recognition (NER) model to distinguish minor from adult subjects within unstructured text based on surrounding context. To demonstrate this proof of concept, we created a custom dataset, fine-tuned across a range of hyperparameters, and selected the best-performing model. The resulting model achieved an 89% F1 score in detecting minors and a 61% F1 score in detecting adults. While this performance may not be suited to a production environment, we have established a starting point for future research and the eventual integration of specific data protections for minors within an automated processing stack.
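For readers unfamiliar with per-class F1, the following is a minimal sketch of how a score is computed separately for each label. It assumes token-level comparison and uses hypothetical label names (`MINOR`, `ADULT`, `O`) and toy tag sequences invented for illustration; the repository's actual label scheme, data, and evaluation granularity (token-level versus entity-level) may differ.

```python
def per_label_f1(gold, predicted, label):
    """Token-level F1 for a single label (illustrative helper, not from the repository)."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == label and p == label)
    false_pos = sum(1 for g, p in pairs if g != label and p == label)
    false_neg = sum(1 for g, p in pairs if g == label and p != label)
    if true_pos == 0:
        # No correct predictions for this label: F1 is zero by convention.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy tag sequences, made up for this example only.
gold_tags = ["MINOR", "O", "ADULT", "MINOR", "O", "ADULT"]
predicted_tags = ["MINOR", "O", "MINOR", "MINOR", "O", "ADULT"]

minor_f1 = per_label_f1(gold_tags, predicted_tags, "MINOR")
adult_f1 = per_label_f1(gold_tags, predicted_tags, "ADULT")
print(f"MINOR F1: {minor_f1:.2f}, ADULT F1: {adult_f1:.2f}")
```

Reporting F1 per label, rather than a single averaged figure, is what makes a gap like the one between the minor and adult scores visible.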


Note: Results may vary. Recommended hyperparameters:
  • Max Input: 256 (correlates to mean sequence length)
  • Learn Rate: 3.00e-05 (within the range recommended in the original BERT paper, Devlin et al.)
  • Epochs: 3 (within the range recommended in the original BERT paper, Devlin et al.)
  • Batch Size: 4
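The recommended values can be captured in a small configuration object, sketched below. The class name `FineTuneConfig` and its field names are hypothetical and do not come from the repository's notebooks. The reference ranges are the fine-tuning values suggested in Devlin et al. (learning rates 5e-5, 3e-5, 2e-5; 2 to 4 epochs).

```python
from dataclasses import dataclass

# Fine-tuning values suggested in the original BERT paper (Devlin et al.).
DEVLIN_LEARNING_RATES = (5e-5, 3e-5, 2e-5)
DEVLIN_EPOCHS = (2, 3, 4)


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the README's recommended hyperparameters."""
    max_input_length: int = 256   # correlates to mean sequence length
    learning_rate: float = 3e-5
    epochs: int = 3
    batch_size: int = 4

    def within_devlin_ranges(self) -> bool:
        """Check learning rate and epochs against the paper's suggested values."""
        return (
            self.learning_rate in DEVLIN_LEARNING_RATES
            and self.epochs in DEVLIN_EPOCHS
        )


config = FineTuneConfig()
print(config, config.within_devlin_ranges())
```

Only the learning rate and epoch count are checked, since those are the two settings the note ties to the BERT paper.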

Owner

  • Name: Jason G
  • Login: JGillette71
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Gillette"
  given-names: "Jason"
  orcid: 
- family-names: "Shah"
  given-names: "Sayed Khushal"
  orcid: 
title: "Data Protections for Minors with BERT-Inferred Status"
version: 1.0.0
doi:
date-released: 2022-05-31
url: "https://github.com/JGillette71/BERT-Inferred_Minor_Status"
