bert-inferred_minor_status
Proof-of-concept research in using BERT NER to distinguish minors and adults within unstructured text
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: JGillette71
- Language: Jupyter Notebook
- Default Branch: main
- Size: 2.63 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
BERT-InferredMinorStatus
Update: Paper currently under review for publication (2022-05-31)
Project Abstract: Automated data protection has been widely studied, and privacy is a growing concern for the general public. Despite this attention, little work has extended these solutions to the vulnerabilities of minors and their personally identifiable information. The special vulnerability of children has been codified in legislation such as the GDPR in the European Union, yet research on data protection mechanisms such as anonymization does not distinguish between children and adults. This paper responds to that need and proposes using a BERT named entity recognition model to distinguish minor from adult subjects within unstructured text based on surrounding context. To demonstrate this proof of concept, we created a custom dataset, fine-tuned across a range of hyperparameters, and selected the best-performing model. The resulting model achieved an 89% F1 score in detecting minors and a 61% F1 score in detecting adults. While this performance may not be suited to a production environment, it establishes a starting point for future research and the eventual integration of specific data protections for minors within an automated processing stack.
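For readers unfamiliar with the metric, the sketch below shows how a per-class F1 score is derived from true positives, false positives, and false negatives. The label names `MINOR` and `ADULT` and all counts are hypothetical and chosen only for illustration; they are not the paper's data, and the project's actual evaluation code may differ.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall)."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (tp, fp, fn) counts per entity class; NOT results from the paper.
example_counts = {
    "MINOR": (8, 2, 2),
    "ADULT": (3, 1, 3),
}

per_class_f1 = {label: f1_score(*counts) for label, counts in example_counts.items()}

for label, score in per_class_f1.items():
    print(f"{label}: F1 = {score:.2f}")
```

Reporting the score per class, rather than as a single average, is what makes a gap like the one between minors and adults visible.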

Note: Results may vary. Recommended hyperparameters are as follows:
- Max Input: 256 (correlates to mean sequence length)
- Learn Rate: 3.00e-05 (within the range recommended in the original BERT paper, Devlin et al.)
- Epochs: 3 (within the range recommended in the original BERT paper, Devlin et al.)
- Batch Size: 4
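
As a minimal sketch, the recommended values can be collected in one configuration object so they are easy to pass into a fine-tuning script. The class name `FineTuneConfig`, its field names, and the `total_steps` helper are assumptions made for illustration; they do not come from the project's notebooks.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the recommended hyperparameters."""

    max_input_length: int = 256   # Max Input (tokens per sequence)
    learning_rate: float = 3e-5   # Learn Rate
    epochs: int = 3               # Epochs
    batch_size: int = 4           # Batch Size

    def total_steps(self, num_examples: int) -> int:
        """Number of optimizer steps, assuming one step per batch."""
        steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return steps_per_epoch * self.epochs


config = FineTuneConfig()
print(config)
# The dataset size of 1000 is an illustrative value, not the project's.
print(config.total_steps(num_examples=1000))
```

Keeping the values in a frozen dataclass guards against changing them by accident partway through a run. The step count is useful when configuring a learning-rate schedule.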
Owner
- Name: Jason G
- Login: JGillette71
- Kind: user
- Repositories: 1
- Profile: https://github.com/JGillette71
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Gillette"
    given-names: "Jason"
    orcid:
  - family-names: "Shah"
    given-names: "Sayed Khushal"
    orcid:
title: "Data Protections for Minors with BERT-Inferred Status"
version: 1.0.0
doi:
date-released: 2022-05-31
url: "https://github.com/JGillette71/BERT-Inferred_Minor_Status"