privacy-policies-as-data
This repository explores text mining analysis of privacy policies.
Repository
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Privacy-Policies-as-Data
This repository explores text mining analysis of privacy policies using quanteda. The quanteda package allows us to perform analyses that focus on information retrieval. The file “Privacy Policies as Data.R” uses three methods to compare a set of privacy policies from the social media industry: (i) Term Frequency-Inverse Document Frequency (TF-IDF), (ii) Key Words in Context (KWIC), and (iii) cosine similarity.
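To make the three methods concrete, here is a minimal sketch in plain Python. It is not the repository's R script and does not use quanteda. The three short texts are made-up stand-ins, not excerpts from any real policy, and the function names (`tokenize`, `tf_idf`, `kwic`, `cosine_similarity`) are hypothetical. The TF-IDF weighting shown, raw term count times log10(N / document frequency), is one common variant and is assumed here only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """Map each document name to a {term: weight} dict.

    Weight = raw term count * log10(N / number of documents containing the term).
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for toks in tokens.values():
        doc_freq.update(set(toks))  # count each term once per document
    weights = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        weights[name] = {
            term: count * math.log10(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each keyword match."""
    toks = tokenize(text)
    hits = []
    for i, tok in enumerate(toks):
        if tok == keyword.lower():
            left = " ".join(toks[max(0, i - window):i])
            right = " ".join(toks[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example texts (assumption: NOT taken from any actual privacy policy).
docs = {
    "policy_a": "we collect your data and share your data with partners",
    "policy_b": "we collect location data to improve our services",
    "policy_c": "you can delete your account at any time",
}

weights = tf_idf(docs)
print(kwic(docs["policy_a"], "data", window=2))
print(cosine_similarity(weights["policy_a"], weights["policy_b"]))
```

Terms that appear in every document get a weight of zero under this scheme, so the cosine similarity computed on TF-IDF weights reflects only the vocabulary that distinguishes documents from one another.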
The companies' privacy policies are included in the repository. The data files used in the code follow this naming convention:
1) “Privacy.xlsx” contains the group of privacy policies (Meta, Google and WhatsApp).
2) “P1.xlsx” contains the Privacy Act of 1974, as amended, 5 U.S.C. § 552a.
3) “SG.xlsx” contains the privacy policies of Google and Snap Inc.
For Microsoft Power BI users, the file “Word Cloud.pbix” can be used to create word cloud visualizations in Power BI. The file is also available at https://appsource.microsoft.com/
These resources are also helpful:
1) Burt L. Monroe. Term Weighting (including tf-idf) and Cosine Similarity. PLSC 597, Text as Data, Penn State. https://burtmonroe.github.io/TextAsDataCourse/Tutorials/TADA-CosineSimTutorial.nb.html
2) The quanteda documentation: https://tutorials.quanteda.io/basic-operations/workflow/
Owner
- Name: Thiago Teodoro
- Login: thiago-teodoro
- Kind: user
- Location: Ontario, Canada
- Company: Teodoro Consulting
- Website: https://www.teodoroconsulting.net/
- Repositories: 1
- Profile: https://github.com/thiago-teodoro
I am a consultant in auditing, project management, research, and data analytics in Canada and internationally.
Citation (Citation.cff)
cff-version: 1.2.0
title: Privacy Policy as Data
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Thiago
    family-names: de Oliveira Teodoro
    email: info@teodoroconsulting.net
    affiliation: Teodoro Consulting
    orcid: 'https://orcid.org/0000-0002-1561-3630'
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1