privacy-policies-as-data

This repository explores text mining analysis of privacy policies.

https://github.com/thiago-teodoro/privacy-policies-as-data

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: thiago-teodoro
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 4.76 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme Citation

README.md

Privacy-Policies-as-Data

This repository explores text mining analysis of privacy policies using quanteda, an R package for quantitative text analysis that supports analyses focused on information retrieval. The script “Privacy Policies as Data.R” compares a set of privacy policies from the social media industry using three methods: (i) term frequency-inverse document frequency (TF-IDF), (ii) keywords in context (KWIC), and (iii) cosine similarity.
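The three methods can be illustrated with a short standalone sketch. This is not the author's R/quanteda code. It is a hypothetical Python version written only for illustration, and it rests on several assumptions. Tokens are lowercase runs of letters. TF-IDF is computed as the raw term count times log10(N / df), which is one common variant; quanteda and other tools offer other weighting schemes. KWIC uses a fixed window of words on each side of the keyword. The function names tf_idf, kwic and cosine are made up for this example, and so are the sample policy sentences.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep only runs of ASCII letters as tokens.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = raw count of the term in the document * log10(N / df), where N is
    the number of documents and df is the number of documents containing the term.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term at most once per document
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({t: c * math.log10(n_docs / df[t]) for t, c in counts.items()})
    return weights


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence.

    The keyword is expected in lowercase, matching tokenize().
    """
    toks = tokenize(text)
    hits = []
    for i, tok in enumerate(toks):
        if tok == keyword:
            left = " ".join(toks[max(0, i - window):i])
            right = " ".join(toks[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical one-sentence "policies", not taken from the repository's data files.
policies = [
    "We collect your data to provide and improve our services.",
    "We share your data with partners and advertisers.",
    "We collect location data when you use the app.",
]
weights = tf_idf(policies)
print(kwic(policies[1], "data", window=2))  # [('share your', 'data', 'with partners')]
for i in range(len(policies)):
    for j in range(i + 1, len(policies)):
        print(i, j, round(cosine(weights[i], weights[j]), 3))
```

Terms that appear in every document get a TF-IDF weight of zero. In this example those terms are "we" and "data", so they do not contribute to the cosine similarity. This is why TF-IDF weighting is commonly applied before documents are compared. Here the second and third sentences share only those zero-weight terms, so their similarity is 0.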

The data files used by the script are included in the repository and follow this naming convention:
1) “Privacy.xlsx” contains the privacy policies of Meta, Google and WhatsApp.
2) “P1.xlsx” contains the Privacy Act of 1974, as amended, 5 U.S.C. § 552a.
3) “SG.xlsx” contains the privacy policies of Google and Snap Inc.

For Microsoft Power BI users, the file “Word Cloud.pbix” can be used to create word cloud visualizations in Power BI. The file is also available at https://appsource.microsoft.com/


Other resources you may find helpful:

1) Burt L. Monroe. Term Weighting (including tf-idf) and Cosine Similarity. PLSC 597, Text as Data, Penn State. https://burtmonroe.github.io/TextAsDataCourse/Tutorials/TADA-CosineSimTutorial.nb.html

2) The quanteda documentation: https://tutorials.quanteda.io/basic-operations/workflow/

Owner

  • Name: Thiago Teodoro
  • Login: thiago-teodoro
  • Kind: user
  • Location: Ontario, Canada
  • Company: Teodoro Consulting

I am a consultant in auditing, project management, research, and data analytics in Canada and internationally.

Citation (Citation.cff)

cff-version: 1.2.0
title: Privacy Policy as Data
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Thiago
    family-names: de Oliveira Teodoro
    email: info@teodoroconsulting.net
    affiliation: Teodoro Consulting
    orcid: 'https://orcid.org/0000-0002-1561-3630'

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1