Recent Releases of privacy_collaborative_research_cycle

privacy_collaborative_research_cycle - v1.3 CRC Acceleration Bundle


Note for Windows Users

The crcdataandmetricbundle_1.3.zip file contains paths with lengths of up to 195 characters. By default, Windows limits file path lengths to a maximum of 260 characters. If you try to extract this zip file in your default download directory, you may encounter an error or find that the extraction silently fails. To resolve this, move the downloaded zip file to the root of a drive (e.g., C:\ or D:\ drive) and then try unzipping it again.
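Before extracting, it can help to estimate how long the paths would become under a given target directory. The sketch below uses Python's standard zipfile module. The 260-character limit is the one cited above; the function names and any directory you pass in are illustrative assumptions, and the result is a rough check rather than a guarantee.

```python
import zipfile

WINDOWS_MAX_PATH = 260  # default Windows limit cited in the note above


def longest_extracted_path(zip_path, target_dir):
    """Return the length of the longest full path the archive would create in target_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        longest_member = max((len(name) for name in archive.namelist()), default=0)
    # +1 accounts for the separator between the target directory and the member path
    return len(target_dir.rstrip("\\/")) + 1 + longest_member


def fits_windows_limit(zip_path, target_dir, limit=WINDOWS_MAX_PATH):
    """True if every extracted path would stay below the limit (a rough check only)."""
    return longest_extracted_path(zip_path, target_dir) < limit
```

If fits_windows_limit returns False for your download directory, extracting at the root of a drive as described above keeps the prefix as short as possible.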


CRC Acceleration Bundle

This is version 1.3 of the CRC Acceleration Bundle. It contains deidentified data submitted to the CRC and the evaluation results generated for that data by SDNist v2.3.0.

The CRC homepage provides more detailed information about the program, its goals, and how to participate.

There are three ground truth partitions, corresponding to three geographic regions: the Boston area (ma), the Dallas-Fort Worth area (tx), and a national sample (national). Submissions may include any or all of these partitions.

The original data contains 24 features. A list of recommended reduced-size feature sets can be found in the Excerpts Readme. Deidentified data may include any combination of features, though we have encouraged participants to use one of the recommended feature sets to facilitate comparison of techniques.

The crc-data-and-metrics-bundle file contains:

  • All of the deidentified data submissions and their evaluation metric results in the current release of our archive.
  • An index.csv file that tracks all submission metadata, algorithm properties and definitions.
  • A comprehensive set of tutorial Jupyter notebooks and utilities that teach users how to programmatically explore the archive using the index file.
  • The ground truth target data ('diverse community excerpts') and data dictionaries as JSON files.

The data and metrics bundle enables users to explore the archive programmatically; it supports researchers who would like to write their own scripts to analyze the data or metric results.
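As a starting point for such scripts, the sketch below reads the index file with Python's standard csv module and counts rows per library. The column name "library name" and the file location are assumptions; check the header row of index.csv in your copy of the bundle, and the tutorial notebooks, for the actual schema.

```python
import csv
from collections import Counter


def count_by_column(index_path, column="library name"):
    """Count index rows per distinct value of `column` (the column name is an assumption)."""
    with open(index_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"{column!r} not found; available columns: {reader.fieldnames}")
        return Counter(row[column] for row in reader)
```

For example, count_by_column("path/to/index.csv").most_common(5) would list the five values that appear most often in that column, where the path is a placeholder for wherever you extracted the bundle.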

The crc-metareports-bundle enables users to explore the archive visually. It contains evaluation metareports that compare results from each deidentification library in the archive, along with additional documentation and guidance on interpretation of results. The metareports are available in both html and pdf format, and the package also includes detailed evaluation reports on each individual deidentified data sample.

To learn more about the techniques used to deidentify the data, see the CRC Techniques page.

Change log:

CRC DATA AND METRIC BUNDLE

Version 1.3

  • Fix path lengths to be less than 200 characters for Windows compatibility.
    • Shorten the paths of the report directories. Report directory names now start with 'r_'.
    • Shorten the paths of files and directories inside each report directory.
    • Update the index file with the new path for each report in the 'report path' column.
  • Rename tumultdphist (in the deiddata directory) to tumultanalyticsdphist to avoid the impression that this deid data was created or submitted by Tumult Labs.
    • Update all label files inside the tumultanalyticsdphist directory to use 'tumult analytics' as the library name instead of 'tumult'.
    • Label files already state that they were generated by Team CRC and not by Tumult Labs.
    • Update the index file with the new library name and the new paths to deid data, labels, and reports.
    • Update reports to use the updated deid data labels.
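Scripts written against earlier releases may hold stale paths after these renames. The following is a minimal sketch for checking that each entry in the index's 'report path' column resolves to something on disk. It assumes the column holds paths relative to a bundle root directory that you pass in, which these notes do not confirm; the function name is illustrative.

```python
import csv
from pathlib import Path


def missing_report_paths(index_path, bundle_root, column="report path"):
    """List values in `column` that do not exist under bundle_root.

    Assumes the column holds paths relative to bundle_root (an assumption).
    """
    root = Path(bundle_root)
    missing = []
    with open(index_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(column) or "").strip()
            # Skip empty cells; report any non-empty path that is not on disk
            if value and not (root / value).exists():
                missing.append(value)
    return missing
```

An empty list means every non-empty entry in that column was found under the given root.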

CRC METAREPORT BUNDLE

Version 1.3

  • Rename the tumult metareport to dphist to avoid the impression that this metareport is based on deid data created and submitted by Tumult Labs. This deid data was created by the CRC team using the Tumult Analytics library.


Published by kbtriangulum over 1 year ago

privacy_collaborative_research_cycle - v1.2 CRC Acceleration Bundle

This is version 1.2 of the CRC Acceleration Bundle. It contains deidentified data submitted to the CRC and the evaluation results generated for that data by SDNist v2.3.0.

The CRC homepage provides more detailed information about the program, its goals, and how to participate.

There are three ground truth partitions, corresponding to three geographic regions: the Boston area (ma), the Dallas-Fort Worth area (tx), and a national sample (national). Submissions may include any or all of these partitions.

The original data contains 24 features. A list of recommended reduced-size feature sets can be found in the Excerpts Readme. Deidentified data may include any combination of features, though we have encouraged participants to use one of the recommended feature sets to facilitate comparison of techniques.

The crc-data-and-metrics-bundle file contains:

  • All of the deidentified data submissions and their evaluation metric results in the current release of our archive.
  • An index.csv file that tracks all submission metadata, algorithm properties and definitions.
  • A comprehensive set of tutorial Jupyter notebooks and utilities that teach users how to programmatically explore the archive using the index file.
  • The ground truth target data ('diverse community excerpts') and data dictionaries as JSON files.

The data and metrics bundle enables users to explore the archive programmatically; it supports researchers who would like to write their own scripts to analyze the data or metric results.

The crc-metareports-bundle enables users to explore the archive visually. It contains evaluation metareports that compare results from each deidentification library in the archive, along with additional documentation and guidance on interpretation of results. The metareports are available in both html and pdf format, and the package also includes detailed evaluation reports on each individual deidentified data sample.

To learn more about the techniques used to deidentify the data, see the CRC Techniques page.

Change log:

CRC DATA AND METRIC BUNDLE

Version 1.2

  • Added new deid datasets:
    • aifairness_smote
    • aindosynthAindo
    • anonossdkAnonos
    • rsynthpop_ipf
    • smartnoise_aim
    • ydatafabricsynthesizers_YData
    • ydatasyntheticctgan_DCAICommunity
  • Update index.csv to contain new deid data files.
  • Change library name of subsample deid data to: subsample
  • Fixes to the notebooks.
  • Freeze versions in requirements.txt.

CRC METAREPORT BUNDLE

Version 1.2

  • Add new metareports:
    • aindo-synth
    • Anonos Data Embassy SDK
    • UTDallas-AIFairness
    • ydata-sdk
    • ydata-synthetic
  • Updated metareports to include new deid data sets:
    • rsynthpop
    • smartnoise-synth
  • Updated metareport names.


Published by kbtriangulum about 2 years ago

privacy_collaborative_research_cycle - v1.1 CRC Acceleration Bundle

This is version 1.1 of the CRC Acceleration Bundle. It contains deidentified data submitted to the CRC and the evaluation results generated for that data by SDNist v2.3.0.

This repository contains the results of the first round of submissions. Additional submissions will be added with the next drop expected in July 2023. The repository contains the navigable structure for the entire bundle. You can find all of the compressed data in Releases or you can use the links at the top of this readme.

The crc-data-and-metrics-bundle file contains:

  • All of the deidentified data submissions and their evaluation metric results in the current release of our archive.
  • An index.csv file that tracks all submission metadata, algorithm properties and definitions.
  • A comprehensive set of tutorial Jupyter notebooks and utilities that teach users how to programmatically explore the archive using the index file.
  • The ground truth target data ('diverse community excerpts') and data dictionaries as JSON files.

The data and metrics bundle enables users to explore the archive programmatically; it supports researchers who would like to write their own scripts to analyze the data or metric results.

The crc-metareports-bundle enables users to explore the archive visually. It contains evaluation metareports that compare results from each deidentification library in the archive, along with additional documentation and guidance on interpretation of results. The metareports are available in both html and pdf format, and the package also includes detailed evaluation reports on each individual deidentified data sample.

To learn more about the techniques used to deidentify the data, see the CRC Techniques page.

Change log:

CRC DATA AND METRIC BUNDLE v1.1

  • Added tutorial ipython notebooks for demonstrating the use of crc data and metric bundle.
    • Notebook0: Basic introduction notebook.
    • Notebook1: K-marginal barplot notebook.
    • Notebook2: Imposter plot notebook.
    • Notebook3: Race distribution notebook.
    • Notebook4: Privacy utility tradeoff notebook.
  • Updated deid_data:
    • Added new deid data samples for sdcmicrokanonymity and sdcmicro_pram.
    • Added new deid data samples: subsample_1pcnt and subsample_5pcnt.
    • Created new SDNist evaluation reports for all the deid data samples. Changes in the new reports:
      • Added explanatory text to inconsistencies and UEM metric, fixed typographic errors, improved formatting in privacy section, renamed 'k-marginal breakdown' to 'worst-performing PUMA breakdown' and adjusted json structure accordingly.
      • Improved readability of propensity image
      • Fixed feature space size in UEM
      • Fixed deid percentage in UEM metric
      • Added 1% and 5% sampling error on the k-marginal
      • More detailed variant labels including new columns to align with index csv
  • Updated index.csv file:
    • Added new deid data samples to the index.
    • Updated index file data columns.
  • Updated diversecommunitiesdata_excerpts:
    • Updated INDP feature description in data_dictionary.json.

CRC METAREPORT BUNDLE v1.1

  • Converted to use SDNist v2.3.
  • Added new deid data samples to sdcmicro and subsample metareports.
  • Fixed all the metareports that were missing some deid data samples.


Published by kbtriangulum over 2 years ago

privacy_collaborative_research_cycle - CRC Acceleration Bundle

This is the first release of the CRC Acceleration Bundle. It contains deidentified data submitted to the CRC and their evaluation results as generated by SDNist v2.2.0.

This repository contains the results of the first round of submissions. Additional submissions will be added with the next drop, expected in July 2023. The resources are partitioned into two zip files. The first, "crcdataandmetricsbundle1.0.zip", contains deidentified data and results generated by our evaluator, SDNist. The second, "crcmetareportbundle1.0.zip", contains metareports that examine individual techniques with a brief discussion. The deidentified data include meta-information about how each dataset was generated, the feature sets run, and other pertinent information. To learn more about the techniques used to deidentify the data, see the CRC Techniques page (https://pages.nist.gov/privacycollaborativeresearch_cycle/pages/techniques.html).


Published by garyhowarth almost 3 years ago

privacy_collaborative_research_cycle - CRC Tutorial + Office Hour recordings and transcripts

Here you will find videos that provide an introduction and context to the Collaborative Research Cycle.

Introductory Materials

  • CRC.Program.Introduction.mp4 - an overview of the aims of the CRC - 41 minutes
  • CRC.Submission.Tutorial.mp4 - a detailed guide on navigating our website and submission system - 60 minutes

Office Hours

We include the recording and a machine-generated transcript for each of our office hours. Office hours are informal drop-in sessions and last approximately 60 minutes.


Published by garyhowarth about 3 years ago