privacy_collaborative_research_cycle
This repository houses the submissions and evaluation data related to the NIST Collaborative Research Cycle
https://github.com/usnistgov/privacy_collaborative_research_cycle
Science Score: 65.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: Organization usnistgov has institutional domain (www.nist.gov)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: usnistgov
- License: other
- Default Branch: research-acceleration-bundle
- Homepage: https://pages.nist.gov/privacy_collaborative_research_cycle
- Size: 762 MB
Statistics
- Stars: 8
- Watchers: 2
- Forks: 4
- Open Issues: 1
- Releases: 5
Metadata Files
README.md
CRC Data Bundle v1.3
This Bundle contains all deidentified data submitted to the CRC from 2023 through 2024. The CRC homepage provides more detailed information about the program, its goals, and how to participate. In short, the CRC seeks to equip the research community with resources to explore, evaluate, and discuss deidentification approaches.
The crc-data-bundle file contains:
- All of the deidentified data submissions from 2023 through 2024.
  - The .csv file contains the deidentified data itself.
  - The .json file contains all metadata about the generation of the data.
- An index.csv file that tracks metadata across all submissions, algorithm properties and definitions.
- An index definition file that explains the metadata.
- The ground truth target data (NIST ACS Data Excerpts) and data dictionaries as json files.
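The layout described above can be explored with a short script. The following is a minimal sketch, not part of the CRC tooling: it assumes each deidentified `.csv` sits next to a `.json` metadata file with the same file stem, which the README does not state explicitly. The function names `pair_submissions` and `count_rows` are hypothetical.

```python
import csv
import json
from pathlib import Path


def pair_submissions(bundle_dir):
    """Yield (csv_path, metadata) for every CSV with a same-stem JSON beside it.

    Assumption: data and metadata files share a stem (e.g. foo.csv / foo.json).
    CSV files without such a sibling (for example index.csv) are skipped.
    """
    for csv_path in sorted(Path(bundle_dir).rglob("*.csv")):
        json_path = csv_path.with_suffix(".json")
        if not json_path.exists():
            continue
        with json_path.open(encoding="utf-8") as fh:
            metadata = json.load(fh)
        yield csv_path, metadata


def count_rows(csv_path):
    """Return the number of data rows in a CSV file, excluding the header."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row if present
        return sum(1 for _ in reader)
```

Calling `pair_submissions` on the unpacked bundle directory gives one pair per submission, so the metadata can be inspected before loading any large data file.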
To learn more about the techniques used to deidentify the data, see the CRC Techniques page.
These data are available for any investigation a user sees fit. See the license statement contained in the repo for terms and conditions.
Privacy Metric Benchmark Data
This folder contains a much smaller, curated collection of samples from high-performing deidentification algorithms. It includes traditional statistical disclosure control techniques from the sdcMicro library, non-differentially private synthetic data from the R Synthpop library and several proprietary techniques, and differentially private data from the SmartNoise synthesizers (AIM and MST) at several levels of epsilon. It also includes the original ground truth target data and a "withheld" data set from the same schema, for comparison and control/baseline purposes.
This data is intended for benchmarking new privacy metrics, as part of the 2025 CRC work analyzing the privacy of deidentified data. See the Red Team page on the CRC website for details. Which of these deidentification techniques provides the best privacy? How do we define privacy, and where do our definitions agree or disagree with each other?
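To make the role of the "withheld" data concrete, here is a toy baseline written for illustration only. It is not one of the CRC or SDNist metrics, and both function names are hypothetical. It measures how often deidentified records exactly reproduce target records, then subtracts the same rate computed against the withheld records. Matches against withheld data can only arise by chance, so a gap well above zero suggests the deidentified data carries record-level information about the target.

```python
def exact_match_rate(deid_rows, reference_rows):
    """Fraction of deidentified rows that appear verbatim in the reference rows."""
    reference = {tuple(row) for row in reference_rows}
    deid = [tuple(row) for row in deid_rows]
    if not deid:
        return 0.0
    return sum(row in reference for row in deid) / len(deid)


def match_rate_gap(deid_rows, target_rows, withheld_rows):
    """Match rate against the target minus match rate against the withheld control.

    Rows are sequences of feature values drawn from the same schema.
    """
    return (exact_match_rate(deid_rows, target_rows)
            - exact_match_rate(deid_rows, withheld_rows))
```

Exact matching is a deliberately crude notion of privacy loss. Comparing it with other definitions on the same samples is one way to see where definitions agree or disagree.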
Please cite these resources
If you use these resources, we ask that you cite as follows:
Task C., Bhagat K., Howarth G.S. (2024), NIST Collaborative Research Cycle Acceleration Bundle, National Institute of Standards and Technology, https://doi.org/10.18434/mds2-3024
bibtex:
@misc{tasknist2024,
  title = {{NIST} {Collaborative} {Research} {Cycle} {Data} and {Metrics} {Archive}},
  url = {https://data.nist.gov/od/id/mds2-3024},
  doi = {10.18434/MDS2-3024},
  author = {Task, Christine and Bhagat, Karan and Streat, Damon and Howarth, Gary},
  month = feb,
  year = {2025}
}
Owner
- Name: National Institute of Standards and Technology
- Login: usnistgov
- Kind: organization
- Location: Gaithersburg, Md.
- Website: https://www.nist.gov
- Repositories: 1,117
- Profile: https://github.com/usnistgov
Department of Commerce
Citation (CITATION.cff)
cff-version: 1.2.0
title: "NIST Collaborative Research Cycle Acceleration Bundle"
abstract: "This repository contains results of the 2023 NIST Collaborative Research Cycle (CRC). The repository contains deidentified data submitted to the CRC and their evaluation results as generated by SDNist Deidentified Data Report Tool (https://github.com/usnistgov/SDNist). The purpose of this repository is to equip the research community with resources to explore, evaluate, and discuss tabular data deidentification approaches. Detailed documentation is available in the repository. The deidentified data include meta-information about how each dataset was generated, the feature sets run, who made it, and other pertinent information. Each data set is linked to human- and machine-readable reports that provide a host of evaluation metrology."
message: >-
If you use this repository or present information about it publicly, please cite us.
type: software
version: 1.3
doi: 10.18434/mds2-3024
date-released: 2023-05-19
url: https://github.com/usnistgov/privacy_collaborative_research_cycle
contact:
- affiliation: "National Institute of Standards and Technology"
email: gary.howarth@nist.gov
family-names: Howarth
given-names: Gary
authors:
- family-names: Task
given-names: Christine
affiliation: Knexus Research Corporation
email: christine.task@knexusresearch.com
- family-names: Bhagat
given-names: Karan
affiliation: Knexus Research Corporation
- family-names: Howarth
given-names: Gary
affiliation: National Institute of Standards and Technology
email: gary.howarth@nist.gov
ORCID: 0000-0002-3587-0546
GitHub Events
Total
- Create event: 4
- Release event: 1
- Issues event: 2
- Watch event: 2
- Delete event: 4
- Push event: 66
- Pull request review event: 3
- Pull request event: 19
- Fork event: 2
Last Year
- Create event: 4
- Release event: 1
- Issues event: 2
- Watch event: 2
- Delete event: 4
- Push event: 66
- Pull request review event: 3
- Pull request event: 19
- Fork event: 2
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 2 days
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 2 days
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- garyhowarth (1)
- TedTed (1)
Pull Request Authors
- kbtriangulum (13)
- garyhowarth (3)
- dependabot[bot] (1)