dynamic-consent-management

Dynamic Consent Management of Speakers in Voice Assistant Systems

https://github.com/arash-shahmansoori/dynamic-consent-management


Repository

Dynamic Consent Management of Speakers in Voice Assistant Systems

Basic Info
  • Host: GitHub
  • Owner: arash-shahmansoori
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 3.1 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

Dynamic Consent Management of Speakers in Voice Assistant Systems by Contrastive Embedding Replay

PyTorch implementation of: "Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay" (arXiv).

Installation

```shell
pip install -r requirements.txt
```

Data

To create speakers per agent, use the following steps:

Make sure that the LibriSpeech dataset is downloaded and arranged under the following tree structure in the data folder.

```
data
└── LibriSpeech
    └── train-clean-100
        ├── <speakers' folders>
        ├── Books.txt
        ├── Chapters.txt
        ├── License.txt
        ├── ReadMe.txt
        └── Speakers.txt
```

Similarly, for the noisy utterances, make sure to download the LibriSpeech (train-other-500) dataset with the following structure.

```
data
└── LibriSpeechOther
    └── train-other-500
        ├── <speakers' folders>
        ├── Books.txt
        ├── Chapters.txt
        ├── License.txt
        ├── ReadMe.txt
        └── Speakers.txt
```

Choose the appropriate root directory "root_dir" and the destination directory "dest_dir" in "create_spks_per_agnt.py". For instance, the following can be used for the "LibriSpeech" dataset:

```python
root_dir = "data\\LibriSpeech"
dest_dir = f"data\\LibriSpeech_modular\\agnt_{args.agnt_num}_spks_{num_spks_per_agnt}"
```

Once the aforementioned directories are selected, use the following command to create the speakers per agent.

```shell
python create_spks_per_agnt.py
```

Preprocessing

To create datasets for training and testing using disjoint utterances of different speakers, use the following steps:

Choose the appropriate "root_name" and "file_name" in "main_preprocess.py" in the "plugins" folder. For instance, in the case of clean utterances, the following names can be used:

```python
root_name = f"./data/LibriSpeech_modular/agnt_{args.agnt_num}_spks_{args.n_speakers}"
file_name = (
    f"LibriSpeech_modular/agnt_{args.agnt_num}_spks_{args.n_speakers}/Speakers.txt"
)
```

For the case of noisy utterances, the following names can be used:

```python
root_name = f"./data/LibriSpeech_train_other_500/train-other-500_agnt_{args.agnt_num}"
file_name = "data/LibriSpeech_train_other_500/Speakers.txt"
```

Set the "output_dir" in "main_preprocess.py" to the appropriate name:

e.g., for training with clean data and the entire set of utterances:

```python
output_dir = args.output_dir_train
pcnt_old = "full"
```

e.g., for testing with clean data:

```python
output_dir = args.output_dir_val
pcnt_old = "eval"
```

Similarly, choose the appropriate "output_dir" for training with reduced utterances, and for training and testing with noisy utterances. The appropriate names for the different cases are provided in the "parse_args()" function.

Make sure you follow the instructions in the "DisjointTrainTest" class, used in the "preprocess" function from "main_preprocess.py", to create disjoint utterances for training and testing. Also, follow the commented instructions in the "DisjointTrainTest" class to create an "x %" subset of the training data for the case of training with "x %" of the clean utterances.
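The disjoint-utterance idea above can be sketched as follows. This is a minimal illustration, not the repository's "DisjointTrainTest" implementation; the function and parameter names are assumptions:

```python
import random

def disjoint_split(utterances_per_speaker, test_fraction=0.2, seed=0):
    """Split each speaker's utterances into disjoint train/test subsets,
    so no utterance appears in both sets (illustrative sketch only)."""
    rng = random.Random(seed)
    train, test = {}, {}
    for spk, utts in utterances_per_speaker.items():
        utts = list(utts)
        rng.shuffle(utts)
        n_test = max(1, int(len(utts) * test_fraction))  # at least one test item
        test[spk] = utts[:n_test]
        train[spk] = utts[n_test:]
    return train, test
```

The key property is that, per speaker, the train and test utterance sets never overlap.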

Once the aforementioned steps are completed, choose the appropriate "PLUGIN_NAME" in "main_executable.py" and use the following command to create each dataset.

```shell
python main_executable.py
```

This creates an "output_dir" with the corresponding name for each dataset, which can then be used during training/testing.
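The plugin selection can be pictured as a small dynamic-import dispatcher. The sketch below is purely illustrative; the dispatcher and the entry-point name "main" are assumptions, not the repository's actual code:

```python
import importlib

def run_plugin(plugin_name: str, *args, **kwargs):
    """Import the module named by the plugin string (e.g. a 'plugins.*' path)
    and call its entry point. The 'main' entry point is a hypothetical name."""
    module = importlib.import_module(plugin_name)
    return module.main(*args, **kwargs)
```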

Training

The training process can be divided into two categories: the proposed method, and the literature baseline (trained from scratch).

Proposed

To run the simulations for training according to the proposed method, first set the number of buckets in the "utils_hyperparams.py" file in the "utils_final" folder to 8, i.e., "num_of_buckets: int = 8", e.g., for 40 total speakers distributed among 8 buckets, each containing 5 speakers. Then, set the number of speakers per bucket and the stride to the number of speakers to be registered in each bucket, e.g., 5. This can be achieved by setting the following arguments in "utils_args.py" in the "utils_final" folder:

```python
parser.add_argument("--spk_per_bucket", type=int, default=5)
parser.add_argument("--stride_per_bucket", type=int, default=5)
parser.add_argument("--stride", default=5, type=int, help="stride size")
```
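Under this configuration, e.g., 40 speakers are partitioned into 8 buckets of 5 speakers each. A hypothetical sketch of such a consecutive partition (not the repository's code):

```python
def speakers_to_buckets(speaker_ids, spk_per_bucket):
    """Partition an ordered list of speaker ids into consecutive buckets
    of size spk_per_bucket (illustrative sketch only)."""
    return [speaker_ids[i:i + spk_per_bucket]
            for i in range(0, len(speaker_ids), spk_per_bucket)]
```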

For supervised contrastive training, choose PLUGIN_NAME="plugins.main_train_cont_sup" in "main_executable.py" and use the following command.

```shell
python main_executable.py
```

For unsupervised contrastive training, choose PLUGIN_NAME="plugins.main_train_cont_unsup" in "main_executable.py" and run "main_executable.py" as mentioned above.

Literature (From Scratch)

To run the simulations for training according to the literature, first set the number of buckets in the "utils_hyperparams.py" file in the "utils_final" folder to 1, i.e., "num_of_buckets: int = 1". Then, set the number of speakers per bucket and the stride to the total number of speakers to be registered, e.g., 40. This can be achieved by setting the following arguments in "utils_args.py" in the "utils_final" folder:

```python
parser.add_argument("--spk_per_bucket", type=int, default=40)
parser.add_argument("--stride_per_bucket", type=int, default=40)
parser.add_argument("--stride", default=40, type=int, help="stride size")
```

Finally, choose PLUGIN_NAME="plugins.main_train_scratch_unsup" in "main_executable.py" and run "main_executable.py".

Other Dataset

The training process extends to the VoxCeleb dataset as follows. In the plugins folder, choose logic_name = "train" and output_dir = args.output_dir_vox_train for the training set, and logic_name = "test" and output_dir = args.output_dir_vox_test for the testing set. Select args.agnt_num accordingly. Then, set PLUGIN_NAME="plugins.main_vox_preprocess" in "main_executable.py" and use the following command.

```shell
python main_executable.py
```

The training process follows a similar procedure as for the LibriSpeech dataset. The corresponding files for training and evaluation on the VoxCeleb dataset are indicated by the "_vox" suffix in the corresponding folders.

Dynamic Registrations

For dynamic registration using the default initial optimal buckets from the paper, use "create_unique_opt_bkts_spks_existing" in the "compute_optimal_buckets_final" folder. To use another optimal initial bucket according to a fresh run of the L2 Euclidean distance, use "create_unique_opt_bkts_spks" in the same folder.
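Choosing a bucket by L2 Euclidean distance can be sketched as picking the bucket whose centroid embedding is nearest to the new speaker's embedding. This is a simplified illustration under assumed semantics, not the repository's selection routine:

```python
import math

def nearest_bucket(spk_embedding, bucket_centroids):
    """Return the index of the bucket centroid closest to spk_embedding
    in L2 (Euclidean) distance (illustrative sketch only)."""
    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    distances = [l2(spk_embedding, c) for c in bucket_centroids]
    return distances.index(min(distances))
```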

Then, choose PLUGIN_NAME="plugins.main_dynamic_reg_sup" in "main_executable.py" and run "main_executable.py" for each new registration round in the supervised case.

Use the same process for unsupervised dynamic registration by choosing PLUGIN_NAME="plugins.main_dynamic_reg_unsup" in "main_executable.py" and running "main_executable.py" for each new registration round.

Note: When using 10% of the old utterances, set n_utterances_labeled, n_utterances_unlabeled, n_utterances_labeled_reg, n_utterances_labeled_old, and n_utterances_labeled_ to 3 in utils/utils_args to avoid a StopIteration error during training. When using 30% of the old utterances, set the aforementioned parameters to 10.
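For context on that note: drawing a fixed number of utterances from an iterator over a reduced dataset fails once the request exceeds what remains. A minimal illustration (the counts mirror the 10% case above; the variable names are simplifications):

```python
from itertools import islice

utterances = ["u1", "u2", "u3"]  # e.g. only 3 old utterances survive a 10% cut

# Requesting more items than exist exhausts the iterator and raises StopIteration:
it = iter(utterances)
try:
    batch = []
    for _ in range(10):          # n_utterances parameter set too high
        batch.append(next(it))   # raises StopIteration after the 3rd item
except StopIteration:
    batch = None

# Capping the draw at the available count (setting the parameters to 3,
# as the note advises) avoids the error:
safe_batch = list(islice(iter(utterances), 3))
```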

Dynamic Removal

For dynamic removal of previously registered speakers in the buckets, choose PLUGIN_NAME="plugins.main_unreg_sup" in "main_executable.py" and run "main_executable.py" for the supervised case. Similarly, choose PLUGIN_NAME="plugins.main_unreg_unsup" and run "main_executable.py" for the unsupervised case.

Dynamic Re-Registration

For dynamic re-registration of previously unregistered speakers, choose PLUGIN_NAME="plugins.main_rereg_sup" in "main_executable.py" and run "main_executable.py" for the supervised case. Similarly, choose PLUGIN_NAME="plugins.main_rereg_unsup" and run "main_executable.py" for the unsupervised case.

Saved Checkpoints

The checkpoints for the contrastive feature extraction (i.e., dvectors) and classification (i.e., cls) should be stored in the "checkpoints" folder, in sub-folders with the corresponding names.

Test

Run the test from the "tests" folder for the multi-strided random sampling proposed in the paper. The test can be run from the aforementioned folder, with a coverage report, using the following command.

```shell
pytest --cov
```

Subsequently, the report is obtained with the following command.

```shell
coverage report -m
```

The aforementioned test in the "tests" folder provides 100% coverage.
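As a rough picture of what multi-strided random sampling means here: for each stride value, indices are drawn at that spacing starting from a random position. This is a schematic sketch under assumed semantics, not the tested implementation; see the paper for the exact scheme:

```python
import random

def multi_strided_sample(n_items, strides, per_stride, seed=0):
    """For each stride, pick a random start index and take `per_stride`
    indices spaced by that stride, wrapping around (illustrative sketch)."""
    rng = random.Random(seed)
    samples = []
    for s in strides:
        start = rng.randrange(n_items)
        samples.append([(start + k * s) % n_items for k in range(per_stride)])
    return samples
```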

Verification

To perform verification and obtain the relevant metrics, e.g., EER, Cllr, and DCF, choose PLUGIN_NAME="plugins.main_verification_sup" and run "main_executable.py" for the supervised case. Similarly, choose PLUGIN_NAME="plugins.main_verification_unsup" and run "main_executable.py" for the unsupervised case.
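For reference, the EER is the operating point where the false-acceptance and false-rejection rates coincide. A minimal sketch of approximating it from raw verification scores (not the repository's implementation):

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping a threshold over all observed scores and
    taking the point where false-rejection and false-acceptance rates meet."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = 1.0
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)    # false rejections
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)  # false acceptances
        best = min(best, max(frr, far))  # minimized where FRR and FAR cross
    return best
```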

Plots

To create each plot from the paper, select the plot of interest by setting RESULT_NAME in "main_plot_results.py", choose the corresponding function name by setting "plt_fn_name" in "main_plot_results.py", and run the following command.

```shell
python main_plot_results.py
```

References

The proposed consent management implemented here is from the following publication:

Note: The paper mentioned above has been accepted for publication in IEEE Transactions on Neural Networks and Learning Systems (TNNLS) with DOI 10.1109/TNNLS.2023.3317493. If you use this repository, please cite the paper in the following format:

Arash Shahmansoori and Utz Roedig. "Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay." IEEE Transactions on Neural Networks and Learning Systems, vol. [Volume], no. [Issue], pp. [Page range], [Month] [2023]. DOI: 10.1109/TNNLS.2023.3317493 (corresponding author: Arash Shahmansoori).

  • Cite this repository

To support differential privacy for speaker recognition in voice assistant systems using private data and a small portion of publicly available data, please refer to the following repository:

License

MIT License


Contact the Author

Author: Arash Shahmansoori, e-mail: arash.mansoori65@gmail.com

Owner

  • Login: arash-shahmansoori
  • Kind: user
  • Location: Ireland

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: dynamic-consent-management
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Arash
    family-names: Shahmansoori
    email: arash.mansoori65@gmail.com
    affiliation: Research Engineer
    orcid: 'https://orcid.org/0000-0001-5126-8005'
identifiers:
  - type: doi
    value: 10.1109/TNNLS.2023.3317493
abstract: >-
  PyTorch implementation of: ``Dynamic Recognition of
  Speakers for Consent Management by Contrastive Embedding
  Replay''
keywords:
  - Voice Assistant Systems
  - Consent Management
  - Contrastive Embedding Replay
  - Multi-Strided Sampling
  - Dynamic Learning
license: MIT
date-released: 2023-09-20
url: "https://github.com/arash-shahmansoori/dynamic-consent-management.git"
