dynamic-consent-management
Dynamic Consent Management of Speakers in Voice Assistant Systems
https://github.com/arash-shahmansoori/dynamic-consent-management
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.4%) to scientific vocabulary
Repository
Dynamic Consent Management of Speakers in Voice Assistant Systems
Basic Info
- Host: GitHub
- Owner: arash-shahmansoori
- License: MIT
- Language: Python
- Default Branch: main
- Size: 3.1 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Dynamic Consent Management of Speakers in Voice Assistant Systems by Contrastive Embedding Replay
PyTorch implementation of "Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay" (arXiv).
Installation
```shell
pip install -r requirements.txt
```
Data
To create the speakers per agent, follow these steps:
Make sure that the LibriSpeech dataset is downloaded and laid out in the data folder with the following tree structure.
```
data
└── LibriSpeech
    └── train-clean-100
        ├── <speakers' folders>
        ├── Books.txt
        ├── Chapters.txt
        ├── License.txt
        ├── ReadMe.txt
        └── Speakers.txt
```
Similarly, for the noisy utterances, make sure to download the LibriSpeech train-other-500 split with the following data structure.
```
data
└── LibriSpeechOther
    └── train-other-500
        ├── <speakers' folders>
        ├── Books.txt
        ├── Chapters.txt
        ├── License.txt
        ├── ReadMe.txt
        └── Speakers.txt
```
Choose the appropriate root directory "root_dir" and the destination directory "dest_dir" in "create_spks_per_agnt.py". For instance, the following can be used for the "LibriSpeech" dataset:
```python
root_dir = "data\\LibriSpeech"
dest_dir = f"data\\LibriSpeech_modular\\agnt_{args.agnt_num}_spks_{num_spks_per_agnt}"
```
Once the aforementioned directories are selected, use the following command to create the speakers per agent.
```shell
python create_spks_per_agnt.py
```
Preprocessing
To create datasets for training and testing using disjoint utterances of different speakers, follow these steps:
Choose the appropriate "root_name" and "file_name" in "main_preprocess.py" in the "plugins" folder. For instance, in the case of clean utterances, the following names can be used:
```python
root_name = f"./data/LibriSpeech_modular/agnt_{args.agnt_num}_spks_{args.n_speakers}"
file_name = (
    f"LibriSpeech_modular/agnt_{args.agnt_num}_spks_{args.n_speakers}/Speakers.txt"
)
```
For the case of noisy utterances, the following names can be used:
```python
root_name = f"./data/LibriSpeech_train_other_500/train-other-500_agnt_{args.agnt_num}"
file_name = "data/LibriSpeech_train_other_500/Speakers.txt"
```
Set the "output_dir" in "main_preprocess.py" to the appropriate name:
e.g., for training with clean data and the full set of utterances:
```python
output_dir = args.output_dir_train
pcnt_old = "full"
```
e.g., for testing with clean data:
```python
output_dir = args.output_dir_val
pcnt_old = "eval"
```
Similarly, choose the appropriate "output_dir" for training with reduced utterances and for training and testing with noisy utterances. The appropriate "output_dir" names for the different cases are provided in the "parse_args()" function.
Make sure you follow the instructions in the "DisjointTrainTest" class used by the "preprocess" function in "main_preprocess.py" to create disjoint utterances for training and testing. Also, follow the commented instructions in the "DisjointTrainTest" class to train on only x % of the clean utterances ("x % use of training data").
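The "DisjointTrainTest" class is not reproduced here; the underlying idea of disjoint train/test utterances can be sketched as follows (a simplified, hypothetical illustration, not the repository's implementation):

```python
def disjoint_train_test(utterances_per_speaker, test_fraction=0.2):
    """Split each speaker's utterance list into disjoint train/test subsets,
    so that no utterance appears in both sets. Simplified illustration."""
    train, test = {}, {}
    for spk, utts in utterances_per_speaker.items():
        n_test = max(1, int(len(utts) * test_fraction))
        test[spk] = utts[:n_test]    # held-out utterances
        train[spk] = utts[n_test:]   # remaining utterances, disjoint from test
    return train, test
```

The key property is per-speaker disjointness: testing measures recognition of known speakers on unseen utterances, not memorization of training audio.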
Once the aforementioned steps are completed, choose the appropriate "PLUGIN_NAME" in "main_executable.py" and use the following command to create each dataset.
```shell
python main_executable.py
```
This would create an "output_dir" with the corresponding name for each dataset that can be used during the training/testing accordingly.
Training
The training process can be divided into two categories: the proposed method, and the literature baselines (trained from scratch).
Proposed
To run the training simulations for the proposed method, first set the number of buckets to "8" in the "utils_hyper_params.py" file in the "utils_final" folder ("num_of_buckets: int = 8"), e.g., for 40 total speakers distributed among 8 buckets of 5 speakers each. Then, set the number of speakers per bucket and the stride to the number of speakers that need to be registered in each bucket, e.g., 5, by setting the following arguments in "utils_args.py" in the "utils_final" folder:
```python
parser.add_argument("--spk_per_bucket", type=int, default=5)
parser.add_argument("--stride_per_bucket", type=int, default=5)
parser.add_argument("--stride", default=5, type=int, help="stride size")
```
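With these settings, 40 speakers fill 8 buckets of 5 speakers each. As a quick sanity check of that arithmetic (a hypothetical helper, not part of the repository):

```python
def speakers_to_buckets(n_speakers, spk_per_bucket):
    """Assign consecutive speaker indices to buckets of size spk_per_bucket."""
    assert n_speakers % spk_per_bucket == 0, "speakers must fill buckets evenly"
    return {
        b: list(range(b * spk_per_bucket, (b + 1) * spk_per_bucket))
        for b in range(n_speakers // spk_per_bucket)
    }
```

For example, speakers_to_buckets(40, 5) yields 8 buckets, with bucket 0 holding speakers 0-4 and bucket 7 holding speakers 35-39.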
For supervised contrastive training, choose PLUGIN_NAME="plugins.main_train_cont_sup" in "main_executable.py" and use the following command.
```shell
python main_executable.py
```
For unsupervised contrastive training, choose PLUGIN_NAME="plugins.main_train_cont_unsup" in "main_executable.py" and run "main_executable.py" as mentioned above.
Literature (From Scratch)
To run the training simulations according to the literature, first set the number of buckets to "1" in the "utils_hyper_params.py" file in the "utils_final" folder ("num_of_buckets: int = 1"). Then, set the number of speakers per bucket and the stride to the total number of speakers that need to be registered, e.g., 40, by setting the following arguments in "utils_args.py" in the "utils_final" folder:
```python
parser.add_argument("--spk_per_bucket", type=int, default=40)
parser.add_argument("--stride_per_bucket", type=int, default=40)
parser.add_argument("--stride", default=40, type=int, help="stride size")
```
Finally, choose PLUGIN_NAME="plugins.main_train_scratch_unsup" in "main_executable.py" and run "main_executable.py".
Other Dataset
The training process extends to the VoxCeleb dataset as follows. In the "plugins" folder, choose logic_name = "train" and output_dir = args.output_dir_vox_train for the training set, and logic_name = "test" and output_dir = args.output_dir_vox_test for the testing set. Select args.agnt_num accordingly. Then, set PLUGIN_NAME="plugins.main_vox_preprocess" in "main_executable.py" and use the following command.
```shell
python main_executable.py
```
The training process is similar to that for the LibriSpeech dataset. The corresponding files for training and evaluation on the VoxCeleb dataset are marked by the "_vox" suffix in the corresponding folders.
Dynamic Registrations
For dynamic registration using the default initial optimal buckets from the paper, use "create_unique_opt_bkts_spks_existing" in the "compute_optimal_buckets_final" folder. To use another optimal initial bucket according to a fresh run of the L2 Euclidean distance, use "create_unique_opt_bkts_spks" in the same folder.
Then, choose PLUGIN_NAME="plugins.main_dynamic_reg_sup" in "main_executable.py" and run "main_executable.py" for each new registration round in the supervised case.
Use the same process for unsupervised dynamic registration by choosing PLUGIN_NAME="plugins.main_dynamic_reg_unsup" in "main_executable.py" and running "main_executable.py" for each new registration round.
Note: For the case of using 10% of old utterances, set n_utterances_labeled, n_utterances_unlabeled, n_utterances_labeled_reg, n_utterances_labeled_old, and n_utterances_labeled_ to 3 in utils/utils_args to avoid a StopIteration error during training. For the case of using 30% of old utterances, set the aforementioned parameters to 10.
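The StopIteration mentioned above is the standard Python signal of an exhausted iterator: when only 10% of the old utterances remain, requesting more utterances per speaker than the reduced pool contains over-draws the iterator. A minimal illustration (not the repository's data loader):

```python
def draw_utterances(pool, n_requested):
    """Draw n_requested items from an iterator over the utterance pool.
    Requesting more items than the pool holds raises StopIteration,
    which is why the n_utterances_* settings must match the reduced pool size."""
    it = iter(pool)
    batch = []
    for _ in range(n_requested):
        batch.append(next(it))  # StopIteration here once the pool is exhausted
    return batch
```

For example, with a 3-utterance pool (roughly the 10% case), requesting 3 items succeeds, while requesting 10 raises StopIteration.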
Dynamic Removal
For the dynamic removal of previously registered speakers from the buckets, choose PLUGIN_NAME="plugins.main_unreg_sup" in "main_executable.py" and run "main_executable.py" for the supervised case. Similarly, choose PLUGIN_NAME="plugins.main_unreg_unsup" and run "main_executable.py" for the unsupervised case.
Dynamic Re-Registration
For the dynamic re-registration of previously unregistered speakers, choose PLUGIN_NAME="plugins.main_re_reg_sup" in "main_executable.py" and run "main_executable.py" for the supervised case. Similarly, choose PLUGIN_NAME="plugins.main_re_reg_unsup" and run "main_executable.py" for the unsupervised case.
Saved Checkpoints
The checkpoints for contrastive feature extraction (i.e., dvectors) and classification (i.e., cls) should be stored in sub-folders of the "checkpoints" folder with the corresponding names.
Test
Run the tests from the "tests" folder for the multi-strided random sampling proposed in the paper. Run the tests with a coverage report using the following command.
```shell
pytest --cov
```
Subsequently, the report is obtained by the following command.
```shell
coverage report -m
```
The aforementioned test in the "tests" folder provides 100% coverage.
Verification
To perform verification and obtain the relevant metrics, e.g., EER, CLLR, and DCF, choose PLUGIN_NAME="plugins.main_verification_sup" and run "main_executable.py" for the supervised case. Similarly, choose PLUGIN_NAME="plugins.main_verification_unsup" and run "main_executable.py" for the unsupervised case.
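The verification plugins compute these metrics internally. For reference, the EER is the operating point where the false-acceptance and false-rejection rates coincide; it can be approximated from raw trial scores as follows (a simplified sketch, not the repository's code):

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate EER: sweep thresholds over all observed scores and return
    the point where false rejection rate (FRR) and false acceptance rate (FAR)
    are closest, averaging the two rates there."""
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2)
    return best[1]
```

Perfectly separable genuine/impostor scores yield an EER of 0; overlapping score distributions push the EER up toward chance.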
Plots
To create each plot from the paper, select the plot of interest by setting RESULT_NAME in "main_plot_results.py", choose the corresponding function name by setting "plt_fn_name" in "main_plot_results.py", and run the following command.
```shell
python main_plot_results.py
```
References
The consent management method implemented here is from the following publication:
Note: The paper above has been accepted for publication in IEEE Transactions on Neural Networks and Learning Systems (TNNLS), DOI: 10.1109/TNNLS.2023.3317493. If you use this repository, please cite the paper in the following format:
Arash Shahmansoori and Utz Roedig. "Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay." IEEE Transactions on Neural Networks and Learning Systems, vol. [Volume], no. [Issue], pp. [Page range], [Month] [2023]. DOI: 10.1109/TNNLS.2023.3317493 (corresponding author: Arash Shahmansoori).
- Cite this repository
To support differential privacy for speaker recognition in voice assistant systems using private data and a small portion of publicly available data, please refer to the following repository:
License
Contact the Author
Author: Arash Shahmansoori, e-mail: arash.mansoori65@gmail.com
Owner
- Login: arash-shahmansoori
- Kind: user
- Location: Ireland
- Repositories: 1
- Profile: https://github.com/arash-shahmansoori
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: dynamic-consent-management
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Arash
    family-names: Shahmansoori
    email: arash.mansoori65@gmail.com
    affiliation: Research Engineer
    orcid: 'https://orcid.org/0000-0001-5126-8005'
identifiers:
  - type: doi
    value: 10.1109/TNNLS.2023.3317493
abstract: >-
  PyTorch implementation of "Dynamic Recognition of
  Speakers for Consent Management by Contrastive Embedding
  Replay"
keywords:
  - Voice Assistant Systems
  - Consent Management
  - Contrastive Embedding Replay
  - Multi-Strided Sampling
  - Dynamic Learning
license: MIT
date-released: 2023-09-20
url: "https://github.com/arash-shahmansoori/dynamic-consent-management.git"