Science Score: 62.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ✓ Institutional organization owner: Organization fgnt has institutional domain (nt.uni-paderborn.de)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.4%) to scientific vocabulary
Scientific Fields
Repository
Multipurpose Multi Speaker Mixture Signal Generator
Basic Info
- Host: GitHub
- Owner: fgnt
- License: MIT
- Language: Python
- Default Branch: main
- Size: 537 KB
Statistics
- Stars: 44
- Watchers: 6
- Forks: 9
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Multi-purpose Multi-Speaker Mixture Signal Generator (MMS-MSG)
MMS-MSG is a highly modular and flexible framework for the generation of speech mixtures. It extends the code base of the SMS-WSJ database for mixture signal generation, so that it can generate both meeting-style speech mixtures and mixture signals corresponding to classical speech mixture databases.
What is the purpose of MMS-MSG?
Meeting data describes a highly dynamic setting. With MMS-MSG, we don't aim to provide a single, new database.
Instead, we want to provide an adaptable framework that allows the prototyping and evaluation of meeting processing and transcription systems in as many environments as possible.
Features
Generation of Meeting data
The core aspect of MMS-MSG is the generation of meeting-style data. Meetings are generated in a modular fashion; the adjustable parameters are listed below (a sketch combining some of them follows the list):

- Source database (e.g. WSJ, LibriSpeech): Any audio database consisting of clean, single-speaker utterances that provides access to speaker identities can be used to simulate meeting data. Any additional information, like transcriptions, is kept and can still be used.
- Number of participants: The number of speakers per meeting can be freely chosen. It is also possible to set a range, so that meetings with varying numbers of active speakers are generated.
- Activity distribution per speaker: Besides fully random algorithms for sampling the next active speaker of a meeting, we also provide activity-based speaker sampling. Here, the activity distribution per speaker (i.e. the speech ratio of each speaker) can be freely specified. Over the course of the meeting, the activity distribution converges to the desired ratio, so that highly asymmetric meetings (e.g. lecture situations) can be generated.
- Amount & distribution of silence/overlap: The probability and length of silence and/or overlap between consecutive utterances of a meeting can be freely chosen. The distribution from which the silence is sampled can also be specified by the user.
- Background noise: We offer an easy framework to add external influences like background noise to your mixtures. Currently, sampling of static background noise is implemented. The addition of more realistic environmental noise (e.g. from WHAM!) is supported in theory; sampling functions for this use case will be implemented in the future.
- Reverberation/Scaling: MMS-MSG natively supports the simulation of reverberated meetings. Any additional database that provides room impulse responses can be used to reverberate the utterances of each speaker. While the currently implemented modules only support static speaker positions, speakers can in principle change their position for each utterance.
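As a rough sketch of how these parameters plug together, the pipeline below composes meetings with a varying number of speakers and random per-speaker scaling. Only `get_composition_dataset` and `UniformScalingSampler` appear elsewhere in this README; passing a range for `num_speakers` and the `mms_msg.sampling.pattern.meeting.MeetingSampler` module are assumptions and may differ from the actual API (see the linked notebook for the real meeting sampling modules):

```python
import mms_msg

input_ds = ...  # clean single-speaker utterances, e.g., from WSJ or LibriSpeech

# Compose base utterances into meetings. ASSUMPTION: a tuple of speaker
# counts yields meetings with a varying number of active speakers.
ds = mms_msg.sampling.source_composition.get_composition_dataset(
    input_dataset=input_ds,
    num_speakers=(2, 3, 4),
)

# ASSUMPTION: a meeting pattern sampler that draws silence/overlap between
# consecutive utterances; the exact module name and parameters may differ.
ds = ds.map(mms_msg.sampling.pattern.meeting.MeetingSampler())

# Simulate volume differences between speakers (module shown later in this README)
ds = ds.map(mms_msg.sampling.environment.scaling.UniformScalingSampler(max_weight=5))
```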
Modular Design
The sampling process is modularized, so that many scenarios can be created by slightly changing the sampling pipeline. We provide example classes to show how the single modules are used. If a scenario is not supported, new sampling modules can be easily implemented to adapt MMS-MSG to your requirements.
On-demand data generation
The data simulation process of MMS-MSG is split into parameter sampling and the actual data generation, which enables on-demand data generation: only the source data and the meeting parameters need to be saved, allowing the simulation of various meeting scenarios while minimizing the required disk space. We also support offline generation of meeting data if saving the mixtures to disk is required for your workflow.
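A minimal sketch of the two stages, assuming the sampled parameter examples contain only JSON-serializable values (the module calls are the ones shown later in this README):

```python
import json

import mms_msg

input_ds = ...  # source utterance examples

# Stage 1: sample only the mixture parameters; no audio is loaded here
ds = mms_msg.sampling.source_composition.get_composition_dataset(
    input_dataset=input_ds, num_speakers=2
)
ds = ds.map(mms_msg.sampling.pattern.classical.ConstantOffsetSampler(0))

# Persist just the sampled parameters. Together with the source data,
# this is all that is needed to re-create the mixtures later.
with open('mixture_parameters.json', 'w') as f:
    json.dump(list(ds), f)

# Stage 2 (on demand, possibly in another process): load the parameters
# and generate the audio, e.g., by mapping a database's load_example
```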
Generation of Classical Speech Mixture Scenarios
We provide code to generate speech mixtures according to the specifications of currently used source separation databases, where single utterances of multiple speakers either partially or fully overlap with each other. By using MMS-MSG to generate training data for these databases, we offer native support for dynamic mixing.
Supported speech mixture databases:
- WSJ0-2mix/WSJ0-3mix
- LibriMix
- SMS-WSJ
- Partially Overlapped WSJ
Planned:
- WHAM! & WHAMR!
Using Generated Mixtures
The mixture generator uses lazy_dataset. While the core functionality of mms_msg can be used without lazy_dataset, some features (like dynamic mixing and the database abstraction) are not available then.
```python
from mms_msg.databases.classical.full_overlap import WSJ2Mix
from mms_msg.sampling.utils import collate_fn

db = WSJ2Mix()

# Get a train dataset with dynamic mixing.
# This dataset only emits the metadata of the mixtures, it doesn't load
# the audio data yet
ds = db.get_dataset('train_si284_rng')

# The data can be loaded by mapping a database's load_example function
ds = ds.map(db.load_example)

# Other dataset modifications (see the lazy_dataset documentation)
ds = ds.shuffle(reshuffle=True)
ds = ds.batch(batch_size=8).map(collate_fn)
# ...

# Parallelize data loading with lazy_dataset
ds = ds.prefetch(num_workers=8, buffer_size=16)

# The dataset can now be used in any training loop
for example in ds:
    # ... do fancy stuff with the example.
    # The loaded audio data is in example['audio_data']
    print(example)
```
Any other data modification routines can be mapped to ds directly after loading the example.
Using the torch DataLoader
A lazy_dataset.Dataset can be plugged into a torch.utils.data.DataLoader:
```python
from mms_msg.databases.classical.full_overlap import WSJ2Mix

db = WSJ2Mix()
ds = db.get_dataset('train_si284_rng').map(db.load_example)

# Parallelize data loading with torch.utils.data.DataLoader
from torch.utils.data import DataLoader
loader = DataLoader(ds, batch_size=8, shuffle=True, num_workers=8)

for example in loader:
    print(example)
```
Planned Features:
- WHAM! background noise sampling
- ~~Sampling Rate Offset (SRO) utilities (see paderwasn)~~
- Markov Model-based dialogue sampling (refer to this paper)
NOTE:
Example recipes to reproduce our baseline results are still under construction and will be provided at a later date.
Extending MMS-MSG
Example structure
The input examples should have this structure:
```python
example = {
    'audio_path': {
        'observation': 'single_speaker_recording.wav'
    },
    'speaker_id': 'A',
    'num_samples': 1234,  # Number of samples of the observation file
    # 'num_samples': {'observation': 1234}  # Alternative, if other audios are present
    'dataset': 'test',  # The input dataset name
    'example_id': 'asdf1234',  # Unique ID of this example. Optional if the input data is passed as a dict
    'scenario': 'cafe-asdf1234',  # (Optional) If provided, mms_msg makes sure that all examples of the same speaker in a mixture share the same scenario
    # ... (any additional keys)
}
```
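For illustration, here is a small checker for this structure; it is not part of mms_msg, just a sketch of the constraints described above:

```python
def check_input_example(example: dict) -> None:
    # Hypothetical helper, not part of the mms_msg API
    assert 'observation' in example.get('audio_path', {})
    assert 'speaker_id' in example
    assert 'dataset' in example  # the input dataset name
    # num_samples is an int, or a dict mirroring 'audio_path' if other
    # audio streams are present
    num_samples = example['num_samples']
    if isinstance(num_samples, dict):
        assert isinstance(num_samples['observation'], int)
    else:
        assert isinstance(num_samples, int)


check_input_example({
    'audio_path': {'observation': 'single_speaker_recording.wav'},
    'speaker_id': 'A',
    'num_samples': 1234,
    'dataset': 'test',
    'example_id': 'asdf1234',
})
```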
After selecting utterances for a mixture, these utterance examples are normalized and "collated", which results in a structure similar to this:
```python
example = {
    'audio_path': {
        'original_source': [
            'source1.wav',
            'source2.wav',
        ],
    },
    'speaker_id': [
        'A', 'B'
    ],
    'num_samples': {  # The structure under some keys mirrors the structure in 'audio_path'
        'original_source': [
            1234, 4321
        ]
    },
    'source_id': [  # Reference to the source examples this mixture was created from
        'asdf1234', 'asdf1235'
    ],
    # ...
}
```
Starting from such a structure, sampling modules can be applied to fill the example with more information, e.g., offsets or scaling of the utterances.
Creating a custom database from existing sampling modules
Database classes or definitions are provided for a few common scenarios in mms_msg.databases.
Each database class has to define two methods:
`get_mixture_dataset`, which encapsulates the "sampling" stage and builds a pipeline of sampling modules, and `load_example`, which provides the "simulation" stage, i.e., loading and mixing the audio data.
A basic (parameter-free) database would look like this:
```python
from mms_msg.databases.database import MMSMSGDatabase
from lazy_dataset.database import JsonDatabase
import mms_msg


class MyDatabase(JsonDatabase, MMSMSGDatabase):
    def get_mixture_dataset(self, name, rng):
        ds = mms_msg.sampling.source_composition.get_composition_dataset(
            input_dataset=super().get_dataset(name),
            num_speakers=2,
            rng=rng,
        )
        ds = ds.map(mms_msg.sampling.pattern.classical.ConstantOffsetSampler(8000))
        ds = ds.map(mms_msg.sampling.environment.scaling.ConstantScalingSampler(0))
        return ds

    def load_example(self, example):
        return mms_msg.simulation.anechoic.anechoic_scenario_map_fn(example)
```

and can be instantiated with

```python
db = MyDatabase('path/to/source/database.json')
```
The structure of the dataset sampling pipeline is described in the next section.
Pipeline structure
This is an example of a simple sampling pipeline for a single dataset:
```python
import mms_msg

input_ds = ...  # Get source utterance examples from somewhere

# Compute a composition of base examples. This makes sure that the speaker
# distribution in the mixtures is equal to the speaker distribution in the
# original database.
ds = mms_msg.sampling.source_composition.get_composition_dataset(
    input_dataset=input_ds, num_speakers=2
)

# If required: Offset the utterances
ds = ds.map(mms_msg.sampling.pattern.classical.ConstantOffsetSampler(0))

# If required: Add log_weights to simulate volume differences
ds = ds.map(mms_msg.sampling.environment.scaling.UniformScalingSampler(max_weight=5))
```
The sampling process always starts with the creation of a "source composition", i.e., sampling (base) utterances for each mixture.
This is done in get_composition_dataset, which implements a sampling algorithm similar to SMS-WSJ that uses each utterance from
the source database equally often.
After this, sampling modules can be applied to simulate different speaking patterns or environments.
The example above sets all offsets to zero (i.e., all utterances start at the beginning of the mixture) with the ConstantOffsetSampler
and samples a random scale with a maximum of 5 dB with the UniformScalingSampler.
Many other sampling modules are available, including one that simulates meeting style speaking patterns. Examples for this can be found in this notebook.
Writing a custom sampling module
Mixtures in mms_msg are created by applying individual sampling modules to an example one
after the other. Each sampling module is fully deterministic, i.e., its output only depends
on its hyperparameters and the input example, and it is not allowed to maintain mutable state.
This ensures reproducibility: the sampling depends neither on the order in which the mixtures are generated
nor on the number or order in which the modules are applied.
A sampling module is a callable that receives an (intermediate) mixture as a dictionary, modifies it, and returns it. A basic sampling module, implemented as a function without hyperparameters, could look like this:

```python
import mms_msg


def my_sampling_module(example: dict) -> dict:
    # Get a deterministic random number generator based on the input example
    # and an additional seed string. The seed string ensures that the RNGs
    # differ between different sampling modules
    rng = mms_msg.sampling.utils.rng.get_rng_example(example, 'my_sampler')

    # Sample whatever based on RNG and possibly the contents of example
    example['my_random_number'] = rng.random()
    return example
```
An important part is the mms_msg.sampling.utils.rng.get_rng_example function.
It returns a np.random.Generator object that is initialized with a seed computed from basic information from the
example dictionary (example-ID and dataset) and an additional seed string.
This means that the random numbers generated in a module are equal every time the module is applied to the same input example.
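This determinism can be demonstrated directly. The sketch below assumes that a minimal dictionary with the example ID and dataset name is enough for the seed computation, as described above:

```python
import mms_msg

example = {'example_id': 'asdf1234', 'dataset': 'test'}

# The same example and seed string always yield the same random stream
rng1 = mms_msg.sampling.utils.rng.get_rng_example(example, 'my_sampler')
rng2 = mms_msg.sampling.utils.rng.get_rng_example(example, 'my_sampler')
assert rng1.random() == rng2.random()

# A different seed string yields an independent generator, so different
# sampling modules see different randomness for the same example
rng3 = mms_msg.sampling.utils.rng.get_rng_example(example, 'other_sampler')
```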
If your sampling module has hyperparameters, we recommend a frozen dataclass to ensure immutability:

```python
from dataclasses import dataclass

import mms_msg


@dataclass(frozen=True)
class MySamplingModule:
    size: int = 42

    def __call__(self, example):
        rng = mms_msg.sampling.utils.rng.get_rng_example(example, 'my_sampler')

        # Sample whatever based on RNG and possibly the contents of example
        example['my_random_number'] = rng.random(self.size)
        return example
```
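Such a module is then applied to a dataset just like the built-in samplers shown earlier, e.g.:

```python
# Apply the parameterized sampling module like any built-in sampler
ds = ds.map(MySamplingModule(size=3))
```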
A more practical example is given in this notebook.
Cite
MMS-MSG was proposed in the following publication:
```bibtex
@inproceedings{cordlandwehr2022mms_msg,
  title={MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator},
  author={Tobias Cord-Landwehr and Thilo von Neumann and Christoph Boeddeker and Reinhold Haeb-Umbach},
  year={2022},
  booktitle={International Workshop on Acoustic Signal Enhancement (IWAENC)},
  publisher={{IEEE}},
}
```
Owner
- Name: Department of Communications Engineering University of Paderborn
- Login: fgnt
- Kind: organization
- Location: Paderborn, Germany
- Website: http://nt.uni-paderborn.de
- Repositories: 17
- Profile: https://github.com/fgnt
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "von Neumann"
    given-names: "Thilo"
    orcid: "https://orcid.org/0000-0002-7717-8670"
  - family-names: "Cord-Landwehr"
    given-names: "Tobias"
title: "MMS-MSG"
version: 1.0.0
date-released: 2022-05-10
url: "https://github.com/fgnt/mms_msg"
preferred-citation:
  type: proceedings
  authors:
    - family-names: "Cord-Landwehr"
      given-names: "Tobias"
    - family-names: "von Neumann"
      given-names: "Thilo"
      orcid: "https://orcid.org/0000-0002-7717-8670"
    - family-names: "Boeddeker"
      given-names: "Christoph"
    - family-names: "Haeb-Umbach"
      given-names: "Reinhold"
  conference:
    name: "IWAENC 2022"
  title: "MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator"
  year: 2022
GitHub Events
Total
- Watch event: 1
- Issue comment event: 1
- Push event: 1
- Pull request event: 2
- Fork event: 1
Last Year
- Watch event: 1
- Issue comment event: 1
- Push event: 1
- Pull request event: 2
- Fork event: 1
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: about 18 hours
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: about 18 hours
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- sibange (3)
- boeddeker (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- Cython *
- cached_property *
- numpy *
- pytest *
- cached_property *
- click *
- lazy_dataset *
- numpy *
- paderbox *
- padertorch *
- scipy *
- tqdm *
- actions/cache v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- mpi4py/setup-mpi v1 composite