https://github.com/introlab/audio_utils

ROS node and utilities for audio streams.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (6.1%) to scientific vocabulary

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: introlab
License: gpl-3.0
Language: C++
Default Branch: ros2
Size: 1.49 MB

Statistics

Stars: 13
Watchers: 4
Forks: 4
Open Issues: 1
Releases: 0

Created over 5 years ago · Last pushed about 1 year ago

Metadata Files

Readme, License

README.md

audio_utils

ROS2 nodes and utilities for audio streams.

For ROS1, please see the ros1 branch.

Author(s): Marc-Antoine Maheux

Setup (Ubuntu)

The following subsections explain how to set up the library on Ubuntu.

Install Dependencies

```bash
sudo apt-get install cmake build-essential gfortran texinfo libasound2-dev libpulse-dev libgfortran-*-dev
```

Install Python Dependencies

```bash
sudo pip install -r requirements.txt
```

```bash
sudo pip3 install -r requirements.txt
```

Setup Submodules

```bash
git submodule update --init --recursive
```

Nodes

`capture_node`

This node captures the sound from an ALSA or PulseAudio device and publishes it to a topic.

Parameters

backend (string): The backend to use (alsa or pulse_audio). The default value is alsa.
device (string): The device to capture (e.g. hw:CARD=1,DEV=0 or default for ALSA, or alsa_input.usb-IntRoLab_16SoundsUSB_Audio_2.0-00.multichannel-input for PulseAudio). The default value is default.
format (string): The audio format (see audio_utils_msgs/AudioFrame). The default value is signed_16.
channel_count (int): The device channel count. The default value is 1.
sampling_frequency (int): The device sampling frequency. The default value is 16000.
frame_sample_count (int): The number of samples in each frame. The default value is 1024.
merge (bool): Whether to merge the channels. The default value is false.
gain (double): The gain to apply. The default value is 1.0.
latency_us (int): The capture latency in microseconds. The default value is 64000.
channel_map (Array of string): The PulseAudio channel mapping. If empty or omitted, the default mapping is used. This parameter must be set only with the PulseAudio backend. The default value is [].
queue_size (int): The publisher queue size. The default value is 1.
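The timing parameters are related: a frame holds frame_sample_count samples per channel, so it lasts frame_sample_count / sampling_frequency seconds. With the defaults (1024 samples at 16000 Hz), one frame lasts 64 ms, which matches the default latency_us of 64000. The sketch below computes the frame duration and an estimated payload size. The bytes-per-sample table lists only a few format names and, like the assumption that samples are interleaved and packed without padding, is illustrative; audio_utils_msgs/AudioFrame is the authoritative reference.

```python
# Hypothetical helper table: bytes per sample for a few format names.
# Assumption: samples are interleaved and packed with no padding.
BYTES_PER_SAMPLE = {
    "signed_8": 1,
    "signed_16": 2,
    "signed_32": 4,
}


def frame_duration_us(frame_sample_count: int, sampling_frequency: int) -> float:
    """Duration of one frame in microseconds."""
    return frame_sample_count * 1_000_000 / sampling_frequency


def frame_payload_bytes(fmt: str, channel_count: int, frame_sample_count: int) -> int:
    """Estimated size in bytes of one interleaved frame."""
    return BYTES_PER_SAMPLE[fmt] * channel_count * frame_sample_count


# Default capture_node parameters.
print(frame_duration_us(1024, 16000))              # 64000.0
print(frame_payload_bytes("signed_16", 1, 1024))   # 2048
```

Raising frame_sample_count lengthens each frame (and thus the minimum delay before a frame can be published), while lowering it increases the message rate.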

Published Topics

audio_out (audio_utils_msgs/AudioFrame): The captured sound.

`playback_node`

This node receives the sound from a topic and plays it on an ALSA or PulseAudio device.

Parameters

backend (string): The backend to use (alsa or pulse_audio). The default value is alsa.
device (string): The playback device (e.g. hw:CARD=1,DEV=0 or default for ALSA, or the name of a PulseAudio sink). The default value is default.
format (string): The audio format (see audio_utils_msgs/AudioFrame). The default value is signed_16.
channel_count (int): The device channel count. The default value is 1.
sampling_frequency (int): The device sampling frequency. The default value is 16000.
frame_sample_count (int): The number of samples in each frame. The default value is 1024.
latency_us (int): The playback latency in microseconds. The default value is 64000.
channel_map (Array of string): The PulseAudio channel mapping. If empty or omitted, the default mapping is used. This parameter must be set only with the PulseAudio backend. The default value is [].
queue_size (int): The subscriber queue size. The default value is 1.

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame): The sound to play.

`beat_detector_node`

This node estimates the song tempo and detects whether the current frame contains a beat.

Parameters

sampling_frequency (int): The input sampling frequency. The default value is 44100.
frame_sample_count (int): The number of samples in each analyzed frame. oss_fft_window_size must be a multiple of this value. The default value is 128.
oss_fft_window_size (int): The FFT window size used to compute the onset strength signal. It must be greater than or equal to frame_sample_count. The default value is 1024.
flux_hamming_size (int): The size of the Hamming window applied to the spectral flux to compute the onset strength signal. The default value is 15.
oss_bpm_window_size (int): The onset strength signal window size to calculate the BPM value. The default value is 1024.
min_bpm (double): The minimum valid BPM value. The default value is 50.
max_bpm (double): The maximum valid BPM value. The default value is 180.
bpm_candidate_count (int): The number of cross-correlations to perform to find the best BPM. The default value is 10.
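With the default values (frame_sample_count = 128, oss_fft_window_size = 1024), the FFT window spans exactly eight frames. The sketch below checks the parameter constraints and converts the min_bpm/max_bpm bounds into lags expressed in onset strength signal (OSS) samples. It assumes one OSS value is produced per frame, so the OSS rate is sampling_frequency / frame_sample_count; that assumption and the helper names are illustrative, not taken from the node's source.

```python
def check_beat_detector_params(frame_sample_count, oss_fft_window_size, min_bpm, max_bpm):
    """Raise ValueError if the documented parameter constraints are violated."""
    if oss_fft_window_size < frame_sample_count:
        raise ValueError("oss_fft_window_size must be >= frame_sample_count")
    if oss_fft_window_size % frame_sample_count != 0:
        raise ValueError("oss_fft_window_size must be a multiple of frame_sample_count")
    if not 0 < min_bpm < max_bpm:
        raise ValueError("expected 0 < min_bpm < max_bpm")


def bpm_to_lag(bpm, sampling_frequency, frame_sample_count):
    """Beat period in OSS samples (assumes one OSS value per frame)."""
    oss_rate = sampling_frequency / frame_sample_count  # OSS values per second
    return 60.0 * oss_rate / bpm


check_beat_detector_params(128, 1024, 50.0, 180.0)
min_lag = bpm_to_lag(180.0, 44100, 128)  # fastest tempo gives the shortest lag
max_lag = bpm_to_lag(50.0, 44100, 128)   # slowest tempo gives the longest lag
print(round(min_lag, 1), round(max_lag, 1))  # 114.8 413.4
```

Under this assumption, the tempo search only needs to consider lags between about 115 and 413 OSS samples.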

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame): The sound to analyze. The channel count must be 1.

Published Topics

bpm (std_msgs/Float32): The tempo in BPM (beats per minute), published for each frame.
beat (std_msgs/Bool): Whether the beat is in the current frame.

`vad_node`

This node performs voice activity detection with Silero VAD. The models folder contains the Silero VAD model, which is released under the MIT license.

Parameters

silence_to_voice_threshold (double): The threshold to detect voice activity when silence was previously detected. The default value is 0.5.
voice_to_silence_threshold (double): The threshold to detect silence when voice activity was previously detected. It must be lower than silence_to_voice_threshold. The default value is 0.4.
min_silence_duration_ms (double): The minimum silence duration in ms. The default value is 500.
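One plausible reading of these parameters is a hysteresis: voice is declared once the speech probability reaches silence_to_voice_threshold, and silence is declared only after the probability has stayed below voice_to_silence_threshold for at least min_silence_duration_ms. The sketch below applies that logic to a list of per-chunk probabilities. It assumes each probability covers one 512-sample chunk at 16 kHz (32 ms); it illustrates hysteresis thresholding and is not the node's actual implementation.

```python
CHUNK_MS = 512 * 1000 / 16000  # 32 ms per 512-sample chunk at 16 kHz (assumption)


def detect_voice(probabilities, silence_to_voice_threshold=0.5,
                 voice_to_silence_threshold=0.4, min_silence_duration_ms=500.0):
    """Return one bool per chunk: True while voice is considered active."""
    if voice_to_silence_threshold >= silence_to_voice_threshold:
        raise ValueError("voice_to_silence_threshold must be lower")
    is_voice = False
    silence_ms = 0.0  # time spent below the lower threshold while in voice
    states = []
    for p in probabilities:
        if not is_voice:
            # Enter voice only when the upper threshold is reached.
            if p >= silence_to_voice_threshold:
                is_voice = True
                silence_ms = 0.0
        elif p < voice_to_silence_threshold:
            # Leave voice only after enough consecutive low-probability time.
            silence_ms += CHUNK_MS
            if silence_ms >= min_silence_duration_ms:
                is_voice = False
        else:
            silence_ms = 0.0
        states.append(is_voice)
    return states


probs = [0.1, 0.6, 0.45, 0.3] + [0.1] * 16
states = detect_voice(probs)
print(sum(states))  # 17
```

With the defaults, 500 ms of silence corresponds to 16 consecutive low-probability chunks (500 / 32 = 15.6, rounded up), so short pauses inside speech do not end the voice segment.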

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame): The sound to analyze. The channel count must be 1. The sampling frequency must be 16000 Hz. The frame sample count must be a multiple of 512.

Published Topics

voice_activity (audio_utils_msgs/VoiceActivity): The voice activity detection result.

`format_conversion_node.py`

This node converts the format of an audio topic.

Parameters

input_format (string): The input audio format (see audio_utils_msgs/AudioFrame).
output_format (string): The output audio format (see audio_utils_msgs/AudioFrame).
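As an illustration of what such a conversion involves, the sketch below converts little-endian signed 16-bit PCM bytes (as in the signed_16 format) to 32-bit float bytes normalized to [-1, 1). The little-endian byte order and the scaling by 32768 are assumptions; the node may use a different scaling convention.

```python
import struct


def signed_16_to_float(data: bytes) -> bytes:
    """Convert little-endian int16 PCM bytes to little-endian float32 bytes."""
    if len(data) % 2 != 0:
        raise ValueError("signed_16 data must have an even number of bytes")
    count = len(data) // 2
    samples = struct.unpack(f"<{count}h", data)
    # Scale to [-1.0, 1.0): -32768 maps to -1.0, 32767 to about 0.99997.
    floats = [s / 32768.0 for s in samples]
    return struct.pack(f"<{count}f", *floats)


pcm = struct.pack("<3h", -32768, 0, 16384)
converted = signed_16_to_float(pcm)
print(struct.unpack("<3f", converted))  # (-1.0, 0.0, 0.5)
```

Dividing by 32768 rather than 32767 keeps the mapping symmetric in step size, at the cost of never reaching exactly +1.0.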

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame): The sound topic to convert.

Published Topics

audio_out (audio_utils_msgs/AudioFrame): The converted sound.

`resampling_node.py`

This node resamples an audio topic.

Parameters

input_format (string): The input audio format (see audio_utils_msgs/AudioFrame).
output_format (string): The output audio format (see audio_utils_msgs/AudioFrame).
channel_count (int): The channel count.
input_sampling_frequency (int): The input sampling frequency.
output_sampling_frequency (int): The output sampling frequency.
input_frame_sample_count (int): The number of samples in each frame of the input.
dynamic_input_resampling (bool): If true, the input sampling information (format, sampling frequency and frame sample count) is dynamically adjusted to match each received frame. In this mode, input_format, input_sampling_frequency and input_frame_sample_count are not required, but setting them avoids a recomputation when the initial input sampling information is known. The default value is false.
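When the input and output sampling frequencies differ, the number of samples per frame scales by their ratio: a 1024-sample frame at 48000 Hz covers the same time as about 341.3 samples at 16000 Hz, so output frames need not have an integer length. The sketch below computes that ratio exactly and resamples one mono channel by linear interpolation. Linear interpolation is a simple stand-in chosen for clarity; the repository lists scipy as a dependency, and the node may use a different method (e.g. polyphase filtering).

```python
from fractions import Fraction


def output_sample_count(input_count: int, input_fs: int, output_fs: int) -> Fraction:
    """Exact number of output samples covering the same duration."""
    return Fraction(input_count * output_fs, input_fs)


def resample_linear(samples, input_fs: int, output_fs: int):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if not samples:
        return []
    n_out = int(output_sample_count(len(samples), input_fs, output_fs))
    step = input_fs / output_fs  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


print(output_sample_count(1024, 48000, 16000))  # 1024/3
print(resample_linear([0.0, 1.0, 2.0, 3.0], 16000, 32000))
```

Resampling each frame independently like this ignores continuity between frames (and does not low-pass filter before downsampling); a streaming resampler would carry state and fractional positions across frames.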

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame): The sound topic to resample.

Published Topics

audio_out (audio_utils_msgs/AudioFrame): The resampled sound.

`split_channel_node.py`

This node splits a multichannel audio topic into several mono audio topics.

Parameters

input_format (string): The input audio format (see audio_utils_msgs/AudioFrame).
output_format (string): The output audio format (see audio_utils_msgs/AudioFrame).
channel_count (int): The input channel count.

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame): The sound topic to split.

Published Topics

audio_out_0 (audio_utils_msgs/AudioFrame): The first channel sound.
audio_out_1 (audio_utils_msgs/AudioFrame): The second channel sound.
...
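Splitting amounts to de-interleaving. Assuming the samples are interleaved (sample 0 of every channel, then sample 1 of every channel, and so on), channel i consists of every channel_count-th sample starting at index i, and would be published on audio_out_i. The sketch works on already-decoded sample values; the interleaved layout is an assumption.

```python
def split_channels(interleaved, channel_count):
    """Return one list of samples per channel from interleaved samples."""
    if channel_count <= 0:
        raise ValueError("channel_count must be positive")
    if len(interleaved) % channel_count != 0:
        raise ValueError("sample count is not a multiple of channel_count")
    # Slice with a stride of channel_count, starting at each channel's offset.
    return [list(interleaved[i::channel_count]) for i in range(channel_count)]


frame = [10, 20, 11, 21, 12, 22]  # two channels, three samples each
for i, channel in enumerate(split_channels(frame, 2)):
    print(f"audio_out_{i}", channel)
# audio_out_0 [10, 11, 12]
# audio_out_1 [20, 21, 22]
```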

`raw_file_writer_node.py`

This node writes the raw sound data to a file.

Parameters

output_path (string): The output file path.

Subscribed Topics

audio_in (audio_utils_msgs/AudioFrame): The sound topic to write.
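Because the file holds only raw samples, playing it back requires knowing the format, channel count and sampling frequency used when it was recorded. The sketch below wraps such a file in a WAV header with Python's standard wave module. It assumes signed_16, little-endian, interleaved data with no header of its own; the function name and file names are hypothetical.

```python
import wave


def raw_to_wav(raw_path, wav_path, channel_count=1, sampling_frequency=16000):
    """Wrap headerless signed 16-bit little-endian PCM in a WAV container."""
    with open(raw_path, "rb") as f:
        data = f.read()
    frame_bytes = 2 * channel_count  # 2 bytes per signed_16 sample
    if len(data) % frame_bytes != 0:
        raise ValueError("file size is not a whole number of sample frames")
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channel_count)
        w.setsampwidth(2)  # 16-bit samples; WAV stores them little-endian
        w.setframerate(sampling_frequency)
        w.writeframes(data)


# Example (hypothetical file names):
# raw_to_wav("audio.raw", "audio.wav", channel_count=1, sampling_frequency=16000)
```

The channel count and sampling frequency passed here must match the recorded stream; the raw file itself carries no way to check them.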

License

GPL-3.0 License

Sponsor

IntRoLab

IntRoLab - Intelligent / Interactive / Integrated / Interdisciplinary Robot Lab

Owner

Name: IntRoLab
Login: introlab
Kind: organization
Location: Sherbrooke, Québec, Canada

Website: https://introlab.3it.usherbrooke.ca
Repositories: 65
Profile: https://github.com/introlab

IntRoLab - Intelligent / Interactive / Integrated / Interdisciplinary Robot Lab @ Université de Sherbrooke

GitHub Events

Total

Watch event: 1
Delete event: 2
Push event: 2
Pull request review event: 1
Pull request event: 3
Create event: 2

Last Year

Watch event: 1
Delete event: 2
Push event: 2
Pull request review event: 1
Pull request event: 3
Create event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 4
Total pull requests: 29
Average time to close issues: 2 months
Average time to close pull requests: 2 days
Total issue authors: 3
Total pull request authors: 3
Average comments per issue: 1.0
Average comments per pull request: 0.1
Merged pull requests: 28
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: 11 minutes
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

Gaurav37 (2)
ghadj (1)
philippewarren (1)

Pull Request Authors

mamaheux (21)
philippewarren (8)
doumdi (4)

Top Labels

Issue Labels

Pull Request Labels

Dependencies

requirements.txt pypi

numpy *
scipy *

setup.py pypi