efficientword-net

OneShot Learning-based hotword detection.

https://github.com/ant-brain/efficientword-net

Keywords

artificial-intelligence cnn convolutional-neural-networks hotword hotword-detection hotword-detector iot machine-learning one-shot-learning python siamese-network siamese-neural-network tensorflow wakeword

Keywords from Contributors

mesh interpretability sequences generic projection interactive optim hacking network-simulation

Last synced: 6 months ago

Repository

OneShot Learning-based hotword detection.

Basic Info

Host: GitHub
Owner: Ant-Brain
License: apache-2.0
Language: Jupyter Notebook
Default Branch: main
Homepage: https://ant-brain.github.io/EfficientWord-Net/
Size: 143 MB

Statistics

Stars: 261
Watchers: 13
Forks: 42
Open Issues: 14
Releases: 3

Topics

artificial-intelligence cnn convolutional-neural-networks hotword hotword-detection hotword-detector iot machine-learning one-shot-learning python siamese-network siamese-neural-network tensorflow wakeword

Created over 4 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

README.md

EfficientWord-Net: Hotword Detection Based on Few-Shot Learning

Home assistants require special phrases called hotwords to get activated (e.g., "OK Google"). EfficientWord-Net is a hotword detection engine based on few-shot learning that allows developers to add custom hotwords to their programs without extra charges. The library is purely written in Python and uses Google's TFLite implementation for faster real-time inference. It is inspired by FaceNet's Siamese Network Architecture and performs best when 3-4 hotword samples are collected directly from the user.
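To make the few-shot idea concrete, here is a minimal sketch of how a Siamese-style matcher might score an utterance against a handful of reference embeddings. The helper names `cosine_similarity` and `best_match_score`, the toy 3-dimensional vectors, and the choice of cosine similarity are illustrative assumptions, not the library's actual internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match_score(embedding, reference_embeddings):
    """Highest similarity between an utterance and any reference sample."""
    return max(cosine_similarity(embedding, ref) for ref in reference_embeddings)


# Hypothetical embeddings of three user-recorded samples of one hotword.
# Real models emit far larger vectors; these are toy values.
references = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]

same_word = [1.0, 0.0, 0.0]   # an utterance close to the references
other_word = [0.0, 1.0, 0.0]  # an utterance far from the references

same_score = best_match_score(same_word, references)
other_score = best_match_score(other_word, references)
print(f"same word: {same_score:.3f}, other word: {other_score:.3f}")
```

With only a few reference samples, the decision reduces to comparing the best similarity against a threshold, which is why collecting samples directly from the end user tends to help.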

Demo of EfficientWord-Net on Pi

https://user-images.githubusercontent.com/44740048/139785995-3330d65a-cfe1-4e92-8769-ee389a122acc.mp4

Access Training File

Training File to access the training file.

Datasets

Here are the links:
- Dataset 1
- Dataset 2

Access Paper

Research Paper to access the research paper.

Python Version Requirements

This library works with Python versions 3.6 to 3.9.
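A script that depends on the library could guard against unsupported interpreters up front. The following is a minimal sketch; the `is_supported` helper is a hypothetical name, and the bounds are simply the 3.6 to 3.9 range stated above.

```python
import sys

# Supported range as stated in this README: Python 3.6 through 3.9 inclusive.
MIN_SUPPORTED = (3, 6)
MAX_SUPPORTED = (3, 9)


def is_supported(version_info=None):
    """Return True if (major, minor) falls within the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if not is_supported():
    print("Warning: Python %d.%d is outside the 3.6-3.9 range" % sys.version_info[:2])
```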

Dependencies Installation

Before running the pip installation command for the library, a few dependencies need to be installed manually:

macOS users on Apple M-series chips and Raspberry Pi users might have to compile these dependencies.

The tflite package cannot be listed in requirements.txt, so it is installed automatically the first time the package is initialized on the system.

The librosa package is not required for inference-only use. However, it is installed automatically the first time generate_reference is called.
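If you would rather know ahead of time whether an optional dependency is present (for example on an edge device without network access), a small check like the sketch below can help. The helper names `is_installed` and `missing_modules` are hypothetical; the code relies only on the standard library's `importlib.util.find_spec` and does not install anything.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_modules(module_names):
    """Return the subset of module names that cannot be found."""
    return [name for name in module_names if not is_installed(name)]


# librosa is only needed when generating reference files, per the note above.
needed_for_reference_generation = ["librosa"]
print("Missing:", missing_modules(needed_for_reference_generation))
```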

Package Installation

Run the following pip command:

pip install EfficientWord-Net

To import the package:

import eff_word_net

Demo

After installing the packages, you can run the demo script built into the library (ensure you have a working microphone).

Access documentation from: https://ant-brain.github.io/EfficientWord-Net/

Command to run the demo: python -m eff_word_net.engine

Generating Custom Wakewords

For any new hotword, the library needs information about the hotword. This information is obtained from a file called {wakeword}_ref.json. For example, for the wakeword 'alexa', the library would need the file called alexa_ref.json.
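The naming convention can be captured in a small helper. This is only a sketch: `reference_file_name` and `reference_file_path` are hypothetical names, not functions provided by the library.

```python
import os


def reference_file_name(wakeword):
    """Build the `{wakeword}_ref.json` file name described above."""
    return f"{wakeword}_ref.json"


def reference_file_path(folder, wakeword):
    """Join a folder with the reference file name for a wakeword."""
    return os.path.join(folder, reference_file_name(wakeword))


print(reference_file_name("alexa"))
```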

These files can be generated with the following procedure:

Collect 4 to 10 uniquely sounding pronunciations of a given wakeword. Put them into a separate folder that doesn't contain anything else.
Alternatively, use the following command to generate audio files for a given word (uses IBM neural TTS demo API). Please don't overuse it for our sake:

python -m eff_word_net.ibm_generate

Finally, run this command. It will ask for the input folder's location (containing the audio files) and the output folder (where the ref.json file will be stored):

```
python -m eff_word_net.generate_reference
```
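Before generating a reference file, it can be useful to confirm that the sample folder matches the procedure above: 4 to 10 recordings and nothing else. The sketch below is an illustration only; `check_sample_folder` is a hypothetical helper, and the accepted extensions (`.wav`, `.mp3`) are an assumption rather than a documented list.

```python
import tempfile
from pathlib import Path

# Assumption: which audio formats count as samples is not stated in this README.
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def check_sample_folder(folder, min_samples=4, max_samples=10):
    """Return (audio_files, problems) for a folder of wakeword recordings."""
    entries = sorted(Path(folder).iterdir())
    audio = [p for p in entries if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS]
    extras = [p for p in entries if p not in audio]
    problems = []
    if extras:
        problems.append("folder contains non-audio entries: "
                        + ", ".join(p.name for p in extras))
    if not min_samples <= len(audio) <= max_samples:
        problems.append(f"expected {min_samples}-{max_samples} samples, found {len(audio)}")
    return audio, problems


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(4):
        (Path(tmp) / f"sample_{i}.wav").write_bytes(b"")
    samples, problems = check_sample_folder(tmp)

    # A stray file violates the "nothing else in the folder" rule.
    (Path(tmp) / "notes.txt").write_text("not audio")
    _, problems_with_extra = check_sample_folder(tmp)

print(len(samples), problems, problems_with_extra)
```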

The path of the generated reference file needs to be passed to the HotwordDetector instance:

    HotwordDetector(
        hotword="hello",
        model=Resnet50_Arc_loss(),
        reference_file="/full/path/name/of/hello_ref.json",
        threshold=0.9,  # min confidence required to consider a trigger
        relaxation_time=0.8,  # default value, in seconds
    )

The model argument accepts an instance of Resnet50_Arc_loss or First_Iteration_Siamese.

The relaxation_time parameter sets the minimum time between any two triggers. Any potential trigger that arrives before relaxation_time has elapsed is canceled. Because the detector uses a sliding-window approach, a single utterance of a hotword can produce multiple triggers; relaxation_time suppresses these duplicates. In most cases, the default of 0.8 seconds will suffice.
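The suppression logic can be pictured as a simple debouncer. The sketch below is an assumption about the general technique, not the library's implementation: `TriggerDebouncer` is a hypothetical class, and it assumes that a suppressed match does not restart the timer.

```python
class TriggerDebouncer:
    """Allow a trigger only if enough time has passed since the last accepted one."""

    def __init__(self, relaxation_time=0.8):
        self.relaxation_time = relaxation_time
        self._last_trigger = None

    def accept(self, timestamp):
        """Return True if a match at `timestamp` (in seconds) should fire."""
        if (self._last_trigger is not None
                and timestamp - self._last_trigger < self.relaxation_time):
            return False  # too soon after the previous accepted trigger
        self._last_trigger = timestamp
        return True


debouncer = TriggerDebouncer(relaxation_time=0.8)

# Hypothetical match times: overlapping windows report the same utterance twice
# or more (0.0/0.25/0.5 and 1.0/1.2), followed by a later, separate utterance.
match_times = [0.0, 0.25, 0.5, 1.0, 1.2, 3.0]
fired = [t for t in match_times if debouncer.accept(t)]
print(fired)
```

Raising relaxation_time trades responsiveness to quick repeated utterances for fewer duplicate triggers.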

Out-of-the-Box Sample Hotwords

The library has predefined embeddings readily available for a few wakewords such as Mycroft, Google, Firefox, Alexa, Mobile, and Siri. Their paths are readily available in the library installation directory.

from eff_word_net import samples_loc
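To see which reference files a directory offers, you could scan it for the `{wakeword}_ref.json` pattern. This is a sketch: `list_reference_hotwords` is a hypothetical helper, and it assumes the bundled samples follow the same naming convention as generated reference files. The demonstration uses a temporary folder so that it runs without the library installed.

```python
import os
import tempfile


def list_reference_hotwords(directory):
    """Map hotword name -> path for every `*_ref.json` file in `directory`."""
    suffix = "_ref.json"
    found = {}
    for name in sorted(os.listdir(directory)):
        if name.endswith(suffix):
            found[name[:-len(suffix)]] = os.path.join(directory, name)
    return found


# Demonstration with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for file_name in ("alexa_ref.json", "mycroft_ref.json", "readme.txt"):
        with open(os.path.join(tmp, file_name), "w") as handle:
            handle.write("{}")
    available = list_reference_hotwords(tmp)

print(sorted(available))
```

With the library installed, passing `samples_loc` as the directory should list the out-of-the-box hotwords.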

Try your first single hotword detection script

```python
import os

from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net.audio_processing import Resnet50_Arc_loss
from eff_word_net import samples_loc

# Load the embedding model once; it can be shared between detectors.
base_model = Resnet50_Arc_loss()

mycroft_hw = HotwordDetector(
    hotword="mycroft",
    model=base_model,
    reference_file=os.path.join(samples_loc, "mycroft_ref.json"),
    threshold=0.7,
    relaxation_time=2,
)

mic_stream = SimpleMicStream(
    window_length_secs=1.5,
    sliding_window_secs=0.75,
)

mic_stream.start_stream()

print("Say Mycroft")
while True:
    frame = mic_stream.getFrame()
    result = mycroft_hw.scoreFrame(frame)
    if result is None:
        # No voice activity in this frame
        continue
    if result["match"]:
        print("Wakeword uttered", result["confidence"])
```
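The window parameters in the script above can be worked through numerically. The sketch below assumes that sliding_window_secs is the step between the starts of consecutive windows and that audio is sampled at 16 kHz; both are assumptions made for illustration, and `window_starts` is a hypothetical helper.

```python
def window_starts(total_secs, window_length_secs=1.5, sliding_window_secs=0.75):
    """Start times (in seconds) of every full window that fits in the audio."""
    starts = []
    start = 0.0
    # Small tolerance so a window ending exactly at total_secs is included.
    while start + window_length_secs <= total_secs + 1e-9:
        starts.append(round(start, 6))
        start += sliding_window_secs
    return starts


SAMPLE_RATE = 16000  # assumed sample rate, for illustration only

window_samples = int(1.5 * SAMPLE_RATE)   # samples per window
hop_samples = int(0.75 * SAMPLE_RATE)     # samples between window starts

starts = window_starts(3.0)
print(starts, window_samples, hop_samples)
```

With a step of half the window length, consecutive windows overlap by 50%, so one utterance can fall inside two windows. That overlap is why a single hotword may produce more than one match before relaxation_time filters the duplicates.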

Detecting Multiple Hotwords from audio streams

The library provides a computationally efficient way to detect multiple hotwords from a given stream, instead of calling scoreFrame() on each wakeword's detector individually.

```python
import os

from eff_word_net.streams import SimpleMicStream
# HotwordDetector is imported from eff_word_net.engine in the single-hotword
# example; MultiHotwordDetector is assumed to live in the same module.
from eff_word_net.engine import HotwordDetector, MultiHotwordDetector
from eff_word_net.audio_processing import Resnet50_Arc_loss
from eff_word_net import samples_loc

print(samples_loc)

# One model instance is shared by all detectors.
base_model = Resnet50_Arc_loss()

mycroft_hw = HotwordDetector(
    hotword="mycroft",
    model=base_model,
    reference_file=os.path.join(samples_loc, "mycroft_ref.json"),
    threshold=0.7,
    relaxation_time=2,
)

alexa_hw = HotwordDetector(
    hotword="alexa",
    model=base_model,
    reference_file=os.path.join(samples_loc, "alexa_ref.json"),
    threshold=0.7,
    relaxation_time=2,
    # verbose=True
)

computer_hw = HotwordDetector(
    hotword="computer",
    model=base_model,
    reference_file=os.path.join(samples_loc, "computer_ref.json"),
    threshold=0.7,
    relaxation_time=2,
    # verbose=True
)

multi_hotword_detector = MultiHotwordDetector(
    [mycroft_hw, alexa_hw, computer_hw],
    model=base_model,
    continuous=True,
)

mic_stream = SimpleMicStream(window_length_secs=1.5, sliding_window_secs=0.75)
mic_stream.start_stream()

print("Say ", " / ".join([x.hotword for x in multi_hotword_detector.detector_collection]))

while True:
    frame = mic_stream.getFrame()
    result = multi_hotword_detector.findBestMatch(frame)
    # result is a (hotword, confidence) pair; None entries mean no match
    if None not in result:
        print(result[0], f",Confidence {result[1]:0.4f}")
```
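Conceptually, picking the best match among several hotwords could look like the sketch below. It is an illustration under assumptions, not the library's code: `find_best_match` is a hypothetical function, the per-hotword scores are assumed to come from a single embedding computed once per frame, and the `(None, None)` return for "no match" mirrors the `None not in result` check used above.

```python
def find_best_match(scores, thresholds):
    """Return (hotword, score) for the highest score at or above its threshold.

    Returns (None, None) when no hotword clears its threshold.
    """
    best_hotword, best_score = None, None
    for hotword, score in scores.items():
        if score < thresholds[hotword]:
            continue  # below this detector's threshold
        if best_score is None or score > best_score:
            best_hotword, best_score = hotword, score
    return best_hotword, best_score


thresholds = {"mycroft": 0.7, "alexa": 0.7, "computer": 0.7}

# Hypothetical similarity scores for one audio frame.
frame_scores = {"mycroft": 0.91, "alexa": 0.75, "computer": 0.40}
result = find_best_match(frame_scores, thresholds)
if None not in result:
    print(result[0], f",Confidence {result[1]:0.4f}")

# A frame in which nothing clears the threshold.
silent = find_best_match({"mycroft": 0.2, "alexa": 0.1, "computer": 0.3}, thresholds)
```

The saving comes from running the expensive model once per frame and reusing its output for every hotword, rather than once per detector.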

Access the library documentation here: https://ant-brain.github.io/EfficientWord-Net/


Change notes from 0.2.2 to v1.0.1

New Model Addition: Resnet50_Arc_loss with huge improvements!

Trained a new model from scratch using a modified distilled dataset from MLCommons.
Used Arc loss function instead of triplet loss function.
The resultant model is stored as resnet50arcloss.
The newer model showcases much better resilience towards background noise and requires fewer samples for good accuracy.
Minor changes in the API flow to facilitate easy addition of newer models.
Newer model can handle a fixed window length of 1.5 seconds.
The old model can still be accessed through first_iteration_siamese.

Change notes from v0.1.1 to 0.2.2

Major changes to replace the complex logic for handling multiple triggers per utterance with simpler logic and a more straightforward API for programmers.
Introduces breaking changes.
C++ implementation of the current model is here.

Limitations in Current Model

Trained on single words, hence may result in bizarre behavior when using phrases like "Hey xxx".
Audio processing window limited to 1 sec. Hence, it will not work effectively for longer hotwords.

FAQ

Hotword Performance is bad: If you are experiencing issues like this, feel free to ask in the discussions.
Can it run on microcontrollers like Arduino?: No, the new Resnet50_Arc_loss model is too heavy to run on Arduino (roughly 88 MB in size). We will soon add support for pruned versions of the model so that it can become light enough to run on tiny devices. For now, it should be able to run on Raspberry Pi-like devices.

Contribution

If you have ideas to make the project better, feel free to ping us in the discussions.
The current logmelcalc.tflite graph can convert only 1 audio frame to Log Mel Spectrogram at a time. It would be of great help if TensorFlow gurus out there could assist us with this.

TODO

Add audio file handler in streams. PRs are welcome.
Remove librosa requirement to encourage generating reference files directly on edge devices.
Add more detailed documentation explaining the sliding window concept.
Add model fine-tuning support.
Add support for sparse and fine-grained pruning where the resultant models could be used for fine-tuning (already working on this).

Support Us

Our hotword detector's performance is notably lower than Porcupine's. We have thought about better NN architectures for the engine and hope to outperform Porcupine. This has been our undergrad project, so your support and encouragement will motivate us to develop the engine further. If you love this project, recommend it to your peers, give us a star on GitHub, and a clap on Medium.

Update: Your stars encouraged us to create a new model which is far better. Let's make this community grow!

License

Apache License 2.0

Owner

Name: ANT-BRaiN
Login: Ant-Brain
Kind: organization
Location: India

Repositories: 1
Profile: https://github.com/Ant-Brain

Small is the new big.

GitHub Events

Total

Issues event: 5
Watch event: 48
Issue comment event: 18
Pull request event: 1
Fork event: 9

Last Year

Issues event: 5
Watch event: 48
Issue comment event: 18
Pull request event: 1
Fork event: 9

Committers

Last synced: 9 months ago

All Time

Total Commits: 55
Total Committers: 10
Avg Commits per committer: 5.5
Development Distribution Score (DDS): 0.655

Past Year

Commits: 2
Committers: 1
Avg Commits per committer: 2.0
Development Distribution Score (DDS): 0.0
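The committer figures above can be reproduced with a few lines of arithmetic. This sketch assumes DDS is defined as one minus the top committer's share of all commits; that definition is an assumption, though it matches both reported values. The per-committer counts are taken from the Top Committers table.

```python
def development_distribution_score(commits_per_committer):
    """1 - (commits by the top committer / total commits); 0.0 if no commits."""
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_committer) / total


# All-time commit counts per committer, from the Top Committers table.
all_time = [19, 18, 7, 4, 2, 1, 1, 1, 1, 1]
# Past year: a single committer with 2 commits.
past_year = [2]

all_time_dds = round(development_distribution_score(all_time), 3)
past_year_dds = round(development_distribution_score(past_year), 3)
average_commits = sum(all_time) / len(all_time)

print(sum(all_time), average_commits, all_time_dds, past_year_dds)
```

A score near 0 means one person wrote almost every commit, while a score closer to 1 means the work is spread across many committers.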

Top Committers

Name	Email	Commits
Chidhambararajan	2****n	19
Aman Rangapur	4****7	18
TheSeriousProgrammer	2****r	7
aman-17	a**5@v**n	4
helloworld	h**d@h**m	2
zerocool-11	y**r@o**m	1
dependabot[bot]	4****]	1
WU YONGLIANG	6****x	1
Miguel Perez	m**z@w**e	1
Martí Mas	m**a@d**m	1

Committer Domains (Top 20 + Academic)

dxc.com: 1
optisolbusiness.com: 1
helloworld.com: 1
vitap.ac.in: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 42
Total pull requests: 9
Average time to close issues: 2 months
Average time to close pull requests: 2 months
Total issue authors: 31
Total pull request authors: 8
Average comments per issue: 4.14
Average comments per pull request: 2.11
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 6
Pull requests: 2
Average time to close issues: 2 days
Average time to close pull requests: N/A
Issue authors: 6
Pull request authors: 1
Average comments per issue: 1.33
Average comments per pull request: 2.5
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

preachwebsite (7)
OnlinePage (3)
rpatapa (2)
Siyamfahad (2)
dominickchen (2)
amoazeni75 (1)
yizheneng (1)
artem-tok (1)
nfaraji2002 (1)
mKenfenheuer (1)
TheSeriousProgrammer (1)
pythonbrad (1)
kfengtee (1)
SupremeLobster (1)
m1guelperez (1)

Pull Request Authors

Carpediem324 (2)
kiri-i (2)
SupremeLobster (2)
jrsarath (2)
zerocool-11 (1)
m1guelperez (1)
dependabot[bot] (1)
Holly-Max (1)

Top Labels

Issue Labels

enhancement (3)
raspberrypi (2)
wake_word_generation (2)
documentation (1)
good first issue (1)

Pull Request Labels

dependencies (1)

efficientword-net

Science Score: 36.0%
