https://github.com/cabralpinto/active-learning-syllabification

Language Agnostic Syllabification with Active Learning

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found codemeta.json file
✓ .zenodo.json file: found .zenodo.json file
○ DOI references
✓ Academic publication links: links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (9.7%) to scientific vocabulary

Keywords

active-learning matlab syllabification

Last synced: 5 months ago

Repository

Language Agnostic Syllabification with Active Learning

Basic Info

Host: GitHub
Owner: cabralpinto
License: mit
Language: MATLAB
Default Branch: main
Size: 4.04 MB

Statistics

Stars: 3
Watchers: 3
Forks: 1
Open Issues: 0
Releases: 0

Topics

active-learning matlab syllabification

Created over 4 years ago · Last pushed over 2 years ago

Metadata Files

Readme, License

Language Agnostic Syllabification with Active Learning

This repository implements a language-agnostic syllabification method based on active learning. Syllabification, the process of splitting a word into syllables, is a key step in speech synthesis and recognition. Active learning reduces the need for large labeled datasets. We adapted the neural network from Krantz et al. (2019) and trained it with active learning, which gave higher accuracy on the Portuguese and Italian datasets than training on the full data while labeling only a small fraction of it: 384 words (1.4% of the dataset) for Portuguese and 528 words (0.6% of the dataset) for Italian.
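To make the idea concrete, here is a minimal Python sketch of a pool-based active learning loop. It is not the code in this repository, which is written in MATLAB and uses a neural network. Everything in it is an assumption made for illustration: the toy word list, the `BigramBoundaryModel` stand-in model, the `active_learning` helper, and the choice of uncertainty sampling (mean entropy over letter gaps) as the query strategy.

```python
import math
from collections import defaultdict

# Toy oracle (hypothetical data): word -> syllabified form.
# In practice the oracle is a human annotator or a labeled lexicon.
ORACLE = {
    "casa": "ca-sa", "bola": "bo-la", "pato": "pa-to", "mala": "ma-la",
    "porta": "por-ta", "carta": "car-ta", "banana": "ba-na-na",
    "tomate": "to-ma-te",
}


def boundaries(syllabified):
    """One 0/1 flag per gap between adjacent letters (1 = syllable break)."""
    flags = []
    syllables = syllabified.split("-")
    for k, syl in enumerate(syllables):
        flags.extend([0] * (len(syl) - 1))  # gaps inside a syllable
        if k < len(syllables) - 1:
            flags.append(1)                 # gap between two syllables
    return flags


def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


class BigramBoundaryModel:
    """Stand-in for the neural network: break probability per letter pair."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # bigram -> [no break, break]

    def update(self, syllabified):
        word = syllabified.replace("-", "")
        for i, flag in enumerate(boundaries(syllabified)):
            self.counts[word[i:i + 2]][flag] += 1

    def p_break(self, bigram):
        no_break, brk = self.counts.get(bigram, (0, 0))
        return (brk + 1) / (no_break + brk + 2)  # Laplace smoothing

    def uncertainty(self, word):
        gaps = [word[i:i + 2] for i in range(len(word) - 1)]
        if not gaps:
            return 0.0
        return sum(binary_entropy(self.p_break(g)) for g in gaps) / len(gaps)

    def syllabify(self, word):
        out = word[:1]
        for i in range(1, len(word)):
            if self.p_break(word[i - 1:i + 1]) > 0.5:
                out += "-"
            out += word[i]
        return out


def active_learning(pool, oracle, seed, batch_size, rounds):
    """Label the seed words, then repeatedly query the most uncertain words."""
    model = BigramBoundaryModel()
    labeled = {}
    for word in seed:
        labeled[word] = oracle[word]
        model.update(labeled[word])
    unlabeled = [w for w in pool if w not in labeled]
    for _ in range(rounds):
        if not unlabeled:
            break
        # Query strategy: highest mean entropy first (uncertainty sampling).
        unlabeled.sort(key=model.uncertainty, reverse=True)
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        for word in batch:
            labeled[word] = oracle[word]  # ask the oracle for the label
            model.update(labeled[word])
    return model, labeled


model, labeled = active_learning(sorted(ORACLE), ORACLE, seed=["casa"],
                                 batch_size=2, rounds=2)
print(len(labeled), "words labeled out of", len(ORACLE))
print(model.syllabify("casa"))
```

The point of the loop is that labels are requested only for the words the current model is least sure about, so the labeling budget (`batch_size * rounds` plus the seed) stays far below the pool size.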

🚀 Usage

Prerequisites

Before running the project, ensure that you have the following:

- MATLAB R2021a (or a newer version)
- Statistics and Machine Learning Toolbox
- Text Analytics Toolbox

Running the Project

1. Clone this repository to your local machine or download the ZIP archive.
2. Open MATLAB and navigate to the root directory of the cloned repository.
3. Locate the src folder and open the main.m file.
4. Run the main.m script to execute the project.

📊 Results

The method reaches high accuracy with very little labeled data. The following results were obtained:

- Porlex v3 (Portuguese dataset): 96.8% accuracy using only 384 words, which corresponds to 1.4% of the original dataset.
- PhonItalia (Italian dataset): 82.0% accuracy using only 528 words, which corresponds to 0.6% of the original dataset.
- Lexique 2 (French dataset): 95.8% accuracy using only 208 words, which is less than 0.01% of the whole dataset.

For both Portuguese and Italian, these results surpass the accuracy obtained by training the network on the entire dataset (95.6% and 81%, respectively).
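As a quick sanity check on these figures, the short Python snippet below back-calculates the approximate dataset sizes implied by the reported word counts and percentages, along with the accuracy gain over full-dataset training. The percentages are rounded, so the sizes are rough estimates (on the order of 27,000 words for Portuguese and 88,000 for Italian), not official dataset counts. The French dataset is left out because only an upper bound on its fraction is given. The `REPORTED` table and helper names are introduced here for illustration.

```python
# Figures copied from the results above; sizes derived from them are estimates.
REPORTED = {
    "Porlex v3 (Portuguese)": {"words": 384, "fraction": 0.014,
                               "active": 96.8, "full": 95.6},
    "PhonItalia (Italian)": {"words": 528, "fraction": 0.006,
                             "active": 82.0, "full": 81.0},
}


def implied_size(words, fraction):
    """Approximate dataset size implied by a labeled count and its share."""
    return round(words / fraction)


def accuracy_gain(active, full):
    """Gain in percentage points of active learning over full training."""
    return round(active - full, 1)


for name, r in REPORTED.items():
    size = implied_size(r["words"], r["fraction"])
    gain = accuracy_gain(r["active"], r["full"])
    print(f"{name}: about {size} words in total, +{gain} points")
```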

📜 License

This project is licensed under the MIT License.

🎉 Acknowledgments

We would like to acknowledge the work of Krantz et al. (2019) for providing the neural network architecture used in this project. Their research serves as a foundation for our active learning adaptation.

📬 Contact

If you have any questions, suggestions, or just want to say hello, feel free to email me at jmcabralpinto@gmail.com.
