whisper-local-transcribe
Simple implementation of OpenAI's whisper model to transcribe audio files from your local folders.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (18.9%) to scientific vocabulary
Keywords
Repository
Simple implementation of OpenAI's whisper model to transcribe audio files from your local folders.
Basic Info
- Host: GitHub
- Owner: soderstromkr
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: https://github.com/soderstromkr/whisper-local-transcribe
- Size: 2.85 MB
Statistics
- Stars: 57
- Watchers: 3
- Forks: 16
- Open Issues: 4
- Releases: 15
Topics
Metadata Files
README.md
Local Transcribe with Whisper
Local Transcribe with Whisper is a user-friendly desktop application that allows you to transcribe audio and video files using the Whisper ASR system. This application provides a graphical user interface (GUI) built with Python and the Tkinter library, making it easy to use even for those not familiar with programming.
New in version 1.2!
- Simpler usage:
- File type: You no longer need to specify the file type. The program will only transcribe eligible files.
- Language: Added option to specify language, which might help in some cases. Clear the default text to run automatic language recognition.
- Model selection: Now a dropdown option that includes most models for typical use.
- New and improved GUI.
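The eligible-file check described above can be sketched as follows. This is an illustration, not the app's actual code: the extension list is an assumption (the app's real list may differ), but it shows how eligibility can be decided from the file suffix instead of asking the user for a file type.

```python
from pathlib import Path

# Assumed set of extensions; the app's actual list may differ.
ELIGIBLE_EXTENSIONS = {".mp3", ".m4a", ".wav", ".flac", ".ogg", ".mp4", ".mkv", ".webm"}

def eligible_files(folder):
    """Return the media files in *folder* that look transcribable."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in ELIGIBLE_EXTENSIONS
    )
```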

Features
- Select the folder containing the audio or video files you want to transcribe. Tested with m4a files.
- Choose the language of the files you are transcribing. You can either select a specific language or let the application automatically detect the language.
- Select the Whisper model to use for the transcription. Available models include "base.en", "base", "small.en", "small", "medium.en", "medium", and "large". Models with the .en suffix are better if you're transcribing English, especially the base and small models.
- Enable the verbose mode to receive detailed information during the transcription process.
- Monitor the progress of the transcription with the progress bar and terminal.
- Confirmation dialog before starting the transcription to ensure you have selected the correct folder.
- View the transcribed text in a message box once the transcription is completed.
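The features above boil down to a loop over the selected folder. A minimal sketch, with `transcribe_fn` standing in for the Whisper model call (this is not the repo's actual function), keeps the folder handling separate from the model itself:

```python
from pathlib import Path

def transcribe_folder(folder, transcribe_fn, verbose=False):
    """Run *transcribe_fn* on every media file in *folder* and
    save each transcript as a .txt file alongside the source."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Assumed extension set; the app's actual list may differ.
        if path.suffix.lower() not in {".mp3", ".m4a", ".wav", ".mp4"}:
            continue  # skip non-media files
        if verbose:
            print(f"Transcribing {path.name}...")
        text = transcribe_fn(str(path))
        path.with_suffix(".txt").write_text(text, encoding="utf-8")
        results[path.name] = text
    return results
```

With openai-whisper installed, `transcribe_fn` could be `lambda p: model.transcribe(p)["text"]`, where `model = whisper.load_model("base")`.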
Installation
Get the files
Download the ZIP archive and extract it to your preferred working folder.

Or by cloning the repository with:
git clone https://github.com/soderstromkr/transcribe.git
Python Version (any platform including Mac users)
- This script was made and tested in an Anaconda environment with Python 3.10. I recommend Miniconda for a smaller installation, especially if you're not familiar with Python; see here for instructions. You will need administrator rights.
- Whisper also requires some additional libraries. The setup page states: "The codebase also depends on a few Python packages, most notably HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files." Users might not need to install Transformers explicitly. However, a conda installation might be needed for ffmpeg[^1], which takes care of setting up PATH variables.
From the Anaconda Prompt (which should now be installed on your system; find it with the search function), type or copy the following:
conda install -c conda-forge ffmpeg-python
You can also choose not to use Anaconda (or Miniconda) and use plain Python. In that case, you need to download and install FFmpeg (and potentially add it to your PATH); see here for WikiHow instructions.
- The main functionality comes from openai-whisper; see their page for details. It also uses some additional packages (colorama and customtkinter). Install them with the following command:
pip install -r requirements.txt
Run the app:
- For Windows: In the same folder as the app.py file, run the app from the Anaconda Prompt with python app.py, or use the batch file run_Windows.bat, which assumes you have conda installed and are in the base environment. (This is for simplicity; users are usually advised to create a dedicated environment, see here for more info.) To point it at a different environment, right-click the file and press Edit.
- For Mac: I haven't figured out a better way to do this; see the instructions here.
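Putting the steps above together, a typical setup in a fresh conda environment might look like this (the environment name is an assumption; any name works):

```shell
# Assumed environment name; pick any you like.
conda create -n whisper-app python=3.10
conda activate whisper-app
# ffmpeg via conda-forge also takes care of PATH setup
conda install -c conda-forge ffmpeg-python
# installs colorama, customtkinter, openai-whisper
pip install -r requirements.txt
python app.py
```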
Note: If you want to download a model first and then go offline for transcription, I recommend running the model once on the default sample folder, which will download the model locally.
Usage
When launched, the app will also open a terminal that shows some additional information.
1. Select the folder containing the audio or video files you want to transcribe by clicking the "Browse" button next to the "Folder" label. This opens a file dialog where you can navigate to the desired folder. Remember, you won't be choosing individual files but whole folders!
2. Enter the desired language for the transcription in the "Language" field. You can either select a language or leave it blank to enable automatic language detection.
3. Choose the Whisper model to use for the transcription from the dropdown list next to the "Model" label.
4. Enable verbose mode by checking the "Verbose" checkbox if you want detailed information during the transcription process.
5. Click the "Transcribe" button to start the transcription. The button is disabled during the process to prevent multiple transcriptions at once.
6. Monitor the progress of the transcription with the progress bar.
7. Once the transcription is completed, a message box appears displaying the transcribed text. Click "OK" to close it.
8. You can run the application again, or quit at any time by clicking the "Quit" button.
Jupyter Notebook
Don't want fancy EXEs or GUIs? Use the function as is; see the example for an implementation in a Jupyter Notebook.
[^1]: Advanced users can use pip install ffmpeg-python, but be ready to deal with some PATH issues, which I encountered on Windows 11.
Owner
- Name: Kristofer Rolf Söderström
- Login: soderstromkr
- Kind: user
- Location: Lund, Sweden.
- Company: Lund University
- Repositories: 4
- Profile: https://github.com/soderstromkr
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you find this implementation useful in your research, and want to cite it, please do so as below."
authors:
- family-names: "Söderström"
  given-names: "Kristofer Rolf"
  orcid: "https://orcid.org/0000-0002-5322-3350"
title: "Local Transcribe"
version: 1.2
doi: 10.5281/zenodo.7760510
date-released: 2023-03-22
url: "https://github.com/soderstromkr/transcribe"
GitHub Events
Total
- Issues event: 1
- Watch event: 34
- Issue comment event: 2
- Pull request event: 2
- Fork event: 6
Last Year
- Issues event: 1
- Watch event: 34
- Issue comment event: 2
- Pull request event: 2
- Fork event: 6
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- bdytx5 (1)
- jerik (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- colorama *
- customtkinter *
- openai-whisper *